This enables peeling of loops with low dynamic iteration count by default,
when profile information is available.
Differential Revision: https://reviews.llvm.org/D27734
llvm-svn: 295796
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use them whether or not denormals are enabled.
Also allow using clamp with f16, and use knowledge
of the dx10_clamp setting.
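As a rough illustration (not AMDGPU code), clamping to [0.0, 1.0] can be written either as a max/min pair or as a single med3 of the value and the two bounds. The sketch below, with hypothetical helper names, checks that both forms agree on ordinary inputs, including a denormal. NaN handling, f16, hardware denormal modes, and dx10_clamp are not modeled.

```python
def med3(a: float, b: float, c: float) -> float:
    """Median of three values (NaN inputs are not modeled)."""
    return sorted((a, b, c))[1]


def clamp01_med3(x: float) -> float:
    # The median of {x, 0.0, 1.0} is x clamped to [0.0, 1.0].
    return med3(x, 0.0, 1.0)


def clamp01_minmax(x: float) -> float:
    # Equivalent form: max against the low bound, then min against the high one.
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    # 5e-324 is the smallest positive double (a denormal); neither form flushes it.
    for v in (-2.5, 0.0, 5e-324, 0.25, 1.0, 7.0):
        assert clamp01_med3(v) == clamp01_minmax(v)
        print(v, "->", clamp01_med3(v))
```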
llvm-svn: 295788
Original code only used vector loads/stores for explicit vector arguments.
It could also do more loads/stores than necessary (e.g., v5f32 would
touch 8 f32 values). Aggregate types were loaded one element at a time,
even the vectors contained within them.
This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
Functionality of the patch has been verified by compiling the Thrust
test suite and manually checking the differences between the PTX
generated by LLVM with and without the patch.
General algorithm:
* ComputePTXValueVTs() flattens an input/output argument into a flat list
of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create a vectorization plan:
an array of flags marking the boundaries of vectorized
loads/stores. Scalars are represented as 1-element vectors.
* The code that generates loads/stores implements a simple state machine
that constructs each vector according to the plan.
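To make the three steps concrete, here is a toy Python model. It is not the NVPTX code: the type encoding, the flag names, the 16-byte limit on a single access, the v2/v4 widths, and the alignment rules are all simplifying assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Byte sizes of the scalar types used in this sketch.
SIZES = {"i8": 1, "i16": 2, "i32": 4, "i64": 8, "f32": 4, "f64": 8}


@dataclass(frozen=True)
class Vec:
    elem: str   # scalar element type, e.g. "f32"
    count: int  # number of elements


# A type is a scalar name, a Vec, or a list of member types (an aggregate).
Type = Union[str, Vec, list]
Value = Tuple[str, int]  # (scalar type, byte offset)


def flatten(t: Type, base: int = 0) -> List[Value]:
    """Toy analogue of ComputePTXValueVTs(): flatten t into (type, offset)."""
    if isinstance(t, str):
        return [(t, base)]
    if isinstance(t, Vec):
        size = SIZES[t.elem]
        return [(t.elem, base + i * size) for i in range(t.count)]
    out: List[Value] = []
    off = base
    for member in t:
        parts = flatten(member)
        # Simplification: align each member to its largest scalar.
        align = max(SIZES[ty] for ty, _ in parts)
        off = (off + align - 1) // align * align
        out += [(ty, off + o) for ty, o in parts]
        last_ty, last_off = parts[-1]
        off += last_off + SIZES[last_ty]
    return out


# Plan flags: a vector is FIRST, INNER..., LAST; a lone scalar is SCALAR.
INNER, FIRST, LAST = 0, 1, 2
SCALAR = FIRST | LAST


def vectorize_plan(values: List[Value], param_align: int = 16,
                   max_bytes: int = 16) -> List[int]:
    """Toy analogue of VectorizePTXValueVTs(): mark vector boundaries."""
    flags: List[int] = []
    i = 0
    while i < len(values):
        ty, off = values[i]
        size = SIZES[ty]
        width = 1
        for w in (4, 2):
            nbytes = w * size
            if nbytes > min(max_bytes, param_align) or off % nbytes:
                continue  # too wide, or start not aligned for this width
            group = values[i:i + w]
            if len(group) == w and all(
                    t == ty and o == off + k * size
                    for k, (t, o) in enumerate(group)):
                width = w
                break
        if width == 1:
            flags.append(SCALAR)
        else:
            flags += [FIRST] + [INNER] * (width - 2) + [LAST]
        i += width
    return flags


def build_accesses(values: List[Value], flags: List[int]) -> List[List[Value]]:
    """Toy state machine: one access per FIRST..LAST run of the plan."""
    accesses: List[List[Value]] = []
    current: List[Value] = []
    for value, flag in zip(values, flags):
        if flag & FIRST:
            current = []
        current.append(value)
        if flag & LAST:
            accesses.append(current)
    return accesses
```

With these assumptions, `flatten(Vec("f32", 5))` yields five scalars, and the plan groups them as one v4 access plus one scalar access, so five values are touched rather than eight.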
Differential Revision: https://reviews.llvm.org/D30011
llvm-svn: 295784
For whatever reason, ld64 requires that member headers (not the members
themselves) be aligned. The only way to achieve that is to edit the
previous member so that it ends at an aligned boundary.
Since modifying data put in an archive is an undesirable property,
llvm-ar should only do it when it is absolutely necessary.
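A small sketch of the padding arithmetic, assuming the common ar layout: an 8-byte `!<arch>\n` magic, then for each member a 60-byte header followed by its data, padded to an even length. The required header alignment is a parameter; the value 8 used in the check below is an assumption for illustration, not necessarily what ld64 needs. A member only grows when the next header would otherwise be misaligned. The function names are hypothetical.

```python
from typing import List, Tuple

AR_MAGIC = b"!<arch>\n"  # 8-byte global archive header
HEADER_SIZE = 60         # size of each member header


def round_up(x: int, a: int) -> int:
    return (x + a - 1) // a * a


def plan_padding(member_sizes: List[int], align: int = 8) -> List[Tuple[int, int]]:
    """Return (header_offset, pad_bytes) for each member.

    For every member except the last, pad_bytes is how much its data must
    grow so that the next header starts on an `align`-byte boundary
    (0 when it is already aligned). The last member only gets the usual
    even-length padding. `align` is assumed to be even.
    """
    plan: List[Tuple[int, int]] = []
    pos = len(AR_MAGIC)
    for i, size in enumerate(member_sizes):
        data_end = pos + HEADER_SIZE + size
        is_last = i + 1 == len(member_sizes)
        next_pos = round_up(data_end, 2 if is_last else align)
        plan.append((pos, next_pos - data_end))
        pos = next_pos
    return plan
```

For members of 4, 10, and 3 bytes, the headers land at offsets 8, 72, and 144; only the second member needs 2 extra bytes.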
llvm-svn: 295765
This is part of trying to clean up our handling of min/max patterns in IR.
By converting these to canonical form, we're more likely to recognize them
because there are various places in InstCombine that don't use
matchSelectPattern or m_SMax and friends.
The backend fixups referenced in the now deleted TODO comment were added with:
https://reviews.llvm.org/rL291392
https://reviews.llvm.org/rL289738
If there's any codegen fallout from this change, we should be able to address
it in DAGCombiner or target-specific lowering.
llvm-svn: 295758
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
llvm-svn: 295753
Conflicting debug info for function arguments causes hard-to-debug
assertions in the DWARF backend, so the Verifier should reject it.
For performance reasons this only checks function arguments from
non-inlined debug intrinsics for now.
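The rule can be sketched as follows, under simplifying assumptions: debug intrinsics are modeled by a hypothetical `DbgRecord` holding the variable's name, its 1-based argument number (0 for non-arguments), and whether its location is inlined. Variables are compared by name here, whereas a real check would compare variable identities. Inlined records are skipped, matching the non-inlined-only restriction above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DbgRecord:
    """Hypothetical stand-in for a debug intrinsic describing a variable."""
    var_name: str  # identity of the described variable (simplified to a name)
    arg_no: int    # 1-based argument number, or 0 if not a function argument
    inlined: bool  # True if the debug location carries an inlinedAt


def check_fn_args(records: Iterable[DbgRecord]) -> None:
    """Raise ValueError if two records describe the same argument differently."""
    seen: Dict[int, str] = {}
    for rec in records:
        if rec.inlined or rec.arg_no == 0:
            continue  # only non-inlined argument descriptions are checked
        prev = seen.setdefault(rec.arg_no, rec.var_name)
        if prev != rec.var_name:
            raise ValueError(
                f"conflicting debug info for argument {rec.arg_no}: "
                f"{prev!r} vs {rec.var_name!r}")
```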
rdar://problem/30520286
llvm-svn: 295749
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64. The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.
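The selection criterion from the first paragraph can be sketched roughly in Python. The `Inst` model, the function names, and the single-bit-mask hook (in the spirit of folding a one-bit test into AArch64 tbz) are illustrative assumptions; the real logic lives in CodeGenPrepare and in per-target hooks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(eq=False)
class Inst:
    """Minimal IR instruction model (hypothetical)."""
    op: str    # "and", "icmp_eq", "icmp_ne", ...
    block: str  # name of the containing basic block
    operands: list = field(default_factory=list)
    users: List["Inst"] = field(default_factory=list, repr=False)


def single_bit_mask_hook(and_inst: Inst) -> bool:
    """Example target hook: accept only an 'and' with a one-bit constant mask."""
    mask = and_inst.operands[1]
    return isinstance(mask, int) and mask > 0 and mask & (mask - 1) == 0


def is_icmp_zero_of(user: Inst, value: Inst) -> bool:
    return (user.op in ("icmp_eq", "icmp_ne")
            and user.operands[0] is value and user.operands[1] == 0)


def blocks_to_sink_into(and_inst: Inst,
                        hook: Callable[[Inst], bool]) -> Set[str]:
    """Blocks that would receive a copy of and_inst (empty set: leave it)."""
    if and_inst.op != "and" or not and_inst.users or not hook(and_inst):
        return set()
    # Every user must compare the 'and' result against zero.
    if not all(is_icmp_zero_of(u, and_inst) for u in and_inst.users):
        return set()
    return {u.block for u in and_inst.users if u.block != and_inst.block}
```

The target hook is kept separate from the structural check so that each target can choose which masks are worth duplicating without changing the search itself.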
The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact that
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used. One minor complication from this change is
that optimizeLoadExt needed to be updated to always add to the
InsertedInsts set any 'and' it has determined should stay in the same
block as its feeding load, to avoid an infinite loop of hoisting and
sinking the same 'and'.
This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).
Reviewers: t.p.northover, qcolombet, MatzeB
Subscribers: aemerson, mcrosier, sebpop, llvm-commits
Differential Revision: https://reviews.llvm.org/D28813
llvm-svn: 295746
As i64 isn't a value type on 32-bit targets, we fail to fold the VZEXT_LOAD into VPBROADCASTQ.
Also shows that we're not decoding VPERMIV3 shuffles very well.
llvm-svn: 295729
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.
llvm-svn: 295723
Currently contains just one case, where we combine to VZEXT_MOVL instead of VZEXT, which would avoid the need for a zero vector to be generated.
llvm-svn: 295721
Summary:
This is a fix for an assertion failure in
`getInverseMinMaxSelectPattern` when ABS is passed in as a select pattern.
We should not be invoking the simplification rule for
ABS(MIN(~x, y)) or ABS(MAX(~x, y)) combinations.
Added a test case that would cause an assertion failure without the patch.
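The identity behind this kind of rewrite is that bitwise not is order-reversing on signed integers, so MIN(~x, y) == ~MAX(x, ~y) and MAX(~x, y) == ~MIN(x, ~y). The Python sketch below uses hypothetical names and models only signed min/max plus ABS. It illustrates the guard the fix implies: the inverse lookup is only reached for min/max flavors, so an ABS pattern makes the rewrite decline instead of asserting.

```python
from enum import Enum
from typing import Optional


class SPF(Enum):
    """Hypothetical subset of select-pattern flavors."""
    SMIN = "smin"
    SMAX = "smax"
    ABS = "abs"


_INVERSE = {SPF.SMIN: SPF.SMAX, SPF.SMAX: SPF.SMIN}


def inverse_min_max(f: SPF) -> SPF:
    # Only min/max flavors have an inverse; ABS would trip this assertion.
    assert f in _INVERSE, f"{f} is not a min/max flavor"
    return _INVERSE[f]


def fold_not_operand(f: SPF) -> Optional[SPF]:
    """For F(~x, y), return G with F(~x, y) == ~G(x, ~y), or None if F has no inverse."""
    if f not in _INVERSE:
        return None  # the guard: do not apply the rule to ABS
    return inverse_min_max(f)


def evaluate(f: SPF, a: int, b: int) -> int:
    if f is SPF.SMIN:
        return min(a, b)
    if f is SPF.SMAX:
        return max(a, b)
    raise ValueError(f"evaluate() does not model {f}")
```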
Reviewers: sanjoy, majnemer
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30051
llvm-svn: 295719
They are all covered by the SSE4.2 intrinsics test with SSE4.2, AVX, and AVX512 command lines.
Merge sse42.ll into the other intrinsics test. Rename sse42_64.ll to be named like other intrinsic tests.
llvm-svn: 295707
The new method is introduced under the "-lsr-exp-narrow" option (currently set to true).
Summary:
The method is based on the mathematical expectation of the number of registers
and should generally be closer to the optimal solution.
See the comments on the
"LSRInstance::NarrowSearchSpaceByDeletingCostlyFormulas()" function
(in lib/Transforms/Scalar/LoopStrengthReduce.cpp) for details.
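For intuition only, here is a toy Python model of a register-count expectation. It rests on an explicitly assumed model: each use picks one of its formulas uniformly and independently. For a register r and a use U, the probability that U does not pick a formula containing r is 1 - (formulas of U containing r) / (formulas of U). A formula's score in use U sums, over its registers, the probability that no other use picks that register (that is, that the register would not be shared). Each use then keeps its lowest-scoring formula. This is a simplified sketch, not the LoopStrengthReduce implementation. The formula encoding and helper names are hypothetical, and immediates, scaled registers, and tie-breaking are ignored.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, List, Tuple

Formula = FrozenSet[str]  # set of register names used by a formula
Use = List[Formula]       # candidate formulas for one use


def not_selected_prob(uses: List[Use]) -> Dict[Tuple[str, int], Fraction]:
    """P(use i does not pick a formula containing reg), keyed by (reg, i)."""
    regs = set().union(*(f for use in uses for f in use))
    prob: Dict[Tuple[str, int], Fraction] = {}
    for r in regs:
        for i, use in enumerate(uses):
            with_r = sum(1 for f in use if r in f)
            prob[(r, i)] = 1 - Fraction(with_r, len(use))
    return prob


def expected_regs(uses: List[Use], ui: int, formula: Formula,
                  prob: Dict[Tuple[str, int], Fraction]) -> Fraction:
    """Expected number of this formula's registers that no other use shares."""
    total = Fraction(0)
    for r in formula:
        p = Fraction(1)
        for j in range(len(uses)):
            if j != ui:  # exclude the use the formula belongs to
                p *= prob[(r, j)]
        total += p
    return total


def narrow(uses: List[Use]) -> List[Formula]:
    """Keep, for each use, the formula with the lowest expectation."""
    prob = not_selected_prob(uses)
    return [min(use, key=lambda f, i=i: expected_regs(uses, i, f, prob))
            for i, use in enumerate(uses)]
```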
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D29862
From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 295704
They are all covered by the SSE2 intrinsics test with SSE2, AVX, and AVX512 command lines.
Also remove an unneeded lfence intrinsic test since it was already covered.
llvm-svn: 295700
They are all covered by the SSE intrinsics test with SSE, AVX, and AVX512 command lines.
Also remove an unneeded sfence intrinsic test since it was already covered.
llvm-svn: 295699
Summary:
Sandy Bridge and later CPUs have better throughput using a SHLD to implement rotate versus the normal rotate instructions. Additionally it saves one uop and avoids a partial flag update dependency.
This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.
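Why SHLD can stand in for a rotate: SHLD shifts its destination left and fills the vacated low bits from the high bits of its source, so when both operands are the same register (e.g. `shld eax, eax, 5`) the resulting value equals that of `rol eax, 5`. The Python model below, with hypothetical helper names, checks that equivalence for 32-bit values. It models only the result value, not flags or the throughput and uop differences described above.

```python
MASK32 = 0xFFFF_FFFF


def rol32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n (count taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32 if n else x


def shld32(dst: int, src: int, n: int) -> int:
    """Value model of SHLD r32, r32, imm: shift dst left, fill from src's top bits."""
    n %= 32  # the count is masked to 5 bits for 32-bit operands
    if n == 0:
        return dst
    return ((dst << n) | (src >> (32 - n))) & MASK32


if __name__ == "__main__":
    for x in (0, 1, 0x80000001, 0xDEADBEEF, MASK32):
        for n in range(64):
            assert shld32(x, x, n) == rol32(x, n)
    print("shld32(x, x, n) == rol32(x, n) for all tested values")
```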
Reviewers: zvi
Reviewed By: zvi
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30181
llvm-svn: 295697
Summary:
Currently, BranchFolder drops DebugLoc for branch instructions in some places. For example, for the test code attached, the branch instruction of 'entry' block has a DILocation of
```
!12 = !DILocation(line: 6, column: 3, scope: !11)
```
However, this information is lost when the block is lowered because BranchFolder misses it. This patch fixes this issue.
Reviewers: qcolombet, aprantl, craig.topper, MatzeB
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29902
llvm-svn: 295684
Summary:
Each OperandPredicateMatcher shouldn't need to know how to generate the expression
to reference a MachineOperand. The OperandMatcher should provide it.
In addition to separating responsibilities, this also lays some groundwork for
decoupling source patterns from destination patterns to allow invented operands
or operands provided by GlobalISel's equivalent to the ComplexPattern<> class.
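A rough Python sketch of the split of responsibilities described above: the operand matcher builds the C++ expression that refers to its MachineOperand and hands it to each predicate, which only emits its own check. Apart from the names OperandMatcher and OperandPredicateMatcher, the classes are hypothetical, and the generated strings are illustrative, not the real GlobalISelEmitter output.

```python
from abc import ABC, abstractmethod
from typing import List


class OperandPredicateMatcher(ABC):
    """A check on one operand; it is told how to refer to that operand."""

    @abstractmethod
    def emit_check(self, operand_expr: str) -> str: ...


class TypePredicate(OperandPredicateMatcher):  # hypothetical
    def __init__(self, llt: str) -> None:
        self.llt = llt

    def emit_check(self, operand_expr: str) -> str:
        return f"MRI.getType({operand_expr}.getReg()) == {self.llt}"


class ImmValuePredicate(OperandPredicateMatcher):  # hypothetical
    def __init__(self, value: int) -> None:
        self.value = value

    def emit_check(self, operand_expr: str) -> str:
        return f"{operand_expr}.isImm() && {operand_expr}.getImm() == {self.value}"


class OperandMatcher:
    """Owns where the operand lives and how generated code references it."""

    def __init__(self, insn_var: str, op_idx: int,
                 predicates: List[OperandPredicateMatcher]) -> None:
        self.insn_var = insn_var
        self.op_idx = op_idx
        self.predicates = predicates

    def operand_expr(self) -> str:
        # The only place that knows how the operand is reached.
        return f"{self.insn_var}.getOperand({self.op_idx})"

    def emit_check(self) -> str:
        expr = self.operand_expr()
        return " && ".join(f"({p.emit_check(expr)})" for p in self.predicates)
```

Because only `operand_expr()` knows how an operand is reached, supporting operands that do not come directly from the source pattern would mean changing that one method rather than every predicate.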
Depends on D29709
Reviewers: t.p.northover, ab, rovka, qcolombet, aditya_nandakumar
Reviewed By: ab
Subscribers: dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D29710
llvm-svn: 295668