When legalizing vector operations on vNi128, they will be split to v1i128
because that is a legal type on ppc64, but then the compiler will crash in
selection dag because it fails to select for these operations. This patch fixes
shift operations. Logical shift right and left shift can be performed in the
vector unit, but algebraic shift right requires being split.
Differential Revision: https://reviews.llvm.org/D32774
llvm-svn: 303307
- '-verify-mahcineinstrs' starts to complain allocatable live-in physical
registers on non-entry or non-landing-pad basic blocks.
- Refactor the XBEGIN translation to define EAX on a dedicated fallback code
path due to XABORT. Add a pseudo instruction to define EAX explicitly to
avoid add physical register live-in.
Differential Revision: https://reviews.llvm.org/D33168
llvm-svn: 303306
In order for an arbitrary callee to access an object
in a caller's stack frame, the 32-bit offset used as
the private pointer needs to be relative to the kernel's
scratch wave offset register.
Convert to this by finding the difference from the current
stack frame and scaling by the wavefront size.
llvm-svn: 303303
Check the MachinePointerInfo for whether the access is
supposed to be relative to the stack pointer.
No tests because this is used in later commits implementing
calls.
llvm-svn: 303301
Summary: Moving LiveRangeShrink to x86 as this pass is mostly useful for archtectures with great register pressure.
Reviewers: MatzeB, qcolombet
Reviewed By: qcolombet
Subscribers: jholewinski, jyknight, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33294
llvm-svn: 303292
Avoids instructions to pack a vector when the source is really
a scalar being broadcast.
Also be smarter and look for per-component fneg.
Doesn't yet handle scalar from upper half of register
or other swizzles.
llvm-svn: 303291
The variables MinGPR/MinG8R were not updated properly when resetting the
offsets, which in the included testcase lead to saving the CR register
in the same location as R30.
This fixes another issue reported in PR26519.
Differential Revision: https://reviews.llvm.org/D33017
llvm-svn: 303257
It only failed on llvm-clang-x86_64-expensive-checks-win, probably
because the TableGen stuff hasn't been regenerated.
Requires a clean build.
llvm-svn: 303252
Summary:
This fixes pr32392.
The lowering pipeline is:
llvm.ppc.cfence in IR -> PPC::CFENCE8 in isel -> Actual instructions in
expandPostRAPseudo.
The reason why expandPostRAPseudo is chosen is because previous passes
are likely eliminating instructions like cmpw 3, 3 (early CSE) and bne-
7, .+4 (some branch pass(s)).
Differential Revision: https://reviews.llvm.org/D32763
llvm-svn: 303205
Using LIS can be quite expensive, so caching of calculated region
live-ins and pressure is implemented. It does two things:
1. Caches the info for the second stage when we schedule with
decreased target occupancy.
2. Tracks the basic block from top to bottom thus eliminating the
need to scan whole register file liveness at every region split
in the middle of the block.
The scheduling is now done in 3 stages instead of two, with the first
one being really a no-op and only used to collect scheduling regions
as sent by the scheduler driver.
There is no functional change to the current behavior, only compilation
speed is affected. In general computeBlockPressure() could be simplified
if we switch to backward RP tracker, because scheduler sends regions
within a block starting from the last upward. We could use a natural
order of upward tracker to seamlessly change between regions of the same
block, since live reg set of a previous tracked region would become a
live-out of the next region. That however requires fixing upward tracker
to properly account defs and uses of the same instruction as both are
contributing to the current pressure. When we converge on the produced
pressure we should be able to switch between them back and forth. In
addition, backward tracker is less expensive as it uses LIS in recede
less often than forward uses it in advance.
At the moment the worst known case compilation time has improved from 26
minutes to 8.5.
Differential Revision: https://reviews.llvm.org/D33117
llvm-svn: 303184
According to Intel's Optimization Reference Manual for SNB+:
" For LEA instructions with three source operands and some specific situations, instruction latency has increased to 3 cycles, and must
dispatch via port 1:
- LEA that has all three source operands: base, index, and offset
- LEA that uses base and index registers where the base is EBP, RBP,or R13
- LEA that uses RIP relative addressing mode
- LEA that uses 16-bit addressing mode "
This patch currently handles the first 2 cases only.
Differential Revision: https://reviews.llvm.org/D32277
llvm-svn: 303183
This factors register pressure estimation mechanism from the
GCNSchedStrategy into the forward tracker to unify interface
with other strategies and expose it to other interested phases.
Differential Revision: https://reviews.llvm.org/D33105
llvm-svn: 303179
This function gives the wrong answer on some non-ELF platforms in some
cases. The function that does the right thing lives in Mangler.h. To try to
discourage people from using this function, give it a different name.
Differential Revision: https://reviews.llvm.org/D33162
llvm-svn: 303134
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).
llvm-svn: 303118
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 303115
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".
llvm-svn: 303108
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
llvm-svn: 303082
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.
Differential Revision: https://reviews.llvm.org/D32858
llvm-svn: 303054
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.
It looks like it being marked as such was just a mistake, as the commit that
made that change only mentions LEApcrelJT and in thumb1 and thumb2 only the
LEApcrelJT instructions were marked as having side-effects, so it looks like
the intent was to only mark LEApcrelJT as having side-effects but LEApcrel was
accidentally marked as such also.
Differential Revision: https://reviews.llvm.org/D32857
llvm-svn: 303053
NFC followup to D33147, this explicitly sets all the arguments (instead of relying on the defaults) to SelectionDAG::getMemIntrinsicNode to help identify -verify-machineinstrs issues.
llvm-svn: 303047
Replace SelectionDAG::getNode(ISD::SELECT, ...)
and SelectionDAG::getNode(ISD::VSELECT, ...)
with SelectionDAG::getSelect(...)
Saves a few lines of code and in some cases saves the need to explicitly
check the type of the desired node.
llvm-svn: 303024