When we forward a stored value to a load and eliminate it entirely we need to
make sure the liveness of the register is maintained all the way to its use.
Previously we only cleared liveness on the store doing the forwarding, but
there could be other killing uses in between.
We already do the right thing when the load has to be converted into something
else, it was just this one path that skipped it.
llvm-svn: 306318
processFixupValue is called on every relaxation iteration. applyFixup
is only called once at the very end. applyFixup is then the correct
place to do last minute changes and value checks.
While here, do proper range checks again for fixup_arm_thumb_bl. We
used to do it, but dropped because of thumb2. We now do it again, but
use the thumb2 range.
llvm-svn: 306177
This patch contains a pass that transforms CBZ/CBNZ/TBZ/TBNZ instructions into a
conditional branch (Bcc), when the NZCV flags can be set for "free". This is
preferred on targets that have more flexibility when scheduling Bcc
instructions as compared to CBZ/CBNZ/TBZ/TBNZ (assuming all other variables are
equal). This can reduce register pressure and is also the default behavior for
GCC.
A few examples:
add w8, w0, w1 -> cmn w0, w1 ; CMN is an alias of ADDS.
cbz w8, .LBB_2 -> b.eq .LBB0_2 ; single def/use of w8 removed.
add w8, w0, w1 -> adds w8, w0, w1 ; w8 has multiple uses.
cbz w8, .LBB1_2 -> b.eq .LBB1_2
sub w8, w0, w1 -> subs w8, w0, w1 ; w8 has multiple uses.
tbz w8, #31, .LBB6_2 -> b.ge .LBB6_2
In looking at all current sub-target machine descriptions, this transformation
appears to be either positive or neutral.
Differential Revision: https://reviews.llvm.org/D34220.
llvm-svn: 306144
It was trying to do too many things. The basic lumping together of values for
legalization purposes is now handled by G_MERGE_VALUES. More complex things
involving gaps and odd sizes are handled by G_INSERT sequences.
llvm-svn: 306120
Implemented support to AArch64 codegen for ARMv8.1 Large System
Extensions atomic instructions. Where supported, these instructions can
provide atomic operations with higher performance.
Currently supported operations include: fetch_add, fetch_or, fetch_xor,
fetch_smin, fetch_min/max (signed and unsigned), swap, and
compare_exchange.
This implementation implies sequential-consistency ordering, more
relaxed ordering is under development.
Subtarget->hasLSE is currently supported for Cavium ThunderX2T99.
Patch by Ananth Jasty.
Differential Revision: https://reviews.llvm.org/D33586
Change-Id: I82f6d3d64255622791ceb0715b7ab9f4dc4d4b2c
llvm-svn: 305893
There should be at most a single kill flag for the
promoted operand between the store/load pair.
Discussed in https://reviews.llvm.org/D34402.
llvm-svn: 305889
Summary:
This patch updates promoteLoadFromStore to use the store MachineOperand as the
source operand of the of the new instruction instead of creating a new
register MachineOperand. This way, the existing register flags are
preserved.
This fixes PR33468 (https://bugs.llvm.org/show_bug.cgi?id=33468).
Reviewers: MatzeB, t.p.northover, junbuml
Reviewed By: MatzeB
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34402
llvm-svn: 305885
Use llvm::make_unique to avoid ambiguity with MSVC.
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305690
Summary:
This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Reviewers: craig.topper, evandro, t.p.northover, atrick, MatzeB
Reviewed By: MatzeB
Subscribers: atrick, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D34144
llvm-svn: 305677
Summary:
Scheduling AESE/AESMC and AESD/AESIMC instruction pairs back-to-back
gives a double digit speedup on benchmarks using those instructions on
Cortex-A processors. In GCC, this optimization is part of the generic
processor model as well.
This change should not have a major performance impact on processors
that do not optimize AES instruction pairs, although I only had access
to Cortex-A processors for benchmarking.
Reviewers: rengolin, kristof.beyls, javed.absar, evandro, silviu.baranga, MatzeB, mcrosier, joelkevinjones, joel_k_jones, bmakam, t.p.northover
Reviewed By: evandro
Subscribers: sbaranga, aemerson, llvm-commits
Differential Revision: https://reviews.llvm.org/D33836
llvm-svn: 305457
The "Add/sub (shifted reg)" instructions use the 31 encoding for xzr and wzr
rather than the SP, so we need to use different variants.
Situations where this actually comes up are rare enough (see test-case) that I
think falling back to DAG is fine.
llvm-svn: 305230
Summary: The method TargetTransformInfo::getRegisterBitWidth() is declared const, but the type erasing implementation classes (TargetTransformInfo::Concept & TargetTransformInfo::Model) that were introduced by Chandler in https://reviews.llvm.org/D7293 do not have the method declared const. This is an NFC to tidy up the const consistency between TTI and its implementation.
Reviewers: chandlerc, rnk, reames
Reviewed By: reames
Subscribers: reames, jfb, arsenm, dschuff, nemanjai, nhaehnle, javed.absar, sbc100, jgravelle-google, llvm-commits
Differential Revision: https://reviews.llvm.org/D33903
llvm-svn: 305189
Summary:
- Fix assertion failures on F16 to/from int types in FastISel by falling
back to regular ISel
- Add a testcase of various conversion cases with FastISel (-O0)
Reviewers: kristof.beyls, jmolloy, SjoerdMeijer
Reviewed By: SjoerdMeijer
Subscribers: SjoerdMeijer, llvm-commits, srhines, pirama, aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D33734
llvm-svn: 305127
This creates a new library called BinaryFormat that has all of
the headers from llvm/Support containing structure and layout
definitions for various types of binary formats like dwarf, coff,
elf, etc as well as the code for identifying a file from its
magic.
Differential Revision: https://reviews.llvm.org/D33843
llvm-svn: 304864
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
TargetPassConfig is not useful for targets that do not use the CodeGen
library, so we may just as well store a pointer to an
LLVMTargetMachine instead of just to a TargetMachine.
While at it, also change the constructor to take a reference instead of a
pointer as the TM must not be nullptr.
llvm-svn: 304247
Summary:
Currently FPOWI defaults to Legal and LegalizeDAG.cpp turns Legal into Expand for this opcode because Legal is a "lie".
This patch changes the default for this opcode to Expand and removes the hack from LegalizeDAG.cpp. It also removes all the code in the targets that set this opcode to Expand themselves since they can just rely on the default.
Reviewers: spatel, RKSimon, efriedma
Reviewed By: RKSimon
Subscribers: jfb, dschuff, sbc100, jgravelle-google, nemanjai, javed.absar, andrew.w.kaylor, llvm-commits
Differential Revision: https://reviews.llvm.org/D33530
llvm-svn: 304215
- Remove all uses of base sched model entries and set them all to
Unsupported so all the opcodes are described in
AArch64SchedFalkorDetails.td.
- Remove entries for unsupported half-float opcodes.
- Remove entries for unsupported LSE extension opcodes.
- Add entry for MOVbaseTLS (and set Sched in base td file entry to
WriteSys) and a few other pseudo ops.
- Fix a few FP load/store with reg offset entries to use the LSLfast
predicates.
- Add Q size BIF/BIT/BSL entries.
- Fix swapped Q/D sized CLS/CLZ/CNT/RBIT entires.
- Fix pre/post increment address register latency (this operand is
always dest 0).
- Fix swapped FCVTHD/FCVTHS/FCVTDH/FCVTDS entries.
- Fix XYZ resource over usage on LD[1-4] opcodes.
llvm-svn: 304108
- Rewrite livein calculation to use the computeLiveIns() helper
function. This is slightly less efficient but easier to reason about
and doesn't unnecessarily add pristine and reserved registers[1]
- Zero the status register at the beginning of the loop to make sure it
has a defined value.
- Remove kill flags of values that need to stay alive throughout the loop.
[1] An upcoming commit of mine will tighten the MachineVerifier to catch
these.
llvm-svn: 304048
Summary:
This is used in the Linux kernel, and effectively just means "print an
address". This brings back r193593.
Reviewed by: Renato Golin
Reviewers: t.p.northover, rengolin, richard.barton.arm, kristof.beyls
Subscribers: aemerson, javed.absar, llvm-commits, eraman
Differential Revision: https://reviews.llvm.org/D33558
llvm-svn: 303901
Summary:
This patch makes instruction fusion more aggressive by
* adding artificial edges between the successors of FirstSU and
SecondSU, similar to BaseMemOpClusterMutation::clusterNeighboringMemOps.
* updating PostGenericScheduler::tryCandidate to keep clusters together,
similar to GenericScheduler::tryCandidate.
This change increases the number of AES instruction pairs generated on
Cortex-A57 and Cortex-A72. This doesn't change code at all in
most benchmarks or general code, but we've seen improvement on kernels
using AESE/AESMC and AESD/AESIMC.
Reviewers: evandro, kristof.beyls, t.p.northover, silviu.baranga, atrick, rengolin, MatzeB
Reviewed By: evandro
Subscribers: aemerson, rengolin, MatzeB, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33230
llvm-svn: 303618
This commit fixes a bug introduced in r301019 where optimizeLogicalImm
would replace a logical node's immediate operand that was CSE'd and
was also an operand of another node.
This commit fixes the bug by replacing the logical node instead of its
immediate operand.
rdar://problem/32295276
llvm-svn: 303607
Summary:
This causes them to be re-computed more often than necessary but resolves
objections that were raised post-commit on r301750.
Reviewers: qcolombet, ab, t.p.northover, rovka, kristof.beyls
Reviewed By: qcolombet
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32861
llvm-svn: 303418
This provides a new way to access the TargetMachine through
TargetPassConfig, as a dependency.
The patterns replaced here are:
* Passes handling a null TargetMachine call
`getAnalysisIfAvailable<TargetPassConfig>`.
* Passes not handling a null TargetMachine
`addRequired<TargetPassConfig>` and call
`getAnalysis<TargetPassConfig>`.
* MachineFunctionPasses now use MF.getTarget().
* Remove all the TargetMachine constructors.
* Remove INITIALIZE_TM_PASS.
This fixes a crash when running `llc -start-before prologepilog`.
PEI needs StackProtector, which gets constructed without a TargetMachine
by the pass manager. The StackProtector pass doesn't handle the case
where there is no TargetMachine, so it segfaults.
Related to PR30324.
Differential Revision: https://reviews.llvm.org/D33222
llvm-svn: 303360
We don't use section-relative relocations on AArch64, so all symbols must be at
least visible to the linker (i.e. properly global or l_whatever, but not
L_whatever).
llvm-svn: 303118
ARM Neon has native support for half-sized vector registers (64 bits). This
is beneficial for example for 2D and 3D graphics. This patch adds the option
to lower MinVecRegSize from 128 via a TTI in the SLP Vectorizer.
*** Performance Analysis
This change was motivated by some internal benchmarks but it is also
beneficial on SPEC and the LLVM testsuite.
The results are with -O3 and PGO. A negative percentage is an improvement.
The testsuite was run with a sample size of 4.
** SPEC
* CFP2006/482.sphinx3 -3.34%
A pretty hot loop is SLP vectorized resulting in nice instruction reduction.
This used to be a +22% regression before rL299482.
* CFP2000/177.mesa -3.34%
* CINT2000/256.bzip2 +6.97%
My current plan is to extend the fix in rL299482 to i16 which brings the
regression down to +2.5%. There are also other problems with the codegen in
this loop so there is further room for improvement.
** LLVM testsuite
* SingleSource/Benchmarks/Misc/ReedSolomon -10.75%
There are multiple small SLP vectorizations outside the hot code. It's a bit
surprising that it adds up to 10%. Some of this may be code-layout noise.
* MultiSource/Benchmarks/VersaBench/beamformer/beamformer -8.40%
The opt-viewer screenshot can be seen at F3218284. We start at a colder store
but the tree leads us into the hottest loop.
* MultiSource/Applications/lambda-0.1.3/lambda -2.68%
* MultiSource/Benchmarks/Bullet/bullet -2.18%
This is using 3D vectors.
* SingleSource/Benchmarks/Shootout-C++/Shootout-C++-lists +6.67%
Noise, binary is unchanged.
* MultiSource/Benchmarks/Ptrdist/anagram/anagram +4.90%
There is an additional SLP in the cold code. The test runs for ~1sec and
prints out over 2000 lines. This is most likely noise.
* MultiSource/Applications/aha/aha +1.63%
* MultiSource/Applications/JM/lencod/lencod +1.41%
* SingleSource/Benchmarks/Misc/richards_benchmark +1.15%
Differential Revision: https://reviews.llvm.org/D31965
llvm-svn: 303116
This caused PR33053.
Original commit message:
> The new experimental reduction intrinsics can now be used, so I'm enabling this
> for AArch64. We will need this for SVE anyway, so it makes sense to do this for
> NEON reductions as well.
>
> The existing code to match shufflevector patterns are replaced with a direct
> lowering of the reductions to AArch64-specific nodes. Tests updated with the
> new, simpler, representation.
>
> Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 303115
We were silently ignoring any features we couldn't match up, which led to
errors in an inline asm block missing the conventional "\n\t".
llvm-svn: 303108
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073
For stores, check if the stored value is defined by a floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
llvm-svn: 302679
The new experimental reduction intrinsics can now be used, so I'm enabling this
for AArch64. We will need this for SVE anyway, so it makes sense to do this for
NEON reductions as well.
The existing code to match shufflevector patterns are replaced with a direct
lowering of the reductions to AArch64-specific nodes. Tests updated with the
new, simpler, representation.
Differential Revision: https://reviews.llvm.org/D32247
llvm-svn: 302678
For the ELF case, the default/preferred form is the generic one, not
the short one as used for Apple - fix the comment to say so. Currently
it is a copy-paste typo.
Make the comments on the darwin default a bit more verbose.
Use enum names instead of literal 0/1 to further increase readability
and reduce fragility.
Differential Revision: https://reviews.llvm.org/D32963
llvm-svn: 302634
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
The AArch64 instruction set has a few "widening" instructions (e.g., uaddl,
saddl, uaddw, etc.) that take one or more doubleword operands and produce
quadword results. The operands are automatically sign- or zero-extended as
appropriate. However, in LLVM IR, these extends are explicit. This patch
updates TTI to consider these widening instructions as single operations whose
cost is attached to the arithmetic instruction. It marks extends that are part
of a widening operation "free" and applies a sub-target specified overhead
(zero by default) to the arithmetic instructions.
Differential Revision: https://reviews.llvm.org/D32706
llvm-svn: 302582
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.
Differential Revision: https://reviews.llvm.org/D32541
llvm-svn: 302571
Using arguments with attribute inalloca creates problems for verification
of machine representation. This attribute instructs the backend that the
argument is prepared in stack prior to CALLSEQ_START..CALLSEQ_END
sequence (see http://llvm.org/docs/InAlloca.htm for details). Frame size
stored in CALLSEQ_START in this case does not count the size of this
argument. However CALLSEQ_END still keeps total frame size, as caller can
be responsible for cleanup of entire frame. So CALLSEQ_START and
CALLSEQ_END keep different frame size and the difference is treated by
MachineVerifier as stack error. Currently there is no way to distinguish
this case from actual errors.
This patch adds additional argument to CALLSEQ_START and its
target-specific counterparts to keep size of stack that is set up prior to
the call frame sequence. This argument allows MachineVerifier to calculate
actual frame size associated with frame setup instruction and correctly
process the case of inalloca arguments.
The changes made by the patch are:
- Frame setup instructions get the second mandatory argument. It
affects all targets that use frame pseudo instructions and touched many
files although the changes are uniform.
- Access to frame properties are implemented using special instructions
rather than calls getOperand(N).getImm(). For X86 and ARM such
replacement was made previously.
- Changes that reflect appearance of additional argument of frame setup
instruction. These involve proper instruction initialization and
methods that access instruction arguments.
- MachineVerifier retrieves frame size using method, which reports sum of
frame parts initialized inside frame instruction pair and outside it.
The patch implements approach proposed by Quentin Colombet in
https://bugs.llvm.org/show_bug.cgi?id=27481#c1.
It fixes 9 tests failed with machine verifier enabled and listed
in PR27481.
Differential Revision: https://reviews.llvm.org/D32394
llvm-svn: 302527
This fixes PR32550, in a way that does not imply running the greedy
mode at O0.
The fix consists in checking if a load is used by any floating point
instruction and if yes, we return a default mapping with FPR instead
of GPR.
llvm-svn: 302453
In r292478, we changed the order of the enum that is referenced by
PMI_FirstXXX. This had the side effect of changing the cost of the
mapping of all the loads, instead of just the FPRs ones.
Reinstate the higher cost for all but GPR loads.
Note: This did not have any external visible effects:
- For Fast mode, the cost would have been higher, but we don't care
because we don't try to use alternative mappings.
- For Greedy mode, the higher cost of the GPR loads, would have
triggered the use of the supposedly alternative mapping, that
would be in fact the same GPR mapping but with a lower cost.
llvm-svn: 302452
This is a step toward having statically allocated instruciton mapping.
We are going to tablegen them eventually, so let us reflect that in
the API.
NFC.
llvm-svn: 302316
Summary:
Remove the AArch64AddressTypePromotion pass as we migrated all transformations
done in this pass into CGP in r299379.
Reviewers: qcolombet, jmolloy, javed.absar, mcrosier
Reviewed By: qcolombet
Subscribers: aemerson, rengolin, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D31623
llvm-svn: 302245
That's only a required extension as of v8.1a.
Remove it from the "generic" CPU as well: it should only support the
base ISA (and binutils agrees).
Also unify the MC tests into crc.s and arm64-crc32.s
llvm-svn: 302077
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
Remove "_NC" suffix and semantics from TLSDESC_LD{64,32}_LO12 and
TLSDESC_ADD_LO12 relocations
Rearrange ordering in AArch64.def to follow relocation encoding
Fix name:
R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC
Add support for several "TLS", "TLSGD", and "TLSLD" relocations for
ILP32
Fix return values from isNonILP32reloc
Add implementations for
R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC,
R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC,
R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC,
*TLSLD_LDST128_DTPREL_LO12, *TLSLD_LDST128_DTPREL_LO12_NC,
*TLSLE_LDST128_TPREL_LO12, *TLSLE_LDST128_TPREL_LO12_NC
Modify error messages to give name of equivalent relocation in the
ABI not being used, along with better checking for non-existent
requested relocations.
Added assembler support for "pg_hi21_nc"
Relocation definitions added without implementations:
R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21,
R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21,
R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC,
R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19,
R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL,
R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL,
R_AARCH64_P32_TLSDESC
Fix encoding:
R_AARCH64_P32_TLSDESC_ADR_PAGE21
Reviewers: Peter Smith
Patch by: Joel Jones (jjones@cavium.com)
Differential Revision: https://reviews.llvm.org/D32072
llvm-svn: 301980
TLSDESC_ADD_LO12 relocations
Rearrange ordering in AArch64.def to follow relocation encoding
Fix name:
R_AARCH64_P32_LD64_GOT_LO12_NC => R_AARCH64_P32_LD32_GOT_LO12_NC
Add support for several "TLS", "TLSGD", and "TLSLD" relocations for
ILP32
Fix return values from isNonILP32reloc
Add implementations for
R_AARCH64_ADR_PREL_PG_HI21_NC, R_AARCH64_P32_LD32_GOT_LO12_NC,
R_AARCH64_P32_TLSIE_LD32_GOTTPREL_LO12_NC,
R_AARCH64_P32_TLSDESC_LD32_LO12, R_AARCH64_LD64_GOT_LO12_NC,
*TLSLD_LDST128_DTPREL_LO12, *TLSLD_LDST128_DTPREL_LO12_NC,
*TLSLE_LDST128_TPREL_LO12, *TLSLE_LDST128_TPREL_LO12_NC
Modify error messages to give name of equivalent relocation in the
ABI not being used, along with better checking for non-existent
requested relocations.
Added assembler support for "pg_hi21_nc"
Relocation definitions added without implementations:
R_AARCH64_P32_TLSDESC_ADR_PREL21, R_AARCH64_P32_TLSGD_ADR_PREL21,
R_AARCH64_P32_TLSGD_ADD_LO12_NC, R_AARCH64_P32_TLSLD_ADR_PREL21,
R_AARCH64_P32_TLSLD_ADR_PAGE21, R_AARCH64_P32_TLSLD_ADD_LO12_NC,
R_AARCH64_P32_TLSLD_LD_PREL19, R_AARCH64_P32_TLSDESC_LD_PREL19,
R_AARCH64_P32_TLSGD_ADR_PAGE21, R_AARCH64_P32_TLS_DTPREL,
R_AARCH64_P32_TLS_DTPMOD, R_AARCH64_P32_TLS_TPREL,
R_AARCH64_P32_TLSDESC
Fix encoding:
R_AARCH64_P32_TLSDESC_ADR_PAGE21
Reviewers: Peter Smith
Patch by: Joel Jones (jjones@cavium.com)
Differential Revision: https://reviews.llvm.org/D32072
llvm-svn: 301939
Summary:
Predicate<> now has a field to indicate how often it must be recomputed.
Currently, there are two frequencies, per-module (RecomputePerFunction==0)
and per-function (RecomputePerFunction==1). Per-function predicates are
currently recomputed more frequently than necessary since the only predicate
in this category is cheap to test. Per-module predicates are now computed in
getSubtargetImpl() while per-function predicates are computed in selectImpl().
Tablegen now manages the PredicateBitset internally. It should only be
necessary to add the required includes.
Also fixed a problem revealed by the test case where
constrainSelectedInstRegOperands() would attempt to tie operands that
BuildMI had already tied.
Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar
Reviewed By: rovka
Subscribers: kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D32491
llvm-svn: 301750
This eliminates many extra 'Idx' induction variables in loops over
arguments in CodeGen/ and Target/. It also reduces the number of places
where we assume that ReturnIndex is 0 and that we should add one to
argument numbers to get the corresponding attribute list index.
NFC
llvm-svn: 301666
This patch replaces the separate APInts for KnownZero/KnownOne with a single KnownBits struct. This is similar to what was done to ValueTracking's version recently.
This is largely a mechanical transformation from KnownZero to Known.Zero.
Differential Revision: https://reviews.llvm.org/D32569
llvm-svn: 301620
1. RegisterClass::getSize() is split into two functions:
- TargetRegisterInfo::getRegSizeInBits(const TargetRegisterClass &RC) const;
- TargetRegisterInfo::getSpillSize(const TargetRegisterClass &RC) const;
2. RegisterClass::getAlignment() is replaced by:
- TargetRegisterInfo::getSpillAlignment(const TargetRegisterClass &RC) const;
This will allow making those values depend on subtarget features in the
future.
Differential Revision: https://reviews.llvm.org/D31783
llvm-svn: 301221
Instruction isb takes as an operand either 'sy' or an immediate value. This
improves the diagnostic when the string is not 'sy' and adds a test case for
this which was missing. This also adds tests to check invalid inputs for dsb
and dmb.
Differential Revision: https://reviews.llvm.org/D32227
llvm-svn: 301165
This reverts commit r301105, 4, 3 and 1, as a follow up of the previous
revert, which broke even more bots.
For reference:
Revert "[APInt] Use operator<<= where possible. NFC"
Revert "[APInt] Use operator<<= instead of shl where possible. NFC"
Revert "[APInt] Use ashInPlace where possible."
PR32754.
llvm-svn: 301111
Summary:
Some targets need to be able to do more complex rendering than just adding an
operand or two to an instruction. For example, it may need to insert an
instruction to extract a subreg first, or it may need to perform an operation
on the operand.
In SelectionDAG, targets would create SDNode's to achieve the desired effect
during the complex pattern predicate. This worked because SelectionDAG had a
form of garbage collection that would take care of SDNode's that were created
but not used due to a later predicate rejecting a match. This doesn't translate
well to GlobalISel and the churn was wasteful.
The API changes in this patch enable GlobalISel to accomplish the same thing
without the waste. The API is now:
InstructionSelector::OptionalComplexRendererFn selectArithImmed(MachineOperand &Root) const;
where Root is the root of the match. The return value can be omitted to
indicate that the predicate failed to match, or a function with the signature
ComplexRendererFn can be returned. For example:
return OptionalComplexRendererFn(
[=](MachineInstrBuilder &MIB) { MIB.addImm(Immed).addImm(ShVal); });
adds two immediate operands to the rendered instruction. Immed and ShVal are
captured from the predicate function.
As an added bonus, this also reduces the amount of information we need to
provide to GIComplexOperandMatcher.
Depends on D31418
Reviewers: aditya_nandakumar, t.p.northover, qcolombet, rovka, ab, javed.absar
Reviewed By: ab
Subscribers: dberris, kristof.beyls, igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31761
llvm-svn: 301079
The code assumed that when saving an additional CSR register
(ExtraCSSpill==true) we would have a free register throughout the
function. This was not true if this CSR register is also used to pass
values as in the swiftself case.
rdar://31451816
llvm-svn: 301057
In addition to the original commit, tighten the condition for when to
pad empty functions to COFF Windows. This avoids running into problems
when targeting e.g. Win32 AMDGPU, which caused test failures when this
was committed initially.
llvm-svn: 301047
Empty functions can lead to duplicate entries in the Guard CF Function
Table of a binary due to multiple functions sharing the same RVA,
causing the kernel to refuse to load that binary.
We had a terrific bug due to this in Chromium.
It turns out we were already doing this for Mach-O in certain
situations. This patch expands the code for that in
AsmPrinter::EmitFunctionBody() and renames
TargetInstrInfo::getNoopForMachoTarget() to simply getNoop() since it
seems it was used for not just Mach-O anyway.
Differential Revision: https://reviews.llvm.org/D32330
llvm-svn: 301040
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300932 and r300930, which was causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.
This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 301019
Factor out the common code used for generating addresses into common
templated functions that call overloaded versions of a new function,
getTargetNode.
Tested with make check-llvm with targets AArch64.
Differential Revision: https://reviews.llvm.org/D32169
llvm-svn: 301005
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
llvm-svn: 300993
It's causing llvm-clang-x86_64-expensive-checks-win to fail to compile and I
haven't worked out why. Reverting to make it green while I figure it out.
llvm-svn: 300978
Summary:
The SelectionDAG importer now imports rules with Predicate's attached via
Requires, PredicateControl, etc. These predicates are implemented as
bitset's to allow multiple predicates to be tested together. However,
unlike the MC layer subtarget features, each target only pays for it's own
predicates (e.g. AArch64 doesn't have 192 feature bits just because X86
needs a lot).
Both AArch64 and X86 derive at least one predicate from the MachineFunction
or Function so they must re-initialize AvailableFeatures before each
function. They also declare locals in <Target>InstructionSelector so that
computeAvailableFeatures() can use the code from SelectionDAG without
modification.
Reviewers: rovka, qcolombet, aditya_nandakumar, t.p.northover, ab
Reviewed By: rovka
Subscribers: aemerson, rengolin, dberris, kristof.beyls, llvm-commits, igorb
Differential Revision: https://reviews.llvm.org/D31418
llvm-svn: 300964
It seems that r300930 was creating an infinite loop in dag-combine when
compling the following file:
MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
llvm-svn: 300940
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300930
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300913
Single-threaded fences aren't required to provide any synchronization with
other processing elements so there's no need for a DMB. They should still be a
barrier for compiler optimizations though.
llvm-svn: 300905
Currently fmov #0 with a vector destination is handle incorrectly and results in
fmov #-1.9375 being emitted but should instead give an error. This is due to the
way we cope with fmov #0 with a scalar destination being an alias of fmov zr, so
fix this by actually doing it through an alias.
Differential Revision: https://reviews.llvm.org/D31949
llvm-svn: 300830
When an integer is used as an fp immediate we're failing to check the return
value of getFP64Imm, so invalid values are silently permitted. Fix this by
merging together the integer and real handling.
llvm-svn: 300828
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300664
This patch uses lshrInPlace to replace code where the object that lshr is called on is being overwritten with the result.
This adds an lshrInPlace(const APInt &) version as well.
Differential Revision: https://reviews.llvm.org/D32155
llvm-svn: 300566
This reverts r300535 and r300537.
The newly added tests in test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
produces slightly different code between LLVM versions being built with different compilers.
E.g., dependent on the compiler LLVM is built with, either one of the following
can be produced:
remark: <unknown>:0:0: unable to legalize instruction: %vreg0<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg2; (in function: vector_of_pointers_extractelement)
remark: <unknown>:0:0: unable to legalize instruction: %vreg2<def>(p0) = G_EXTRACT_VECTOR_ELT %vreg1, %vreg0; (in function: vector_of_pointers_extractelement)
Non-determinism like this is clearly a bad thing, so reverting this until
I can find and fix the root cause of the non-determinism.
llvm-svn: 300538
This fixes PR32471.
As comment 10 on that bug report highlights
(https://bugs.llvm.org//show_bug.cgi?id=32471#c10), there are quite a
few different defendable design tradeoffs that could be made, including
not representing pointers at all in LLT.
I decided to go for representing vector-of-pointer as a concept in LLT,
while keeping the size of the LLT type 64 bits (this is an increase from
48 bits before). My rationale for keeping pointers explicit is that on
some targets probably it's very handy to have the distinction between
pointer and non-pointer (e.g. 68K has a different register bank for
pointers IIRC). If we keep a scalar pointer, it probably is easiest to
also have a vector-of-pointers to keep LLT relatively conceptually clean
and orthogonal, while we don't have a very strong reason to break that
orthogonality. Once we gain more experience on the use of LLT, we can
of course reconsider this direction.
Rejecting vector-of-pointer types in the IRTranslator is also an option
to avoid the crash reported in PR32471, but that is only a very
short-term solution; also needs quite a bit of code tweaks in places,
and is probably fragile. Therefore I didn't consider this the best
option.
llvm-svn: 300535
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.
llvm-svn: 300474
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.
llvm-svn: 300462
This further improves Ahmed's change in rL299482. See the new comment for the
rationale.
The patch recovers most of the regression for bzip2 after D31965. We're down
to +2.68% from +6.97%.
Differential Revision: https://reviews.llvm.org/D32028
llvm-svn: 300276
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.
Interleaved access vectorization enabled.
BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.
Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631
llvm-svn: 300052
This patch assumes that the dependents to be scanned for the ExitSU are its
predecessors; otherwise, the successors of the instr are scanned.
Furthermore, sometimes the ExitSU was being fused twice, since it may be
fused once when scanning the successors from the beginning of the BB and
then again when scanning the predecessors of ExitSU. Thus, when scanning
the successors of an instr, skip the ExitSU.
llvm-svn: 299974
This patch refactors and strengthens the type checks performed for interleaved
accesses. The primary functional change is to ensure that the interleaved
accesses have valid element types. The added test cases previously failed
because the element type is f128.
Differential Revision: https://reviews.llvm.org/D31817
llvm-svn: 299864
This concludes the refinements to Falkor Machine Model.
It includes SchedPredicates for immediate zero and LSL Fast.
Forwarding logic is also modeled for vector multiply and
accumulate only.
llvm-svn: 299810
When using -ffixed-x18, the x18 (or w18) register can safely be used
with the "global register variable" GCC extension, but the backend
fails to recognize it.
Patch by Roland McGrath.
Differential Revision: https://reviews.llvm.org/D31793
llvm-svn: 299799
Summary: This resolves the issue of tablegen-erated includes in the headers for non-GlobalISel builds in a simpler way than before.
Reviewers: qcolombet, ab
Reviewed By: ab
Subscribers: igorb, ab, mgorny, dberris, rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30998
llvm-svn: 299637
A number of backends (AArch64, MIPS, ARM) have been using
MCContext::reportError to report issues such as out-of-range fixup values in
their TgtAsmBackend. This is great, but because MCContext couldn't easily be
threaded through to the adjustFixupValue helper function from its usual
callsite (applyFixup), these backends ended up adding an MCContext* argument
and adding another call to applyFixup to processFixupValue. Adding an
MCContext parameter to applyFixup makes this unnecessary, and even better -
applyFixup can take a reference to MCContext rather than a potentially null
pointer.
Differential Revision: https://reviews.llvm.org/D30264
llvm-svn: 299529
This improves upon r246462: that prevented FMOVs from being emitted
for the cross-class INSERT_SUBREGs by disabling the formation of
INSERT_SUBREGs of LOAD. But the ld1.s that we started selecting
caused us to introduce partial dependencies on the vector register.
Avoid that by using SCALAR_TO_VECTOR: it's a first-class citizen that
is folded away by many patterns, including the scalar LDRS that we
want in this case.
Credit goes to Adam for finding the issue!
llvm-svn: 299482
This mode is just like -mcmodel=small except that it moves the
thread pointer from TPIDR_EL0 to TPIDR_EL1.
Patch by Roland McGrath.
Differential Revision: https://reviews.llvm.org/D31624
llvm-svn: 299462
Summary:
Lift the restrictions that prevented the tree walking introduced in the
previous change and add support for patterns like:
(G_ADD (G_MUL (G_SEXT $src1), (G_SEXT $src2)), $src3) -> SMADDWrrr $dst, $src1, $src2, $src3
Also adds support for G_SEXT and G_ZEXT to support these cases.
One particular aspect of this that I should draw attention to is that I've
tried to be overly conservative in determining the safety of matches that
involve non-adjacent instructions and multiple basic blocks. This is intended
to be used as a cheap initial check and we may add a more expensive check in
the future. The current rules are:
* Reject if any instruction may load/store (we'd need to check for intervening
memory operations.
* Reject if any instruction has implicit operands.
* Reject if any instruction has unmodelled side-effects.
See isObviouslySafeToFold().
Reviewers: t.p.northover, javed.absar, qcolombet, aditya_nandakumar, ab, rovka
Reviewed By: ab
Subscribers: igorb, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D30539
llvm-svn: 299430
Summary:
Move the aarch64-type-promotion pass within the existing type promotion framework in CGP.
This change also support forking sexts when a new sext is required for promotion.
Note that change is based on D27853 and I am submitting this out early to provide a better idea on D27853.
Reviewers: jmolloy, mcrosier, javed.absar, qcolombet
Reviewed By: qcolombet
Subscribers: llvm-commits, aemerson, rengolin, mcrosier
Differential Revision: https://reviews.llvm.org/D28680
llvm-svn: 299379