Ifcvt can replicate instructions as it converts them to be predicated. This
stops that from happening on thumb2 targets at minsize where an extra IT
instruction is likely needed.
Differential Revision: https://reviews.llvm.org/D60089
llvm-svn: 358974
This does two main things, firstly adding some at least basic addressing modes
for i64 types, and secondly treats floats and doubles sensibly when there is no
fpu. The floating point change can help codesize in some cases, especially with
D60294.
Most backends seems to not consider the exact VT in isLegalAddressingMode,
instead switching on type size. That is now what this does when the target does
not have an fpu (as the float data will be loaded using LDR's). i64's currently
use the address range of an LDRD (even though they may be legalised and loaded
with an LDR). This is at least better than marking them all as illegal
addressing modes.
I have not attempted to do much with vectors yet. That will need changing once
MVE is added.
Differential Revision: https://reviews.llvm.org/D60677
llvm-svn: 358845
Summary:
X86 is quite complicated; so I intend to leave it as is. ARM+Aarch64 do
basically the same thing (Aarch64 did not correctly handle immediates,
ARM has a test llvm/test/CodeGen/ARM/2009-04-06-AsmModifier.ll that uses
%a with an immediate) for a flag that should be target independent
anyways.
Reviewers: echristo, peter.smith
Reviewed By: echristo
Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60841
llvm-svn: 358618
Summary:
None of these derived classes do anything that the base class cannot.
If we remove these case statements, then the base class can handle them
just fine.
Reviewers: peter.smith, echristo
Reviewed By: echristo
Subscribers: nemanjai, javed.absar, eraman, kristof.beyls, hiraditya, kbarton, jsji, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60803
llvm-svn: 358603
As discussed on PR41359, this patch renames the pair of shift-mask target feature functions to make their purposes more obvious.
shouldFoldShiftPairToMask -> shouldFoldConstantShiftPairToMask
preferShiftsToClearExtremeBits -> shouldFoldMaskToVariableShiftPair
llvm-svn: 358526
Because CodeGen can't depend on GlobalISel, we need a way to encapsulate the CSE
configs that can be passed between TargetPassConfig and the targets' custom
pass configs. This CSEConfigBase allows targets to create custom CSE configs
which is then used by the GISel passes for the CSEMIRBuilder.
This support will be used in a follow up commit to allow constant-only CSE for
-O0 compiles in D60580.
llvm-svn: 358368
Summary:
The InlineAsm::AsmDialect is only required for X86; no architecture
makes use of it and as such it gets passed around between arch-specific
and general code while being unused for all architectures but X86.
Since the AsmDialect is queried from a MachineInstr, which we also pass
around, remove the additional AsmDialect parameter and query for it deep
in the X86AsmPrinter only when needed/as late as possible.
This refactor should help later planned refactors to AsmPrinter, as this
difference in the X86AsmPrinter makes it harder to make AsmPrinter more
generic.
Reviewers: craig.topper
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, llvm-commits, peter.smith, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60488
llvm-svn: 358101
Make it possible to TableGen code for FCONSTS and FCONSTD.
We need to make two changes to the TableGen descriptions of vfp_f32imm
and vfp_f64imm respectively:
* add GISelPredicateCode to check that the immediate fits in 8 bits;
* extract the SDNodeXForms into separate definitions and create a
GISDNodeXFormEquiv and a custom renderer function for each of them.
There's a lot of boilerplate to get the actual value of the immediate,
but it basically just boils down to calling ARM_AM::getFP32Imm or
ARM_AM::getFP64Imm.
llvm-svn: 358063
required to be passed as different register types. E.g. <2 x i16> may need to
be passed as a larger <2 x i32> type, so formal arg lowering needs to be able
truncate it back. Likewise, when dealing with returns of these types, they need
to be widened in the appropriate way back.
Differential Revision: https://reviews.llvm.org/D60425
llvm-svn: 358032
Create method `optForNone()` testing for the function level equivalent of
`-O0` and refactor appropriately.
Differential revision: https://reviews.llvm.org/D59852
llvm-svn: 357638
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.
While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.
Differential Revision: https://reviews.llvm.org/D59616
llvm-svn: 357437
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length. Not sure if that really makes sense, but I decided not to touch
it for now.
Differential Revision: https://reviews.llvm.org/D59385
llvm-svn: 357436
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
llvm-svn: 357153
The last reference to this function was removed from the ARM
td files in 2015 in rL225266.
Differential Revision: https://reviews.llvm.org/D59868
llvm-svn: 357130
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations. This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .
Differential Revision: https://reviews.llvm.org/D59834
llvm-svn: 357109
This should hopefully lead to minor improvements in code generation, and
more accurate spill/reload comments in assembly.
Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they
don't lead to misleading assembly comments for merged memory operands;
this is technically orthogonal, but in practice the relevant memory
operand lists don't show up without this change.
Differential Revision: https://reviews.llvm.org/D59713
llvm-svn: 356963
We currently use only VLDR/VSTR for all 64-bit loads/stores, so the
memory operands must be word-aligned. Mark aligned operations as legal
and narrow non-aligned ones to 32 bits.
While we're here, also mark non-power-of-2 loads/stores as unsupported.
llvm-svn: 356872
In r322972/r323136, the iteration here was changed to catch cases at the
beginning of a basic block... but we accidentally deleted an important
safety check. Restore that check to the way it was.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41116
Differential Revision: https://reviews.llvm.org/D59680
llvm-svn: 356809
This doesn't have any practical effect at the moment, as far as I know,
because high registers aren't allocatable in Thumb1 mode. But it might
matter in the future.
Differential Revision: https://reviews.llvm.org/D59675
llvm-svn: 356791
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".
For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply. We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.
This patch adds a special case to handle that construct. At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).
This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.
The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.
Differential Revision: https://reviews.llvm.org/D59568
llvm-svn: 356601
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary. Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.
EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.
Differential Revision: https://reviews.llvm.org/D59439
llvm-svn: 356527
These changes are related to PR37743 and include:
SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
Add promoting the integer ABS node in the LegalizeIntegerType.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Add some integer abs testcases for different typesizes for Thumb arch
Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)
The "detectZextAbsDiff" function is changed for the recognition of pattern with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).
Change integer abs testcases for codegen with the ABS node support for AArch64.
Indicate that the ABS is legal for the i64 type when the NEON is supported.
Change the integer abs testcases to show changing of codegen.
Add combine and legalization of ABS nodes for Thumb arch.
Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.
For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743
Patch by: @ikulagin (Ivan Kulagin)
Differential Revision: https://reviews.llvm.org/D49837
llvm-svn: 356468
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
I am about to introduce some non-power-of-2 width vector MVTs. This
commit fixes a power-of-2 assumption that my forthcoming change would
otherwise break, as shown by test/CodeGen/ARM/vcvt_combine.ll and
vdiv_combine.ll.
Differential Revision: https://reviews.llvm.org/D58927
Change-Id: I56a282e365d3874ab0621e5bdef98a612f702317
llvm-svn: 356341
The constant island pass currently only looks at the instruction immediately
before a branch for a CMP to fold into a CBZ/CBNZ. This extends it to search
backwards for the instruction that defines CPSR. We need to ensure that the
register is not overridden between the CMP and the branch.
Differential Revision: https://reviews.llvm.org/D59317
llvm-svn: 356336
tMOVr and tPUSH/tPOP/tPOP_RET have register constraints which can't be
expressed in TableGen, so check them explicitly. I've unfortunately run
into issues with both of these recently; hopefully this saves some time
for someone else in the future.
Differential Revision: https://reviews.llvm.org/D59383
llvm-svn: 356303
Bail early when we don't have a preheader and also if the target is
big endian because it's written with only little endian in mind!
Differential Revision: https://reviews.llvm.org/D59368
llvm-svn: 356243
When choosing whether a pair of loads can be combined into a single
wide load, we check that the load only has a sext user and that sext
also only has one user. But this can prevent the transformation in
the cases when parallel macs use the same loaded data multiple times.
To enable this, we need to fix up any other uses after creating the
wide load: generating a trunc and a shift + trunc pair to recreate
the narrow values. We also need to keep a record of which loads have
already been widened.
Differential Revision: https://reviews.llvm.org/D59215
llvm-svn: 356132
This patch adds an XCOFF triple object format type into LLVM.
This XCOFF triple object file type will be used later by object file and assembly generation for the AIX platform.
Differential Revision: https://reviews.llvm.org/D58930
llvm-svn: 355989
AMDGPU target run out of Subtarget feature flags hitting the limit of 64.
AssemblerPredicates uses at most uint64_t for their representation.
At the same time CodeGen has exhausted this a long time ago and switched
to a FeatureBitset with the current limit of 192 bits.
This patch completes transition to the bitset for feature bits extending
it to asm matcher and MC code emitter.
Differential Revision: https://reviews.llvm.org/D59002
llvm-svn: 355839
The indexed variant of vfmal.f16 and vfmsl.f16
instructions use the uppser bits of the indexed
operand to store the index (1 bit for the double
variant, 2 bits for the quad).
This limits the usable registers to d0 - d7 or
s0 - s15. This patch enforces this limitation.
Differential Revision: https://reviews.llvm.org/D59021
llvm-svn: 355707
Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.
Differential Revision: https://reviews.llvm.org/D57335
llvm-svn: 355685
Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.
Differential Revision: https://reviews.llvm.org/D57335
llvm-svn: 355585
Use this feature to fix a bug on ARM where 4 byte alignment is
incorrectly assumed.
Differential Revision: https://reviews.llvm.org/D57335
llvm-svn: 355522
This uses the infrastructure added in rL353152 to sink zext and sexts to
sub/add users, to enable vsubl/vaddl generation when NEON is available.
See https://bugs.llvm.org/show_bug.cgi?id=40025.
Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma
Reviewed By: samparker
Differential Revision: https://reviews.llvm.org/D58063
llvm-svn: 355460
When lowering a select_cc node where the true and false values are of type f16,
we can't use a general conditional move because the FP16 instructions do not
support conditional execution. Instead, we must ensure that the condition code
is one of the four supported by the VSEL instruction.
Differential revision: https://reviews.llvm.org/D58813
llvm-svn: 355385
The isScaledConstantInRange function takes upper and lower bounds which are
checked after dividing by the scale, so the bounds checks for half, single and
double precision should all be the same. Previously, we had wrong bounds checks
for half precision, so selected an immediate the instructions can't actually
represent.
Differential revision: https://reviews.llvm.org/D58822
llvm-svn: 355305
1) GCC complains that KnownValid is set but not used.
2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral
and non-enumeral type in conditional expression". Solve this by casting
to unsigned which is the final type anyway.
Differential Revision: https://reviews.llvm.org/D58834
llvm-svn: 355304
The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the
immediate to encode the sign of the offset, like the other FP loads/stores, so
need to be treated the same way.
Differential revision: https://reviews.llvm.org/D58816
llvm-svn: 355201
This function was not checking for the condition code variants which are
undefined if either input is NaN, so we were missing selection of the VSEL
instruction in some cases when using -fno-honor-nans or -ffast-math.
Differential revision: https://reviews.llvm.org/D58812
llvm-svn: 355199
Summary:
The description of KnownBits::zext() and
KnownBits::zextOrTrunc() has confusingly been telling
that the operation is equivalent to zero extending the
value we're tracking. That has not been true, instead
the user has been forced to explicitly set the extended
bits as known zero afterwards.
This patch adds a second argument to KnownBits::zext()
and KnownBits::zextOrTrunc() to control if the extended
bits should be considered as known zero or as unknown.
Reviewers: craig.topper, RKSimon
Reviewed By: RKSimon
Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58650
llvm-svn: 355099
This gets rid of some duplication in the TableGen definition, but it
forces us to keep both a pointer and a reference to the subtarget in the
ARMInstructionSelector. That is pretty ugly but it might be a reasonable
trade-off, since the TableGen descriptions should outlive the code in
the selector (or in the worst case we can update to use just the
reference when we get rid of DAGISel).
Differential Revision: https://reviews.llvm.org/D58031
llvm-svn: 355083
Add the same level of support as for ARM mode (i.e. still no TLS
support).
In most cases, it is sufficient to replace the opcodes with the
t2-equivalent, but there are some idiosyncrasies that I decided to
preserve because I don't understand the full implications:
* For ARM we use LDRi12 to load from constant pools, but for Thumb we
use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for
Thumb as well, or to use LDRcp for ARM).
* For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so
we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT.
The tests are in separate files because they're hard enough to read even
without doubling the number of checks.
llvm-svn: 355077
The --disassembler-options, or -M, are used to customize
the disassembler and affect its output.
The two implemented options allow selecting register names on ARM:
* With -Mreg-names-raw, the disassembler uses rNN for all registers.
* With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15,
which is the default behavior of llvm-objdump.
Differential Revision: https://reviews.llvm.org/D57680
llvm-svn: 354870
As requested during review of D57601 <https://reviews.llvm.org/D57601> https://reviews.llvm.org/D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.
Differential Revision: https://reviews.llvm.org/D58490
Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk.
llvm-svn: 354845
This adds a few extra Thumb1 opcodes to improve the peephole opimisers
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.
Differential Revision: https://reviews.llvm.org/D58281
llvm-svn: 354791
More or less all the instructions defined in the v8.2a full-fp16
extension are defined as UNPREDICTABLE if you put them in an IT block
(Thumb) or use with any condition other than AL (ARM). LLVM didn't
know that, and was happy to conditionalise them.
In order to force these instructions to count as not predicable, I had
to make a small Tablegen change. The code generation back end mostly
decides if an instruction was predicable by looking for something it
can identify as a predicate operand; there's an isPredicable bit flag
that overrides that check in the positive direction, but nothing that
overrides it in the negative direction.
(I considered the alternative approach of actually removing the
predicate operand from those instructions, but thought that it would
be more painful overall for instructions differing only in data type
to have different shapes of operand list. This way, the only code that
has to notice the difference is the if-converter.)
So I've added an isUnpredicable bit alongside isPredicable, and set
that bit on the right subset of FP16 instructions, and also on the
VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be
unpredicable for all data types.
I've included a couple of representative regression tests, both of
which previously caused an fp16 instruction to be conditionalised in
ARM state and (with -arm-no-restrict-it) to be put in an IT block in
Thumb.
Reviewers: SjoerdMeijer, t.p.northover, efriedma
Reviewed By: efriedma
Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57823
llvm-svn: 354768
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.
Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.
Differential Revision: https://reviews.llvm.org/D57833
llvm-svn: 354667
This is exactly the same as arm mode, so for the instruction selector
tests we just extract them to a new file and run with the same checks
for both arm and thumb mode.
For the legalizer we need to update the tests for soft float a bit, but
only because BL and tBL are slightly different. We could be pedantic and
check that we get a well-formed BL for arm mode and a tBL for thumb, but
for the purposes of the legalizer test it's sufficient to just skip over
the predicate operands in the checks. Also note that we have the
pedantic checks in the divmod test, so we're covered.
llvm-svn: 354665
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.
Differential Revision: https://reviews.llvm.org/D57833
llvm-svn: 354564
During type promotion, sometimes we convert negative an add with a
negative constant into a sub with a positive constant. The loop that
performs this transformation has two issues:
- it iterates over a set, causing non-determinism.
- it breaks, instead of continuing, when it finds the first
non-negative operand.
Differential Revision: https://reviews.llvm.org/D58452
llvm-svn: 354557
Add the opcode for ADDrr / t2ADDrr to the Opcode cache, as we did for
all other opcodes where the handling is otherwise the same between arm
mode and thumb2.
llvm-svn: 354115
ConvertTruncs is used to replace a trunc for an AND mask, however
this function wasn't working as expected. By performing the change
later, we can create a wide type integer mask instead of a narrow -1
value, which could then be simply removed (incorrectly). Because we
now perform this action later, it's necessary to cache the trunc type
before we perform the promotion.
Differential Revision: https://reviews.llvm.org/D57686
llvm-svn: 354108
The Arm peephole optimiser code keeps track of both an MI and a SubAdd that can
be used to optimise away a CMP. In the rare case that both are found and not
ruled-out as valid, we could end up setting the flags on the wrong one.
Instead make sure we are using SubAdd if it exists, as it will be closer to the
CMP.
The testcase here is a little theoretical, with a dead def of cpsr. It should
hopefully show the point.
Differential Revision: https://reviews.llvm.org/D58176
llvm-svn: 354018
The v8m.base ISA contains movw, which can operate on an unsigned
16-bit value. Add the pattern that converts an add with a negative
value, that could fit into 16-bits when negated, into a sub with that
positive value.
Differential Revision: https://reviews.llvm.org/D57942
llvm-svn: 353692
The whole design of generating LDMs/STMs is fragile and unreliable: it depends on
rescheduling here in the LoadStoreOptimizer that isn't register pressure aware
and regalloc that isn't aware of generating LDMs/STMs.
This patch adds a (hidden) option to control the total number of instructions that
can be re-ordered. I appreciate this looks only a tiny bit better than a hard-coded
constant, but at least it allows more easy experimentation with different values
for now. Ideally we calculate this reorder limit based on some heuristics, and take
register pressure into account. I might be looking into that next.
Differential Revision: https://reviews.llvm.org/D57954
llvm-svn: 353678
This patch accompanies the RFC posted here:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127239.html
This patch adds a new CallBr IR instruction to support asm-goto
inline assembly like gcc as used by the linux kernel. This
instruction is both a call instruction and a terminator
instruction with multiple successors. Only inline assembly
usage is supported today.
This also adds a new INLINEASM_BR opcode to SelectionDAG and
MachineIR to represent an INLINEASM block that is also
considered a terminator instruction.
There will likely be more bug fixes and optimizations to follow
this, but we felt it had reached a point where we would like to
switch to an incremental development model.
Patch by Craig Topper, Alexander Ivchenko, Mikhail Dvoretckii
Differential Revision: https://reviews.llvm.org/D53765
llvm-svn: 353563
In many places in the backend, we like to know whether we're
optimising for code size and this is performed by checking the
current machine function attributes. A subtarget is created on a
per-function basis, so it's possible to know when we're compiling for
code size on construction so record this in the new object.
Differential Revision: https://reviews.llvm.org/D57812
llvm-svn: 353501
Modify GenerateConstantOffsetsImpl to create offsets that can be used
by indexed addressing modes. If formulae can be generated which
result in the constant offset being the same size as the recurrence,
we can generate a pre-indexed access. This allows the pointer to be
updated via the single pre-indexed access so that (hopefully) no
add/subs are required to update it for the next iteration. For small
cores, this can significantly improve performance DSP-like loops.
Differential Revision: https://reviews.llvm.org/D55373
llvm-svn: 353403
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).
Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".
These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.
When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.
Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.
Differential Revision: https://reviews.llvm.org/D57244
llvm-svn: 353043
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.
Differential Revision: https://reviews.llvm.org/D57671
llvm-svn: 353040
This cleans up all GetElementPtr creation in LLVM to explicitly pass a
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57173
llvm-svn: 352913
This cleans up all LoadInst creation in LLVM to explicitly pass the
value type rather than deriving it from the pointer's element-type.
Differential Revision: https://reviews.llvm.org/D57172
llvm-svn: 352911
Constants can also be materialised using the negated value and a MVN, and this
case seem to have been missed for Thumb2. To check the constant materialisation
costs, we now call getT2SOImmVal twice, once for the original constant and then
also for its negated value, and this function checks if the constant can both
be splatted or rotated.
This was revealed by a test that optimises for minsize: instead of a LDR
literal pool load and having a literal pool entry, just a MVN with an immediate
is smaller (and also faster).
Differential Revision: https://reviews.llvm.org/D57327
llvm-svn: 352737
And instead just generate a libcall. My motivating example on ARM was a simple:
shl i64 %A, %B
for which the code bloat is quite significant. For other targets that also
accept __int128/i128 such as AArch64 and X86, it is also beneficial for these
cases to generate a libcall when optimising for minsize. On these 64-bit targets,
the 64-bits shifts are of course unaffected because the SHIFT/SHIFT_PARTS
lowering operation action is not set to custom/expand.
Differential Revision: https://reviews.llvm.org/D57386
llvm-svn: 352736
This attempts to optimise negative values used in load/store operands
a little. We currently try to selct them as rr, materialising the
negative constant using a MOV/MVN pair. This instead selects ri with
an immediate of 0, forcing the add node to become a simpler sub.
Differential Revision: https://reviews.llvm.org/D57121
llvm-svn: 352475
As the codebase is now under the Apache 2.0 license with LLVM
Exceptions, and all Arm's contributions, past or future, are under that
new license, this Arm specific LICENSE.TXT is no longer needed, thus
removing it.
llvm-svn: 352376
Support G_SDIV, G_UDIV, G_SREM and G_UREM.
The only significant difference between arm and thumb mode is that we
need to check a different subtarget feature.
llvm-svn: 352346
Same as ARM.
On this occasion we split some of the instruction select tests for more
complicated instructions into their own files, so we can reuse them for
ARM and Thumb mode. Likewise for the legalizer tests.
llvm-svn: 352188
Currently in Arm code, we allocate LR first, under the assumption that
it needs to be saved anyway. Unfortunately this has the disadvantage
that it will require any instructions using it to be the longer thumb2
instructions, not the shorter thumb1 ones.
This switches the order when we are optimising for minsize, returning to
the default order so that more lower registers can be used. It can end
up requiring more pushed registers, but on average produces smaller code.
Differential Revision: https://reviews.llvm.org/D56008
llvm-svn: 351938
In the last stage of type promotion, we replace any zext that uses a
new trunc with the operand of the trunc. This is okay when we only
allowed one type to be optimised, but now its the case that the trunc
maybe needed to produce a more narrow type than the one we were
optimising for. So we need to check this before doing the replacement.
Differential Revision: https://reviews.llvm.org/D57041
llvm-svn: 351935
For AMDGPU the shift amount is never 64-bit, and
this needs to use a 32-bit shift.
X86 uses i8, but seemed to be hacking around this before.
llvm-svn: 351882
This broke the RISCV build, and even with that fixed, one of the RISCV
tests behaves surprisingly differently with asserts than without,
leaving there no clear test pattern to use. Generally it seems bad for
hte IR to differ substantially due to asserts (as in, an alloca is used
with asserts that isn't needed without!) and nothing I did simply would
fix it so I'm reverting back to green.
This also required reverting the RISCV build fix in r351782.
llvm-svn: 351796
This patch may seem familiar... but my previous patch handled the
equivalent lsls+and, not this case. Usually instcombine puts the
"and" after the shift, so this case doesn't come up. However, if the
shift comes out of a GEP, it won't get canonicalized by instcombine,
and DAGCombine doesn't have an equivalent transform.
This also modifies isDesirableToCommuteWithShift to suppress DAGCombine
transforms which would make the overall code worse.
I'm not really happy adding a bunch of code to handle this, but it would
probably be tricky to substantially improve the behavior of DAGCombine
here.
Differential Revision: https://reviews.llvm.org/D56032
llvm-svn: 351776
to reflect the new license.
We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.
Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.
llvm-svn: 351636
Allow varargs functions to be called, both in arm and thumb mode. This
boils down to choosing the correct calling convention, which we can
easily test by making sure arm_aapcscc is used instead of
arm_aapcs_vfpcc when the callee is variadic.
llvm-svn: 351424
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.
We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.
llvm-svn: 351056
Part of the effort to refactoring frame pointer code generation. We used
to use two function attributes "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf" to represent three kinds of frame
pointer usage: (all) frames use frame pointer, (non-leaf) frames use
frame pointer, (none) frame use frame pointer. This CL makes the idea
explicit by using only one enum function attribute "frame-pointer"
Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.
"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
tests are mostly updated with
// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"
// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"
Patch by Yuanfang Chen (tabloid.adroit)!
Differential Revision: https://reviews.llvm.org/D56351
llvm-svn: 351049
Add t2TEQrr to the map of instructions with can be reduced down into
a T1 instruction. This is a special case because TEQ just sets the
CPSR and doesn't write to a GPR, which is not the case for EOR. So,
we need to ensure that the EOR can write to the first operand.
Differential Revision: https://reviews.llvm.org/D56255
llvm-svn: 350801
Using a PatLeaf for sext_16_node allowed matching smulbb and smlabb
instructions once the operands had been sign extended. But we also
need to use sext_inreg operands along with sext_16_node to catch a
few more cases that enable use to remove the unnecessary sxth.
Differential Revision: https://reviews.llvm.org/D55992
llvm-svn: 350613
This patch adds the sign/zero extension done by
vgetlane to ARM computeKnownBitsForTargetNode.
Differential revision: https://reviews.llvm.org/D56098
llvm-svn: 350553
SB (Speculative Barrier) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SB, as it was previously only possible to
enable by selecting -march=armv8.5-a.
This patch also renames FeatureSpecRestrict to FeatureSB.
Reviewed By: olista01, LukeCheeseman
Differential Revision: https://reviews.llvm.org/D55990
llvm-svn: 350299
This saves materializing the immediate. The additional forms are less
common (they don't usually show up for bitfield insert/extract), but
they're still relevant.
I had to add a new target hook to prevent DAGCombine from reversing the
transform. That isn't the only possible way to solve the conflict, but
it seems straightforward enough.
Differential Revision: https://reviews.llvm.org/D55630
llvm-svn: 349857
All we have to do is mark it as legal.
This allows us to select a lot of new patterns handled by TableGen. This
patch adds tests for them and splits up the existing test file for
binary operators into 2 files, one for arithmetic ops and one for
logical ones.
llvm-svn: 349610
We keep a few iterators into the basic block we're selecting while
performing FastISel. Usually this is fine, but occasionally code wants
to remove already-emitted instructions. When this happens we have to be
careful to update those iterators so they're not pointint at dangling
memory.
llvm-svn: 349365
These features (fairly) recently got split out into their own feature, so we
should make CodeGen use them when available. The main change here is that the
check used to be based on the triple, but now it's based on CPU features.
llvm-svn: 349355
Refactor the ARMInstructionSelector to cache some opcodes in the
constructor instead of checking all the time if we're in ARM or Thumb
mode.
llvm-svn: 349143
Mark G_ADD, G_SUB, G_MUL, G_AND, G_OR and G_XOR as legal for both ARM
and Thumb2.
Extract the legalizer tests for these opcodes into another file.
Add tests for the instruction selector.
llvm-svn: 349142
Mark G_SEXT, G_ZEXT and G_ANYEXT to 32 bits as legal and add support for
them in the instruction selector. This uses handwritten code again
because the patterns that are generated with TableGen are tuned for what
the DAG combiner would produce and not for simple sext/zext nodes.
Luckily, we only need to update the opcodes to use the Thumb2 variants,
everything else can be reused from ARM.
llvm-svn: 349026
Unfortunately we can't use TableGen for this because it doesn't yet
support predicates on the source pattern root. Therefore, add a bit of
handwritten code to the instruction selector to handle the most basic
cases.
Also mark them as legal and extract their legalizer test cases to a new
test file.
llvm-svn: 348920
When we had dynamic call frames (i.e. sp adjustment around each call) we
were including that adjustment into offsets calculated based on r6, even
though it's only sp that changes. This led to incorrect stack slot
accesses.
llvm-svn: 348591
Adds fatal errors for any target that does not support the Tiny or Kernel
codemodels by rejigging the getEffectiveCodeModel calls.
Differential Revision: https://reviews.llvm.org/D50141
llvm-svn: 348585
...yet!
A lot of the current code should be shared for arm and thumb mode, but
until we add tests and work out some of the details (e.g. checking the
correct subtarget feature for G_SDIV) it's safer to bail out as early as
possible for thumb targets.
This should have arguably been part of r348347, which allowed Thumb
functions to be handled by the IR Translator.
llvm-svn: 348472
https://reviews.llvm.org/D54980
This provides a standard API across GISel passes to observe and notify
passes about changes (insertions/deletions/mutations) to MachineInstrs.
This patch also removes the recordInsertion method in MachineIRBuilder
and instead provides method to setObserver.
Reviewed by: vkeles.
llvm-svn: 348406
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.
https://reviews.llvm.org/D53190
llvm-svn: 348122
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.
This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so can already represent this.
This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.
Differential revision: https://reviews.llvm.org/D54853
llvm-svn: 348114
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.
This adds debug printing so that this process is visible when using the
-debug option.
To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.
Differential revision: https://reviews.llvm.org/D54852
llvm-svn: 348113
Don't expand SDIV with an immediate that is a power of 2 if we optimise for
minimum code size. For example:
sdiv %1, i32 4
gets expanded to a sequence of 3 instructions, but this is suboptimal for
minimum code size so instead we just generate a MOV and a SDIV if integer
division is supported.
Differential Revision: https://reviews.llvm.org/D54546
llvm-svn: 347965
Scattered ARM relocations for Mach-O's only have 24 bits available to
encode the offset. This is not checked but just truncated and can result
in corrupt binaries after linking because the relocations are applied to
the wrong offset. This patch will check and error out in those
situations instead of emitting a wrong relocation.
Patch by: Sander Bogaert (dzn)
Differential revision: https://reviews.llvm.org/D54776
llvm-svn: 347922
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.
Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.
In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).
We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).
Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.
Instruction select:
Nothing to do.
llvm-svn: 347545
Both zext and sext are currently allowed during the search for narrow
sequences and sexts operands are later added to the mac candidates.
But operands of muls are also added, without checking whether they're
sext or zext, which means we can generate a signed smlad when we
shouldn't.
Differential Revision: https://reviews.llvm.org/D54790
llvm-svn: 347542
In ARMOperand::print:
- Print human-readable register names, instead of numbers.
- Print the correct names for IT condition masks (these were in the wrong order
before).
- Print all parts of memory operands, not just the base register.
This makes the output of llvm-mc -show-inst-operands more readable.
Differential revision: https://reviews.llvm.org/D54850
llvm-svn: 347494
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
This leaves sinks as being:
- points where the value in the register is being observed, such as
an icmp, switch or store.
- points where value types have to match, such as calls and returns.
- zext are included to ease the transformation and are generally
removed later on.
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.
Differential Revision: https://reviews.llvm.org/D54515
llvm-svn: 347191
LDRcp should be deleted when the dest register is dead in register
coalescing. Without MemOp, dead LDRcp will cause dead constant pool
value which references to non-existing label.
Patch by Yin Ma.
Differential Revision: https://reviews.llvm.org/D54173
llvm-svn: 346563
Now that we have mixed type sizes, i1 values need to be explicitly
handled as we want to avoid promoting these values.
Differential Revision: https://reviews.llvm.org/D54308
llvm-svn: 346499
Previously, during the search, all values had to have the same
'TypeSize', which is equal to number of bits of the integer type of
the icmp operand. All values in the tree had to match this size;
meaning that, if we searched from i16, we wouldn't accept i8s. A
change in type size requires zext and truncs to perform the casts so,
to allow mixed narrow types, the handling of these instructions is
now slightly different:
- we allow casts if their result or operand is <= TypeSize.
- zexts are sinks if their result > TypeSize.
- truncs are still sinks if their operand == TypeSize.
- truncs are still sources if their result == TypeSize.
The transformation bails on finding an icmp that operates on data
smaller than the current TypeSize.
Differential Revision: https://reviews.llvm.org/D54108
llvm-svn: 346480
A few code movement things:
- AreSymmetrical is now a method of BinOpChain.
- Created a lambda in CreateParallelMACPairs to reduce loop nesting.
- A Reduction object now gets pasted in a couple of places instead,
including CreateParallelMACPairs so it doesn't need to return a
value.
I've also added RecordSequentialLoads, which is run before the
transformation begins, and caches the interesting loads. This can then
be queried later instead of cross checking many load values.
Differential Revision: https://reviews.llvm.org/D54254
llvm-svn: 346479
Generalize code in Thumb2InstrInfo::storeRegToStackSlot() and
loadRegToStackSlot() to allow the GPR class or any of its sub-classes
(including hGPR) to be stored/loaded by ARM::t2STRi12/ARM::t2LDRi12.
Differential Revision: https://reviews.llvm.org/D51927
llvm-svn: 346401
The lowering was missing live-ins in certain cases, like a sequence of
multiple tMOVCCr_pseudo instructions. This would lead to a verifier
failure, and on pre-v6 Thumb CPSR would be incorrectly clobbered.
For reasons I don't completely understand, it's hard to get a sequence
of multiple tMOVCCr_pseudo instructions; the issue only seems to show up
with 64-bit comparisons where the result is zero-extended. I added some
extra testcases in case that changes in the future. Probably some
optimization opportunities here if anyone is interested. (@test_slt_not
is the case that was getting miscompiled.)
The code to check the liveness of CPSR was stolen from
X86ISelLowering.cpp; maybe it could be refactored into common helper,
but I have no idea where to put it.
Differential Revision: https://reviews.llvm.org/D54192
llvm-svn: 346355
Turn the assert in PrepareConstants into a conditon so that we can
handle mul instructions with negative immediates.
Differential Revision: https://reviews.llvm.org/D54094
llvm-svn: 346126
r345840 slightly changed the way promotion happens which could
result in zext and truncs having the same source and destination
types. This fixes that issue.
We can now also remove the zext and trunc in the following case:
(zext (trunc (promoted op)), i32)
This means that we can no longer treat a value, that is only used by
a sink, to be safe to promote.
I've also added in some extra asserts and replaced a cast for a
dyn_cast.
Differential Revision: https://reviews.llvm.org/D54032
llvm-svn: 346125
While mutating instructions, we sign extended negative constant
operands for binary operators that can safely overflow. This was to
allow instructions, such as add nuw i8 %a, -2, to still be able to
perform a subtraction. However, the code to handle constants doesn't
take into consideration that instructions, such as sub nuw i8 -2, %a,
require the i8 -2 to be converted into i32 254.
This is a relatively simple fix, but I've taken the time to
reorganise the code a bit - mainly that instructions that can be
promoted are cached and splitting up the Mutate function.
Differential Revision: https://reviews.llvm.org/D53972
llvm-svn: 345840
Shows up rarely for 64-bit arithmetic, more frequently for the compare
patterns added in r325323.
Differential Revision: https://reviews.llvm.org/D53848
llvm-svn: 345782
optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups that
have gaps at the end of the group (such as a loop that reads only the even
elements: a[2*i]) because that implies that we'll require a scalar epilogue
(which is not allowed under Opt for Size). This patch extends the support for
masked-interleave-groups (introduced by D53011 for conditional accesses) to
also cover the case of gaps in a group of loads; Targets that enable the
masked-interleave-group feature don't have to invalidate interleave-groups of
loads with gaps; they could now use masked wide-loads and shuffles (if that's
what the cost model selects).
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705
The "dead" markings allow existing target-independent optimizations,
like MachineSink, to trigger more frequently. The CPSR defs would have
eventually been marked dead by LiveVariables, so this only affects
optimizations before regalloc.
The ARMBaseInstrInfo.cpp change is fixing a bug which is only visible
with this change: the transform adds a use to an otherwise dead def
of CPSR. This is covered by existing regression tests.
thumb2-tbh.ll breaks for Thumb1 due to MachineLICM changing the
generated code; I'll fix it in D53452.
Differential Revision: https://reviews.llvm.org/D53453
llvm-svn: 345420
This mirrors what we already do for AArch64 as the cores are similar.
As discussed in the review, enabling the machine scheduler causes
more variations in performance changes so it is not enabled for now.
This patch improves LNT scores by a geomean of 1.57% at -O3.
Differential Revision: https://reviews.llvm.org/D53562
llvm-svn: 345272
I noticed while fixing PR39368 that we don't have generic shuffle costs for broadcast style shuffles.
This patch adds SK_BROADCAST handling, but exposes ARM/AARCH64 lack of handling of this type, which I've added a fix for at the same time.
Differential Revision: https://reviews.llvm.org/D53570
llvm-svn: 345253
Summary:
Changes all uses of minnan/maxnan to minimum/maximum
globally. These names emphasize that the semantic difference between
these operations is more than just NaN-propagation.
Reviewers: arsenm, aheejin, dschuff, javed.absar
Subscribers: jholewinski, sdardis, wdng, sbc100, jgravelle-google, jrtc27, atanasyan, llvm-commits
Differential Revision: https://reviews.llvm.org/D53112
llvm-svn: 345218
The BKPT instruction is specified to cause a software breakpoint,
and at least on Linux results in a SIGTRAP. This makes it more
suitable for implementing debugtrap than TRAP (aka UDF #254), which
is specified to cause an undefined instruction exception and results
in a SIGILL on Linux.
Moreover, BKPT is not marked as a terminator, which is not only
consistent with the IR instruction but allows the analyzeBlock
function to correctly analyze a basic block containing the instruction,
which fixes an assertion failure in the machine block placement pass
previously triggered by the included test case.
Because BKPT is only supported starting with ARMv5T, we continue to
use UDF #254 when targeting v4T.
Differential Revision: https://reviews.llvm.org/D53614
llvm-svn: 345171
A global alias may use indices which are not considered in bounds. In
such a case, accessing the base object will fail as it only peers
through inbounds accesses. This pattern is used by the swift compiler
to create references to preceeding members in the type metadata. This
would cause the code generation to fail when targeting a platform that
used ELF as the object file format. Be conservative and fail the
read-only check if we run into an alias that we cannot peer through.
llvm-svn: 345107
Previously reverted in rL343082.
Original commit message:
On failing to find sequences that can be converted into dual macs,
try to find sequential 16-bit loads that are used by muls which we
can then use smultb, smulbt, smultt with a wide load.
Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 344693
This is patch 2/2, following up on D53314, and is the functional change
to prevent fusing mul + add sequences into VFMAs.
Differential revision: https://reviews.llvm.org/D53315
llvm-svn: 344683
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33. This is a serie of 2
patches, that is trying to achieve the same for VFMA. The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:
--------------------------------------------------------
| Opt | < rL342874 | >= rL342874 | |
|------------------------------------------------------|
|-O3 | vmla | vmul | vmul |
| | | vadd | vadd |
|------------------------------------------------------|
|-Ofast | vfma | vfma | vmul |
| | | | vadd |
|------------------------------------------------------|
|-Oz | vmla | vmla | vmla |
--------------------------------------------------------
This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2. This also fixes a typo in the regression test added in rL342874.
Differential revision: https://reviews.llvm.org/D53314
llvm-svn: 344671
As I suggested on PR39281, this patch uses PADDL pairwise addition to widen from the vXi8 CTPOP result to the target vector type.
This is a blocker for moving more x86 code to generic vector CTPOP expansion (P32655 + D53258) - ARM's vXi64 CTPOP currently expands, which would generate a vXi64 MUL but ARM's custom lowering expands the general MUL case and vectors aren't well handled in LegalizeDAG - improving the CTPOP lowering was a lot easier than fixing the MUL lowering for this one case......
Differential Revision: https://reviews.llvm.org/D53257
llvm-svn: 344512
interleave-group
The vectorizer currently does not attempt to create interleave-groups that
contain predicated loads/stores; predicated strided accesses can currently be
vectorized only using masked gather/scatter or scalarization. This patch makes
predicated loads/stores candidates for forming interleave-groups during the
Loop-Vectorizer's analysis, and adds the proper support for masked-interleave-
groups to the Loop-Vectorizer's planning and transformation stages. The patch
also extends the TTI API to allow querying the cost of masked interleave groups
(which each target can control); Targets that support masked vector loads/
stores may choose to enable this feature and allow vectorizing predicated
strided loads/stores using masked wide loads/stores and shuffles.
Reviewers: Ayal, hsaito, dcaballe, fhahn, javed.absar
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53011
llvm-svn: 344472
Moving away from UnknownSize is part of the effort to migrate us to
LocationSizes (e.g. the cleanup promised in D44748).
This doesn't entirely remove all of the uses of UnknownSize; some uses
require tweaks to assume that UnknownSize isn't just some kind of int.
This patch is intended to just be a trivial replacement for all places
where LocationSize::unknown() will Just Work.
llvm-svn: 344186
When deciding if it is safe to optimize a conditional branch to a CBZ or
CBNZ the offsets of the BasicBlocks from the start of the function are
estimated. For inline assembly the generic getInlineAsmLength() function is
used to get a worst case estimate of the inline assembly by multiplying the
number of instructions by the max instruction size of 4 bytes. This
unfortunately doesn't take into account the generation of Thumb implicit IT
instructions. In edge cases such as when all the instructions in the block
are 4-bytes in size and there is an implicit IT then the size is
underestimated. This can cause an out of range CBZ or CBNZ to be generated.
The patch takes a conservative approach and assumes that every instruction
in the inline assembly block may have an implicit IT.
Fixes pr31805
Differential Revision: https://reviews.llvm.org/D52834
llvm-svn: 343960
This rebases and recommits r343520. hwasan should be fixed now and this
shouldn't break the tests anymore.
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Differential Revision: https://reviews.llvm.org/D52125
llvm-svn: 343895
Finally all targets are enabling multiple regalloc hints, so the hook to
disable this can now be removed.
NFC.
Review: Simon Pilgrim
https://reviews.llvm.org/D52316
llvm-svn: 343851
The ARM elf emitter would omit printing data
symbol when constant data. This patch
overrides the emitFill method as to enforce that
the symbol is correctly printed.
Differential revision: https://reviews.llvm.org/D52737
llvm-svn: 343594
Spill/reload instructions are artificially generated by the compiler and
have no relation to the original source code. So the best thing to do is
not attach any debug location to them (instead of just taking the next
debug location we find on following instructions).
Differential Revision: https://reviews.llvm.org/D52125
llvm-svn: 343520
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.
This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51472
llvm-svn: 343361
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.
Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51469
llvm-svn: 343359
The NoMovt feature prevents the use of MOVW/MOVT
instructions on Cortex-M23 for performance reasons.
These instructions are required for execute only code
so NoMovt should be disabled when that option is enabled.
Differential Revision: https://reviews.llvm.org/D52551
llvm-svn: 343302
This adds two new barrier instructions which can be used to restrict
speculative execution of load instructions.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52484
llvm-svn: 343300
This is a new barrier which limits speculative execution of the
instructions following it.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52477
llvm-svn: 343213
This patch allows targeting Armv8.5-A, adding the architecture to
tablegen and setting the options to be identical to Armv8.4-A for the
time being. Subsequent patches will add support for the different
features included in the Armv8.5-A Reference Manual.
Patch by Pablo Barrio!
Differential revision: https://reviews.llvm.org/D52470
llvm-svn: 343102
When calculating whether a value can safely overflow for use by an
icmp, we weren't checking that the value couldn't wrap around. To do
this we need the icmp to be using a constant, as well as the incoming
add or sub.
bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060
Differential Revision: https://reviews.llvm.org/D52463
llvm-svn: 343092
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).
> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 343082