If a 16-bit thumb STM with writeback stores the base register but it isn't the
first register in the list, then an unknown value is stored. The load/store
optimizer knows this and generates a 32-bit STM without writeback instead, but
thumb2 size reduction converts it into a 16-bit STM. Fix this by having thumb2
size reduction notice such STMs and leave them as they are.
Differential Revision: https://reviews.llvm.org/D78493
If the stack pointer is altered for local variables and we are generating
Thumb2 execute-only code the .pad directive is missing.
Usually the size of the adjustment is stored in a PC-relative location
and loaded into a register which is then added to the stack pointer.
However when we are generating execute-only code code the size of the
adjustment is instead generated using the MOVW/MOVT instruction pair.
As a by product of handling the execute-only case this also fixes an
existing issue that in the none execute-only case the .pad directive was
generated against the load of the constant to a register instruction,
instead of the instruction which adds the register to the stack pointer.
Differential Revision: https://reviews.llvm.org/D76849
When decided whether to generate a post-inc load/store, look at the
other memory nodes that use the same base address and, if any proceed
the current node, then don't do the combine.
The change only seems to be affecting the Arm backend, which I was
surprised at, but it appears to fix a lot of our issues around MVE
masked load/stores having to store a temporary address after an early
post-increment on a shared base address.
Differential Revision: https://reviews.llvm.org/D75847
The new behavior matches GNU objdump. A pair of angle brackets makes tests slightly easier.
`.foo:` is not unique and thus cannot be used in a `CHECK-LABEL:` directive.
Without `-LABEL`, the CHECK line can match the `Disassembly of section`
line and causes the next `CHECK-NEXT:` to fail.
```
Disassembly of section .foo:
0000000000001634 .foo:
```
Bdragon: <> has metalinguistic connotation. it just "feels right"
Reviewed By: rupprecht
Differential Revision: https://reviews.llvm.org/D75713
Summary:
It is not safe for ARMConstantIslands to undoLRSpillRestore. PrologEpilogInserter is
the one to ensure stack alignment, taking into consideration LR is spilled or not.
For noreturn function with StackAlignment 8 (function contains call/alloc),
undoLRSpillRestore cause stack be mis-aligned. Fixing stack alignment in
ARMConstantIslands doesn't give us much benefit, as undo LR spill/restore only
occur in large function with near branches only, also doesn't have callee-saved LR spill.
Reviewers: t.p.northover, rengolin, efriedma, apazos, samparker, ostannard
Reviewed By: ostannard
Subscribers: dmgreen, ostannard, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75288
This adds infrastructure to print and parse MIR MachineOperand comments.
The motivation for the ARM backend is to print condition code names instead of
magic constants that are difficult to read (for human beings). For example,
instead of this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14, $noreg
t2Bcc %bb.4, 0, killed $cpsr
we now print this:
dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
t2Bcc %bb.4, 0 /* CC:eq */, killed $cpsr
This shows that MachineOperand comments are enclosed between /* and */. In this
example, the EOR instruction is not conditionally executed (i.e. it is "always
executed"), which is encoded by the 14 immediate machine operand. Thus, now
this machine operand has /* CC::always */ as a comment. The 0 on the next
conditional branch instruction represents the equal condition code, thus now
this operand has /* CC:eq */ as a comment.
As it is a comment, the MI lexer/parser completely ignores it. The benefit is
that this keeps the change in the lexer extremely minimal and no target
specific parsing needs to be done. The changes on the MIPrinter side are also
minimal, as there is only one target hooks that is used to create the machine
operand comments.
Differential Revision: https://reviews.llvm.org/D74306
Summary:
Backends should fold the subtraction into the comparison, but not all
seem to. Moreover, on targets where pointers are not integers, such as
CHERI, an integer subtraction is not appropriate. Instead we should just
compare the two pointers directly, as this should work everywhere and
potentially generate more efficient code.
Reviewers: bogner, lebedev.ri, efriedma, t.p.northover, uweigand, sunfish
Reviewed By: lebedev.ri
Subscribers: dschuff, sbc100, arichardson, jgravelle-google, hiraditya, aheejin, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74454
All of
"no-frame-pointer-elim-non-leaf"
"no-frame-pointer-elim-non-leaf"="true"
"no-frame-pointer-elim-non-leaf"="false"
mean "frame-pointer"="non-leaf", which is quite counter-intuitive.
llvmorg-10-init-16046-ga36ddf0aa9d accidentally broke it.
This fixes the -DLLVM_ENABLE_EXPENSIVE_CHECKS=On test:
```
*** Bad machine code: Non-flag-setting Thumb1 mov is v6-only ***
- function: pass_C
- basic block: %bb.0 entry (0x1fc9bf0)
- instruction: $r0 = tMOVr killed $r6, 14, $noreg
```
Provides support for using r6-r11 as globally scoped
register variables. This requires a -ffixed-rN flag
in order to reserve rN against general allocation.
If for a given GRV declaration the corresponding flag
is not found, or the the register in question is the
target's FP, we fail with a diagnostic.
Differential Revision: https://reviews.llvm.org/D68862
InstrEmitter's virtual register handling assumes that clones are emitted
after the cloned node. Make sure this assumption actually holds.
Fixes a "Node emitted out of order - early" assertion on the testcase.
This is probably a very rare case to actually hit in practice; even
without the explicit edge, the scheduler will usually end up scheduling
the nodes in the expected order due to other constraints.
Differential Revision: https://reviews.llvm.org/D68068
llvm-svn: 373782
Darwin platforms need the frame register to always point at a valid record even
if it's not updated in a leaf function. Backtraces are more important than one
extra GPR.
llvm-svn: 373738
Summary:
This catches malformed mir files which specify alignment as log2 instead of pow2.
See https://reviews.llvm.org/D65945 for reference,
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: MatzeB, qcolombet, dschuff, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67433
llvm-svn: 371608
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 369664
It caused assertions to fire when building Chromium:
lib/CodeGen/LiveDebugValues.cpp:331: bool
{anonymous}::LiveDebugValues::OpenRangesSet::empty() const: Assertion
`Vars.empty() == VarLocs.empty() && "open ranges are inconsistent"' failed.
See https://crbug.com/992871#c3 for how to reproduce.
> Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
>
> To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
>
> Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368579
Patch https://reviews.llvm.org/D43256 introduced more aggressive loop layout optimization which depends on profile information. If profile information is not available, the statically estimated profile information(generated by BranchProbabilityInfo.cpp) is used. If user program doesn't behave as BranchProbabilityInfo.cpp expected, the layout may be worse.
To be conservative this patch restores the original layout algorithm in plain mode. But user can still try the aggressive layout optimization with -force-precise-rotation-cost=true.
Differential Revision: https://reviews.llvm.org/D65673
llvm-svn: 368339
Fix for https://bugs.llvm.org/show_bug.cgi?id=42760. A tBR_JTr
instruction is duplicated by tail duplication, which results in
the same jumptable with the same label being emitted twice.
Fix this by marking tBR_JTr as not duplicable. The corresponding
ARM/Thumb instructions are already marked as not duplicable.
Additionally also mark tTBB_JT and tTBH_JT to be consistent with
Thumb2, even though this shouldn't be strictly necessary.
Differential Revision: https://reviews.llvm.org/D65606
llvm-svn: 367753
This is extremely specific, but saves three instructions when it's
legal. I don't think the code can be usefully generalized.
Differential Revision: https://reviews.llvm.org/D65351
llvm-svn: 367492
Thumb1 has very limited immediate modes, so turning an "and" into a
shift can save multiple instructions.
It's possible to simplify the generated code for test2 and test3 in
cmp-and-fold.ll a little more, but I'll implement that as a followup.
Differential Revision: https://reviews.llvm.org/D65175
llvm-svn: 367491
Currently, stack protector loads and stores are resolved during
LocalStackSlotAllocation (if the pass needs to run). When this is the
case, the base register assigned to the frame access is going to be one
of the vregs created during LocalStackSlotAllocation. This means that we
are keeping a pointer to the stack protector slot, and we're using this
pointer to load and store to it.
In case register pressure goes up, we may end up spilling this pointer
to the stack, which can be a security concern.
Instead, leave it to PEI to resolve the frame accesses. In order to do
that, we make all stack protector accesses go through frame index
operands, then PEI will resolve this using an offset from sp/fp/bp.
Differential Revision: https://reviews.llvm.org/D64759
llvm-svn: 367068
The arm condition codes for GE is N==V (and for LT is N!=V). If the source of
flags cannot set V (overflow), such as a cmp against #0, then we can use the
simpler PL and MI conditions that only check N. As these PL/MI conditions are
simpler than GE/LT, other passes like the peephole optimiser can have a better
time optimising away the redundant CMPs.
The exception is the VSEL instruction, which cannot take the PL code, so there
the transform favours GE.
Differential Revision: https://reviews.llvm.org/D64160
llvm-svn: 365117
There were two issues here: one, some of the relevant instructions were
missing the expected "FrameSetup" flag, and two,
ARMAsmPrinter::EmitUnwindingInstruction wasn't expecting "mov"
instructions in the prologue.
I'm sticking the additional state into ARMFunctionInfo so it's obvious
it only applies to the current function.
I considered a few alternative approaches where we would compute the
correct unwind information as part of the prologue/epilogue lowering,
but it seems like a lot of work to introduce pseudo-instructions, and
the current code seems to be reliable enough.
Fixes https://bugs.llvm.org/show_bug.cgi?id=42408.
Differential Revision: https://reviews.llvm.org/D63964
llvm-svn: 364970
The current implementation of ThumbRegisterInfo::saveScavengerRegister
is bad for two reasons: one, it's buggy, and two, it blocks using R12
for other optimizations. So this patch gets rid of it, and adds the
necessary support for using an ordinary emergency spill slot on Thumb1.
(Specifically, I think saveScavengerRegister was broken by r305625, and
nobody noticed for two years because the codepath is almost never used.
The new code will also probably not be used much, but it now has better
tests, and if we fail to emit a necessary emergency spill slot we get a
reasonable error message instead of a miscompile.)
A rough outline of the changes in the patch:
1. Gets rid of ThumbRegisterInfo::saveScavengerRegister.
2. Modifies ARMFrameLowering::determineCalleeSaves to allocate an
emergency spill slot for Thumb1.
3. Implements useFPForScavengingIndex, so the emergency spill slot isn't
placed at a negative offset from FP on Thumb1.
4. Modifies the heuristics for allocating an emergency spill slot to
support Thumb1. This includes fixing ExtraCSSpill so we don't try to
use "lr" as a substitute for allocating an emergency spill slot.
5. Allocates a base pointer in more cases, so the emergency spill slot
is always accessible.
6. Modifies ARMFrameLowering::ResolveFrameIndexReference to compute the
right offset in the new cases where we're forcing a base pointer.
7. Ensures we never generate a load or store with an offset outside of
its frame object. This makes the heuristics more straightforward.
8. Changes Thumb1 prologue and epilogue emission so it never uses
register scavenging.
Some of the changes to the emergency spill slot heuristics in
determineCalleeSaves affect ARM/Thumb2; hopefully, they should allow
the compiler to avoid allocating an emergency spill slot in cases
where it isn't necessary. The rest of the changes should only affect
Thumb1.
Differential Revision: https://reviews.llvm.org/D63677
llvm-svn: 364490
This patch changes MIR stack-id from an integer to an enum,
and adds printing/parsing support for this in MIR files. The default
stack-id '0' is now renamed to 'default'.
This should make MIR tests that have stack objects with different stack-ids
more descriptive. It also clarifies code operating on StackID.
Reviewers: arsenm, thegameg, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60137
llvm-svn: 363533
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:
* a latch block
* it has two successors, one is loop header, another is exit
* it has more than one predecessors
If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.
Differential Revision: https://reviews.llvm.org/D43256
llvm-svn: 363471
Summary:
Relate bug: https://bugs.llvm.org/show_bug.cgi?id=37472
The shrink wrapping pass prematurally restores the stack, at a point where the stack might still be accessed.
Taking an exception can cause the stack to be corrupted.
As a first approach, this patch is overly conservative, assuming that any instruction that may load or store could access
the stack.
Reviewers: dmgreen, qcolombet
Reviewed By: qcolombet
Subscribers: simpal01, efriedma, eli.friedman, javed.absar, llvm-commits, eugenis, chill, carwil, thegameg
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63152
llvm-svn: 363265
This adds a simple extra test for constant hoisting to show it's
usefulness with constant addresses like those seen in memory
mapped registers in embedded systems.
llvm-svn: 358114
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.
While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.
Differential Revision: https://reviews.llvm.org/D59616
llvm-svn: 357437
This takes sequences like "mov r4, sp; str r0, [r4]", and optimizes them
to something like "str r0, [sp]".
For regular stack variables, this optimization was already implemented:
we lower loads and stores using frame indexes, which are expanded later.
However, when constructing a call frame for a call with more than four
arguments, the existing optimization doesn't apply. We need to use
stores which are actually relative to the current value of sp, and don't
have an associated frame index.
This patch adds a special case to handle that construct. At the DAG
level, this is an ISD::STORE where the address is a CopyFromReg from SP
(plus a small constant offset).
This applies only to Thumb1: in Thumb2 or ARM mode, a regular store
instruction can access SP directly, so the COPY gets eliminated by
existing code.
The change to ARMDAGToDAGISel::SelectThumbAddrModeSP is a related
cleanup: we shouldn't pretend that it can select anything other than
frame indexes.
Differential Revision: https://reviews.llvm.org/D59568
llvm-svn: 356601
This change does two things. One, it ensures compilation will abort
instead of miscompiling if ARMFrameLowering::determineCalleeSaves
chooses not to save LR in a case where it's necessary. Two, it changes
the way we estimate the size of a function to be more conservative in
the presence of constant pool entries and jump tables.
EstimateFunctionSizeInBytes probably still isn't really conservative
enough, but I'm not sure how we can come up with a reliable estimate
before constant islands runs.
Differential Revision: https://reviews.llvm.org/D59439
llvm-svn: 356527
Summary:
A number of optimizations are inhibited by single-use TokenFactors not
being merged into the TokenFactor using it. This makes we consider if
we can do the merge immediately.
Most tests changes here are due to the change in visitation causing
minor reorderings and associated reassociation of paired memory
operations.
CodeGen tests with non-reordering changes:
X86/aligned-variadic.ll -- memory-based add folded into stored leaq
value.
X86/constant-combiners.ll -- Optimizes out overlap between stores.
X86/pr40631_deadstore_elision -- folds constant byte store into
preceding quad word constant store.
Reviewers: RKSimon, craig.topper, spatel, efriedma, courbet
Reviewed By: courbet
Subscribers: dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, eraman, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59260
llvm-svn: 356068
Targets can potentially emit more efficient code if they know address
computations never overflow. For example ILP32 code on AArch64 (which only has
64-bit address computation) can ignore the possibility of overflow with this
extra information.
llvm-svn: 355926
This adds a few extra Thumb1 opcodes to improve the peephole opimisers
ability to remove redundant cmp instructions. tADC and tSBC require
a small fixup to prevent MOVS being moved past the instruction, giving
the wrong flags.
Differential Revision: https://reviews.llvm.org/D58281
llvm-svn: 354791
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.
Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri
instruction can be used with frame indices. In thumb1 we use tADDframe.
Differential Revision: https://reviews.llvm.org/D57833
llvm-svn: 354667
This adds a number of missing Thumb1 opcodes so that the peephole optimiser can
remove redundant CMP instructions.
Differential Revision: https://reviews.llvm.org/D57833
llvm-svn: 354564
We may leave behind incorrect dead flags on instructions that are CSE'd. Make
sure we remove the dead flags on physical registers to prevent other incorrect
code motion.
Differential Revision: https://reviews.llvm.org/D58115
llvm-svn: 354443
Summary:
The SMULO/UMULO DAG nodes, when not directly supported by the target,
expand to a multiplication twice as wide. In case that the resulting
type is not legal, the legalizer cannot directly call the intrinsic
with the wide arguments; instead, it "pre-lowers" them by splitting
them in halves.
rL283203 made sure that on big endian targets, the legalizer passes
the argument halves in the correct order. It did not do the same
for the return value halves because the existing code used a hack;
it put an illegal type into DAG and hoped that nothing would break
and it would be correctly lowered elsewhere.
rL307207 fixed this, handling return value halves similar to how
argument handles are handled, but did not take big-endian targets
into account.
This commit fixes the expansion on big-endian targets, such as
the out-of-tree OR1K target.
Reviewers: eli.friedman, vadimcn
Subscribers: george-hopkins, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D45355
llvm-svn: 353854
We need to clear the kill flags on both SingleValReg and OldReg, to ensure they remain
conservatively correct.
Differential Revision: https://reviews.llvm.org/D58114
llvm-svn: 353847
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.
Differential Revision: https://reviews.llvm.org/D57671
llvm-svn: 353040