Some implicit ilist iterator conversions have crept back into Analysis,
Transforms, Hexagon, and llvm-stress. This removes them.
I'll commit a patch immediately after this to disallow them (in a
separate patch so that it's easy to revert if necessary).
llvm-svn: 252371
We used to try to constant-fold them to i32 immediates.
Given that fast-isel doesn't otherwise support vNi1, when selecting
the result users, we'd fallback to SDAG anyway.
However, if the users were in another block, we'd insert broken
cross-class copies (GPR32 to FPR64).
Give up, let SDAG agree with itself on a vNi1 legalization strategy.
llvm-svn: 252364
When matching non-LSB-extracting truncating broadcasts, we now insert
the necessary SRL. If the scalar resulted from a load, the SRL will be
folded into it, creating a narrower, offset, load.
However, i16 loads aren't Desirable, so we get i16->i32 zextloads.
We already catch i16 aextloads; catch these as well.
llvm-svn: 252363
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))
llvm-svn: 252362
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
(v8i16 (shufflevector
(v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
<1,1,...,1>))
into:
(v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.
Instead, bail out if the assumption doesn't hold.
llvm-svn: 252361
Summary:
Pass the VOPProfile object all the through to *_m multiclasses. This will
allow us to do more simplifications in the future.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D13437
llvm-svn: 252339
All 3 operands of FMA3 instructions are commutable now.
Patch by Slava Klochkov
Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab).
Differential Revision: http://reviews.llvm.org/D13269
llvm-svn: 252335
Modelling of the expression stack is evolving. This patch takes another
step by making pushes explicit.
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252334
Mark kernels that use certain features that require user
SGPRs to support with kernel attributes. We need to know
before instruction selection begins because it impacts
the kernel calling convention lowering.
For now this only detects the workitem intrinsics.
llvm-svn: 252323
For some reason VS_32 ends up factoring into the pressure heuristics
even though we should never see a virtual register with this class.
When SGPRs are reserved for register spilling, this for some reason
triggers reg-crit scheduling.
Setting isAllocatable = 0 may help with this since that seems to remove
it from the default implementation's generated table.
llvm-svn: 252321
Summary:
In this implementation, LiveIntervalAnalysis invents a few register
masks on basic block boundaries that preserve no registers. The nice
thing about this is that it prevents the prologue inserter from thinking
it needs to spill all XMM CSRs, because it doesn't see any explicit
physreg defs in the MI.
Reviewers: MatzeB, qcolombet, JosephTremoulet, majnemer
Subscribers: MatzeB, llvm-commits
Differential Revision: http://reviews.llvm.org/D14407
llvm-svn: 252318
The benefit from converting narrow loads into a wider load (r251438) could be
micro-architecturally dependent, as it assumes that a single load with two bitfield
extracts is cheaper than two narrow loads. Currently, this conversion is
enabled only in cortex-a57 on which performance benefits were verified.
llvm-svn: 252316
Summary:
The bug was that the sldi instructions have immediate widths dependant on
their element size. So sldi.d has a 1-bit immediate and sldi.b has a 4-bit
immediate. All of these were using 4-bit immediates previously.
Reviewers: vkalintiris
Subscribers: llvm-commits, atanasyan, dsanders
Differential Revision: http://reviews.llvm.org/D14018
llvm-svn: 252297
Summary:
The bug was that the MIPS32R6/MIPS64R6/microMIPS32R6 versions of LSA and DLSA
(unlike the MSA version) failed to account for the off-by-one encoding of the
immediate. The range is actually 1..4 rather than 0..3.
Reviewers: vkalintiris
Subscribers: atanasyan, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D14015
llvm-svn: 252295
Summary:
Without these patterns we would generate a complete LL/SC sequence.
This would be problematic for memory regions marked as WRITE-only or
READ-only, as the instructions LL/SC would read/write to the protected
memory regions correspondingly.
Reviewers: dsanders
Subscribers: llvm-commits, dsanders
Differential Revision: http://reviews.llvm.org/D14397
llvm-svn: 252293
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.
Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.
This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.
No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.
llvm-svn: 252266
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.
While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.
llvm-svn: 252210
Mangling type information into MachineInstr opcode names was a temporary
measure, and it's starting to get hairy. At the same time, the MC instruction
printer wants to use AsmString strings for printing. This patch takes the
first step, starting the process of adding AsmStrings for instructions.
llvm-svn: 252203
Also, remove an enum hack where enum values were used as indexes into an array.
We may want to make this a real class to allow pattern-based queries/customization (D13417).
llvm-svn: 252196
Summary:
This review is related to another review request http://reviews.llvm.org/D11268, does the same and merely fixes a couple of issues with it.
D11268 is quite old and has merge conflicts against the current trunk.
This request
- rebases D11268 onto the new trunk;
- resolves the merge conflicts;
- fixes the prologue_end tests, which do not pass due to the subprogram definitions not marked as distinct.
Reviewers: echristo, rengolin, kubabrecka
Subscribers: aemerson, rengolin, jyknight, dsanders, llvm-commits, asl
Differential Revision: http://reviews.llvm.org/D14338
llvm-svn: 252177
This fixes the issue of wrong CFA calculation in the following case:
0x08048400 <+0>: push %ebx
0x08048401 <+1>: sub $0x8,%esp
0x08048404 <+4>: **call 0x8048409 <test+9>**
0x08048409 <+9>: **pop %eax**
0x0804840a <+10>: add $0x1bf7,%eax
0x08048410 <+16>: mov %eax,%ebx
0x08048412 <+18>: call 0x80483f0 <bar>
0x08048417 <+23>: add $0x8,%esp
0x0804841a <+26>: pop %ebx
0x0804841b <+27>: ret
The highlighted instructions are a product of movpc instruction. The call
instruction changes the stack pointer, and pop instruction restores its
value. However, the rule for computing CFA is not updated and is wrong on
the pop instruction. So, e.g. backtrace in gdb does not work when on the pop
instruction. This adds cfi instructions for both call and pop instructions.
cfi_adjust_cfa_offset** instruction is used with the appropriate offset for
setting the rules to calculate CFA correctly.
Patch by Violeta Vukobrat.
Differential Revision: http://reviews.llvm.org/D14021
llvm-svn: 252176
We can conservatively know that CMOV's known bits are the intersection of known bits for each of its operands. This helps PerformCMOVToBFICombine find more opportunities.
I tried hard to create a testcase for this and failed - we have to sufficiently confuse DAG.computeKnownBits which can see through all the cheap tricks I tried to narrow my larger testcase down :(
This code is actually exercised in CodeGen/ARM/bfi.ll, there's just no functional difference because DAG.computeKnownBits gets the right answer in that case.
llvm-svn: 252168
The operand layout is slightly different for the atomic
opcodes from the usual MUBUF loads and stores.
This should only fix it on SI/CI. VI is still broken
because it still emits the addr64 replacement.
llvm-svn: 252140
Summary:
The CLR's personality routine passes the pointer to the establisher frame
in RCX, not RDX.
Reviewers: pgavlin, majnemer, rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D14343
llvm-svn: 252135
The generic infrastructure already did a lot of work to decide if the
fixup value is know or not. It doesn't make sense to reimplement a very
basic case: same fragment.
llvm-svn: 252090
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.
Fixes PR24193.
llvm-svn: 252088
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.
The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.
This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.
It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.
Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll
Differential Revision: http://reviews.llvm.org/D13988
llvm-svn: 252074