Added support for decoding VPERMILPS variable shuffle masks that aren't in the constant pool.
Added target shuffle mask decoding for SCALAR_TO_VECTOR+VZEXT_MOVL cases - these can happen for v2i64 constant re-materialization
Followup to D17681
llvm-svn: 262784
btver1 is a SSSE3/SSE4a only CPU - it doesn't have AVX and doesn't support XSAVE.
Differential Revision: http://reviews.llvm.org/D17683
llvm-svn: 262782
When the lowering of the setjmp intrinsic requires
a global base pointer to be set, make sure such pointer
gets defined by the CGBR pass.
This fixes PR26742.
llvm-svn: 262762
cmpxchgXXb uses RBX as one of its implicit argument. I.e., when
we use that instruction we need to clobber RBX. This is generally
fine, expect when RBX is a reserved register because in that case,
the register allocator will not track its value and will not
save and restore it when interferences occur.
rdar://problem/24851412
llvm-svn: 262759
The x86 ret instruction has a 16 bit immediate indicating how many bytes
to pop off of the stack beyond the return address.
There is a problem when extremely large structs are passed by value: we
might not be able to fit the number of bytes to pop into the return
instruction.
To fix this, expand RET_FLAG a little later and use a special sequence
to clean the stack:
pop %ecx ; return address is now in %ecx
add $n, %esp ; clean the stack
push %ecx ; bring the return address back on the stack
ret ; pop the return address and jmp to it's value
llvm-svn: 262755
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
Second attempt, creating TLI.isOperationCustom like isOperationExpand, to make
sure we only emit valid types or the ones that were explicitly marked as custom.
Now, passing check-all and test-suite on x86, ARM and AArch64.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262738
Summary:
This is necessary for when we run out of VGPRs and can no
longer use v_{read,write}_lane for spilling SGPRs.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17592
llvm-svn: 262732
Summary:
This allows us to use virtual registers when we need extra registers
for inserting spill instructions in SIRegisterInfo:eliminateFrameIndex().
Once all the frame indices have been eliminated, the
PrologEpilogueInserter does an extra pass over the program to replace
all virtual registers with physical ones.
This allows us to make more efficient use of our emergency spill slots,
so we only need to create one.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17591
llvm-svn: 262728
These correspond to IMAGE_ATOMIC_* and are going to be used by Mesa for the
GL_ARB_shader_image_load_store extension.
Initial change by Nicolai H.hnle
Differential Revision: http://reviews.llvm.org/D17401
llvm-svn: 262701
The variable mask form of VPERMILPD/VPERMILPS were only partially implemented, with much of it still performed as an intrinsic.
This patch properly defines the instructions in terms of X86ISD::VPERMILPV, permitting the opcode to be easily combined as a target shuffle.
Differential Revision: http://reviews.llvm.org/D17681
llvm-svn: 262635
Fixed the ordering to check first for X86 interrupt handler then for MCU target.
Differential Revision: http://reviews.llvm.org/D17801
llvm-svn: 262628
That's not the case for VPERMV/VPERMV3, which cover all possible
combinations (the C intrinsics use a different order; the AVX vs
AVX512 intrinsics are different still).
Since:
r246981 AVX-512: Lowering for 512-bit vector shuffles.
VPERMV is recognized in getTargetShuffleMask.
This breaks assumptions in most callers, as they expect
the non-mask operands to start at index 0.
VPERMV has the mask as operand #0; VPERMV3 has it in the middle.
Instead of the faulty assumption, have getTargetShuffleMask return
its operands as well.
One alternative we considered was to change the operand order of
VPERMV, but we agreed to stick to the instruction order, as there
are more AVX512 weirdness to cover (vpermt2/vpermi2 in particular).
Differential Revision: http://reviews.llvm.org/D17041
llvm-svn: 262627
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
Differential Revision: http://reviews.llvm.org/D17691
llvm-svn: 262599
Patch by: Konstantin Zhuravlyov
Summary: Tools, such as debugger, need to pause execution based on user input (i.e. breakpoint). In order to do this, two S_NOP instructions are inserted for each high level source statement: one before first isa instruction of high level source statement, and one after last isa instruction of high level source statement. Further, debugger may replace S_NOP instructions with S_TRAP instructions based on user input.
Reviewers: tstellarAMD, arsenm
Subscribers: echristo, dblaikie, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17454
llvm-svn: 262579
Summary:
When there were no free SGPRs, we were trying to move this value into
some of the reserved registers which was causing a segmentation fault.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17590
llvm-svn: 262577
The code was previously not able to track a boolean argument
at a call site back to the formal argument of the caller.
Differential Revision: http://reviews.llvm.org/D17786
llvm-svn: 262575
Catch objects with a displacement of zero do not initialize a catch
object. The displacement is relative to %rsp at the end of the
function's prologue for x86_64 targets.
If we place an object at the top-of-stack, we will end up wit a
displacement of zero resulting in our catch object remaining
uninitialized.
Address this by creating our catch objects as fixed objects. We will
ensure that the UnwindHelp object is created after the catch objects so
that no catch object will have a displacement of zero.
Differential Revision: http://reviews.llvm.org/D17823
llvm-svn: 262546
When div+rem calls on the same arguments are found, the ARM back-end merges the
two calls into one __aeabi_divmod call for up to 32-bits values. However,
for 64-bit values, which also have a lib call (__aeabi_ldivmod), it wasn't
merging the calls, and thus calling ldivmod twice and spilling the temporary
results, which generated pretty bad code.
This patch legalises 64-bit lib calls for divmod, so that now all the spilling
and the second call are gone. It also relaxes the DivRem combiner a bit on the
legal type check, since it was already checking for isLegalOrCustom on every
value, so the extra check for isTypeLegal was redundant.
This patch fixes PR17193 (and a long time FIXME in the tests).
llvm-svn: 262507
This reverts commit r262370.
It turns out there is code out there that does sequences of allocas
greater than 4K: http://crbug.com/591404
The goal of this change was to improve the code size of inalloca call
sequences, but we got tangled up in the mess of dynamic allocas.
Instead, we should come back later with a separate MI pass that uses
dominance to optimize the full sequence. This should also be able to
remove the often unneeded stacksave/stackrestore pairs around the call.
llvm-svn: 262505
Most of the time ARM has the CCR.UNALIGN_TRP bit set to false which
means that unaligned loads/stores do not trap and even extensive testing
will not catch these bugs. However the multi/double variants are not
affected by this bit and will still trap. In effect a more aggressive
load/store optimization will break existing (bad) code.
These bugs do not necessarily manifest in the broken code where the
misaligned pointer is formed but often later in perfectly legal code
where it is accessed. This means recompiling system libraries (which
have no alignment bugs) with a newer compiler will break existing
applications (with alignment bugs) that worked before.
So (under protest) I implemented this safe mode which limits the
formation of multi/double operations to cases that are not affected by
user code (stack operations like spills/reloads) or cases where the
normal operations trap anyway (floating point load/stores). It is
disabled by default.
Differential Revision: http://reviews.llvm.org/D17015
llvm-svn: 262504
Summary:
This change enables frame pointer elimination in non-leaf functions.
The -fomit-frame-pointer option still needs to be used when compiling
via clang (or an equivalent method of not setting the
'no-frame-pointer-elim*' function attributes if generating llvm IR via
some other method) to take advantage of this optimization.
This change should be NFC when compiling via clang without
-fomit-frame-pointer.
Reviewers: t.p.northover
Subscribers: aemerson, rengolin, tberghammer, qcolombet, llvm-commits, danalbert, mcrosier, srhines
Differential Revision: http://reviews.llvm.org/D17730
llvm-svn: 262495
We have a number of useful lowering strategies for VBROADCAST instructions (both from memory and register element 0) which the 128-bit form of the MOVDDUP instruction can make use of.
This patch tweaks lowerVectorShuffleAsBroadcast to enable it to broadcast 2f64 args using MOVDDUP as well.
It does require a slight tweak to the lowerVectorShuffleAsBroadcast mechanism as the existing MOVDDUP lowering uses isShuffleEquivalent which can match binary shuffles that can lower to (unary) broadcasts.
Differential Revision: http://reviews.llvm.org/D17680
llvm-svn: 262478
This is going to be used in .hsatext disassembler and can be used
in current assembler parser (lit tests passed on parsing).
Code using this helpers isn't included in this patch.
Benefits:
unified approach
fast field name lookup on parsing
Later I would like to enhance some of the field naming/syntax using this code.
Patch by: Valery Pykhtin
Differential Revision: http://reviews.llvm.org/D17150
llvm-svn: 262473