without a frame pointer when unwind may happen.
This is a workaround for a bug in the way we emit the CFI directives for
frameless unwind information. See PR25614.
llvm-svn: 255175
Reinteroduce the code for moving ARGUMENTS back to the top of the basic block.
While the ARGUMENTS physical register prevents sinking and scheduling from
moving them, it does not appear to be sufficient to prevent SelectionDAG from
moving them down in the initial schedule. This patch introduces a patch that
moves them back to the top immediately after SelectionDAG runs.
This is still hopefully a temporary solution. http://reviews.llvm.org/D14750 is
one alternative, though the review has not been favorable, and proposed
alternatives are longer-term and have other downsides.
This fixes the main outstanding -verify-machineinstrs failures, so it adds
-verify-machineinstrs to several tests.
Differential Revision: http://reviews.llvm.org/D15377
llvm-svn: 255125
We mutated the DAG, which invalidated the node we were trying to use
as a base register. Sometimes we got away with it, but other times the
node really did get deleted before it was finished with.
Should fix PR25733
llvm-svn: 255120
During selection DAG legalization, extractelement is replaced with a load
instruction. To do this, a temporary store to the stack is used unless an
existing store is found that can be re-used.
If re-using a store, the chain going out of the store must be replaced by
the one going out of the new load (this ensures that any stores that must
take place after the store happens after the load, else the value might
be overwritten before it is loaded).
The problem is, if the extractelement index is dependent on the store
replacing the chain will introduce a cycle in the selection DAG (the load
uses the index, and by replacing the chain we will make the index dependent
on the load).
To fix this, if the index is dependent on the store, the store is skipped.
This is conservative as we may end up creating an unnecessary extra store
to the stack. However, the situation is not expected to occur very often.
Differential Revision: http://reviews.llvm.org/D15330
llvm-svn: 255114
Otherwise, we think that most types that look like they'd fit in a
legal vector type are legal (so, basically, *any* vector type with a
size between 33 and 128 bits, I think, since we use pow2 alignment;
e.g., v2i25, v3f32, ...).
DataLayout::getTypeAllocSize rounds up based on alignment.
When checking for target intrinsic legality, that's not what we want:
if rounding makes a difference, the type isn't legal, and the
target intrinsics shouldn't be used, as they are always assumed legal.
One could make the argument that alloc size is ultimately the most
relevant here, since we're dealing with LD/ST intrinsics. That's only
true if we did legalize them though; that's a problem for another day.
Use DataLayout::getTypeSizeInBits instead of getTypeAllocSizeInBits.
Type::getSizeInBits can't be used because that'd gratuitously break
pointer vector support.
Some of these uses are currently fine, because we only hit them when
the type is already known legal (e.g., r114454). Update them for
consistency. It's faster to avoid the rounding anyway!
llvm-svn: 255089
Summary:
This fixes failure when trying to select
insertelement <4 x half> undef, half %a, i64 0
which gets transformed to a scalar_to_vector node.
The accompanying v4 and v8 tests fail instruction selection without this
patch.
Reviewers: ab, jmolloy
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D15322
llvm-svn: 255072
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.
This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.
Fix for PR23022
Differential Revision: http://reviews.llvm.org/D15310
llvm-svn: 255061
Summary:
Before ARMv5T, Thumb1 code could not pop PC, as described at D14357 and D14986;
so we need the special fixup in the epilogue.
Reviewers: jroelofs, qcolombet
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15126
llvm-svn: 255047
Currently, vectors of halfs end up as ConstantVectors, but there isn't
a good reason they can't be ConstantDataVectors. This should save some
memory.
llvm-svn: 254991
It's strange to duplicate the logic for emitting FP values into
emitGlobalConstantDataSequential, and it's even stranger that we end
up printing the verbose assembly comments differently between the two
paths. Just call into emitGlobalConstantFP rather than crudely
duplicating its logic.
llvm-svn: 254988
The 2-element vector case shows a surprising bug: we failed to
eliminate ops on undefs, so there are 4 fmax calls even though
there can only be 2 valid elements in the inputs.
llvm-svn: 254920
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.
Differential Revision: http://reviews.llvm.org/D15110
llvm-svn: 254913
Summary: This reverts r254234, and adds a simple fix for the annoying case of use-after-free.
Reviewers: rengolin
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15236
llvm-svn: 254912
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.
This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.
All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.
Differential Revision: http://reviews.llvm.org/D15265
llvm-svn: 254909
Summary:
We are inserting both Scope and SP into the Seen map and check whether
it was already there in which case we skip the validation (the idea
being that we already checked this Subprogram before). However,
if (Scope == SP) as MDNodes, then inserting the Scope, will trigger
the Seen check causing us to incorrectly not validate this !dbg
attachment. Fix this by not performing the SP Seen check if Scope == SP
Reviewers: pcc, dexonsmith, dblaikie
Subscribers: dblaikie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14697
llvm-svn: 254887
Also, switch to x86-64 because once we can lower these to something
more reasonable, there will be less noise in the checks. And add
AVX runs because those will be different than SSE.
llvm-svn: 254879
This removes the code path that generate "synchronous" (only correct at call site) CFA.
We will probably want to re-introduce it once we are capable of emitting different
.eh_frame and .debug_frame sections.
Differential Revision: http://reviews.llvm.org/D14948
llvm-svn: 254874
This patch introduces a codegen-only instruction currently named br_unless,
which makes it convenient to implement ReverseBranchCondition and re-enable
the MachineBlockPlacement pass. Then in a late pass, it lowers br_unless
back into br_if.
Differential Revision: http://reviews.llvm.org/D14995
llvm-svn: 254826
Add physical register defs to instructions used from stackified
instructions to prevent them from being scheduled into the middle of
a stack sequence. This is a conservative measure which may be loosened
in the future.
Differential Revision: http://reviews.llvm.org/D15252
llvm-svn: 254811
This is just prototype for load/store for i32 types. I'll add them to
the rest of the types if we like this direction.
Differential Revision: http://reviews.llvm.org/D15197
llvm-svn: 254807
Full varargs support will depend on prologue/epilogue support, but this patch
gets us started with most of the basic infrastructure.
Differential Revision: http://reviews.llvm.org/D15231
llvm-svn: 254799
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.
(GCC puts these instructions behind -msahf on 64-bit for the same reason.)
This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.
Differential Revision: http://reviews.llvm.org/D15240
llvm-svn: 254793
This commit adds a new target-independent calling convention for C++ TLS
access functions. It aims to minimize overhead in the caller by perserving as
many registers as possible.
The target-specific implementation for X86-64 is defined as following:
Arguments are passed as for the default C calling convention
The same applies for the return value(s)
The callee preserves all GPRs - except RAX and RDI
The access function makes C-style TLS function calls in the entry and exit
block, C-style TLS functions save a lot more registers than normal calls.
The added calling convention ties into the existing implementation of the
C-style TLS functions, so we can't simply use existing calling conventions
such as preserve_mostcc.
rdar://9001553
llvm-svn: 254737
Add new x86 pass which replaces address calculations in load or store instructions with def register of existing LEA (must be in the same basic block), if the LEA calculates address that differs only by a displacement. Works only with -Os or -Oz.
Differential Revision: http://reviews.llvm.org/D13294
llvm-svn: 254712