The canonical form for allocas is a single allocation of the array type.
In case we see a non-canonical array alloca, make sure we aren't
replacing this with an array N times smaller.
llvm-svn: 267916
Currently Mips::emitAtomicBinaryPartword() does not properly respect the
width of pointers. For MIPS64 this causes the memory address that the ll/sc
sequence uses to be truncated. At runtime this causes a segmentation fault.
This can be fixed by applying similar changes as r266204, so that a full 64bit
pointer is loaded.
Reviewers: dsanders
Differential Review: http://reviews.llvm.org/D19651
llvm-svn: 267900
Summary:
Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see:
RFC: Implementing the Swift calling convention in LLVM and Clang
https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0
Reviewers: kbarton, manmanren, rjmccall, uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19414
llvm-svn: 267823
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.
Differential Revision: http://reviews.llvm.org/D19138
llvm-svn: 267809
The callseq_end node must be glued with the TLS calls, otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI,
the pseudo uses the symbol address at this point not RDI and the
lowering will do the right thing.
llvm-svn: 267797
transferSuccessors() would LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.
Follow-up to r266339.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267779
transferSuccessors() would LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.
The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.
Follow-up to r266679.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267778
Summary:
Currently the NVVMReflect pass is run at the beginning of our backend
passes. But really, it should be run as early as possible, as it's
simply resolving an "if" statement in code. So copy it into
TargetMachine::addEarlyAsPossiblePasses.
We still run it at the beginning of the backend passes, since it's
needed for correctness when lowering to nvptx.
(Specifically, NVVMReflect changes each call to the __nvvm_reflect
function or llvm.nvvm.reflect intrinsic into an integer constant, based
on the pass's configuration. Clearly we miss many optimization
opportunities if we perform this transformation at the beginning of
codegen.)
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18616
llvm-svn: 267765
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.
Differential Revision: http://reviews.llvm.org/D19342
llvm-svn: 267733
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.
(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)
Reviewers: arsenm, mareko, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19203
llvm-svn: 267729
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
Possibility to specify code of hardware register kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.
Differential Revision: http://reviews.llvm.org/D19335
llvm-svn: 267724
We run after PEI, so we need to AddPristinesAndCSRs.
In practice, that makes no difference here, because we only ask about
liveness of super-registers of defined GR8/GR16 registers, so they
can't be pristine. Still, it's the correct thing to do.
Thanks to Quentin for noticing!
Follow-up to r267495.
llvm-svn: 267658
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.
llvm-svn: 267651
the prologue.
Do not use basic blocks that have EFLAGS live-in as prologue if we need
to realign the stack. Realigning the stack uses AND instruction and this
clobbers EFLAGS.
An other alternative would have been to save and restore EFLAGS around
the stack realignment code, but this is likely inefficient.
Fixes PR27531.
llvm-svn: 267634
NVPTXLowerKernelArgs is required for correctness, so it should not be guarded
by CodeGenOpt::None.
NVPTXPeephole is optimization only, so it should be skipped when
CodeGenOpt::None.
llvm-svn: 267619
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
- converters for support optional operands and modifiers
- VOPC
- sext() modifier
- intrinsics
- VOP2b (see vop_dpp.s)
- V_MAC_F32 (see vop_dpp.s)
Differential Revision: http://reviews.llvm.org/D19360
llvm-svn: 267553