Summary:
Port rL265480, rL264754, rL265997 and rL266252 to SystemZ, in order to enable the Swift port on the architecture. SwiftSelf and SwiftError are assigned to R10 and R9, respectively, which are normally callee-saved registers. For more information, see:
RFC: Implementing the Swift calling convention in LLVM and Clang
https://groups.google.com/forum/#!topic/llvm-dev/epDd2w93kZ0
Reviewers: kbarton, manmanren, rjmccall, uweigand
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D19414
llvm-svn: 267823
For compilations with no explicit cpu specified, this exhibits
nice gains on Silvermont, with neutral performance on big cores.
Differential Revision: http://reviews.llvm.org/D19138
llvm-svn: 267809
The callseq_end node must be glued with the TLS calls; otherwise,
the generic code will miss the uses of the returned value and will
mark it dead.
Moreover, the TLSCall 64-bit pseudo must not set an implicit use on RDI:
at this point the pseudo uses the symbol address, not RDI, and the
lowering will do the right thing.
llvm-svn: 267797
transferSuccessors() would make LoadCmpBB a successor of DoneBB,
whereas it should be a successor of the original MBB.
Follow-up to r266339.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267779
transferSuccessors() would make LoadCmpBB a successor of DoneBB, whereas
it should be a successor of the original MBB.
The testcase changes are caused by Thumb2SizeReduction, which
was previously confused by the broken CFG.
Follow-up to r266679.
Unfortunately, it's tricky to catch this in the verifier.
llvm-svn: 267778
Summary:
Currently the NVVMReflect pass is run at the beginning of our backend
passes. But really, it should be run as early as possible, as it's
simply resolving an "if" statement in code. So copy it into
TargetMachine::addEarlyAsPossiblePasses.
We still run it at the beginning of the backend passes, since it's
needed for correctness when lowering to nvptx.
(Specifically, NVVMReflect changes each call to the __nvvm_reflect
function or llvm.nvvm.reflect intrinsic into an integer constant, based
on the pass's configuration. Clearly we miss many optimization
opportunities if we perform this transformation at the beginning of
codegen.)
Reviewers: rnk
Subscribers: tra, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D18616
llvm-svn: 267765
Added support of TTMP quads.
Reworked M0 exclusion machinery for SMRD and similar instructions
to enable usage of TTMP registers in those instructions as destinations.
Tests added.
Differential Revision: http://reviews.llvm.org/D19342
llvm-svn: 267733
Summary:
So it appears that to guarantee some of the ordering requirements of a GLSL
memoryBarrier() executed in the shader, we need to emit an s_waitcnt.
(We can't use an s_barrier, because memoryBarrier() may appear anywhere in
the shader, in particular it may appear in non-uniform control flow.)
Reviewers: arsenm, mareko, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D19203
llvm-svn: 267729
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.
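As a rough illustration of the cost rule described above, the sketch below charges only the extract cost when the target has a lane move that extends for free, and the sum of both costs otherwise. All names here are hypothetical and are not taken from the TargetTransformInfo interface.

```cpp
// Hypothetical cost inputs; the names are illustrative only and are not
// part of TargetTransformInfo.
struct ExtractExtendCosts {
  unsigned ExtractCost;      // cost of moving one vector lane to a GPR
  unsigned ExtendCost;       // cost of a separate sext/zext
  bool HasExtendingLaneMove; // e.g. SMOV/UMOV on AArch64
};

// Cost of "vector extract followed by sext/zext", assuming an extending
// lane move makes the extension free.
constexpr unsigned extractWithExtendCost(ExtractExtendCosts C) {
  return C.HasExtendingLaneMove ? C.ExtractCost
                                : C.ExtractCost + C.ExtendCost;
}
```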
Differential Revision: http://reviews.llvm.org/D18523
llvm-svn: 267725
The ability to specify a hardware register by its numeric code is kept.
Disassemble to symbolic name, if name is known.
Tests updated/added.
Differential Revision: http://reviews.llvm.org/D19335
llvm-svn: 267724
We run after PEI, so we need to AddPristinesAndCSRs.
In practice, that makes no difference here, because we only ask about
liveness of super-registers of defined GR8/GR16 registers, so they
can't be pristine. Still, it's the correct thing to do.
Thanks to Quentin for noticing!
Follow-up to r267495.
llvm-svn: 267658
This effectively adds back the extractelt combine removed by r262358:
the direct case can still occur (because x86_mmx is special, see
r262446), but it's the indirect case that's now superseded by the
generic combine.
llvm-svn: 267651
the prologue.
Do not use basic blocks that have EFLAGS live-in as prologue if we need
to realign the stack. Realigning the stack uses an AND instruction, which
clobbers EFLAGS.
Another alternative would have been to save and restore EFLAGS around
the stack realignment code, but this is likely inefficient.
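For reference, the realignment is a mask of the stack pointer, which is why an AND is involved. A minimal sketch of the arithmetic, using a hypothetical helper name and assuming a power-of-two alignment:

```cpp
#include <cstdint>

// Hypothetical helper: round a stack pointer down to a power-of-two
// alignment. On x86 this corresponds to an AND on the stack pointer,
// and AND writes EFLAGS as a side effect.
constexpr std::uint64_t realignStackPointer(std::uint64_t SP,
                                            std::uint64_t Align) {
  return SP & ~(Align - 1);
}
```

Any flags value that is live into the block would be overwritten by that AND, which is the hazard the change avoids.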
Fixes PR27531.
llvm-svn: 267634
NVPTXLowerKernelArgs is required for correctness, so it should not be guarded
by CodeGenOpt::None.
NVPTXPeephole is optimization only, so it should be skipped when
CodeGenOpt::None.
llvm-svn: 267619
Support for SDWA instructions for VOP1 and VOP2 encoding.
Not done yet:
- converters to support optional operands and modifiers
- VOPC
- sext() modifier
- intrinsics
- VOP2b (see vop_dpp.s)
- V_MAC_F32 (see vop_dpp.s)
Differential Revision: http://reviews.llvm.org/D19360
llvm-svn: 267553
Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass.
Differential Revision: http://reviews.llvm.org/D19409
llvm-svn: 267551
The print-stack-trace.cc test failure in compiler-rt has been fixed by
r266869 (http://reviews.llvm.org/D19148), so reenable sibling call
optimization on ppc64.
Reviewers: nemanjai kbarton
llvm-svn: 267527
Summary:
We don't use MinLatency any more since r184032.
Reviewers: atrick, hfinkel, mcrosier
Differential Revision: http://reviews.llvm.org/D19474
llvm-svn: 267502
Kill-flags, which computeRegisterLiveness uses, are not reliable.
LivePhysRegs is.
Differential Revision: http://reviews.llvm.org/D19472
llvm-svn: 267495
The SparcV8 fneg and fabs instructions interestingly come only in a
single-float variant. Since the sign bit is always the topmost bit no
matter what size float it is, you simply operate on the high
subregister, as if it were a single float.
However, the layout of double-floats in the float registers is reversed
on little-endian CPUs, so that the high bits are in the second
subregister, rather than the first.
Thus, this expansion must check the endianness to use the correct
subregister.
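A small model of the idea, not the actual expansion code: a double is treated as a pair of 32-bit subregisters, and the sign operation is applied to whichever one holds the high bits. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of a double held in two 32-bit float subregisters.
// Sub[0] is the first subregister, Sub[1] the second.
struct DoubleRegPair {
  std::uint32_t Sub[2];
};

// The high bits live in the first subregister on big-endian CPUs and in
// the second on little-endian CPUs.
constexpr unsigned highSubRegIndex(bool IsLittleEndian) {
  return IsLittleEndian ? 1u : 0u;
}

// fneg: flip the sign bit of the high subregister only.
constexpr DoubleRegPair fnegDouble(DoubleRegPair R, bool IsLittleEndian) {
  R.Sub[highSubRegIndex(IsLittleEndian)] ^= 0x80000000u;
  return R;
}

// fabs: clear the sign bit of the high subregister only.
constexpr DoubleRegPair fabsDouble(DoubleRegPair R, bool IsLittleEndian) {
  R.Sub[highSubRegIndex(IsLittleEndian)] &= 0x7fffffffu;
  return R;
}
```

With 1.0 encoded as 0x3ff00000 in the high word, negation yields 0xbff00000 in that same word and leaves the other subregister untouched.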
llvm-svn: 267489
log2(Mask) is smaller than 32, we must use the 32-bit variant because the 64-bit
variant cannot encode it. Therefore, set the subreg part accordingly.
[AArch64] Fix optimizeCondBranch logic.
The opcode for the optimized branch does not depend on the size
of the active bits in the AND masks, but on the AND opcode itself.
Indeed, we need to use an X or W variant based on the AND variant,
not based on whether the mask fits into the related variant.
Otherwise, we may end up using the W variant of the optimized branch
for 64-bit register inputs!
This fixes the last make check verifier issues for AArch64: PR27479.
llvm-svn: 267465
Use the operand to determine how long to wait. This is somewhat
distasteful, since it would be better to just emit s_nop
with the right argument in the first place. This would require
changing TII::insertNoop to emit N operands, which would be easy.
Slightly more problematic is that the post-RA scheduler and hazard recognizer
represent nops as a single null node, so this would require inventing
another way of representing N nops.
llvm-svn: 267456
Previously, findClosestSuitableAluInstr considered only the base register when checking the current instruction for suitability. Expand the check to also consider the offset if the offset is a register.
llvm-svn: 267424
Commit r266977 caused the LLVM test suite to fail with the error message: fatal error: error in backend: Cannot select: t17: i32 = rotr t2, t11 ...
llvm-svn: 267418
Summary:
The expression is detected as a redundant expression.
It turns out this is probably a bug.
```
/home/etienneb/llvm/llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:306:26: warning: both side of operator are equivalent [misc-redundant-expression]
if (isSMRD(*FirstLdSt) && isSMRD(*FirstLdSt)) {
```
Reviewers: rnk, tstellarAMD
Subscribers: arsenm, cfe-commits
Differential Revision: http://reviews.llvm.org/D19460
llvm-svn: 267415
Summary:
This patch adds support for the X asm constraint.
To do this, we lower the constraint to either a "w" or "r" constraint
depending on the operand type (both constraints are supported on ARM).
Fixes PR26493
Reviewers: t.p.northover, echristo, rengolin
Subscribers: joker.eph, jgreenhalgh, aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D19061
llvm-svn: 267411
ADD8TLS, a variant of the add instruction used for initial-exec TLS,
currently accepts r0 as a source register. While add itself supports
r0 just fine, the linker can relax it to a local-exec sequence, converting
it to addi, which doesn't support r0.
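The difference comes from how the two instructions read their first source operand: in addi, a register field of 0 denotes the literal value 0 rather than the contents of r0. A toy model of just that difference (hypothetical names, not LLVM code):

```cpp
#include <cstdint>

// Toy register file; only the r0 special case is modeled here.
struct RegFile {
  std::int64_t GPR[32];
};

// add rD, rA, rB: r0 is an ordinary source register.
constexpr std::int64_t ppcAdd(const RegFile &R, unsigned RA, unsigned RB) {
  return R.GPR[RA] + R.GPR[RB];
}

// addi rD, rA, imm: an rA field of 0 means the literal value 0, not r0.
constexpr std::int64_t ppcAddi(const RegFile &R, unsigned RA,
                               std::int64_t Imm) {
  return (RA == 0 ? 0 : R.GPR[RA]) + Imm;
}
```

So an add that legitimately used r0 would silently compute a different value once rewritten to addi.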
Differential Revision: http://reviews.llvm.org/D19193
llvm-svn: 267388
This corrects the MI annotations for the stack adjustment following the __chkstk
invocation. We were marking the original SP usage as a Def rather than Kill.
The (new) assigned value is the definition, the original reference is killed.
Adjust the ISelLowering to mark Kills and FrameSetup as well.
This partially resolves PR27480.
llvm-svn: 267361
We aren't currently making use of this in any successful mask decode, and it's actually incorrect, as it inserts the wrong number of SM_SentinelUndef mask elements.
llvm-svn: 267350
Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transferred to xmm.
This fixes an issue that prevented PSHUFB target shuffle masks from decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472.
llvm-svn: 267343
This fixes PR22248 on s390x. The previous attempt at this was D19101,
which was before LOAD_STACK_GUARD existed. Compared to the previous
version, this always emits a rather ugly block of 4 instructions, involving
a thread pointer load that can't be shared with other potential users.
However, this is necessary for SSP - spilling the guard value (or thread
pointer used to load it) is counter to the goal, since it could be
overwritten along with the frame it protects.
Differential Revision: http://reviews.llvm.org/D19363
llvm-svn: 267340
The original patch caused crashes because it could dereference a null
SelectionDAGTargetInfo pointer on targets that do not define it.
Evaluates fmul+fadd -> fmadd combines and similar code sequences in the
machine combiner. It adds support for float and double similar to the existing
integer implementation. The key features are:
- DAGCombiner checks whether it should combine greedily or let the machine
combiner do the evaluation. This is only supported on ARM64.
- It gives preference to throughput over latency: the heuristic used is
to always combine in loops. The target decides whether the machine
combiner should optimize for throughput or latency.
- Support for fmadd, f(n)msub, fmla, fmls patterns
- On by default at -O3 -ffast-math
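For readers unfamiliar with the pattern, here is a source-level sketch of the two forms being chosen between. It only illustrates fmul+fadd versus a fused multiply-add; it does not model the combiner heuristic, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Separate multiply and add: the product is rounded before the addition
// (unless the compiler itself contracts the expression).
double mulThenAdd(double A, double B, double C) { return A * B + C; }

// Fused multiply-add: a single rounding at the end.
double fusedMulAdd(double A, double B, double C) { return std::fma(A, B, C); }

// Usage check: (1 + 2^-30) * (1 - 2^-30) - 1 is exactly -2^-60 when fused,
// because the exact product 1 - 2^-60 is never rounded on its own.
void checkFused() {
  const double X = std::ldexp(1.0, -30);
  assert(fusedMulAdd(1.0 + X, 1.0 - X, -1.0) == -std::ldexp(1.0, -60));
}
```

The fused form changes rounding as well as instruction count, which is consistent with the combine being enabled only under fast-math.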
llvm-svn: 267328
The option to control the emission of the new relocations
is -relax-relocations (blatantly copied from GNU as).
It can't be enabled by default because it breaks relatively
recent versions of ld.bfd/ld.gold (late 2015).
llvm-svn: 267307