Vector immediate load instructions should have the isAsCheapAsAMove, isMoveImm
and isReMaterializable flags set. With them, these instruction will get
hoisted out of loops.
Review: Ulrich Weigand
llvm-svn: 292790
Re-Commit r292543 with a fix for the situation when the chain end is
MBB.end().
This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.
Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.
- The AArch64A57LoadBalancing code is using a backwards analysis now
which is irrespective of kill flags. This is the main motivation for
this change.
Differential Revision: http://reviews.llvm.org/D22082
llvm-svn: 292705
Summary:
Specifically, we upgrade llvm.nvvm.:
* brev{32,64}
* clz.{i,ll}
* popc.{i,ll}
* abs.{i,ll}
* {min,max}.{i,ll,u,ull}
* h2f
These either map directly to an existing LLVM target-generic
intrinsic or map to a simple LLVM target-generic idiom.
In all cases, we check that the code we generate is lowered to PTX as we
expect.
These builtins don't need to be backfilled in clang: They're not
accessible to user code from nvcc.
Reviewers: tra
Subscribers: majnemer, cfe-commits, llvm-commits, jholewinski
Differential Revision: https://reviews.llvm.org/D28793
llvm-svn: 292694
Summary:
DADToDAG has access to TargetLowering, but not vice versa, so this is
the more general location for these functions.
NFC
Reviewers: tra
Subscribers: jholewinski, llvm-commits
Differential Revision: https://reviews.llvm.org/D28795
llvm-svn: 292693
Newer ppc supports unaligned memory access, it reduces the cost of unaligned memory access significantly. This patch handles this case in PPCTTIImpl::getMemoryOpCost.
This patch fixes pr31492.
Differential Revision: https://reviews.llvm.org/D28630
llvm-svn: 292680
WebAssembly varargs functions use a significantly different ABI than
non-varargs functions, and the current code in
WebAssemblyFixFunctionBitcasts doesn't handle that difference. For now,
just avoid creating wrapper functions in the presence of varargs.
llvm-svn: 292645
Kill flags need to be updated correctly when moving stores up/down to
form store pair instructions.
Those invalid flags have been ignored before but as of r290014 they are
recognized when using -mllvm -verify-machineinstrs.
Also simplifies test/CodeGen/AArch64/ldst-opt-dbg-limit.mir, renames it
to ldst-opt.mir test and adds a new tests for this change.
Differential Revision: https://reviews.llvm.org/D28875
llvm-svn: 292625
This patch fixes debug information for __thread variable on Mips
using .dtprelword and .dtpreldword directives.
Patch by Aleksandar Beserminji.
Differential Revision: http://reviews.llvm.org/D28770
llvm-svn: 292624
We also want to optimise tests like this: return a*b == 0. The MULS
instruction is flag setting, so we don't need the CMP instruction but can
instead branch on the result of the MULS. The generated instructions sequence
for this example was: MULS, MOVS, MOVS, CMP. The MOVS instruction load the
boolean values resulting from the select instruction, but these MOVS
instructions are flag setting and were thus preventing this optimisation. Now
we first reorder and move the MULS to before the CMP and generate sequence
MOVS, MOVS, MULS, CMP so that the optimisation could trigger. Reordering of the
MULS and MOVS is safe to do because the subsequent MOVS instructions just set
the CPSR register and don't use it, i.e. the CPSR is dead.
Differential Revision: https://reviews.llvm.org/D27990
llvm-svn: 292608
Hunt down some of the places where we use bare addReg(0) or addImm(AL).addReg(0)
and replace with add(condCodeOp()) and add(predOps()). This should make it
easier to understand what those operands represent (without having to look at
the definition of the instruction that we're adding to).
Differential Revision: https://reviews.llvm.org/D27984
llvm-svn: 292587
Inline spiller can decide to move a spill as early as possible in the basic block.
It will skip phis and label, but we also need to make sure it skips instructions
in the basic block prologue which restore exec mask.
Added isPositionLike callback in TargetInstrInfo to detect instructions which
shall be skipped in addition to common phis, labels etc.
Differential Revision: https://reviews.llvm.org/D27997
llvm-svn: 292554
This function can be used to accumulate the set of all read and modified
register in a sequence of instructions.
Use this code in AArch64A57FPLoadBalancing::scavengeRegister() to prove
the concept.
- The AArch64A57LoadBalancing code is using a backwards analysis now
which is irrespective of kill flags. This is the main motivation for
this change.
Differential Revision: http://reviews.llvm.org/D22082
llvm-svn: 292543
This instruction is missing from LiveIntervals.
I'm not aware of any problems because of this though.
Differential Revision: https://reviews.llvm.org/D28879
llvm-svn: 292521
Summary:
Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future.
This patch is one of a series, see also
- https://reviews.llvm.org/D28623
Reviewers: rengolin, dberris
Reviewed By: dberris
Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown
Differential Revision: https://reviews.llvm.org/D28624
llvm-svn: 292516
As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further
llvm-svn: 292493
Summary:
Adds a RegisterBank tablegen class that can be used to declare the register
banks and an associated tablegen pass to generate the necessary code.
Changes since first commit attempt:
* Added missing guards
* Added more missing guards
* Found and fixed a use-after-free bug involving Twine locals
Reviewers: t.p.northover, ab, rovka, qcolombet
Reviewed By: qcolombet
Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka
Differential Revision: https://reviews.llvm.org/D27338
llvm-svn: 292478
Summary:
Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND.
We already do this for scalar i1 operations so I just extended it to vectors of i1.
Reviewers: zvi, delena
Reviewed By: delena
Subscribers: guyblank, llvm-commits
Differential Revision: https://reviews.llvm.org/D28888
llvm-svn: 292474
For -(x + y) -> (-x) + (-y), if x == -y, this would
change the result from -0.0 to 0.0. Since the fma/fmad
combine is an extension of this problem it also
applies there.
fmul should be fine, and I don't think any of the unary
operators or conversions should be a problem either.
llvm-svn: 292473
There's no neg.f16 instruction, so negation has to
be done via subtraction from zero.
Differential Revision: https://reviews.llvm.org/D28876
llvm-svn: 292452
r291670 doesn't crash on the original testcase from PR31589,
but it crashes on a slightly more complex one.
PR31589 has the new reproducer.
llvm-svn: 292444
ARM seems to prefer that long literals be formed from their little end in
order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on
Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section
4.14).
Differential revision: https://reviews.llvm.org/D28697
llvm-svn: 292422
Limit register coalescer by not allowing it to artificially increase
size of registers beyond dword. Such super-registers are in fact
register sequences and not distinct HW registers.
With more super-regs we would need to allocate adjacent registers
and constraint regalloc more than needed. Moreover, our super
registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2,
VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates registers
allocation even more, resulting in excessive spilling.
Differential Revision: https://reviews.llvm.org/D28782
llvm-svn: 292413
A 64-bit relocation does not exist in 32-bit ARMELF. Report an error
instead of crashing.
PR23870
Patch by Sanne Wouda (sanwou01).
Differential Revision: https://reviews.llvm.org/D28851
llvm-svn: 292373