Commit Graph

1923 Commits

Author SHA1 Message Date
Tim Northover f9b560aa8e AArch64: get type from correct result when forming BFX
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one. Hopefully that's all of them.

llvm-svn: 323205
2018-01-23 15:11:27 +00:00
Tim Northover 9f3003d08f AArch64: get type from correct result when forming BFI/BFM
Some nodes produce multiple values so when obtaining the type of an ISD::OR we
need to make sure we ask for the correct one.

llvm-svn: 323202
2018-01-23 14:37:03 +00:00
MinSeong Kim 27f77b4300 [Analysis] Disable exp/exp2/pow finite lib calls on Android with -ffast-math.
Summary:
Since r322087, glibc's finite lib calls are generated when possible.
However, glibc is not supported on Android. Therefore this change
enables llvm to finely distinguish between linux and Android for
unsupported library calls. The change also include some regression
tests.

Reviewers: srhines, pirama

Reviewed By: srhines

Subscribers: kongyi, chh, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D42288

llvm-svn: 323187
2018-01-23 11:11:36 +00:00
Carey Williams da15b5b116 [AArch64] optimise v4f16 fcmps to utilise vector instructions
Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported.
Generating FCTVL(s) rather than a longer series of FCVTs.

Differential Revision: https://reviews.llvm.org/D41772

llvm-svn: 323118
2018-01-22 14:16:11 +00:00
Daniel Neilson 1e68724d24 Remove alignment argument from memcpy/memmove/memset in favour of alignment attributes (Step 1)
Summary:
 This is a resurrection of work first proposed and discussed in Aug 2015:
   http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
and initially landed (but then backed out) in Nov 2015:
   http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

 The @llvm.memcpy/memmove/memset intrinsics currently have an explicit argument
which is required to be a constant integer. It represents the alignment of the
dest (and source), and so must be the minimum of the actual alignment of the
two.

 This change is the first in a series that allows source and dest to each
have their own alignments by using the alignment attribute on their arguments.

 In this change we:
1) Remove the alignment argument.
2) Add alignment attributes to the source & dest arguments. We, temporarily,
   require that the alignments for source & dest be equal.

 For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 100, i32 4, i1 false)
will now read
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 4 %dest, i8* align 4 %src, i32 100, i1 false)

 Downstream users may have to update their lit tests that check for
@llvm.memcpy/memmove/memset call/declaration patterns. The following extended sed script
may help with updating the majority of your tests, but it does not catch all possible
patterns so some manual checking and updating will be required.

s~declare void @llvm\.mem(set|cpy|move)\.p([^(]*)\((.*), i32, i1\)~declare void @llvm.mem\1.p\2(\3, i1)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* \3, i8 \4, i8 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* \3, i8 \4, i16 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* \3, i8 \4, i32 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* \3, i8 \4, i64 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* \3, i8 \4, i128 \5, i1 \6)~g
s~call void @llvm\.memset\.p([^(]*)i8\(i8([^*]*)\* (.*), i8 (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i8(i8\2* align \6 \3, i8 \4, i8 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i16\(i8([^*]*)\* (.*), i8 (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i16(i8\2* align \6 \3, i8 \4, i16 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i32\(i8([^*]*)\* (.*), i8 (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i32(i8\2* align \6 \3, i8 \4, i32 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i64\(i8([^*]*)\* (.*), i8 (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i64(i8\2* align \6 \3, i8 \4, i64 \5, i1 \7)~g
s~call void @llvm\.memset\.p([^(]*)i128\(i8([^*]*)\* (.*), i8 (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.memset.p\1i128(i8\2* align \6 \3, i8 \4, i128 \5, i1 \7)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* \4, i8\5* \6, i8 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* \4, i8\5* \6, i16 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* \4, i8\5* \6, i32 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* \4, i8\5* \6, i64 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 [01], i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* \4, i8\5* \6, i128 \7, i1 \8)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i8\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i8 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i8(i8\3* align \8 \4, i8\5* align \8 \6, i8 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i16\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i16 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i16(i8\3* align \8 \4, i8\5* align \8 \6, i16 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i32\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i32 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i32(i8\3* align \8 \4, i8\5* align \8 \6, i32 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i64\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i64 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i64(i8\3* align \8 \4, i8\5* align \8 \6, i64 \7, i1 \9)~g
s~call void @llvm\.mem(cpy|move)\.p([^(]*)i128\(i8([^*]*)\* (.*), i8([^*]*)\* (.*), i128 (.*), i32 ([0-9]*), i1 ([^)]*)\)~call void @llvm.mem\1.p\2i128(i8\3* align \8 \4, i8\5* align \8 \6, i128 \7, i1 \9)~g

 The remaining changes in the series will:
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
   source and dest alignments.
Step 3) Update Clang to use the new IRBuilder API.
Step 4) Update Polly to use the new IRBuilder API.
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
        and those that use use MemIntrinsicInst::[get|set]Alignment() to use
        getDestAlignment() and getSourceAlignment() instead.
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
        MemIntrinsicInst::[get|set]Alignment() methods.

Reviewers: pete, hfinkel, lhames, reames, bollu

Reviewed By: reames

Subscribers: niosHD, reames, jholewinski, qcolombet, jfb, sanjoy, arsenm, dschuff, dylanmckay, mehdi_amini, sdardis, nemanjai, david2050, nhaehnle, javed.absar, sbc100, jgravelle-google, eraman, aheejin, kbarton, JDevlieghere, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, llvm-commits

Differential Revision: https://reviews.llvm.org/D41675

llvm-svn: 322965
2018-01-19 17:13:12 +00:00
Matthias Braun 8bb5228db9 Move tests to the correct place
test/CodeGen/MIR is for testing the MIR parser/printer. Tests for passes
and targets belong to test/CodeGen/TARGETNAME.

llvm-svn: 322925
2018-01-19 06:08:15 +00:00
Matthias Braun 5c290dc206 AArch64: Fix emergency spillslot being out of reach for large callframes
Re-commit of r322200: The testcase shouldn't hit machineverifiers
anymore with r322917 in place.

Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322919
2018-01-19 03:16:36 +00:00
Matthias Braun dc4b3e87f4 AArch64: Omit callframe setup/destroy when not necessary
Do not create CALLSEQ_START/CALLSEQ_END when there is no callframe to
setup and the callframe size is 0.

- Fixes an invalid callframe nesting for byval arguments, which would
  look like this before this patch (as in `big-byval.ll`):
    ...
    ADJCALLSTACKDOWN 32768, 0, ...   # Setup for extfunc
    ...
    ADJCALLSTACKDOWN 0, 0, ...  # setup for memcpy
    ...
    BL &memcpy ...
    ADJCALLSTACKUP 0, 0, ...    # destroy for memcpy
    ...
    BL &extfunc
    ADJCALLSTACKUP 32768, 0, ...   # destroy for extfunc

- Saves us two instructions in the common case of zero-sized stackframes.
- Remove an unnecessary scheduling barrier (hence the small unittest
  changes).

Differential Revision: https://reviews.llvm.org/D42006

llvm-svn: 322917
2018-01-19 02:45:38 +00:00
Amara Emerson d5785775f8 [AArch64][GlobalISel] Add isel support for global values in the large code model.
Fixes PR35958.

Differential Revision: https://reviews.llvm.org/D42175

llvm-svn: 322878
2018-01-18 19:21:27 +00:00
Francis Visoiu Mistrih 378b5f3de6 [CodeGen] Print RegClasses on MI in verbose mode
r322086 removed the trailing information describing reg classes for each
register.

This patch adds printing reg classes next to every register when
individual operands/instructions/basic blocks are printed. In the case
of dumping MIR or printing a full function, by default don't print it.

Differential Revision: https://reviews.llvm.org/D42239

llvm-svn: 322867
2018-01-18 17:59:06 +00:00
Justin Bogner a9346e050f GlobalISel: Make MachineCSE runnable in the middle of the GlobalISel
Right now, it is not possible to run MachineCSE in the middle of the
GlobalISel pipeline. Being able to run generic optimizations between the
core passes of GlobalISel was one of the goals of the new ISel framework.
This is the first attempt to do it.

The problem is that MachineCSE pass assumes all register operands have a
register class, which, in GlobalISel context, won't be true until after the
InstructionSelect pass. The reason for this behaviour is that before
replacing one virtual register with another, MachineCSE pass (and most of
the other optimization machine passes) must check if the virtual registers'
constraints have a (sufficiently large) intersection, and constrain the
resulting register appropriately if such intersection exists.

GlobalISel extends the representation of such constraints from just a
register class to a triple (low-level type, register bank, register
class).

This commit adds MachineRegisterInfo::constrainRegAttrs method that extends
MachineRegisterInfo::constrainRegClass to such a triple.

The idea is that going forward we should use:

- RegisterBankInfo::constrainGenericRegister within GlobalISel's
  InstructionSelect pass
- MachineRegisterInfo::constrainRegClass within SelectionDAG ISel
- MachineRegisterInfo::constrainRegAttrs everywhere else regardless
  the target and instruction selector it uses.

Patch by Roman Tereshin. Thanks!

llvm-svn: 322805
2018-01-18 02:06:56 +00:00
Volkan Keles 4aa73a649a Fix the failure caused by r322773
Do not run GlobalISel if `-fast-isel=0 -global-isel=false`.

llvm-svn: 322800
2018-01-18 01:10:30 +00:00
Reid Kleckner 1aa9061c5f [CodeGen] Hoist common AsmPrinter code out of X86, ARM, and AArch64
Every known PE COFF target emits /EXPORT: linker flags into a .drective
section. The AsmPrinter should handle this.

While we're at it, use global_values() and emit each export flag with
its own .ascii directive. This should make the .s file output more
readable.

llvm-svn: 322788
2018-01-17 23:55:23 +00:00
Eli Friedman c60a23a6af [LegalizeDAG] Fix ATOMIC_CMP_SWAP_WITH_SUCCESS legalization.
The code wasn't zero-extending correctly, so the comparison could
spuriously fail.

Adds some AArch64 tests to cover this case.

Inspired by D41791.

Differential Revision: https://reviews.llvm.org/D41798

llvm-svn: 322767
2018-01-17 22:04:36 +00:00
Pablo Barrio f2c29571da [AArch64] Fix incorrect LD1 of 16-bit FP vectors in big endian
Summary:
Loading a vector of 4 half-precision FP sometimes results in an LD1
of 2 single-precision FP + a reversal. This results in an incorrect
byte swap due to the conversion from little endian to big endian.

In order to generate the correct byte swap, it is easier to
generate the correct LD1 of 4 half-precision FP, thus avoiding the
subsequent reversal.

Reviewers: craig.topper, jmolloy, olista01

Reviewed By: olista01

Subscribers: efriedma, samparker, SjoerdMeijer, rogfer01, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41863

llvm-svn: 322663
2018-01-17 14:39:29 +00:00
Volkan Keles f7f2568613 [GlobalISel][TableGen] Add support for SDNodeXForm
Summary:
This patch adds CustomRenderer which renders the matched
operands to the specified instruction.

Targets can enable the matching of SDNodeXForm by adding
a definition that inherits from GICustomOperandRenderer and
GISDNodeXFormEquiv as follows.

def gi_imm8 : GICustomOperandRenderer<"renderImm8”>,
                       GISDNodeXFormEquiv<imm8_xform>;

Custom renderer functions should be of the form:
void render(MachineInstrBuilder &MIB, const MachineInstr &I);

Reviewers: dsanders, ab, rovka

Reviewed By: dsanders

Subscribers: kristof.beyls, javed.absar, llvm-commits, mgrang, qcolombet

Differential Revision: https://reviews.llvm.org/D42012

llvm-svn: 322582
2018-01-16 18:44:05 +00:00
Benjamin Kramer 736a343e97 Revert "[DAG] Elide overlapping stores"
This reverts commit r322085. Internal PPC testing is still showing the
same symptoms as when this patch landed the last time.

llvm-svn: 322474
2018-01-15 10:57:24 +00:00
Jessica Paquette 757e120379 [MachineOutliner] Move hasAddressTaken check to MachineOutliner.cpp
*Mostly* NFC. Still updating the test though just for completeness.

This moves the hasAddressTaken check to MachineOutliner.cpp and replaces it
with a per-basic block test rather than a per-function test. The old test was
too conservative and was preventing functions in C programs from being
outlined even though they were safe to outline.

This was mostly a problem in C sources.

llvm-svn: 322425
2018-01-13 00:42:28 +00:00
Sanjay Patel e63d8dda5a [ValueTracking] recognize min/max-of-min/max with notted ops (PR35875)
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a 
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on 
smaller canonicalizations to get us to the goal.

As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.

We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603

An Alive proof for one variant of the pattern to show that the 
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1

Name: min3_nots
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %cmpxyz = icmp slt i8 %minxz, %ny
  %r = select i1 %cmpxyz, i8 %minxz, i8 %ny

Name: min3_nots_alt
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %xz = icmp sgt i8 %x, %z
  %maxxz = select i1 %xz, i8 %x, i8 %z
  %xyz = icmp sgt i8 %maxxz, %y
  %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
  %r = xor i8 %maxxyz, -1

llvm-svn: 322283
2018-01-11 15:13:47 +00:00
Sanjay Patel f16fe0f205 [AArch64] add tests for notted variants of min/max; NFC
Like rL321668 / rL321672, the planned optimizer change to 
fix these will be in ValueTracking, but we can test the
changes cleanly here with AArch64 codegen.

llvm-svn: 322238
2018-01-10 23:31:42 +00:00
Matthias Braun e3a8db7ba1 Revert "AArch64: Fix emergency spillslot being out of reach for large callframes"
Revert for now as the testcase is hitting a pre-existing verifier error
that manifest as a failure when expensive checks are enabled (or
-verify-machineinstrs) is used.

This reverts commit r322200.

llvm-svn: 322231
2018-01-10 22:36:28 +00:00
Jessica Paquette c191f1097c [MachineOutliner] Outline ADRPs
ADRP instructions weren't being outlined because they're PC-relative and thus
fail the LR checks. This patch adds a special case for ADRPs to
getOutliningType to make sure that ADRPs can be outlined and updates the MIR
test.

llvm-svn: 322207
2018-01-10 18:49:57 +00:00
Matthias Braun b42ffa1283 AArch64: Fix emergency spillslot being out of reach for large callframes
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322200
2018-01-10 18:16:24 +00:00
Puyan Lotfi fe6c9cbb24 [MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'.
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.

llvm-svn: 322146
2018-01-10 00:56:48 +00:00
Francis Visoiu Mistrih 72cc21eefe [CodeGen] Print frame-setup/destroy flags in -debug output like we do in MIR
Currently the MachineInstr::print function prints the
frame-setup/frame-destroy differently than it does in MIR.

Instead of:

  %x21 = LDR %sp, -16; flags: FrameDestroy

print:

  %x21 = frame-destroy LDR %sp, -16

llvm-svn: 322088
2018-01-09 16:11:51 +00:00
Francis Visoiu Mistrih 2b3bd30637 [CodeGen] Don't print register classes in -debug output
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.

This follows MIR in this regard and is another step to the unification
of the two formats.

llvm-svn: 322086
2018-01-09 15:39:44 +00:00
Nirav Dave 30304a3bd7 [DAG] Elide overlapping stores
Relanding after fixing handling of pre-indexed memory operations in
BaseIndexOffset analysis (r322003).

Extend overlapping store elision to handle overwrites of stores by
larger stores.

Reviewers: craig.topper, rnk, t.p.northover

Subscribers: javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40969

llvm-svn: 322085
2018-01-09 15:23:12 +00:00
Francis Visoiu Mistrih dbf2c48fc7 [MIR] Add support for the frame-destroy MachineInstr flag
We are printing / parsing the `frame-setup` MachineInstr flag but not
the `frame-destroy` one.

Differential Revision: https://reviews.llvm.org/D41509

llvm-svn: 322071
2018-01-09 11:33:22 +00:00
Jessica Paquette 3291e7353e [MachineOutliner] AArch64: Handle instrs that use SP and will never need fixups
This commit does two things. Firstly, it adds a collection of flags which can
be passed along to the target to encode information about the MBB that an
instruction lives in to the outliner.

Second, it adds some of those flags to the AArch64 outliner in order to add
more stack instructions to the list of legal instructions that are handled
by the outliner. The two flags added check if

- There are calls in the MachineBasicBlock containing the instruction
- The link register is available in the entire block

If the link register is available and there are no calls, then a stack
instruction can always be outlined without fixups, regardless of what it is,
since in this case, the outliner will never modify the stack to create a
call or outlined frame.

The motivation for doing this was checking which instructions are most often
missed by the outliner. Instructions like, say

%sp<def> = ADDXri %sp, 32, 0; flags: FrameDestroy

are very common, but cannot be outlined in the case that the outliner might
modify the stack. This commit allows us to outline instructions like this.
  

llvm-svn: 322048
2018-01-09 00:26:18 +00:00
Florian Hahn e970d64ec5 [AArch64] Fix -mcpu option in aarch64-combine-fmul-fsub.mir (NFC)
llvm-svn: 321865
2018-01-05 11:17:48 +00:00
Aditya Nandakumar 5710c44eee [GISel]: Don't create G_MUL with 1 during translation of GEP
When element size is 1, it's just wasteful to create MUL with 1.
https://reviews.llvm.org/D41738

llvm-svn: 321857
2018-01-05 02:56:28 +00:00
Evandro Menezes 6161a0b3b0 [AArch64] Improve code generation of vector build
Instead of using, for example, `dup v0.4s, wzr`, which transfers between
register files, use the more efficient `movi v0.4s, #0` instead.

Differential revision: https://reviews.llvm.org/D41515

llvm-svn: 321824
2018-01-04 21:43:12 +00:00
Amara Emerson 9de62130fd [GlobalISel][Legalizer] Fix legalization of llvm.smul.with.overflow
Previously the code for handling G_SMULO didn't properly check for the signed
multiply overflow, instead treating it the same as the unsigned G_UMULO.

Fixes PR35800.

llvm-svn: 321690
2018-01-03 04:56:56 +00:00
Sanjay Patel 24e6a8bde0 [AArch64] fix typos in comments; NFC
llvm-svn: 321673
2018-01-02 21:04:08 +00:00
Sanjay Patel 7811430588 [ValueTracking] recognize min/max of min/max patterns
This is part of solving PR35717:
https://bugs.llvm.org/show_bug.cgi?id=35717

The larger IR optimization is proposed in D41603, but we can show 
the improvement in ValueTracking using codegen tests because 
SelectionDAG creates min/max nodes based on ValueTracking. 

Any target with min/max ops should show wins here. I chose AArch64
vector ops because they're clean and uniform.

Some Alive proofs for the tests (can't put more than 2 tests in 1 
page currently because the web app says it's too long):
https://rise4fun.com/Alive/WRN
https://rise4fun.com/Alive/iPm
https://rise4fun.com/Alive/HmY
https://rise4fun.com/Alive/CNm
https://rise4fun.com/Alive/LYf

llvm-svn: 321672
2018-01-02 20:56:45 +00:00
Sanjay Patel 35a6ee86af [AArch64] add tests for min/max of min/max (PR35717); NFC
llvm-svn: 321668
2018-01-02 20:16:45 +00:00
Amara Emerson 913918cbef [AArch64][GlobalISel] Fix assert fail with unknown intrinsic.
A call may have an intrinsic name but not have a valid intrinsic ID,
for example with llvm.invariant.group.barrier. If so, treat it as a
normal call like FastISel does.

llvm-svn: 321662
2018-01-02 18:56:39 +00:00
Amara Emerson 854d10d10b [AArch64][GlobalISel] Enable GlobalISel at -O0 by default
Tests updated to explicitly use fast-isel at -O0 instead of implicitly.

This change also allows an explicit -fast-isel option to override an
implicitly enabled global-isel. Otherwise -fast-isel would have no effect at -O0.

Differential Revision: https://reviews.llvm.org/D41362

llvm-svn: 321655
2018-01-02 16:30:47 +00:00
Daniel Jasper cc4903e2ba Revert r321089: "[DAG] Elide overlapping store" (and subsequent fix in r321204)
Our internal testing has revealed has discovered bugs in PPC builds.
I have forward reproduction instructions to the original author (Nirav).

llvm-svn: 321649
2018-01-02 14:38:52 +00:00
Matthew Simpson 9439f54902 [AArch64] Change order of candidate FMLS patterns
r319980 added new patterns to the machine combiner for transforming (fsub (fmul
x y) z) into (fmla (fneg z) x y). That is, fsub's where the first source
operand is an fmul are transformed. We previously only matched the case where
the second source operand of an fsub was an fmul, transforming (fsub z (fmul x
y)) into (fmls z x y). Now, if we have an fsub where both source operands are
fmuls, both of the above patterns are applicable.

However, the order in which we add the patterns to the list of candidates
determines the transformation that takes place, since only the first pattern
that matches will be used. This patch changes the order these two patterns are
added to the list of candidates such that we prefer the case where the second
source operand is an fmul (the fmls case), rather than the other one (the
fmla/fneg case). When both source operands are fmuls, this ordering results in
fewer instructions.

Differential Revision: https://reviews.llvm.org/D41587

llvm-svn: 321491
2017-12-27 15:25:01 +00:00
Simon Pilgrim b17c204cc0 [DAGCombine] visitANDLike - ensure APInt is is in range for getSExtValue/getZExtValue
Reduced from oss-fuzz #4782 test case

llvm-svn: 321464
2017-12-26 23:27:44 +00:00
Michael Zolotukhin ad371e0caa [SimplifyCFG] Avoid quadratic on a predecessors number behavior in instruction sinking.
If a block has N predecessors, then the current algorithm will try to
sink common code to this block N times (whenever we visit a
predecessor). Every attempt to sink the common code includes going
through all predecessors, so the complexity of the algorithm becomes
O(N^2).
With this patch we try to sink common code only when we visit the block
itself. With this, the complexity goes down to O(N).
As a side effect, the moment the code is sunk is slightly different than
before (the order of simplifications has been changed), that's why I had
to adjust two tests (note that neither of the tests is supposed to test
SimplifyCFG):
* test/CodeGen/AArch64/arm64-jumptable.ll - changes in this test mimic
the changes that previous implementation of SimplifyCFG would do.
* test/CodeGen/ARM/avoid-cpsr-rmw.ll - in this test I disabled common
code sinking by a command line flag.

llvm-svn: 321236
2017-12-21 01:22:13 +00:00
Tim Northover 6db5d027c6 AArch64: fix one more place movi.2d could be created.
Somehow got missed out of r320965.

llvm-svn: 321162
2017-12-20 10:45:39 +00:00
Martin Storsjo 2778fd0b59 [AArch64] Implement stack probing for windows
Differential Revision: https://reviews.llvm.org/D41131

llvm-svn: 321150
2017-12-20 06:51:45 +00:00
Francis Visoiu Mistrih 3b265c8fcf [CodeGen] Move printing MO_FPImmediate operands to MachineOperand::print
Work towards the unification of MIR and debug output by refactoring the
interfaces.

llvm-svn: 321110
2017-12-19 21:47:00 +00:00
Amara Emerson b6ddbef673 [GlobalISel][Legalizer] Fix crash when trying to lower G_FNEG of fp128 types.
This doesn't add legalizer support, just prevents crashing so that we
can gracefully fall back to SDAG.

Fixes PR35690.

llvm-svn: 321091
2017-12-19 17:21:35 +00:00
Nirav Dave 51425fa5ba [DAG] Elide overlapping store
Summary:
Extend overlapping store elision to handle overwrites of stores by
larger stores.

Nontemporal tests have been modified to add memory dependencies to
prevent store elision.

Reviewers: craig.topper, rnk, t.p.northover

Subscribers: javed.absar, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D40969

llvm-svn: 321089
2017-12-19 17:10:56 +00:00
Justin Bogner 4314f3adc2 update_mir_test_checks: Accept IR as input as well as MIR
We need to handle IR for tests that want to do lowering (or just
-stop-after with IR as input). I've run this on one AArch64 test to
demonstrate what it looks like.

llvm-svn: 321048
2017-12-19 00:49:04 +00:00
Matthias Braun e29c0b8862 TargetLoweringBase: Followup to r321035
I missed some prefixes and the fact that on AArch64 we use "bzero"
instead of "__bzero" as on X86 when doing my refactoring in r321035.

Improve tests for bzero.

llvm-svn: 321046
2017-12-19 00:43:00 +00:00
Evandro Menezes 687df6380e [AArch64] Expand test coverage of vector element shuffling to Exynos
Make sure that all test cases are run for Exynos as well.  Otherwise, NFC.

llvm-svn: 321032
2017-12-18 22:17:39 +00:00