Commit Graph

22928 Commits

Author SHA1 Message Date
Matthias Braun 08abcac9dc PeepholeOptimizer: Do not form PHI with subreg arguments
When replacing a PHI the PeepholeOptimizer currently takes the register
class of the register at the first operand. This however is not correct
if this argument has a subregister index.

As there is currently no API to query the register class resulting from
applying a subregister index to all registers in a class, we can only
abort in these cases and not perform the transformation.

This changes findNextSource() to require the end of all copy chains to
not use a subregister if there is any PHI in the chain. I had to rewrite
the overly complicated inner loop there to have a good place to insert
the new check.

This fixes https://llvm.org/PR33071 (aka rdar://32262041)

Differential Revision: https://reviews.llvm.org/D40758

llvm-svn: 322313
2018-01-11 21:57:03 +00:00
Evgeniy Stepanov 5223b5d9d6 [arm] Implement Target Operand Flag MIR serialization.
Reviewers: efriedma, pcc

Subscribers: aemerson, javed.absar, kristof.beyls, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D39975

llvm-svn: 322312
2018-01-11 21:37:58 +00:00
Craig Topper 2aac3ee5bc [X86] Legalize 128/256 gathers/scatters on KNL by using widening rather than sign extending the index.
We can just widen the vectors with undef and zero extend the mask.

llvm-svn: 322308
2018-01-11 19:38:30 +00:00
Krzysztof Parzyszek 240df6faa4 [Hexagon] Fix building 64-bit vector from constant values
The constants were aggregated in a reverse order.

llvm-svn: 322303
2018-01-11 18:30:41 +00:00
Krzysztof Parzyszek 4ef6cfff6a [Hexagon] Cast elements to correct type when creating constant vector
llvm-svn: 322301
2018-01-11 18:03:23 +00:00
Zvi Rackover 999e6c2967 DAGCombine: Let truncates negate extension through extract-subvector
Summary:
Fold cases such as:
(v8i8 truncate (v8i32 extract_subvector (v16i32 sext (v16i8 V), Idx)))
->
(v8i8 extract_subvector (v16i8 V), Idx)

This can be generalized to cases where the truncate and extend do not
fully cancel each other out, but it may require querying the target
about profitability.

Reviewers: RKSimon, craig.topper, spatel, efriedma

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41927

llvm-svn: 322300
2018-01-11 18:02:33 +00:00
Zvi Rackover cf0999887a X86 Tests: Add zext cases in (trunc (subvector)) test. NFC
Cases were missing as observed in D41927

llvm-svn: 322297
2018-01-11 17:50:34 +00:00
Simon Pilgrim 8de035670e [X86][SSE] Drop old insertps stack folding test
Broken test from old attempt for folding tables - we don't peek through extract_subvector spills at all (which is why it doesn't fold), and we already have foldMemoryOperandCustom to handle insertps immediate correction anyway.

llvm-svn: 322292
2018-01-11 16:57:58 +00:00
Sanjay Patel e63d8dda5a [ValueTracking] recognize min/max-of-min/max with notted ops (PR35875)
This was originally planned as the fix for:
https://bugs.llvm.org/show_bug.cgi?id=35834
...but simpler transforms handled that case, so I implemented a 
lesser solution. It turns out we need to handle the case with 'not'
ops too because the real code example that we are trying to solve:
https://bugs.llvm.org/show_bug.cgi?id=35875
...has extra uses of the intermediate values, so we can't rely on 
smaller canonicalizations to get us to the goal.

As with rL321672, I've tried to show every possibility in the
codegen tests because that's the simplest way to prove we're doing
the right thing in the wide variety of permutations of this pattern.

We can also show an InstCombine win because we added a fold for
this case in:
rL321998 / D41603

An Alive proof for one variant of the pattern to show that the 
InstCombine and codegen results are correct:
https://rise4fun.com/Alive/vd1

Name: min3_nots
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %cmpxyz = icmp slt i8 %minxz, %ny
  %r = select i1 %cmpxyz, i8 %minxz, i8 %ny

Name: min3_nots_alt
  %nx = xor i8 %x, -1
  %ny = xor i8 %y, -1
  %nz = xor i8 %z, -1
  %cmpxz = icmp slt i8 %nx, %nz
  %minxz = select i1 %cmpxz, i8 %nx, i8 %nz
  %cmpyz = icmp slt i8 %ny, %nz
  %minyz = select i1 %cmpyz, i8 %ny, i8 %nz
  %cmpyx = icmp slt i8 %y, %x
  %r = select i1 %cmpyx, i8 %minxz, i8 %minyz
=>
  %xz = icmp sgt i8 %x, %z
  %maxxz = select i1 %xz, i8 %x, i8 %z
  %xyz = icmp sgt i8 %maxxz, %y
  %maxxyz = select i1 %xyz, i8 %maxxz, i8 %y
  %r = xor i8 %maxxyz, -1

llvm-svn: 322283
2018-01-11 15:13:47 +00:00
Simon Pilgrim 6e6da3f449 [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

llvm-svn: 322279
2018-01-11 14:25:18 +00:00
Zvi Rackover 3ee66d9cd1 X86: Fix LowerBUILD_VECTORAsVariablePermute for case Src is smaller than Indices
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.

Fixes pr35820

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41865

llvm-svn: 322272
2018-01-11 12:26:52 +00:00
Alex Bradbury 0715d35ed5 [RISCV] Reserve an emergency spill slot for the register scavenger when necessary
Although the register scavenger can often find a spare register, an emergency 
spill slot is needed to guarantee success. Reserve this slot in cases where 
the function is known to have a large stack (meaning the scavenger may be 
needed when forming stack addresses).

llvm-svn: 322269
2018-01-11 11:17:19 +00:00
Craig Topper 0b59034b15 [X86] Optimize v2i32/v2f32 scatters.
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.

Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.

llvm-svn: 322254
2018-01-11 06:31:28 +00:00
Sanjay Patel f16fe0f205 [AArch64] add tests for notted variants of min/max; NFC
Like rL321668 / rL321672, the planned optimizer change to 
fix these will be in ValueTracking, but we can test the
changes cleanly here with AArch64 codegen.

llvm-svn: 322238
2018-01-10 23:31:42 +00:00
Matthias Braun e3a8db7ba1 Revert "AArch64: Fix emergency spillslot being out of reach for large callframes"
Revert for now as the testcase is hitting a pre-existing verifier error
that manifest as a failure when expensive checks are enabled (or
-verify-machineinstrs) is used.

This reverts commit r322200.

llvm-svn: 322231
2018-01-10 22:36:28 +00:00
Alex Bradbury 315cd3ace4 [RISCV] Implement support for the BranchRelaxation pass
Branch relaxation is needed to support branch displacements that overflow the
instruction's immediate field.

Differential Revision: https://reviews.llvm.org/D40830

llvm-svn: 322224
2018-01-10 21:05:07 +00:00
Matthias Braun 725ad0eee0 TargetLoweringBase: The ios simulator has no bzero function.
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.

llvm-svn: 322223
2018-01-10 20:49:57 +00:00
Alex Bradbury e027c93ac2 [RISCV] Implement branch analysis
This is a prerequisite for the branch relaxation pass, and allows a number of
optimisation passes (e.g. BranchFolding and MachineBlockPlacement) to work.

Differential Revision: https://reviews.llvm.org/D40808

llvm-svn: 322222
2018-01-10 20:47:00 +00:00
Alex Bradbury 70f137b6bf [RISCV] Add support for llvm.{frameaddress,returnaddress} intrinsics
llvm-svn: 322218
2018-01-10 20:12:00 +00:00
Alex Bradbury 9330e64485 [RISCV] Add basic support for inline asm constraints
llvm-svn: 322217
2018-01-10 20:05:09 +00:00
Alex Bradbury 9fea4881d0 [RISCV] Support stack frames and offsets up to 32-bits
Differential Revision: https://reviews.llvm.org/D40807

llvm-svn: 322216
2018-01-10 19:53:46 +00:00
Alex Bradbury c85be0de56 [RISCV] Support for varargs
Includes support for expanding va_copy. Also adds support for using 'aligned'
registers when necessary for vararg calls, and ensure the frame pointer always
points to the bottom of the vararg spill region. This is necessary to ensure
that the saved return address and stack pointer are always available at fixed
known offsets of the frame pointer.

Differential Revision: https://reviews.llvm.org/D40805

llvm-svn: 322215
2018-01-10 19:41:03 +00:00
Craig Topper af4eb17223 [SelectionDAG][X86] Explicitly store the scale in the gather/scatter ISD nodes
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.

Most of this patch is just making sure we copy the scale around everywhere.

Differential Revision: https://reviews.llvm.org/D40055

llvm-svn: 322210
2018-01-10 19:16:05 +00:00
Jessica Paquette c191f1097c [MachineOutliner] Outline ADRPs
ADRP instructions weren't being outlined because they're PC-relative and thus
fail the LR checks. This patch adds a special case for ADRPs to
getOutliningType to make sure that ADRPs can be outlined and updates the MIR
test.

llvm-svn: 322207
2018-01-10 18:49:57 +00:00
Matthias Braun b42ffa1283 AArch64: Fix emergency spillslot being out of reach for large callframes
Large callframes (calls with several hundreds or thousands or
parameters) could lead to situations in which the emergency spillslot is
out of range to be addressed relative to the stack pointer.
This commit forces the use of a frame pointer in the presence of large
callframes.

This commit does several things:
- Compute max callframe size at the end of instruction selection.
- Add mirFileLoaded target callback. Use it to compute the max callframe size
  after loading a .mir file when the size wasn't specified in the file.
- Let TargetFrameLowering::hasFP() return true if there exists a
  callframe > 255 bytes.
- Always place the emergency spillslot close to FP if we have a frame
  pointer.
- Note that `useFPForScavengingIndex()` would previously return false
  when a base pointer was available leading to the emergency spillslot
  getting allocated late (that's the whole effect of this callback).
  Which made no sense to me so I took this case out: Even though the
  emergency spillslot is technically not referenced by FP in this case
  we still want it allocated early.

Differential Revision: https://reviews.llvm.org/D40876

llvm-svn: 322200
2018-01-10 18:16:24 +00:00
Simon Pilgrim f74e3f45dc [X86][MMX] Add test for PR35869
llvm-svn: 322197
2018-01-10 17:05:03 +00:00
Zvi Rackover a27442f4f4 X86 Tests: Add isel tests for truncate-extract_vector-extend. NFC.
To be improved in a future patch

llvm-svn: 322192
2018-01-10 14:56:15 +00:00
Simon Pilgrim a0c59cce0e [X86][SSE] Add some basic FABS combine tests
llvm-svn: 322182
2018-01-10 13:28:34 +00:00
Simon Pilgrim a330a407c4 [X86][SSE] Add v2f64 u2 shuffle test
Adds missing coverage for SHUFPD undef argument lowering, and also shows a missed opportunity to remove a unnecessary move compared to 02 shuffle mask.

llvm-svn: 322175
2018-01-10 12:23:39 +00:00
Diana Picus e3591f3a17 [ARM GlobalISel] Add inst selector tests for G_FNEG s32 and s64
G_FNEG is already handled by the TableGen'erated code. Just add a few
tests to make sure everything works as expected.

llvm-svn: 322170
2018-01-10 11:13:36 +00:00
Diana Picus 0ed7513c83 [ARM GlobalISel] Map G_FNEG to the FPR bank
llvm-svn: 322169
2018-01-10 11:13:31 +00:00
Diana Picus f949a0abac [ARM GlobalISel] Legalize G_FNEG for s32 and s64
For hard float, it is legal.

For soft float, we need to lower to 0 - x first, and then we can use the
libcall for G_FSUB. This is undoing some of the canonicalization
performed by the IRTranslator (which introduces G_FNEG when it sees a
0 - x). Ideally, that canonicalization would be performed by a
pre-legalizer pass that would allow targets to opt out of this behaviour
rather than dance around it in the legalizer.

llvm-svn: 322168
2018-01-10 10:45:34 +00:00
Jonas Paulsson 1a76f3a2c2 Temporarily revert
"[SystemZ]  Check for legality before doing LOAD AND TEST transformations."

, due to test failures.

llvm-svn: 322165
2018-01-10 10:05:55 +00:00
Diana Picus 8f14886630 [ARM GlobalISel] Legalize s32/s64 G_FCONSTANT
Legal for hard float.
Change to G_CONSTANT for soft float (but preserve the binary
representation).

llvm-svn: 322164
2018-01-10 10:01:49 +00:00
Jonas Paulsson 9222b91e24 [SelectionDAGBuilder] Chain prefetches less aggressively.
Prefetches used to always be chained between any previous and following
memory accesses. The problem with this was that later optimizations, such as
folding of a load into the user instruction, got disrupted.

This patch relaxes the chaining of prefetches in order to remedy this.

Reveiw: Hal Finkel
https://reviews.llvm.org/D38886

llvm-svn: 322163
2018-01-10 09:33:00 +00:00
Diana Picus 734a5e8912 [ARM GlobalISel] Legalize G_CONSTANT for scalars > 32 bits
Make G_CONSTANT narrow for any scalars larger than 32 bits.

llvm-svn: 322162
2018-01-10 09:32:01 +00:00
Jonas Paulsson d9dde1ac56 [SystemZ] Check for legality before doing LOAD AND TEST transformations.
Since a load and test instruction treat its operands as signed, it can only
replace a logical compare for EQ/NE uses.

Review: Ulrich Weigand
https://bugs.llvm.org/show_bug.cgi?id=35662

llvm-svn: 322161
2018-01-10 09:18:17 +00:00
Puyan Lotfi fe6c9cbb24 [MIR] Repurposing '$' sigil used by external symbols. Replacing with '&'.
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.

llvm-svn: 322146
2018-01-10 00:56:48 +00:00
Adrian McCarthy db2736ddd8 Reland "Emit Function IDs table for Control Flow Guard"
Adds option /guard:cf to clang-cl and -cfguard to cc1 to emit function IDs
of functions that have their address taken into a section named .gfids$y for
compatibility with Microsoft's Control Flow Guard feature.

The original patch didn't have the lit.local.cfg file that restricts the new
test to x86, thus the new test was failing on the non-x86 bots.

Differential Revision: https://reviews.llvm.org/D40531

The reverts r322008, which was a revert of r322005.

This reverts commit a05b89f9aca70597dc79fe97bc49b50b51f525ba.

llvm-svn: 322136
2018-01-09 23:49:30 +00:00
Sam Clegg ea7caceedc [WebAssembly] Add COMDAT support
This adds COMDAT support to the Wasm object-file format.
Spec: https://github.com/WebAssembly/tool-conventions/pull/31

Corresponding LLD change:
https://bugs.llvm.org/show_bug.cgi?id=35533, and D40845

Patch by Nicholas Wilson

Differential Revision: https://reviews.llvm.org/D40844

llvm-svn: 322135
2018-01-09 23:43:14 +00:00
Stefan Pintilie 1712700842 [PowerPC] Manually schedule the prologue and epilogue
This patch makes the following changes to the schedule of instructions in the
prologue and epilogue.

The stack pointer update is moved down in the prologue so that the callee saves
do not have to wait for the update to happen.
Saving the lr is moved down in the prologue to hide the latency of the mflr.
The stack pointer is moved up in the epilogue so that restoring of the lr can
happen sooner.
The mtlr is moved up in the epilogue so that it is away form the blr at the end
of the epilogue. The latency of the mtlr can now be hidden by the loads of the
callee saved registers.

This commit is almost identical to this one: r322036 except that two warnings
that broke build bots have been fixed.

The revision number is D41737 as before.

llvm-svn: 322124
2018-01-09 21:57:49 +00:00
Tim Renouf d68fa1be57 [SelectionDAG] Fixed f16-from-vector promotion problem
Summary:
In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the
result of a bitcast from i16, avoid creating an illegal fp16_to_fp where
the input is not a vector and the result is a v1f32.

V2: The fix is now to avoid vector scalarization creating a v1->scalar
bitcast.

Reviewers: srhines, t.p.northover

Subscribers: nhaehnle, llvm-commits, dstuttard, t-tye, yaxunl, wdng, kzhuravl, arsenm

Differential Revision: https://reviews.llvm.org/D41126

llvm-svn: 322120
2018-01-09 21:36:25 +00:00
Tim Renouf 6eaad1e539 [AMDGPU] Fixed incorrect uniform branch condition
Summary:
I had a case where multiple nested uniform ifs resulted in code that did
v_cmp comparisons, combining the results with s_and_b64, s_or_b64 and
s_xor_b64 and using the resulting mask in s_cbranch_vccnz, without first
ensuring that bits for inactive lanes were clear.

There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear bits for inactive lanes in the case that the branch is instruction
selected as s_cbranch_scc1 and is then changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have added the same code into SILowerControlFlow for
the case that the branch is instruction selected as s_cbranch_vccnz.

This de-optimizes the code in some cases where the s_and is not needed,
because vcc is the result of a v_cmp, or multiple v_cmp instructions
combined by s_and/s_or. We should add a pass to re-optimize those cases.

Reviewers: arsenm, kzhuravl

Subscribers: wdng, yaxunl, t-tye, llvm-commits, dstuttard, timcorringham, nhaehnle

Differential Revision: https://reviews.llvm.org/D41292

llvm-svn: 322119
2018-01-09 21:34:43 +00:00
Craig Topper c4d2dd80b6 [X86] Add a DAG combine to combine (sext (setcc)) with VLX
Normally target independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type preventing the DAG combining from kicking in.

But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.

This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.

Differential Revision: https://reviews.llvm.org/D41850

llvm-svn: 322101
2018-01-09 18:14:22 +00:00
Francis Visoiu Mistrih 7d9bef8f5c [CodeGen] Don't print "pred:" and "opt:" in -debug output
In -debug output we print "pred:" whenever a MachineOperand is a
predicate operand in the instruction descriptor, and "opt:" whenever a
MachineOperand is an optional def in the instruction descriptor.

Differential Revision: https://reviews.llvm.org/D41870

llvm-svn: 322096
2018-01-09 17:31:07 +00:00
Zvi Rackover 72b0bb1405 X86 Tests: Update more isel tests with FastVariableShuffle feature
Summary:
Added the FastVariableShuffle feature to cases that resembled processors
for which this fearure is on.
For AVX2 there are processors with and w/o this fearue enable.
For AVX512 only KNL does enable this feature so cases which only have
+avx512f were left without the FastVariableShuffle enabled.

Reviewers: RKSimon, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D41851

llvm-svn: 322090
2018-01-09 16:26:06 +00:00
Zvi Rackover b11e84c1d8 X86 Tests: Add common check prefix to test-case. NFC.
As suggested in D41851

llvm-svn: 322089
2018-01-09 16:14:15 +00:00
Francis Visoiu Mistrih 72cc21eefe [CodeGen] Print frame-setup/destroy flags in -debug output like we do in MIR
Currently the MachineInstr::print function prints the
frame-setup/frame-destroy differently than it does in MIR.

Instead of:

  %x21 = LDR %sp, -16; flags: FrameDestroy

print:

  %x21 = frame-destroy LDR %sp, -16

llvm-svn: 322088
2018-01-09 16:11:51 +00:00
Sanjay Patel 37e28e40cb [SelectionDAG] lower math intrinsics to finite version of libcalls when possible (PR35672)
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.

There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010

Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?

Differential Revision: https://reviews.llvm.org/D41338

llvm-svn: 322087
2018-01-09 15:41:00 +00:00
Francis Visoiu Mistrih 2b3bd30637 [CodeGen] Don't print register classes in -debug output
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.

This follows MIR in this regard and is another step to the unification
of the two formats.

llvm-svn: 322086
2018-01-09 15:39:44 +00:00