Commit Graph

16639 Commits

Author SHA1 Message Date
Simon Pilgrim 0bf66c9d62 [X86] Regenerated popcnt scalar tests for 32/64-bit targets with/without POPCNT support
llvm-svn: 275709
2016-07-17 16:04:19 +00:00
Elena Demikhovsky eaa356501d X86: Updated a test file. NFC.
This test shows subotimal code generated for AVX-512 vs PENTIUM4.
The issue will be fixed in an upcomming commit.

llvm-svn: 275702
2016-07-17 07:03:13 +00:00
Hal Finkel 04b5330ccd Disable this-return argument forwarding on ARM/AArch64
r275042 reverted function-attribute inference for the 'returned' attribute
because the feature triggered self-hosting failures on ARM and AArch64. James
Molloy determined that the this-return argument forwarding feature, which
directly ties the returned input argument to the returned value, was the cause.
It seems likely that this forwarding code contains, or triggers, a subtle bug.
Disabling for now until we can track that down.

llvm-svn: 275677
2016-07-16 07:07:29 +00:00
Yaxun Liu a711cc7951 Re-commit [AMDGPU] Add metadata for runtime
Attempting to fix lit test failure on ppc.

llvm-svn: 275676
2016-07-16 05:09:21 +00:00
Matthias Braun 538859cca3 llc: Add support for -run-pass none
This does not schedule any passes besides the ones necessary to
construct and print the machine function. This is useful to test .mir
file reading and printing.

Differential Revision: http://reviews.llvm.org/D22432

llvm-svn: 275664
2016-07-16 02:24:59 +00:00
Matthias Braun c92a5fc9f6 ARM/MIR: Move test from MIR to CodeGen/ARM directory
test/CodeGen/MIR/ARM/ARMLoadStoreDBG.mir is an actual test for the ARM
load store optimization pass and not a test of the mir parser/printer.

It belongs to test/CodeGen/ARM; This also updates the test to use the
new -run-pass llc syntax.

llvm-svn: 275662
2016-07-16 02:24:13 +00:00
Matthias Braun 5d00b3213e MIParser: reject subregister indexes on physregs
llvm-svn: 275658
2016-07-16 01:36:18 +00:00
Matt Arsenault 73d2f8954a AMDGPU: Fix verifier error from partially undef copy
In this situation:

%VGPR2<def> = BUFFER_LOAD_DWORD_OFFSET %SGPR8_SGPR9_SGPR10_SGPR11,
%VGPR7<def,tied3> = V_MAC_F32_e32 %VGPR0<undef>, %VGPR1<kill>, %VGPR7<kill,tied0>, %EXEC<imp-use>
%VGPR3_VGPR4_VGPR5_VGPR6<def> = COPY %VGPR0_VGPR1_VGPR2_VGPR3
%VGPR4<def> = COPY %VGPR2

The copy for VGPR1 -> VGPR4 was an error from reading undefined VGPR1,
but VGPR4 is defined immediately after this copy.

llvm-svn: 275635
2016-07-15 22:32:02 +00:00
Michael Kuperstein be2e3f5ce5 ExpandPostRAPseudos should transfer implicit uses, not only implicit defs
Previously, we would expand:
%BL<def> = COPY %DL<kill>, %EBX<imp-use,kill>, %EBX<imp-def>
Into:
%BL<def> = MOV8rr %DL<kill>, %EBX<imp-def>
Dropping the imp-use on the floor.

That confused CriticalAntiDepBreaker, which (correctly) assumes that if an
instruction defs but doesn't use a register, that register is dead immediately
before the instruction - while in this case, the high lanes of EBX can be very
much alive.

This fixes PR28560.

Differential Revision: https://reviews.llvm.org/D22425

llvm-svn: 275634
2016-07-15 22:31:14 +00:00
Matt Arsenault a65e6b8335 AMDGPU: Remove brev intrinsic
llvm-svn: 275620
2016-07-15 21:27:13 +00:00
Matt Arsenault 82e5e1e564 AMDGPU: Fix TargetPrefix for remaining r600 intrinsics
llvm-svn: 275619
2016-07-15 21:27:08 +00:00
Matt Arsenault 11d3e21f2b AMDGPU: Remove AMDGPU.ldexp
llvm-svn: 275618
2016-07-15 21:26:56 +00:00
Matt Arsenault 09b2c4aee8 AMDGPU: Remove legacy rsq.clamped intrinsic
Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining.

Also fix mismatch with non-IEEE rsq selecting to IEEE rsq.

llvm-svn: 275617
2016-07-15 21:26:52 +00:00
Saleem Abdulrasool 467269a40e CodeGen: avoid emitting unnecessary CFI
Remove unnecessary clutter in assembly output.  When using SjLj EH, the CFI is
not actually used for anything.  Do not emit the CFI needlessly.  The minor test
adjustments are interesting.  The prologue test was just overzealous matcching.
The interesting case is the LSDA change.  It was originally added to ensure that
various compilations did not mangle the name (it explicitly checked the name!).
However, subsequent cleanups made it more reliant on the CFI to find the name.
Parse the generated code flow to generically find the label still.

llvm-svn: 275614
2016-07-15 21:10:29 +00:00
Nico Weber 8d66df15f4 Teach fast isel about the win64 calling convention.
This mostly just works.

Vectorcall rets are still not supported.

The win64_eh test change is because fast isel doesn't use rsi for temporary
computations, so it doesn't need to be pushed. The test case I'm changing was
originally added to test pushes, but by now there are other test cases in that
file exercising that code path.

https://reviews.llvm.org/D22422

llvm-svn: 275607
2016-07-15 20:18:37 +00:00
Vitaly Buka 7f64844481 Revert "[AMDGPU] Add metadata for runtime"
This reverts commit r275566.

llvm-svn: 275599
2016-07-15 19:14:57 +00:00
Krzysztof Parzyszek bba0bf7d37 [Hexagon] Improve patterns with stack-based addressing
- Treat bitwise OR with a frame index as an ADD wherever possible, fold it
  into addressing mode.
- Extend patterns for memops to allow memops with frame indexes as address
  operands.

llvm-svn: 275569
2016-07-15 15:35:52 +00:00
Nico Weber f7f2b81602 In dag-optnone.ll, use varargs instead of win64 to fast SDIsel.
The test used to rely on targeting win64 to disable fast isel,
but I'd like to teach fast isel about win64 rets.  Change the
test to use varargs to disable fast isel.

llvm-svn: 275568
2016-07-15 15:30:18 +00:00
Yaxun Liu b3d17690eb [AMDGPU] Add metadata for runtime
Added emitting metadata to elf for runtime.

Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream.

Differential Revision: https://reviews.llvm.org/D21849

llvm-svn: 275566
2016-07-15 14:58:21 +00:00
Simon Pilgrim efd841e294 [X86][AVX] Added shuffle tests for UNPCK+PERMUTE
lowerVectorShuffleAsPermuteAndUnpack could solve this if it worked with 256-bit vectors

llvm-svn: 275554
2016-07-15 11:51:46 +00:00
Simon Pilgrim cf9c31550c [X86][AVX2] Added a memory version of test_mm256_broadcastsi128_si256
This should lower to vbroadcasti128

llvm-svn: 275552
2016-07-15 11:40:27 +00:00
Simon Pilgrim 2683ad54ad [X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes
As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level.

This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen.

Fixes PR28136.

llvm-svn: 275543
2016-07-15 09:49:12 +00:00
James Molloy b3326df56a [Thumb-1] Select post-increment load and store where possible
Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store:

  ldm r0!, {r1}  @ load from r0 into r1 and increment r0 by 4

Obviously, this only works if the post increment is 4.

llvm-svn: 275540
2016-07-15 08:03:56 +00:00
James Molloy a454a11d60 [ARM] Prefer indirect calls in minsize mode
... When we emit several calls to the same function in the same basic block.

An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions.

llvm-svn: 275537
2016-07-15 07:55:21 +00:00
Matt Arsenault b91805ea2b AMDGPU: Fix not expanding control flow after some kill blocks
Also stop trying to insert skip blocks at end_cf. This
was inserting them at the end of the block which doesn't make
sense. The skip should be inserted at the beginning of the block
right after the end cf. Just remove this for now since no tests
seem to stress this and I think this can be handled more generally
later.

Fixes bug 28550

llvm-svn: 275510
2016-07-15 00:58:15 +00:00
Matt Arsenault fa5a86a403 AMDGPU: Fix trying to skip from a block with no successors
Found while reducing bug 28550

llvm-svn: 275509
2016-07-15 00:58:13 +00:00
Matt Arsenault 83ab049af2 AMDGPU: Fix splitting kill blocks with defs before kill
llvm-svn: 275508
2016-07-15 00:58:09 +00:00
Simon Pilgrim 420b266d0a [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied)
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.

This was incorrectly reverted in rL275421 during triage of PR28552.

llvm-svn: 275497
2016-07-14 23:05:09 +00:00
Krzysztof Parzyszek ecea07c50e [Hexagon] Packetize function call arguments with tail call instructions
On Hexagon is it legal to packetize the instructions setting up call
arguments with the call instruction itself. This was already done,
except for tail calls. Make sure tail calls are handled as well.

llvm-svn: 275458
2016-07-14 19:30:55 +00:00
Sanjay Patel 2996a342f3 auto-generate checks
Note: I removed the checks after each jump because that's noise, but we apparently 
need branches rather than returning i1 to see the bt codegen in some cases.

llvm-svn: 275439
2016-07-14 17:07:55 +00:00
Nico Weber 5bb284226b Don't optimize movs to pushes in -O0 builds.
https://reviews.llvm.org/D22362

llvm-svn: 275431
2016-07-14 15:40:22 +00:00
Nico Weber 3afaf16abc Revert r275411, it cause PR28552.
llvm-svn: 275421
2016-07-14 14:49:35 +00:00
Nico Weber ecdf45b1e6 Teach fast isel calls and rets about stdcall.
stdcall is callee-pop like thiscall, so the thiscall changes already did most
of the work for this.  This change only opts stdcall in and adds tests.

llvm-svn: 275414
2016-07-14 13:54:26 +00:00
Simon Pilgrim bed37ccd54 [X86][AVX] Added an additional vperm2f128 memory folding test
llvm-svn: 275413
2016-07-14 13:40:53 +00:00
Simon Pilgrim 3ecb6bdd5f [X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle
This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better.

llvm-svn: 275411
2016-07-14 13:28:43 +00:00
Daniel Sanders 46fe6550ac [mips] SelectionDAGISel subclasses now follow the optimization level.
Summary:
It was recently discovered that, for Mips's SelectionDAGISel subclasses,
all optimization levels caused SelectionDAGISel to behave like -O2.

This change adds the necessary plumbing to initialize the optimization level.

Reviewers: andrew.w.kaylor

Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders

Differential Revision: https://reviews.llvm.org/D14900

llvm-svn: 275410
2016-07-14 13:25:22 +00:00
Simon Pilgrim 053d32906f [X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining
Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations.

llvm-svn: 275406
2016-07-14 12:58:04 +00:00
Simon Pilgrim 700e4a1ab8 [X86][AVX] Add 128-bit wide shuffle tests that should combine to blend-with-zero
llvm-svn: 275402
2016-07-14 12:21:40 +00:00
Simon Pilgrim a76a8e50e5 [X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support
llvm-svn: 275400
2016-07-14 12:07:43 +00:00
Simon Pilgrim 9e812169cc [X86][AVX] Regenerate broadcast upgrade tests
llvm-svn: 275398
2016-07-14 11:05:43 +00:00
Eli Friedman 17e8ea18e9 [X86] Fix stupid typo in isel lowering.
Apparently someone miscounted the number of zeros in the immediate.
Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 .

llvm-svn: 275376
2016-07-14 05:48:25 +00:00
Matt Arsenault ca7f5701f8 AMDGPU/R600: Delete/rename intrinsics no longer used by mesa
Use the replacement pass to update the tests, and delete old names.

llvm-svn: 275375
2016-07-14 05:47:17 +00:00
Matt Arsenault 897eee4187 AMDGPU: Remove unused intrinsics
llvm-svn: 275371
2016-07-14 05:23:19 +00:00
Matt Arsenault aa94c1e7ee AMDGPU: Fix test not actually testing anything
It wasn't actually running the pass, and since it is
missing the llvm prefix, the eh intrinsic was not
really an IntrinsicInst.

Also add missing test for lifetime markers.

llvm-svn: 275370
2016-07-14 05:23:15 +00:00
Dean Michael Berris 52735fc435 XRay: Add entry and exit sleds
Summary:
In this patch we implement the following parts of XRay:

- Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches.
- Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts).
- X86-specific nop sleds as described in the white paper.
- A machine function pass that adds the different instrumentation marker instructions at a very late stage.
- A way of identifying which return opcode is considered "normal" for each architecture.

There are some caveats here:

1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet.

2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library.

Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk

Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits

Differential Revision: http://reviews.llvm.org/D19904

llvm-svn: 275367
2016-07-14 04:06:33 +00:00
Nico Weber af7e8465e1 Teach fast isel about thiscall (and callee-pop) calls.
http://reviews.llvm.org/D22315

llvm-svn: 275360
2016-07-14 01:52:51 +00:00
Mehdi Amini 9e332a7719 Add missing test for r275347 "[IPRA] Set callee saved registers to none for local function when IPRA is enabled."
llvm-svn: 275358
2016-07-14 01:31:20 +00:00
Michael Kuperstein be837fa40f [DAG] Correctly chain masked loads
If a masked loads is not added to the chain, it should not reset the chain's
root.

This fixes the remaining part of PR28515.

llvm-svn: 275340
2016-07-13 23:23:40 +00:00
Quentin Colombet 68a84587c5 [MIR] Fix one GlobalISel test case that I missed in r275314.
llvm-svn: 275333
2016-07-13 22:35:33 +00:00
Nico Weber b888555bcc Add a triple to fix test on bots after 275320.
llvm-svn: 275327
2016-07-13 22:19:40 +00:00
Nico Weber eb9488b151 Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact.
This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well,
but it's not clear if the pass should run in that setup.

http://reviews.llvm.org/D22314

llvm-svn: 275320
2016-07-13 21:38:27 +00:00
Quentin Colombet 545e558b82 [MIR] Print on the given output instead of stderr.
Currently the MIR framework prints all its outputs (errors and actual
representation) on stderr.

This patch fixes that by printing the regular output in the output
specified with -o.

Differential Revision: http://reviews.llvm.org/D22251

llvm-svn: 275314
2016-07-13 20:36:03 +00:00
Matt Arsenault f071102647 AMDGPU: Remove last AMDIL intrinsics
llvm-svn: 275309
2016-07-13 19:42:06 +00:00
Andrew Kaylor 346dd7f1bd Reverting r275284 due to platform-specific test failures
llvm-svn: 275304
2016-07-13 19:09:16 +00:00
Simon Pilgrim 5d664af3c3 [X86][SSE] Regenerate truncated shift test
Check SSE2 and AVX2 implementations

llvm-svn: 275300
2016-07-13 18:50:10 +00:00
Simon Pilgrim 631643e7d9 Regenerate test
llvm-svn: 275299
2016-07-13 18:46:37 +00:00
Krzysztof Parzyszek cb4dd7656b Move mempcpy_call.ll to X86 subdirectory
llvm-svn: 275294
2016-07-13 18:28:45 +00:00
Andrew Kaylor 12cccdd731 Fix for Bug 26903, adds support to inline __builtin_mempcpy
Patch by Sunita Marathe

Differential Revision: http://reviews.llvm.org/D21920

llvm-svn: 275284
2016-07-13 17:25:11 +00:00
Matthias Braun 512424f28a PatchableFunction: Skip pseudos that do not create code
This fixes http://llvm.org/PR28524

llvm-svn: 275278
2016-07-13 16:37:29 +00:00
Sanjay Patel 610a2f6525 [x86][SSE/AVX] optimize pcmp results better (PR28484)
We know that pcmp produces all-ones/all-zeros bitmasks, so we can use that behavior to avoid unnecessary constant loading.

One could argue that load+and is actually a better solution for some CPUs (Intel big cores) because shifts don't have the
same throughput potential as load+and on those cores, but that should be handled as a CPU-specific later transformation if
it ever comes up. Removing the load is the more general x86 optimization. Note that the uneven usage of vpbroadcast in the
test cases is filed as PR28505:
https://llvm.org/bugs/show_bug.cgi?id=28505

Differential Revision: http://reviews.llvm.org/D22225

llvm-svn: 275276
2016-07-13 16:04:07 +00:00
Simon Pilgrim a99368fa35 [X86][AVX512] Add support for VPERMILPD/VPERMILPS variable shuffle mask comments
llvm-svn: 275272
2016-07-13 15:45:36 +00:00
Simon Pilgrim 48d8340760 [X86][AVX] Add support for target shuffle combining to VPERMILPS variable shuffle mask
Added AVX512F VPERMILPS shuffle decoding support

llvm-svn: 275270
2016-07-13 15:10:43 +00:00
Tom Stellard 418beb7671 AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL
Reviewers: rafael, ruiu, tony-tye, arsenm, kzhuravl

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21484

llvm-svn: 275268
2016-07-13 14:23:33 +00:00
Matt Arsenault 0056868c4a AMDGPU: Fold out no-op kill intrinsics
llvm-svn: 275253
2016-07-13 06:04:22 +00:00
Tim Northover 72eebfa4b0 GlobalISel: freeze reserved regs after IRTranslator.
We can freeze the registers after the MachineFrameInfo has been configured (by
telling it about calls, inline asm, ...). This doesn't happen at all yet, but
will be part of IR translation.

Fixes -verify-machineinstrs assertion.

llvm-svn: 275221
2016-07-12 22:23:42 +00:00
Matt Arsenault 786724a22e AMDGPU: Follow up to r275203
I meant to squash this into it.

llvm-svn: 275220
2016-07-12 21:41:32 +00:00
Nemanja Ivanovic f0407e3902 The test case I added is PowerPC specific but I accidentally
had it in the wrong directory. Moved it to CodeGen/PowerPC.

Sorry about the noise.

llvm-svn: 275218
2016-07-12 21:24:08 +00:00
Nemanja Ivanovic b43bb6141e [Power9] Add codegen for VSX word insert/extract instructions
This patch corresponds to review:
http://reviews.llvm.org/D20239

It adds exploitation of XXINSERTW and XXEXTRACTUW instructions that
are useful in some cases for inserting and extracting vector elements of
v4[if]32 vectors.

llvm-svn: 275215
2016-07-12 21:00:10 +00:00
Simon Pilgrim 6fa71da4a4 [X86][AVX] Add support for target shuffle combining to VPERM2F128/VPERM2I128
llvm-svn: 275212
2016-07-12 20:27:32 +00:00
Matthias Braun 96ec47db74 X86FixupBWInsts: No need for forward liveness analysis.
With r274952 and r275201 in place there are no cases left where a
forward liveness analysis yields different results than a backward one.
So we can remove the forward stepping logic.

Differential Revision: http://reviews.llvm.org/D22083

llvm-svn: 275204
2016-07-12 19:04:30 +00:00
Matt Arsenault 657f871a4e AMDGPU: Fix verifier error with kill intrinsic
Don't create a terminator in the middle of the block.
We should probably get rid of this intrinsic.

llvm-svn: 275203
2016-07-12 19:01:23 +00:00
Wei Ding 5b2636a152 AMDGPU: Add LLVM IR Intrinsic for v_lerp_u8
Differential Revision: http://reviews.llvm.org/D22239

llvm-svn: 275197
2016-07-12 18:02:14 +00:00
Haicheng Wu 711ca868fc [AArch64] Set FMOVS0 and FMOVD0 as isAsCheapAsAMove when needed.
If a subtarget has both ZCZeroing and CustomCheapAsMoveHandling features (now
only Kryo has both), set FMOVS0 and FMOVD0 isAsCheapAsAMove.

Differential Revision: http://reviews.llvm.org/D22256

llvm-svn: 275178
2016-07-12 15:31:41 +00:00
Nemanja Ivanovic eebbcb6d57 [PowerPC] Cannonicalize applicable vector shift immediates as swaps
This patch corresponds to review:
http://reviews.llvm.org/D21358

Vector shifts that have the same semantics as a vector swap are cannonicalized
as such to provide additional opportunities for swap removal optimization to
remove unnecessary swaps.

llvm-svn: 275168
2016-07-12 12:16:27 +00:00
Nicolai Haehnle 7968c34586 AMDGPU: Unify MOVRELSOffset and MOVRELDOffset
Summary:
Previously, constant index insertelements would be turned into SI_INDIRECT_DST,
which is bound to prevent some optimization opportunities. Worse, it mislead
the heuristic that decides whether immediates should be lowered to S_MOV_B32
or V_MOV_B32 in a way that resulted in unnecessary v_readfirstlanes.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22217

llvm-svn: 275160
2016-07-12 08:12:16 +00:00
Craig Topper a6e6febe2c [AVX512] Remove masked logic op intrinsics and autoupgrade them to native IR.
llvm-svn: 275155
2016-07-12 05:27:53 +00:00
NAKAMURA Takumi e92e2124f6 llvm/test/CodeGen/AMDGPU/selected-stack-object.ll REQUIRES +Asserts, since it expects assertion failure.
llvm-svn: 275144
2016-07-12 02:18:09 +00:00
Haicheng Wu 1e39574e9f [Kryo] Enable ZCZeroing feature
This feature uses immediate #0 to zero a register.

Differential Revision: http://reviews.llvm.org/D19985

llvm-svn: 275143
2016-07-12 02:04:01 +00:00
Nico Weber c7bf646a99 Teach FastISel about thiscall (and, hence, about callee-pop).
http://reviews.llvm.org/D22115

llvm-svn: 275135
2016-07-12 01:30:35 +00:00
Matt Arsenault 45f8216cee AMDGPU: Remove superfluous string attributes from tests
Also fix v_mac.ll not testing right thing for fneg

llvm-svn: 275129
2016-07-11 23:35:48 +00:00
Nicolai Haehnle c06bfa1daa AMDGPU: Treat texture gather instructions more like other MIMG instructions
Summary:
Setting MIMG to 0 has a bunch of unexpected side effects, including that
isVMEM returns false which leads to incorrect treatment in the hazard
recognizer. The reason I noticed it is that it also leads to incorrect
treatment in VGPR-to-SGPR copies, which is one cause of the referenced bug.

The only reason why MIMG was set to 0 is to signal the special handling of
dmasks, but that can be checked differently.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96877

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D22210

llvm-svn: 275113
2016-07-11 21:59:43 +00:00
Nicolai Haehnle f52c3cf272 AMDGPU: fix local stack slot allocation bugs
Summary:
The main bug fix here is using the 32-bit encoding of V_ADD_I32 in
materializeFrameBaseRegister and resolveFrameIndex, so that arbitrary
immediates work.

The second part is that we may now require the SegmentWaveByteOffset
even when there are initially no stack objects and VGPR spilling isn't
enabled, for stack slots that are allocated later. This means that some
bits become effectively dead and can be cleaned up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96602
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21551

llvm-svn: 275108
2016-07-11 21:44:40 +00:00
Quentin Colombet fb82c7bc94 [X86] Fix tailcall return address clobber bug.
This bug (llvm.org/PR28124) was introduced by r237977, which refactored
the tail call  sequence to be generated in two passes instead of one.

Unfortunately, the stack adjustment produced by the first pass was not
recognized by X86FrameLowering::mergeSPUpdates() in all cases, causing
code such as the following, which clobbers the return address, to be
generated:

popl    %edi
popl    %edi
pushl   %eax
jmp     tailcallee              # TAILCALL

To fix the problem, the entire stack adjustment is performed in
X86ExpandPseudo::ExpandMI() for tail calls.

Patch by Magnus Lång <margnus1@gmail.com>

Differential Revision: http://reviews.llvm.org/D21325

llvm-svn: 275103
2016-07-11 21:03:03 +00:00
Michael Kuperstein cfbac5f361 [X86] Disable FixupSetCC for CodeGenOpt::None
It is an optimization pass, and should not run at -O0. Especially since Fast RA
will not do the required register coalescing anyway, so it's a loss even from
the optimization standpoint.

This also works around (but doesn't quite fix) PR28489.

llvm-svn: 275099
2016-07-11 20:40:44 +00:00
Chad Rosier 4f0dad1674 [IPRA] Properly compute register usage at call sites.
Differential Revision: http://reviews.llvm.org/D21395
Patch by Vivek Pandya.
PR28144

llvm-svn: 275087
2016-07-11 18:45:49 +00:00
Zhan Jun Liau def708a0f9 [SystemZ] Recognize Load On Condition Immediate (LOCHI/LOGHI) opportunities
Summary: Add support for the z13 instructions LOCHI and LOCGHI which
conditionally load immediate values.  Add target instruction info hooks so
that if conversion will allow predication of LHI/LGHI.

Author: RolandF

Reviewers: uweigand

Subscribers: zhanjunl

Commiting on behalf of Roland.

Differential Revision: http://reviews.llvm.org/D22117

llvm-svn: 275086
2016-07-11 18:45:03 +00:00
Sanjay Patel 8f1d408c74 [x86] make some of the tests 256-bit for testing diversity
llvm-svn: 275070
2016-07-11 15:08:37 +00:00
Sanjay Patel b428951990 [x86] specify triple to avoid bot failures
llvm-svn: 275067
2016-07-11 14:17:54 +00:00
Sanjay Patel 0d38830aca [x86] update checks
llvm-svn: 275064
2016-07-11 14:07:31 +00:00
Zlatko Buljan cba9f80ba8 [mips][microMIPS] Implement LDC1, SDC1, LDC2, SDC2, LWC1, SWC1, LWC2 and SWC2 instructions and add CodeGen support
Differential Revision: http://reviews.llvm.org/D18824

llvm-svn: 275050
2016-07-11 07:41:56 +00:00
Elena Demikhovsky d84f337953 AVX-512: DAG lowering for scalar MIN/MAX commutable ops
DAG lowering was missing for the scalar FMINC, FMAXC nodes.
The nodes are generated only in the "unsafe-fp-math" mode.
Added tests.

llvm-svn: 275048
2016-07-11 06:08:06 +00:00
Craig Topper 7ee070e7bc [AVX512] Add support for 512-bit ANDN now that all ones build vectors survive long enough to allow the matching.
llvm-svn: 275046
2016-07-11 05:36:53 +00:00
Craig Topper 516e14cd8e [AVX512] Use vpternlog with an immediate of 0xff to create 512-bit all one vectors.
llvm-svn: 275045
2016-07-11 05:36:48 +00:00
Jan Vesely 2fa28c330c AMDGPU/R600: Add implicitarg.ptr intrinsic
Differential Revision: http://reviews.llvm.org/D21622

llvm-svn: 275024
2016-07-10 21:20:29 +00:00
Simon Pilgrim 2191faa433 [X86][SSE] Add support for target shuffle combining to PSHUFLW/PSHUFHW
llvm-svn: 275022
2016-07-10 21:02:47 +00:00
Sanjay Patel ccd08fc8c4 [x86, SSE, AVX] add tests for icmp+zext (PR28484)
Note the inconsistent vpbroadcast generation for AVX2; another bug.

llvm-svn: 275020
2016-07-10 20:45:14 +00:00
Simon Pilgrim 51c786bd91 [X86][SSE] Added tests for combining shuffles to PSHUFLW/PSHUFHW
llvm-svn: 275019
2016-07-10 20:19:56 +00:00
Marcin Koscielnicki cf7cc724a7 [SystemZ] Utilize Test Data Class instructions.
This adds a new SystemZ-specific intrinsic, llvm.s390.tdc.f(32|64|128),
which maps straight to the test data class instructions.  A new IR pass
is added to recognize instructions that can be converted to TDC and
perform the necessary replacements.

Differential Revision: http://reviews.llvm.org/D21949

llvm-svn: 275016
2016-07-10 14:41:22 +00:00
Craig Topper 0b0954570a [AVX512] Add support for lowering to 512-bit SHUFPS.
llvm-svn: 275011
2016-07-10 05:55:53 +00:00
Simon Pilgrim 606126e848 [X86][SSE] Add support for target shuffle combining to INSERTPS
llvm-svn: 274990
2016-07-09 21:47:55 +00:00