8654 Commits

Author SHA1 Message Date
Quentin Colombet ae3168da3f [InlineSpiller] Don't call TargetInstrInfo::foldMemoryOperand with an empty list.
Since r287792 if we try to do that we will hit an assert.

llvm-svn: 289001
2016-12-08 00:06:51 +00:00
Tim Northover 05cc4859ad GlobalISel: simplify MachineIRBuilder interface.
MachineIRBuilder had weird before/after and beginning/end flags for the insert
point. Unfortunately, the non-default settings mean that instructions will be
inserted in reverse order, which is almost never what anyone wants.

Really, I think we just want (like IRBuilder has) the ability to insert at any
C++ iterator-style point (i.e. before any instruction or before MBB.end()). So
this fixes MIRBuilders to behave like IRBuilders in this respect.
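A toy Python model of the difference, not the MachineIRBuilder C++ API: a list stands in for the basic block, an index stands in for the iterator, and len(block) plays the role of MBB.end(). Both helper names are hypothetical.

```python
# Toy model: a basic block is a list of instruction names and an insert
# point is an index into it (len(block) acts like MBB.end()).

def insert_before(block, pos, instrs):
    """IRBuilder-style: insert before `pos`, advancing past each new
    instruction so the sequence stays in program order."""
    for instr in instrs:
        block.insert(pos, instr)
        pos += 1  # keep pointing at the original instruction
    return block

def insert_after_fixed(block, anchor, instrs):
    """Insert directly after a fixed anchor every time: each new
    instruction lands in front of the previous one, reversing the order."""
    for instr in instrs:
        block.insert(anchor + 1, instr)
    return block

print(insert_before(["a", "ret"], 1, ["x", "y", "z"]))      # ['a', 'x', 'y', 'z', 'ret']
print(insert_after_fixed(["a", "ret"], 0, ["x", "y", "z"]))  # ['a', 'z', 'y', 'x', 'ret']
```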

llvm-svn: 288980
2016-12-07 21:05:38 +00:00
Michael Kuperstein 5842b20633 [X86] Skip over DEBUG_VALUE while looking for start of call sequence
If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g
code.
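A minimal sketch of the idea, assuming instructions are plain strings and that the call sequence starts at an ADJCALLSTACKDOWN marker (a simplification of the real X86 pseudo-instructions). Because the backward scan skips DBG_VALUEs, -g and non-g input yield the same sequence. The function name is hypothetical.

```python
def call_sequence(instrs, call_idx):
    """Collect the non-debug instructions from the call-frame setup up to
    (but not including) the call, scanning backwards from the call."""
    seq = []
    i = call_idx - 1
    while i >= 0:
        op = instrs[i]
        if op == "DBG_VALUE":          # debug info must not change codegen
            i -= 1
            continue
        seq.append(op)
        if op == "ADJCALLSTACKDOWN":   # reached the start of the sequence
            break
        i -= 1
    return list(reversed(seq))

no_g = ["ADJCALLSTACKDOWN", "PUSH", "CALL"]
with_g = ["ADJCALLSTACKDOWN", "DBG_VALUE", "PUSH", "DBG_VALUE", "CALL"]
print(call_sequence(no_g, 2) == call_sequence(with_g, 4))  # True
```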

This fixes PR31242.

Differential Revision: https://reviews.llvm.org/D27485

llvm-svn: 288965
2016-12-07 19:31:08 +00:00
Michael Kuperstein 18092cf2c3 [X86] Do not assume "ri" instructions always have an immediate operand
The second operand of an "ri" instruction may be an immediate, but it may
also be a global variable, so we shouldn't make any assumptions.
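A hypothetical sketch of the defensive check, using Python stand-ins for machine operands rather than LLVM's MachineOperand API: only read an immediate after confirming the operand actually is one.

```python
from dataclasses import dataclass

@dataclass
class Imm:
    value: int

@dataclass
class GlobalAddress:
    name: str

def immediate_or_none(operand):
    """Return the immediate if the operand is one; otherwise assume nothing."""
    if isinstance(operand, Imm):
        return operand.value
    return None  # e.g. a global variable's address, resolved later

print(immediate_or_none(Imm(42)))              # 42
print(immediate_or_none(GlobalAddress("gv")))  # None
```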

This fixes PR31271.

Differential Revision: https://reviews.llvm.org/D27481

llvm-svn: 288964
2016-12-07 19:29:18 +00:00
Simon Pilgrim ba05d41095 [SelectionDAG] Add knownbits support for vector demandedelts in SMAX/SMIN/UMAX/UMIN opcodes
llvm-svn: 288926
2016-12-07 17:54:00 +00:00
Simon Pilgrim ef76b83164 [X86] Add knownbits vector UMAX test
In preparation for demandedelts support

llvm-svn: 288920
2016-12-07 17:21:13 +00:00
Simon Pilgrim 967325b373 [SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes
llvm-svn: 288916
2016-12-07 16:28:21 +00:00
Simon Pilgrim b421ef2370 [X86] Add test to show missed opportunities to calculate knownbits in INSERT_VECTOR_ELT
llvm-svn: 288912
2016-12-07 15:27:18 +00:00
Simon Pilgrim 33f2a669c1 [X86][SSE] Fix vpextrd/vpextrq checks
They were testing for the pre-vex versions

llvm-svn: 288911
2016-12-07 15:10:05 +00:00
Simon Pilgrim 4b1ebf97fc [X86][SSE] Force execution domain of 32-bit extractps/pextrd in the stack folding tests
llvm-svn: 288910
2016-12-07 15:06:14 +00:00
Simon Pilgrim e75ff02269 [X86][SSE] Regenerate test.
llvm-svn: 288906
2016-12-07 13:05:04 +00:00
Simon Pilgrim 8893bd95f0 [X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain
We are being inconsistent with these instructions (and all their variants), with a random mix of them using the default float domain.

Differential Revision: https://reviews.llvm.org/D27419

llvm-svn: 288902
2016-12-07 12:10:49 +00:00
Simon Pilgrim d5bc5c16b2 [X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296)
The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version.
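A simplified sketch of the 32-bit-element decode, assuming (as in the XOP encoding) that selector bits 1:0 pick the element within a 128-bit lane and bit 2 picks the input; the match/zeroing control is ignored. The line that adds num_elts is the second-input offset. This is an illustration, not LLVM's DecodeVPERMIL2PMask.

```python
def decode_vpermil2ps(selectors, num_elts):
    """Shuffle mask for 32-bit elements; indices >= num_elts select from
    the second input. Zeroing (match bit) handling is omitted."""
    elts_per_lane = 4                      # four 32-bit elements per 128-bit lane
    mask = []
    for i, sel in enumerate(selectors):
        index = i & ~(elts_per_lane - 1)   # first element of this lane
        index += sel & 0x3                 # element within the lane
        src = (sel >> 2) & 0x1             # 0: first input, 1: second input
        index += src * num_elts            # offset into the second input
        mask.append(index)
    return mask

print(decode_vpermil2ps([0, 5, 2, 7], 4))              # [0, 5, 2, 7]
print(decode_vpermil2ps([4, 0, 0, 0, 4, 0, 0, 0], 8))  # [8, 0, 0, 0, 12, 4, 4, 4]
```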

Annoyingly, this bug stayed hidden for so long because it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries.

llvm-svn: 288898
2016-12-07 11:19:00 +00:00
Simon Pilgrim 0559b9e557 [X86][XOP] Add test case for PR31296
llvm-svn: 288858
2016-12-06 22:50:13 +00:00
Zvi Rackover 8bc7e4da51 [X86] Prefer reduced width multiplication over pmulld on Silvermont
Summary:
Prefer expansions such as pmullw, pmulhw, punpcklwd, punpckhwd over pmulld.
On Silvermont [source: Optimization Reference Manual]:
PMULLD has a throughput of 1/11 [instructions/cycle].
PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instructions/cycle].
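A Python model of why the narrower multiplies suffice, assuming both operands are known to fit in signed 16 bits: the low and high halves from pmullw/pmulhw, interleaved as punpcklwd/punpckhwd would, rebuild the full 32-bit product. The helpers model per-lane semantics only and are hypothetical.

```python
def to_s16(v):
    """Interpret the low 16 bits of v as a signed value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def pmullw(a, b):
    """Low 16 bits of each 16x16 product."""
    return [(x * y) & 0xFFFF for x, y in zip(a, b)]

def pmulhw(a, b):
    """High 16 bits of each signed 16x16 product."""
    return [((to_s16(x) * to_s16(y)) >> 16) & 0xFFFF for x, y in zip(a, b)]

def mul32_via_16bit(a, b):
    """32-bit products of operands known to fit in signed 16 bits."""
    lo, hi = pmullw(a, b), pmulhw(a, b)
    # punpcklwd/punpckhwd interleave lo[i] and hi[i] into one 32-bit lane each
    return [(h << 16) | l for l, h in zip(lo, hi)]

a, b = [3, -2, 300, -32768], [7, 5, -400, 2]
print(mul32_via_16bit(a, b) == [(x * y) & 0xFFFFFFFF for x, y in zip(a, b)])  # True
```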

Fixes pr31202.

Analysis of this issue was done by Fahana Aleen.

Reviewers: wmi, delena, mkuper

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D27203

llvm-svn: 288844
2016-12-06 19:35:20 +00:00
Simon Pilgrim dd6ca639d5 [DAGCombine] Add (sext_in_reg (zext x)) -> (sext x) combine
Handle the case where a sign extension has ended up being split into separate stages (typically to get around vector legal ops) and a zext + sext_in_reg gets inserted.
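A quick check of the identity on integers, assuming the sext_in_reg extends from exactly the bit width of x (8 bits here, widened to 32). The helper names are hypothetical and model values, not DAG nodes.

```python
def zext(x, from_bits):
    """Zero-extend: keep only the low `from_bits` bits."""
    return x & ((1 << from_bits) - 1)

def sext(x, from_bits, width):
    """Sign-extend the low `from_bits` bits of x to a `width`-bit value."""
    x &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return ((x ^ sign) - sign) & ((1 << width) - 1)

def sext_in_reg(v, from_bits, width):
    """Replicate bit `from_bits - 1` of a `width`-bit value into all higher bits,
    via shift-left then arithmetic shift-right."""
    shift = width - from_bits
    v = (v << shift) & ((1 << width) - 1)
    if v & (1 << (width - 1)):
        v -= 1 << width  # reinterpret as signed so >> is arithmetic
    return (v >> shift) & ((1 << width) - 1)

# (sext_in_reg (zext x), i8) == (sext x) for every 8-bit x
print(all(sext_in_reg(zext(x, 8), 8, 32) == sext(x, 8, 32) for x in range(256)))  # True
```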

Differential Revision: https://reviews.llvm.org/D27461

llvm-svn: 288842
2016-12-06 19:09:37 +00:00
Simon Pilgrim 1577b39f51 [SelectionDAG] We can ignore knownbits from an undef shuffle vector index if we don't actually demand that element
llvm-svn: 288839
2016-12-06 18:58:25 +00:00
Simon Pilgrim 4a2979ce12 [X86][SSE] Add knownbits test demonstrating demandedelts not ignoring undef shuffle elements
llvm-svn: 288825
2016-12-06 17:00:47 +00:00
Simon Pilgrim 0caaadfc2d [X86][SSE] Added vector sext_in_reg combine tests
llvm-svn: 288819
2016-12-06 15:57:26 +00:00
Simon Pilgrim 7c7b649639 [X86] Improve UMAX/UMIN knownbits test
Test the sequential effect of each op

llvm-svn: 288815
2016-12-06 15:17:50 +00:00
Ayman Musa 86c00b799f [X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting.
Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern.
For example:
"build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>"
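A sketch of the repeated-pattern detection, assuming candidate sub-vector lengths are powers of two. The function name is hypothetical and this is not the code from D26802.

```python
def repeated_pattern(elts):
    """Shortest power-of-two-length prefix whose repetition rebuilds `elts`,
    or None if the vector does not repeat."""
    n = len(elts)
    length = 1
    while length < n:
        if n % length == 0 and elts == elts[:length] * (n // length):
            return elts[:length]
        length *= 2
    return None

print(repeated_pattern([0, 1, 2, 3, 0, 1, 2, 3]))  # [0, 1, 2, 3] -> broadcast this
print(repeated_pattern([0, 1, 2, 3, 4, 5, 6, 7]))  # None -> keep the build_vector
```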

Differential Revision: https://reviews.llvm.org/D26802

llvm-svn: 288804
2016-12-06 12:24:14 +00:00
Simon Pilgrim ae63dd10f8 [X86] Add tests to show missed opportunities to calculate knownbits in SMAX/SMIN/UMAX/UMIN
llvm-svn: 288801
2016-12-06 12:12:20 +00:00
Florian Hahn 7582c669bd [framelowering] Improve tracking of first CS pop instruction.
Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which previously affected the generated code.

Reviewers: mkuper, aprantl, MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27343

llvm-svn: 288794
2016-12-06 10:24:55 +00:00
Craig Topper b34eef7b41 [X86] Remove another weird scalar sqrt/rcp/rsqrt pattern.
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended, since that's what the operation asked for. This matters particularly when the upper bits are 0: in that case we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper bits. This implies we should still be using the packed instruction.

The only test case for this pattern is one I just added, so there was no prior coverage of it.

llvm-svn: 288784
2016-12-06 08:08:12 +00:00
Craig Topper 26ce4267ef [X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case where the upper bits come from undef this is ok. I believe a (vzmovl load64) would do the same thing, but those seem to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits, which is not equivalent to using a scalar instruction.

I will remove this pattern in a follow-up patch. There appears to be no other test coverage for it.

llvm-svn: 288783
2016-12-06 08:08:09 +00:00
Craig Topper aa2c38378c [X86] Regenerate a test using update_llc_test_checks.py
llvm-svn: 288782
2016-12-06 08:08:07 +00:00
Craig Topper 683470bf1b [X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to give to the instruction, since the memory form of the instruction only reads 32 or 64 bits.

llvm-svn: 288781
2016-12-06 08:08:04 +00:00
Craig Topper 125939ff65 [X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics.
The sqrtsd instruction only loads 64 bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument, which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd.

llvm-svn: 288780
2016-12-06 08:08:01 +00:00
Craig Topper 5fc7bc91f9 [X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.
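A two-lane Python model of the operand mismatch (lane 0 is bits 63:0, lane 1 is bits 127:64). The function names are hypothetical. Feeding the intrinsic's single input to both operands reproduces its semantics, while an undefined first operand leaks into the upper lane.

```python
import math

def sqrtsd_reg(src1, src2):
    """Register-form sqrtsd: lane 0 = sqrt(src2[0]), lane 1 copied from src1."""
    return (math.sqrt(src2[0]), src1[1])

def sqrt_sd_intrinsic(v):
    """One-argument intrinsic: sqrt of lane 0, lane 1 passed through."""
    return (math.sqrt(v[0]), v[1])

v = (16.0, 3.0)
garbage = (0.0, 99.0)  # what an IMPLICIT_DEF first operand might contain

print(sqrtsd_reg(v, v) == sqrt_sd_intrinsic(v))        # True: same register for both operands
print(sqrtsd_reg(garbage, v) == sqrt_sd_intrinsic(v))  # False: upper lane is wrong
```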

I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.

llvm-svn: 288779
2016-12-06 08:07:58 +00:00
Craig Topper 6413f8a8f2 [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.

I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.

I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.

Reviewers: spatel, delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27401

llvm-svn: 288771
2016-12-06 04:58:39 +00:00
Michael Kuperstein e3036abcf9 [X86] Fix non-intrinsic roundss/roundsd to not read the destination register
This changes the scalar non-intrinsic non-avx roundss/sd instruction
definitions not to read their destination register - allowing partial dependency
breaking.

This fixes PR31143.

Differential Revision: https://reviews.llvm.org/D27323

llvm-svn: 288703
2016-12-05 20:57:37 +00:00
Adrian Prantl 941fa7588b [DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation
so we can stop using DW_OP_bit_piece with the wrong semantics.

The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html

The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top of the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.

Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using a custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
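A minimal sketch of the trailing-operator encoding, modelling an expression as a Python list and assuming the operator is followed by the offset and then the size, both in bits. The helper names are hypothetical. Expressions without a fragment carry no extra cost.

```python
FRAGMENT = "DW_OP_LLVM_fragment"

def with_fragment(expr, offset_bits, size_bits):
    """Append fragment info as trailing operands instead of a dedicated field."""
    return expr + [FRAGMENT, offset_bits, size_bits]

def get_fragment(expr):
    """Return (offset_bits, size_bits) if the expression ends in a fragment."""
    if len(expr) >= 3 and expr[-3] == FRAGMENT:
        return (expr[-2], expr[-1])
    return None

lo_half = with_fragment([], 0, 32)                # low 32 bits of a 64-bit variable
hi_half = with_fragment(["DW_OP_deref"], 32, 32)  # high 32 bits, behind a deref
print(get_fragment(lo_half), get_fragment(hi_half), get_fragment(["DW_OP_deref"]))
# (0, 32) (32, 32) None
```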

Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809

llvm-svn: 288683
2016-12-05 18:04:47 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Sanjay Patel f807f6a05f [x86] fold fand (fxor X, -1) Y --> fandn X, Y
I noticed this gap in the scalar FP-logic matching with:
D26712
and:
rL287171
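A bit-level check of the fold on IEEE-754 double bit patterns, assuming fxor/fand/fandn act bitwise on the 64-bit encodings. The helpers are hypothetical Python stand-ins.

```python
import struct

MASK64 = (1 << 64) - 1  # the all-ones pattern, i.e. -1 as a 64-bit value

def bits(f):
    """Raw 64-bit encoding of a double."""
    return struct.unpack("<Q", struct.pack("<d", f))[0]

def fxor(x, y):
    return (x ^ y) & MASK64

def fand(x, y):
    return x & y & MASK64

def fandn(x, y):
    """(NOT x) AND y, the operation andnpd performs."""
    return ~x & y & MASK64

for a, b in [(1.5, -2.0), (0.0, 3.25), (-0.0, 1e300)]:
    X, Y = bits(a), bits(b)
    assert fand(fxor(X, MASK64), Y) == fandn(X, Y)  # fand (fxor X, -1) Y == fandn X, Y
```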

Differential Revision: https://reviews.llvm.org/D27385

llvm-svn: 288675
2016-12-05 15:45:27 +00:00
Simon Pilgrim b08c98f125 [X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH.
llvm-svn: 288663
2016-12-05 11:25:13 +00:00
Craig Topper db8467ae26 [AVX-512] Teach fast isel to handle 512-bit vector bitcasts.
llvm-svn: 288641
2016-12-05 05:50:51 +00:00
Craig Topper 7ef6ea324a [AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
llvm-svn: 288636
2016-12-05 04:51:31 +00:00
Craig Topper 227d4279a8 [AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this.

llvm-svn: 288635
2016-12-05 04:51:28 +00:00
Simon Pilgrim 6133fc3aa2 [X86][XOP] Add target shuffle tests showing missing UNPCKL combine.
llvm-svn: 288628
2016-12-04 22:55:57 +00:00
Simon Pilgrim 38d245197e [X86][AVX512] Add target shuffle tests showing missing UNPCK combines.
llvm-svn: 288627
2016-12-04 22:54:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out of bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.
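A Python model of the fold, with None standing in for an UNDEF node. The function name is hypothetical.

```python
UNDEF = None  # stand-in for an UNDEF node

def fold_insert_vector_elt(vec, elt, idx):
    """Constant-index insert: an index past the end folds to UNDEF."""
    if idx >= len(vec):
        return UNDEF
    result = list(vec)
    result[idx] = elt
    return result

print(fold_insert_vector_elt([1, 2, 3, 4], 9, 2))  # [1, 2, 9, 4]
print(fold_insert_vector_elt([1, 2, 3, 4], 9, 7))  # None
```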

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Craig Topper 9d16bfa0f5 [AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table.
llvm-svn: 288591
2016-12-03 19:37:39 +00:00
Craig Topper c210827b53 [AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables.
llvm-svn: 288587
2016-12-03 17:19:15 +00:00
Craig Topper 8e7498976a [X86] Fix VEX encoded VPMADDUBSW to not be marked commutable.
This was accidentally broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241.

llvm-svn: 288578
2016-12-03 05:35:44 +00:00
Craig Topper da73a09fcd [X86] Add test cases demonstrating where we incorrectly commute VEX VPMADDUBSW due to a bug introduced in r285515.
I believe this is the cause of PR31241.

llvm-svn: 288577
2016-12-03 05:35:38 +00:00
Sanjay Patel a5dbdf342b [x86] add common check prefix to reduce duplication; NFC
llvm-svn: 288522
2016-12-02 17:58:26 +00:00
Sanjay Patel c731187732 fix check-label
llvm-svn: 288517
2016-12-02 17:50:14 +00:00
Sanjay Patel 91d1ed5ee6 [x86] add tests to show missing demanded bits analysis; NFC
llvm-svn: 288515
2016-12-02 17:48:48 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.
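The inf/nan case can be checked directly with IEEE-754 doubles in Python, where 0.0 * inf evaluates to nan rather than raising.

```python
import math

x, y = 0.0, math.inf
original = (x + 1.0) * y   # (fmul (fadd X, 1), Y): 1 * inf = inf
transformed = x * y + y    # (fmad X, Y, Y): 0 * inf + inf = nan + inf = nan
print(original, transformed)  # inf nan
```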

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

Here "=r" denotes rounding to two bits of mantissa. The example relies on rounding towards zero at least in the second step.
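The rounding example above can be replayed exactly with Python's fractions module. round_toward_zero is a hypothetical helper that truncates a positive value to three significant bits (one leading bit plus two mantissa bits).

```python
from fractions import Fraction

def round_toward_zero(v, sig_bits=3):
    """Truncate a positive value to `sig_bits` significant binary digits."""
    two = Fraction(2)
    e = 0
    while v >= two ** (e + 1):   # find e with 2**e <= v < 2**(e+1)
        e += 1
    while v < two ** e:
        e -= 1
    ulp = two ** (e - sig_bits + 1)  # spacing of representable values near v
    return (v // ulp) * ulp          # floor equals truncation for positive v

x, y = Fraction(5, 4), Fraction(7)   # 1.01b and 111b
exact = round_toward_zero(x * round_toward_zero(y + 1))  # 1.01b * 1000b, no rounding needed
fmad = round_toward_zero(round_toward_zero(x * y) + x)   # rounds after x*y and after the add
print(exact, fmad)  # 10 8
```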

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Simon Pilgrim 3a19863f1c [X86][SSE] Renamed shuffle combine test.
We're trying to combine to vpunpckhbw not vpunpckhwd

llvm-svn: 288501
2016-12-02 14:43:39 +00:00