Commit Graph

8681 Commits

Author SHA1 Message Date
Craig Topper b34eef7b41 [X86] Remove another weird scalar sqrt/rcp/rsqrt pattern.
This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended since that's what the operation asked for. Particularly in the case where the upper bits are 0, in that case we need calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper-bits. This implies we should be using the packed instruction still.

The only test case for this pattern is one I just added so there was no coverage of this.

llvm-svn: 288784
2016-12-06 08:08:12 +00:00
Craig Topper 26ce4267ef [X86] Add test case demonstrating a case where a vector sqrt being passed (scalar_to_vector loadf64) uses a scalar sqrt instruction.
This occurs due to a pattern that uses sse_load_f32/f64 with vector sqrt/rcp/rsqrt operations and turns them into scalar instructions. Perhaps for the case were the upper bits come from undef this is ok.  I believe a (vzmovl load64) would do the same thing but those seems to become vzload instead and selectScalarSSELoad doesn't handle that today. In that case we should be performing the vector operation on the zeros in the upper bits which is not equivalent to using a scalar instruction.

I will remove this pattern in a follow up patch. There appears to be no other test content for it.

llvm-svn: 288783
2016-12-06 08:08:09 +00:00
Craig Topper aa2c38378c [X86] Regenerate a test using update_llc_test_checks.py
llvm-svn: 288782
2016-12-06 08:08:07 +00:00
Craig Topper 683470bf1b [X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic.
The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to give to the instruction since the memory form of the instruction only reads 32 or 64 bits.

llvm-svn: 288781
2016-12-06 08:08:04 +00:00
Craig Topper 125939ff65 [X86] Add test case that shows a scalar sqrtsd intrinsic of a 128-bit vector load using the load form of the sqrtsd instruction which violates the intrinsic semantics.
The sqrtsd instruction only loads 64-bits and writes bits 63:0 with the sqrt result. Bits 127:64 are preserved in the destination register. The semantics of the intrinsic indicate bits 127:64 should come from the intrinsic argument which in this case is a 128-bit load. So the generated code should have a 128-bit load and use a register form of sqrtsd.

llvm-svn: 288780
2016-12-06 08:08:01 +00:00
Craig Topper 5fc7bc91f9 [X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined.
The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands.

I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded.

llvm-svn: 288779
2016-12-06 08:07:58 +00:00
Craig Topper 6413f8a8f2 [X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead
Summary:
This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently.

I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel.

I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available.

Reviewers: spatel, delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27401

llvm-svn: 288771
2016-12-06 04:58:39 +00:00
Michael Kuperstein e3036abcf9 [X86] Fix non-intrinsic roundss/roundsd to not read the destination register
This changes the scalar non-intrinsic non-avx roundss/sd instruction
definitions not to read their destination register - allowing partial dependency
breaking.

This fixes PR31143.

Differential Revision: https://reviews.llvm.org/D27323

llvm-svn: 288703
2016-12-05 20:57:37 +00:00
Adrian Prantl 941fa7588b [DIExpression] Introduce a dedicated DW_OP_LLVM_fragment operation
so we can stop using DW_OP_bit_piece with the wrong semantics.

The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html

The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.

Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using an custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.

Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809

llvm-svn: 288683
2016-12-05 18:04:47 +00:00
Sanjay Patel 1f158d6955 [TargetLowering] add special-case for demanded bits analysis of 'not'
We treat bitwise 'not' as a special operation and try not to reduce its all-ones mask. 
Presumably, this is because a 'not' may be cheaper than a generic 'xor' or it may get
folded into another logic op if the target has those. However, if we can remove a logic
instruction by changing the xor's constant mask value, that should always be a win.

Note that the IR version of SimplifyDemandedBits() does not treat 'not' as a special-case
currently (although that's marked with a FIXME). So if you run this IR through -instcombine,
you should get the same end result. I'm hoping to add a different backend transform that 
will expose this problem though, so I need to solve this first.

Differential Revision: https://reviews.llvm.org/D27356

llvm-svn: 288676
2016-12-05 15:58:21 +00:00
Sanjay Patel f807f6a05f [x86] fold fand (fxor X, -1) Y --> fandn X, Y
I noticed this gap in the scalar FP-logic matching with:
D26712
and:
rL287171

Differential Revision: https://reviews.llvm.org/D27385

llvm-svn: 288675
2016-12-05 15:45:27 +00:00
Simon Pilgrim b08c98f125 [X86][SSE] Add support for combining target shuffles to UNPCKL/UNPCKH.
llvm-svn: 288663
2016-12-05 11:25:13 +00:00
Craig Topper db8467ae26 [AVX-512] Teach fast isel to handle 512-bit vector bitcasts.
llvm-svn: 288641
2016-12-05 05:50:51 +00:00
Craig Topper 7ef6ea324a [AVX-512] Teach fast isel to use masked compare and movss for handling scalar cmp and select sequence when AVX-512 is enabled. This matches the behavior of normal isel.
llvm-svn: 288636
2016-12-05 04:51:31 +00:00
Craig Topper 227d4279a8 [AVX-512] Add avx512f command lines to fast isel SSE select test.
Currently the fast isel code emits an avx1 instruction sequence even with avx512. This is different than normal isel. A follow up commit will fix this.

llvm-svn: 288635
2016-12-05 04:51:28 +00:00
Simon Pilgrim 6133fc3aa2 [X86][XOP] Add target shuffle tests showing missing UNPCKL combine.
llvm-svn: 288628
2016-12-04 22:55:57 +00:00
Simon Pilgrim 38d245197e [X86][AVX512] Add target shuffle tests showing missing UNPCK combines.
llvm-svn: 288627
2016-12-04 22:54:21 +00:00
Matt Arsenault 92fede361f DAG: Fold out out of bounds insert_vector_elt
getNode already prevents formation of out of bounds constant
extract_vector_elts. Do the same for insert_vector_elt.

llvm-svn: 288603
2016-12-03 23:03:26 +00:00
Craig Topper 9d16bfa0f5 [AVX-512] Add many of the VPERM instructions to the load folding table. Move VPERMPDZri to the correct table.
llvm-svn: 288591
2016-12-03 19:37:39 +00:00
Craig Topper c210827b53 [AVX-512] Add EVEX VPMADDUBSW and VPMADDWD to the load folding tables.
llvm-svn: 288587
2016-12-03 17:19:15 +00:00
Craig Topper 8e7498976a [X86] Fix VEX encoded VPMADDUBSW to not be marked commutable.
This was accidentallly broken in r285515 when we started lowering the intrinsic to an ISD node. Should fix PR31241.

llvm-svn: 288578
2016-12-03 05:35:44 +00:00
Craig Topper da73a09fcd [X86] Add test cases demonstrating where we incorrectly commute VEX VPMADDUSBW due to a bug introduced in r285515.
I believe this is the cause of PR31241.

llvm-svn: 288577
2016-12-03 05:35:38 +00:00
Sanjay Patel a5dbdf342b [x86] add common check prefix to reduce duplication; NFC
llvm-svn: 288522
2016-12-02 17:58:26 +00:00
Sanjay Patel c731187732 fix check-label
llvm-svn: 288517
2016-12-02 17:50:14 +00:00
Sanjay Patel 91d1ed5ee6 [x86] add tests to show missing demanded bits analysis; NFC
llvm-svn: 288515
2016-12-02 17:48:48 +00:00
Nicolai Haehnle 33ca182c91 [DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.

Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:

  x = 1.01
  y = 111

  x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)

  x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)

The example relies on rounding towards zero at least in the second step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578

Reviewers: RKSimon, tstellarAMD, spatel, arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26602

llvm-svn: 288506
2016-12-02 16:06:18 +00:00
Simon Pilgrim 3a19863f1c [X86][SSE] Renamed shuffle combine test.
We're trying to combine to vpunpckhbw not vpunpckhwd

llvm-svn: 288501
2016-12-02 14:43:39 +00:00
Simon Pilgrim cbf5f97018 [X86][SSE] Add support for extracting constant bit data from broadcasted constants
llvm-svn: 288499
2016-12-02 13:16:08 +00:00
Craig Topper 4961fa9bba [AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables.
llvm-svn: 288484
2016-12-02 07:57:11 +00:00
Craig Topper 17ddb521ef [AVX-512] Add EVEX PSHUFB instructions to load folding tables.
llvm-svn: 288482
2016-12-02 07:06:30 +00:00
Craig Topper f7866fad54 [AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables.
llvm-svn: 288481
2016-12-02 06:24:38 +00:00
Chih-Hung Hsieh 76b913c470 [SelectionDAG] getRawSubclassData should not return HasDebugValue.
This change fixes a regression in r279537 and
makes getRawSubclassData behave like r279536.
Without this change, the fp128-g.ll test case will have an
infinite loop involving SoftenFloatRes_LOAD.

Differential Revision: http://reviews.llvm.org/D26942

llvm-svn: 288420
2016-12-01 21:56:33 +00:00
Simon Pilgrim 5fe6236035 [X86][SSE] Classify AND bitmasks as variable shuffle masks
They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask.

llvm-svn: 288367
2016-12-01 16:00:14 +00:00
Simon Pilgrim 1e4d870999 [X86][SSE] Add support for combining AND bitmasks to shuffles.
llvm-svn: 288365
2016-12-01 15:41:40 +00:00
Simon Pilgrim 2cff28dd27 [X86][SSE] Tidied up filecheck prefixes for uitofp fast-math tests.
They should be in 'narrowing' order from common to more specific test prefixes.

llvm-svn: 288338
2016-12-01 14:56:48 +00:00
Simon Pilgrim 55066e5622 [X86][SSE] Add support for combining target shuffles to AND bitmasks.
llvm-svn: 288335
2016-12-01 13:47:02 +00:00
Simon Pilgrim 947650e99d [X86][SSE] Add support for combining ISD::AND with shuffles.
Attempts to convert an AND with a vector of 255 or 0 values into a shuffle (blend) mask.

llvm-svn: 288333
2016-12-01 11:52:37 +00:00
Simon Pilgrim ed4ede0c29 [X86][SSE] Added tests showing missed combines of shuffles with ANDs.
llvm-svn: 288330
2016-12-01 11:26:07 +00:00
Kostya Serebryany b66cb88c2e revert r288283 as it causes debug info (line numbers) to be lost in instrumented code. also revert r288299 which was a workaround for the problem.
llvm-svn: 288300
2016-12-01 02:06:56 +00:00
Matthias Braun 39c3c89cdc MCStreamer: Use "cfi" for CFI related temp labels.
Choosing a "cfi" name makes the intend a bit clearer in an assembly dump
and more importantly the assembly dumps are slightly more stable as the
numbers don't move around anymore when unrelated code calls
createTempSymbol() more or less often.
As they are temp labels the name doesn't influence the generated object
code.

Differential Revision: https://reviews.llvm.org/D27244

llvm-svn: 288290
2016-11-30 23:48:26 +00:00
Paul Robinson 37a13ddb4b Recommit r288212: Emit 'no line' information for interesting 'orphan' instructions.
The LLDB tests are now ready for this patch.

DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 288283
2016-11-30 22:49:55 +00:00
Simon Pilgrim 4ae3834792 [X86][SSE] Added tests showing missed combines of ANDs with shuffles.
llvm-svn: 288259
2016-11-30 18:15:10 +00:00
Simon Pilgrim 288c088c17 [X86][SSE] Add support for target shuffle constant folding
Initial support for target shuffle constant folding in cases where all shuffle inputs are constant. We may be able to relax this and merge shuffles with only some constant inputs in the future.

I've added the helper function getTargetConstantBitsFromNode (based off a similar function in X86ShuffleDecodeConstantPool.cpp) that could be reused for other cases requiring constant vector extraction.

Differential Revision: https://reviews.llvm.org/D27220

llvm-svn: 288250
2016-11-30 16:33:46 +00:00
Simon Pilgrim 6c38b7548d Updated test with -verify-machineinstrs to check for PR21931
llvm-svn: 288242
2016-11-30 13:21:12 +00:00
Simon Pilgrim b099d16516 [X86][SSE] Add tests demonstrating missed opportunities to combine 64-bit element unpacks with horizontal pair ops.
llvm-svn: 288240
2016-11-30 11:30:33 +00:00
Paul Robinson 957ba405e8 Revert r288212 due to lldb failure.
llvm-svn: 288216
2016-11-29 23:20:35 +00:00
Paul Robinson 96de8c778b Emit 'no line' information for interesting 'orphan' instructions.
DWARF specifies that "line 0" really means "no appropriate source
location" in the line table.  Use this for branch targets and some
other cases that have no specified source location, to prevent
inheriting unfortunate line numbers from physically preceding
instructions (which might be from completely unrelated source).

Differential Revision: http://reviews.llvm.org/D24180

llvm-svn: 288212
2016-11-29 22:41:16 +00:00
Simon Pilgrim 17062a2bf6 [X86][AVX512VL] Improved testing of vcvtpd2ps, vcvtpd2dq/vcvtpd2udq and vcvttpd2dq/vcvttpd2udq implicit zeroing of upper 64-bits of xmm result
Ensure that masked instruction doesn't assume implicit zeroing.

llvm-svn: 288211
2016-11-29 22:38:30 +00:00
Simon Pilgrim 849473ae99 [X86][AVX512DQVL] Improved testing of vcvtqq2ps/vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
Ensure that masked instruction doesn't assume implicit zeroing.

llvm-svn: 288209
2016-11-29 22:36:28 +00:00
Simon Pilgrim 35c47c494d [X86][SSE] Add initial support for combining target shuffles to (V)PMOVZX.
We can only handle 128-bit vectors until we support target shuffle inputs of different size to the output.

llvm-svn: 288140
2016-11-29 14:18:51 +00:00
Simon Pilgrim c17fb85090 [X86][SSE] Added tests showing missed combines to (V)PMOVZX
llvm-svn: 288136
2016-11-29 13:16:11 +00:00
Reid Kleckner c68a6c4ca9 Recognize ${:uid} escapes in intel syntax inline asm
It looks like this logic was duplicated long ago and the GCC side of
things has grown additional functionality. We need ${:uid} at least to
generate unique MS inline asm labels (PR23715), so expose these.

llvm-svn: 288092
2016-11-29 00:29:27 +00:00
Joerg Sonnenberger caaa82d90d Revert r287553: [CodeGenPrep] Skip merging empty case blocks
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").

llvm-svn: 288052
2016-11-28 18:56:54 +00:00
Simon Pilgrim 2228f70a85 [X86][SSE] Add initial support for combining (V)PMOVZX with shuffles.
llvm-svn: 288049
2016-11-28 17:58:19 +00:00
Simon Pilgrim 3f10e66981 [X86][SSE] Added support for combining bit-shifts with shuffles.
Bit-shifts by a whole number of bytes can be represented as a shuffle mask suitable for combining.

Added a 'getFauxShuffleMask' function to allow us to create shuffle masks from other suitable operations.

llvm-svn: 288040
2016-11-28 16:25:01 +00:00
Simon Pilgrim 3def9e11e2 [X86][SSE] Added tests showing missed combines of shifts with shuffles.
llvm-svn: 288037
2016-11-28 15:50:39 +00:00
Craig Topper ff9d45875a [X86][FMA4] Add load folding support for FMA4 scalar intrinsic instructions.
llvm-svn: 288009
2016-11-27 21:37:00 +00:00
Craig Topper b00872b983 [X86][FMA4] Add test cases to demonstrate missed folding opportunities for FMA4 scalar intrinsics.
llvm-svn: 288008
2016-11-27 21:36:58 +00:00
Simon Pilgrim 91d6f5fbc1 [X86][SSE] Add support for combining target shuffles to 128/256-bit PSLL/PSRL bit shifts
llvm-svn: 288006
2016-11-27 21:08:19 +00:00
Craig Topper 4fab487265 [AVX-512] Add integer and fp unpck instructions to load folding tables.
llvm-svn: 288004
2016-11-27 19:51:41 +00:00
Simon Pilgrim 4571157d2d [X86][SSE] Added tests showing missed combines for shuffle to shifts.
llvm-svn: 288000
2016-11-27 18:25:02 +00:00
Craig Topper c3b3926f8b [AVX-512] Add masked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287995
2016-11-27 08:55:31 +00:00
Craig Topper 991d1ca3ba [X86] Add a hasOneUse check to selectScalarSSELoad to keep the same load from being folded multiple times.
Summary: When selectScalarSSELoad is looking for a scalar_to_vector of a scalar load, it makes sure the load is only used by the scalar_to_vector. But it doesn't make sure the scalar_to_vector is only used once. This can cause the same load to be folded multiple times. This can be bad for performance. This also causes the chain output to be duplicated, but not connected to anything so chain dependencies will not be satisfied.

Reviewers: RKSimon, zvi, delena, spatel

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D26790

llvm-svn: 287983
2016-11-26 17:29:25 +00:00
Craig Topper 10d5eec1a1 [AVX-512] Add unmasked EVEX vpmovzx/sx instructions to load folding tables.
llvm-svn: 287975
2016-11-26 08:21:52 +00:00
Craig Topper 97169ea5f9 [AVX-512] Add masked 128/256-bit integer add/sub instructions to load folding tables.
llvm-svn: 287974
2016-11-26 08:21:48 +00:00
Craig Topper 53b33de1e3 [AVX-512] Add masked 512-bit integer add/sub instructions to load folding tables.
llvm-svn: 287972
2016-11-26 07:21:00 +00:00
Craig Topper 6677bb4e50 [AVX-512] Teach LowerFormalArguments to use the extended register class when available. Fix the avx512vl stack folding tests to clobber more registers or otherwise they use xmm16 after this change.
llvm-svn: 287971
2016-11-26 07:20:57 +00:00
Craig Topper 39265bb1ce [AVX-512] Add VLX versions of VDIVPD/PS and VMULPD/PS to load folding tables.
llvm-svn: 287970
2016-11-26 07:20:53 +00:00
Craig Topper 88071b37ab [AVX-512] Add support for changing VSHUFF64x2 to VSHUFF32x4 when its feeding a vselect with 32-bit element size.
Summary:
Shuffle lowering may have widened the element size of a i32 shuffle to i64 before selecting X86ISD::SHUF128. If this shuffle was used by a vselect this can prevent us from selecting masked operations.

This patch detects this and changes the element size to match the vselect.

I don't handle changing integer to floating point or vice versa as its not clear if its better to push such a bitcast to the inputs of the shuffle or to the user of the vselect. So I'm ignoring that case for now.

Reviewers: delena, zvi, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D27087

llvm-svn: 287939
2016-11-25 16:48:05 +00:00
Craig Topper 1e48829747 [AVX-512] Add VPERMT2* and VPERMI2* instructions to load folding tables.
llvm-svn: 287937
2016-11-25 16:33:53 +00:00
Simon Pilgrim 84b6f26eca [X86][SSE] Added knownbits through bitcast test
llvm-svn: 287928
2016-11-25 15:07:15 +00:00
Simon Pilgrim 8424b03d00 [X86][SSE] Added v16i8 shuffle test case from PR31151
llvm-svn: 287919
2016-11-25 11:10:43 +00:00
Craig Topper d621e3a25b [X86] Modify two tests that passed undef to both sides of a vselect to instead pass unique values.
I'd like to teach DAG combine to remove vselects where both sides are identical and these tests were in the way of that.

llvm-svn: 287903
2016-11-24 21:48:50 +00:00
Craig Topper 00758090ca [AVX-512] Add tests demonstrating failure to generated masked instructions for VSHUFF32x4 and VSHUFI32x4 due to shuffle lowering widening elements.
llvm-svn: 287897
2016-11-24 18:24:46 +00:00
Simon Pilgrim 9c71e07276 [X86][SSE] Improve UINT_TO_FP v2i32 -> v2f64
Vectorize UINT_TO_FP v2i32 -> v2f64 instead of scalarization (albeit still on the SIMD unit).

The codegen matches that generated by legalization (and is in fact used by AVX for UINT_TO_FP v4i32 -> v4f64), but has to be done in the x86 backend to account for legalization via 4i32.

Differential Revision: https://reviews.llvm.org/D26938

llvm-svn: 287886
2016-11-24 15:12:56 +00:00
Simon Pilgrim 841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim 7c26a6f9ef [X86][AVX512DQVL] Add awareness of vcvtqq2ps and vcvtuqq2ps implicit zeroing of upper 64-bits of xmm result
llvm-svn: 287878
2016-11-24 14:02:30 +00:00
Simon Pilgrim ab323ec411 [X86][AVX512DQVL] Add support for v2i64 -> v2f32 SINT_TO_FP/UINT_TO_FP lowering
llvm-svn: 287877
2016-11-24 13:38:59 +00:00
Simon Pilgrim 1e7a846d5f [X86][AVX512DQVL] Add v2i64 -> v2f32 + zero codegen tests
llvm-svn: 287876
2016-11-24 13:26:51 +00:00
Nikolai Bozhenov 3a8d108b2b [x86] Fixing PR28755 by precomputing the address used in CMPXCHG8B
The bug arises during register allocation on i686 for
CMPXCHG8B instruction when base pointer is needed. CMPXCHG8B
needs 4 implicit registers (EAX, EBX, ECX, EDX) and a memory address,
plus ESI is reserved as the base pointer. With such constraints the only
way register allocator would do its job successfully is when the addressing
mode of the instruction requires only one register. If that is not the case
- we are emitting additional LEA instruction to compute the address.

It fixes PR28755.

Patch by Alexander Ivchenko <alexander.ivchenko@intel.com>

Differential Revision: https://reviews.llvm.org/D25088

llvm-svn: 287875
2016-11-24 13:23:35 +00:00
Craig Topper f23b995f78 [AVX-512] Fix some mask shuffle tests to actually test the case they were supposed to test.
llvm-svn: 287854
2016-11-24 05:36:50 +00:00
Craig Topper 993c7416d3 [AVX-512] Move a 16 x float shuffle test to the v16 test file and add an integer variant.
llvm-svn: 287853
2016-11-24 05:36:47 +00:00
Simon Pilgrim 3ce6a545c7 [X86][SSE] Add awareness of (v)cvtpd2dq and vcvtpd2udq implicit zeroing of upper 64-bits of xmm result
We've already added the equivalent for (v)cvttpd2dq (rL284459) and vcvttpd2udq

llvm-svn: 287835
2016-11-23 22:35:06 +00:00
Simon Pilgrim eda1193456 [X86][AVX512VL] Add v2f64 -> v2i32/v2f32 + zero codegen tests
llvm-svn: 287821
2016-11-23 22:01:50 +00:00
Simon Pilgrim 9234ff26d9 [X86][SSE] Add v2i64 -> v2i32 + zero codegen test
llvm-svn: 287813
2016-11-23 21:19:57 +00:00
Michael Kuperstein 47eb85a003 [X86] Allow folding of stack reloads when loading a subreg of the spilled reg
We did not support subregs in InlineSpiller:foldMemoryOperand() because targets
may not deal with them correctly.

This adds a target hook to let the spiller know that a target can handle
subregs, and actually enables it for x86 for the case of stack slot reloads.
This fixes PR30832.

Differential Revision: https://reviews.llvm.org/D26521

llvm-svn: 287792
2016-11-23 18:33:49 +00:00
John Brawn 150addb45c [DAGCombiner] Fix infinite loop in vector mul/shl combining
We have the following DAGCombiner transformations:
 (mul (shl X, c1), c2) -> (mul X, c2 << c1)
 (mul (shl X, C), Y) -> (shl (mul X, Y), C)
 (shl (mul x, c1), c2) -> (mul x, c1 << c2)
Usually the constant shift is optimised by SelectionDAG::getNode when it is
constructed, by SelectionDAG::FoldConstantArithmetic, but when we're dealing
with vectors and one of those vector constants contains an undef element
FoldConstantArithmetic does not fold and we enter an infinite loop.

Fix this by making FoldConstantArithmetic use getNode to decide how to fold each
vector element, the same as FoldConstantVectorArithmetic does, and rather than
adding the constant shift to the work list instead only apply the transformation
if it's already been folded into a constant, as if it's not we're going to loop
endlessly. Additionally add missing NoOpaques to one of those transformations,
which I noticed when writing the tests for this.

Differential Revision: https://reviews.llvm.org/D26605

llvm-svn: 287766
2016-11-23 16:05:51 +00:00
Simon Pilgrim 4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Elena Demikhovsky 09375d98b8 Type legalization for compressstore and expandload intrinsics.
Implemented widening (v2f32) and splitting (v16f64).
On splitting, I use "popcnt" to calculate memory increment. 
More type legalization work will come in the next patches.

llvm-svn: 287761
2016-11-23 13:58:24 +00:00
Craig Topper f57e17def0 [AVX-512] Remove intrinsics for valignd/q and autoupgrade them to native shuffles.
llvm-svn: 287744
2016-11-23 06:54:55 +00:00
Kuba Mracek 06995e866b [xray] Add XRay support for Mach-O in CodeGen
Currently, XRay only supports emitting the XRay table (xray_instr_map) on ELF binaries. Let's add Mach-O support.

Differential Revision: https://reviews.llvm.org/D26983

llvm-svn: 287734
2016-11-23 02:07:04 +00:00
Simon Pilgrim eda365cf80 [X86][AVX512DQ] Add fp <-> int tests for AVX512DQ/AVX512DQ+VL
llvm-svn: 287706
2016-11-22 22:04:50 +00:00
Simon Pilgrim 4aa876ca7c [X86][SSE] Combine UNPCKL(FHADD,FHADD) -> FHADD for v2f64 shuffles.
This occurs during UINT_TO_FP v2f64 lowering. 

We can easily generalize this to other horizontal ops (FHSUB, PACKSS, PACKUS) as required - we are doing something similar with PACKUS in lowerV2I64VectorShuffle

llvm-svn: 287676
2016-11-22 17:50:06 +00:00
Simon Pilgrim d70c03ad68 Fix line endings
llvm-svn: 287638
2016-11-22 13:27:29 +00:00
Simon Pilgrim 72e43570b7 [SelectionDAG] ComputeNumSignBits of TRUNCATE operations
Add basic ComputeNumSignBits support for TRUNCATE ops for cases where the source's number of sign bits overlaps with the truncated size.

Improves X86 SIGN_EXTEND_IN_REG vector cases which were needlessly sign extending boolean vector results.

Differential Revision: https://reviews.llvm.org/D26851

llvm-svn: 287635
2016-11-22 11:29:19 +00:00
Craig Topper cada9f2275 [AVX-512] Add support for commuting VPERMT2(B/W/D/Q/PS/PD) to/from VPERMI2(B/W/D/Q/PS/PD).
Summary:
The index and one of the table operands can be swapped by changing the opcode to the other version. Neither of these operands are the one that can load from memory so this can't be used to increase memory folding opportunities.

We need to handle the unmasked forms and the kz forms. Since the load operand isn't being commuted we can commute the load and broadcast instructions too.

Reviewers: igorb, delena, Ayal, Farhana, RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D25652

llvm-svn: 287621
2016-11-22 04:57:34 +00:00
Craig Topper da22267055 [AVX-512] Add support for changing the element size of PALIGNR/VALIGND/VALIGNQ shuffles if they feed a vselect with a different type
Summary:
Shuffle lowering widens the element size of a shuffle if elements are contiguous. This is sometimes help because wider element types have more shuffle options. If the shuffle is one of the arguments to a vselect this shuffle widening can introduce a bitcast between the vselect and the shuffle. This will prevent isel from selecting a masked operation. If the shuffle can be written equally efficiently with a different element size to match the vselect type we should change the shuffle type to allow masking.

This patch does this conversion for all VALIGND/VALIGNQ sizes. It also supports turning 128-bit PALIGNR into VALIGND/VALIGNQ. This fixes the case shown in PR31018.

I plan to add support for more operations in future patches.

Reviewers: RKSimon, zvi, delena

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26902

llvm-svn: 287612
2016-11-22 03:51:53 +00:00
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
Simon Pilgrim 6704059a0d [X86][SSE] Add SSE reciprocal estimate tests
llvm-svn: 287543
2016-11-21 15:28:21 +00:00
Simon Pilgrim 49d7eda968 [SelectionDAG] Add ComputeNumSignBits support for CONCAT_VECTORS opcode
llvm-svn: 287541
2016-11-21 14:36:19 +00:00