Commit Graph

11958 Commits

Author SHA1 Message Date
Roman Lebedev 488d28d4e5 [X86] Emit BZHI when mask is ~(-1 << nbits))
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.

Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: craig.topper, spatel, RKSimon, javed.absar

Reviewed By: craig.topper

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D47453

llvm-svn: 334125
2018-06-06 19:38:16 +00:00
Roman Lebedev cb56f7a550 [NFC][X86][AArch64] Reorganize/cleanup BZHI test patterns
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

It would seem that there is too much tests, but this is most of all the auto-generated possible variants
of C code that one would expect for BZHI to be formed, and then manually cleaned up a bit.
So this should be pretty representable, which somewhat good coverage...

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: javed.absar, craig.topper, RKSimon, spatel

Reviewed By: RKSimon

Subscribers: kristof.beyls, llvm-commits, RKSimon, craig.topper, spatel

Differential Revision: https://reviews.llvm.org/D47452

llvm-svn: 334124
2018-06-06 19:38:10 +00:00
Michael Berg cc1c4b6912 guard fsqrt with fmf sub flags
Summary:
This change uses fmf subflags to guard optimizations as well as unsafe. These changes originated from D46483.
It contains only context for fsqrt.


Reviewers: spatel, hfinkel, arsenm

Reviewed By: spatel

Subscribers: hfinkel, wdng, andrew.w.kaylor, wristow, efriedma, nemanjai

Differential Revision: https://reviews.llvm.org/D47749

llvm-svn: 334113
2018-06-06 18:47:55 +00:00
Simon Pilgrim 3d14158891 [X86][BMI][TBM] Only demand bottom 16-bits of the BEXTR control op (PR34042)
Only the bottom 16-bits of BEXTR's control op are required (0:8 INDEX, 15:8 LENGTH).

Differential Revision: https://reviews.llvm.org/D47690

llvm-svn: 334083
2018-06-06 10:52:10 +00:00
Sanjay Patel 59313be8d3 [CodeGen] assume max/default throughput for unspecified instructions
This is a fix for the problem arising in D47374 (PR37678):
https://bugs.llvm.org/show_bug.cgi?id=37678

We may not have throughput info because it's not specified in the model 
or it's not available with variant scheduling, so assume that those
instructions can execute/complete at max-issue-width.

Differential Revision: https://reviews.llvm.org/D47723

llvm-svn: 334055
2018-06-05 23:34:45 +00:00
Guozhi Wei c4c6b548c5 [CodeGenPrepare] Move Extension Instructions Through Logical And Shift Instructions
CodeGenPrepare pass move extension instructions close to load instructions in different BB, so they can be combined later. But the extension instructions can't move through logical and shift instructions in current implementation. This patch enables this enhancement, so we can eliminate more extension instructions.

Differential Revision: https://reviews.llvm.org/D45537

This is re-commit of r331783, which was reverted by r333305. The performance regression was caused by some unlucky alignment, not a code generation problem.

llvm-svn: 334049
2018-06-05 21:03:52 +00:00
Simon Pilgrim f2f043acbb [X86][SSE] Use multiplication scale factors for v8i16 SHL on pre-AVX2 targets.
Similar to v4i32 SHL, convert v8i16 shift amounts to scale factors instead to improve performance and reduce instruction count. We were already doing this for constant shifts, this adds variable shift support.

Reduces the serial nature of the codegen, which relies on chains of plendvb/pand+pandn+por shifts.

This is a step towards adding support for vXi16 vector rotates.

Differential Revision: https://reviews.llvm.org/D47546

llvm-svn: 334023
2018-06-05 15:17:39 +00:00
Simon Pilgrim fef9b6eea6 [X86][SSE] Add target shuffle support to X86TargetLowering::computeKnownBitsForTargetNode
Ideally we'd use resolveTargetShuffleInputs to handle faux shuffles as well but:
(a) that code path doesn't handle general/pre-legalized ops/types very well.
(b) I'm concerned about the compute time as they recurse to calls to computeKnownBits/ComputeNumSignBits which would need depth limiting somehow.

llvm-svn: 334007
2018-06-05 10:52:29 +00:00
Simon Pilgrim 7bbe7a2920 [X86][SSE] Add basic PACKUS support to X86TargetLowering::computeKnownBitsForTargetNode
Helps improve analysis of saturation ops

llvm-svn: 333995
2018-06-05 09:45:03 +00:00
Alexander Ivchenko 964b27fa21 [X86][CET] Shadow stack fix for setjmp/longjmp
This is the new version of D46181, allowing setjmp/longjmp
to work correctly with the Intel CET shadow stack by storing
SSP on setjmp and fixing it on longjmp. The patch has been
updated to use the cf-protection-return module flag instead
of HasSHSTK, and the bug that caused D46181 to be reverted
has been fixed with the test expanded to track that fix.

patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D47311

llvm-svn: 333990
2018-06-05 09:22:30 +00:00
Craig Topper f17b33d6c6 [X86] Make all instructions that operate on MMX types, but were added after the initial MMX support via one of the SSE features flags make them require the MMX feature as well.
Passing -mattr=-mmx needs to disable these instructions since the MMX register class won't have been set up. But we don't want -mattr=-mmx to disable SSE so we have to do it separately.

llvm-svn: 333984
2018-06-05 06:20:06 +00:00
Vedant Kumar 800255f9f1 [Debugify] Don't insert debug values after terminating deopts
As is the case with musttail calls, the IR does not allow for
instructions inserted after a terminating deopt.

llvm-svn: 333976
2018-06-05 00:56:07 +00:00
Francis Visoiu Mistrih ca69b3bf6d [ShrinkWrap] Add optimization remarks to the shrink-wrapping pass
Start by emitting remarks for very basic unsupported cases such as
irreducible CFGs and EHFunclets. The end goal is to be able to cover all
the cases where we give up with an explanation.

llvm-svn: 333972
2018-06-05 00:27:24 +00:00
Amaury Sechet 800ac42573 Remove various use of undef in the X86 test suite as patern involving undef can collapse them. NFC
llvm-svn: 333961
2018-06-04 22:09:26 +00:00
Amaury Sechet e2729faf52 Revert "Regenerate expected test results for test/CodeGen/X86/pr23103.ll . NFC"
This reverts commit cf25dfc503c861845947f3e6a9d308811ebb9da3.

llvm-svn: 333960
2018-06-04 21:49:23 +00:00
Amaury Sechet f5db3a15bf Revert "Remove various use of undef in the X86 test suite as patern involving undef can collapse them. NFC"
This reverts commit f0e85c194ae5e87476bc767304470dec85b6774f.

llvm-svn: 333953
2018-06-04 21:20:45 +00:00
Alexander Ivchenko 2f038c4094 [X86][ELF][CET] Adding the .note.gnu.property ELF section in X86
In preparation for the proposed linker ABI changes
(https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf,
https://github.com/hjl-tools/x86-psABI/wiki/x86-64-psABI-cet.pdf),
this patch enables emission of the .note.gnu.property section to
ELF object files when building CET-enabled modules.

patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D47145

llvm-svn: 333951
2018-06-04 21:07:35 +00:00
Amaury Sechet 87f1a240ba Remove various use of undef in the X86 test suite as patern involving undef can collapse them. NFC
llvm-svn: 333950
2018-06-04 20:57:27 +00:00
Amaury Sechet 1910090328 Regenerate expected test results for test/CodeGen/X86/pr23103.ll . NFC
llvm-svn: 333949
2018-06-04 20:47:00 +00:00
Amaury Sechet da661e9236 [DAGcombine] Teach the combiner about -a = ~a + 1
Summary: This include variant for add, uaddo and addcarry. usubo and subcarry require the carry to be flipped to preserve semantic, but we chose to do the transform anyway in that case as to push the transform down the carry chain.

Reviewers: efriedma, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46505

llvm-svn: 333943
2018-06-04 19:23:22 +00:00
Andrea Di Biagio 39e5a5695f [RFC][patch 3/3] Add support for variant scheduling classes in llvm-mca.
This patch is the last of a sequence of three patches related to LLVM-dev RFC
"MC support for variant scheduling classes".
http://lists.llvm.org/pipermail/llvm-dev/2018-May/123181.html

This fixes PR36672.

The main goal of this patch is to teach llvm-mca how to solve variant scheduling
classes.  This patch does that, plus it adds new variant scheduling classes to
the BtVer2 scheduling model to identify so-called zero-idioms (i.e. so-called
dependency breaking instructions that are known to generate zero, and that are
optimized out in hardware at register renaming stage).

Without the BtVer2 change, this patch would not have had any meaningful tests.
This patch is effectively the union of two changes:
 1) a change that teaches llvm-mca how to resolve variant scheduling classes.
 2) a change to the BtVer2 scheduling model that allows us to special-case
    packed XOR zero-idioms (this partially fixes PR36671).

Differential Revision: https://reviews.llvm.org/D47374 

llvm-svn: 333909
2018-06-04 15:43:09 +00:00
Craig Topper 9923eac358 [X86] Remove and autoupgrade masked avx512vnni intrinsics using the unmasked intrinsics and select instructions.
llvm-svn: 333857
2018-06-03 23:24:17 +00:00
Vedant Kumar 77f4d4d8aa [Debugify] Skip dbg.value placement for EH pads, musttail
Placing meta-instructions into EH pads breaks certain IR invariants, as
does placing instructions after a musttail call.

llvm-svn: 333856
2018-06-03 22:50:22 +00:00
Simon Pilgrim 7c4446ce0c [X86][TBM] Use realistic BEXTR control bits
Avoid constant values that are guaranteed to give zero

Found while investigating BEXTR optimizations for PR34042.

llvm-svn: 333849
2018-06-03 18:15:06 +00:00
Simon Pilgrim 1f60e2b41b [X86][AVX512] Cleanup intrinsics tests
Ensure we test on 32-bit and 64-bit targets, and strip -mcpu usage.

Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible.

llvm-svn: 333843
2018-06-03 14:56:04 +00:00
Simon Pilgrim 7d717fed0b [X86][AVX512BW] Regenerate arithmetic tests using update_llc_test_checks.py script
Require manual stripping of existing CHECKs as update_llc_test_checks doesn't remove them if they're outside the function

llvm-svn: 333842
2018-06-03 14:31:30 +00:00
Simon Pilgrim e370ade180 [X86][BMI1] Test i32 intrinsics on 32/64 bits + branch off i64 tests
Further refactoring will wait until D47452 has landed.

Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible.

llvm-svn: 333841
2018-06-03 14:11:34 +00:00
Simon Pilgrim 8dc43621ec [X86][BMI] Remove CTTZ tests - this is fully covered in clz.ll
llvm-svn: 333840
2018-06-03 13:55:17 +00:00
Simon Pilgrim d4ef869e28 [X86][TBM] Branch off i32 intrinsics and test on 32/64 bits
Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible.

llvm-svn: 333839
2018-06-03 13:38:52 +00:00
Simon Pilgrim 2b55e751ce [X86][SSE] Cleanup AVX1 intrinsics tests
Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary, strip -mcpu usage.

llvm-svn: 333834
2018-06-02 21:35:48 +00:00
Simon Pilgrim 58ff2ecc4b [X86][SSE] Cleanup SSE1 intrinsics tests
Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary

llvm-svn: 333833
2018-06-02 20:25:56 +00:00
Simon Pilgrim 8790844848 [X86][SSE] Cleanup SSE2 intrinsics tests
Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary

llvm-svn: 333832
2018-06-02 19:43:14 +00:00
Simon Pilgrim 8c5b33a085 [X86][SSE] Cleanup SSE3/SSSE3 intrinsics tests
Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary

llvm-svn: 333831
2018-06-02 18:41:46 +00:00
Simon Pilgrim 1c0fa05397 [X86][SSE4] Tweak rL333828 sse41/sse42 cleanup to recover SKX/EVEX2VEX testing
Just testing for avx512f was missing the tests for EVEX TO VEX Compression encoding etc.

llvm-svn: 333830
2018-06-02 18:01:09 +00:00
Simon Pilgrim dda8daec73 [X86][SSE] Cleanup SSE4A/SSE41/SSE42 intrinsics tests
Ensure we cover 32/64-bit targets for SSE/AVX/AVX512 cases as necessary

Added some missing encoding checks to SSE4A tests

llvm-svn: 333828
2018-06-02 17:33:26 +00:00
Simon Pilgrim d93157c1b3 [X86][BMI2] Test i32 intrinsics on 32/64 bits + branch off i64 tests
I had to tweak the i32 tests so we check both reg-reg and reg-mem cases.

I also added i64 load tests.

Part of ongoing work to ensure we test all intrinsic style tests on 32 and 64 bit targets where possible.

llvm-svn: 333827
2018-06-02 17:22:13 +00:00
Simon Pilgrim 6028dc451a [X86][BMI1] Remove test for non-existent andn i16 instruction
llvm-svn: 333826
2018-06-02 17:02:27 +00:00
Craig Topper 3828ce7eab [X86] Do something sensible when an expand load intrinsic is passed a 0 mask.
Previously we just returned undef, but really we should be returning the pass thru input. We also need to make sure we preserve the chain output that the original intrinsic node had to maintain connectivity in the DAG. So we should just return the incoming chain as the output chain.

llvm-svn: 333804
2018-06-01 22:59:07 +00:00
Craig Topper aa747412b1 [X86] Add isel patterns to use vexpand with zero masking when the passthru value is a zero vector.
llvm-svn: 333800
2018-06-01 22:28:28 +00:00
Craig Topper c45479c08e [X86] Expand the testing of expand and compress intrinsics
The avx512f intrinsic tests were in the avx512vl file. We were also missing some combinations of masking.

This does show that we fail to use the zero masking form of expand loads when the passthru is zero. I'll try to get that fixed shortly.

llvm-svn: 333795
2018-06-01 21:59:24 +00:00
Craig Topper d7e11ee342 [X86] Add fast-isel tests for avx512vbmi2 instructions.
llvm-svn: 333794
2018-06-01 21:59:22 +00:00
Alexander Ivchenko b34afcec5d [x86] NFC. Reautogenerate test/CodeGen/X86/vector-half-conversions.ll
llvm-svn: 333750
2018-06-01 13:51:53 +00:00
Simon Pilgrim ee7694442d [Utils][X86] Help update_llc_test_checks.py to recognise retl/retq to reduce CHECK duplication (PR35003)
This patch replaces the --x86_extra_scrub command line argument to automatically support a second level of regex-scrubbing if it improves the matching of nearly-identical code patterns. The argument '--extra_scrub' is there now to force extra matching if required.

This is mostly useful to help us share 32-bit/64-bit x86 vector tests which only differs by retl/retq instructions, but any scrubber can now technically support this, meaning test checks don't have to be needlessly obfuscated.

I've updated some of the existing checks that had been manually run with --x86_extra_scrub, to demonstrate the extra "ret{{[l|q]}}" scrub now only happens when useful, and re-run the sse42-intrinsics file to show extra matches - most sse/avx intrinsics files should be able to now share 32/64 checks.

Tested with the opt/analysis scripts as well which share common code - AFAICT the other update scripts use their own versions.

Differential Revision: https://reviews.llvm.org/D47485

llvm-svn: 333749
2018-06-01 13:37:01 +00:00
Sriraman Tallam d10c4e07f5 Relax GOTPCREL relocations for tail jmp instructions.
Differential Revision: https://reviews.llvm.org/D47563

llvm-svn: 333676
2018-05-31 18:12:33 +00:00
Simon Pilgrim ff0623cd29 [X86][SSE] Recognise splat rotations and expand back to shift ops.
Noticed while fixing PR37426, for splat rotations (rotation by an uniform value) its better to just expand back to shift ops than performing as a general non-uniform rotation.

llvm-svn: 333661
2018-05-31 15:47:17 +00:00
Clement Courbet 2e41c5a79c [X86] Introduce WriteFLDC for x87 constant loads.
Summary:
{FLDL2E, FLDL2T, FLDLG2, FLDLN2, FLDPI} were using WriteMicrocoded.

 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred form InstRWs.
 - For SLM and BtVer2, I've guessed some values :(

Reviewers: RKSimon, craig.topper, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47585

llvm-svn: 333656
2018-05-31 14:22:01 +00:00
Clement Courbet b78ab5097d [X86] Extract latency of fldz/fld1 in separate classes.
Summary:
 - I've measured the values for Broadwell, Haswell, SandyBridge, Skylake.
 - For ZnVer1 and Atom, values were transferred form `InstRW`s.
 - For SLM and BtVer2, values are from Agner.

This is split off from https://reviews.llvm.org/D47377

Reviewers: RKSimon, andreadb

Subscribers: gbedwell, llvm-commits

Differential Revision: https://reviews.llvm.org/D47523

llvm-svn: 333642
2018-05-31 11:41:27 +00:00
Simon Pilgrim 346886bc0d [X86][SSE] Add support for detecting SUB(SPLAT_BV, SPLAT) cases for shift-rotate patterns.
This improves splat rotations (rotation by an uniform value), to avoid having to use the generic non-uniform shift code (extension to PR37426).

llvm-svn: 333641
2018-05-31 11:25:16 +00:00
Craig Topper 2fbbffa4bf [X86] Update the fast-isel tests for _mm_rcp_ss, _mm_rsqrt_ss, and _mm_sqrt_ss to match clang codegen after r333572.
llvm-svn: 333573
2018-05-30 18:30:44 +00:00
Simon Pilgrim 3173f73554 [X86][AVX512BW] Fixed check prefix copy+paste typo in avx512bw-intrinsics.ll
Prefix was for AVX512F instead of AVX512BW 

llvm-svn: 333560
2018-05-30 16:29:06 +00:00
Gabor Buella 890e363e11 [X86] Lowering FMA intrinsics to native IR (LLVM part)
Support for Clang lowering of fused intrinsics. This patch:

1. Removes bindings to clang fma intrinsics.
2. Introduces new LLVM unmasked intrinsics with rounding mode:
     int_x86_avx512_vfmadd_pd_512
     int_x86_avx512_vfmadd_ps_512
     int_x86_avx512_vfmaddsub_pd_512
     int_x86_avx512_vfmaddsub_ps_512
     supported with a new intrinsic type (INTR_TYPE_3OP_RM).
3. Introduces new x86 fmaddsub/fmsubadd folding.
4. Introduces new tests for code emitted by sequentions introduced in Clang part.

Patch by tkrupa

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed By: craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D47443

llvm-svn: 333554
2018-05-30 15:25:16 +00:00
Simon Pilgrim 8df8b129ce [X86][AVX512] Replace -cpu=knl with -mattr=+avx512f for avx512-intrinsics tests
It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes.

This adds vzeroupper at the end of some tests as we lose the 'FeatureFastPartialYMMorZMMWrite' feature from KNL, since Skylake+ don't support this its probably better.

llvm-svn: 333549
2018-05-30 14:36:41 +00:00
Simon Pilgrim 173e225f1c [X86][SSE] Remove unnecessary -cpu from sttni tests
It was noticed on D47377 that these tests (for PR37246) were being unnecessarily affected by scheduler changes.

llvm-svn: 333546
2018-05-30 14:11:57 +00:00
Simon Pilgrim 61b859dca7 [X86][SSE] Replace -cpu with equivalent -mattr for vec_cast tests
It was noticed on D47377 that these tests were being unnecessarily affected by scheduler changes.

llvm-svn: 333545
2018-05-30 14:01:21 +00:00
Craig Topper aba57bfebd [X86] Rename the operands in the recently introduced MOVSS+FMA patterns so that the operand names in the output pattern are always in 1, 2, 3 order since those are the operand names in the instruction.
The order should be controlled in the input pattern.

llvm-svn: 333463
2018-05-29 20:46:26 +00:00
Craig Topper 5439b3d1e5 [X86] Fix a potential crash that occur after r333419.
The code could issue a truncate from a small type to larger type. We need to extend in that case instead.

llvm-svn: 333460
2018-05-29 20:04:10 +00:00
Simon Pilgrim db9dbac501 [X86][SSE] Regenerate sdiv combine tests
llvm-svn: 333431
2018-05-29 16:36:27 +00:00
Simon Pilgrim 77149a801f [X86][AVX] Regenerate vzeroall/vzeroupper cleanup tests
llvm-svn: 333430
2018-05-29 16:35:38 +00:00
Alexander Ivchenko 96062eaa8e [X86] Scalar mask and scalar move optimizations
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
   and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
   masking in Clang header files.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D47012

llvm-svn: 333419
2018-05-29 14:27:11 +00:00
Than McIntosh 48bf43df8a StackColoring: better handling of statically unreachable code
Summary:
Avoid assert/crash during liveness calculation in situations
where the incoming machine function has statically unreachable BBs.
Second attempt at submitting; this version of the change includes
a revised testcase.

Fixes PR37130.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47372

llvm-svn: 333416
2018-05-29 13:52:24 +00:00
Craig Topper a34f8731c7 [X86] Disable a DAG combine to allow packed AVX512DQ instructions to be consistently used for i64->float/double conversions.
Summary: We already get this right if the i64 didn't come from a load.

Reviewers: RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47439

llvm-svn: 333393
2018-05-29 06:22:45 +00:00
Clement Courbet 07c9ec6f2e [X86][Sched] Add InstRW for CLC on Intel after SNB.
Summary:
After SNB, Intel CPUs can rename CF independently of other EFLAGS,
so the renamer can zero it for free. Note that STC still consumes resources.

To reproduce: `$ llvm-exegesis -mode=uops -opcode-name=CLC`

On SNB:
```
---
key:
  opcode_name:     CLC
  mode:            uops
  config:          ''
cpu_name:        sandybridge
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: '3', value: 0.0014, debug_string: SBPort0 }
  - { key: '4', value: 0.0013, debug_string: SBPort1 }
  - { key: '5', value: 0.0003, debug_string: SBPort4 }
  - { key: '6', value: 0.0029, debug_string: SBPort5 }
  - { key: '10', value: 0.0003, debug_string: SBPort23 }
error:           ''
info:            'instruction is serial, repeating a random one.
Snippet:
CLC
'
...
```

On HSW:
```
---
key:
  opcode_name:     CLC
  mode:            uops
  config:          ''
cpu_name:        haswell
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: '3', value: 0.001, debug_string: HWPort0 }
  - { key: '4', value: 0.0009, debug_string: HWPort1 }
  - { key: '5', value: 0.0004, debug_string: HWPort2 }
  - { key: '6', value: 0.0006, debug_string: HWPort3 }
  - { key: '7', value: 0.0002, debug_string: HWPort4 }
  - { key: '8', value: 0.0012, debug_string: HWPort5 }
  - { key: '9', value: 0.0022, debug_string: HWPort6 }
  - { key: '10', value: 0.0001, debug_string: HWPort7 }
error:           ''
info:            'instruction is serial, repeating a random one.
Snippet:
CLC
'
...

```

Reviewers: craig.topper, RKSimon

Subscribers: gchatelet, llvm-commits

Differential Revision: https://reviews.llvm.org/D47362

llvm-svn: 333392
2018-05-29 06:19:39 +00:00
Craig Topper 21aeddc3dc [X86] Remove masked vpermi2var/vpermt2var intrinsics and autoupgrade.
We have unmasked intrinsics now and wrap them with a select. This is a net reduction of 36 intrinsics from before the unmasked intrinsics were added.

llvm-svn: 333388
2018-05-29 05:22:05 +00:00
Craig Topper 2adc7d956c [X86] Add unmasked vermi2var intrinsics so we can use explicit select instructions for masking in clang.
This will allow us to remove the 3 different flavors of masked intrinsics. I'm leaving the actual intrinsic removal for another patch.

llvm-svn: 333386
2018-05-29 03:26:30 +00:00
Craig Topper dcfcfdb0d1 [X86] Converge X86ISD::VPERMV3 and X86ISD::VPERMIV3 to a single opcode.
These do the same thing with the first and second sources swapped. They previously came from separate intrinsics that specified different masking behavior. But we can cover that with isel patterns and a single node.

This is a step towards reducing the number of intrinsics needed.

A bunch of tests change because we are now biased to choosing VPERMT over VPERMI when there is nothing to signal that commuting is beneficial.

llvm-svn: 333383
2018-05-28 19:33:11 +00:00
Simon Pilgrim 79dae5ba2a [X86] Don't hardcode scheduler class
Also fixes BEXTRI instruction to use WritBEXTR class, which was missed when the class was added.

llvm-svn: 333360
2018-05-27 14:54:18 +00:00
Craig Topper 51eddb8749 [X86] Remove masking from avx512ifma intrinsics. Use a select instead.
This allows us to avoid having mask and maskz variant. Reducing from 12 intrinsics to 6.

llvm-svn: 333346
2018-05-26 18:55:19 +00:00
Amaury Sechet c9edc0cfe2 Add test case for D46505 . NFC
llvm-svn: 333341
2018-05-26 12:28:23 +00:00
Guozhi Wei 03005d3f03 [CodeGenPrepare] Revert r331783
The patch r331783 caused regression in one of our internal application. So revert it now, will investigate it further.

llvm-svn: 333305
2018-05-25 20:30:26 +00:00
Simon Pilgrim 0155bf0da9 [X86][SNB] Fix differences between vex/non-vex XMM vector moves (PR37286)
As confirmed by llvm-exegesis, there is no scheduler difference between MOVDQA/MOVDQU and VMOVDQA/VMOVDQU xmm reg-reg moves

Another chapter in the never ending crusade to remove useless InstRW overrides from the x86 scheduler models......

llvm-svn: 333271
2018-05-25 12:18:11 +00:00
Gabor Buella d2f1ab1b10 [x86] invpcid LLVM intrinsic
Re-add the feature flag for invpcid, which was removed in r294561.
Add an intrinsic, which always uses a 32 bit integer as first argument,
while the instruction actually uses a 64 bit register in 64 bit mode
for the INVPCID_TYPE argument.

Reviewers: craig.topper

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D47141

llvm-svn: 333255
2018-05-25 06:32:05 +00:00
Sanjay Patel 92092ecc02 [x86] add vector load-cmp-select tests; NFC
llvm-svn: 333185
2018-05-24 13:49:57 +00:00
Ekaterina Romanova db5c58ca00 Added a testcase for PR31593. A patch (r291535) that fixed this bug didn't have a testcase.
Differential Revision: https://reviews.llvm.org/D47129

llvm-svn: 333167
2018-05-24 08:45:15 +00:00
Paul Robinson 543c0e1d50 [DWARFv5] Put the DWO ID in its place.
In DWARF v5, the DWO ID is in the (split/skeleton) CU header, not an
attribute on the CU DIE.

This changes the size of those headers, so use the parsed size whenever
we have one, for simplicitly.

Differential Revision: https://reviews.llvm.org/D47158

llvm-svn: 333004
2018-05-22 17:27:31 +00:00
Gabor Buella e96c488aa7 [x86] NFC Add some more shuffle-vs-trunc tests
These are related to: https://reviews.llvm.org/D46957

llvm-svn: 332962
2018-05-22 09:47:42 +00:00
Sanjay Patel 17a870f07c [DAG] fold FP binops with undef operands to NaN
This is the FP sibling of D43141 with the corresponding IR change in rL327212.

We can't propagate undef here because if a variable operand is a NaN, these 
binops must propagate NaN. Neither global nor node-level fast-math makes a 
difference. If we have 'nnan', I think later folds can turn the NaN into undef.

The tests in X86/fp-undef.ll are meant to be the definitive verification for 
these folds - everything reduces identically now.

The other test changes are collateral damage. They may need to be altered to
preserve their intent.

Differential Revision: https://reviews.llvm.org/D47026

llvm-svn: 332920
2018-05-21 23:54:19 +00:00
Craig Topper 358b094971 [X86] Remove 128/256-bit cvtdq2ps, cvtudq2ps, cvtqq2pd, cvtuqq2pd intrinsics.
These can all be implemented with sitofp/uitofp instructions.

llvm-svn: 332916
2018-05-21 23:15:00 +00:00
Roman Lebedev 9f65d16d5d [DAGCombiner] isAllOnesConstantOrAllOnesSplatConstant(): look through bitcasts
Summary:
As pointed out in D46528, we errneously transform cases like `xor X, -1`,
even though we use said function.
It's because the `-1` is actually a bitcast there.
So i think we can just look through it in the function.

Differential Revision: https://reviews.llvm.org/D47156

llvm-svn: 332905
2018-05-21 21:41:10 +00:00
Roman Lebedev 7772de25d0 [DAGCombine][X86][AArch64] Masked merge unfolding: vector edition.
Summary:
This **appears** to be the last missing piece for the masked merge pattern handling in the backend.

This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`), and we need to make sure that they are generated.

Differential Revision: https://reviews.llvm.org/D46528

llvm-svn: 332904
2018-05-21 21:41:02 +00:00
Roman Lebedev fd79bc3aa2 [X86][AArch64][NFC] Add tests for vector masked merge unfolding
Summary:
This is [[ https://bugs.llvm.org/show_bug.cgi?id=37104 | PR37104 ]].

[[ https://bugs.llvm.org/show_bug.cgi?id=6773 | PR6773 ]] will introduce an IR canonicalization that is likely bad for the end assembly.
Previously, `andps`+`andnps` / `bsl` would be generated. (see `@out`)
Now, they would no longer be generated  (see `@in`).

Differential Revision: https://reviews.llvm.org/D46008

llvm-svn: 332903
2018-05-21 21:40:51 +00:00
Craig Topper 25444c852a [DAGCombiner] Use computeKnownBits to match rotate patterns that have had their amount masking modified by simplifyDemandedBits
SimplifyDemandedBits can remove bits from the masks for the shift amounts we need to see to detect rotates.

This patch uses zeroes from computeKnownBits to fill in some of these mask bits to make the match work.

As currently written this calls computeKnownBits even when the mask hasn't been simplified because it made the code simpler. If we're worried about compile time performance we can improve this.

I know we're talking about making a rotate intrinsic, but hopefully we can go ahead and do this change and just make sure the rotate intrinsic also handles it.

Differential Revision: https://reviews.llvm.org/D47116

llvm-svn: 332895
2018-05-21 21:09:18 +00:00
Craig Topper dc3bf90447 [X86] Remove some unneeded check lines that I copy and pasted when I made vector tests from some scalar test cases.
llvm-svn: 332892
2018-05-21 21:01:13 +00:00
Craig Topper aad3aefaeb [X86] Remove masking from vpternlog intrinsics. Use a select in IR instead.
This removes 6 intrinsics since we no longer need separate mask and maskz intrinsics.

Differential Revision: https://reviews.llvm.org/D47124

llvm-svn: 332890
2018-05-21 20:58:09 +00:00
Peter Collingbourne 9a45114b3c CodeGen: Add a dwo output file argument to addPassesToEmitFile and hook it up to dwo output.
Part of PR37466.

Differential Revision: https://reviews.llvm.org/D47089

llvm-svn: 332881
2018-05-21 20:16:41 +00:00
Craig Topper a010b3c9dc [X86] Add test cases for D47012.
Patch by Thomasz Krupa.

llvm-svn: 332872
2018-05-21 19:33:42 +00:00
Craig Topper ef313905f0 [X86] Add test cases for missed vector rotate matching due to SimplifyDemandedBits interfering with the AND masks
As requested in D47116

llvm-svn: 332869
2018-05-21 19:27:50 +00:00
Lama Saba 9417f7ff2e [X86] - Avoid SFB pass - fix bug in updating the offsets for newly created copies
Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688
llvm-svn: 332849
2018-05-21 16:23:16 +00:00
Simon Pilgrim 5aa7cdfd70 [X86][SSE] Support v4i32 rotations (PR37426)
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.

v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.

Differential Revision: https://reviews.llvm.org/D46954

llvm-svn: 332832
2018-05-21 09:45:59 +00:00
Craig Topper e4c045b7df [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all intrinsics.

llvm-svn: 332824
2018-05-20 23:34:04 +00:00
Craig Topper 9ed890b0db [X86] Add test cases to show missed rotate opportunities due to SimplifyDemandedBits.
llvm-svn: 332815
2018-05-20 02:32:45 +00:00
Sanjay Patel b41a5affea [x86] add more FP with FMF simplification tests; NFC
llvm-svn: 332780
2018-05-18 22:31:43 +00:00
Matt Arsenault 9fc8593a77 DAG: Fix crash on shift with large shift amounts
Fixes bug 37521.

llvm-svn: 332774
2018-05-18 21:54:16 +00:00
Michael Berg 1fa76cc3ea adding baseline fp fold tests for unsafe on and off
llvm-svn: 332756
2018-05-18 19:30:49 +00:00
Simon Pilgrim 1273f4ad93 [X86] Add GPR<->XMM Schedule Tags
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)

The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)

llvm-svn: 332745
2018-05-18 17:58:36 +00:00
Craig Topper f94ed26ea9 [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.

This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.

This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.

Fixes PR37499

Differential Revision: https://reviews.llvm.org/D47025

llvm-svn: 332743
2018-05-18 17:48:06 +00:00
Than McIntosh 3c639dbd0d Revert changes from D46265.
This is a revert of the changes from https://reviews.llvm.org/D46265;
the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot
failures.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47061

llvm-svn: 332742
2018-05-18 17:47:10 +00:00
Craig Topper fdde9f363c [X86] Update fast-isel test cases for _mm256_mask_cvtepi16_epi8 to match clang r332738.
llvm-svn: 332740
2018-05-18 17:29:47 +00:00
Simon Pilgrim 007b50fd35 [X86][BtVer2] Improve simulation of (V)PINSR values
Include the 6cy delay transferring from the GPR to FPU.

llvm-svn: 332737
2018-05-18 17:09:41 +00:00
Simon Pilgrim 3ecb0b80f6 [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency
llvm-svn: 332722
2018-05-18 14:22:22 +00:00
Than McIntosh 4c21a363af StackColoring: better handling of statically unreachable code
Summary:
Avoid assert/crash during liveness calculation in situations where the
incoming machine function has statically unreachable BBs.

Fixes PR37130.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46265

llvm-svn: 332707
2018-05-18 12:25:30 +00:00