Commit Graph

11824 Commits

Author SHA1 Message Date
Craig Topper a010b3c9dc [X86] Add test cases for D47012.
Patch by Thomasz Krupa.

llvm-svn: 332872
2018-05-21 19:33:42 +00:00
Craig Topper ef313905f0 [X86] Add test cases for missed vector rotate matching due to SimplifyDemandedBits interfering with the AND masks
As requested in D47116

llvm-svn: 332869
2018-05-21 19:27:50 +00:00
Lama Saba 9417f7ff2e [X86] - Avoid SFB pass - fix bug in updating the offsets for newly created copies
Change-Id: I169ab6fe7e187727c0298c2a1e2868a683f3e688
llvm-svn: 332849
2018-05-21 16:23:16 +00:00
Simon Pilgrim 5aa7cdfd70 [X86][SSE] Support v4i32 rotations (PR37426)
As suggested by Fabian on PR37426, we can use PMULUDQ to perform v4i32 vector rotations as the upper 32bits of the multiply will contain the 'wrapped' bits of the rotation.

v8i16/v16i8 rotations would be straightforward to add to lowerRotate in the future - ideally we'd mostly share code with the vector shifts lowering.

Differential Revision: https://reviews.llvm.org/D46954

llvm-svn: 332832
2018-05-21 09:45:59 +00:00
Craig Topper e4c045b7df [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead.
Someday maybe we'll use selects for all intrinsics.

llvm-svn: 332824
2018-05-20 23:34:04 +00:00
Craig Topper 9ed890b0db [X86] Add test cases to show missed rotate opportunities due to SimplifyDemandedBits.
llvm-svn: 332815
2018-05-20 02:32:45 +00:00
Sanjay Patel b41a5affea [x86] add more FP with FMF simplification tests; NFC
llvm-svn: 332780
2018-05-18 22:31:43 +00:00
Matt Arsenault 9fc8593a77 DAG: Fix crash on shift with large shift amounts
Fixes bug 37521.

llvm-svn: 332774
2018-05-18 21:54:16 +00:00
Michael Berg 1fa76cc3ea adding baseline fp fold tests for unsafe on and off
llvm-svn: 332756
2018-05-18 19:30:49 +00:00
Simon Pilgrim 1273f4ad93 [X86] Add GPR<->XMM Schedule Tags
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)

The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)

llvm-svn: 332745
2018-05-18 17:58:36 +00:00
Craig Topper f94ed26ea9 [X86] Directly legalize v16i16/v8i16 vselect to vXi8 vselect to use VPBLENDVB
The intrinsic legalization for masked truncate uses ISD::TRUNCATE which can be constant folded by getNode. This prevents getVectorMaskingNode from seeing the ISD::TRUNCATE special case where it should emit X86ISD::SELECT instead of ISD::VSELECT. This causes a vselect with a v16i1 or v8i1 condition to be emitted during vector legalization. but vector legalization doesn't revisit nodes it creates. DAG combine will then promote this condition to match the result type. Then op legalization will try to legalize it, but the custom lowering hook returned SDValue(). But op legalization doesn't have an Expand for VSELECT because it expects vector legalization to have taken care of it. So the operation sticks around and fails in isel.

This patch adds a custom legalization hook to morph it to a vXi8 vselect instead.

This also simplifies the normal vXi16 vselect handling because vector legalization was normally expanding to AND/ANDN/OR and DAG combine was turning that into VBLENDVB. So we can skip a step by doing it directly.

Fixes PR37499

Differential Revision: https://reviews.llvm.org/D47025

llvm-svn: 332743
2018-05-18 17:48:06 +00:00
Than McIntosh 3c639dbd0d Revert changes from D46265.
This is a revert of the changes from https://reviews.llvm.org/D46265;
the new test introduced (test/CodeGen/X86/PR37310.mir) causes buildbot
failures.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47061

llvm-svn: 332742
2018-05-18 17:47:10 +00:00
Craig Topper fdde9f363c [X86] Update fast-isel test cases for _mm256_mask_cvtepi16_epi8 to match clang r332738.
llvm-svn: 332740
2018-05-18 17:29:47 +00:00
Simon Pilgrim 007b50fd35 [X86][BtVer2] Improve simulation of (V)PINSR values
Include the 6cy delay transferring from the GPR to FPU.

llvm-svn: 332737
2018-05-18 17:09:41 +00:00
Simon Pilgrim 3ecb0b80f6 [X86][BtVer2] Partial vector stores (inc MMX) have a 2cy latency
llvm-svn: 332722
2018-05-18 14:22:22 +00:00
Than McIntosh 4c21a363af StackColoring: better handling of statically unreachable code
Summary:
Avoid assert/crash during liveness calculation in situations where the
incoming machine function has statically unreachable BBs.

Fixes PR37130.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46265

llvm-svn: 332707
2018-05-18 12:25:30 +00:00
Alexander Ivchenko 5c54742da4 [X86][CET] Changing -fcf-protection behavior to comply with gcc (LLVM part)
This patch aims to match the changes introduced in gcc by
https://gcc.gnu.org/ml/gcc-cvs/2018-04/msg00534.html. The
IBT feature definition is removed, with the IBT instructions
being freely available on all X86 targets. The shadow stack
instructions are also being made freely available, and the
use of all these CET instructions is controlled by the module
flags derived from the -fcf-protection clang option. The hasSHSTK
option remains since clang uses it to determine availability of
shadow stack instruction intrinsics, but it is no longer directly used.

Comes with a clang patch (D46881).

Patch by mike.dvoretsky

Differential Revision: https://reviews.llvm.org/D46882

llvm-svn: 332705
2018-05-18 11:58:25 +00:00
Keno Fischer 1e11fc1ccb [X86DomainReassignment] Hopefully fix buildbot failure
The Darwin build bot failed with:
```
llc -mcpu=skylake-avx512 -mtriple=x86_64-unknown-linux-gnu domain-reassignment-test.ll -o - | llvm-mc
--
Exit Code: 134

Command Output (stderr):
--
Assertion failed: (MAI->hasSingleParameterDotFile()), function EmitFileDirective, file lib/MC/MCAsmStreamer.cpp, line 1087.
```

Looks like this is because the `llvm-mc` command was missing a triple
directive and defaulting to MachO. Add the triple option.

llvm-svn: 332694
2018-05-18 04:36:38 +00:00
Keno Fischer e07153a859 [X86DomainReassignment] Don't compare stack-allocated values by address
Summary:
The Closure allocated in the main loop is allocated on the stack. However,
later in the code its address is taken (and used for comparisons). This
obviously doesn't work. In fact, the Closure will get the same stack address
during every loop iteration, rendering the check that intended to identify
Closure conflicts entirely ineffective. Fix this bug by giving every Closure
a unique ID and using that for comparison. Alternatively, we could heap
allocate the closure object.

Fixes PR37396
Fixes JuliaLang/julia#27032

Reviewers: craig.topper, guyblank

Reviewed By: craig.topper

Subscribers: vchuravy, llvm-commits

Differential Revision: https://reviews.llvm.org/D46800

llvm-svn: 332682
2018-05-18 01:03:01 +00:00
Keno Fischer 66ab99c3ee [X86DomainReassignment] Don't delete IMPLICIT_DEF nodes
Summary:
We cannot simply delete IMPLICIT_DEF nodes. They may be used
later (e.g. by a PHI) and deleting them will cause later passes (e.g.
LiveVariables) to crash. However, it seems fine to ignore them for
purposes of the domain reassignment (as we do with PHI).

Fixes PR37430
Fixes JuliaLang/julia#27080

Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D46797

llvm-svn: 332680
2018-05-18 00:40:52 +00:00
Sanjay Patel ed58fed9ef [x86] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332648
2018-05-17 18:43:44 +00:00
Sanjay Patel 87621174ac [x86] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in https://reviews.llvm.org/D43141.

And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.

llvm-svn: 332640
2018-05-17 18:13:58 +00:00
Simon Pilgrim b5741f5c3d [X86][BtVer2] ADC/SBB take 2cy on an ALU pipe, not 1cy like ADD/SUB
llvm-svn: 332616
2018-05-17 15:43:23 +00:00
Simon Pilgrim 0c0336e003 [X86] Split WriteADC/WriteADCRMW scheduler classes
For integer ALU instructions taking eflags as an input (ADC/SBB/ADCX/ADOX)

llvm-svn: 332605
2018-05-17 12:43:42 +00:00
Simon Pilgrim 2dc00a64a2 [X86] Update SNB/generic scheduler tests missed from rL332536
llvm-svn: 332540
2018-05-16 22:24:22 +00:00
Simon Pilgrim 2e0f6c9b21 [X86][SSE] Reduce instruction/register usages for v4i32 vector shifts (PR37441)
As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.

Some of this ideally would be done by combineTargetShuffle but its tricky to do as most of the shuffles are sharing inputs.

Differential Revision: https://reviews.llvm.org/D46959

llvm-svn: 332524
2018-05-16 20:52:52 +00:00
Sanjay Patel 332bbb0fea [x86] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141. 

And as we did there, I'm trying to reduce the patch by 
changing tests that would probably become meaningless
once we make those fixes.

llvm-svn: 332501
2018-05-16 17:58:50 +00:00
Sanjay Patel 502d115505 [x86] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141. 

And as we did there, I'm trying to reduce the patch by 
changing tests that would probably become meaningless
once we make those fixes.

llvm-svn: 332500
2018-05-16 17:58:08 +00:00
Sanjay Patel a7874a52c9 [x86] preserve test intent by removing undef
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141. 

And as we did there, I'm trying to reduce the patch by 
changing tests that would probably become meaningless
once we make those fixes.

llvm-svn: 332499
2018-05-16 17:57:35 +00:00
Craig Topper 67aa726f8c [X86][AVX512DQ] Use packed instructions for scalar FP<->i64 conversions on 32-bit targets
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.

Fixes PR3163.

Differential Revision: https://reviews.llvm.org/D43441

llvm-svn: 332498
2018-05-16 17:40:07 +00:00
Sanjay Patel 84caa9659e [x86] add run with unsafe global param; NFC
llvm-svn: 332486
2018-05-16 16:23:41 +00:00
Sanjay Patel b3ac148cb4 [x86] add tests for DAG FP undef operands; NFC
llvm-svn: 332484
2018-05-16 16:16:48 +00:00
Simon Pilgrim 5df1ef7a8c [X86][SSE] Fix tests for vector rotates by splat variable.
We weren't correctly splatting the offset shift

llvm-svn: 332435
2018-05-16 08:23:47 +00:00
Simon Pilgrim de13589625 [X86][SSE] Add tests for vector rotates by splat variable.
llvm-svn: 332410
2018-05-15 22:11:51 +00:00
Chandler Carruth 5ecd81aab0 [x86][eflags] Fix PR37431 by teaching the EFLAGS copy lowering to
specially handle SETB_C* pseudo instructions.

Summary:
While the logic here is somewhat similar to the arithmetic lowering, it
is different enough that it made sense to have its own function.
I actually tried a bunch of different optimizations here and none worked
well so I gave up and just always do the arithmetic based lowering.

Looking at code from the PR test case, we actually pessimize a bunch of
code when generating these. Because SETB_C* pseudo instructions clobber
EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple
SETB_C* pseudos from a single set of EFLAGS. This in turn causes the
lowering code to ruin all the clever code generation that SETB_C* was
hoping to achieve. None of this is needed. Whenever we're generating
multiple SETB_C* instructions from a single set of EFLAGS we should
instead generate a single maximally wide one and extract subregs for all
the different desired widths. That would result in substantially better
code generation. But this patch doesn't attempt to address that.

The test case from the PR is included as well as more directed testing
of the specific lowering pattern used for these pseudos.

Reviewers: craig.topper

Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D46799

llvm-svn: 332389
2018-05-15 20:16:57 +00:00
Simon Pilgrim be9a206883 [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions

A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332376
2018-05-15 17:36:49 +00:00
Sanjay Patel 8652c53d29 [DAG] propagate FMF for all FPMathOperators
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. 
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It 
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.

It should provide a superset of the functionality from D46563 - the extra tests show propagation and 
codegen diffs for fcmp, vecreduce, and FP libcalls.

The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last 
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, 
so that shouldn't make any difference.

Differential Revision: https://reviews.llvm.org/D46854

llvm-svn: 332358
2018-05-15 14:16:24 +00:00
Simon Pilgrim 891ebcdbaa [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency

Allows us to remove the WriteCvtF2FSt conversion store class

llvm-svn: 332357
2018-05-15 14:12:32 +00:00
Artur Gainullin 243a3d56d8 [X86] Improve unsigned saturation downconvert detection.
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:

(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.

(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:

(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)

Reviewers: RKSimon

Subscribers: llvm-commits, a.elovikov

Differential Revision: https://reviews.llvm.org/D45315

llvm-svn: 332336
2018-05-15 10:24:12 +00:00
Craig Topper fadf8b8dec [X86] Add fast isel tests for some of the avx512 truncate intrinsics to match current clang codegen.
llvm-svn: 332326
2018-05-15 04:26:27 +00:00
Craig Topper 53ceb4805f [X86] Remove and autoupgrade avx512.vbroadcast.ss/avx512.vbroadcast.sd intrinsics.
llvm-svn: 332271
2018-05-14 18:21:22 +00:00
Simon Pilgrim 228d24a2d6 [X86][BtVer2] Fix MMX/YMM integer vector nt store schedules
MMX was missing and YMM was tagged as a fp nt store

llvm-svn: 332269
2018-05-14 18:07:28 +00:00
Craig Topper f633f3eb67 [X86] Add fast isel test cases for the clang output for 512-bit cvtps2pd related intrinsics.
llvm-svn: 332214
2018-05-14 05:09:41 +00:00
Craig Topper 0e71c6d5ca [X86] Remove and autoupgrade the cvtusi2sd intrinsic. Use uitofp+insertelement instead.
llvm-svn: 332206
2018-05-14 00:06:49 +00:00
Craig Topper 97e74b05ef [X86] Add patterns for combining movss+uint_to_fp into the intrinsic instructions under AVX512.
This matches what we do for sint_to_fp.

llvm-svn: 332205
2018-05-13 23:24:21 +00:00
Craig Topper 12067185d4 [X86] Add fast-isel test cases for _mm_cvtu32_sd, _mm_cvtu64_sd, _mm_cvtu32_ss, and _mm_cvtu64_ss.
llvm-svn: 332204
2018-05-13 23:24:19 +00:00
Craig Topper 85906cf041 [X86] Remove and autoupgrade masked vpermd/vpermps intrinsics.
llvm-svn: 332198
2018-05-13 18:03:59 +00:00
Dimitry Andric a39c409619 Follow-up to rL332176 by adding a test case for PR37264.
Noticed by Simon Pilgrim.

llvm-svn: 332197
2018-05-13 14:32:23 +00:00
Craig Topper 38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper 28b85caea8 [X86] Remove some unused CHECK lines from tests.
llvm-svn: 332188
2018-05-13 00:58:23 +00:00