Commit Graph

11957 Commits

Author SHA1 Message Date
Craig Topper 5ec210cc27 [X86] Prevent folding stack reloads with instructions that have an undefined register update.
We want to keep the load unfolded so we can use the same register for both sources to avoid a false dependency.

llvm-svn: 334802
2018-06-15 06:11:36 +00:00
Craig Topper 3c4cc01226 [X86] Add more instructions to the memory folding tables using the autogenerated table as a guide.
I think this covers most of the unmasked vector instructions. We're still missing a lot of the masked instructions.

There are some test changes here because of the new folding support. I don't think these particular cases should be folded because it creates an undef register dependency. I think the changes introduced in r334175 are not handling stack folding. They're only blocking the peephole pass.

llvm-svn: 334800
2018-06-15 05:49:19 +00:00
Craig Topper 3b060daba5 [X86] Fix some checks to use X86 instead of X32.
These tests were recently updated so it looks like gone wrong.

llvm-svn: 334786
2018-06-15 04:42:55 +00:00
Sanjay Patel f85ca6abee [x86] be more selective about converting 'and' to shuffle (PR37749)
isVectorClearMaskLegal() is the TLI hook used by the generic
DAGCombiner::XformToShuffleWithZero().

We've grown to accomodate/expect this transform to shuffle
(disabling it more generally results in many regressions).
So I'm narrowly excluding the 256-bit types that clearly 
are not worthwhile for AVX1. 

I think in most cases we are able to recover by converting 
the shuffle back into 'and' ops, but the cases in:
https://bugs.llvm.org/show_bug.cgi?id=37749
...show that there are cracks.

llvm-svn: 334759
2018-06-14 19:55:02 +00:00
Michael Berg 4663ceb63f updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
Summary:  A FMF constraint is added to FADD with unsafe still available as the fallback

Reviewers: spatel, wristow, arsenm, hfinkel

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D48180

llvm-svn: 334753
2018-06-14 18:48:31 +00:00
Sanjay Patel d49219db84 [x86] add tests for AVX1 FP logic op abuse (PR37749); NFC
Also, add a RUN for AVX2 to make sure that's good.

llvm-svn: 334744
2018-06-14 18:08:06 +00:00
Tomasz Krupa d8d66a6b28 [X86] Lowering Mask Scalar intrinsics to native IR (LLVM part)
Summary: Complementary patch to lowering add, sub, mul and div mask scalar
intrinsics in Clang.

Reviewers: craig.topper, sroland, spatel, RKSimon

Reviewed by: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47978

llvm-svn: 334740
2018-06-14 17:32:58 +00:00
Craig Topper 3ffeb41f6b [X86] Add more vector instructions to the memory folding table using the autogenerated table as a guide.
The test cahnge is because we now fold stack reload into RNDSCALE and RNDSCALE can be turned into ROUND by EVEX->VEX.

llvm-svn: 334728
2018-06-14 15:40:31 +00:00
Craig Topper b2552e1e08 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Francis Visoiu Mistrih 0c3a7761f3 Revert r334649 "[Timers] Use the pass argument name for JSON keys in time-passes"
This reverts commit r334649.

This breaks a test.

llvm-svn: 334651
2018-06-13 20:44:02 +00:00
Francis Visoiu Mistrih fbd450b052 [Timers] Use the pass argument name for JSON keys in time-passes
When using clang --save-stats -mllvm -time-passes, both timers and stats
end up in the same json file.

We could end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.Virtual Register Map.wall": 2.9015541076660156e-04,
  "time.pass.Virtual Register Map.user": 2.0500000000000379e-04,
  "time.pass.Virtual Register Map.sys": 8.5000000000001741e-05,
}

This patch makes use of the pass argument name (if available) in the
JSON key to end up with things like:

{
  "asm-printer.EmittedInsts": 1,
  "time.pass.virtregmap.wall": 2.9015541076660156e-04,
  "time.pass.virtregmap.user": 2.0500000000000379e-04,
  "time.pass.virtregmap.sys": 8.5000000000001741e-05,
}

This also helps avoiding to write another JSON printer to handle all the
cases that we could have in our pass names.

Differential Revision: https://reviews.llvm.org/D48109

llvm-svn: 334649
2018-06-13 20:09:59 +00:00
Craig Topper f7f663e0a9 [X86] Move RCPSSr_Int, RSQRTSSr_Int, SQRTSDr_Int, SQRTSSr_Int to the correct load folding table.
They were in the operand 1 folding table, but their foldable operand is operand 2.

llvm-svn: 334648
2018-06-13 20:03:42 +00:00
Craig Topper e399f55826 [X86] Add one more intrinsic and test cases to avx512-cvttp2i.ll.
spatel noticed it was missing in D47993.

llvm-svn: 334629
2018-06-13 17:55:13 +00:00
Sanjay Patel 7d4929611c [DAGCombiner] remove hasOneUse() check from fadd constants transform
We're constant folding here, so we shouldn't check uses. This matches
the IR optimizer behavior.

The x86 test shows the expected win. The AArch64 test shows something
else. This only seems to happen if the "generic" AArch64 CPU model is 
used by MachineCombiner, so I'll file a bug report to follow-up.

llvm-svn: 334608
2018-06-13 15:22:48 +00:00
Sanjay Patel 9f3f18d6f6 [x86] add test for fadd with more than one use; NFC
The equivalent AArch64 test added at rL334556 isn't showing
the expected output from the DAGCombiner code change that 
would fix this example. That's a machine combiner bug from 
what I see.

llvm-svn: 334605
2018-06-13 15:01:07 +00:00
Cameron McInally f37bd01ddc [FPEnv] Expand constrained FP operations
Add a helper function to expand constrained FP operations as needed. 
Note that the Strict POWI operation is not handled in this patch since 
the format is slightly different from the others.

Differential Revision: https://reviews.llvm.org/D47491

llvm-svn: 334603
2018-06-13 14:32:12 +00:00
Sanjay Patel b983ac6fe1 [x86] eliminate even more sign-bit tests with vector select
This shortcoming was noted in D47330, and the test diffs show we already 
had other examples where we failed to fold to a SHRUNKBLEND:

/// Dynamic (non-constant condition) vector blend where only the sign bits
/// of the condition elements are used. This is used to enforce that the
/// condition mask is not valid for generic VSELECT optimizations.

This patch implements an idea from D48043 and would obsolete that patch 
because it catches more cases (notable the AVX1 case that was missed there). 
All we're doing is allowing the existing transform to fire more often by 
removing the post-legalize constraint. All of the relevant feature checks 
and other predicates are left as-is.

Differential Revision: https://reviews.llvm.org/D48078

llvm-svn: 334592
2018-06-13 12:28:32 +00:00
Craig Topper 3829d258ee [X86] Remove masking from avx512vbmi2 concat and shift by immediate intrinsics. Use select in IR instead.
llvm-svn: 334576
2018-06-13 07:19:21 +00:00
Craig Topper 651ef359ab [X86] add avx512 tests for potentially miscompiling cvttp2si/cvttp2ui (PR37751).
llvm-svn: 334551
2018-06-12 21:42:42 +00:00
Michael Berg 5d49f66570 Utilize new SDNode flag functionality to expand current support for fmul
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fmul.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: nhaehnle, wdng

Differential Revision: https://reviews.llvm.org/D47911

llvm-svn: 334514
2018-06-12 16:13:11 +00:00
Craig Topper 957b738432 [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.

For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.

Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.

Some of this was spotted in the review for D47993.

This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.

llvm-svn: 334460
2018-06-12 00:48:57 +00:00
Clement Courbet 7db69cc08a [X86] Fix skylake server scheduling info.
Summary:
This fixes most of the scheduling info for SKX vector operations.
I had to split a lot of the YMM/ZMM classes into separate classes for YMM and ZMM.

The before/after llvm-exegesis analysis are in the phabricator diff.

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47721

llvm-svn: 334407
2018-06-11 14:37:53 +00:00
Sanjay Patel a1791be455 [x86] add scalar cvtt intrinsic tests; NFC
More coverage for the problem noted in D47993 (although these shouldn't be affected by that patch).

llvm-svn: 334404
2018-06-11 13:51:34 +00:00
Craig Topper c12ac0c984 [X86] Add test files for upgrade of vbmi2 expand load and compress store intrinsics that was done in r334381.
llvm-svn: 334386
2018-06-11 06:20:24 +00:00
Craig Topper 0e25c8239a [X86] Remove masking from dbpsadbw intrinsics, use select in IR instead.
llvm-svn: 334384
2018-06-11 06:18:22 +00:00
Craig Topper e71ad1f6d0 [X86] Remove and autoupgrade the expandload and compressstore intrinsics.
We use the target independent intrinsics now.

llvm-svn: 334381
2018-06-11 01:25:22 +00:00
Sanjay Patel 3e5c70cc1d [DAGCombiner] match vector compare and select sizes with extload operand (PR37427)
This patch started off much more general and ambitious, but it's been a nightmare 
seeing all the ways x86 vector codegen can go wrong.

So the code is still structured to allow extending easily, but it's currently 
limited in several ways:

1. Only handle cases with an extending load.
2. Only handle cases with a zero constant compare.
3. Ignore setcc with vector bitmask (SetCCWidth != 1) - so AVX512 should be unaffected.

The motivating case from PR37427:
https://bugs.llvm.org/show_bug.cgi?id=37427
...is the 1st test, and that shows the expected win - we eliminated the unnecessary 
intermediate cast.

There's a clear regression in the last test (sgt_zero_fp_select) because we longer 
recognize a 'SHRUNKBLEND' opportunity. I think that general problem is also present 
in sgt_zero, so I'll try to fix that in a follow-up. We need to match a sign-bit 
setcc from a sign-extended operand and remove it.

Differential Revision: https://reviews.llvm.org/D47330

llvm-svn: 334378
2018-06-10 23:09:50 +00:00
Craig Topper 304bd747af [X86] Add expandload and compresstore fast-isel tests for avx512f and avx512vl. Update existing tests for avx512vbmi2 to use target independent intrinsics.
llvm-svn: 334368
2018-06-10 18:55:37 +00:00
Sanjay Patel 15bee8f1c0 [x86] add tests for potentially miscompiling cvttp2si (PR37751); NFC
llvm-svn: 334367
2018-06-10 17:42:12 +00:00
Craig Topper 301d526329 [X86] Fix forward declaration in a test case that was messed up in r334358
llvm-svn: 334360
2018-06-10 06:43:48 +00:00
Craig Topper 98a79934af [X86] Remove masking from the 512-bit masked floating point add/sub/mul/div intrinsics. Use a select in IR instead.
llvm-svn: 334358
2018-06-10 06:01:36 +00:00
Simon Pilgrim 5c32989c91 [X86][SSE] Support v8i16/v16i16 rotations
Extension to D46954 (PR37426), this patch adds support for v8i16/v16i16 rotations in a similar manner - the conversion of the shift/rotate amount to a multiplication factor and the use of PMULLW to shift left and PMULHUW (ISD::MULHU) to shift the wrapped bits back around to be ORd together.

Differential Revision: https://reviews.llvm.org/D47822

llvm-svn: 334309
2018-06-08 17:58:42 +00:00
Sanjay Patel 70314bd61c [x86] add tests for node-level FMF; NFC
These cases should be optimized using the change from D47911.

llvm-svn: 334308
2018-06-08 17:54:28 +00:00
Sanjay Patel 9995a00a94 [x86] regenerate test checks; NFC
llvm-svn: 334307
2018-06-08 17:42:35 +00:00
Michael Berg bf90d1f263 Utilize new SDNode flag functionality to expand current support for fsub
Summary: This patch originated from D46562 and is a proper subset, with some issues addressed for fsub.

Reviewers: spatel, hfinkel, wristow, arsenm

Reviewed By: spatel

Subscribers: wdng

Differential Revision: https://reviews.llvm.org/D47910

llvm-svn: 334306
2018-06-08 17:39:50 +00:00
Simon Pilgrim 89deac6694 [X86][BtVer2] Add support for all SUB/XOR 32/64 scalar instructions that should match the dependency-breaking 'zero-idiom'
As detailed on Agner's Microarchitecture doc (21.8 AMD Bobcat and Jaguar pipeline - Dependency-breaking instructions), these instructions are dependency breaking and fast-path zero the destination register (and appropriate EFLAGS bits).

llvm-svn: 334303
2018-06-08 17:00:45 +00:00
Simon Pilgrim 59e915c691 [X86] Fix schedule-x86_64.s tests to use different registers in reg-reg cases
Same fix as rL334110: I noticed while working on zero-idiom + dependency-breaking support (PR36671) that most of our binary instruction schedule tests were reusing the same src registers, which would cause the tests to fail once we enable scalar zero-idiom support on btver2.

llvm-svn: 334302
2018-06-08 16:40:15 +00:00
Simon Pilgrim eab9d20424 [X86][SSE] Add SSE2/AVX2 vector rotate tests
Now that we're custom lowering vector rotates for SSE in general we should be testing the combines with them as well.

llvm-svn: 334290
2018-06-08 14:07:21 +00:00
Simon Pilgrim a6afa310c9 [X86][SSE] Simplify combineVectorTruncationWithPACKUS to reduce code duplication
Simplify combineVectorTruncationWithPACKUS to mask the upper bits followed by calling truncateVectorWithPACK instead of duplicating with similar code.

This results in the codegen using (V)PACKUSDW on SSE41+ targets for vXi64/vXi32 inputs where before it always used PACKUSWB (along with a lot more bitcasting).

I've raised PR37749 as until we avoid unnecessary concats back to 256-bit for bitwise ops, we can't avoid splitting the input value into 128-bit subvectors for masking.

llvm-svn: 334289
2018-06-08 13:59:11 +00:00
Sanjay Patel ab4ca0603c [x86] restore test comment; NFC
The description got deleted along with the FIXME note in
rL334268.

llvm-svn: 334288
2018-06-08 13:53:13 +00:00
Simon Pilgrim ad45efc445 [X86][SSE] Consistently prefer lowering to PACKUS over PACKSS
We have some combines/lowerings that attempt to use PACKSS-then-PACKUS and others that use PACKUS-then-PACKSS.

PACKUS is much easier to combine with if we know the upper bits are zero as ComputeKnownBits can easily see through BITCASTs etc. especially now that rL333995 and rL334007 have landed. It also effectively works at byte level which further simplifies shuffle combines.

The only (minor) annoyances are that ComputeKnownBits can sometimes take longer as it doesn't fail as quickly as ComputeNumSignBits (but I'm not seeing any actual regressions in tests) and PACKUSDW only became available after SSE41 so we have more codegen diffs between targets.

llvm-svn: 334276
2018-06-08 10:29:00 +00:00
Sam Parker 16f963ba0d [DAGCombine] Fix for PR37667
While trying to propagate AND masks back to loads, we currently allow
one non-load node to be included as a leaf in chain. This fix now
limits that node to produce only a single data value.

Differential Revision: https://reviews.llvm.org/D47878

llvm-svn: 334268
2018-06-08 07:49:04 +00:00
Michael Berg 77b5be7ec6 propagate fast math flags via IR on fma and sub expressions
Summary: This change uses fmf subflags to guard fma optimizations as well as unsafe. These changes originated from D46483 and have been simplified via getNode.

Reviewers: spatel, arsenm, hfinkel, javed.absar

Reviewed By: spatel

Subscribers: nemanjai, wdng

Differential Revision: https://reviews.llvm.org/D47388

llvm-svn: 334242
2018-06-07 22:49:09 +00:00
Sanjay Patel 898fbd7c47 [x86] add tests for backwards propagate mask bug (PR37060, PR37667); NFC
llvm-svn: 334199
2018-06-07 14:11:18 +00:00
Simon Pilgrim 09953d8412 [X86][SSE] Simplify combineVectorTruncationWithPACKSS to reduce code duplication
Simplify combineVectorTruncationWithPACKSS to just a SIGN_EXTEND_INREG followed by using the existing truncateVectorWithPACK instead of duplicating code.

llvm-svn: 334193
2018-06-07 13:01:42 +00:00
Simon Pilgrim 0e29d8d81f [X86][SSE] Add extra trunc(shl) test cases
The existing trunc_shl_17_v8i16_v8i32 test case should (but doesn't) fold to zero, I've added 2 new test cases:
 - trunc_shl_16_v8i16_v8i32 which folds to zero (this is actually testing the target faux shuffle combine)
 - trunc_shl_15_v8i16_v8i32 which should perform the full shl + truncate

llvm-svn: 334188
2018-06-07 11:22:52 +00:00
Simon Pilgrim cc92897be9 [X86] Regenerate rotate tests
Add 32-bit tests to show missed SHLD/SHRD cases

llvm-svn: 334183
2018-06-07 10:13:09 +00:00
Tomasz Krupa f8c7637027 [X86] Block UndefRegUpdate
Summary: Prevent folding of operations with memory loads when one of the sources has undefined register update.

Reviewers: craig.topper

Subscribers: llvm-commits, mike.dvoretsky, ashlykov

Differential Revision: https://reviews.llvm.org/D47621

llvm-svn: 334175
2018-06-07 08:48:45 +00:00
Karl-Johan Karlsson abb11f805f [BranchFolding] Fix live-in's when hoisting code
Summary:
When the branch folder hoist code into a predecessor it adjust live-in's
in the blocks it hoist code from. However it fail to handle hoisted code
that contain a defed register that originally is live-in in the block
through a super register.

This is fixed by replacing the live-in handling code with calls to
utility functions in LivePhysRegs.

Reviewers: kparzysz, gberry, MatzeB, uweigand, aprantl

Reviewed By: kparzysz

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D47529

llvm-svn: 334163
2018-06-07 07:20:33 +00:00
Roman Lebedev 488d28d4e5 [X86] Emit BZHI when mask is ~(-1 << nbits))
Summary:
In D47428, i propose to choose the `~(-(1 << nbits))` as the canonical form of low-bit-mask formation.
As it is seen from these tests, there is a reason for that.

AArch64 currently better handles `~(-(1 << nbits))`, but not the more traditional `(1 << nbits) - 1` (sic!).
The other way around for X86.
It would be much better to canonicalize.

This patch is completely monkey-typing.
I don't really understand how this works :)
I have based it on `// x & (-1 >> (32 - y))` pattern.

Also, when we only have `BMI`, i wonder if we could use `BEXTR` with `start=0` ?

Related links:
https://bugs.llvm.org/show_bug.cgi?id=36419
https://bugs.llvm.org/show_bug.cgi?id=37603
https://bugs.llvm.org/show_bug.cgi?id=37610
https://rise4fun.com/Alive/idM

Reviewers: craig.topper, spatel, RKSimon, javed.absar

Reviewed By: craig.topper

Subscribers: kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D47453

llvm-svn: 334125
2018-06-06 19:38:16 +00:00