Commit Graph

Craig Topper 220cf53540 [X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions.
Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD
were mapped to the same VEX instruction. But we should preserve
the commutability when changing the opcode.
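
For background on why the distinction matters: x86 MAX is not
commutable under IEEE semantics, because the result depends on
operand order when a NaN is involved. A scalar sketch of the
MAXPS/MAXPD behavior (my illustration, not code from the pass):

    #include <cmath>

    // x86 MAXPS/MAXPD return the *second* source operand when either
    // input is NaN, so swapping the sources changes the result. That
    // is why VMAXPS/PD must map to the non-commutable VEX opcode,
    // while VMAXCPS/PD (the fast-math variant) stays commutable.
    float x86_max(float a, float b) {
      return a > b ? a : b; // NaN compares false, so b wins on NaN
    }
    // x86_max(NAN, 1.0f) == 1.0f, but x86_max(1.0f, NAN) is NaN.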

llvm-svn: 373303
2019-10-01 07:10:09 +00:00
Craig Topper 5dc49a8374 [X86] Add test case to show missed opportunity to shrink a constant index to a gather in order to avoid splitting.
Also add a test case for an index that could be shrunk, but
would create a narrow type. We can go ahead and do it; we just
need to do so before type legalization.

Similar test cases for scatter as well.

llvm-svn: 373290
2019-10-01 01:27:52 +00:00
Amaury Sechet d60c297d1d Add partial bswap test to the X86 backend. NFC
llvm-svn: 373271
2019-09-30 22:52:28 +00:00
Craig Topper 3405237f77 [X86] Mask off upper bits of splat element in LowerBUILD_VECTORvXi1 when forming a SELECT.
The i1 scalar would have been type legalized to i8, but that
doesn't guarantee anything about the upper bits. If we're going
to use it as a condition we need to make sure the upper bits are 0.

I've special cased ISD::SETCC conditions since that should
guarantee zero upper bits. We could go further and use
computeKnownBits, but we have no tests that would need that.
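
A scalar analogy of the fix, assuming the i1 travels in a byte whose
upper bits are unspecified (illustrative sketch only):

    #include <cstdint>

    // An i1 legalized to i8 only promises that bit 0 is meaningful;
    // the upper 7 bits may hold garbage. Mask down to bit 0 before
    // using the byte as a condition. A byte produced by SETCC is
    // already zero in the upper bits, so the AND can be skipped.
    bool as_condition(uint8_t legalized_i1) {
      return (legalized_i1 & 1) != 0;
    }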

Fixes PR43507.

llvm-svn: 373246
2019-09-30 18:43:44 +00:00
Craig Topper 299ebacfe9 [X86] Add ANY_EXTEND to switch in ReplaceNodeResults, but just fall back to default handling.
ANY_EXTEND of v8i8 is marked Custom on AVX512 for handling extends
from v8i8. But the type legalization infrastructure will call
ReplaceNodeResults for v8i8 results. We should just defer it to the
default handling instead of asserting in the default case of the switch.

Fixes PR43509.

llvm-svn: 373234
2019-09-30 17:14:22 +00:00
Amaury Sechet 09025ca6fc Add tests for rotate with demanded bits. NFC
llvm-svn: 373223
2019-09-30 16:26:09 +00:00
Paul Robinson ed1f3f36ae [SSP] [3/3] cmpxchg and addrspacecast instructions can now
trigger stack protectors. Fixes PR42238.

Add test coverage for llvm.memset, as proxy for all llvm.mem*
intrinsics. There are two issues here: (1) they could be lowered to a
libc call, which could be intercepted, and do Bad Stuff; (2) with a
non-constant size, they could overwrite the current stack frame.
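
A rough illustration of issue (2) in C++ terms; the function name and
shape are hypothetical, not taken from the actual test:

    #include <cstddef>
    #include <cstring>

    // With a non-constant size, the memset below can write past `buf`
    // and overwrite the rest of the stack frame, including the return
    // address: exactly what a stack protector is meant to catch.
    void zero_buffer(std::size_t n) {
      char buf[16];
      std::memset(buf, 0, n); // n may exceed sizeof(buf)
    }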

The test was mostly written by Matt Arsenault in r363169, which was
later reverted; I tweaked what he had and added the llvm.memset part.

Differential Revision: https://reviews.llvm.org/D67845

llvm-svn: 373220
2019-09-30 15:11:23 +00:00
Paul Robinson 14945186c2 [SSP] [1/3] Revert "StackProtector: Use PointerMayBeCaptured"
"Captured" and "relevant to Stack Protector" are not the same thing.

This reverts commit f29366b1f5.
aka r363169.

Differential Revision: https://reviews.llvm.org/D67842

llvm-svn: 373216
2019-09-30 15:01:35 +00:00
Hans Wennborg 8569c0f1ab Pre-commit a test case for PR43129.
llvm-svn: 373190
2019-09-30 08:47:46 +00:00
Roger Ferrer Ibanez 5a2a14db0b [TargetLowering] Simplify expansion of S{ADD,SUB}O
ISD::SADDO uses the suggested sequence described in §2.4 of
the RISCV Spec v2.2. ISD::SSUBO uses the dual approach, but checks
for a (non-zero) positive right-hand side.
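
For reference, the expansion amounts to the following scalar checks
(a sketch of the semantics, not the actual DAG expansion code):

    #include <cstdint>

    // Signed add overflow: the sum wrapped iff the result moved in
    // the "wrong" direction relative to lhs given the sign of rhs.
    bool saddo(int32_t lhs, int32_t rhs, int32_t &res) {
      res = (int32_t)((uint32_t)lhs + (uint32_t)rhs); // defined wrap
      return (res < lhs) != (rhs < 0);
    }
    // The dual form for subtraction checks for a positive rhs.
    bool ssubo(int32_t lhs, int32_t rhs, int32_t &res) {
      res = (int32_t)((uint32_t)lhs - (uint32_t)rhs);
      return (res < lhs) != (rhs > 0);
    }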

Differential Revision: https://reviews.llvm.org/D47927

llvm-svn: 373187
2019-09-30 07:58:50 +00:00
Craig Topper 1b0ea0a12e [X86] Split v16i32/v8i64 bitreverse on avx512f targets without avx512bw to enable the use of vpshufb on the 256-bit halves.
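
The vpshufb trick is a nibble lookup; a scalar equivalent of the
per-byte step, for illustration only:

    #include <cstdint>

    // vpshufb can reverse bits byte-wise using 16-entry nibble
    // tables. Scalar equivalent: reverse each nibble via a LUT,
    // then swap the two halves of the byte.
    uint8_t bitreverse8(uint8_t b) {
      static const uint8_t rev4[16] = {
          0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
          0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF};
      return rev4[b & 0xF] << 4 | rev4[b >> 4];
    }
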
llvm-svn: 373177
2019-09-30 03:14:38 +00:00
Craig Topper 0e3f659137 [X86] Add custom isel logic to match VPTERNLOG from 2 logic ops.
There's room for improvement here, but this is a decent
starting point.

There are a few minor regressions in the vector-rotate tests,
where we are now forming a vpternlog from an and before we get
a chance to form it for a bitselect that we were matching
previously. This results in an AND and an ANDN feeding the
vpternlog where previously we just had an AND after the
vpternlog. I think we can probably DAG combine the AND with
the bitselect to get back to similar codegen.
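
For intuition: a VPTERNLOG immediate is just the 8-entry truth table
of a three-input boolean function, so any pair of nested logic ops
folds into one instruction. A sketch of deriving the immediate (my
illustration, not the isel code):

    #include <cstdint>

    // Bit i of the immediate is f(a, b, c) evaluated with a, b, c
    // taken from bits 2, 1, 0 of i.
    template <typename F> uint8_t ternlog_imm(F f) {
      uint8_t imm = 0;
      for (int i = 0; i < 8; ++i)
        if (f((i >> 2) & 1, (i >> 1) & 1, i & 1))
          imm |= 1 << i;
      return imm;
    }
    // ternlog_imm([](int a, int b, int c) { return (a & b) ^ c; })
    // yields 0x6A, the immediate for a fused AND+XOR.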

llvm-svn: 373172
2019-09-29 18:43:08 +00:00
Amaury Sechet aabf8cbfca Add test case peeking through vector concat when combining insert into shuffles. NFC
llvm-svn: 373171
2019-09-29 17:54:03 +00:00
Craig Topper 494bfd9fed [X86] Enable isel to fold broadcast loads that have been bitcasted from FP into a vpternlog.
llvm-svn: 373157
2019-09-29 01:24:33 +00:00
Craig Topper b6a2207ba2 [X86] Move bitselect matching to vpternlog into X86ISelDAGToDAG.cpp
This allows us to reduce the use count on the condition node before
the match. This enables load folding for that operand without
relying on the peephole pass. This will be improved on for
broadcast load folding in a subsequent commit.

This still requires a bunch of isel patterns for vXi16/vXi8 types
though.

llvm-svn: 373156
2019-09-29 01:24:29 +00:00
Craig Topper 0ac4aacea8 [X86] Enable canonicalizeBitSelect for AVX512 since we can use VPTERNLOG now.
llvm-svn: 373155
2019-09-29 01:24:22 +00:00
Craig Topper 6195ed8397 [X86] Match (or (and A, B), (andn (A, C))) to VPTERNLOG with AVX512.
This uses a similar isel pattern as we used for vpcmov with XOP.
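
In scalar terms the matched pattern is the classic bitselect, which
corresponds to vpternlog immediate 0xCA (sketch for illustration):

    #include <cstdint>

    // (A & B) | (~A & C) selects bits of B where A is 1 and bits of
    // C where A is 0; x86 ANDN computes ~A & C. Read as a truth
    // table over (a, b, c) this is the immediate 0xCA.
    uint64_t bitselect(uint64_t a, uint64_t b, uint64_t c) {
      return (a & b) | (~a & c);
    }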

llvm-svn: 373154
2019-09-29 01:24:16 +00:00
Amara Emerson 509a4947c9 Add an operand to memory intrinsics to denote the "tail" marker.
We need to propagate this information from the IR in order to be able to safely
do tail call optimizations on the intrinsics during legalization. It isn't safe
to assume tail call opt is allowed without checking for the marker, because the
mem libcall may use allocas from the caller.

This adds an extra immediate operand to the end of the intrinsics and fixes the
legalizer to handle it.
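
A sketch of the hazard in C++ terms (hypothetical example, not taken
from the patch):

    #include <cstring>

    // `local` lives in this function's frame. If the memcpy libcall
    // were tail-called, the frame would be torn down before memcpy
    // reads from it. The "tail" marker is what certifies that no
    // argument points into the caller's stack.
    void copy_out(char *dst) {
      char local[64] = {0};
      std::memcpy(dst, local, sizeof(local)); // must not be a tail call
    }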

Differential Revision: https://reviews.llvm.org/D68151

llvm-svn: 373140
2019-09-28 05:33:21 +00:00
Craig Topper 8b5ad3d16e [X86] Add broadcast load unfolding support for VPTESTMD/Q and VPTESTNMD/Q.
llvm-svn: 373138
2019-09-28 01:56:36 +00:00
Craig Topper 22984ebd0e [X86] Split combineGatherScatter into a version for generic ISD nodes and another version for X86 specific nodes.
The majority of the code doesn't run on the X86 nodes today since
it's gated by isBeforeLegalizeOps and we don't form X86 nodes
until after that, except for a couple of special cases in type
legalization. But I think we would probably break those if
some of the transforms fired on them.

I want to remove the hardcoded operand numbers and the unusual
use of UpdateNodeOperands. Being able to know which ISD opcodes
are present should help with that.

llvm-svn: 373136
2019-09-28 01:06:58 +00:00
Craig Topper 305c811fd4 [X86] Add test case to show missed opportunity to turn (add (zext (vXi1 X)), Y) -> (sub Y, (sext (vXi1 X))) with avx512.
With avx512, the vXi1 type is legal, and we can more easily sign
extend them into vector registers; a zext requires a sign extend and
a shift.

If we can easily turn the zext into a sext we should.
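
The underlying identity, checked in scalar form (illustrative only):

    #include <cstdint>

    // For a boolean b: zext(b) is 0 or 1, while sext(b) is 0 or -1,
    // so y + zext(b) == y - sext(b). When the sign extend is the
    // cheap form, the add can therefore become a sub.
    int32_t add_zext(int32_t y, bool b) { return y + (int32_t)b; }
    int32_t sub_sext(int32_t y, bool b) { return y - (b ? -1 : 0); }
    // add_zext(y, b) == sub_sext(y, b) for all y and b.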

llvm-svn: 373131
2019-09-27 22:30:24 +00:00
Craig Topper 750bdda638 [X86] Call SimplifyDemandedBits in combineGatherScatter any time the mask element is wider than i1, not just when AVX512 is disabled.
The AVX2 intrinsics can still be used when AVX512 is enabled, and
those go through this path. So we should simplify them.

llvm-svn: 373108
2019-09-27 18:23:55 +00:00
Craig Topper 432a88bf04 [X86] Add test case to show failure to perform SimplifyDemandedBits on mask of avx2 gather intrinsics when avx512 is enabled.
llvm-svn: 373107
2019-09-27 18:23:46 +00:00
Jesper Antonsson 39b81f1cbc [CodeGenPrepare] Mend "avoid crashing from replacing a phi twice" fix.
Summary:
An erroneously negated if-statement in an earlier (March 2019) bugfix left phi replacement/simplification under optimizeMemoryInst() in CodeGenPrepare largely inactive. The error surfaced when csmith found that the same assert as in the original bug report could still be triggered in a different way. This patch fixes the bugfix. The original bug was:
 https://bugs.llvm.org/show_bug.cgi?id=41052
... and the previous fix was D59358.

Reviewers: aprantl, skatkov

Reviewed By: skatkov

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67838

llvm-svn: 373084
2019-09-27 13:01:37 +00:00
Craig Topper d3f82b8b97 [X86] Add VMOVSSZrrk/VMOVSDZrrk/VMOVSSZrrkz/VMOVSDZrrkz to getUndefRegClearance.
We have isel patterns that can put an IMPLICIT_DEF on one of
the sources for these instructions. So we should make sure
we break any dependencies there. This should be done by
just using one of the other sources.

llvm-svn: 373025
2019-09-26 22:56:06 +00:00
Craig Topper c898724974 [X86] Add CodeGenOnly instructions for (f32 (X86selects $mask, (loadf32 addr), fp32imm0) to use masked MOVSS from memory.
Similar for f64, and for the case of a non-zero passthru value.

We were previously not trying to fold the load at all. Using
a CodeGenOnly instruction allows us to use FR32X/FR64X as the
register class to avoid a bunch of COPY_TO_REGCLASS.
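
The folded pattern corresponds roughly to this scalar operation in
lane 0 (a semantic sketch, not the instruction definition):

    // A masked scalar load: lane 0 takes the memory value when the
    // mask bit is set, otherwise the passthru (zero in the matched
    // pattern). The CodeGenOnly instruction lets isel fold the load
    // into one masked MOVSS while staying in FR32X/FR64X.
    float masked_load_f32(const float *p, bool mask_bit, float passthru) {
      return mask_bit ? *p : passthru;
    }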

llvm-svn: 373021
2019-09-26 22:23:09 +00:00
Roman Lebedev 3a5ca1c8b5 [DAGCombine][X86][AArch64][NFC] Add tests for shift-by-signext
llvm-svn: 373014
2019-09-26 20:49:49 +00:00
Craig Topper ee78e44126 [X86] Mark the EVEX encoded PSADBW instructions as commutable to enable load folding of the other operand.
The SSE and VEX versions are already correct.
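
Commutability holds because the sum of absolute differences is
symmetric; a scalar sketch of one 8-byte group:

    #include <cstdint>
    #include <cstdlib>

    // PSADBW sums |a[i] - b[i]| over groups of 8 bytes. Since
    // |a - b| == |b - a|, swapping the sources gives the same
    // result, which lets isel commute the operands to fold a load.
    uint16_t sad8(const uint8_t a[8], const uint8_t b[8]) {
      uint16_t sum = 0;
      for (int i = 0; i < 8; ++i)
        sum += (uint16_t)std::abs((int)a[i] - (int)b[i]);
      return sum;
    }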

llvm-svn: 372941
2019-09-26 04:42:58 +00:00
Sanjay Patel 831a7e7068 [DAGCombiner] add one-use restriction to vector transform with cheap extract
We might be able to do better on the example in the test,
but in general, we should not scalarize a splatted vector
binop if there are other uses of the binop. Otherwise, we
can end up with the code we had: a scalar op that is
redundant with a vector op.

llvm-svn: 372886
2019-09-25 15:08:33 +00:00
Sanjay Patel 1aa09e0585 [x86] add test for multi-use scalarization of vector binop; NFC
llvm-svn: 372883
2019-09-25 14:57:45 +00:00
Simon Pilgrim a7f27f357d [X86] Add MMX MOVD/MOVQ stores to folding tables to support stack folding
llvm-svn: 372770
2019-09-24 16:15:32 +00:00
Simon Pilgrim 682d41a506 [X86] Add tests showing failure to stack fold MMX MOVD/MOVQ stores
llvm-svn: 372766
2019-09-24 15:40:09 +00:00
Ilya Biryukov 60e5e0b667 Revert r372333: [DAG][X86] Convert isNegatibleForFree/GetNegatedExpression to a target hook (PR42863)
Reason: this caused severe compile time regressions in JAX.
See the email thread of the original revision on llvm-commits for details:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190923/697042.html

llvm-svn: 372756
2019-09-24 13:48:02 +00:00
Craig Topper 8a6916e6db [X86] Reduce the number of unique check prefixes in memset-nonzero.ll. NFC
The avx512 run with prefer-256-bit generates the same code as AVX2, so
just reuse that prefix.

llvm-svn: 372661
2019-09-23 21:29:28 +00:00
Sanjay Patel 7414151929 [BreakFalseDeps] ignore function with minsize attribute
This came up in the x86-specific:
https://bugs.llvm.org/show_bug.cgi?id=43239
...but it is a general problem for the BreakFalseDeps pass.
Dependencies may be broken by adding some other instruction,
so that should be avoided if the overall goal is to minimize size.

Differential Revision: https://reviews.llvm.org/D67363

llvm-svn: 372628
2019-09-23 17:01:01 +00:00
Sanjay Patel 31b9dfe23f [x86] fix assert with horizontal math + broadcast of vector (PR43402)
https://bugs.llvm.org/show_bug.cgi?id=43402

llvm-svn: 372606
2019-09-23 13:30:23 +00:00
Craig Topper 03b5a13ee3 [X86] Canonicalize all zeroes vector to RHS in X86DAGToDAGISel::tryVPTESTM.
llvm-svn: 372544
2019-09-23 05:35:23 +00:00
Craig Topper 5e26064c40 [X86] Remove SETEQ/SETNE canonicalization code from LowerIntVSETCC_AVX512 to prevent an infinite loop.
The attached test case would previously infinite loop after
r365711.

I'm going to move this to X86ISelDAGToDAG.cpp to get the setcc
to match VPTEST in 32-bit mode in a follow up commit.

llvm-svn: 372543
2019-09-23 05:35:20 +00:00
Craig Topper 1f058538e0 [X86] Add 32-bit command line to avx512f-vec-test-testn.ll
llvm-svn: 372542
2019-09-23 05:35:15 +00:00
David Zarzycki a7a515cb77 Prefer AVX512 memcpy when applicable
When AVX512 is available and the preferred vector width is 512-bits or
more, we should prefer AVX512 for memcpy().
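
Conceptually, a small fixed-size memcpy then becomes a single 64-byte
load/store pair; an intrinsics sketch assuming AVX512F is available:

    #include <immintrin.h>

    // With a preferred vector width of 512 bits, a 64-byte copy can
    // be inlined as one zmm load/store instead of two ymm pairs.
    void copy64(void *dst, const void *src) {
      __m512i v = _mm512_loadu_si512(src);
      _mm512_storeu_si512(dst, v);
    }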

https://bugs.llvm.org/show_bug.cgi?id=43240

https://reviews.llvm.org/D67874

llvm-svn: 372540
2019-09-23 05:00:59 +00:00
Craig Topper a533e87792 [X86][SelectionDAGBuilder] Move the hack for handling MMX shift by i32 intrinsics into the X86 backend.
These intrinsics should be shift by immediate, but gcc allows any
i32 scalar and clang needs to match that. So we try to detect the
non-constant case and move the data from an integer register to an
MMX register.

Previously this was done by creating a v2i32 build_vector and
bitcast in SelectionDAGBuilder. This had to be done early since
v2i32 isn't a legal type. The bitcast+build_vector would be DAG
combined to X86ISD::MMX_MOVW2D which isel will turn into a
GPR->MMX MOVD.

This commit just moves the whole thing to lowering and emits
the X86ISD::MMX_MOVW2D directly to avoid the illegal type. The
test changes just seem to be due to nodes being linearized in a
different order.

llvm-svn: 372535
2019-09-23 01:05:33 +00:00
Roman Lebedev 7c3d6f5a1b [X86] X86DAGToDAGISel::matchBEXTRFromAndImm(): if can't use BEXTR, fallback to BZHI is profitable (PR43381)
Summary:
PR43381 notes that while we are good at matching `(X >> C1) & C2` as BEXTR/BEXTRI,
we only do that if we either have BEXTRI (TBM),
or if BEXTR is marked as being fast (`-mattr=+fast-bextr`).
In all other cases we don't match.

But that is mainly only true for AMD CPUs.
However, for all the CPUs for which we have sched models,
the BZHI is always fast (or the sched models are all bad).

So if we decide that it's unprofitable to emit BEXTR/BEXTRI,
we should consider falling back to BZHI if it is available,
and following up with the shift.

While it's really tempting to do something because it's cool,
it is wise to first think about whether it actually makes sense to do.
We shouldn't just use BZHI because we can, but only if it is beneficial.
In particular, it isn't really worth it if the input is a register,
the mask is small, or we can fold a load.
But it is worth it if the mask does not fit into 32 bits.
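
For reference, BZHI semantics and the resulting fallback shape, as a
scalar sketch (not the isel code):

    #include <cstdint>

    // BZHI(x, n) clears all bits of x from position n upward.
    uint64_t bzhi(uint64_t x, unsigned n) {
      return n >= 64 ? x : x & ((uint64_t(1) << n) - 1);
    }
    // So (x >> shift) & mask, with mask = 2^width - 1, can be emitted
    // as a shift followed by BZHI; this pays off when the mask does
    // not fit a 32-bit immediate.
    uint64_t extract(uint64_t x, unsigned shift, unsigned width) {
      return bzhi(x >> shift, width);
    }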

(careful, I don't know much about Intel CPUs, so my choice of `-mcpu` may be bad here)
Thus we manage to fold a load:
https://godbolt.org/z/Er0OQz
Or if we'd end up using BZHI anyways because the mask is large:
https://godbolt.org/z/dBJ_5h
But this isn't actually profitable in the general case,
e.g. here we'd increase the micro-op count
(the register renaming is free; it seems mca does not model that there)
https://godbolt.org/z/k6wFoz
Likewise, not worth it if we just get load folding:
https://godbolt.org/z/1M1deG

https://bugs.llvm.org/show_bug.cgi?id=43381

Reviewers: RKSimon, craig.topper, davezarzycki, spatel

Reviewed By: craig.topper, davezarzycki

Subscribers: andreadb, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67875

llvm-svn: 372532
2019-09-22 22:04:29 +00:00
Roman Lebedev 24159592ca [NFC][X86] Add BEXTR test with load and 33-bit mask (PR43381 / D67875)
llvm-svn: 372524
2019-09-22 19:36:38 +00:00
Craig Topper a1d86857ff [X86] Update commutable EVEX vcmp patterns to use timm instead of imm.
We need to match TargetConstant, not Constant. This was broken
in r372338, but we lacked test coverage.

llvm-svn: 372523
2019-09-22 19:06:13 +00:00
Craig Topper ac84771261 [X86] Add more tests for commuting evex vcmp instructions during isel to fold a load.
Some of the isel patterns were not updated to check for
TargetConstant instead of Constant in r372338.

llvm-svn: 372522
2019-09-22 19:06:08 +00:00
Craig Topper 38014c553f [X86] Add test memset and memcpy testcases for D67874. NFC
llvm-svn: 372494
2019-09-22 06:52:25 +00:00
Roman Lebedev 854b0f0f00 [NFC][X86] Adjust check prefixes in bmi.ll (PR43381)
llvm-svn: 372468
2019-09-21 11:12:55 +00:00
Craig Topper 04682939eb [X86] Use sse_load_f32/f64 and timm in patterns for memory form of vgetmantss/sd.
Previously we only matched scalar_to_vector and scalar load, but
we should be able to narrow a vector load or match vzload.

Also need to match TargetConstant instead of Constant. The register
patterns were previously updated, but not the memory patterns.

llvm-svn: 372458
2019-09-21 06:44:29 +00:00
Craig Topper 4fa12ac92c [X86] Add test case to show failure to fold load with getmantss due to isel pattern looking for Constant instead of TargetConstant
The intrinsic has an immarg so it gets created with a TargetConstant
instead of a Constant after r372338. The isel pattern was only
updated for the register form, but not the memory form.

llvm-svn: 372457
2019-09-21 06:44:24 +00:00
Sterling Augustine 4a58936716 Fix missed case of switching getConstant to getTargetConstant. Try 2.
Summary: This fixes a crasher introduced by r372338.

Reviewers: echristo, arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67850

llvm-svn: 372434
2019-09-20 22:26:55 +00:00