Commit Graph

21150 Commits

Author SHA1 Message Date
Craig Topper 375849518d [X86] Add a X86ISD::BEXTRI to distinquish the case where the control must be a constant.
The bextri intrinsic has a ImmArg attribute which will be converted
in SelectionDAG using TargetConstant. We previously converted this
to a plain Constant to allow X86ISD::BEXTR to call SimplifyDemandedBits
on it.

But while trying to decide if D89178 was safe, I realized that
this conversion of TargetConstant to Constant would be one case
where that would break.

So this patch adds a new opcode specifically for the immediate case.
And then teaches computeKnownBits and SimplifyDemandedBits to also
handle it, but not try to SimplifyDemandedBits on it. To make up
for that, I immediately masked the constant to 16 bits when
converting from the intrinsic node to the X86ISD node.
2020-10-10 19:18:06 -07:00
Joao Moreira e0b89df2e0 [X86] Check if call is indirect before emitting NT_CALL
The notrack prefix is a relaxation of CET policies which makes it possible to indirectly call targets which do not have an ENDBR instruction in the landing address. To emit a call with this prefix, the special attribute "nocf_check" is used. When used as a function attribute, a CallInst targeting the respective function will return true for the method "doesNoCfCheck()", no matter if it is a direct call (and such should remain like this, as the information that the to-be-called function won't perform control-flow checks is useful in other contexts). Yet, when emitting an X86ISD::NT_CALL, the respective CallInst should be verified for its indirection, allowing that the prefixed calls are only emitted in the right situations.

Update the respective testing unit to also verify for direct calls to functions with ''nocf_check'' attributes.

The bug can also be reproduced through compiling the following C code using the -fcf-protection=full flag.

int __attribute__((nocf_check)) foo(int a) {};

int main() {
  foo(42);
}

Differential Revision: https://reviews.llvm.org/D87320
2020-10-09 15:54:23 -07:00
Craig Topper f34bb06935 [X86] When expanding LCMPXCHG16B_NO_RBX in EmitInstrWithCustomInserter, directly copy address operands instead of going through X86AddressMode.
I suspect getAddressFromInstr and addFullAddress are not handling
all addresses cases properly based on a report from MaskRay.

So just copy the operands directly. This should be more efficient
anyway.
2020-10-09 11:55:24 -07:00
Craig Topper 662024df33 [X86] Don't copy kill flag when expanding LCMPXCHG16B_SAVE_RBX
The expansion code creates a copy to RBX before the real LCMPXCHG16B.
It's possible this copy uses a register that is also used by the
real LCMPXCHG16B. If we set the kill flag on the use in the copy,
then we'll fail the machine verifier on the use on the LCMPXCHG16B.

Differential Revision: https://reviews.llvm.org/D89151
2020-10-09 11:55:24 -07:00
Fangrui Song e36a41b3cf [X86] Fix some clang-tidy bugprone-argument-comment issues 2020-10-08 15:26:50 -07:00
Scott Constable ad4313fc83 [X86] Fix bug in -mlvi-cfi that may clobber a live register
Fix for this bug: https://bugs.llvm.org/show_bug.cgi?id=47740

The fix uses the existing findDeadCallerSavedReg() function instead of a hacky heuristic to find a scratch register to clobber.

Differential Revision: https://reviews.llvm.org/D88925
2020-10-07 18:33:51 -07:00
Scott Constable dc3dba7dbd [X86] Move findDeadCallerSavedReg() into X86RegisterInfo
The findDeadCallerSavedReg() function has utility outside of X86FrameLowering.cpp

Differential Revision: https://reviews.llvm.org/D88924
2020-10-07 18:33:48 -07:00
Anna Thomas 35cb45c533 [ImplicitNullChecks] Support complex addressing mode
The pass is updated to handle loads through complex addressing mode,
specifically, when we have a scaled register and a scale.
It requires two API updates in TII which have been implemented for X86.

See added IR and MIR testcases.

Tests-Run: make check
Reviewed-By: reames, danstrushin
Differential Revision: https://reviews.llvm.org/D87148
2020-10-07 20:55:38 -04:00
Craig Topper 68e1a8d207 [X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel
We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as
the frame pointer. We previously checked for this during type
legalization, but that's too early to know for sure if the base
pointer is needed.

This patch adds a new pseudo instruction to emit from isel that
uses a virtual register for the RBX input. Then we use the custom
inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base
pointer or LCMPXCHG16B_SAVE_RBX if it is.

Fixes PR42064.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D88808
2020-10-07 17:00:43 -07:00
Simon Pilgrim 6c7d713cf5 [X86][SSE] combineX86ShuffleChain add 'CanonicalizeShuffleInput' helper. NFCI.
As part of PR45974, we're getting closer to not creating 'padded' vectors on-the-fly in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.

At the moment combineX86ShuffleChain just has to bitcast an input to the correct shuffle type, but eventually we'll need to pad them as well. So, move the bitcast into a 'CanonicalizeShuffleInput helper for now, making the diff for future padding support a lot smaller.
2020-10-06 17:47:24 +01:00
Fangrui Song 43c7dc52f1 [X86] .code16: temporarily set Mode32Bit when matching an instruction with the data32 prefix
PR47632

This allows MC to match `data32 ...` as one instruction instead of two (data32 without insn + insn).

The compatibility with GNU as improves: `data32 ljmp` will be matched as ljmpl.
`data32 lgdt 4(%eax)` will be matched as `lgdtl` (prefixes: 0x67 0x66, instead
of 0x66 0x67).

GNU as supports many other `data32 *w` as `*l`. We currently just hard code
`data32 callw` and `data32 ljmpw`.  Generalizing the suffix replacement is
tricky and requires a think about the "bwlq" appending suffix rules in MatchAndEmitATTInstruction.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88772
2020-10-06 08:32:03 -07:00
Craig Topper 4da4e7cb20 [X86] Remove X86ISD::LCMPXCHG8_SAVE_EBX_DAG and LCMPXCHG8B_SAVE_EBX pseudo instruction
This and its friend X86ISD::LCMPXCHG8_SAVE_RBX_DAG are used if we need to avoid clobbering the frame pointer in EBX/RBX. EBX/RBX are only used a frame pointer in 64-bit mode. In 64-bit mode we don't use CMPXCHG8B since we have a GR64 cmpxchg available. So we don't need special handling for LCMPXCHG8B.

Split from D88808

Differential Revision: https://reviews.llvm.org/D88853
2020-10-05 15:03:07 -07:00
Craig Topper 1127662c6d [SelectionDAG] Make sure FMF are propagated when getSetcc canonicalizes FP constants to RHS.
getNode handling for ISD:SETCC calls FoldSETCC which can canonicalize
FP constants to the RHS. When this happens we should create the node
with the FMF that was requested. By using FlagInserter when can ensure
any calls to getNode/getSetcc during canonicalization will also get the flags.

Differential Revision: https://reviews.llvm.org/D88063
2020-10-05 14:55:23 -07:00
Simon Pilgrim 0ac210e580 [X86] isTargetShuffleEquivalent - merge duplicate array accesses. NFCI. 2020-10-05 17:22:14 +01:00
Craig Topper b18026114a [X86] MWAITX_SAVE_RBX should not have EBX as an implicit use.
RBX was copied to a virtual register before this instruction
was created. And the EBX input for the final MWAITX is still
in a virtual register. So EBX isn't read by this pseudo.
2020-10-04 20:34:31 -07:00
Craig Topper 4b38ceb0eb [X86] Remove MWAITX_SAVE_EBX pseudo instruction. Always save/restore the full %rbx register even in gnux32.
ebx/rbx only needs to be saved when 64-bit registers are supported
anyway. It should be fine to save/restore the whole rbx register
even in gnux32 where the base is technically just ebx.

This matches what we do for cmpxchg16b where rbx is saved/restored
regardless of gnux32.
2020-10-04 16:28:15 -07:00
Craig Topper 952dfd76c6 [X86] Correct the implicit defs/uses for the MWAITX pseudo instructions.
MWAITX doesn't touch EFLAGS so no pseudos should def EFLAGS.

The SAVE_EBX/RBX pseudos only needs to def the EBX register that
the expansion overwrites. The EAX and ECX registers are only read.

The pseudo emitted during isel that is used by the custom inserter
shouldn't have any implicit defs or uses since everything is in
vregs.
2020-10-04 15:28:38 -07:00
Craig Topper 0db97234cf [X86] Remove usesCustomInserter from MWAITX_SAVE_EBX and MWAITX_SAVE_RBX. NFC
These are now emitted by a CustomInserter rather than using a custom
inserter themselves.
2020-10-04 15:28:38 -07:00
Martin Storsjö b4288f278a [X86] Remove an accidentally added file. NFC.
This file seems to have been accidentally added as part of commit
413577a879.
2020-10-04 22:33:26 +03:00
Craig Topper 28595cbbeb [X86] Synchronize the loadiwkey builtin operand order with gcc version. 2020-10-04 12:09:29 -07:00
Simon Pilgrim e4e5c42896 [X86][SSE] isTargetShuffleEquivalent - ensure shuffle inputs are the correct size.
Preliminary patch for the next stage of PR45974 - we don't want to be creating 'padded' vectors on-the-fly at all in combineX86ShufflesRecursively, and only pad the source inputs if we have a definite match inside combineX86ShuffleChain.

This means that the inputs to combineX86ShuffleChain might soon be smaller than the final root value type, so we should ensure that isTargetShuffleEquivalent only matches with the inputs if they are the correct size.
2020-10-04 15:32:05 +01:00
Craig Topper ae2e51597f [X86] LOADIWKEY, ENCODEKEY128 and ENCODEKEY256 clobber EFLAGS. 2020-10-03 21:55:03 -07:00
Craig Topper a7e45ea30d [X86] Add memory operand to AESENC/AESDEC Key Locker instructions.
This removes FIXMEs from selectAddr.
2020-10-03 21:42:16 -07:00
Craig Topper 39fc4a0b0a [X86] Move ENCODEKEY128/256 handling from lowering to selection.
We should avoid emitting MachineSDNodes from lowering.

We can use the the implicit def handling in InstrEmitter to avoid
manually copying from each xmm result register. We only need to
manually emit the copies for the implicit uses.
2020-10-03 18:44:53 -07:00
Craig Topper 7f3da48885 [X86] Remove X86ISD::MWAITX_DAG. Just match the intrinsic to the custom inserter pseudo instruction during isel. 2020-10-03 18:44:53 -07:00
Craig Topper adccc0bfa3 [X86] Add X86ISD opcodes for the Key Locker AESENC*KL and AESDEC*KL instructions
Instead of emitting MachineSDNodes during lowering, emit X86ISD
opcodes. These opcodes will either be selected by tablegen
patterns or custom selection code.

Emitting MachineSDNodes during lowering is uncommon so this makes
things more consistent. It also allows selectAddr to be called to
perform address matching during instruction selection.

I had trouble getting tablegen to accept XMM0-XMM7 as results in
an isel pattern for the WIDE instructions so I had to use custom
instruction selection.
2020-10-03 16:55:19 -07:00
Craig Topper e2dd86bbfc [X86] Key Locker instructions should use VR128 regclass not VR128X. 2020-10-02 21:55:07 -07:00
Craig Topper 8ae4842669 [X86] Move MWAITX_DAG ISD opcode so it is not in the strict FP range.
Add a comment to hopefully prevent anyone else from making the
same mistake.
2020-10-02 18:22:02 -07:00
serge-sans-paille f2c6bfa350 Fix interaction between stack alignment and inline-asm stack clash protection
As reported in https://github.com/rust-lang/rust/issues/70143 alignment is not
taken into account when doing the probing. Fix that by adjusting the first probe
if the stack align is small, or by extending the dynamic probing if the
alignment is large.

Differential Revision: https://reviews.llvm.org/D84419
2020-10-02 16:51:49 +02:00
serge-sans-paille 9573c9f2a3 Fix limit behavior of dynamic alloca
When the allocation size is 0, we shouldn't probe. Within [1,  PAGE_SIZE], we
should probe once etc.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47657

Differential Revision: https://reviews.llvm.org/D88548
2020-10-02 11:10:02 +02:00
Craig Topper d1d7fc9832 [X86] Canonicalize (x > 1) ? x : 1 -> (x >= 1) ? x : 1 for sign and unsigned to enable the use of test instructions for the compare.
This will be further canonicalized to a compare involving 0
which will enable the use of test instructions. Either using
cmovg for signed for cmovne for unsigned.

Fixes more case for PR47049
2020-09-30 13:50:52 -07:00
Xiang1 Zhang 413577a879 [X86] Support Intel Key Locker
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88398
2020-09-30 18:08:45 +08:00
Craig Topper 618a890b72 [X86] Increase the depth threshold required to form VPERMI2W/VPERMI2B in shuffle combining
These instructions are implemented with two port 5 uops and one port 015 uop so they are more complicated that most shuffles.

This patch increases the depth threshold for when we form them during shuffle combining to try to limit increasing the number of uops especially on port 5.

Differential Revision: https://reviews.llvm.org/D88503
2020-09-29 18:37:23 -07:00
Eric Astor feb74530f8 [ms] [llvm-ml] Accept whitespace around the dot operator
MASM allows arbitrary whitespace around the Intel dot operator, especially when used for struct field lookup

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88450
2020-09-29 17:01:13 -04:00
Eric Astor 6b70a83d9c [ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).

Also, when lexing MASM integers, require radix specifier; MASM requires that all literals without a radix specifier be treated as in the default radix. (e.g., 0100 = 100)

Relanding D87400, now with fewer ms-inline-asm tests broken!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D88337
2020-09-29 16:55:51 -04:00
Craig Topper 82da0cabb9 [X86] Add computeKnownBits support for PEXT.
The number of zeros in the mask provides a lower bound on the number
of leading zeros in the result.
2020-09-28 22:54:07 -07:00
Craig Topper e53196b1e8 [X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.
We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

-If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
-The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
-The result will have zeros in any position that the mask is zero.
-Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.

Differential Revision: https://reviews.llvm.org/D87883
2020-09-28 14:21:30 -07:00
Simon Pilgrim e0820d87e3 [X86] Flip isShuffleEquivalent argument order to match isTargetShuffleEquivalent
A while ago, we converted isShuffleEquivalent/isTargetShuffleEquivalent to both use IsElementEquivalent internally.

This allows us to make the shuffle args optional like isTargetShuffleEquivalent and update foldShuffleOfHorizOp to use isShuffleEquivalent (which it should as its using a ISD::VECTOR_SHUFFLE mask).
2020-09-28 12:53:56 +01:00
Simon Pilgrim 6b5198f06b [X86] Simplify broadcast mask detection with isUndefOrEqual helper.
Add an additional isUndefOrEqual variant that matches an entire mask, not just a single value.
2020-09-28 12:53:56 +01:00
Simon Pilgrim 283036394e [X86][SSE] combineVectorTruncation - enable (pre-SSSE3) vXi16->vXi8 truncation.
Shuffle combining can now handle this output, and by performing this early in combineVectorTruncation we avoid a scalarization that caused a regression on D87502.
2020-09-24 15:51:36 +01:00
Fangrui Song 3d38a975d7 [X86] Parse data32 call in .code16 as CALLpcrel32
Used by kexec-tools (PR46942)
In GNU as, tc-i386.c:output_jump uses 4-byte immediate if a data32 prefix is present.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88137
2020-09-23 18:37:41 -07:00
Freddy Ye bc7f6c6dd8 [X86] Add TDX instructions.
For more details about these instructions, please refer to the latest TDX document: https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D88006
2020-09-24 09:35:44 +08:00
Eric Astor b901b6ab17 Revert "[ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers"
This reverts commit 5dd1b6d612.
2020-09-23 13:59:34 -04:00
Craig Topper f21f835ee8 [X86] Improve demanded bits for X86ISD::BEXTR.
If the control is constant we can figure out exactly which bits
of the input are demanded.

Differential Revision: https://reviews.llvm.org/D88072
2020-09-23 10:51:02 -07:00
Eric Astor 5dd1b6d612 [ms] [llvm-ml] Add support for .radix directive, and accept all radix specifiers
Add support for .radix directive, and radix specifiers [yY] (binary), [oOqQ] (octal), and [tT] (decimal).

Also, when lexing MASM integers, require radix specifier; MASM requires that all literals without a radix specifier be treated as in the default radix. (e.g., 0100 = 100)

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D87400
2020-09-23 13:45:58 -04:00
Bing1 Yu ec24e50553 [CostModel][X86] add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)
add CostModel for SK_Select(v8f64, v8i64, v16f32, v16i32, v32i16, v64i8)

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87884
2020-09-23 10:29:10 +08:00
Simon Pilgrim 0793b45660 [X86] Add missing namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-22 15:06:59 +01:00
Simon Pilgrim af71298648 [X86] Cleanup/add namespace closure comments. NFCI.
Fixes some clang-tidy llvm-namespace-comment warnings.
2020-09-22 15:06:58 +01:00
Meera Nakrani a3d0dce260 [ARM][TTI] Prevents constants in a min(max) or max(min) pattern from being hoisted when in a loop
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.

Differential Revision: https://reviews.llvm.org/D87457
2020-09-22 11:54:10 +00:00
Craig Topper a74b1faba2 [X86] Make reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore work for avx512 after type legalization.
The scalar elements of the vXi1 build_vector will have been type legalized to i8 by padding with 0s. So we can't check for all ones. Instead we should just look at bit 0 of the constant.

Differential Revision: https://reviews.llvm.org/D87863
2020-09-20 13:54:20 -07:00
Craig Topper 4e8c028158 [X86] Stop reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore from creating scalar i64 load/stores in 32-bit mode
If we emit a scalar i64 load/store it will get type legalized to two i32 load/stores.

Differential Revision: https://reviews.llvm.org/D87862
2020-09-20 13:46:59 -07:00
Simon Pilgrim 0bfeede669 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTRACT_SUBVECTOR(EXTEND(X),0)) -> EXTEND_VECTOR_INREG(X) 2020-09-20 18:39:12 +01:00
Simon Pilgrim bb0078e591 [X86][SSE] Fold SIGN_EXTEND(SIGN_EXTEND_VECTOR_INREG(X)) -> SIGN_EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 18:39:12 +01:00
Simon Pilgrim 15c8306056 [X86][SSE] Fold EXTEND_VECTOR_INREG(EXTEND_VECTOR_INREG(X)) -> EXTEND_VECTOR_INREG(X)
It should be possible to make this generic, but we're not great at checking legality of *_EXTEND_VECTOR_INREG ops so I'm conservatively putting this inside X86ISelLowering.cpp
2020-09-20 16:33:02 +01:00
Simon Pilgrim a0c8793ce6 [X86][SSE] Enable ZERO_EXTEND_VECTOR_INREG shuffle combining on SSE41 targets.
Allows ZERO_EXTEND_VECTOR_INREG to be shuffle combined on all targets where it is legal.
2020-09-20 16:05:10 +01:00
Simon Pilgrim 2b634a9d0e [X86] Rename getExtendInVec to getEXTEND_VECTOR_INREG. NFCI.
Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering/combining.
2020-09-20 15:19:39 +01:00
Simon Pilgrim 91720ee561 [X86] combineX86ShufflesRecursively - fix use after move warning. NFCI.
After moving WidenedMask is in an undefined state, so reduce scope of the variable so its reinitialized every iteration - we should still retain any memory allocation savings.
2020-09-20 14:06:50 +01:00
Simon Pilgrim e17686ae60 [X86] Rename combineExtInVec to combineEXTEND_VECTOR_INREG. NFCI.
Make it easier to find the method by naming it after the ops it actually handles. We already do this for lowering.
2020-09-20 12:16:00 +01:00
Craig Topper 721d57f952 [X86] Return from SimplifyDemandedBitsForTargetNode after calculating known bits for VSHLI/VSRAI/VSRLI.
We were breaking out of the switch which falls into the default
implementation of SimplifyDemandedBitsForTargetNode which is a
wrapper around computeKnownBits. So we end up doing the recursion
and known bits calculation all over again. Instead we should return
with the known bits we calculated in the switch.
2020-09-18 23:57:01 -07:00
Craig Topper 58ecbbcdcd [X86] Fix copy paste mistake in @ccnp flag.
We were treating @ccp and @ccnp the same.
2020-09-18 21:28:01 -07:00
Philip Reames 06f136f61e [instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic
If the mask of a pdep or pext instruction is a shift masked (i.e. one contiguous block of ones) we need at most one and and one shift to represent the operation without the intrinsic. One all platforms I know of, this is faster than the pdep/pext.

The cost modelling for multiple contiguous blocks might be worth exploring in a follow up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs where these are really really slow though.

Differential Revision: https://reviews.llvm.org/D87861
2020-09-18 14:54:24 -07:00
Simon Pilgrim 4ebd30722a [X86][AVX] lowerBuildVectorAsBroadcast - improve BROADCASTM lowering on non-VLX targets
Broadcast to a ZMM type then extract the low subvector.
2020-09-18 19:52:02 +01:00
Simon Pilgrim ceadd98c2f [X86][AVX] lowerBuildVectorAsBroadcast - improve i64 BROADCASTM lowering on 32-bit targets
We already handle the the cases where we have a 'zero extended splat' build vector (a, 0, 0, 0, a, 0, 0, 0, ...) but were missing the case where the 'a' scalar was zero-extended as well - such as i64 -> vXi64 splat cases on 32-bit targets.
2020-09-18 16:59:57 +01:00
Craig Topper 3783d3bc7b [X86] Don't match x87 register inline asm constraints unless the VT is floating point or its a clobber
The register class picked will be the RFP80 register class which has a f80 VT. The code in SelectionDAGBuilder that generates copies around inline assembly doesn't know how to handle an integer and floating point type of different bit widths.

The test case is derived from this https://godbolt.org/z/sEa659 which gcc accepts but clang crashes on. This patch just gives a more graceful error. I'm not sure if the single element struct case is special in gcc. Adding another field to the struct makes gcc reject it. If we want to support this correctly I think we need a change in the frontend to give us the true element type. Right now the frontend just realizes the constraint can take a memory argument so creates an integer type of the same size and bitcasts.

Differential Revision: https://reviews.llvm.org/D87485
2020-09-17 11:26:50 -07:00
Simon Pilgrim f026812110 InstCombiner.h - remove unnecessary KnownBits.h include. NFCI.
Move the include down to cpp files with an implicit dependency.
2020-09-17 14:28:42 +01:00
Rainer Orth a9cbe5cf30 [X86] Fix stack alignment on 32-bit Solaris/x86
On Solaris/x86, several hundred 32-bit tests `FAIL`, all in the same way:

  env ASAN_OPTIONS=halt_on_error=false ./halt_on_error_suppress_equal_pcs.cpp.tmp
  Segmentation Fault (core dumped)

They segfault during startup:

  Thread 2 received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 1 (LWP 1)]
  0x080f21f0 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) () at /vol/llvm/src/llvm-project/dist/compiler-rt/lib/sanitizer_common/sanitizer_solaris.cpp:65
  65	                             int prot, int flags, int fd, OFF_T offset) {
  1: x/i $pc
  => 0x80f21f0 <_ZN11__sanitizer13internal_mmapEPvmiiiy+16>:	movaps 0x30(%esp),%xmm0
  (gdb) p/x $esp
  $3 = 0xfeffd488

The problem is that `movaps` expects 16-byte alignment, while 32-bit Solaris/x86
only guarantees 4-byte alignment following the i386 psABI.

This patch updates `X86Subtarget::initSubtargetFeatures` accordingly,
handles Solaris/x86 in the corresponding testcase, and allows for some
variation in address alignment in
`compiler-rt/test/ubsan/TestCases/TypeCheck/vptr.cpp`.

Tested on `amd64-pc-solaris2.11` and `x86_64-pc-linux-gnu`.

Differential Revision: https://reviews.llvm.org/D87615
2020-09-17 11:17:11 +02:00
Simon Pilgrim b2c931eff3 [X86] EmitInstrWithCustomInserter - remove redundant getDebugLoc() calls. NFCI.
Use the same DebugLoc that is called at the top of the method.

Fixes some Wshadow static analyzer warnings.
2020-09-16 16:29:56 +01:00
Simon Pilgrim cd46151202 [X86] Assert that we've found a terminator instruction. NFCI.
Fixes clang static analayzer null dereference warning.
2020-09-16 16:17:49 +01:00
Simon Pilgrim aa4b0b755a [X86][SSE] Move VZEXT_MOVL(INSERT_SUBVECTOR(UNDEF,X,0)) handling into combineTargetShuffle.
Now that we're getting better at combining shuffles of different vector widths, this can now be performed as part of the standard target shuffle combines and isn't required for cleanup.

Exposed a minor issue in combineX86ShufflesRecursively where we failed to check if a shuffle's src ops were simple types.
2020-09-16 16:08:31 +01:00
Craig Topper 41f4cd60d5 [X86] Don't scalarize gather/scatters with non-power of 2 element counts. Widen instead.
We can pad the mask with zeros in order to widen. We already do
this for power 2 types that are smaller than a legal type.
2020-09-15 23:22:53 -07:00
Craig Topper 2ce1a697f0 [X86] Always use 16-bit displacement in 16-bit mode when there is no base or index register.
Previously we only did this if the immediate fit in 16 bits, but
the GNU assembler seems to just truncate.

Fixes PR46952
2020-09-15 19:31:48 -07:00
Craig Topper 05134877e6 [X86] Use Align in reduceMaskedLoadToScalarLoad/reduceMaskedStoreToScalarStore. Correct pointer info.
If we offset the pointer, we also need to offset the pointer info

Differential Revision: https://reviews.llvm.org/D87593
2020-09-15 11:22:02 -07:00
Simon Pilgrim a43e68b58b [X86][AVX] lowerShuffleWithSHUFPS - handle missed canonicalization cases.
PR47534 exposes a case where calling lowerShuffleWithSHUFPS directly from a derived repeated mask (found by is128BitLaneRepeatedShuffleMask) results in us using an non-canonicalized mask.

The missed canonicalization in this case is trivial - just commute the mask so we have more (swapped) LHS than RHS references so lowerShuffleWithSHUFPS can handle it.
2020-09-15 17:31:08 +01:00
Simon Pilgrim fc446935d7 [X86] detectAVGPattern - accept non-pow2 vectors by padding.
Drop the pow2 vector limitation for AVG generation by padding the vector to the next pow2, creating the PAVG nodes and then extracting the final subvector.

Fixes some poor codegen that has been annoying me for years.....
2020-09-15 10:07:03 +01:00
Craig Topper 46673763fe [X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract
Fixes PR47482
2020-09-14 16:59:04 -07:00
Craig Topper da1aaa0b70 Revert "[X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract."
I got the bug number wrong.

This reverts commit 3251593890.
2020-09-14 16:58:57 -07:00
Craig Topper 3251593890 [X86] Place new constant node in topological order in X86DAGToDAGISel::matchBitExtract.
Fixes PR47525
2020-09-14 16:28:37 -07:00
Craig Topper c193a689b4 [SelectionDAG] Use Align/MaybeAlign in calls to getLoad/getStore/getExtLoad/getTruncStore.
The versions that take 'unsigned' will be removed in the future.

I tried to use getOriginalAlign instead of getAlign in some
places. getAlign factors in the minimum alignment implied by
the offset in the pointer info. Since we're also passing the
pointer info we can use the original alignment.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D87592
2020-09-14 13:54:50 -07:00
Eric Astor 23a2b03221 [ms] [llvm-ml] Add basic support for SEH, including PROC FRAME
Add basic support for SEH, including PROC FRAME

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86948
2020-09-14 14:32:55 -04:00
Eric Astor 20201dc76a [ms] [llvm-ml] Add support for size queries in MASM
Add support for size inference, sizeof, typeof, and lengthof.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86947
2020-09-14 14:27:06 -04:00
Craig Topper 758732a34e [X86] Use ISD::PARITY directly instead of emitting CTPOP and AND from combineHorizontalPredicateResult.
We have a PARITY ISD node now so might as well use it. It will
get re-expanded later.
2020-09-12 20:01:17 -07:00
Craig Topper ad3d6f993d [SelectionDAG][X86][ARM][AArch64] Add ISD opcode for __builtin_parity. Expand it to shifts and xors.
Clang emits (and (ctpop X), 1) for __builtin_parity. If ctpop
isn't natively supported by the target, this leads to poor codegen
due to the expansion of ctpop being more complex than what is needed
for parity.

This adds a DAG combine to convert the pattern to ISD::PARITY
before operation legalization. Type legalization is updated
to handled Expanding and Promoting this operation. If after type
legalization, CTPOP is supported for this type, LegalizeDAG will
turn it back into CTPOP+AND. Otherwise LegalizeDAG will emit a
series of shifts and xors followed by an AND with 1.

I've avoided vectors in this patch to avoid more legalization
complexity for this patch.

X86 previously had a custom DAG combiner for this. This is now
moved to Custom lowering for the new opcode. There is a minor
regression in vector-reduce-xor-bool.ll, but a follow up patch
can easily fix that.

Fixes PR47433

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87209
2020-09-12 11:42:18 -07:00
Simon Pilgrim 3170d54842 [InstCombine][X86] Covert masked load/stores with (sign extended) bool vector masks to generic intrinsics.
As detailed on PR11210, if the mask is known to come from a (sign extended) bool vector (e.g. comparisons) then we can represent with a generic masked load/store without losing anything.

We already do something similar for BLENDV -> SELECT conversion.
2020-09-12 15:09:28 +01:00
Simon Pilgrim 50ee0b99ec [InstCombine][X86] getNegativeIsTrueBoolVec - use ConstantExpr evaluators. NFCI.
Don't do this manually, we can just use the ConstantExpr evaluators to do it more tidily for us.
2020-09-12 13:58:58 +01:00
Simon Pilgrim 35dc91aee2 [X86][SSE] lowerShuffleAsDecomposedShuffleBlend - support decomposed unpacks for some vXi8/vXi16 cases
Follow up to D86429 to handle the remaining regressions.

This patch generalizes lowerShuffleAsDecomposedShuffleBlend to lowerShuffleAsDecomposedShuffleMerge, and attempts to use an UNPCKL shuffle mask instead of a blend for the cases where the inputs are coming from alternating vXi8/vXi16 sources. Technically they don't have to be alternating (just as long as they can fit into a lower lane half for the unpack) but I didn't find as many general cases and it needed a lot more of the function to be altered.

For vXi32/vXi64 cases this could still be beneficial but in most cases the existing permute+blend approach was better.

Differential Revision: https://reviews.llvm.org/D87405
2020-09-12 13:39:33 +01:00
Simon Pilgrim 70a05ee288 [X86] Keep variables from getDataLayout/getDebugLoc calls as const reference. NFCI.
These are only ever used as references in the called functions, so just pass the original reference instead of copying it.
2020-09-11 10:44:42 +01:00
Anna Thomas 46329f6079 [ImplicitNullCheck] Handle instructions that preserve zero value
This is the first in a series of patches to make implicit null checks
more general. This patch identifies instructions that preserves zero
value of a register and considers that as a valid instruction to hoist
along with the faulting load. See added testcases.

Reviewed-By: reames, dantrushin

Differential Revision: https://reviews.llvm.org/D87108
2020-09-10 13:39:50 -04:00
Simon Pilgrim b585fdae24 [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.
2020-09-10 16:05:33 +01:00
Simon Pilgrim de25ebaac6 [CostModel][X86] Add vXi32 division by uniform constant costs (PR47476)
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
2020-09-10 12:17:54 +01:00
Simon Pilgrim fc49abee56 [X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle.
lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.
2020-09-10 11:45:08 +01:00
Simon Pilgrim e80605e242 [X86] Remove WaitInsert::TTI member. NFCI.
This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.
2020-09-10 11:45:08 +01:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Hiroshi Yamauchi 0ab6a15698 [X86] Add support for using fast short rep mov for memcpy lowering.
Disabled by default behind an option.

Differential Revision: https://reviews.llvm.org/D86883
2020-09-09 12:46:40 -07:00
Simon Pilgrim 6e45b98934 X86CallFrameOptimization.cpp - use const references where possible. NFCI. 2020-09-09 16:35:08 +01:00
Simon Pilgrim e706116e11 X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI.
Don't use auto for non-obvious types, and use const references.
2020-09-09 16:15:02 +01:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Simon Pilgrim fcff2c32c0 X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Simon Pilgrim 0729ae367a X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Craig Topper da79b1eecc [SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported.
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.

It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87215
2020-09-07 13:15:26 -07:00
Craig Topper 01b3e16757 [X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets.
Differential Revision: https://reviews.llvm.org/D87214
2020-09-07 11:14:05 -07:00
Simon Pilgrim 9de0a3da6a [X86][SSE] Don't use LowerVSETCCWithSUBUS for unsigned compare with +ve operands (PR47448)
We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.
2020-09-07 16:11:40 +01:00
Simon Pilgrim 5bb27e735d X86AvoidStoreForwardingBlocks.cpp - use unsigned for Opcode values. NFCI.
Fixes clang-tidy cppcoreguidelines-narrowing-conversions warnings.
2020-09-07 12:56:27 +01:00
Simon Pilgrim 9b645ebfff [X86][AVX] Use lowerShuffleWithPERMV in shuffle combining to support non-VLX targets
lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets.

This is another step towards shuffle combining through between vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32) but we're going in the right direction....
2020-09-07 12:50:50 +01:00
Benjamin Kramer 7ba0f81934 [X86] Unbreak the build after 22fa6b20d9 2020-09-07 12:24:30 +02:00
Simon Pilgrim 71dfdbe2c7 [X86] getFauxShuffleMask - handle insert_subvector(zero, sub, C)
Directly use SM_SentinelZero elements if we're (widening)inserting into a zero vector.
2020-09-07 11:10:40 +01:00
Simon Pilgrim 9ad261540d [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.
2020-09-07 10:49:29 +01:00
Simon Pilgrim 22fa6b20d9 [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.
2020-09-07 10:38:09 +01:00
Simon Pilgrim 0dbe2504af [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warning.
2020-09-07 10:38:08 +01:00
Simon Pilgrim ecac5c2808 [X86][AVX] lowerShuffleWithPERMV - adjust binary shuffle masks to account for widening on non-VLX targets
rGabd33bf5eff2 enabled us to pad 128/256-bit shuffles to 512-bit on non-VLX targets, but wasn't updating binary shuffles to account for the new vector width.
2020-09-06 14:52:25 +01:00
Craig Topper 35b35a373d [X86] Prevent shuffle combining from creating an identical X86ISD::SHUF128.
This can cause an infinite loop if SimplifiedDemandedElts asks
for the node to replace itself.

A similar protection exists in other places in shuffle combining.

Fixes ISPC https://github.com/ispc/ispc/issues/1864
2020-09-04 14:12:49 -07:00
Simon Pilgrim 740625fecd [X86] Make lowerShuffleAsLanePermuteAndPermute use sublanes on AVX2
Extends lowerShuffleAsLanePermuteAndPermute to search for opportunities to use vpermq (64-bit cross-lane shuffle) and vpermd (32-bit cross-lane shuffle) to get elements into the correct lane, in addition to the 128-bit full-lane permutes it previously searched for.

This is especially helpful in cross-lane byte shuffles, where the alternative tends to be "vpshufb both lanes separately and blend them with a vpblendvb", which is very expensive, especially on Haswell where vpblendvb uses the same execution port as all the shuffles.

Addresses PR47262

Patch By: @TellowKrinkle (TellowKrinkle)

Differential Revision: https://reviews.llvm.org/D86429
2020-09-04 11:41:26 +01:00
Craig Topper 0851350557 [X86] Update stale comment. NFC
The optimization in ExpandIntOp_UINT_TO_FP was removed in D72728
in January 2020.
2020-09-03 16:19:10 -07:00
Simon Pilgrim 58afaecdc2 X86/X86TargetObjectFile.cpp - remove unused headers. NFCI. 2020-09-03 15:17:44 +01:00
Simon Pilgrim 0563cd6739 Fix spelling mistake. NFC. 2020-09-03 15:17:44 +01:00
Simon Pilgrim 890707aa01 [X86] Avoid llvm-qualified-auto warning by not using auto. NFC.
Try to consistently use the actual type name in the file.
2020-09-03 14:21:17 +01:00
Simon Pilgrim 23d9f4b958 [X86] Fix llvm-qualified-auto warning by using auto*. NFC. 2020-09-03 14:21:17 +01:00
Simon Pilgrim 5b29269744 [X86] Fix llvm-qualified-auto warning by using const auto*. NFC. 2020-09-03 14:21:17 +01:00
Simon Pilgrim e56edb801b [X86][SSE] Fold select(X > -1, A, B) -> select(0 > X, B, A) (PR47404)
Help PBLENDVB peek through to the sign bit source of the selection mask by swapping the select condition and inputs.
2020-09-03 13:02:08 +01:00
Simon Pilgrim 888049b97a [X86][SSE] Fold vselect(pshufb,pshufb) -> or(pshufb,pshufb)
If the PSHUFBs have no other uses, then we can force the unselected elements to zero to OR them instead, avoiding both an extra mask load and a costly variable blend.

Eventually we should try to bring this into shuffle combining, once we can more easily convert between shuffles + select patterns.
2020-09-02 16:55:00 +01:00
Martin Storsjö 4820af2bfc [X86] Remove superfluous trailing semicolons, fixing warnings. NFC. 2020-09-02 11:43:27 +03:00
Simon Pilgrim 21d02dc595 [X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
This patch uses partial DemandedElts masks to further simplify target shuffle chains and finally starts making target shuffle combining part of SimplifyDemandedBits/SimplifyDemandedVectorElts.

We already manage this for Depth == 0 cases, where combineX86ShuffleChain would early-out if the shuffle combined to the same op, but the patch generalizes this by manipulating the depth handling of combineX86ShufflesRecursively - calling with a new Depth = 0 and reducing the maximum shuffle combine depth accordingly.

Differential Revision: https://reviews.llvm.org/D66004
2020-09-02 09:24:46 +01:00
Eric Astor a57fdcdd40 x87 FPU state instructions do not use an f32 memory location
These instructions actually use a 512-byte location, where bytes 464-511 are ignored.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86942
2020-09-01 13:50:07 -04:00
Sanjay Patel 2d3e12818e [FastISel] update to use intrinsic's isCommutative(); NFC
This requires adding a missing 'const' to the definition because
the callers are using const args, but there should be no change
in behavior.

The intrinsic method was added with D86798 / rG096527214033
2020-08-30 11:36:41 -04:00
Craig Topper aab90384a3 [Attributes] Add a method to check if an Attribute has AttrKind None. Use instead of hasAttribute(Attribute::None)
There's a special case in hasAttribute for None when pImpl is null. If pImpl is not null we dispatch to pImpl->hasAttribute which will always return false for Attribute::None.

So if we just want to check for None its sufficient to just check that pImpl is null. Which can even be done inline.

This patch adds a helper for that case which I hope will speed up our getSubtargetImpl implementations.

Differential Revision: https://reviews.llvm.org/D86744
2020-08-28 13:23:45 -07:00
Craig Topper ba852e1e19 [X86] Don't call hasFnAttribute and getFnAttribute for 'prefer-vector-width' and 'min-legal-vector-width' in getSubtargetImpl
We only need to call getFnAttribute and then check if the Attribute
is None or not.
2020-08-27 10:40:20 -07:00
Matt Arsenault 0b7f6cc71a GlobalISel: Add generic instructions for memory intrinsics
AArch64, X86 and Mips currently directly consumes these and custom
lowering to produce a libcall, but really these should follow the
normal legalization process through the libcall/lower action.
2020-08-26 20:08:45 -04:00
Craig Topper 92d3e70df3 [X86] Change pentium4 tuning settings and scheduler model back to their values before D83913.
Clang now defaults to -march=pentium4 -mtune=generic so we don't
need modern tune settings on pentium4.
2020-08-26 15:38:12 -07:00
Craig Topper 09288bcbf5 [X86] Add assembler support for .d32 and .d8 mnemonic suffixes to control displacement size.
This is an older syntax than the {disp32} and {disp8} pseudo
prefixes that were added a few weeks ago. We can reuse most of
the support for that to support .d32 and .d8 as well.
2020-08-26 10:45:50 -07:00
Pierre Gousseau cda6b09242 [X86] Make sure we do not clobber RBX with mwaitx when used as a base
pointer.

mwaitx uses EBX as one of its argument.
Using this instruction clobbers RBX as it is defined to hold one of the
input. When the backend uses dynamically allocated stack, RBX is used as
a reserved register for the base pointer.

This patch is adapted from @qcolombet patch for cmpxchg at r263325.

This fixes PR43528.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D73475
2020-08-26 11:20:31 +01:00
Craig Topper 1d1515a9e2 [X86] Add an isel pattern for (i8 (trunc (i16 (bitconvert (v16i1 X))))) to avoid an extra EXTRACT_SUBREG
Since we can only copy to GR32 we had to EXTRACT from GR32, but
we would first go to GR16 and then the truncate would extra again
to GR8. This adds a special case to go directly from GR32 to GR8.
This would eventually get cleaned up, but though maybe we should
avoid doing it in the first place. Our k-register handling is weird
and we could probably stand to have some more special ISD nodes
for the conversions so the i32 type would be explicit.
2020-08-25 18:20:43 -07:00
Craig Topper b8ec8f5776 [X86] Remove extra getOperand(0) call from recently introduced store(extract_element(vtrunc)) to truncated store combine.
The IsExtractedElement already called getOperand(0) so Extract
here is the source vector. We shouldn't call getOperand(0). This
worked for the original test cases because the result was a
bitcast so the getOperand(0) accidently peeked through the bitcast
which is what we wanted.

In the failing case here, the operand turns out to be undef so
the getOperand(0) asserts because undef has no operands.

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25184

Differential Revision: https://reviews.llvm.org/D86428
2020-08-25 16:16:54 -07:00
Craig Topper ba319ac47e [X86] Remove a redundant COPY_TO_REGCLASS for VK16 after a KMOVWkr in an isel output pattern.
KMOVWkr produces VK16, there's no reason to copy it to VK16 again.

Test changes are presumably because we were scheduling based on
the COPY that is no longer there.
2020-08-25 15:19:27 -07:00
Freddy Ye e02d081f2b [X86] Support -march=sapphirerapids
Support -march=sapphirerapids for x86.
Compare with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, waitpkg.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86503
2020-08-25 14:21:21 +08:00
Craig Topper f7c87b7e37 [X86] Copy the tuning features and scheduler model from pentium4/x86-64 to generic
This is preparation for making clang default to -mtune=generic when no -march is specified. This will allow the default tuning to be "generic" even though our default march is "pentium4" or "x86-64".

To avoid llc lit test regressions, if no mcpu is specified, I've defaulted tune to use i586 to match the old tuning settings of no CPU. Some tests explicitly used -mcpu=generic which I've removed so they instead get this default of architecture features from generic and tune from i586.

I updated one llvm-mca test to check a different CPU since generic has a scheduler model now

Differential Revision: https://reviews.llvm.org/D86312
2020-08-24 14:47:10 -07:00
Fangrui Song bef684154d [X86][FastISel] Support materializing floating-point constants for large code model & PIC
The following program miscompiles because rL216012 added static
relocation model support but not for PIC.

```
// clang -fpic -mcmodel=large -O0 a.cc
double foo() { return 42.0; }
```

This patch adds PIC support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D86024
2020-08-23 08:36:18 -07:00
Hiroshi Yamauchi 28ccc52c40 [X86] Add feature for Fast Short REP MOV (FSRM) for Icelake or newer.
Differential Revision: https://reviews.llvm.org/D85989
2020-08-19 13:39:42 -07:00
Simon Pilgrim 057bdd63a4 [X86][AVX] lowerShuffleWithVPMOV - minor refactor to more closely match lowerShuffleAsVTRUNC
Replace isBuildVectorAllZeros check by using the Zeroable bitmask instead.
2020-08-19 14:34:32 +01:00
Simon Pilgrim 9fee2bad6d [X86] lowerShuffleWithVPMOV - remove unnecessary shuffle commutation. NFCI.
canonicalizeShuffleMaskWithCommute should have already ensured the lower elements are from V1, we do have test coverage for this already.
2020-08-19 13:28:59 +01:00
Simon Pilgrim b61cef3a92 [X86][AVX] getAVX512TruncNode - don't truncate from illegal vector widths.
Thanks to @fhahn for the test case.
2020-08-19 13:00:26 +01:00
Simon Pilgrim 80a0dc59b7 [X86][AVX] computeKnownBitsForTargetNode - add VTRUNC/VTRUNCS/VTRUNCUS known zero upper elements handling.
Like many of the AVX512 conversion ops, the VTRUNC ops guarantee the upper destination elements are zero.
2020-08-19 11:39:27 +01:00
Simon Pilgrim 46fc9a0dfc [X86][AVX] Fold store(extract_element(vtrunc)) to truncated store
Add handling for storing the extracted lower (truncated bits) element from a X86ISD::VTRUNC node - this can be lowered to a generic truncated store directly.

Differential Revision: https://reviews.llvm.org/D86158
2020-08-19 11:10:20 +01:00
Craig Topper 9028c03ce6 [X86] Fix the Predicates on MMX_PSHUFWri/PSHUFWmi to include SSE1 in addition to MMX.
These instructions weren't in the initial version of MMX, but
were added when SSE1 was introduced. We already have the intrinsic
named correctly to include sse and the frontened header enforces
sse. We have one place in the backend where we DAG combine to
this intrinsic, but that's also qualified. So don't know of anything
currently broken unless someone writes their own IR and doesn't
set the sse feature.
2020-08-18 14:28:26 -07:00
Simon Pilgrim 11ff5176c4 [X86][AVX] lowerShuffleWithVPMOV - add non-VLX support.
We can efficiently handle non-VLX cases now that we have the getAVX512TruncNode helper.
2020-08-18 17:51:14 +01:00
Simon Pilgrim abd33bf5ef [X86][AVX] lowerShuffleWithPERMV - pad 128/256-bit shuffles on non-VLX targets
Allow non-VLX targets to use 512-bits VPERMV/VPERMV3 for 128/256-bit shuffles.

TBH I'm not sure these targets actually exist in the wild, but we're testing for them and its good test coverage for shuffle lowering/combines across different subvector widths.
2020-08-18 15:46:02 +01:00
Simon Pilgrim 011bf4fd96 [X86][AVX] lowerShuffleWithVTRUNC - extend to support v16i16/v32i8 binary shuffles.
This requires a few additional SrcVT vs DstVT padding cases in getAVX512TruncNode.
2020-08-18 15:30:02 +01:00
Simon Pilgrim d5621b83a5 [X86][AVX] lowerShuffleWithVTRUNC - pull out TRUNCATE/VTRUNC creation into helper code. NFCI.
Prep work toward adding v16i16/v32i8 support for lowerShuffleWithVTRUNC and improving lowerShuffleWithVPMOV.
2020-08-18 14:52:42 +01:00
Simon Pilgrim 7db5124736 [X86][AVX] lowerShuffleWithVTRUNC - avoid unnecessary division in element counts. NFCI.
(256 / SrcEltBits) == ((2 * EltSizeInBits * NumElts) / (EltSizeInBits * Scale)) == (2 * (NumElts / Scale)) == NumSrcElts
2020-08-18 13:48:22 +01:00
Simon Pilgrim d2057a8015 [X86][AVX] Lower v16i8/v8i16 binary shuffles using VTRUNC/TRUNCATE
This patch adds lowerShuffleWithVTRUNC to handle basic binary shuffles that can be lowered either as a pure ISD::TRUNCATE or a X86ISD::VTRUNC (with undef/zero values in the remaining upper elements).

We concat the binary sources together into a single 256-bit source vector. To avoid regressions we perform this after we've tried to lower with PACKS/PACKUS which typically does a cleaner job than a concat.

For non-AVX512VL cases we have to canonicalize VTRUNC cases to use a 512-bit source vectors (inserting undefs/zeros in the upper elements as necessary), truncate and then (possibly) extract the 128-bit result.

This should address the last regressions in D66004

Differential Revision: https://reviews.llvm.org/D86093
2020-08-18 11:11:58 +01:00
Craig Topper b673dfbb9a [X86] When manually creating intrinsic nodes in X86ISelLowering, make sure we use getTargetConstant and pointer type for the intrinsic ID.
Doesn't really matter in practice but that's how the nodes are
normally created by SelectionDAGBuilder. So we should match.

Found by temporarily hacking type checks into isel table.
2020-08-17 17:25:53 -07:00
Craig Topper 2ffa5d218f [X86] Rename INTR_TYPE_4OP to INTR_TYPE_4OP_IMM8 and truncate immediates to MVT::i8
This makes sure VPTERNLOG is generated with MVT::i8 immediate
as its SDNode declaration in X86InstrFragmentsSIMD.td declares.
2020-08-17 17:25:52 -07:00
Craig Topper bc244f08cf [X86] Truncate immediate to i8 for INTR_TYPE_3OP_IMM8
This is used for DBPSADBW which has a i32 immediate for its
intrinsic and an i8 immediate in tablegen isel patterns.
2020-08-17 17:25:51 -07:00
Craig Topper ab7151f1cf [X86] Make PreprocessISelDAG create X86ISD::VRNDSCALE nodes with i32 constants instead of i8.
This is the type declared in X86InstrFragmentsSIMD.td. ISel pattern
matching doesn't check so it doesn't matter in practice. Maybe for
SelectionDAG CSE it would matter.
2020-08-17 17:25:51 -07:00
Hongtao Yu 819b2d9c79 [llvm-objdump] Symbolize binary addresses for low-noisy asm diff.
When diffing disassembly dump of two binaries, I see lots of noises from mismatched jump target addresses and global data references, which unnecessarily causes diffs on every function, making it impractical. I'm trying to symbolize the raw binary addresses to minimize the diff noise.
In this change, a local branch target is modeled as a label and the branch target operand will simply be printed as a label. Local labels are collected by a separate pre-decoding pass beforehand. A global data memory operand will be printed as a global symbol instead of the raw data address. Unfortunately, due to the way the disassembler is set up and to be less intrusive, a global symbol is always printed as the last operand of a memory access instruction. This is less than ideal but is probably acceptable from checking code quality point of view since on most targets an instruction can have at most one memory operand.

So far only the X86 disassemblers are supported.

Test Plan:

llvm-objdump -d  --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr [rip + 4112]  # 202182 <g>
               	jge	0x20117e <_start+0x25>
               	call	0x201158 <foo>
               	inc	dword ptr [rsp]
               	jmp	0x201169 <_start+0x10>
               	xor	eax, eax
               	pop	rcx
               	ret
```

llvm-objdump -d  **--symbolize-operands** --x86-asm-syntax=intel --no-show-raw-insn --no-leading-addr :
```
Disassembly of section .text:

<_start>:
               	push	rax
               	mov	dword ptr [rsp + 4], 0
               	mov	dword ptr [rsp], 0
<L1>:
               	mov	eax, dword ptr [rsp]
               	cmp	eax, dword ptr  <g>
               	jge	 <L0>
               	call	 <foo>
               	inc	dword ptr [rsp]
               	jmp	 <L1>
<L0>:
               	xor	eax, eax
               	pop	rcx
               	ret
```

Note that the jump instructions like `jge 0x20117e <_start+0x25>` without this work is printed as a real target address and an offset from the leading symbol. With a change in the optimizer that adds/deletes an instruction, the address and offset may shift for targets placed after the instruction. This will be a problem when diffing the disassembly from two optimizers where there are unnecessary false positives due to such branch target address changes. With `--symbolize-operand`, a label is printed for a branch target instead to reduce the false positives. Similarly, the disassemble of PC-relative global variable references is also prone to instruction insertion/deletion.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D84191
2020-08-17 16:55:12 -07:00
Simon Pilgrim 1d2ede87ea [X86][AVX] Move lowerShuffleWithVPMOV inside explicit shuffle lowering cases
Perform lowerShuffleWithVPMOV as part of the v16i8/v8i16 shuffle lowering stages, which are the only types that are currently supported.

We need to expand support for lowering shuffles as truncations to fix the remaining regressions in D66004
2020-08-17 11:58:51 +01:00
Craig Topper a206f85091 [X86] Reject dirflag in inline asm constraints other than clobber.
Fixes the crash from PR47195.
2020-08-16 23:33:45 -07:00
Simon Pilgrim f25d47b7ed [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
We can now enable this for AVX1 targets can now assist with canonicalizeShuffleMaskWithHorizOp cleanup.

There's still a few missed opportunities for merging subvector insert/extracts into shuffles, but they shouldn't cause any regressions now.
2020-08-16 15:00:41 +01:00
Simon Pilgrim dca7eb7d60 [X86][SSE] Replace combineShuffleWithHorizOp with canonicalizeShuffleMaskWithHorizOp
Instead of just attempting to fold shuffle(HOP,HOP) for a specific target shuffle, make this part of combineX86ShufflesRecursively so we can perform this on the combined shuffle chain, which is particularly useful for recognising more cases of where we're performing multiple HOPs that can be merged and pre-AVX where we don't have good blend/unary target shuffle support.
2020-08-16 12:26:27 +01:00
Simon Pilgrim c27baa54b7 [X86] isRepeatedTargetShuffleMask - don't require specific MVT type. NFC.
Split the isRepeatedTargetShuffleMask into a wrapper variant that takes a MVT describing the mask width, and an internal version that just needs the raw mask element bit size.

This will be necessary for an upcoming change where the horizontal ops element width might not match the shuffle mask element width.
2020-08-16 11:51:44 +01:00
Craig Topper c7a0b2684f [X86][MC][Target] Initial backend support a tune CPU to support -mtune
This patch implements initial backend support for a -mtune CPU controlled by a "tune-cpu" function attribute. If the attribute is not present X86 will use the resolved CPU from target-cpu attribute or command line.

This patch adds MC layer support a tune CPU. Each CPU now has two sets of features stored in their GenSubtargetInfo.inc tables . These features lists are passed separately to the Processor and ProcessorModel classes in tablegen. The tune list defaults to an empty list to avoid changes to non-X86. This annoyingly increases the size of static tables on all target as we now store 24 more bytes per CPU. I haven't quantified the overall impact, but I can if we're concerned.

One new test is added to X86 to show a few tuning features with mismatched tune-cpu and target-cpu/target-feature attributes to demonstrate independent control. Another new test is added to demonstrate that the scheduler model follows the tune CPU.

I have not added a -mtune to llc/opt or MC layer command line yet. With no attributes we'll just use the -mcpu for both. MC layer tools will always follow the normal CPU for tuning.

Differential Revision: https://reviews.llvm.org/D85165
2020-08-14 15:31:50 -07:00
Simon Pilgrim e9eb2dc332 [X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining

Another step towards PR41813

Recommit of rG9bd97d036398 with fixed offset adjustments
2020-08-14 18:43:19 +01:00
Simon Pilgrim cd3b850a4c rG9bd97d0363987b582 - Revert "[X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))"
This reverts commit 9bd97d0363.

Seeing some codegen issues in internal testing.
2020-08-13 15:21:15 +01:00
Simon Pilgrim a31d20e67e [X86][SSE] IsElementEquivalent - add HOP(X,X) support
For HADD/HSUB/PACKS ops with repeated operands the lower/upper half element of each lane are known to be equivalent
2020-08-13 12:42:59 +01:00
Simon Pilgrim 39de63aef9 Fix signed/unsigned comparison warnings. NFC. 2020-08-12 19:22:13 +01:00
Simon Pilgrim 13d6cf0951 [X86][SSE] Pull out BUILD_VECTOR operand equivalence tests. NFC.
Pull out element equivalence code from isShuffleEquivalent/isTargetShuffleEquivalent, I've also removed many of the index modulos where possible.

First step toward simply adding some additional equivalence tests.
2020-08-12 18:20:18 +01:00
Craig Topper 5f7cdb2eff [X86][GlobalISel] Legalize G_ICMP results to s8.
We need to produce a setcc instruction which has an 8-bit result.
This gets rid of a bunch of cases that were using the s1->s8/s16/s32/s64
handling in selectZExt.

I'm not very familiar with GlobalISel yet so I'm not yet sure
the best way to do things. I'd especially like feedback on the
best way to handle the currently split 32-bit and 64-bit mode
handling.

Differential Revision: https://reviews.llvm.org/D85814
2020-08-12 10:13:59 -07:00
Simon Pilgrim 9bd97d0363 [X86][SSE] Fold HOP(SHUFFLE(X),SHUFFLE(Y)) --> SHUFFLE(HOP(X,Y))
This is beginning to look like a canonicalization stage that could be performed as part of shuffle combining

Another step towards PR41813
2020-08-12 12:16:36 +01:00
Simon Pilgrim a0c2c6aa42 [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for float types
Only do this for AVX2+ targets as we still get some regressions on AVX1 without PERMPD/PERMQ
2020-08-12 11:31:05 +01:00
Craig Topper 6b3dc96e59 [X86][GlobalISel] Replace a misuse of SUBREG_TO_REG with INSERT_SUBREG.
SUBREG_TO_REG is supposed to be used when we know the producing
instruction already zeroed the bits we're extending. But that's
not the case here. So INSERT_SUBREG with an IMPLICIT_DEF is the
correct thing to use.
2020-08-11 23:51:02 -07:00
Simon Pilgrim 2655bd51d6 [X86][SSE] combineShuffleWithHorizOp - canonicalize SHUFFLE(HOP(X,Y),HOP(Y,X)) -> SHUFFLE(HOP(X,Y))
Attempt to canonicalize binary shuffles of HOPs with commuted operands to an unary shuffle.
2020-08-11 18:13:03 +01:00
Eric Christopher 8155cb27a2 Fold Opcode into assert uses to fix an unused variable warning without asserts. 2020-08-11 09:30:51 -07:00
Simon Pilgrim fe1f36986b [X86][SSE] combineShuffleWithHorizOp - avoid unnecessary subtraction. NFCI.
We can safely replace ((M - NumElts) % NumEltsPerLane) with (M % NumEltsPerLane) as the modulo result will be the same.
2020-08-11 17:07:32 +01:00
Simon Pilgrim 91d59cbf1b [X86][SSE] Add HADD/SUB support to combineHorizOpWithShuffle
Handles some HOP(SHUFFLE,SHUFFLE) patterns and sets us up to improve some of the cases mentioned in PR41813.
2020-08-11 16:14:14 +01:00
Kerry McLaughlin 85c7e89f3b [CodeGen] Refactor getMemBasePlusOffset & getObjectPtrOffset to accept a TypeSize
Changes the Offset arguments to both functions from int64_t to TypeSize
& updates all uses of the functions to create the offset using TypeSize::Fixed()

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D85220
2020-08-11 12:17:10 +01:00
Simon Pilgrim 49016eeab6 [X86] Rename combineVectorPackWithShuffle -> combineHorizOpWithShuffle. NFC.
The plan is to use this for (F)HADD/SUB opcodes as well as PACKs - similar to how we use combineShuffleWithHorizOp
2020-08-11 11:38:43 +01:00
Craig Topper 9201efb3b9 [X86] Custom match X86ISD::VPTERNLOG in X86ISelDAGToDAG in order to reduce isel patterns.
By factoring out the end of tryVPTERNLOG, we can use the same code
to directly match X86ISD::VPTERNLOG. This allows us to remove
around 3-4K worth of X86GenDAGISel.inc.
2020-08-10 23:15:58 -07:00
Wang, Pengfei 9512525947 [X86][FPEnv] Teach X86 mask compare intrinsics to respect strict FP semantics.
When we use mask compare intrinsics under strict FP option, the masked
elements shouldn't raise any exception. So, we cann't replace the
intrinsic with a full compare + "and" operation.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D85385
2020-08-11 10:28:41 +08:00
Craig Topper 96dfc783b2 [BreakFalseDeps][X86] Move operand loop out of X86's getUndefRegClearance and put in the pass.
X86 is the only user of this interface in tree. Previously the
X86 pass would loop over operands looking for one undef operand for
the pass to fix. But there could theoretically be multiple operands
to fix. So it makes more sense for the pass to do the looping and
ask the target if an operand needs to be fixed.
2020-08-10 10:32:29 -07:00
Simon Pilgrim 9a368d2b00 [X86][SSE] shuffle(hop,hop) - canonicalize unary hop(x,x) shuffle masks
If a shuffle is referring to both the lower and upper half lanes of an unary horizontal op, then canonicalize the mask to only refer to the lower half.
2020-08-10 16:09:27 +01:00
Simon Pilgrim 07e673a02b [X86][SSE] Pull out shuffle(hop,hop) combine into combineShuffleWithHorizOp helper. NFC. 2020-08-10 15:08:57 +01:00
Simon Pilgrim e6dc2c8ce7 [X86][SSE] combineTargetShuffle - rearrange shuffle(hop,hop) matching to delay shuffle mask manipulation. NFC.
Check that we're shuffling hadd/pack ops first before altering shuffle masks.

First step towards adding extra functionality, plus it avoids costly shuffle mask manipulation if not necessary.
2020-08-10 14:13:19 +01:00
Matt Arsenault f9c279b057 PeepholeOptimizer: Use Register 2020-08-10 08:49:36 -04:00
Craig Topper bc8be30540 [X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64.
These all seem to be handled by tablegen pattern imports.
2020-08-09 00:26:15 -07:00
Craig Topper d3153b5ca2 [X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.
This was blocking isTypeLegal call so that we could do a particular
transform on illegal types before type legalization. But the we
create a target specific node using that type. We shouldn't do
that if the type isn't legal. So I think we should just always
make sure the type is legal.

I suspect that in order to get the condition VT to not be a vector
of i1 we already completed type legalization anyway so this probably
doesn't matter much in practice.
2020-08-08 14:19:13 -07:00
Craig Topper 966a58e329 [X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP. 2020-08-08 13:11:47 -07:00
Craig Topper 815a9b256b [X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.
This is just a thin wrapper around computeRegisterLivness which
we can just call directly. The only real difference is that
isSafeToClobberEFLAGS returns a bool and computeRegisterLivness
returns an enum. So we need to check for the specific enum value
that isSafeToClobberEFLAGS was hiding.

I've also adjusted which sites pass an explicit value for
Neighborhood since the default for computeRegisterLivness is 10.
2020-08-08 12:31:58 -07:00
Craig Topper 8d3ae64b04 Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
I messed up the bug numbers in the commit message before

Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR46315.
2020-08-08 11:53:14 -07:00
Craig Topper 761f568420 Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"
This reverts commit 44b260cb0a.

I messed up the bug number in the commit message so I'm reverting
to fix it.
2020-08-08 11:53:14 -07:00
Simon Pilgrim cc15380f10 [X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.
Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08 19:36:18 +01:00
Craig Topper 44b260cb0a [X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places
Previously this function searched 4 instructions forwards or
backwards to determine if it was ok to clobber eflags.

This is called in 3 places: rematerialization, turning 2 operand
leas into adds or splitting 3 ops leas into an lea and add on some
CPU targets.

This patch increases the search limit to 10 instructions for
rematerialization and 2 operand lea to add. I've left the old
treshold for 3 ops lea spliting as that increases code size.

Fixes PR47024 and PR43014
2020-08-08 11:29:41 -07:00
Craig Topper 514b00c439 [X86] Limit the scope of the min/max canonicalization in combineSelect
Previously the transform was doing these two canonicalizations
(x > y) ? x : y -> (x >= y) ? x : y
(x < y) ? x : y -> (x <= y) ? x : y

But those don't seem to be useful generally. And they actively
pessimize the cases in PR47049.

This patch limits it to
(x > 0) ? x : 0 -> (x >= 0) ? x : 0
(x < -1) ? x : -1 -> (x <= -1) ? x : -1

These are the cases mentioned in the comments as the motivation
for the canonicalization. These allow the CMOV to use the S
flag from the compare thus improving opportunities to use a TEST
or the flags from an arithmetic instruction.
2020-08-07 22:51:49 -07:00
Keno Fischer c58674df14 [X86] Don't produce bad x86andp nodes for i1 vectors
In D85499, I attempted to fix this same issue by canonicalizing
andnp for i1 vectors, but since there was some opposition to such
a change, this commit just fixes the bug by using two different
forms depending on which kind of vector type is in use. We can
then always decide to switch the canonical forms later.

Description of the original bug:
We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x).
However, it does so by attempting to create an i64 vector with the number
of elements obtained by truncating division by 64 from the bitwidth. This is
bad for mask vectors like v8i1, since that division is just zero. Besides,
we don't want i64 vectors anyway. For i1 vectors, switch the pattern
to (andnp (not cond), x), which is the canonical form for `kandn`
on mask registers.

Fixes https://github.com/JuliaLang/julia/issues/36955.

Differential Revision: https://reviews.llvm.org/D85553
2020-08-07 20:05:47 -04:00
Craig Topper 0215ae9735 [X86] Remove incomplete custom handling of i128 sdivrem/udivrem on Windows.
We need to have special handling of i128 div/rem on Windows due
to a weird calling convention needed for the libcall. There was
also some code that made it look like we do the same for sdivrem/udiv,
but the code didn't account for multiple return values of those
functions so couldn't possibly work. I think this code never
triggers because we don't have libcall names defined for those
functions by default so DAGCombine never creates DIVREM nodes.
2020-08-05 23:01:07 -07:00
Craig Topper 08b2d0a963 [X86] Disable copy elision in LowerMemArgument for scalarized vectors when the loc VT is a different size than the original element.
For example a v4f16 argument is scalarized to 4 i32 values. So
the values are spread out instead of being packed tightly like
in the original vector.

Fixes PR47000.
2020-08-05 15:44:54 -07:00
Simon Pilgrim b60f998859 [X86][SSE] Fold 128-bit PACK(EXTEND(X),EXTEND(Y)) -> CONCAT(X,Y) subvectors
This is seen in the sub-128-bit vector trunc(ext()) of comparison results

Fixes pr46585.ll regression in D66004
2020-08-05 18:27:40 +01:00
Simon Pilgrim 6a06c7a0a7 [X86] isHorizontalBinOp - only update LHS/RHS references on success
We've had issues in the past where isHorizontalBinOp calls would affect later combines as the LHS/RHS references had been commuted but still failed to match.
2020-08-05 15:09:52 +01:00
Simon Pilgrim a57bfb44bc [X86][AVX] Fold CONCAT(HOP(X,Y),HOP(Z,W)) -> HOP(CONCAT(X,Z),CONCAT(Y,W)) for integer types 2020-08-05 15:09:51 +01:00
Krzysztof Parzyszek 09897b146a [RDF] Remove uses of RDFRegisters::normalize (deprecate)
This function has been reduced to an identity function for some time.
2020-08-04 17:02:12 -05:00
Simon Pilgrim 6f0da46d53 [X86] getFauxShuffleMask - drop unnecessary computeKnownBits OR(X,Y) shuffle decoding.
Now that rG47cea9e82dda941e lets us aggressively decode multi-use shuffles for the OR(SHUFFLE(),SHUFFLE()) case we don't need the computeKnownBits variant any more.
2020-08-04 15:57:47 +01:00
Simon Pilgrim 051f293b78 [X86] Remove unused canScaleShuffleElements helper
The only use was removed at rG36750ba5bd0e9e72

Thanks to @nemanjai for the heads up
2020-08-04 14:51:23 +01:00
Simon Pilgrim 36750ba5bd [X86][AVX] isHorizontalBinOp - relax lane-crossing limits for AVX1-only targets.
Permit lane-crossing post shuffles on AVX1 targets as long as every element comes from the same source lane, which for v8f32/v4f64 cases can be efficiently lowered with the LowerShuffleAsLanePermuteAnd* style methods.
2020-08-04 14:27:01 +01:00