The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.
llvm-svn: 290573
I added API for creation a target specific memory node in DAG. Today, all memory nodes are common for all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half-store.
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for truncation-with-saturation pattern.
Differential Revision: https://reviews.llvm.org/D27899
llvm-svn: 290250
Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements.
Similar things have already been done for other convert intrinsics.
llvm-svn: 289316
Summary:
Scalar intrinsics have specific semantics about the which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection.
This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1(one of the multiply inputs) or operand 3(the additon/subtraction input) should pass thru its upper bits.
We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version.
We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that.
This fixes PR30913.
Reviewers: delena, zvi, v_klochkov
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27144
llvm-svn: 289190
Replace the CVTTPD2DQ/CVTTPD2UDQ and CVTDQ2PD/CVTUDQ2PD opcodes with general versions.
This is an initial step towards similar FP_TO_SINT/FP_TO_UINT and SINT_TO_FP/UINT_TO_FP lowering to AVX512 CVTTPS2QQ/CVTTPS2UQQ and CVTQQ2PS/CVTUQQ2PS with illegal types.
Differential Revision: https://reviews.llvm.org/D27072
llvm-svn: 287870
The same thing was done to 32-bit and 64-bit element sizes previously.
This will allow us to support these shuffls in InstCombineCalls along with the other variable shift intrinsics.
llvm-svn: 287312
Both the (V)CVTDQ2PD (i32 to f64) and (V)CVTUDQ2PD (u32 to f64) conversion instructions are lossless and can be safely represented as generic SINT_TO_FP/UINT_TO_FP calls instead of x86 intrinsics without affecting final codegen.
LLVM counterpart to D26686
Differential Revision: https://reviews.llvm.org/D26736
llvm-svn: 287108
These will be used to replace the masked intrinsics so that InstCombineCalls can optimize the AVX-512 variable shifts the same way it does for AVX2.
llvm-svn: 286754
After this I'll add the unmasked intrinsics to InstCombineCalls to finish making our handling of these types of shuffles consistent between AVX-512 and the legacy intrinsics.
llvm-svn: 286725
Summary:
This is the first step towards being able to add the avx512 shift by immediate intrinsics to InstCombineCalls where we aleady support the sse2 and avx2 intrinsics. We need to the unmasked versions so we can avoid having to teach InstCombineCalls that it would need to insert selects sometimes. Instead we'll just add the selects around the new instrinsics in the frontend.
This change should also enable the shift by i32 intrinsics to take a non-constant shift value just like the avx2 and sse intrinsics. This will enable us to fix PR30691 once we update clang.
Next I'll switch clang to use the new builtins. Then we'll come back to the backend and remove/autoupgrade the old intrinsics. Then I'll work on the same series for variable shifts.
Reviewers: RKSimon, zvi, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26333
llvm-svn: 286711
This patch adds support for fptoui to 2i32 from both 2f64 and 2f32, building on Simon's change for the signed version in r284459 and using AVX-512 instructions.
If we don't have VLX support we need to use a 512-bit operation for v2f64->v2i32 and extract the result.
It also recognises that cvttpd2udq zeroes the upper 64-bits of the xmm result.
Differential Revision: https://reviews.llvm.org/D26331
llvm-svn: 286345
Summary: This allows the SSE intrinsic to use the EVEX instruction when available. It also fixes EVEX to not use a weird (v4i32 (fp_to_sint v2f64)) node and it merges some isel patterns. This also fixes some cases that weren't combining vzmovl with cvttpd2dq to remove extra moves.
Reviewers: delena, zvi, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26330
llvm-svn: 286344
This removes a couple tablegen classes that become unused after this change. Another class gained an additional parameter to allow PMADDUBSW to specify a different result type from its input type.
llvm-svn: 285515
Summary: Clang's intrinsic header currently tries to negate the third operand of a vfmadd mask3 in order to create vfmsub, but this fails isel. This patch adds scalar vfmsub and vfnmsub mask3 that we can use instead to avoid the negate. This is consistent with the packed instructions.
Reviewers: igorb, delena
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D25933
llvm-svn: 285173
This revealed that scalar intrinsics could create nodes with a rounding mode of FROUND_CUR_DIRECTION, but the patterns didn't check for it. It just worked because isel doesn't check operand count and we had a pattern without the rounding mode argument at all.
llvm-svn: 282231
It turns out isel is really not robust against having different type profiles for the same opcode. It turns out that if you put an illegal rounding mode(i.e. not CUR_DIRECTION or NO_EXC) on a comiss intrinsic we would generate the FSETCC form with the rounding mode added, but then pattern match to an instruction with ROUND_CUR_DIRECTION.
We can probably get away with just one FSETCCM opcode that always contains the rounding mode and explicitly put ROUND_CUR_DIRECTION in the pattern, but I'll leave that for future work.
With this change the clang tests for the comiss intrinsics that used an incorrect rounding mode of 3 properly fail isel instead of silently doing the wrong thing. Those clang tests will be fixed in a follow up commit and I also plan to add rounding mode checking to clang.
llvm-svn: 282055
There was no way to control its value so it was always FROUND_CURRENT making it unnecessary. The true rounding mode is encoded in the immediate operand of the instruction.
This also removes the pattern from the rb form of the instructions since there is no way to specify the FROUND_NO_EXC rounding mode it required.
llvm-svn: 282052