Summary: This patch changes the kunpck intrinsic autoupgrade to use vXi1 shufflevector operations to perform vector extracts and concats. This more closely matches the definition of the kunpck instructions. Currently we rely on a DAG combine to turn the scalar shift/and/or code into a concat vectors operation. By doing it in the IR we get this for free.
Reviewers: spatel, RKSimon, zvi, jina.nahias
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D42018
llvm-svn: 322462
We have to take special care to avoid the cases where the result of the truncate would be padded with zero elements.
Ideally we'd just use ISD::TRUNCATE for these cases instead.
llvm-svn: 322454
Extend vXi1 conditions of vXi8/vXi16 selects even before type legalization gets a chance to split wide vectors. Previously we would only extend 128 and 256 bit vectors. But if we start with a 512 bit vector or wider that needs to be split we wouldn't extend until after the split had taken place. By extending early we improve the results of type legalization.
Don't widen condition of 128/256 bit vXi16/vXi8 selects when we have BWI but not VLX. We can still use a mask register by widening the select to 512-bits instead. This is similar to what we do for compares already.
llvm-svn: 322450
Additional test cases cover selects with i16/i8 conditions that are only 128/256-bits wide, but the compares are 512-bits wide and can only produce k-registers. We should be able to artificially widen the selects to avoid moving the k-register to an xmm/ymm register.
llvm-svn: 322449
In addition to the existing match as part of a loop-reduction, add a
straightforward pattern match for DAG-contained patterns.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D41811
llvm-svn: 322446
This avoids having the result type stick around until lowering where we have to extend the setcc and insert a truncate. If we get the types converted early we can do more to optimize it.
llvm-svn: 322432
While the suffix isn't required to disambiguate the instructions, it is required in order to parse the instructions when the suffix is specified in order to match the GNU assembler.
llvm-svn: 322354
Summary:
Fold cases such as:
(v8i8 truncate (v8i32 extract_subvector (v16i32 sext (v16i8 V), Idx)))
->
(v8i8 extract_subvector (v16i8 V), Idx)
This can be generalized to cases where the truncate and extend do not
fully cancel each other out, but it may require querying the target
about profitability.
Reviewers: RKSimon, craig.topper, spatel, efriedma
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41927
llvm-svn: 322300
Broken test from old attempt for folding tables - we don't peek through extract_subvector spills at all (which is why it doesn't fold), and we already have foldMemoryOperandCustom to handle insertps immediate correction anyway.
llvm-svn: 322292
Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.
llvm-svn: 322279
Summary:
As RKSimon suggested in pr35820, in the case that Src is smaller in
bit-size than Indices, need to widen Src to avoid type mismatch.
Fixes pr35820
Reviewers: RKSimon, craig.topper
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41865
llvm-svn: 322272
If the index is v2i64 we can use the scatter instruction that has v4i32/v4f32 data register, v2i64 index, and v2i1 mask. Similar was already done for gather.
Implement custom widening for v2i32 data to remove the code that reverses type legalization during lowering.
llvm-svn: 322254
Make sure I really get back to the beahvior before my rewrite in r321035
which turned out not to be completely NFC as I changed the behavior for
the ios simulator environment.
llvm-svn: 322223
Currently we infer the scale at isel time by analyzing whether the base is a constant 0 or not. If it is we assume scale is 1, else we take it from the element size of the pass thru or stored value. This seems a little weird and I think it makes more sense to make it explicit in the DAG rather than doing tricky things in the backend.
Most of this patch is just making sure we copy the scale around everywhere.
Differential Revision: https://reviews.llvm.org/D40055
llvm-svn: 322210
Adds missing coverage for SHUFPD undef argument lowering, and also shows a missed opportunity to remove a unnecessary move compared to 02 shuffle mask.
llvm-svn: 322175
Planning to add support for named vregs. This puts is in a conundrum since
physregs are named as well. To rectify this we need to use a sigil other than
'%' for physregs in MIR. We've settled on using '$' for physregs but first we
must repurpose it from external symbols using it, which is what this commit is
all about. We think '&' will have familiar semantics for C/C++ users.
llvm-svn: 322146
Normally target independent DAG combine would do this combine based on getSetCCResultType, but with VLX getSetCCResultType returns a vXi1 type preventing the DAG combining from kicking in.
But doing this combine can allow us to remove the explicit sign extend that would otherwise be emitted.
This patch adds a target specific DAG combine to combine the sext+setcc when the result type is the same size as the input to the setcc. I've restricted this to FP compares and things that can be represented with PCMPEQ and PCMPGT since we don't have full integer compare support on the older ISAs.
Differential Revision: https://reviews.llvm.org/D41850
llvm-svn: 322101
Summary:
Added the FastVariableShuffle feature to cases that resembled processors
for which this fearure is on.
For AVX2 there are processors with and w/o this fearue enable.
For AVX512 only KNL does enable this feature so cases which only have
+avx512f were left without the FastVariableShuffle enabled.
Reviewers: RKSimon, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D41851
llvm-svn: 322090
Ingredients in this patch:
1. Add HANDLE_LIBCALL defs for finite mathlib functions that correspond to LLVM intrinsics.
2. Plumbing to send TargetLibraryInfo down to SelectionDAGLegalize.
3. Relaxed math and library checking in SelectionDAGLegalize::ConvertNodeToLibcall() to choose finite libcalls.
There was a bug about determining the availability of the finite calls that should be fixed with:
rL322010
Not in this patch:
This doesn't resolve the question/bug of clang creating the intrinsic IR in the first place.
There's likely follow-up work needed to support the long double variants better.
There's room for improvement to reduce the code duplication.
Create finite calls that don't originate from a corresponding intrinsic or DAG node?
Differential Revision: https://reviews.llvm.org/D41338
llvm-svn: 322087
Since register classes and banks are already printed with the register
definition, don't print it at the end of every instruction anymore.
This follows MIR in this regard and is another step to the unification
of the two formats.
llvm-svn: 322086
CET (Control-Flow Enforcement Technology) introduces a new mechanism called IBT (Indirect Branch Tracking).
According to IBT, each Indirect branch should land on dedicated ENDBR instruction (End Branch).
The new pass adds ENDBR instructions for every indirect jmp/call (including jumps using jump tables / switches).
For more information, please see the following:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
Differential Revision: https://reviews.llvm.org/D40482
Change-Id: Icb754489faf483a95248f96982a4e8b1009eb709
llvm-svn: 322062
The code that checks the immediate wasn't masking to the lower 3-bits like the code in X86InstrInfo.cpp that's used by the peephole pass does.
llvm-svn: 322060
I had to drop fast-isel-abort from a test because we can't fast isel some of the mask stuff. When we used intrinsics we implicitly fell back to SelectionDAG for the intrinsic call without triggering the abort error. But with native IR that doesn't happen the same way.
llvm-svn: 322050