This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.
Reviewers: delena, igorb
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D26134
llvm-svn: 285878
Summary:
In the case where of 'select i1 , f32, f32' or select i1, f64, f64 prefer lowering to masked-moves over branches.
Fixes pr30561
Reviewers: igorb, aymanmus, delena
Differential Revision: https://reviews.llvm.org/D25310
llvm-svn: 285196
Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration as the instructions to spill them aren't available.
llvm-svn: 280648
Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available.
Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though.
llvm-svn: 279929
Optimized lowering of BITCAST node. The BITCAST node can be replaced with COPY_TO_REG instead of KMOV.
It allows to suppress two opposite BITCAST operations and avoid redundant "movs".
Differential Revision: https://reviews.llvm.org/D23247
llvm-svn: 277958
I'm not convinced the patterns for the rm_Int was correct anyway. It had a tied source that should't exist for the unmasked version. The load form of MOVSS always zeros the most significant bits. I've left the patterns off the masked load instructions as I'm not sure what the correct pattern should be and we don't have any tests currently. Nor do we implement masked scalar load intrinsics in clang currently.
llvm-svn: 277098
An identity COPY like this:
%AL = COPY %AL, %EAX<imp-def>
has no semantic effect, but encodes liveness information: Further users
of %EAX only depend on this instruction even though it does not define
the full register.
Replace the COPY with a KILL instruction in those cases to maintain this
liveness information. (This reverts a small part of r238588 but this
time adds a comment explaining why a KILL instruction is useful).
llvm-svn: 274952
This re-applies r268760, reverted in r268794.
Fixes http://llvm.org/PR27670
The original imp-defs assertion was way overzealous: forward all
implicit operands, except imp-defs of the new super-reg def (r268787
for GR64, but also possible for GR16->GR32), or imp-uses of the new
super-reg use.
While there, mark the source use as Undef, and add an imp-use of the
old source reg: that should cover any case of dead super-regs.
At the stage the pass runs, flags are unlikely to matter anyway;
still, let's be as correct as possible.
Also add MIR tests for the various interesting cases.
Original commit message:
Codesize is less (16) or equal (8), and we avoid partial
dependencies.
Differential Revision: http://reviews.llvm.org/D19999
llvm-svn: 268831
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage, this patch permits the combine to occur anytime before then as well.
The main aim with this to improve the ability to recognise bitmasks that can be converted to shuffles.
I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.
Differential Revision: http://reviews.llvm.org/D18944
llvm-svn: 265998
fixed extract-insert i1 element,
load i1, zextload i1 should be with "and $1, %reg" to prevent loading garbage.
added a bunch of new tests.
llvm-svn: 237793
"[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros."
llvm-svn: 221028
This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case.
Added tests for vselect of <X x i1> values.
llvm-svn: 220777
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.
llvm-svn: 197384