Commit Graph

14 Commits

Author SHA1 Message Date
Craig Topper 058f2f6d72 [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
2017-03-28 16:35:29 +00:00
Craig Topper 2012dda9a0 [AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0.
llvm-svn: 295673
2017-02-20 17:44:09 +00:00
Craig Topper c6c68f5958 [AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0.
llvm-svn: 295640
2017-02-20 07:00:40 +00:00
Craig Topper 5aef828ba7 [AVX-512] Add tests for missed opportunities to fold masked VPTERNLOG with load when the passthru op isn't operand 0.
llvm-svn: 295639
2017-02-20 07:00:37 +00:00
Craig Topper a5fa2e40f9 [AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns.
llvm-svn: 295638
2017-02-20 07:00:34 +00:00
Craig Topper cb5b45cc36 [AVX-512] Use a better immediate in the VPTERNLOG commuting tests so its easier to spot bad swizzling.
llvm-svn: 295637
2017-02-20 07:00:31 +00:00
Craig Topper 5b4e36aafa [AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2.
llvm-svn: 295634
2017-02-20 02:47:42 +00:00
Craig Topper 489057715e [AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on it own.
llvm-svn: 295616
2017-02-19 21:32:15 +00:00
Craig Topper 4e794c71a6 [AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0.
This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched.

llvm-svn: 295612
2017-02-19 19:36:58 +00:00
Craig Topper ab1afa85ba [AVX-512] Add test cases that show failure to select masked VPTERNLOG when a select is used to force the passthru operand to be not operand 0.
llvm-svn: 295611
2017-02-19 19:36:54 +00:00
Craig Topper 218d1a020e [AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch.
The instructions are marked commutable, but without special handling we don't get the immediate correct.

While here also remove the masked memory forms that aren't commutable.

llvm-svn: 295602
2017-02-19 08:03:26 +00:00
Craig Topper 94de4b9330 [AVX-512] Add patterns to show missed opportunities for folding vpternlog with broadcast loads. Also demonstrates a bug in the commuting of broadcast vpternlog instructions when we are able to select them.
llvm-svn: 295601
2017-02-19 08:03:23 +00:00
Craig Topper 202b453a8a [AVX-512] Add support for commuting VPTERNLOG instructions.
VPTERNLOG is a ternary instruction with an immediate specifying the logical operation to perform. For each bit position in the 3 source vectors the bit from each source is concatenated together and the resulting 3-bit value is used to select a bit in the immediate. This bit value is written to the result vector.

We can commute this by swapping operands and modifying the immediate. To modify the immediate we need to swap two pairs of bits. The pairs correspond to the locations in the immediate where the commuted operands bits have opposite values and the uncommuted operand has the same value. Bits 0 and 7 will never be swapped since the relevant bits from all sources are the same value.

This refactors and reuses parts of the FMA3 commuting code which is also a three operand instruction.

llvm-svn: 282132
2016-09-22 03:00:50 +00:00
Craig Topper 3639cda748 [AVX-512] Add test cases to demonstrate opportunities for commuting vpternlog. Commuting will be added in a future commit.
llvm-svn: 281157
2016-09-11 05:33:43 +00:00