Commit Graph

47 Commits

Author SHA1 Message Date
Craig Topper a406796f5f [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

llvm-svn: 326991
2018-03-08 08:02:52 +00:00
Craig Topper a05ed17316 [X86] Make XOP VPCOM instructions commutable to fold loads during isel.
llvm-svn: 325547
2018-02-20 03:58:13 +00:00
Craig Topper a55ac7b790 [X86] Rename 256-bit VFRCZ instructions to have the Y before the rr/rm to match other instructions. NFC
llvm-svn: 323304
2018-01-24 05:14:39 +00:00
Craig Topper 468a813315 [X86] Use Ld scheduler classes for instructions with folded loads.
llvm-svn: 320459
2017-12-12 07:06:35 +00:00
Simon Pilgrim e1490afa4c [X86][XOP] Add missing scheduler classes to XOP instructions
All match equivalent basic classes (WritePHAdd, WriteFAdd etc.) according to both the AMD 15h SOG and Agner's tables.

llvm-svn: 318758
2017-11-21 12:02:18 +00:00
Simon Pilgrim dac6fd4170 [X86][XOP] Merge rotation opcodes with AVX512 equivalents. NFCI.
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.

Differential Revision: https://reviews.llvm.org/D37949

llvm-svn: 314207
2017-09-26 14:12:50 +00:00
Craig Topper 5c7cd25f82 [X86] Remove isel checks for immediate size on floating point compare and xop compare instructions. NFCI
If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway.

llvm-svn: 313724
2017-09-20 06:38:41 +00:00
Ayman Musa 0b4f97d5e9 [X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions.
Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). 
For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual:

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
8A /r   MOV r8,r/m8  RM     Valid        Valid            Move r/m8 to r8.
88 /r   MOV r/m8,r8  MR     Valid        Valid            Move r8 to r/m8.
Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions.

In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen.

Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr.

This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables.

Differential Revision: https://reviews.llvm.org/D32683

llvm-svn: 304087
2017-05-28 12:39:37 +00:00
Craig Topper 811756b4dc [X86][XOP] Reduce the size of a multiclass by moving more stuff to parameters instead of doing 128-bit and 256-bit simultaneously.
This requires some instructions to be renamed to move the Y earlier in the instruction name. The new names are more consistent with other instructions.

llvm-svn: 295579
2017-02-18 22:53:43 +00:00
Craig Topper de10312bea Recommit "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
Clang has now been fixed to not use these intrinsics.

llvm-svn: 295571
2017-02-18 21:50:58 +00:00
Craig Topper ba2a726cc6 Revert "[X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR."
This reverts r295564. I missed that clang was still using the intrinsics despite our half implemented autoupgrade support.

llvm-svn: 295565
2017-02-18 20:14:20 +00:00
Craig Topper 884db3f85d [X86] Remove XOP VPCMOV intrinsics and autoupgrade them to native IR.
It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.

llvm-svn: 295564
2017-02-18 19:51:25 +00:00
Simon Pilgrim 8e5ecf8ad1 [X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns
VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator

llvm-svn: 292021
2017-01-14 18:52:13 +00:00
Simon Pilgrim b290805e94 [X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns
VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator

llvm-svn: 292020
2017-01-14 18:08:54 +00:00
Simon Pilgrim a1631749f8 [X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns
VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages

llvm-svn: 292019
2017-01-14 17:13:52 +00:00
Craig Topper 7f76c23781 [X86][XOP] Add a reversed reg/reg form for VPROT instructions.
The W bit distinquishes which operand is the memory operand. But if the mod bits are 3 then the memory operand is a register and there are two possible encodings. We already did this correctly for several other XOP instructions.

llvm-svn: 287961
2016-11-26 02:14:00 +00:00
Craig Topper 5f8419da34 [X86] Create a new instruction format to handle 4VOp3 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279424
2016-08-22 07:38:50 +00:00
Craig Topper 9b20fece81 [X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279423
2016-08-22 07:38:45 +00:00
Craig Topper ca0eda3e6a [X86] Merge hasVEX_i8ImmReg into the ImmFormat type which had extra unused encodings. This saves one bit in TSFlags. NFC
llvm-svn: 279412
2016-08-22 01:37:19 +00:00
Simon Pilgrim e85506b6e0 [X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions
This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage.

The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls.

Mask decoding/target shuffle support will be added in future patches.

Differential Revision: http://reviews.llvm.org/D20049

llvm-svn: 271633
2016-06-03 08:06:03 +00:00
Simon Pilgrim a6ba27fbde [X86][XOP] Fixed instruction postfixes to more closely match operands
Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky

llvm-svn: 264305
2016-03-24 16:31:30 +00:00
Simon Pilgrim d7c4fce47d [X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI.
llvm-svn: 264294
2016-03-24 15:28:02 +00:00
Simon Pilgrim 572ca71573 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Craig Topper 2bf0c0394d [X86] Add some missing reversed forms of XOP instructions.
llvm-svn: 261417
2016-02-20 06:20:17 +00:00
Simon Pilgrim e88dc04c48 [X86][XOP] Add support for the matching of the VPCMOV bit select instruction
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )

This patch adds tablegen pattern matching for this instruction.

Differential Revision: http://reviews.llvm.org/D8841

llvm-svn: 251975
2015-11-03 20:27:01 +00:00
Simon Pilgrim 86c5e85e84 [X86][XOP] Add VPROT instruction opcodes
Added X86ISD opcodes for VPROT vector rotate by variable and by immediate.

llvm-svn: 250620
2015-10-17 19:04:24 +00:00
Craig Topper 4f76372afc [X86] Change all the i8imm operands in XOP instructions to u8imm so the parser will check the size.
llvm-svn: 250147
2015-10-13 05:06:25 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Simon Pilgrim 3d11c994f7 [X86][XOP] Added support for the lowering of 128-bit vector shifts to XOP shift instructions
The XOP shifts just have logical/arithmetic versions and the left/right shifts are controlled by whether the value is positive/negative. Because of this I've added new X86ISD nodes instead of trying to force them to use the existing shift nodes.

Additionally Excavator cores (bdver4) support XOP and AVX2 - meaning that it should use the AVX2 shifts when it can and fall back to XOP in other cases.

Differential Revision: http://reviews.llvm.org/D8690

llvm-svn: 248878
2015-09-30 08:17:50 +00:00
Simon Pilgrim 31457d54f7 [X86][XOP] Enable commutation for XOP instructions
Patch to allow XOP instructions (integer comparison and integer multiply-add) to be commuted. The comparison instructions sometimes require the compare mode to be flipped but the remaining instructions can use default commutation modes.

This patch also sets the SSE domains of all the XOP instructions.

Differential Revision: http://reviews.llvm.org/D7646

llvm-svn: 229267
2015-02-14 22:40:46 +00:00
Craig Topper 916708f152 [X86] Add support for parsing and printing the mnemonic aliases for the XOP VPCOM instructions.
llvm-svn: 229078
2015-02-13 07:42:25 +00:00
Craig Topper 68ab0465a0 [X86] Remove the remaining uses of memop from AVX and AVX2 instruction patterns. AVX and AVX2 can handle unaligned loads being folded so we can just use 'load'
llvm-svn: 228551
2015-02-08 22:38:25 +00:00
Craig Topper d402df3ce8 Merge HasVEXPrefix/HasEVEXPrefix/HasXOPPrefix into a 2-bit 'encoding' field in TSFlags.
llvm-svn: 200624
2014-02-02 07:08:01 +00:00
Craig Topper 9e3e38ae3f Add XOP disassembler support. Fixes PR13933.
llvm-svn: 191874
2013-10-03 05:17:48 +00:00
Craig Topper a73be890a1 Add explicit VEX_L tags to all 256-bit instructions. This will allow us to remove code from the code emitters that examined operands to set the L-bit.
llvm-svn: 164202
2012-09-19 06:06:34 +00:00
Craig Topper 71dc02d659 Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them.
llvm-svn: 158396
2012-06-13 07:18:53 +00:00
Craig Topper 7afe343be5 Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions.
llvm-svn: 158291
2012-06-10 07:31:56 +00:00
Craig Topper a54893c662 Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type.
llvm-svn: 158279
2012-06-09 17:02:24 +00:00
Jia Liu e1d619691b some comment fix for X86 and ARM
llvm-svn: 150902
2012-02-19 02:03:36 +00:00
Jia Liu b22310fda6 Emacs-tag and some comment fix for all ARM, CellSPU, Hexagon, MBlaze, MSP430, PPC, PTX, Sparc, X86, XCore.
llvm-svn: 150878
2012-02-18 12:03:15 +00:00
Craig Topper 4daa67483d Remove most of the intrinsics for XOP VPCMOV instruction. They all aliased to the same instruction with different types. This would be better accomplished with casts in the not yet created xopintrin.h header file.
llvm-svn: 149795
2012-02-05 00:55:56 +00:00
Craig Topper ca29bcfc10 Move some XOP patterns into instruction definition. Replae VPCMOV intrinsic patterns with custom lowering to a target specific nodes.
llvm-svn: 149216
2012-01-30 01:10:15 +00:00
Craig Topper 86e44bc829 Add HasXOP predicate check covering a bunch of XOP intrinsic patterns.
llvm-svn: 149054
2012-01-26 07:51:55 +00:00
Jan Sjödin 21f83d9f36 Add XOP Intrinsics and tests
llvm-svn: 147949
2012-01-11 15:20:20 +00:00
Craig Topper 2ba766ae84 Add disassembler support for VPERMIL2PD and VPERMIL2PS.
llvm-svn: 147368
2011-12-30 06:23:39 +00:00
Craig Topper cd93de93fa Separate the concept of having memory access in operand 4 from the concept of having the W bit set for XOP instructons. Removes ORing W-bits in the encoder and will similarly simplify the disassembler implementation.
llvm-svn: 147366
2011-12-30 04:48:54 +00:00
Jan Sjödin 7c0face455 XOP instructions and encoding tests.
llvm-svn: 146407
2011-12-12 19:37:49 +00:00