Commit Graph

1739 Commits

Author SHA1 Message Date
Craig Topper 321df5b0d4 [X86] Stop promoting integer loads to vXi64
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344877
2018-10-21 21:30:26 +00:00
Craig Topper e70c560b6d [X86] Remove some isel patterns that shouldn't be possible.
These included a bitcast of a load from v4f32 to v2f64, but DAG combine should have already changed the type of the load to remove the cast.

llvm-svn: 344573
2018-10-15 23:34:58 +00:00
Simon Pilgrim f09fc3bc12 [X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957)
Currently we hardcode instructions with ReadAfterLd if the register operands don't need to be available until the folded load has completed. This doesn't take into account the different load latencies of different memory operands (PR36957).

This patch adds a ReadAfterFold def into X86FoldableSchedWrite to replace ReadAfterLd, allowing us to specify the load latency at a scheduler class level.

I've added ReadAfterVec*Ld classes that match the XMM/Scl, XMM and YMM/ZMM WriteVecLoad classes that we currently use, we can tweak these values in future patches once this infrastructure is in place.

Differential Revision: https://reviews.llvm.org/D52886

llvm-svn: 343868
2018-10-05 17:57:29 +00:00
Craig Topper 12c18840fa [X86] Allow movmskpd/ps ISD nodes to be created and selected with integer input types.
This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there.

But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG are optimized. The bitcast gets revisted, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way.

llvm-svn: 343046
2018-09-25 23:28:27 +00:00
Craig Topper 8238580aae [X86] Prefer unpckhpd over movhlps in isel for fake unary cases
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.

This patch removes the hack.

This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.

Differential Revision: https://reviews.llvm.org/D49499

llvm-svn: 341973
2018-09-11 17:57:27 +00:00
Andrea Di Biagio fb3d9e1449 [X86] Remove wrong ReadAdvance from multiclass sse_fp_unop_s.
A ReadAdvance was incorrectly added to the SchedReadWrite list associated with
the following SSE instructions:

sqrtss
sqrtsd
rsqrtss
rcpss

As a consequence, a wrong operand latency was computed for the register operand
used as the base address of the folded load operand.

This patch removes the wrong ReadAdvance, and updates the llvm-mca test cases.
There is still a problem with correctly modeling partial register writes on XMM
registers This other problem is currently tracked here:
https://bugs.llvm.org/show_bug.cgi?id=38813

Differential Revision: https://reviews.llvm.org/D51542

llvm-svn: 341326
2018-09-03 16:47:34 +00:00
Craig Topper a11a3b3818 [SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first.
Summary:
Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions.

This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly.

X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree.

Reviewers: RKSimon, delena, hfinkel, eli.friedman

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D50402

llvm-svn: 340689
2018-08-25 17:48:17 +00:00
Craig Topper 633fe98e27 [X86] Change legacy SSE scalar fp to integer intrinsics to use specific ISD opcodes instead of keeping as intrinsics. Unify SSE and AVX512 isel patterns.
AVX512 added new versions of these intrinsics that take a rounding mode. If the rounding mode is 4 the new intrinsics are equivalent to the old intrinsics.

The AVX512 intrinsics were being lowered to ISD opcodes, but the legacy SSE intrinsics were left as intrinsics. This resulted in the AVX512 instructions needing separate patterns for the ISD opcodes and the legacy SSE intrinsics.

Now we convert SSE intrinsics and AVX512 intrinsics with rounding mode 4 to the same ISD opcode so we can share the isel patterns.

llvm-svn: 339749
2018-08-15 01:23:00 +00:00
Craig Topper e902b7d0b0 [X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions.
Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place.

To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows use to reuse all the existing patterns for v2i64.

I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously.

llvm-svn: 338821
2018-08-03 06:12:56 +00:00
Craig Topper 28ac623f6f [X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types.
Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction.

Unfortunately, it seems our shuffle combining is currently relying on this a little remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining.

Differential Revision: https://reviews.llvm.org/D49280

llvm-svn: 337590
2018-07-20 17:57:53 +00:00
Simon Pilgrim dd7bf598cc Fix spelling mistake in comments. NFCI.
llvm-svn: 337442
2018-07-19 09:14:39 +00:00
Craig Topper 92ea7a7b48 [X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address.
This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable.

llvm-svn: 337357
2018-07-18 07:31:32 +00:00
Craig Topper 95063a45b8 [X86] Remove patterns that mix X86ISD::MOVLHPS/MOVHLPS with v2i64/v2f64 types.
The X86ISD::MOVLHPS/MOVHLPS should now only be emitted in SSE1 only. This means that the v2i64/v2f64 types would be illegal thus we don't need these patterns.

llvm-svn: 337349
2018-07-18 05:10:53 +00:00
Craig Topper 1425e10cc6 [X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2.
I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts.

I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code.

We already support domain switching for UNPCKLPD and MOVLHPS.

llvm-svn: 337348
2018-07-18 05:10:51 +00:00
Craig Topper a29f58dc31 [X86] Remove the vector alignment requirement from the patterns added in r337320.
The resulting instruction will only load 64 bits so alignment isn't required.

llvm-svn: 337334
2018-07-17 23:26:20 +00:00
Craig Topper 9ef92865ec [X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only.
llvm-svn: 337320
2018-07-17 20:16:18 +00:00
Craig Topper 9187bca71b [X86] Remove some standalone patterns in favor of the patterns in the MOVLPD instruction definitions.
Previously we passed 'null_frag' into the instruction definition. The multiclass is shared with MOVHPD which doesn't use null_frag. It turns out by passing X86Movsd it produces patterns equivalent to some standalone patterns.

llvm-svn: 337299
2018-07-17 16:24:33 +00:00
Craig Topper 880f92ad71 [X86] Properly qualify some MOVSS/MOVSD patterns with OptSize.
These are integer versions of patterns that I already fixed for floating point.

llvm-svn: 337240
2018-07-17 06:24:16 +00:00
Craig Topper 07a1787501 [X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.
This unfortunately requires a bunch of bitcasts to be added added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid.

The test changes are because we were previously mixing fr128 and vr128 due to contrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well.

llvm-svn: 337147
2018-07-16 06:56:09 +00:00
Craig Topper d553ff3e2e [X86] Add load patterns for cases where we select X86Movss/X86Movsd to blend instructions.
This allows us to fold the load during isel without waiting for the peephole pass to do it.

llvm-svn: 337136
2018-07-15 21:49:01 +00:00
Craig Topper 8f34858779 [X86] Use 128-bit ops for 256-bit vzmovl patterns.
128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2.

The only bad thing I see here is that we failed to reuse an vxorps in some of the tests, but I think that's already known issue.

llvm-svn: 337134
2018-07-15 18:51:07 +00:00
Craig Topper 60ce856134 [X86] Add some optsize patterns for 256-bit X86vzmovl.
These patterns use VMOVSS/SD. Without optsize we use BLENDI instead.

llvm-svn: 337119
2018-07-15 06:03:19 +00:00
Craig Topper f0b164415c [X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.
AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX.

This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that.

llvm-svn: 337083
2018-07-14 02:05:08 +00:00
Craig Topper 2260e4149a [X86] Use the correct types in some recently added isel patterns.
These were supposed to be integer types since we are selecting integer instructions.

Found while preparing to remove these patterns for another patch.

llvm-svn: 337057
2018-07-13 22:27:53 +00:00
Craig Topper a8efe59a0a [X86] Prefer MOVSS/SD over BLEND under optsize in isel.
Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly.

llvm-svn: 336976
2018-07-13 06:25:31 +00:00
Craig Topper 2ab325ba23 [X86] Remove isel patterns that turns packed add/sub/mul/div+movss/sd into scalar intrinsic instructions.
This is not an optimization we should be doing in isel. This is more suitable for a DAG combine.

My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occured if we had done a packed op. A DAG combine would be better able to manage this.

llvm-svn: 336971
2018-07-13 04:50:39 +00:00
Craig Topper 3a13477214 [X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.
These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD.

llvm-svn: 336954
2018-07-12 22:14:10 +00:00
Craig Topper b0053b79d6 Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo"
One of them had a bad title and they should have been squashed.

llvm-svn: 336953
2018-07-12 21:58:03 +00:00
Craig Topper b01a355354 foo
llvm-svn: 336950
2018-07-12 21:53:07 +00:00
Craig Topper 73c7dceffd [X86] Remove i128 type from FR128 regclass.
i128 isn't a legal type in our x86 implementation today. So remove this and the few patterns that used it until it becomes necessary.

llvm-svn: 336889
2018-07-12 07:30:01 +00:00
Andrea Di Biagio 483db141e3 [X86] Fix MayLoad/HasSideEffect flag for (V)MOVLPSrm instructions.
Before revision 336728, the "mayLoad" flag for instruction (V)MOVLPSrm was
inferred directly from the "default" pattern associated with the instruction
definition.

r336728 removed special node X86Movlps, and all the patterns associated to it.
Now instruction (V)MOVLPSrm doesn't have a pattern associated to it, and the
'mayLoad/hasSideEffects' flags are left unset.

When the instruction info is emitted by tablegen, method
CodeGenDAGPatterns::InferInstructionFlags() sees that (V)MOVLPSrm doesn't have a
pattern, and flags are undefined. So, it conservatively sets the
"hasSideEffects" flag for it.

As a consequence, we were losing the 'mayLoad' flag, and we were gaining a
'hasSideEffect' flag in its place.
This patch fixes the issue (originally reported by Michael Holmen).

The mca tests show the differences in the instruction info flags.  Instructions
that were affected by this problem were: MOVLPSrm/VMOVLPSrm/VMOVLPSZ128rm.

Differential Revision: https://reviews.llvm.org/D49182

llvm-svn: 336818
2018-07-11 15:27:50 +00:00
Craig Topper 1d6a80cd95 [X86] Remove some composite MOVSS/MOVSD isel patterns.
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.

In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.

But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.

Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.

llvm-svn: 336762
2018-07-11 04:51:40 +00:00
Craig Topper 27c77fe4ce [X86] Remove AddedComplexity from all patterns that use X86vzmovl as their root.
Some added 20 and some added 15. Its unclear when to use which value and whether they are required at all.

This patch removes them all. If we start finding real world issues we may need to add them back with proper tests.

llvm-svn: 336735
2018-07-10 22:23:54 +00:00
Craig Topper dea0b88b04 [X86] Remove X86ISD::MOVLPS and X86ISD::MOVLPD. NFCI
These ISD nodes try to select the MOVLPS and MOVLPD instructions which are special load only instructions. They load data and merge it into the lower 64-bits of an XMM register. They are logically equivalent to our MOVSD node plus a load.

There was only one place in X86ISelLowering that used MOVLPD and no places that selected MOVLPS. The one place that selected MOVLPD had to choose between it and MOVSD based on whether there was a load. But lowering is too early to tell if the load can really be folded. So in isel we have patterns that use MOVSD for MOVLPD if we can't find a load.

We also had patterns that select the MOVLPD instruction for a MOVSD if we can find a load, but didn't choose the MOVLPD ISD opcode for some reason.

So it seems better to just standardize on MOVSD ISD opcode and manage MOVSD vs MOVLPD instruction with isel patterns.

llvm-svn: 336728
2018-07-10 21:00:22 +00:00
Craig Topper db73f56489 [X86] Remove some seemingly unnecessary patterns.
We're missing the EVEX equivalents of these patterns and seem to get along fine.

I think we end up with X86vzload for the obvious IR cases that would produce this DAG.

llvm-svn: 336638
2018-07-10 05:31:42 +00:00
Craig Topper e9cff7d47b [X86] Remove some patterns that include a bitcast of a floating point load to an integer type.
DAG combine should have converted the type of the load.

llvm-svn: 336557
2018-07-09 16:03:02 +00:00
Craig Topper 16ee4b4957 [X86] Remove some patterns that seems to be unreachable.
These patterns mapped (v2f64 (X86vzmovl (v2f64 (scalar_to_vector FR64:$src)))) to a MOVSD and an zeroing XOR. But the complexity of a pattern for (v2f64 (X86vzmovl (v2f64))) that selects MOVQ is artificially and hides this MOVSD pattern.

Weirder still, the SSE version of the pattern was explicitly blocked on SSE41, but yet we had copied it to AVX and AVX512.

llvm-svn: 336556
2018-07-09 16:03:01 +00:00
Craig Topper 22330c700b [X86] Remove some seemingly unnecessary AddedComplexity lines.
Looking at the generated tables this didn't seem to make an obvious difference in pattern priority.

llvm-svn: 336555
2018-07-09 16:02:59 +00:00
Craig Topper c98c675f03 [X86] Remove an AddedComplexity line that seems unnecessary.
It only existed on SSE and AVX version. AVX512 version didn't have it.

I checked the generated table and this didn't seem necessary to creat a match preference.

llvm-svn: 336516
2018-07-08 22:57:33 +00:00
Craig Topper f61c631b25 [X86] Remove patterns for MOVLPD/MOVLPS nodes with integer types.
Lowering shouldn't generate these. If we need to use them for integer types, it should use a bitcast.

llvm-svn: 336458
2018-07-06 18:47:57 +00:00
Craig Topper 56440b9745 [X86] Don't use aligned load/store instructions for fp128 if the load/store isn't aligned.
Similarily, don't fold fp128 loads into SSE instructions if the load isn't aligned. Unless we're targeting an AMD CPU that doesn't check alignment on arithmetic instructions.

Should fix PR38001

llvm-svn: 336121
2018-07-02 17:01:54 +00:00
Craig Topper 7ffa976993 [X86] Don't fold unaligned loads into SSE ROUNDPS/ROUNDPD for ceil/floor/nearbyint/rint/trunc.
Incorrect patterns were added in r334460. This changes them to check alignment properly for SSE.

llvm-svn: 335062
2018-06-19 17:51:42 +00:00
Mikhail Dvoretckii b1ce7765be [X86] VRNDSCALE* folding from masked and scalar ffloor and fceil patterns
This patch handles back-end folding of generic patterns created by lowering the
X86 rounding intrinsics to native IR in cases where the instruction isn't a
straightforward packed values rounding operation, but a masked operation or a
scalar operation.

Differential Revision: https://reviews.llvm.org/D45203

llvm-svn: 335037
2018-06-19 10:37:52 +00:00
Craig Topper 16fdde5e63 [X86] Add '.s' aliases to the assembler for the various redundant move encodings to match gas and our EVEX instructions.
We already have these aliases for EVEX enocded instructions, but not for the GPR, MMX, SSE, and VEX versions.

Also remove the vpextrw.s EVEX alias. That's not something gas implements.

llvm-svn: 334922
2018-06-18 05:00:50 +00:00
Craig Topper 29f22d7baa [X86] More additions to the load folding tables based on the autogenerated tables.
Including more additions for NotMemoryFoldable to remove some entries from the autogenerated table.

llvm-svn: 334898
2018-06-16 23:25:50 +00:00
Tomasz Krupa bcaab53d47 [X86] Lowering sqrt intrinsics to native IR
Summary: Complementary patch to lowering sqrt intrinsics in Clang.

Reviewers: craig.topper, spatel, RKSimon, DavidKreitzer, uriel.k

Reviewed By: craig.topper

Subscribers: tkrupa, mike.dvoretsky, llvm-commits

Differential Revision: https://reviews.llvm.org/D41599

llvm-svn: 334849
2018-06-15 18:05:24 +00:00
Craig Topper 9f829f76e8 [X86] Remove NotMemoryFoldable from some AVX/AVX512 scalar instructions.
Some of these instructions are already in the manual folding table so we should have them in the auto table too.

llvm-svn: 334725
2018-06-14 15:40:27 +00:00
Craig Topper b2552e1e08 [x86] fix mappings of cvttp2si/cvttp2ui x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
Summary:
The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll and avx512-cvttp2i.ll

Reviewers: RKSimon, gbedwell, spatel

Reviewed By: spatel

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D47993

llvm-svn: 334685
2018-06-14 03:16:58 +00:00
Craig Topper 957b738432 [X86] Add isel patterns for folding loads when creating ROUND instructions from ffloor/fnearbyint/fceil/frint/ftrunc.
We were missing packed isel folding patterns for all of sse41, avx, and avx512.

For some reason avx512 had scalar load folding patterns under optsize(due to partial/undef reg update), but we didn't have the equivalent sse41 and avx patterns.

Sometimes we would get load folding due to peephole pass anyway, but we're also missing avx512 instructions from the load folding table. I'll try to fix that in another patch.

Some of this was spotted in the review for D47993.

This patch adds all the folds to isel, adds a few spot tests, and disables the peephole pass on a few tests to ensure we're testing some of these patterns.

llvm-svn: 334460
2018-06-12 00:48:57 +00:00
Craig Topper 860562c915 [X86] Miscellaneous fixes to get the load folding table generator to work again.
llvm-svn: 334377
2018-06-10 21:48:24 +00:00
Nicolai Haehnle 01d261f18d TableGen: Streamline the semantics of NAME
Summary:
The new rules are straightforward. The main rules to keep in mind
are:

1. NAME is an implicit template argument of class and multiclass,
   and will be substituted by the name of the instantiating def/defm.

2. The name of a def/defm in a multiclass must contain a reference
   to NAME. If such a reference is not present, it is automatically
   prepended.

And for some additional subtleties, consider these:

3. defm with no name generates a unique name but has no special
   behavior otherwise.

4. def with no name generates an anonymous record, whose name is
   unique but undefined. In particular, the name won't contain a
   reference to NAME.

Keeping rules 1&2 in mind should allow a predictable behavior of
name resolution that is simple to follow.

The old "rules" were rather surprising: sometimes (but not always),
NAME would correspond to the name of the toplevel defm. They were
also plain bonkers when you pushed them to their limits, as the old
version of the TableGen test case shows.

Having NAME correspond to the name of the toplevel defm introduces
"spooky action at a distance" and breaks composability:
refactoring the upper layers of a hierarchy of nested multiclass
instantiations can cause unexpected breakage by changing the value
of NAME at a lower level of the hierarchy. The new rules don't
suffer from this problem.

Some existing .td files have to be adjusted because they ended up
depending on the details of the old implementation.

Change-Id: I694095231565b30f563e6fd0417b41ee01a12589

Reviewers: tra, simon_tatham, craig.topper, MartinO, arsenm, javed.absar

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D47430

llvm-svn: 333900
2018-06-04 14:26:05 +00:00
Alexander Ivchenko 96062eaa8e [X86] Scalar mask and scalar move optimizations
1. Introduction of mask scalar TableGen patterns.
2. Introduction of new scalar move TableGen patterns
   and refactoring of existing ones.
3. Folding of pattern created by introducing scalar
   masking in Clang header files.

Patch by tkrupa

Differential Revision: https://reviews.llvm.org/D47012

llvm-svn: 333419
2018-05-29 14:27:11 +00:00
Petar Jovanovic c051000b83 [X86][MIPS][ARM] New machine instruction property 'isMoveReg'
This property is needed in order to follow values movement between
registers. This property is used in TII to implement method that
returns true if simple copy like instruction is recognized, along
with source and destination machine operands.

Patch by Nikola Prica.

Differential Revision: https://reviews.llvm.org/D45204

llvm-svn: 333093
2018-05-23 15:28:28 +00:00
Simon Pilgrim 1273f4ad93 [X86] Add GPR<->XMM Schedule Tags
BtVer2 - fix NumMicroOp and account for the Lat+6cy GPR->XMM and Lat+1cy XMm->GPR delays (see rL332737)

The high number of MOVD/MOVQ equivalent instructions meant that there were a number of missed patterns in SNB/Znver1:
SNB - add missing GPR<->MMX costs (taken from Agner / Intel AOM)
Znver1 - add missing GPR<->XMM MOVQ costs (taken from Agner)

llvm-svn: 332745
2018-05-18 17:58:36 +00:00
Simon Pilgrim c4b8d367a8 [X86][SSE] Ensure vector partial load/stores use the WriteVecLoad/WriteVecStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVQ/MOVD etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332718
2018-05-18 14:08:01 +00:00
Simon Pilgrim e819199e2a [X86][AVX] VEXTRACTF128mr store is a WriteFStoreX not WriteFStore
llvm-svn: 332715
2018-05-18 13:17:51 +00:00
Simon Pilgrim d749b321b2 [X86][SSE] Ensure float load/stores use the WriteFLoad/WriteFStore scheduler classes
Retag some instructions that were missed when we split off vector load/store/moves - MOVSS/MOVSD/MOVHPD/MOVHPD/MOVLPD/MOVLPS etc.

Fixes BtVer2/SLM which have different behaviours for GPR stores.

llvm-svn: 332714
2018-05-18 13:13:59 +00:00
Craig Topper a2c5264718 [X86] Add OptForSize to a couple load folding patterns. Remove some bad FIXME comments.
The FIXME comments were about preventing load folding to avoid a partial xmm update. But these instructions use GPR as input when the load isn't folded. This won't help prevent a partial xmm update.

llvm-svn: 332573
2018-05-17 05:41:11 +00:00
Simon Pilgrim 5647e89f5a [X86] Split WriteCvtI2F/WriteCvtF2I into I<->F32 and I<->F64 scheduler classes
A lot of the models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332451
2018-05-16 10:53:45 +00:00
Simon Pilgrim be9a206883 [X86] Split WriteCvtF2F into F32->F64 and F64->F32 scheduler classes
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions

A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first

llvm-svn: 332376
2018-05-15 17:36:49 +00:00
Simon Pilgrim 891ebcdbaa [X86] Split off F16C WriteCvtPH2PS/WriteCvtPS2PH scheduler classes
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency

Allows us to remove the WriteCvtF2FSt conversion store class

llvm-svn: 332357
2018-05-15 14:12:32 +00:00
Simon Pilgrim 215ce4a1ca [X86] Add NT load/store scheduler classes
llvm-svn: 332274
2018-05-14 18:37:19 +00:00
Craig Topper 266b7ae55d [X86] Cleanup a multiclass that doesn't need as many parameters after recent intrinsic removals.
llvm-svn: 332207
2018-05-14 00:17:52 +00:00
Craig Topper 38b713d4a7 [X86] Add some load folding patterns for cvtsi2ss/sd into intrinsic instructions.
llvm-svn: 332189
2018-05-13 01:54:33 +00:00
Craig Topper df3a9cedff [X86] Remove an autoupgrade legacy cvtss2sd intrinsics.
llvm-svn: 332187
2018-05-13 00:29:40 +00:00
Craig Topper 38ad7ddabc [X86] Remove and autoupgrade cvtsi2ss/cvtsi2sd intrinsics to match what clang has used for a very long time.
llvm-svn: 332186
2018-05-12 23:14:39 +00:00
Simon Pilgrim ead11e4d4b [X86] Added scheduler helper classes to split move/load/store by size
Nothing uses this yet but this will allow us to specialize MMX/XMM/YMM/ZMM vector moves.

llvm-svn: 332090
2018-05-11 12:46:54 +00:00
Simon Pilgrim ab34aa8294 [X86] Cleanup WriteFStore/WriteVecStore schedules
MOVNTPD/MOVNTPS should be WriteFStore

Standardized BDW/HSW/SKL/SKX WriteFStore/WriteVecStore - fixes some missed instregex patterns. (V)MASKMOVDQU was already using the default, its costs gets increased but is still nowhere near the real cost of that nasty instruction....

llvm-svn: 331864
2018-05-09 11:01:16 +00:00
Simon Pilgrim b0a3be04ec [X86] Add vector masked load/store scheduler classes (PR32857)
Split off from existing vector load/store classes to remove InstRW overrides.

llvm-svn: 331760
2018-05-08 12:17:55 +00:00
Simon Pilgrim 210286ed8f [X86] Add SchedWriteFTest/SchedWriteVecTest TEST scheduler classes
Split off from SchedWriteVecLogic to remove InstRW overrides.

llvm-svn: 331757
2018-05-08 10:28:03 +00:00
Simon Pilgrim 1233e1234a [X86] Split WriteFAdd/WriteFCmp/WriteFMul schedule classes
Split to support single/double for scalar, XMM and YMM/ZMM instructions - removing InstrRW overrides for these instructions.

Fixes Atom ADDSUBPD instruction and reclassifies VFPCLASS as WriteFCmp which is closer in behaviour.

llvm-svn: 331672
2018-05-07 20:52:53 +00:00
Simon Pilgrim e480ed0b9f [X86][AVX2] Tag VPMOVSX/VPMOVZX ymm instructions as WriteShuffle256
These are more like cross-lane shuffles than regular shuffles - we already do this for AVX512 equivalents.

Differential Revision: https://reviews.llvm.org/D46229

llvm-svn: 331659
2018-05-07 18:25:19 +00:00
Simon Pilgrim ac5d0a31ef [X86] Split WriteFDiv schedule classes to support single/double scalar, XMM and YMM/ZMM instructions.
This removes all InstrRW overrides for these instructions - some x87 overrides remain but most use default (and realistic) values.

llvm-svn: 331643
2018-05-07 16:15:46 +00:00
Simon Pilgrim f3ae50fca2 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt schedule classes
WriteFRcp/WriteFRsqrt are split to support scalar, XMM and YMM/ZMM instructions.

WriteFSqrt is split into single/double/long-double sizes and scalar, XMM, YMM and ZMM instructions.

This removes all InstrRW overrides for these instructions.

NOTE: There were a couple of typos in the Znver1 model - notably a 1cy throughput for SQRT that is highly unlikely and doesn't tally with Agner.

NOTE: I had to add Agner's numbers for several targets for WriteFSqrt80.
llvm-svn: 331629
2018-05-07 11:50:44 +00:00
Simon Pilgrim bf4c8c0ff2 [X86] Add WriteVecMOVMSKY scheduler class
llvm-svn: 331525
2018-05-04 14:54:33 +00:00
Simon Pilgrim be51b20127 [X86] Add SchedWriteFRnd fp rounding scheduler classes
Split off from SchedWriteFAdd for fp rounding/bit-manipulation instructions.

Fixes an issue on btver2 which only had the ymm version using the JSTC pipe instead of JFPA.

llvm-svn: 331515
2018-05-04 12:59:24 +00:00
Simon Pilgrim 542b20d656 [X86] Add WriteDPPD/WriteDPPS dot product scheduler classes
llvm-svn: 331489
2018-05-03 22:31:19 +00:00
Simon Pilgrim f2d2cedab4 [X86] Split WriteVecShift/WriteVarVecShift into MMX, XMM and YMM/ZMM scheduler classes
This took a bit of extra work as on Intel targets the old (V)PSLLDrr/(V)PSLLDrm style instructions act differently - I ended up creating WriteVecShiftImm classes for XMM/YMM/ZMM vector shift by immediate and retaining WriteVecShift as the default (used only by MMX) plus WriteVecShiftX/WriteVecShiftY. X86SchedWriteWidths hides most of this thank goodness.

llvm-svn: 331472
2018-05-03 17:56:43 +00:00
Simon Pilgrim f7dd6069a5 [X86] Split WriteVecALU/WritePHAdd into XMM and YMM/ZMM scheduler classes
llvm-svn: 331453
2018-05-03 13:27:10 +00:00
Simon Pilgrim e8671ef434 [X86] Convert most remaining uses of X86SchedWritePair scheduler classes to X86SchedWriteWidths.
We've dealt with the majority already.

llvm-svn: 331347
2018-05-02 12:27:54 +00:00
Simon Pilgrim c708868cb1 [X86] Split WriteFRcp/WriteFRsqrt/WriteFSqrt into XMM and YMM/ZMM scheduler classes
llvm-svn: 331290
2018-05-01 18:06:07 +00:00
Simon Pilgrim c546f9424f [X86] Split WriteFCmp into XMM and YMM/ZMM scheduler classes
Removes more WriteFCmp InstRW overrides

llvm-svn: 331283
2018-05-01 16:50:16 +00:00
Simon Pilgrim 1b7a80d80a [X86] Convert all uses of WriteFAdd to X86SchedWriteWidths.
In preparation of splitting WriteFAdd by vector width.

llvm-svn: 331273
2018-05-01 15:57:17 +00:00
Simon Pilgrim f6b81dae9e [X86] Convert all uses of WriteFShuffle to X86SchedWriteWidths.
In preparation of splitting WriteFShuffle by vector width.

llvm-svn: 331262
2018-05-01 14:14:42 +00:00
Simon Pilgrim 6f710a6440 [X86] Convert all uses of WriteFLogic/WriteVecLogic to X86SchedWriteWidths.
In preparation of splitting WriteVecLogic by vector width.

llvm-svn: 331256
2018-05-01 12:15:29 +00:00
Simon Pilgrim fc0c26f1a6 [X86] Tag PSLLDQ/PSRLDQ as WriteShuffle scheduler classes instead of shifts.
Although they are encoded similar to bit shifts, the byte shifts behave like shuffles from a scheduling point of view.

llvm-svn: 331253
2018-05-01 11:05:42 +00:00
Simon Pilgrim 3c35408e48 [X86] Introduce X86SchedWriteWidths schedule wrapper for different vector widths.
We need to split most of the scheduler classes by vector width to remove more of the InstRW overrides, this patch should make this easier/tidier by allowing us to pass the X86SchedWriteWidths wrapper to multi-width multiclasses and then split as required.

I've included fields for Scl (scalar float/double), MMX (MMX integer), XMM, YMM and ZMM widths. These fields mostly share the same classes but it should give us the flexibility that we may need in the future.

This patch has replaced a set of example SSE/AVX512 instruction cases but isn't exhaustive as it gets very noisy before we really need the functionality.

Differential Revision: https://reviews.llvm.org/D46266

llvm-svn: 331208
2018-04-30 18:18:38 +00:00
Craig Topper 06624e1a93 [X86] Restrict many of the InstAliases to either to only att or intel syntax. NFCI
Many of these aliases exist to give one syntax or the other a slightly different mnemonic and the other variant gets a duplicate of its normal mnemonic

This patch restricts a lot of these to only one variant so we don't get the duplication.

This removes a lot of duplicate entries from the matcher table. It also reduces the number of warnings printed when you enable the ambiguous match warning in tablegen.

llvm-svn: 331117
2018-04-28 18:46:11 +00:00
Simon Pilgrim 9f561dd54a [X86][SSE] Stop hard coding some instruction scheduler classes.
Make these arguments to the multiclass to allow easier specialization.

llvm-svn: 331107
2018-04-28 14:08:51 +00:00
Craig Topper d656410293 [X86] Make the STTNI flag intrinsics use the flags from pcmpestrm/pcmpistrm if the mask instrinsics are also used in the same basic block.
Summary:
Previously the flag intrinsics always used the index instructions even if a mask instruction also exists.

To fix fix this I've created a single ISD node type that returns index, mask, and flags. The SelectionDAG CSE process will merge all flavors of intrinsics with the same inputs to a s ingle node. Then during isel we just have to look at which results are used to know what instruction to generate. If both mask and index are used we'll need to emit two instructions. But for all other cases we can emit a single instruction.

Since I had to do manual isel anyway, I've removed the pseudo instructions and custom inserter code that was working around tablegen limitations with multiple implicit defs.

I've also renamed the recently added sse42.ll test case to sttni.ll since it focuses on that subset of the sse4.2 instructions.

Reviewers: chandlerc, RKSimon, spatel

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D46202

llvm-svn: 331091
2018-04-27 22:15:33 +00:00
Simon Pilgrim 8a937e00d8 [X86] Split WriteFBlend/WriteFVarBlend/WriteFVarShuffle into XMM and YMM/ZMM scheduler classes
This removes all the WriteFBlend/WriteFVarBlend InstRW overrides - some WriteFVarShuffle remain to be fixed.

llvm-svn: 331065
2018-04-27 18:19:48 +00:00
Simon Pilgrim c3c767bf50 [X86] Split WriteFHadd into XMM and YMM/ZMM scheduler classes
This removes all the HADD/HSUB PS/PD InstRW overrides.

llvm-svn: 331054
2018-04-27 16:11:57 +00:00
Simon Pilgrim b2aa89c909 [X86][AVX] Split WriteFLogic into XMM and YMM/ZMM scheduler classes
This removes all the AND/ANDN/OR/XOR PS/PD InstRW overrides.

llvm-svn: 331051
2018-04-27 15:50:33 +00:00
Craig Topper b0227189fd [X86] Remove alignment restriction on loading folding of pcmp[ei]str* during isel too.
This is a follow up to the changes in r330896 which enabled folding after isel during peephole and register allocation.

llvm-svn: 330897
2018-04-26 03:53:39 +00:00
Simon Pilgrim 27bc83e228 [X86] Split off PHMINPOSUW to their own schedule class
This also fixes Jaguar's schedule which was treating it as the WriteVecIMul default. 

llvm-svn: 330756
2018-04-24 18:49:25 +00:00
Simon Pilgrim f0945aa0e0 [X86][F16C] Add WriteCvtF2FSt scheduling class
Fixes the classification of VCVTPS2PHmr/VCVTPS2PHYmr which were tagged as WriteCvtF2FLd_WriteRMW (PR36887)

llvm-svn: 330737
2018-04-24 16:43:07 +00:00
Simon Pilgrim f7d2a93d5f [X86] Add vector element insertion/extraction scheduler classes
Split off pinsr/pextr and extractps instructions.

(Mostly) fixes PR36887.

Note: It might be worth adding a WriteFInsertLd class as well in the future.

Differential Revision: https://reviews.llvm.org/D45929

llvm-svn: 330714
2018-04-24 13:21:41 +00:00
Craig Topper 3f1d538165 [X86] Add VEX_WIG to VEX encoded version of VCMPPSY/VCMPPDY.
llvm-svn: 330563
2018-04-23 04:50:01 +00:00
Simon Pilgrim 2fd8269c6f [X86][MMX][SSE] Tag missed PHADD/PHSUB instructions with WritePHAdd
llvm-svn: 330545
2018-04-22 15:02:23 +00:00
Craig Topper e958c7270e [X86] Change TB to PS on LFENCE instruction.
This matches the other FENCE instructions.

llvm-svn: 330533
2018-04-22 03:15:02 +00:00