Commit Graph

9 Commits

Author SHA1 Message Date
Craig Topper 8b5f2ab2a4 Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
The assert that caused this to be reverted should be fixed now.

Original commit message:

This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 368183
2019-08-07 16:24:26 +00:00
Mitch Phillips bd0d97e1c4 Revert "[X86] Enable -x86-experimental-vector-widening-legalization by default."
This reverts commit 3de33245d2.

This commit broke the MSan buildbots. See
https://reviews.llvm.org/rL367901 for more information.

llvm-svn: 368107
2019-08-06 23:00:43 +00:00
Craig Topper 3de33245d2 [X86] Enable -x86-experimental-vector-widening-legalization by default.
This patch changes our defualt legalization behavior for 16, 32, and
64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
widening. For example, v8i8 will now be widened to v16i8 instead of
promoted to v8i16. This keeps the elements widths the same and pads
with undef elements. We believe this is a better legalization strategy.
But it carries some issues due to the fragmented vector ISA. For
example, i8 shifts and multiplies get widened and then later have
to be promoted/split into vXi16 vectors.

This has the potential to cause regressions so we wanted to get
it in early in the 10.0 cycle so we have plenty of time to
address them.

Next steps will be to merge tests that explicitly test the command
line option. And then we can remove the option and its associated
code.

llvm-svn: 367901
2019-08-05 18:25:36 +00:00
Craig Topper 510e6fadaa [X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it.
The build_vector will become a constant pool load. By using the
desired type initially, it ensures we don't generate a bitcast
of the constant pool load which will need to be folded with
the load.

While experimenting with another patch, I noticed that when the
load type and the constant pool type don't match, then
SimplifyDemandedBits can't handle it. While we should probably
fix that, this was a simple way to fix the issue I saw.

llvm-svn: 366732
2019-07-22 19:58:49 +00:00
Craig Topper 06aea1720a [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom up strategy for LegalizeVectorOps means operands are legalized before their uses. So we promote and/or/xor before we legalize the operands that use them making computeKnownBits/computeNumSignBits in places like LowerTruncate suboptimal. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues one around matching AND,ANDN,OR into VSELECT because I had to create ANDN as vXi64, but the other nodes hadn't legalized yet, I didn't look too hard at fixing that.

This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there. I think we got bad ordering of that. Now the promotion is out of the legalizer so we handle this better.

In the longer term I think we really should evaluate whether we should be doing this promotion at all. It's really there to reduce isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++ based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult in patterns if a bitcast gets between the vselect and the and/or/xor node. That becomes a lot of permutations to cover.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53107

llvm-svn: 344487
2018-10-15 01:51:58 +00:00
Craig Topper 7af5e333e7 [X86] Create paddus/psubus from narrower vectors with i8/i16 element types.
Summary:
This patch allows vectors with a power of 2 number of elements and i8/i16 element type to select paddus/psubus instructions. ReplaceNodeResults has been updated to custom widen these operations up to 128 bits like we already do for PAVG.

Another step towards fixing PR38691

Reviewers: RKSimon, spatel

Reviewed By: RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D51818

llvm-svn: 341753
2018-09-08 19:32:58 +00:00
Craig Topper 803912ea57 [X86] Fix an issue in the matching for ADDUS.
We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add.

This rewrites the canonicalization to just be based on the condition code.

llvm-svn: 340134
2018-08-19 04:26:31 +00:00
Craig Topper a85d7e927b [X86] Add a test case showing an issue in our addusw pattern matching.
We are unable to handle a normal add followed by a saturing add with certain operand orders on the icmp.

llvm-svn: 340133
2018-08-19 04:26:29 +00:00
Tomasz Krupa 86a63889f3 [X86] Lowering addus/subus intrinsics to native IR
Summary: This revision improves previous version (rL330322) which has been reverted due to crashes.

This is the patch that lowers x86 intrinsics to native IR
in order to enable optimizations. The patch also includes folding
of previously missing saturation patterns so that IR emits the same
machine instructions as the intrinsics.

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: craig.topper

Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits

Differential Revision: https://reviews.llvm.org/D46179

llvm-svn: 339650
2018-08-14 08:00:56 +00:00