17926 Commits

Author SHA1 Message Date
Simon Pilgrim a365719a24 [X86][SSE] LowerVSELECT - pull out repeated getOperand(). NFCI.
llvm-svn: 345458
2018-10-27 18:37:59 +00:00
Simon Pilgrim 88116e905e Revert rL345395: [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.
........
It's been reported that this is causing out of trunk failures.

llvm-svn: 345451
2018-10-27 07:10:48 +00:00
Craig Topper 4b89647b79 [X86] Add some isel patterns for scalar_to_vector/extract_vector_element that use the avx512 extended register classes when they are available.
llvm-svn: 345448
2018-10-27 05:35:20 +00:00
Reid Kleckner 98d880fbd7 [Spectre] Fix MIR verifier errors in retpoline thunks
Summary:
The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't
understand the way we're using a CALL instruction as a branch, so we
can't list the CallTarget MBB as a successor of the entry block. If we
don't list it as a successor, then the AsmPrinter doesn't print a label
for the MBB.

Fix the issue by inserting our own label at the beginning of the call
target block. We can rely on the AsmPrinter to always emit it, even
though the block appears to be unreachable, but address-taken.

Fixes PR38391.

Reviewers: thegameg, chandlerc, echristo

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D53653

llvm-svn: 345426
2018-10-26 20:26:36 +00:00
Craig Topper 8315d9990c [X86] Stop promoting vector and/or/xor/andn to vXi64.
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.

This patch removes the promotion and adds isel patterns instead.

The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.

Differential Revision: https://reviews.llvm.org/D53268

llvm-svn: 345408
2018-10-26 17:21:26 +00:00
Simon Pilgrim 5d1be4f8d4 [X86][SSE] Move 2-input limit up from getFauxShuffleMask to resolveTargetShuffleInputs
Makes no difference to actual shuffle decoding yet, but merges all the existing limits in one place for when proper support is fixed.

llvm-svn: 345395
2018-10-26 15:19:02 +00:00
Sanjay Patel 6b40768f5a [x86] commute blendvb with constant condition op to allow load folding
This is a narrow fix for 1 of the problems mentioned in PR27780:
https://bugs.llvm.org/show_bug.cgi?id=27780

I looked at more general solutions, but it's a mess. We canonicalize shuffle masks
based on the number of elements accessed from each operand, and that's not optional.
If you remove that, we'll crash because we fail to match isel patterns. So I'm
waiting until we're sure that we have blendvb with constant condition and then
commuting based on the load potential. Other cases like blend-with-immediate are
already handled elsewhere, so this is probably not a common problem anyway.

I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've
screwed that up by creating a temporary PSHUFB using these operands that we're counting
on to be killed later. Undoing that didn't look like a simple task because it's
intertwined with determining if we actually use both operands of the shuffle or not.

Differential Revision: https://reviews.llvm.org/D53737

llvm-svn: 345390
2018-10-26 14:58:13 +00:00
Simon Pilgrim 7575c6d01b [X86] Use existing pulled out VT variables. NFCI.
llvm-svn: 345388
2018-10-26 14:39:28 +00:00
Craig Topper 813064bf4d [X86] Change X86 backend to look for 'min-legal-vector-width' attribute instead of 'required-vector-width' when determining whether 512-bit vectors should be legal.
The required-vector-width attribute was only used for backend testing and has never been generated by clang.

I believe clang is now generating min-legal-vector-width for vector uses in user code.

With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code.

llvm-svn: 345317
2018-10-25 21:16:06 +00:00
Craig Topper c10de9a37a [X86] Remove ProcIntelKNL and replace with a SlowPMADDWD flag to use in the one place it was checked.
llvm-svn: 345286
2018-10-25 17:29:00 +00:00
Craig Topper 5d787ac4be [X86] Remove some uarch tuning flags from KNL that look to have been inherited from SNB/IVB incorrectly
KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.

Differential Revision: https://reviews.llvm.org/D53671

llvm-svn: 345285
2018-10-25 17:28:57 +00:00
Simon Pilgrim 53e8e145e9 [CostModel][X86] Add realistic vXi64 uitofp vXf64 costs
Match codegen improvements from D53649/rL345256

llvm-svn: 345263
2018-10-25 13:06:20 +00:00
Simon Pilgrim 0573b8d8b6 [CostModel][X86] Add realistic i64 uitofp f64 scalar costs
llvm-svn: 345261
2018-10-25 12:42:10 +00:00
Clement Courbet 41c8af3924 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Craig Topper 7ae43cad65 [X86] Don't use the OriginalDemandedBits to calculate the DemandedMask for PMULUDQ/PMULDQ inputs.
Multiply is a complex operation, so just because some bit of the output isn't used doesn't mean that bit of the input isn't used.

We might be able to bound it, but it will require some more thought.
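
The point can be illustrated with a scalar model of one PMULUDQ lane. This is only a sketch: `pmuludqLane`, `demandedResult`, and the mask `kDemandedOut` are hypothetical names chosen for the illustration, not code from the patch. It shows an input bit (bit 31) that falls outside the demanded output bits yet still changes the demanded part of the result.

```cpp
#include <cstdint>

// Scalar model of one 64-bit PMULUDQ lane: multiply the low 32 bits of
// each operand to form a 64-bit product. (Hypothetical helper.)
constexpr uint64_t pmuludqLane(uint64_t a, uint64_t b) {
  return (a & 0xFFFFFFFFULL) * (b & 0xFFFFFFFFULL);
}

// Assumed demanded mask for the illustration: only the upper 32 bits of
// the product are used by later code.
constexpr uint64_t kDemandedOut = 0xFFFFFFFF00000000ULL;

// The part of the product that later code actually observes.
constexpr uint64_t demandedResult(uint64_t a, uint64_t b) {
  return pmuludqLane(a, b) & kDemandedOut;
}
```

With `a = 0x80000000` and `b = 2`, the product is `0x100000000`, so the demanded upper half is non-zero. Reusing the output mask on the input (`a & kDemandedOut`) clears bit 31 of `a` and turns the demanded result into zero, which is why the output mask cannot simply be forwarded to the inputs.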

llvm-svn: 345241
2018-10-25 07:00:09 +00:00
Craig Topper eaa1cf5b57 [X86] Fix typo in comment. NFC
llvm-svn: 345236
2018-10-25 05:00:20 +00:00
Reid Kleckner 49a24278ba [ELF] Fix large code model MIR verifier errors
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.

Fixes some X86 verifier errors tracked in PR27481.

llvm-svn: 345219
2018-10-24 22:57:28 +00:00
Reid Kleckner 9c5bda652c [X86] Add *SP to tailcall register class to fix verifier error
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.

Fixes the sibcall* and musttail* verifier failures tracked at PR27481.

llvm-svn: 345197
2018-10-24 21:09:34 +00:00
Reid Kleckner 953bdce68d [MC] Separate masm integer literal lexer support from inline asm
Summary:
This renames the IsParsingMSInlineAsm member variable of AsmLexer to
LexMasmIntegers and moves it up to MCAsmLexer. This is the only behavior
controlled by that variable. I added a public setter, so that it can be
set from outside or from the llvm-mc command line. We may need to
arrange things so that users can get this behavior from clang, but
that's future work.

I also put additional hex literal lexing functionality under this flag
to fix PR32973. It appears that this hex literal parsing wasn't intended
to be enabled in non-masm-style blocks.

Now, masm integers (0b1101 and 0ABCh) work in __asm blocks from clang,
but 0b label references work when using .intel_syntax in standalone .s
files.

However, 0b label references will *not* work from __asm blocks in clang.
They will work from GCC inline asm blocks, which it sounds like is
important for Crypto++ as mentioned in PR36144.

Essentially, we only lex masm literals for inline asm blobs that use
intel syntax. If the .intel_syntax directive is used inside a gnu-style
inline asm statement, masm literals will not be lexed, which is
compatible with gas and llvm-mc standalone .s assembly.
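
To make the ambiguity concrete, here is a rough sketch of how the two masm-style literal forms mentioned above could be recognized. `parseMasmInteger` is a hypothetical standalone helper written for this illustration; it is not the AsmLexer code and ignores overflow and the other radix suffixes. Note that a bare `0b` is rejected, which leaves it free to be treated as a backward reference to local label `0`.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: recognizes only the two forms named in the text,
// "0ABCh" (hex with an 'h' suffix) and "0b1101" (binary with a "0b" prefix).
// Returns false for anything else, including a bare "0b".
bool parseMasmInteger(const std::string &tok, uint64_t &value) {
  // This sketch requires the token to start with a decimal digit.
  if (tok.empty() || !std::isdigit(static_cast<unsigned char>(tok[0])))
    return false;

  const char last = tok.back();
  if (tok.size() >= 2 && (last == 'h' || last == 'H')) {
    uint64_t v = 0;
    for (std::size_t i = 0; i + 1 < tok.size(); ++i) {
      const unsigned char c = static_cast<unsigned char>(tok[i]);
      if (!std::isxdigit(c))
        return false;
      const unsigned digit =
          std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
      v = v * 16 + digit;  // overflow is not checked in this sketch
    }
    value = v;
    return true;
  }

  if (tok.size() > 2 && tok[0] == '0' && (tok[1] == 'b' || tok[1] == 'B')) {
    uint64_t v = 0;
    for (std::size_t i = 2; i < tok.size(); ++i) {
      if (tok[i] != '0' && tok[i] != '1')
        return false;
      v = v * 2 + static_cast<unsigned>(tok[i] - '0');
    }
    value = v;
    return true;
  }
  return false;
}
```

Under these assumptions, `0b1101` parses as 13 and `0ABCh` as 0xABC, while `0b` on its own is not a literal.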

This fixes PR36144 and PR32973.

Reviewers: Gerolf, avt77

Subscribers: eraman, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D53535

llvm-svn: 345189
2018-10-24 20:23:57 +00:00
Craig Topper 7bb8c2e6e5 [X86] Explicitly list all KNL features instead of inheriting from IVB. NFC
I'm not sure all the microarchitectural tuning flags that have been added to IVBFeatures are relevant for KNL. Separating will allow us to see and audit them. There might even be some simplification opportunities in the Sandy Bridge through Icelake inheritance line without KNL using the same chain.

llvm-svn: 345183
2018-10-24 19:24:44 +00:00
Simon Pilgrim c5bb362b13 [X86][SSE] Add SimplifyDemandedBitsForTargetNode PMULDQ/PMULUDQ handling
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.

This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.

Differential Revision: https://reviews.llvm.org/D53643

llvm-svn: 345182
2018-10-24 19:11:28 +00:00
Simon Pilgrim ac84005841 [CostModel][X86] Add vXi8 vector division by constants costs.
ISD::MULHS/ISD::MULHU lowering of vXi8 types means we expand these in TargetLowering BuildSDIV/BuildUDIV.
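
As a scalar illustration of the multiply-high expansion being costed here, the sketch below divides an unsigned 8-bit value by 3 using a high multiply and a shift. The helper names and the magic constant 171 with a post-shift of 1 are choices made for this example (171 is the ceiling of 2^9 / 3); they are not taken from the commit.

```cpp
#include <cstdint>

// High 8 bits of the 16-bit product of two unsigned 8-bit values,
// i.e. a scalar stand-in for a vXi8 MULHU lane. (Hypothetical helper.)
constexpr uint8_t mulhu8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<uint16_t>(a) * b) >> 8);
}

// x / 3 without a divide: multiply-high by 171, then shift right by 1.
// Equivalent to (x * 171) >> 9 for every 8-bit x.
constexpr uint8_t udiv3(uint8_t x) {
  return static_cast<uint8_t>(mulhu8(x, 171) >> 1);
}
```

For example, `udiv3(255)` computes `255 * 171 = 43605`, takes the high byte 170, and shifts to get 85.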

llvm-svn: 345175
2018-10-24 18:44:12 +00:00
Craig Topper 2417273255 [X86] Bring back the MOV64r0 pseudo instruction
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.

My main motivation is to enable the spill optimization in foldMemoryOperandImpl, as we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can reduce one from the code we're looking at.

There are definitely some other machine CSE (and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.

Differential Revision: https://reviews.llvm.org/D52757

llvm-svn: 345165
2018-10-24 17:32:09 +00:00
Simon Pilgrim 2cce074e8c [CostModel][X86] Enable non-uniform vector division by constants costs.
Non-uniform division/remainder handling was added back at D49248/D50765 - so share the 'mul+sub' costs that already exist for uniform cases.

llvm-svn: 345164
2018-10-24 17:30:29 +00:00
Craig Topper da54bbf52a [X86] Correct a bad isel predicate. Though I don't think it can be exposed.
The B/W VPTEST instructions are only available with AVX512BW. But lowering should prevent any byte or word elements from getting to isel so this can't be exposed.

llvm-svn: 345112
2018-10-24 06:13:36 +00:00
Matthias Braun 4f82406c46 SelectionDAG: Reuse bigger sized constants in memset expansion.
When implementing memsets today we often see this pattern:
$x0 = MOV 0xXYXYXYXYXYXYXYXY
store $x0, ...
$w1 = MOV 0xXYXYXYXY
store $w1, ...

We first create a 64bit constant in a 64bit register with all bytes the
same and then create a 32bit constant with all bytes the same in a 32bit
register. In many targets we could just access the lower byte of the
64bit register instead.

- Ideally this would be handled by the ConstantHoist pass but it runs
  too early when memset isn't expanded yet.
- The memset expansion code already had this optimization implemented,
  however SelectionDAG constantfolding would constantfold the
  "trunc(bigconstant)" pattern to "smallconstant".
- This patch makes the memset expansion mark the constant as Opaque and
  stop DAGCombiner from constant folding in this situation. (Similar to
  how ConstantHoisting marks things as Opaque to avoid folding
  ADD/SUB/etc.)
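
The reason the smaller constant is redundant can be checked with plain integer arithmetic. The sketch below uses hypothetical helpers (`splat64`, `splat32`, `lowHalf`) to show that truncating the 64-bit byte splat yields exactly the 32-bit byte splat, so a target that can address the low half of the wide register does not need a second materialization.

```cpp
#include <cstdint>

// Replicate a byte into every byte of a 64-bit or 32-bit value, as a
// memset expansion would for its store constants. (Hypothetical helpers.)
constexpr uint64_t splat64(uint8_t b) { return 0x0101010101010101ULL * b; }
constexpr uint32_t splat32(uint8_t b) { return 0x01010101U * b; }

// Model of reading the low 32 bits of the 64-bit register.
constexpr uint32_t lowHalf(uint64_t v) { return static_cast<uint32_t>(v); }
```

For any byte `b`, `lowHalf(splat64(b)) == splat32(b)`; that identity is also why the generic constant folder turns the truncate back into a separate small constant unless the wide constant is marked opaque.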

Differential Revision: https://reviews.llvm.org/D53181

llvm-svn: 345102
2018-10-23 23:19:23 +00:00
Simon Pilgrim b6c57075c0 [X86][SSE] Revert rL343922 combinePMULDQ AddToWorklist (PR39398)
We can't add the MULDQ node back to the worklist after the demanded bits change has been committed in case the node has been removed entirely. This will have to wait until we have SimplifyDemandedBitsForTargetNode.

llvm-svn: 345070
2018-10-23 19:07:53 +00:00
Roman Lebedev 2fae985793 X86DAGToDAGISel::matchBitExtract(): lambdas can't have default arguments.
As reported by ctopper.
That is a gcc-only warning at the moment.

llvm-svn: 345065
2018-10-23 18:27:10 +00:00
Simon Pilgrim f04a04c2b6 [TTI][X86] Treat SK_Transpose shuffles as SK_PermuteTwoSrc - there's no difference in lowering.
llvm-svn: 345048
2018-10-23 16:45:26 +00:00
Roman Lebedev 06e4db07af Experimental re-land of [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
This initially landed in rL345014, but was reverted in rL345017
due to sanitizer-x86_64-linux-fast buildbot failure in
check-lld (ELF/relocatable-versioned.s) test.

While I'm not yet quite sure what the problem is, one obvious
thing here is that extra truncation roundtrip.
Maybe that's it? If not, will re-revert.

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345027
2018-10-23 13:19:31 +00:00
Simon Pilgrim f85ee9f8b4 [X86][SSE] Update raw mask shuffle decoders to handle UNDEF mask elts
Matches the approach taken in the constant pool shuffle decoders, and uses an UndefElts mask instead of uint64_t(-1) raw mask values, which doesn't work safely for i32/i64 shuffle mask sizes (as the -1 value is legal).

This allows us to remove the constant pool shuffle decoders from most of the getTargetShuffleMask variable shuffle cases (X86ISD::VPERMV3 will be handled in a future commit).
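
A small sketch of why a separate undef mask is safer than an in-band sentinel: for a variable shuffle only the low index bits of each raw element matter, so an all-ones raw value is a perfectly legal index encoding. The function name, the `kUndef` marker, and the use of `std::vector<bool>` are assumptions made for this example; the real decoders use LLVM's own types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker for an undefined output lane in the decoded mask.
constexpr int kUndef = -1;

// Decode a single-source variable shuffle mask. Undef lanes are reported
// through a separate bit per element rather than a magic raw value, so a
// raw element of all ones still decodes to a real index.
// Assumes raw.size() == undefElts.size().
std::vector<int> decodeVariableMask(const std::vector<uint64_t> &raw,
                                    const std::vector<bool> &undefElts) {
  std::vector<int> mask;
  const uint64_t numElts = raw.size();
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (undefElts[i])
      mask.push_back(kUndef);
    else
      mask.push_back(static_cast<int>(raw[i] % numElts));  // keep index bits
  }
  return mask;
}
```

With four elements, a raw value of `UINT64_MAX` decodes to index 3 rather than being mistaken for undef.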

llvm-svn: 345018
2018-10-23 11:33:38 +00:00
Roman Lebedev c29dbbdb10 Revert "[X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern"
*Seems* to be breaking sanitizer-x86_64-linux-fast buildbot,
the ELF/relocatable-versioned.s test:

==17758==MemorySanitizer CHECK failed: /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191 "((kBlockMagic)) == ((((u64*)addr)[0]))" (0x6a6cb03abcebc041, 0x0)
    #0 0x59716b in MsanCheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/msan/msan.cc:393
    #1 0x586635 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_termination.cc:79
    #2 0x57d5ff in __sanitizer::InternalFree(void*, __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator32<__sanitizer::AP32> >*) /b/sanitizer-x86_64-linux-fast/build/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_allocator.cc:191
    #3 0x7fc21b24193f  (/lib/x86_64-linux-gnu/libc.so.6+0x3593f)
    #4 0x7fc21b241999 in exit (/lib/x86_64-linux-gnu/libc.so.6+0x35999)
    #5 0x7fc21b22c2e7 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e7)
    #6 0x57c039 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/lld+0x57c039)

This reverts commit r345014.

llvm-svn: 345017
2018-10-23 10:34:57 +00:00
Roman Lebedev 1c95b2f779 [X86][BMI1] X86DAGToDAGISel: select BEXTR from x << (32 - y) >> (32 - y) pattern
Summary:
Continuation of D52348.

We also get the `c) x &  (-1 >> (32 - y))` pattern here, because of the D48768.
I will add extra-uses into those tests and follow-up with a patch to handle those patterns too.

Reviewers: RKSimon, craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D53521

llvm-svn: 345014
2018-10-23 09:08:44 +00:00
Saleem Abdulrasool 96cd3cc312 X86: fix a comment copy-paste issue (NFC)
The comment was copy-pasted but not updated.  NFC.

llvm-svn: 344973
2018-10-22 23:34:24 +00:00
Craig Topper 96889b8b96 [X86] Remove unused entries from the X86ProcFamily enum. Add a note to discourage creation of new enum entries.
As we've learned multiple times, a coarse grained enum like this is not scalable and we should be migrating away from it.

llvm-svn: 344972
2018-10-22 23:14:55 +00:00
Matthias Braun a0beeffeed X86: Do not optimize branches with undef eflags inputs
analyzeBranch()/insertBranch() etc. do not properly deal with an undef
flag on the eflags input and used to produce invalid MIR.  I don't see
this ever affecting real world inputs (I don't think it is possible to
produce undef flags with llvm IR), so I simply changed the code to bail
out in this case.

rdar://42122367

llvm-svn: 344970
2018-10-22 22:52:23 +00:00
Craig Topper c8e183f9ee Recommit r344877 "[X86] Stop promoting integer loads to vXi64"
I've included a fix to DAGCombiner::ForwardStoreValueToDirectLoad that I believe will prevent the previous miscompile.

Original commit message:

Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344965
2018-10-22 22:14:05 +00:00
Tim Northover a23c12a627 X86: add alias for pushfw/popfw in Intel mode
A while ago we changed pushf and popf in Intel mode to generate pushfq
and popfq. Unfortunately that left us with no way to get the 16-bit
encoding in Intel mode so this patch adds pushfw and popfw as aliases
there.

llvm-svn: 344949
2018-10-22 20:38:13 +00:00
Simon Pilgrim 3b91e9676b Revert rL344931 from llvm/trunk: [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits - PSHUFB only works on i8 elts so it'd be safe to use but I'm intending to come up with an alternative approach that works for all.
........
Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask

llvm-svn: 344937
2018-10-22 19:01:25 +00:00
Simon Pilgrim 794f85cd93 Revert rL344933 from llvm/trunk: [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding
We can't safely assume that certain RawMask entries are UNDEF as most variable shuffles ignore non-index bits.
........
Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp

llvm-svn: 344936
2018-10-22 18:58:32 +00:00
Simon Pilgrim 476c9f42fc [X86][SSE] Tidyup DecodeVPERMILPMask shuffle mask decoding
Add support for UNDEF raw mask elements and remove the ConstantPool DecodeVPERMILPMask usage in X86ISelLowering.cpp

llvm-svn: 344933
2018-10-22 18:35:13 +00:00
Simon Pilgrim 3521367ff3 [X86][SSE] getTargetShuffleMaskIndices - allow opt-in support for whole undef shuffle mask elements
Enable this for PSHUFB constant mask decoding and remove the ConstantPool DecodePSHUFBMask

llvm-svn: 344931
2018-10-22 18:09:02 +00:00
Simon Pilgrim 5dff767c25 [X86] getTargetConstantBitsFromNode - handle extraction from larger constant pool entries
First step towards removing X86ShuffleDecodeConstantPool usage from X86ISelLowering.cpp

llvm-svn: 344924
2018-10-22 17:43:33 +00:00
Craig Topper 8d8dcfe690 Revert r344877 "[X86] Stop promoting integer loads to vXi64"
Sam McCall reported miscompiles in some tensorflow code. Reverting while I try to figure it out.

llvm-svn: 344921
2018-10-22 16:59:24 +00:00
Simon Pilgrim 6f5cd7c67f [X86][SSE] getTargetShuffleMask - pull out repeated shuffle mask element size. NFCI.
llvm-svn: 344910
2018-10-22 15:33:30 +00:00
Roman Lebedev 898808504d [X86] X86DAGToDAGISel: handle BZHI selection too, not just BEXTR.
Summary:
As discussed in D52304 / IRC, we now have pattern matching for
'bit extract' in two places - tablegen and `X86DAGToDAGISel`.
There are 4 patterns.
And we will have a problem with `x &  (-1 >> (32 - y))` pattern.
* If the mask is one-use, then it is always unfolded into `x << (32 - y) >> (32 - y)` first.
  Thus, the existing test coverage is already broken.
* If it is not one-use, then it is not unfolded, and is matched as BZHI.
* If it is not one-use, we will not match it as BEXTR. And if it is one-use, it will have been unfolded already.
So we will either not handle that pattern for BEXTR, or not have test coverage for it.
This is bad.
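
For reference, three of the low-bit extraction forms named in these commits can be written as scalar functions and compared directly. This is a sketch for 32-bit values; the function names are hypothetical, and the asserts mark the ranges in which each C++ shift is well defined.

```cpp
#include <cassert>
#include <cstdint>

// x & ((1 << nbits) + (-1)): mask built by add; valid for nbits < 32.
inline uint32_t lowBitsMaskAdd(uint32_t x, unsigned nbits) {
  assert(nbits < 32);
  return x & ((uint32_t{1} << nbits) + UINT32_MAX);  // + (-1), wraps
}

// x & (-1 >> (32 - y)): mask built by shifting all-ones right.
inline uint32_t lowBitsMaskShr(uint32_t x, unsigned y) {
  assert(y >= 1 && y <= 32);
  return x & (UINT32_MAX >> (32 - y));
}

// x << (32 - y) >> (32 - y): the "unfolded" form with no explicit mask.
inline uint32_t lowBitsShlShr(uint32_t x, unsigned y) {
  assert(y >= 1 && y <= 32);
  return (x << (32 - y)) >> (32 - y);
}
```

For `x = 0xDEADBEEF` and a width of 8, all three return `0xEF`; they agree for every width from 1 to 31, which is what lets a single matcher treat them as the same bit-extract operation.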

As discussed with @craig.topper, let's unify this matching, and do everything in `X86DAGToDAGISel`.
Then we will not have code duplication, and will have proper test coverage.

This indeed does not affect any tests, and this is great.
It means that for these two patterns, the `X86DAGToDAGISel` is identical to the tablegen version.

Please review carefully, I'm not fully sure about that intrinsic change, and introduction of the new `X86ISD` opcode.

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: craig.topper

Subscribers: llvm-commits, craig.topper

Differential Revision: https://reviews.llvm.org/D53164

llvm-svn: 344904
2018-10-22 14:12:44 +00:00
Roman Lebedev 13c5ab2e27 [X86][BMI1]: X86DAGToDAGISel: select BEXTR from x & ((1 << nbits) + (-1)) pattern
Summary:
Trivial continuation of D52304.
While this pattern is not canonical, we do select it in the BZHI case,
so this should not be any different.

Reviewers: RKSimon, craig.topper, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D52348

llvm-svn: 344902
2018-10-22 13:54:17 +00:00
Craig Topper 290c081d91 [X86] Add patterns for vector and/or/xor/andn with other types than vXi64.
This makes fast isel treat all legal vector types the same way. Previously only vXi64 was in the fast-isel tables.

This unfortunately prevents matching of andn by fast-isel for these types since that requires SelectionDAG. But we already had this issue for vXi64. So at least we're consistent now.

Interestingly, it looks like fast-isel can't handle instructions with constant vector arguments, so the not part of the andn patterns is selected with SelectionDAG. This explains why VPTERNLOG shows up in some of the tests.

This is a subset of D53268. As I make progress on that, I will try to reduce the number of lines in the tablegen files.

llvm-svn: 344884
2018-10-22 06:30:22 +00:00
Craig Topper 321df5b0d4 [X86] Stop promoting integer loads to vXi64
Summary:
Theoretically this was done to simplify the amount of isel patterns that were needed. But it also meant a substantial number of our isel patterns have to match an explicit bitcast. By making the vXi32/vXi16/vXi8 types legal for loads, DAG combiner should be able to change the load type to remove the bitcast.

I had to add some additional plain load instruction patterns and a few other special cases, but overall the isel table has reduced in size by ~12000 bytes. So it looks like this promotion was hurting us more than helping.

I still have one crash in vector-trunc.ll that I'm hoping @RKSimon can help with. It seems to relate to using getTargetConstantFromNode on a load that was shrunk due to an extract_subvector combine after the constant pool entry was created. So we end up decoding more mask elements than the load size.

I'm hoping this patch will simplify the number of patterns needed to remove the and/or/xor promotion.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits, RKSimon

Differential Revision: https://reviews.llvm.org/D53306

llvm-svn: 344877
2018-10-21 21:30:26 +00:00
Craig Topper 8de07b4db1 Revert r344873 "foo"
Rebase gone wrong left this in my tree.

llvm-svn: 344875
2018-10-21 21:08:37 +00:00