Commit Graph

19229 Commits

Author SHA1 Message Date
Craig Topper 120cffccf8 [X86] Manually reimplement getTargetInsertSubreg in X86DAGToDAGISel::matchBitExtract so we can call insertDAGNode on the target constant.
This is needed to maintain the topological sort order.

Fixes PR42992.

llvm-svn: 369084
2019-08-16 04:47:44 +00:00
Daniel Sanders 0c47611131 Apply llvm-prefer-register-over-unsigned from clang-tidy to LLVM
Summary:
This clang-tidy check is looking for unsigned integer variables whose initializer
starts with an implicit cast from llvm::Register and changes the type of the
variable to llvm::Register (dropping the llvm:: where possible).

Partial reverts in:
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
X86FixupLEAs.cpp - Some functions return unsigned and arguably should be MCRegister
X86FrameLowering.cpp - Some functions return unsigned and arguably should be MCRegister
HexagonBitSimplify.cpp - Function takes BitTracker::RegisterRef which appears to be unsigned&
MachineVerifier.cpp - Ambiguous operator==() given MCRegister and const Register
PPCFastISel.cpp - No Register::operator-=()
PeepholeOptimizer.cpp - TargetInstrInfo::optimizeLoadInstr() takes an unsigned&
MachineTraceMetrics.cpp - MachineTraceMetrics lacks a suitable constructor

Manual fixups in:
ARMFastISel.cpp - ARMEmitLoad() now takes a Register& instead of unsigned&
HexagonSplitDouble.cpp - Ternary operator was ambiguous between unsigned/Register
HexagonConstExtenders.cpp - Has a local class named Register, used llvm::Register instead of Register.
PPCFastISel.cpp - PPCEmitLoad() now takes a Register& instead of unsigned&

Depends on D65919

Reviewers: arsenm, bogner, craig.topper, RKSimon

Reviewed By: arsenm

Subscribers: RKSimon, craig.topper, lenary, aemerson, wuzish, jholewinski, MatzeB, qcolombet, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, wdng, nhaehnle, sbc100, jgravelle-google, kristof.beyls, hiraditya, aheejin, kbarton, fedor.sergeev, javed.absar, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, tpr, PkmX, jocewei, jsji, Petar.Avramovic, asbirlea, Jim, s.egerton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65962

llvm-svn: 369041
2019-08-15 19:22:08 +00:00
Craig Topper 2a372ba534 [X86] Add custom type legalization for bitcasting mmx to v2i32/v4i16/v8i8 to use movq2dq instead of going through memory.
llvm-svn: 369031
2019-08-15 18:23:37 +00:00
Craig Topper 6eebd2bcd7 [X86] Improve cost model for subvector extraction of less than 128-bit vectors
Now that we're using widening legalization. We need to improve our extract_subvector cost model for these types. This patch begins by modeling these as a subvector extract followed by a permute. I've left FIXMEs in the code for future improvements.

Differential Revision: https://reviews.llvm.org/D65892

llvm-svn: 369022
2019-08-15 17:29:42 +00:00
Jonas Devlieghere 0eaee545ee [llvm] Migrate llvm::make_unique to std::make_unique
Now that we've moved to C++14, we no longer need the llvm::make_unique
implementation from STLExtras.h. This patch is a mechanical replacement
of (hopefully) all the llvm::make_unique instances across the monorepo.

llvm-svn: 369013
2019-08-15 15:54:37 +00:00
Sanjay Patel 57d459309d [SDAG][x86] check for relaxed math when matching an FP reduction
If the last step in an FP add reduction allows reassociation and doesn't care
about -0.0, then we are free to recognize that computation as a reduction
that may reorder the intermediate steps.

This is requested directly by PR42705:
https://bugs.llvm.org/show_bug.cgi?id=42705
and solves PR42947 (if horizontal math instructions are actually faster than
the alternative):
https://bugs.llvm.org/show_bug.cgi?id=42947

Differential Revision: https://reviews.llvm.org/D66236

llvm-svn: 368995
2019-08-15 12:43:15 +00:00
Craig Topper 1e246b20c0 [X86] Add isel pattern to match VZEXT_MOVL and a v2i64 scalar_to_vector bitcasted from x86mmx to MOVQ2DQ.
We already had the pattern for just the scalar to vector and bitcast,
but not the case where we wanted zeroes in the high half of the xmm.

llvm-svn: 368972
2019-08-15 06:46:30 +00:00
Craig Topper e6409602a1 [X86] Make sure load is non-volatile in the MMX_X86movdq2q (loadv2i64) isel pattern.
This pattern will narrow the load so we should make sure its not
volatile.

llvm-svn: 368971
2019-08-15 06:46:26 +00:00
Craig Topper dbcbbf5658 [X86] Remove unneeded isel pattern for v4f32->v4i32 fp_to_sint and conversion to MMX.
fp_to_sint is turned into X86cvttp2si during isel preprocessing.
The other redundant isel patterns were removed previously, but I
missed this one because its in the MMX td file.

llvm-svn: 368968
2019-08-15 05:52:02 +00:00
Craig Topper a57734ba4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->i64.
The default legalization can take care of this.

llvm-svn: 368967
2019-08-15 05:51:58 +00:00
Craig Topper 57286afe4e [X86] Disable custom type legalization for v2i32/v4i16/v8i8->f64 bitcast.
The generic legalization handles this in the same way so just use
that.

llvm-svn: 368966
2019-08-15 05:51:54 +00:00
Craig Topper ba39fcd8c6 [X86] Remove some unreachable code from LowerBITCAST.
llvm-svn: 368965
2019-08-15 05:51:50 +00:00
Craig Topper 14f7560020 [X86] Remove some dead code and combine some repeated code that's left.
If the width is 256 bits, then we must have AVX so the else here
was unnecessary. Once that's removed then the >= 256 bit code is
identical to the 128 bit code with a different VT so combine them.

llvm-svn: 368956
2019-08-15 04:07:43 +00:00
Craig Topper 3e44d96170 [X86] Use PSADBW for v8i8 addition reductions.
Improves the 8 byte case from PR42674.

Differential Revision: https://reviews.llvm.org/D66069

llvm-svn: 368864
2019-08-14 15:57:29 +00:00
Craig Topper 30d3e9c395 [X86][CostModel] Adjust the costs of ZERO_EXTEND/SIGN_EXTEND with less than 128-bit inputs
Now that we legalize by widening, the element types here won't change. Previously these were modeled as the elements being widened and then the instruction might become an AND or SHL/ASHR pair. But now they'll become something like a ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG.

For AVX2, when the destination type is legal its clear the cost should be 1 since we have extend instructions that can produce 256 bit vectors from less than 128 bit vectors. I'm a little less sure about AVX1 costs, but I think the ones I changed were definitely too high, but they might still be too high.

Differential Revision: https://reviews.llvm.org/D66169

llvm-svn: 368858
2019-08-14 14:52:39 +00:00
Craig Topper 8c545168ee [X86] Add llvm_unreachable to a switch that covers all expected values.
llvm-svn: 368857
2019-08-14 14:51:19 +00:00
Simon Pilgrim e7b350a5d1 [X86] XFormVExtractWithShuffleIntoLoad - handle shuffle mask scaling
If the target shuffle mask is from a wider type, attempt to scale the mask so that the extraction can attempt to peek through.

Fixes the regression mentioned in rL368662

Reapplying this as rL368308 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368663
2019-08-13 11:11:42 +00:00
Simon Pilgrim 1a8d790cf5 [X86] SimplifyDemandedVectorElts - attempt to recombine target shuffle using DemandedElts mask (reapplied)
If we don't demand all elements, then attempt to combine to a simpler shuffle.

At the moment we can only do this if Depth == 0 as combineX86ShufflesRecursively uses Depth to track whether the shuffle has really changed or not - we'll need to change this before we can properly start merging combineX86ShufflesRecursively into SimplifyDemandedVectorElts. 

The insertps-combine.ll regression is because XFormVExtractWithShuffleIntoLoad can't see through shuffles of different widths - this will be fixed in a follow-up commit.

Reapplying this as rL368307 had to be reverted as part of rL368660 to revert rL368276

llvm-svn: 368662
2019-08-13 10:51:39 +00:00
Hans Wennborg 5390d25f2b Revert r368276 "[TargetLowering] SimplifyDemandedBits - call SimplifyMultipleUseDemandedBits for ISD::EXTRACT_VECTOR_ELT"
This introduced a false positive MemorySanitizer warning about use of
uninitialized memory in a vectorized crc function in Chromium. That suggests
maybe something is not right with this transformation. See
https://crbug.com/992853#c7 for a reproducer.

This also reverts the follow-up commits r368307 and r368308 which
depended on this.

> This patch attempts to peek through vectors based on the demanded bits/elt of a particular ISD::EXTRACT_VECTOR_ELT node, allowing us to avoid dependencies on ops that have no impact on the extract.
>
> In particular this helps remove some unnecessary scalar->vector->scalar patterns.
>
> The wasm shift patterns are annoying - @tlively has indicated that the wasm vector shift codegen are to be refactored in the near-term and isn't considered a major issue.
>
> Differential Revision: https://reviews.llvm.org/D65887

llvm-svn: 368660
2019-08-13 09:33:25 +00:00
Amara Emerson e14c91b71a [GlobalISel] Make the InstructionSelector instance non-const, allowing state to be maintained.
Currently we can't keep any state in the selector object that we get from
subtarget. As a result we have to plumb through all our variables through
multiple functions. This change makes it non-const and adds a virtual init()
method to allow further state to be captured for each target.

AArch64 makes use of this in this patch to cache a call to hasFnAttribute()
which is expensive to call, and is used on each selection of G_BRCOND.

Differential Revision: https://reviews.llvm.org/D65984

llvm-svn: 368652
2019-08-13 06:26:59 +00:00
Reid Kleckner e9865b9b31 [WinEH] Fix catch block parent frame pointer offset
r367088 made it so that funclets store XMM registers into their local
frame instead of storing them to the parent frame. However, that change
forgot to update the parent frame pointer offset for catch blocks. This
change does that.

Fixes crashes when an exception is rethrown in a catch block that saves
XMMs, as described in https://crbug.com/992860.

llvm-svn: 368631
2019-08-12 23:02:00 +00:00
Craig Topper e07e593782 [X86] Allow combineTruncateWithSat to use pack instructions for i16->i8 without AVX512BW.
We need AVX512BW to be able to truncate an i16 vector. If we don't
have that we have to extend i16->i32, then trunc, i32->i8. But we
won't be able to remove the min/max if we do that. At least not
without more special handling.

llvm-svn: 368623
2019-08-12 22:18:23 +00:00
Craig Topper 0761a38e8a [X86] Remove unreachable code from LowerTRUNCATE. NFC
All three 256->128 bit cases were already handled above.

Noticed while looking at the coverage report.

llvm-svn: 368609
2019-08-12 19:26:45 +00:00
Craig Topper a3605baaff [X86] Add a paranoia type check to the code that detects AVG patterns from truncating stores.
If we're after type legalize, we should make sure we won't create
a store with an illegal type when we separate the AVG pattern
from the truncating store.

I don't know of a way to fail for this today. Just noticed while
I was in the vicinity.

llvm-svn: 368608
2019-08-12 19:26:37 +00:00
Craig Topper 1b02909847 [X86] Simplify creation of saturating truncating stores.
We just need to check if the truncating store is legal
instead of going through isSATValidOnAVX512Subtarget.

llvm-svn: 368607
2019-08-12 19:26:30 +00:00
Craig Topper 3f4e9b156d [X86] Replace call to isTruncStoreLegalOrCustom with isTruncStoreLegal. NFC
We have no custom trunc stores on X86.

llvm-svn: 368606
2019-08-12 19:26:22 +00:00
Craig Topper 09d5d15339 [X86] Disable use of zmm registers for varargs musttail calls under prefer-vector-width=256 and min-legal-vector-width=256.
Under this config, the v16f32 type we try to use isn't to a register
class so the getRegClassFor call will fail.

llvm-svn: 368594
2019-08-12 17:43:26 +00:00
Simon Pilgrim 182249daee [X86][SSE] ComputeKnownBits - add basic PSADBW handling
llvm-svn: 368558
2019-08-12 12:19:19 +00:00
Pengfei Wang e28cbbd5d4 [X86] Support -march=tigerlake
Support -march=tigerlake for x86.
Compare with Icelake Client, It include 4 more new features ,they are
avx512vp2intersect, movdiri, movdir64b, shstk.

Patch by Xiang Zhang (xiangzhangllvm)

Differential Revision: https://reviews.llvm.org/D65840

llvm-svn: 368543
2019-08-12 01:29:46 +00:00
Craig Topper ce6a2cf966 [X86] Simplify some of the type checks in combineSubToSubus.
If we have SSE2 we can handle any i8/i16 type and let
type legalization deal with it.

llvm-svn: 368538
2019-08-11 17:36:49 +00:00
Craig Topper 637964bfd8 [X86] Don't use SplitOpsAndApply for ISD::USUBSAT.
Target independent type legalization and custom lowering
should be able to handle it.

llvm-svn: 368537
2019-08-11 17:36:45 +00:00
Craig Topper 9758e0e1bf [X86] Remove some more code from combineShuffle that is no longer needed with widening legalization.
llvm-svn: 368523
2019-08-11 02:17:18 +00:00
Craig Topper 0f74b82ef1 [X86] Remove some code from combineShuffle that seems largely unnecessary with widening legalization.
The test case that changed is probably better served through
allowing combineTruncatedArithmetic to create narrow vectors. It
also appears InstCombine would have simplified this test case
to remove the zext and trunc anyway.

llvm-svn: 368522
2019-08-11 02:08:38 +00:00
Simon Pilgrim ec128709f0 [X86][SSE] Lower shuffle as ANY_EXTEND_VECTOR_INREG
On SSE41+ targets we always lower vector shuffles to ZERO_EXTEND_VECTOR_INREG, even if we don't need the extended bits.

This patch relaxes this so that we lower to ANY_EXTEND_VECTOR_INREG if we can, meaning that shuffle combines have a better idea of what elements need to be kept zero. This helps the multiple reduction code as we can now combine away a lot more of the pack+extend codes.

Differential Revision: https://reviews.llvm.org/D65741

llvm-svn: 368515
2019-08-10 16:46:07 +00:00
Craig Topper 74c43a2277 [X86] Match the IR pattern form movmsk on SSE1 only targets where v4i32 isn't legal
Summary:
This patch adds a special DAG combine for SSE1 to recognize the IR pattern InstCombine gives us for movmsk. This only does the recognition for a few cases where its obvious the input won't be scalarized resulting in building a vector just do to the movmsk. I've made it separate from our existing matching for movmsk since that's called in multiple places and I didn't spend time to see if the other callers would make sense here. Plus the restrictions and additional checks would complicate that.

This fixes the case from PR42870. Buts its probably still broken the presence of logic ops feeding the movmsk pattern which would further hide the v4f32 type.

Reviewers: spatel, RKSimon, xbolva00

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65689

llvm-svn: 368506
2019-08-10 07:51:13 +00:00
Craig Topper a8e5e73711 [X86] Improve the diagnostic for larger than 4-bit immediate for vpermil2pd/ps. Only allow MCConstantExprs.
llvm-svn: 368505
2019-08-10 04:28:52 +00:00
Luo, Yuanke c6c86f4f81 [X86] Fix stack probe issue on windows32.
Summary:
On windows if the frame size exceed 4096 bytes, compiler need to
generate a call to _alloca_probe. X86CallFrameOptimization pass
changes the reserved stack size and cause of stack probe function
not be inserted. This patch fix the issue by detecting the call
frame size, if the size exceed 4096 bytes, drop X86CallFrameOptimization.

Reviewers: craig.topper, wxiao3, annita.zhang, rnk, RKSimon

Reviewed By: rnk

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D65923

llvm-svn: 368503
2019-08-10 02:49:02 +00:00
Daniel Sanders e9a57c2b23 [globalisel] Add G_SEXT_INREG
Summary:
Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.
  %1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
would reasonably combine to:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 25
which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:
  %2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
  %3:_(s32) = G_ASHR %2:_(s32), i32 1
It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 24
  %3:_(s32) = G_MUL %2:_(s32), i32 2
can become:
  %1:_(s32) = G_SHL %0:_(s32), i32 24
  %2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:
  %1 = G_CONSTANT 16
  %2 = G_SEXT_INREG %0, %1
is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:
  %2 = G_SEXT_INREG %0, 16
or:
  %1 = G_CONSTANT 16
  %2:_(s32) = G_SHL %0, %1
  %3:_(s32) = G_ASHR %2, %1
depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Reviewers: bogner, aditya_nandakumar, volkan, aemerson, paquette, arsenm

Subscribers: sdardis, jvesely, wdng, nhaehnle, rovka, kristof.beyls, javed.absar, hiraditya, jrtc27, atanasyan, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61289

llvm-svn: 368487
2019-08-09 21:11:20 +00:00
Eric Christopher db2f17d362 Remove variable only used in an assert.
llvm-svn: 368486
2019-08-09 21:02:47 +00:00
Craig Topper 6cb05ca044 [X86] Remove custom handling for extloads from LowerLoad.
We don't appear to need this with widening legalization.

llvm-svn: 368479
2019-08-09 20:27:22 +00:00
Simon Pilgrim 60394f47b0 [X86][SSE] Swap X86ISD::BLENDV inputs with an inverted selection mask (PR42825)
As discussed on PR42825, if we are inverting the selection mask we can just swap the inputs and avoid the inversion.

Differential Revision: https://reviews.llvm.org/D65522

llvm-svn: 368438
2019-08-09 12:44:20 +00:00
Tim Northover e1a5f668b3 GlobalISel: pack various parameters for lowerCall into a struct.
I've now needed to add an extra parameter to this call twice recently. Not only
is the signature getting extremely unwieldy, but just updating all of the
callsites and implementations is a pain. Putting the parameters in a struct
sidesteps both issues.

llvm-svn: 368408
2019-08-09 08:26:38 +00:00
Craig Topper 6179175551 [X86] Remove code that expands truncating stores from combineStore.
We shouldn't form trunc stores that need to be expanded now that
we are using widening legalization.

llvm-svn: 368400
2019-08-09 06:59:53 +00:00
Craig Topper 7e33f11ba7 [X86] Remove stale FIXME from combineMaskedStore. NFC
I believe PR34584 was tracking that FIXME, but its since been
closed and a test case was added.

llvm-svn: 368397
2019-08-09 05:55:41 +00:00
Craig Topper 8c5c09780d [X86] Remove DAG combine expansion of extending masked load and truncating masked store.
The only way to generate these was through promoting legalization
of narrow vectors, but we widen those types now. So we shouldn't
produce these nodes.

llvm-svn: 368396
2019-08-09 05:53:37 +00:00
Craig Topper 509c8774fa [X86] Remove handler for (U/S)(ADD/SUB)SAT from ReplaceNodeResults. Remove TypeWidenVector check from code that handles X86ISD::VPMADDWD and X86ISD::AVG.
More unneeded code since we now legalize narrow vectors by widening.

llvm-svn: 368395
2019-08-09 05:17:52 +00:00
Craig Topper 824961824f [X86] Remove ISD::SETCC handling from ReplaceNodeResults.
This is no longer needed since we widen v2i32 instead of promoting.

llvm-svn: 368394
2019-08-09 05:17:48 +00:00
Craig Topper ef5b435b00 [X86] Simplify ISD::LOAD handling in ReplaceNodeResults and ISD::STORE handling in LowerStore now that v2i32 is widened to v4i32.
llvm-svn: 368390
2019-08-09 03:09:43 +00:00
Craig Topper 0da681a2be [X86] Merge v2f32 and v2i32 gather/scatter handling in ReplaceNodeResults/LowerMSCATTER now that v2i32 is also widened like v2f32.
llvm-svn: 368389
2019-08-09 03:09:28 +00:00
Craig Topper 6f81db0f68 [X86] Now unreachable handling for f64->v2i32/v4i16/v8i8 bitcasts from ReplaceNodeResults.
We rely on the generic type legalizer for this now.

llvm-svn: 368388
2019-08-09 03:09:19 +00:00