Commit Graph

13181 Commits

Author SHA1 Message Date
Sanjay Patel fd58d623ff [x86] split tests for FP and integer horizontal math
These are similar patterns, but when you throw AVX512 onto the pile,
the number of variations explodes. For FP, we really don't care about
AVX1 vs. AVX2 for FP ops. There may be some superficial shuffle diffs,
but that's not what we're testing for here, so I removed those RUNs.

Separating by type also lets us specify 'sse3' for the FP file vs. 'ssse3'
for the integer file...because x86.

llvm-svn: 350357
2019-01-03 22:26:51 +00:00
Sanjay Patel 8db27b31ac [x86] add common FileCheck prefix to reduce assert duplication; NFC
llvm-svn: 350356
2019-01-03 22:11:14 +00:00
Sanjay Patel 9633d76a40 [DAGCombiner][x86] scalarize binop followed by extractelement
As noted in PR39973 and D55558:
https://bugs.llvm.org/show_bug.cgi?id=39973
...this is a partial implementation of a fold that we do as an IR canonicalization in instcombine:

// extelt (binop X, Y), Index --> binop (extelt X, Index), (extelt Y, Index)

We want to have this in the DAG too because as we can see in some of the test diffs (reductions), 
the pattern may not be visible in IR.

Given that this is already an IR canonicalization, any backend that would prefer a vector op over 
a scalar op is expected to already have the reverse transform in DAG lowering (not sure if that's
a realistic expectation though). The transform is limited with a TLI hook because there's an
existing transform in CodeGenPrepare that tries to do the opposite transform.

Differential Revision: https://reviews.llvm.org/D55722

llvm-svn: 350354
2019-01-03 21:31:16 +00:00
Sanjay Patel 4e71ff234e [x86] add tests for buildvector with extracted element; NFC
llvm-svn: 350338
2019-01-03 17:55:32 +00:00
Simon Pilgrim 44d6b25d2c [X86] Cleanup saturated add/sub tests
Use X86/X64 check prefixes
Use nounwind to reduce cfi noise

llvm-svn: 350301
2019-01-03 12:31:13 +00:00
Markus Lavin 72b9deb21f [CodeGen] Skip over dbg-instr in twoaddr pass
A DBG_VALUE between a two-address instruction and a following COPY
would prevent rescheduleMIBelowKill optimization inside
TwoAddressInstructionPass.

Differential Revision: https://reviews.llvm.org/D55987

llvm-svn: 350289
2019-01-03 08:36:06 +00:00
Craig Topper 5ef47ad82e [X86] Add test cases for opportunities to use KTEST when check if the result of ANDing two mask registers is zero.
The test cases are constructed to avoid folding the AND into a masked compare operation.

Currently we emit a KAND and a KORTEST for these cases.

llvm-svn: 350287
2019-01-03 07:12:54 +00:00
Craig Topper df5304d8de [X86] Add load folding support to the custom isel we do for X86ISD::UMUL/SMUL.
The peephole pass isn't always able to fold the load because it can't commute the implicit usage of AL/AX/EAX/RAX.

llvm-svn: 350272
2019-01-02 23:24:08 +00:00
Craig Topper ce46bfa848 [X86] Add test cases to show that we fail to fold loads into i8 smulo and i8/i16/i32/i64 umulo lowering without the assistance of the peephole pass. NFC
llvm-svn: 350271
2019-01-02 23:24:03 +00:00
Craig Topper 9d4860ec4e [X86] Remove X86ISD::INC/DEC. Just select them from X86ISD::ADD/SUB at isel time
INC/DEC are pretty much the same as ADD/SUB except that they don't update the C flag.

This patch removes the special nodes and just pattern matches from ADD/SUB during isel if the C flag isn't being used.

I had to avoid selecting DEC is the result isn't used. This will become a SUB immediate which will turned into a CMP later by optimizeCompareInstr. This lead to the one test change where we use a CMP instead of a DEC for an overflow intrinsic since we only checked the flag.

This also exposed a hole in our RMW flag matching use of hasNoCarryFlagUses. Our root node for the match is a store and there's no guarantee that all the flag users have been selected yet. So hasNoCarryFlagUses needs to check copyToReg and machine opcodes, but it also needs to check for the pre-match SETCC, SETCC_CARRY, BRCOND, and CMOV opcodes.

Differential Revision: https://reviews.llvm.org/D55975

llvm-svn: 350245
2019-01-02 19:01:05 +00:00
Craig Topper 8dd7bd2cd7 [DAGCombiner] After performing the division by constant optimization for a DIV or REM node, replace the users of the corresponding REM or DIV node if it exists.
Currently we expand the two nodes separately. This gives DAG combiner an opportunity to optimize the expanded sequence taking into account only one set of users. When we expand the other node we'll create the expansion again, but might not be able to optimize it the same way. So the nodes won't CSE and we'll have two similarish sequences in the same basic block. By expanding both nodes at the same time we'll avoid prematurely optimizing the expansion until both the division and remainder have been replaced.

Improves the test case from PR38217. There may be additional opportunities after this.

Differential Revision: https://reviews.llvm.org/D56145

llvm-svn: 350239
2019-01-02 18:19:07 +00:00
Craig Topper 3109f3a4ab [LegalizeIntegerTypes] When promoting the result of an extract_vector_elt also promote the input type if necessary
By also promoting the input type we get a better idea for what scalar type to use. This can provide better results if the result of the extract is sign extended. What was previously happening is that the extract result would be legalized, sometime later the input of the sign extend would be legalized using the result of the extract. Then later the extract input would be legalized forcing a truncate into the input of the sign extend using a replace all uses. This requires DAG combine to combine out the sext/truncate pair. But sometimes we visited the truncate first and messed things up before the sext could be combined.

By creating the extract with the correct scalar type when we create legalize the result type, the truncate will be added right away. Then when the sign_extend input is legalized it will create an any_extend of the truncate which can be optimized by getNode to maybe remove the truncate. And then a sign_extend_inreg. Now DAG combine doesn't have to worry about getting rid of the extend.

This fixes the regression on X86 in D56156.

Differential Revision: https://reviews.llvm.org/D56176

llvm-svn: 350236
2019-01-02 17:58:30 +00:00
Craig Topper c562fae02b [DAGCombiner][X86][PowerPC] Teach visitSIGN_EXTEND_INREG to fold (sext_in_reg (aext/sext x)) -> (sext x) when x has more than 1 sign bit and the sext_inreg is from one of them.
If x has multiple sign bits than it doesn't matter which one we extend from so we can sext from x's msb instead.

The X86 setcc-combine.ll changes are a little weird. It appears we ended up with a (sext_inreg (aext (trunc (extractelt)))) after type legalization. The sext_inreg+aext now gets optimized by this combine to leave (sext (trunc (extractelt))). Then we visit the trunc before we visit the sext. This ends up changing the truncate to an extractvectorelt from a bitcasted vector. I have a follow up patch to fix this.

Differential Revision: https://reviews.llvm.org/D56156

llvm-svn: 350235
2019-01-02 17:58:27 +00:00
Simon Pilgrim d8125726d5 [X86] Support SHLD/SHRD masked shift-counts (PR34641)
Peek through shift modulo masks while matching double shift patterns.

I was hoping to delay this until I could remove the X86 code with generic funnel shift matching (PR40081) but this will do for now.

Differential Revision: https://reviews.llvm.org/D56199

llvm-svn: 350222
2019-01-02 17:05:37 +00:00
Sanjay Patel eafd481aad [x86] add more tests for potential horizontal ops; NFC
As discussed in D56011 - add runs for AVX512 and tests with extra uses.

llvm-svn: 350221
2019-01-02 16:36:04 +00:00
Craig Topper 8969720787 [X86] Add i8/i16 smulo/umulo test cases where the overflow indication is used by a mask.
llvm-svn: 350204
2019-01-02 05:46:02 +00:00
Craig Topper 6f2feb8293 [X86] Remove KNL specific check prefix from xmulo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.

llvm-svn: 350203
2019-01-02 05:46:00 +00:00
Craig Topper 00b390a000 [X86] Factor the core code out of LowerXALUO into a helper function. Use it in LowerBRCOND and LowerSELECT to avoid some duplicated code.
This makes it easier to keep the LowerBRCOND and LowerSELECT code in sync with LowerXALUO so they always pick the same operation for overflowing instructions.

This is inspired by the helper functions used by ARM and AArch64 for the same purpose.

The test change is because LowerSELECT was not in sync with LowerXALUO with regard to INC/DEC for SADDO/SSUBO.

llvm-svn: 350198
2019-01-01 19:34:11 +00:00
Craig Topper a728214203 [X86] Remove KNL specific check prefix from xaluo.ll test. NFC
This was added at a time when i1 was a legal type with avx512f and there was a bug. i1 is no longer considered a legal type with avx512f so there should be no codegen difference.

llvm-svn: 350195
2019-01-01 18:44:44 +00:00
Craig Topper 9478492a80 [X86] Add test cases to show where LowerSELECT doesn't select SADDO/SSUBO to INC/DEC, but LowerXALUOOp does. Leading to duplicate code.
When SADDO/SSUBO is used as a part of a condition, the X86 backend has to lower the instruction twice. One for the flags use and then once for the data use. These two selections should be kept in sync so they end up with one node providing the data and the flags. This doesn't seem to be happening for INC/DEC.

llvm-svn: 350194
2019-01-01 18:44:42 +00:00
Simon Pilgrim 8b503c795e [X86] Add PR34641 masked shld/shrd test cases
llvm-svn: 350181
2018-12-31 19:46:18 +00:00
Craig Topper c25f1f8f17 [X86] Add additional RUN lines to prepare for D56156. NFC
llvm-svn: 350180
2018-12-31 19:09:32 +00:00
Craig Topper ed3ffae4a4 [SelectionDAG] Add SIGN_EXTEND_VECTOR_INREG support to computeKnownBits.
Differential Revision: https://reviews.llvm.org/D56168

llvm-svn: 350179
2018-12-31 19:09:30 +00:00
Craig Topper bb0873cf46 [X86] Add X86ISD::VSRAI to computeKnownBitsForTargetNode.
Differential Revision: https://reviews.llvm.org/D56169

llvm-svn: 350178
2018-12-31 19:09:27 +00:00
Craig Topper a32e353afa [X86] Don't mark SEXTLOAD from v4i8/v4i16/v8i8 as Custom on pre-sse4.1.
This seems to be getting in the way more than its helping. This does mean we stop scalarizing some cases, but I'm not convinced the scalarization was really better.

Some of the changes to vsel-cmp-load.ll are a regression but D56156 should fix it.

llvm-svn: 350159
2018-12-30 03:05:07 +00:00
Craig Topper f237ce159e [X86] Add custom type legalization for SIGN_EXTEND_VECTOR_INREG from 16i16/v32i8 to v4i64 when v4i64 needs splitting.
This allows us to sign extend to v4i32 first. And then share that extension to implement the final steps to v4i64 using a pcmpgt and punpckl and punpckh.

We already do something similar for SIGN_EXTEND with -x86-experimental-vector-widening-legalization.

llvm-svn: 350158
2018-12-30 02:30:34 +00:00
Craig Topper 7bb1d50455 [X86] Add test case from PR38217. NFC
llvm-svn: 350150
2018-12-29 07:14:30 +00:00
Craig Topper 0a6cec6f9f [X86] Don't mark SEXTLOAD v4i8->v4i64 and v8i8->v8i64 as custom under vector widening legalization.
This was tricking us into making these operations and then letting them get scalarized later. But I can't prove that the scalarized version is actually better.

llvm-svn: 350141
2018-12-29 01:17:11 +00:00
Craig Topper f814d28eb3 [X86] Directly emit X86ISD::PMULUDQ from the ReplaceNodeResults handling of v2i8/v2i16/v2i32 multiply.
Previously we emitted a multiply and some masking that was supposed to matched to PMULUDQ, but the masking could sometimes be removed before we got a chance to match it. So instead just emit the PMULUDQ directly.

Remove the DAG combine that was added when the ReplaceNodeResults code was originally added. Add a new DAG combine to avoid regressions in shrink_vmul.ll

Some of the shrink_vmul.ll test cases now pick PMULUDQ instead of PMADDWD/PMULLD, but I think this should be an improvement on most CPUs.

I think all of this can go away if/when we switch to -x86-experimental-vector-widening-legalization

llvm-svn: 350134
2018-12-28 19:19:39 +00:00
Craig Topper 787ad92bf6 [X86] Remove check that avoids creating PMULDQ with illegal types. Rely on SplitOpsAndApply to legalize it.
Create PMULDQ/PMULUDQ as long as the number of elements is a power of 2.

This seems to give some improvements in our ability to use SimplifyDemandedBits.

llvm-svn: 350084
2018-12-27 03:37:04 +00:00
Craig Topper 0229da8f07 [X86] Use GetDemandedBits to simplify the operands of PMULDQ/PMULUDQ.
This is an alternative to what I attempted in D56057.

GetDemandedBits is a special version of SimplifyDemandedBits that allows simplifications even when the operand has other uses. GetDemandedBits will only do simplifications that allow a node to be bypassed. It won't create new nodes or alter any of the other users.

I had to add support for bypassing SIGN_EXTEND_INREG to GetDemandedBits.

Based on a patch that Simon Pilgrim sent me in email.

Fixes PR40142.

llvm-svn: 350059
2018-12-24 19:40:20 +00:00
Craig Topper 6356ad940b [X86] Add test cases for PR40142. NFC
llvm-svn: 350058
2018-12-24 19:40:17 +00:00
Craig Topper 5eb5e2bc89 [X86] Autogenerate complete checks. NFC
llvm-svn: 350039
2018-12-24 01:59:31 +00:00
Sanjay Patel 93f1074677 [DAGCombiner] limit shuffle to extend transform (PR40146)
It's dangerous to knowingly create an illegal vector type
no matter what stage of combining we're in.

This prevents the missed folding/scalarization seen in:
https://bugs.llvm.org/show_bug.cgi?id=40146

llvm-svn: 350034
2018-12-23 20:48:31 +00:00
Sanjay Patel 9e5588e1df [x86] add test for vector shuffle --> extend transform (PR40146); NFC
llvm-svn: 350033
2018-12-23 20:36:52 +00:00
Sanjay Patel 9933574ac3 [DAGCombiner] allow hoisting vector bitwise logic ahead of extends
llvm-svn: 350032
2018-12-23 19:58:16 +00:00
Sanjay Patel 8bc612f63b [x86] add tests for vector extend + logic ops; NFC
llvm-svn: 350031
2018-12-23 18:37:44 +00:00
Craig Topper 006bac6880 [X86] Return false from hasAndNotCompare if the comparision value is a constant.
We won't end up using an ANDN instruction in this case so we should generate the same code we do for pre-BMI targets.

llvm-svn: 350018
2018-12-23 05:52:55 +00:00
Craig Topper 3cc92a28ce [X86] Fix an old FIXME about folding the zero constant into the OR instruction we use for sequentially consistent fence in 32-bit mode without SSE2.
llvm-svn: 350013
2018-12-23 01:54:43 +00:00
Craig Topper dfb8a427ff [X86] Autogenerate complete checks. NFC
llvm-svn: 350012
2018-12-23 01:54:41 +00:00
Sanjay Patel 4b537aaf6d [DAGCombiner] allow narrowing of add followed by truncate
trunc (add X, C ) --> add (trunc X), C'

If we're throwing away the top bits of an 'add' instruction, do it in the narrow destination type.
This makes the truncate-able opcode list identical to the sibling transform done in IR (in instcombine).

This change used to show regressions for x86, but those are gone after D55494. 
This gets us closer to deleting the x86 custom function (combineTruncatedArithmetic) 
that does almost the same thing.

Differential Revision: https://reviews.llvm.org/D55866

llvm-svn: 350006
2018-12-22 17:10:31 +00:00
Sanjay Patel 52c02d70e2 [x86] add load fold patterns for movddup with vzext_load
The missed load folding noticed in D55898 is visible independent of that change 
either with an adjusted IR pattern to start or with AVX2/AVX512 (where the build 
vector becomes a broadcast first; movddup is not produced until we get into isel 
via tablegen patterns).

Differential Revision: https://reviews.llvm.org/D55936

llvm-svn: 350005
2018-12-22 16:59:02 +00:00
Roman Lebedev da1df56e5d NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. a/c/d) with trunc (PR36419)
llvm-svn: 350000
2018-12-22 10:38:05 +00:00
Roman Lebedev c90611db06 [NFC][CodeGen][X86][AArch64] Bit extract: add nounwind attr to drop .cfi noise
Forgot about that.

llvm-svn: 349999
2018-12-22 09:58:13 +00:00
Roman Lebedev 29d8af283a [NFC][CodeGen][X86][AArch64] Tests for bit extract (pat. b) with trunc (PR36419)
@bextr64_32_b1 is extracted from hotpath of real-world code
(RawSpeed BitStream<>::peekBitsNoFill()) after `clang -O3`.

@bextr64_32_b2/@bextr64_32_b0 is the same pattern,
but with trunc done last, showing how i think it can be handled:
https://rise4fun.com/Alive/K4B
https://rise4fun.com/Alive/qC9

It is possible that middle-end should do some of this, too.

https://bugs.llvm.org/show_bug.cgi?id=36419

llvm-svn: 349998
2018-12-22 09:40:14 +00:00
David Blaikie 9efb0153f0 llvm-dwarfdump: Remove extraneous space between '(' and 'indexed'
When dumping string or address indexes

llvm-svn: 349997
2018-12-22 08:43:08 +00:00
Craig Topper e58cd9cbc6 [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops.
This fixes the patterns that have or/and as a root. 'and' is handled differently since thy usually have a CMP wrapped around them.

I had to look for uses of the CF flag because all these nodes have non-standard CF flag behavior. A real or/xor would always clear CF. In practice we shouldn't be using the CF flag from these nodes as far as I know.

Differential Revision: https://reviews.llvm.org/D55813

llvm-svn: 349962
2018-12-21 21:42:43 +00:00
Craig Topper 62ec024d3b [X86] Don't allow optimizeCompareInstr to replace a CMP with BEXTR if the sign flag is used.
The BEXTR instruction documents the SF bit as undefined.

The TBM BEXTR instruction has the same issue, but I'm not sure how to test it. With the control being an immediate we can determine the sign bit is 0 or the BEXTR would have been removed.

Fixes PR40060

Differential Revision: https://reviews.llvm.org/D55807

llvm-svn: 349956
2018-12-21 21:16:26 +00:00
Sanjay Patel 80187b8a17 [x86] add movddup specialization for build vector lowering (PR37502)
This is admittedly a narrow fix for the problem:
https://bugs.llvm.org/show_bug.cgi?id=37502
...but as the XOP restriction shows, it's a maze to get this right. 
In the motivating example, note that we have movddup before SSE4.1 and 
again with AVX2. That's because insertps isn't available pre-SSE41 and 
vbroadcast is (more generally) available with AVX2 (and the splat is 
reduced to movddup via isel pattern).

Differential Revision: https://reviews.llvm.org/D55898

llvm-svn: 349937
2018-12-21 18:48:32 +00:00
Sanjay Patel a87fba4e92 [x86] remove excess check lines; NFC
Forgot that the integer variants have an extra 's'.

llvm-svn: 349929
2018-12-21 17:19:43 +00:00
Sanjay Patel 252660c1ff [x86] move misplaced tests; NFC
Mixed up integer and FP in rL349923.

llvm-svn: 349928
2018-12-21 17:06:43 +00:00
Sanjay Patel 41eebdefa7 [x86] add tests for possible horizontal op transform; NFC
llvm-svn: 349923
2018-12-21 16:49:41 +00:00
Sanjay Patel fef39ecd31 [x86] move test for movddup; NFC
This adds an AVX512 run as suggested in D55936.
The test didn't really belong with other build vector tests
because that's not the pattern here. I don't see much value 
in adding 64-bit RUNs because they wouldn't exercise the 
isel patterns that we're aiming to expose.

llvm-svn: 349920
2018-12-21 16:08:27 +00:00
Simon Pilgrim 5d403f6bf8 [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
This auto upgrades the signed SSE saturated math intrinsics to SADD_SAT/SSUB_SAT generic intrinsics.

Clang counterpart: https://reviews.llvm.org/D55890

Differential Revision: https://reviews.llvm.org/D55894

llvm-svn: 349892
2018-12-21 09:04:14 +00:00
Craig Topper 7b78137403 [X86] Autogenerate complete checks. NFC
llvm-svn: 349870
2018-12-21 01:27:33 +00:00
Simon Pilgrim 2a25360ae3 [X86] Auto upgrade XOP/AVX512 rotation intrinsics to generic funnel shift intrinsics (llvm)
This emits FSHL/FSHR generic intrinsics for the XOP VPROT and AVX512 VPROL/VPROR rotation intrinsics.

Clang counterpart: https://reviews.llvm.org/D55937

Differential Revision: https://reviews.llvm.org/D55938

llvm-svn: 349795
2018-12-20 19:01:07 +00:00
Sanjay Patel 18b008b577 [x86] add test to show missed movddup load fold; NFC
llvm-svn: 349773
2018-12-20 17:05:57 +00:00
Simon Pilgrim b208255fe0 [SelectionDAGBuilder] Enable funnel shift building to custom rotates
This patch enables funnel shift -> rotate building for all ROTL/ROTR custom/legal operations.

AFAICT X86 was the last target that was missing modulo support (PR38243), but I've tried to CC stakeholders for every target that has ROTL/ROTR custom handling for their final OK.

Differential Revision: https://reviews.llvm.org/D55747

llvm-svn: 349765
2018-12-20 14:56:44 +00:00
Simon Pilgrim 09c081176a [X86][AVX512] Don't custom lower v16i8 rotations.
As discussed on D55747, the expansion to (wider) shifts is better on all AVX512 cases, not just BWI.

llvm-svn: 349763
2018-12-20 14:38:35 +00:00
Clement Courbet 36a3480385 Re-land r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Update PPC ir following GEP->bitcat to bitcat->GEP->bitcat change.

llvm-svn: 349747
2018-12-20 13:01:04 +00:00
Simon Pilgrim 6bbf39b48c [X86][SSE] Auto upgrade PADDS/PSUBS intrinsics to SADD_SAT/SSUB_SAT generic intrinsics (llvm)
Pulled out of D55894 to match the clang changes in D55890.

Differential Revision: https://reviews.llvm.org/D55890

llvm-svn: 349744
2018-12-20 11:53:54 +00:00
Simon Pilgrim e85ad60ee0 [X86] Update PADDSW/PSUBSW intrinsic usage with generic saturated intrinsics.
As discussed on D55894, this makes no difference to the actual test.

llvm-svn: 349742
2018-12-20 11:14:56 +00:00
Clement Courbet e22cf4d7cb Revert r349731 "[CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads."
Forgot to update PowerPC tests for the GEP->bitcast change.

llvm-svn: 349733
2018-12-20 09:58:33 +00:00
Clement Courbet 1bb6e1b0f2 [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads.
Summary:
This allows expanding {7,11,13,14,15,21,22,23,25,26,27,28,29,30,31}-byte memcmp
in just two loads on X86. These were previously calling memcmp.

Reviewers: spatel, gchatelet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55263

llvm-svn: 349731
2018-12-20 09:13:47 +00:00
Sanjay Patel ca6434de37 [x86] add test to show ddup hole; NFC (PR37502)
llvm-svn: 349680
2018-12-19 20:35:28 +00:00
Craig Topper 84a00bd98a [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
The (cmp (and X, Y) 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM like BLSR or BLSI.

This patch moves removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but its probably not perfect.

Differential Revision: https://reviews.llvm.org/D55870

llvm-svn: 349661
2018-12-19 18:49:13 +00:00
Craig Topper 291470347a [X86] Fix assert fails in pass X86AvoidSFBPass
Fixes https://bugs.llvm.org/show_bug.cgi?id=38743

The function removeRedundantBlockingStores is supposed to remove any blocking stores contained in each other in lockingStoresDispSizeMap.
But it currently looks only at the previous one, which will miss some cases that result in assert.

This patch refine the function to check all previous layouts until find the uncontained one. So all redundant stores will be removed.

Patch by Pengfei Wang

Differential Revision: https://reviews.llvm.org/D55642

llvm-svn: 349660
2018-12-19 18:45:57 +00:00
Simon Pilgrim 171f3aa012 [X86] Remove already upgraded llvm.x86.avx512.mask.padds/psubs tests
Duplicate tests have already been moved to avx512bw-intrinsics-upgrade.ll

llvm-svn: 349643
2018-12-19 17:18:27 +00:00
Simon Pilgrim 7bfbf3caa4 [X86][SSE] Auto upgrade PADDUS/PSUBUS intrinsics to UADD_SAT/USUB_SAT generic intrinsics (llvm)
Now that we use the generic ISD opcodes, we can use the generic intrinsics directly as well. This fixes the poor fast-isel codegen by not expanding to an easily broken IR code sequence.

I'm intending to deal with the signed saturation equivalents as well.

Clang counterpart: https://reviews.llvm.org/D55879

Differential Revision: https://reviews.llvm.org/D55855

llvm-svn: 349630
2018-12-19 14:43:36 +00:00
Simon Pilgrim 2ae3a91656 [SelectionDAG] Optional handling of UNDEF elements in matchBinaryPredicate (part 2 of 2)
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts is simplifying vector elements, we're seeing more constant BUILD_VECTOR containing undefs.

This patch provides opt-in support for UNDEF elements in matchBinaryPredicate, passing NULL instead of the result ConstantSDNode* argument.

I've updated the (or (and X, c1), c2) -> (and (or X, c2), c1|c2) fold to demonstrate its use, which I believe is safe for undef cases.

Differential Revision: https://reviews.llvm.org/D55822

llvm-svn: 349629
2018-12-19 14:09:38 +00:00
Simon Pilgrim 6c95bea072 [TargetLowering] Fix propagation of undefs in zero extension ops (PR40091)
As described on PR40091, we have several places where zext (and zext_vector_inreg) fold an undef input into an undef output. For zero extensions this is incorrect as the output should guarantee to least have the new upper bits set to zero.

SimplifyDemandedVectorElts is the worst offender (and its the most likely to cause new undefs to appear) but DAGCombiner's tryToFoldExtendOfConstant has a similar issue.

Thanks to @dmgreen for catching this.

Differential Revision: https://reviews.llvm.org/D55883

llvm-svn: 349625
2018-12-19 13:37:59 +00:00
Simon Pilgrim ac62c8a3aa [X86][SSE] Remove use of SSE ADDS/SUBS saturation intrinsics from schedule/stack tests
These are due to be upgraded soon, but good to replace them with generic llvm sadd_sat/ssub_sat intrinsics now.

The avx512 masked cases need doing as well but require a bit of tidyup first.

llvm-svn: 349621
2018-12-19 12:00:25 +00:00
Simon Pilgrim 2072b5afbe [SelectionDAG] Optional handling of UNDEF elements in matchUnaryPredicate
Now that SimplifyDemandedBits/SimplifyDemandedVectorElts are simplifying vector elements, we're seeing more constant BUILD_VECTOR containing UNDEFs.

This patch provides opt-in handling of UNDEF elements in matchUnaryPredicate, passing NULL instead of the ConstantSDNode* argument.

I've updated SelectionDAG::simplifyShift to demonstrate its use.

Differential Revision: https://reviews.llvm.org/D55819

llvm-svn: 349616
2018-12-19 10:41:06 +00:00
Simon Pilgrim d4b077698a [X86][SSE] Remove SSE ADDUS/SUBUS saturation intrinsics from schedule/stack tests
These are already being autoupgraded, currently to an IR sequence, but best to replace them with generic llvm uadd_sat/usub_sat intrinsics (which D55855 will be doing shortly anyhow).

The avx512 masked cases need doing as well but require a bit of tidyup first.

llvm-svn: 349615
2018-12-19 10:39:14 +00:00
Pete Cooper f86db5ce9e Rewrite objc intrinsics to runtime methods in PreISelIntrinsicLowering instead of SDAG.
SelectionDAG currently changes these intrinsics to function calls, but that won't work
for other ISel's.  Also we want to eventually support nonlazybind and weak linkage coming
from the front-end which we can't do in SelectionDAG.

llvm-svn: 349552
2018-12-18 22:20:03 +00:00
Craig Topper 18a9d545e1 [X86] Add BSR to isUseDefConvertible.
We already had BSF here as part of __builtin_ffs improvements and I was just wondering yesterday whether we should have BSR there.

This addresses one issue from PR40090.

llvm-svn: 349531
2018-12-18 20:03:54 +00:00
Nikita Popov f6058ff140 [X86] Use SADDSAT/SSUBSAT instead of ADDS/SUBS
Migrate the X86 backend from X86ISD opcodes ADDS and SUBS to generic
ISD opcodes SADDSAT and SSUBSAT. This also improves scodegen for
@llvm.sadd.sat() and @llvm.ssub.sat() intrinsics.

This is a followup to D55787 and part of PR40056.

Differential Revision: https://reviews.llvm.org/D55833

llvm-svn: 349520
2018-12-18 18:28:22 +00:00
Craig Topper 20a6db5a84 [X86] Create PSUBUS from (add (umax X, C), -C)
InstCombine seems to canonicalize or PSUB patter into a max with the cosntant and an add with an inverse of the constant.

This patch recognizes this pattern and turns it into PSUBUS. Future work could improve undef element handling.

Fixes some of PR40053

Differential Revision: https://reviews.llvm.org/D55780

llvm-svn: 349519
2018-12-18 18:26:25 +00:00
Simon Pilgrim 1411917431 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for constant rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349510
2018-12-18 17:31:11 +00:00
Simon Pilgrim e9effe9744 [X86][SSE] Don't use 'sign bit select' vXi8 ROTL lowering for splat rotation amounts
Noticed by @spatel on D55747 - we get much better codegen if we use the regular shift expansion.

llvm-svn: 349500
2018-12-18 16:02:23 +00:00
Simon Pilgrim be0fbe673e [X86][SSE] Add shift combine 'out of range' tests with UNDEFs
Shows failure to simplify out of range shift amounts to UNDEF if any element is UNDEF.

llvm-svn: 349483
2018-12-18 13:37:04 +00:00
Nikita Popov 665ab08178 [X86] Use UADDSAT/USUBSAT instead of ADDUS/SUBUS
Replace the X86ISD opcodes ADDUS and SUBUS with generic ISD opcodes
UADDSAT and USUBSAT. As a side-effect, this also makes codegen for
the @llvm.uadd.sat and @llvm.usub.sat intrinsics reasonable.

This only replaces use in the X86 backend, and does not move any of
the ADDUS/SUBUS X86 specific combines into generic codegen.

Differential Revision: https://reviews.llvm.org/D55787

llvm-svn: 349481
2018-12-18 13:23:03 +00:00
Nikita Popov a7d2a235bb [SelectionDAG][X86] Fix [US](ADD|SUB)SAT vector legalization, add tests
Integer result promotion needs to use the scalar size, and we need
support for result widening.

This is in preparation for D55787.

llvm-svn: 349480
2018-12-18 13:22:53 +00:00
Simon Pilgrim ba8e84b31c [X86][AVX] Add 256/512-bit vector funnel shift tests
Extra coverage for D55747

llvm-svn: 349471
2018-12-18 10:32:54 +00:00
Simon Pilgrim 46b90e851b [X86][SSE] Add 128-bit vector funnel shift tests
Extra coverage for D55747

llvm-svn: 349470
2018-12-18 10:08:23 +00:00
Simon Pilgrim af6fbbf18b [TargetLowering] Fallback from SimplifyDemandedVectorElts to SimplifyDemandedBits
For opcodes not covered by SimplifyDemandedVectorElts, SimplifyDemandedBits might be able to help now that it supports demanded elts as well.

llvm-svn: 349466
2018-12-18 09:33:25 +00:00
Simon Pilgrim 26c630f416 [X86][SSE] Replace (VSRLI (VSRAI X, Y), 31) -> (VSRLI X, 31) fold.
This fold was incredibly specific - replace with a SimplifyDemandedBits fold to remove a VSRAI if only the original sign bit is demanded (its guaranteed to stay the same).

Test change is merely a rescheduling.

llvm-svn: 349459
2018-12-18 08:55:47 +00:00
Craig Topper 284d426f6d [X86] Add test cases to show isel failing to match BMI blsmsk/blsi/blsr when the flag result is used.
A similar things happen to TBM instructions which we already have tests for.

llvm-svn: 349450
2018-12-18 08:26:01 +00:00
Craig Topper 4adf9ca738 [X86] Add test case for PR40060. NFC
llvm-svn: 349441
2018-12-18 04:58:07 +00:00
Craig Topper 6a6f6109c4 [X86] Add baseline tests for D55780
This adds tests for (add (umax X, C), -C) as part of fixing PR40053

llvm-svn: 349416
2018-12-17 23:20:14 +00:00
Simon Pilgrim 7e2975a44c [X86][SSE] Improve immediate vector shift known bits handling.
Convert VSRAI to VSRLI is the sign bit is known zero and improve KnownBits output for all shift instruction.

Fixes the poor codegen comments in D55768.

llvm-svn: 349407
2018-12-17 22:09:47 +00:00
Craig Topper 8c9d772991 [X86] Add T1MSKC and TZMSK to isDefConvertible used by optimizeCompareInstr.
These seem to have been missed when the other TBM instructions were added.

llvm-svn: 349404
2018-12-17 21:50:06 +00:00
Craig Topper 728cbc0378 Convert (CMP (srl/shl X, C), 0) to (CMP (and X, C'), 0) when only the zero flag is used.
This allows a TEST to be used and can be combined with any AND that may already exist as an input to the shift.

This was already done in EmitTest, but was easily tricked by multiple uses because the setcc might be used by multiple instructions. Once the SETCC and users are legalized then we can look for the shift to be used by a single CMP, but the CMP itself can have multiple users.

This appears to fix the case in PR39968.

llvm-svn: 349385
2018-12-17 20:02:16 +00:00
Simon Pilgrim 9274f17a5e [TargetLowering] Add DemandedElts mask to SimplifyDemandedBits (PR40000)
This is an initial patch to add the necessary support for a DemandedElts argument to SimplifyDemandedBits, more closely matching computeKnownBits and to help improve vector codegen.

I've added only a small amount of the changes necessary to get at least one test to update - a lot more can be done but I'd like to add these methodically with proper test coverage, at the same time the hope is to slowly move some/all of SimplifyDemandedVectorElts into SimplifyDemandedBits as well.

Differential Revision: https://reviews.llvm.org/D55768

llvm-svn: 349374
2018-12-17 18:43:43 +00:00
Simon Pilgrim 193429ea15 Regenerate test in prep for SimplifyDemandedBits improvements.
llvm-svn: 349350
2018-12-17 12:48:34 +00:00
Craig Topper 792d4f130d [X86] Add test case for PR39968. NFC
llvm-svn: 349331
2018-12-17 07:51:17 +00:00
Simon Pilgrim 0dea14f2f2 Regenerate test (merges X86+X64 cases). NFCI.
llvm-svn: 349317
2018-12-16 19:07:57 +00:00
Craig Topper 10f8892837 [X86] Remove truncation handling from EmitTest. Replace it with a DAG combine.
I'd like to try to move a lot of the flag matching out of EmitTest and push it to isel or isel preprocessing. This is a step towards that.

The test-shrink-bug.ll changie is an improvement because we are no longer interfering with test shrink handling in isel.

The pr34137.ll change is a regression, but the IR came from -O0 and was not reduced by InstCombine. So it contains a lot of redundancies like duplicate loads that made it combine poorly.

llvm-svn: 349315
2018-12-16 18:35:55 +00:00
Craig Topper b0b9c54578 [X86] Autogenerate complete checks. NFC
llvm-svn: 349314
2018-12-16 18:35:54 +00:00
Sanjay Patel 13ac2f15b0 [x86] increment/decrement constant vector with min/max in vsetcc lowering (PR39859)
This is part of fixing PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859

We have a crippled vector ISA, so we have to invert a typical fold and create min/max here.

As discussed in the bug report, we can probably do better by using saturating subtract when 
it's available, but we should have this improvement for the min/max patterns regardless.

Alive proofs:
https://rise4fun.com/Alive/zsf
https://rise4fun.com/Alive/Qrl

Differential Revision: https://reviews.llvm.org/D55515

llvm-svn: 349304
2018-12-16 15:05:48 +00:00
Sanjay Patel f24900b934 [DAGCombiner] allow hoisting vector bitwise logic ahead of truncates
The transform performs a bitwise logic op in a wider type followed by
truncate when both inputs are truncated from the same source type:
logic_op (truncate x), (truncate y) --> truncate (logic_op x, y)

There are a bunch of other checks that should prevent doing this when 
it might be harmful.

We already do this transform for scalars in this spot. The vector 
limitation was shared with a check for the case when the operands are 
extended. I'm not sure if that limit is needed either, but that would 
be a separate patch.

Differential Revision: https://reviews.llvm.org/D55448

llvm-svn: 349303
2018-12-16 14:57:04 +00:00
Simon Pilgrim 0ef977b83d [SelectionDAG] Add FSHL/FSHR support to computeKnownBits
Also exposes an issue in DAGCombiner::visitFunnelShift where we were assuming the shift amount had the result type (after legalization it'll have the targets shift amount type).

llvm-svn: 349298
2018-12-16 13:33:37 +00:00
Simon Pilgrim 780b3ca775 [X86] Add computeKnownBits tests for funnel shift intrinsics
llvm-svn: 349297
2018-12-16 12:15:31 +00:00
Craig Topper 392edb6223 [X86] Autogenerate complete checks. NFC
llvm-svn: 349287
2018-12-15 22:52:57 +00:00
Simon Pilgrim ef7b5949e5 [X86] Lower to SHLD/SHRD on slow machines for optsize
Use consistent rules for when to lower to SHLD/SHRD for slow machines - fixes a weird issue where funnel shift gets expanded but then X86ISelLowering's combineOr sees the optsize and combines to SHLD/SHRD, but now with the modulo amount guard......

llvm-svn: 349285
2018-12-15 19:43:44 +00:00
Simon Pilgrim 53c8b1b6f7 [X86] Add optsize SHLD/SHRD tests
llvm-svn: 349284
2018-12-15 19:32:26 +00:00
Dinar Temirbulatov 8c8724dd0d [CodeGen] Enhance machine PHIs optimization
Summary:
Make machine PHIs optimization to work for single value register taken from
several different copies. This is the first step to fix PR38917. This change
allows to get rid of redundant PHIs (see opt_phis2.mir test) to make
the subsequent optimizations (like CSE) possible and simpler.

For instance, before this patch the code like this:

%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b
could be optimized to:

%a = %b
but the code like this:

%c = COPY %z
...
%b = COPY %z
...
%a = PHI %bb1, %a; %bb2, %b; %bb3, %c
would remain unchanged.
With this patch the latter case will be optimized:

%a = %z```.

Committed on behalf of: Anton Afanasyev anton.a.afanasyev@gmail.com

Reviewers: RKSimon, MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54839

llvm-svn: 349271
2018-12-15 14:37:01 +00:00
Simon Pilgrim 1e1fd9c761 [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts
Differential Revision: https://reviews.llvm.org/D55600

llvm-svn: 349264
2018-12-15 11:36:36 +00:00
Sanjay Patel 7b776863ac [x86] add tests for extractelement of FP binops; NFC
llvm-svn: 349179
2018-12-14 19:15:54 +00:00
Sanjay Patel 4f4963b9cb [x86] make tests immune to scalarization improvements; NFC
llvm-svn: 349176
2018-12-14 18:44:16 +00:00
Sanjay Patel 5a97a105f8 [x86] auto-generate complete checks; NFC
llvm-svn: 349162
2018-12-14 16:49:57 +00:00
Sanjay Patel 41e8112ed6 [x86] regenerate test checks; NFC
llvm-svn: 349161
2018-12-14 16:46:21 +00:00
Sanjay Patel b7d9f9117e [x86] make tests immune to scalarization improvements; NFC
llvm-svn: 349160
2018-12-14 16:44:58 +00:00
Scott Linder de6beb02a5 Implement -frecord-command-line (-frecord-gcc-switches)
Implement options in clang to enable recording the driver command-line
in an ELF section.

Implement a new special named metadata, llvm.commandline, to support
frontends embedding their command-line options in IR/ASM/ELF.

This differs from the GCC implementation in some key ways:

* In GCC there is only one command-line possible per compilation-unit,
  in LLVM it mirrors llvm.ident and multiple are allowed.
* In GCC individual options are separated by NULL bytes, in LLVM entire
  command-lines are separated by NULL bytes. The advantage of the GCC
  approach is to clearly delineate options in the face of embedded
  spaces. The advantage of the LLVM approach is to support merging
  multiple command-lines unambiguously, while handling embedded spaces
  with escaping.

Differential Revision: https://reviews.llvm.org/D54487
Clang Differential Revision: https://reviews.llvm.org/D54489

llvm-svn: 349155
2018-12-14 15:38:15 +00:00
John Brawn 1d0d86ae40 [RegAllocGreedy] IMPLICIT_DEF values shouldn't prefer registers
It costs nothing to spill an IMPLICIT_DEF value (the only spill code that's
generated is a KILL of the value), so when creating split constraints if the
live-out value is IMPLICIT_DEF the exit constraint should be DontCare instead
of PrefReg.

Differential Revision: https://reviews.llvm.org/D55652

llvm-svn: 349151
2018-12-14 14:07:57 +00:00
Sanjay Patel 791ae69afe [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract; 2nd try
This is a retry of rL349051 (reverted at rL349056). I changed the check for dead-ness from
number of uses to an opcode test for DELETED_NODE based on existing similar code.

Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349058
2018-12-13 17:05:01 +00:00
Simon Pilgrim b5aaa673c6 [X86][SSE] Add SSE vector imm/var shift support to SimplifyDemandedVectorEltsForTargetNode
llvm-svn: 349057
2018-12-13 16:39:29 +00:00
Sanjay Patel c56f5728ee revert rL349051: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
This causes an address sanitizer bot failure:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/27187/steps/check-llvm%20asan/logs/stdio

llvm-svn: 349056
2018-12-13 16:32:44 +00:00
Simon Pilgrim b0b2f1503a [X86][SSE] Fix all remaining modulo vector rotation amounts (PR38243)
There's still a couple of minor SimplifyDemandedElts regressions in some of the shift amount splats that will be fixed in future patches.

llvm-svn: 349052
2018-12-13 15:50:31 +00:00
Sanjay Patel a7b115b392 [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract
Differential Revision: https://reviews.llvm.org/D55655

llvm-svn: 349051
2018-12-13 15:44:26 +00:00
Simon Pilgrim ba91ff4a86 [X86][SSE] Fix modulo rotation amounts for v8i16/v16i16/v4i32 (PR38243)
llvm-svn: 349047
2018-12-13 15:23:09 +00:00
Simon Pilgrim 320fd7383f [X86][BWI] Don't custom lower vXi8 rotations.
We always expand to shifts anyhow - test changes are just different scheduling only.

llvm-svn: 349034
2018-12-13 13:44:33 +00:00
Clement Courbet 76f4ae1092 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 349016
2018-12-13 09:56:19 +00:00
Craig Topper d1c61861dd [X86] Don't emit MULX by default with BMI2
MULX has somewhat improved register allocation constraints compared to the legacy MUL instruction. Both output registers are encoded instead of fixed to EAX/EDX, but EDX is used as input. It also doesn't touch flags. Unfortunately, the encoding is longer.

Prefering it whenever BMI2 is enabled is probably not optimal. Choosing it should somehow be a function of register allocation constraints like converting adds to three address. gcc and icc definitely don't pick MULX by default. Not sure what if any rules they have for using it.

Differential Revision: https://reviews.llvm.org/D55565

llvm-svn: 348975
2018-12-12 21:21:31 +00:00
Craig Topper cd7d7ac0fd [X86] Move stack folding test for MULX to a MIR test. Add a MULX32 case as well
A future patch may stop using MULX by default so use MIR to ensure we're always testing MULX.

Add the 32-bit case that we couldn't do in the 64-bit mode IR test due to it being promoted to a 64-bit mul.

llvm-svn: 348972
2018-12-12 20:50:24 +00:00
Simon Pilgrim 4a641efdc1 [X86] Added missing constant pool checks. NFCI.
So the extra checks in D55600 don't look like a regression.

llvm-svn: 348966
2018-12-12 19:56:38 +00:00
Craig Topper 4937adf75f [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input.
I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.

I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I changed one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.

Differential Revision: https://reviews.llvm.org/D55414

llvm-svn: 348959
2018-12-12 19:20:21 +00:00
Simon Pilgrim 5864ab2dc0 [X86] Added missing constant pool checks. NFCI.
So the extra checks in D55600 don't look like a regression.

llvm-svn: 348956
2018-12-12 18:53:12 +00:00
Sanjay Patel 44eaa492b8 [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA
This extends the code that handles 16-bit add promotion to form LEA to also allow 8-bit adds. 
That allows us to combine add ops with register moves and save some instructions. This is 
another step towards allowing add truncation in generic DAGCombiner (see D54640).

Differential Revision: https://reviews.llvm.org/D55494

llvm-svn: 348946
2018-12-12 17:58:27 +00:00
Simon Pilgrim f6c898e12f [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts
If either of the operand elements are zero then we know the result element is going to be zero (even if the other element is undef).

Differential Revision: https://reviews.llvm.org/D55558

llvm-svn: 348926
2018-12-12 13:43:07 +00:00
Leonard Chan 118e53fd63 [Intrinsic] Signed Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 signed integers with the scale of them provided
as the third argument and performs fixed point multiplication on them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D54719

llvm-svn: 348912
2018-12-12 06:29:14 +00:00
Craig Topper 1fe466689b [X86] Combine vpmovdw+vpacksswb into vpmovdb.
This is similar to the combine we already have for vpmovdw+vpackuswb.

llvm-svn: 348910
2018-12-12 05:56:01 +00:00
Craig Topper 5b69b5e20a [X86] Add a few more fptosi test cases to demonstrate -x86-experimental-vector-widening legalization not combining vpacksswb+vpmovdw.
We are able to combine vpackuswb+vpmovdw, but we didn't have packsswb+vpmovdw at the time that combine was added.

llvm-svn: 348909
2018-12-12 05:55:59 +00:00
Craig Topper b51283bfd7 Fix not correct imm operand assertion for SUB32ri in X86CondBrFolding::analyzeCompare
Summary:
When doing X86CondBrFolding::analyzeCompare, it will meet the SUB32ri instruction as below to use the global address for its operand,
  %733:gr32 = SUB32ri %62:gr32(tied-def 0), @img2buf_normal, implicit-def $eflags
  JNE_1 %bb.41, implicit $eflags

so the assertion "assert(MI.getOperand(ValueIndex).isImm() && "Expecting Imm operand")" is not correct and change the assert to if make X86CondBrFolding::analyzeCompare return false as not finding the compare for this

Patch by Jianping Chen

Reviewers: smaslov, LuoYuanke, liutianle, Jianping

Reviewed By: Jianping

Subscribers: lebedev.ri, llvm-commits

Differential Revision: https://reviews.llvm.org/D54250

llvm-svn: 348853
2018-12-11 15:32:14 +00:00
Clement Courbet 8b6434bbb9 Revert r348843 "[CodeGen] Allow mempcy/memset to generate small overlapping stores."
Breaks ARM/memcpy-inline.ll

llvm-svn: 348844
2018-12-11 13:38:43 +00:00
Clement Courbet 93b3445770 [CodeGen] Allow mempcy/memset to generate small overlapping stores.
Summary:
All targets either just return false here or properly model `Fast`, so I
don't think there is any reason to prevent CodeGen from doing the right
thing here.

Subscribers: nemanjai, javed.absar, eraman, jsji, llvm-commits

Differential Revision: https://reviews.llvm.org/D55365

llvm-svn: 348843
2018-12-11 13:15:56 +00:00
Simon Pilgrim f6371f5f23 [TargetLowering] Add ISD::EXTRACT_VECTOR_ELT support to SimplifyDemandedBits
Let SimplifyDemandedBits attempt to simplify all elements of a vector extraction.

Part of PR39689.

llvm-svn: 348839
2018-12-11 11:08:40 +00:00
Craig Topper 4bd93fa5bb [X86] Switch the 64-bit mulx schedule test to use inline assembly.
I'm not sure we should always prefer MULX over MUL. So making the MULX guaranteed with inline assembly.

llvm-svn: 348833
2018-12-11 07:41:06 +00:00
Amara Emerson 5ec146046c [GlobalISel] Restrict G_MERGE_VALUES capability and replace with new opcodes.
This patch restricts the capability of G_MERGE_VALUES, and uses the new
G_BUILD_VECTOR and G_CONCAT_VECTORS opcodes instead in the appropriate places.

This patch also includes AArch64 support for selecting G_BUILD_VECTOR of <4 x s32>
and <2 x s64> vectors.

Differential Revisions: https://reviews.llvm.org/D53629

llvm-svn: 348788
2018-12-10 18:44:58 +00:00
Simon Pilgrim fc2c9af99c [TargetLowering] Add UNDEF folding to SimplifyDemandedVectorElts
If all the demanded elements of the SimplifyDemandedVectorElts are known to be UNDEF, we can simplify to an ISD::UNDEF node.

Zero constant folding will be handled in a future patch - its a little trickier as we often have bitcasted zero values.

Differential Revision: https://reviews.llvm.org/D55511

llvm-svn: 348784
2018-12-10 18:29:46 +00:00
Sanjay Patel 45ae6b50d8 [x86] add tests for LowerVSETCC with min/max; NFC
llvm-svn: 348769
2018-12-10 16:28:30 +00:00
Francis Visoiu Mistrih 0ad1af72cd [DAGCombiner] Simplify test case from r348759
Thanks Simon for pointing that out.

llvm-svn: 348765
2018-12-10 16:04:56 +00:00
Francis Visoiu Mistrih 753efe3584 [DAGCombiner] Use the result value type in visitCONCAT_VECTORS
This triggers an assert when combining concat_vectors of a bitcast of
merge_values.

With asserts disabled, it fails to select:
fatal error: error in backend: Cannot select: 0x7ff19d000e90: i32 = any_extend 0x7ff19d000ae8
  0x7ff19d000ae8: f64,ch = CopyFromReg 0x7ff19d000c20:1, Register:f64 %1
    0x7ff19d000b50: f64 = Register %1
In function: d

Differential Revision: https://reviews.llvm.org/D55507

llvm-svn: 348759
2018-12-10 14:31:34 +00:00
Jeremy Morse 045c67769d [DebugInfo] Emit undef DBG_VALUEs when SDNodes are optimised out
This is a fix for PR39896, where dbg.value's of SDNodes that have been
optimised out do not lead to "DBG_VALUE undef" instructions being created.
Such undef instructions are necessary to terminate earlier variable
ranges, otherwise variable values leak past the point where they're valid.

The "invalidated" flag of SDDbgValue is currently being abused to mean two
things:
 * The corresponding SDNode is now invalid
 * This SDDbgValue should not be emitted
Of which there are several legitimate combinations of meaning:
 * The SDNode has been invalidated and we should emit "DBG_VALUE undef"
 * The SDNode has been invalidated but the debug data was salvaged, don't
   emit anything for this SDDbgValue
 * This SDDbgValue has been emitted

This patch introduces distinct "Emitted" and "Invalidated" fields to the
SDDbgValue class, updates users accordingly, and generates "undef"
DBG_VALUEs for invalidated records. Awkwardly, there are circumstances
where we emit SDDbgValue's twice, specifically DebugInfo/X86/dbg-addr-dse.ll
which I've preserved.

Differential Revision: https://reviews.llvm.org/D55372

llvm-svn: 348751
2018-12-10 11:20:47 +00:00
Nikita Popov e79477895e [X86] Fix AvoidStoreForwardingBlocks pass for negative displacements
Fixes https://bugs.llvm.org/show_bug.cgi?id=39926.

The size of the first copy was computed as
std::abs(std::abs(LdDisp2) - std::abs(LdDisp1)), which results in
skipped bytes if the signs of LdDisp2 and LdDisp1 differ. As far as
I can see, this should just be LdDisp2 - LdDisp1. The case where
LdDisp1 > LdDisp2 is already handled in the code above, in which case
LdDisp2 is set to LdDisp1 and this subtraction will evaluate to
Size1 = 0, which is the correct value to skip an overlapping copy.

Differential Revision: https://reviews.llvm.org/D55485

llvm-svn: 348750
2018-12-10 10:16:50 +00:00
Craig Topper 02b614abc8 [X86] Merge addcarryx/addcarry intrinsic into a single addcarry intrinsic.
Both intrinsics do the exact same thing so we really only need one.

Earlier in the 8.0 cycle we changed the signature of this intrinsic without renaming it. But it looks difficult to get the autoupgrade code to allow me to merge the intrinsics and change the signature at the same time. So I've renamed the intrinsic slightly for the new merged intrinsic. I'm skipping autoupgrading from the previous new to 8.0 signature. I've also renamed the subborrow for consistency.

llvm-svn: 348737
2018-12-10 06:07:50 +00:00
Craig Topper 2b09d17d93 [X86] If the carry input to an addcarry/subborrow intrinsic is known to be 0, emit a flag setting ADD/SUB instead of ADC/SBB.
Previously we had to take the carry in and add -1 to it to set the carry flag so we could use it with ADC/SBB. But if we know its 0 then we don't need to bother.

This should go a long way towards fixing PR24545.

llvm-svn: 348727
2018-12-09 18:02:37 +00:00
Sanjay Patel 099beb25e4 [x86] regenerate test checks; NFC
llvm-svn: 348723
2018-12-09 14:47:53 +00:00
Sanjay Patel 19bc850220 [x86] don't try to convert add with undef operands to LEA
The existing code tries to handle an undef operand while transforming an add to an LEA, 
but it's incomplete because we will crash on the i16 test with the debug output shown below. 
It's better to just give up instead. Really, GlobalIsel should have folded these before we 
could get into trouble.

# Machine code for function add_undef_i16: NoPHIs, TracksLiveness, Legalized, RegBankSelected, Selected

bb.0 (%ir-block.0):
  liveins: $edi
  %1:gr32 = COPY killed $edi
  %0:gr16 = COPY %1.sub_16bit:gr32
  %5:gr64_nosp = IMPLICIT_DEF
  %5.sub_16bit:gr64_nosp = COPY %0:gr16
  %6:gr64_nosp = IMPLICIT_DEF
  %6.sub_16bit:gr64_nosp = COPY %2:gr16
  %4:gr32 = LEA64_32r killed %5:gr64_nosp, 1, killed %6:gr64_nosp, 0, $noreg
  %3:gr16 = COPY killed %4.sub_16bit:gr32
  $ax = COPY killed %3:gr16
  RET 0, implicit killed $ax

# End machine code for function add_undef_i16.

*** Bad machine code: Reading virtual register without a def ***
- function:    add_undef_i16
- basic block: %bb.0  (0x7fe6cd83d940)
- instruction: %6.sub_16bit:gr64_nosp = COPY %2:gr16
- operand 1:   %2:gr16
LLVM ERROR: Found 1 machine code errors.

Differential Revision: https://reviews.llvm.org/D54710

llvm-svn: 348722
2018-12-09 14:40:37 +00:00
Nikita Popov 3192449412 [X86] Add test for PR39926; NFC
The test file shows a case where the avoid store forwarding block
pass misses to copy a range (-1..1) when the load displacement
changes sign.

Baseline test for D55485.

llvm-svn: 348712
2018-12-09 12:02:56 +00:00