Commit Graph

12919 Commits

Author SHA1 Message Date
Sanjay Patel c3717cd0d5 [DAGCombiner] don't hoist logic op if operands have other uses
The AVX512 diffs are neutral, but the bswap test shows a clear overreach in 
hoistLogicOpWithSameOpcodeHands(). If we don't check for other uses, we can 
increase the instruction count.

This could also fight with transforms trying to go in the opposite direction 
and possibly blow up/infinite loop. This might be enough to solve the bug 
noted here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181203/608593.html

I did not add the hasOneUse() checks to all opcodes because I see a perf 
regression for at least one opcode. We may decide that's irrelevant in the
face of potential compiler crashing, but I'll see if I can salvage that first.

llvm-svn: 348508
2018-12-06 18:16:32 +00:00
Sanjay Patel db6396b892 [x86] add test for hoistLogicOpWithSameOpcodeHands with extra uses; NFC
llvm-svn: 348506
2018-12-06 18:06:10 +00:00
Clement Courbet fee1040f04 [X86][NFC] Convert memcpy/memset tests to update_llc_test_checks.
llvm-svn: 348477
2018-12-06 10:07:12 +00:00
Clement Courbet 52d382488f [X86][NFC] Add more tests for memset.
llvm-svn: 348465
2018-12-06 08:48:06 +00:00
Pete Cooper e13d0992dc Add objc.* ARC intrinsics and codegen them to their runtime methods.
Reviewers: erik.pilkington, ahatanak

Differential Revision: https://reviews.llvm.org/D55233

llvm-svn: 348441
2018-12-06 00:52:54 +00:00
Simon Pilgrim c10590f6f9 [X86][SSE] Fix a copy+paste typo that was folding the sext/zext of partial vectors
llvm-svn: 348403
2018-12-05 19:32:19 +00:00
Sanjay Patel 33a448f935 [DAGCombiner] don't try to extract a fraction of a vector binop and crash (PR39893)
Because we're potentially peeking through a bitcast in this transform,
we need to use overall bitwidths rather than number of elements to
determine when it's safe to proceed.

Should fix:
https://bugs.llvm.org/show_bug.cgi?id=39893

llvm-svn: 348383
2018-12-05 17:10:30 +00:00
Andrea Di Biagio 3998bebc12 [X86] Add test case to show missed opportunity to combine a concat_vector into a scalar_to_vector. NFC
This is a test for D55274.

llvm-svn: 348380
2018-12-05 16:23:27 +00:00
Chandler Carruth 71c14a36a2 [SLH] Fix a nasty bug in SLH.
Whenever we effectively take the address of a basic block we need to
manually update that basic block to reflect that fact or later passes
such as tail duplication and tail merging can break the invariants of
the code. =/ Sadly, there doesn't appear to be any good way of
automating this or even writing a reasonable assert to catch it early.

The change seems trivially and obviously correct, but sadly the only
really good test case I have is 1000s of basic blocks. I've tried
directly writing a test case that happens to make tail duplication do
something that crashes later on, but this appears to require an
*amazingly* complex set of conditions that I've not yet reproduced.

The change is technically covered by the tests because we mark the
blocks as having their address taken, but that doesn't really count as
properly testing the functionality.

llvm-svn: 348374
2018-12-05 15:42:11 +00:00
Chandler Carruth e3ea164659 [SLH] Regenerate tests with --no_x86_scrub_rip to restore the higher
fidelity checking of RIP-based references to basic blocks and other
labels.

These labels are super important for SLH tests so we should keep them
readable in the test cases.

llvm-svn: 348373
2018-12-05 15:41:13 +00:00
Simon Pilgrim 180639afe5 [SelectionDAG] Initial support for FSHL/FSHR funnel shift opcodes (PR39467)
This is an initial patch to add a minimum level of support for funnel shifts to the SelectionDAG and to begin wiring it up to the X86 SHLD/SHRD instructions.

Some partial legalization code has been added to handle the case for 'SlowSHLD' where we want to expand instead and I've added a few DAG combines so we don't get regressions from the existing DAG builder expansion code.

Differential Revision: https://reviews.llvm.org/D54698

llvm-svn: 348353
2018-12-05 11:12:12 +00:00
Craig Topper 3991089816 [X86] Add narrow vector test cases to vector-reduce* tests. Add copies of the tests with -x86-experimental-vector-widening-legalization
llvm-svn: 348334
2018-12-05 06:29:44 +00:00
Craig Topper 6934202dc0 [MachineLICM][X86][AMDGPU] Fix subtle bug in the updating of PhysRegClobbers in post-RA LICM
It looks like MCRegAliasIterator can visit the same physical register twice. When this happens in this code in LICM we end up setting the PhysRegDef and then later in the same loop visit the register again. Now we see that PhysRegDef is set from the earlier iteration so now set PhysRegClobber.

This patch splits the loop so we have one that uses the previous value of PhysRegDef to update PhysRegClobber and second loop that updates PhysRegDef.

The X86 atomic test is an improvement. I had to add sideeffect to the two shrink wrapping tests to prevent hoisting from occurring. I'm not sure about the AMDGPU tests. It looks like the branch instruction changed at end the of the loops. And in the branch-relaxation test I think there is now "and vcc, exec, -1" instruction that wasn't there before.

Differential Revision: https://reviews.llvm.org/D55102

llvm-svn: 348330
2018-12-05 03:41:26 +00:00
Matt Arsenault b17241b12d Move llc-start-stop-instance to x86
Avoid bot failures where the host pass
setup might not have 2 dead-mi-elimination runs

llvm-svn: 348290
2018-12-04 18:19:08 +00:00
Simon Pilgrim 07843640d5 [X86][SSE] Add SimplifyDemandedBitsForTargetNode handling for MOVMSK
Moves existing SimplifyDemandedBits call out of combineMOVMSK and add SimplifyDemandedVectorElts call based on the sign bits we need.

llvm-svn: 348282
2018-12-04 16:52:32 +00:00
Simon Pilgrim e82c3dab12 [X86][SSE] Add MOVMSK demandedbits/elts tests
llvm-svn: 348277
2018-12-04 16:01:25 +00:00
Clement Courbet 7925d58eae [X86][NFC] Add more constant-size memcmp tests.
llvm-svn: 348257
2018-12-04 12:35:51 +00:00
Simon Pilgrim 0add090e24 [TargetLowering] expandFP_TO_UINT - avoid FPE due to out of range conversion (PR17686)
PR17686 demonstrates that for some targets FP exceptions can fire in cases where the FP_TO_UINT is expanded using a FP_TO_SINT instruction.

The existing code converts both the inrange and outofrange cases using FP_TO_SINT and then selects the result, this patch changes this for 'strict' cases to pre-select the FP_TO_SINT input and the offset adjustment.

The X87 cases don't need the strict flag but generates much nicer code with it....

Differential Revision: https://reviews.llvm.org/D53794

llvm-svn: 348251
2018-12-04 11:21:30 +00:00
Simon Pilgrim 666261cdc8 [TargetLowering] Add SimplifyDemandedVectorElts support to EXTEND opcodes
Add support for ISD::*_EXTEND and ISD::*_EXTEND_VECTOR_INREG opcodes.

The extra broadcast in trunc-subvector.ll will be fixed in an upcoming patch.

llvm-svn: 348246
2018-12-04 10:41:06 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Craig Topper 5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with a smaller than 128 bit vector type results are custom type legalized by promoting the result to a 128 bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to original type. The truncate will be further legalizdd to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but Under avx512 this will result in a truncate instruction and a packuswb instruction. But we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Simon Pilgrim fb39916048 Fix line endings. NFCI.
llvm-svn: 348146
2018-12-03 14:55:09 +00:00
Craig Topper 959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Sanjay Patel b205606d3e [SelectionDAG] fold constant with undef vector per element
This makes the SDAG behavior consistent with the way we do this in IR.
It's possible that we were getting the wrong answer before. For example,
'xor undef, undef --> 0' but 'xor undef, C' --> undef. 

But the most practical improvement is likely as shown in the tests here - 
for FP, we were overconstraining undef lanes to NaN, and that can prevent 
vector simplifications/narrowing (see D51553).

llvm-svn: 348090
2018-12-02 13:48:42 +00:00
Craig Topper 4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal so doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Craig Topper eff43f6ae3 [X86] Add vXi8 division/remainder by non-splat constant test cases to prepare for an upcoming patch.
llvm-svn: 348082
2018-12-01 21:53:08 +00:00
Craig Topper f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Simon Pilgrim e017ed3245 [SelectionDAG] Improve SimplifyDemandedBits to SimplifyDemandedVectorElts simplification
D52935 introduced the ability for SimplifyDemandedBits to call SimplifyDemandedVectorElts through BITCASTs if the demanded bit mask entirely covered the sub element.

This patch relaxes this to demanding an element if we need any bit from it.

Differential Revision: https://reviews.llvm.org/D54761

llvm-svn: 348073
2018-12-01 12:08:55 +00:00
Craig Topper 2d6324c3cb [X86] Remove stale FIXME from test case. NFC
This was fixed in r346581. I just forgot to remove it.

llvm-svn: 348069
2018-12-01 07:45:36 +00:00
Sanjay Patel 39298cae9f [x86] add tests for undef + partial undef constant folding; NFC
Keep this file sync'd with the instsimplify version (rL348045).

llvm-svn: 348047
2018-11-30 22:54:33 +00:00
Craig Topper 4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper 8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But its less instructions and more consistent with previous ISAs. It probably opens up more combine opportunities as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Sanjay Patel 1901a12e76 [SelectionDAG] fold FP binops with 2 undef operands to undef
llvm-svn: 348016
2018-11-30 18:38:52 +00:00
Sanjay Patel 1cfb796b58 [x86] add tests for fake vector FP ops; NFC
llvm-svn: 348002
2018-11-30 16:50:08 +00:00
Than McIntosh 0e0a8a3fee [CodeGen] Prefer static frame index for STATEPOINT liveness args
Summary:
If a given liveness arg of STATEPOINT is at a fixed frame index
(e.g. a function argument passed on stack), prefer to use this
fixed location even the address is also in a register. If we use
the register it will generate a spill, which is not necessary
since the fixed frame index can be directly recorded in the stack
map.

Patch by Cherry Zhang <cherryyz@google.com>.

Reviewers: thanm, niravd, reames

Reviewed By: reames

Subscribers: cherryyz, reames, anna, arphaman, llvm-commits

Differential Revision: https://reviews.llvm.org/D53889

llvm-svn: 347998
2018-11-30 16:22:41 +00:00
Craig Topper a2133061c0 [X86] Emit PACKUS directly from the v16i8 LowerMULH code instead of using a shuffle.
llvm-svn: 347967
2018-11-30 08:32:05 +00:00
Mircea Trofin f1a49e8525 Revert "Revert r347596 "Support for inserting profile-directed cache prefetches""
Summary:
This reverts commit d8517b96dfbd42e6a8db33c50d1fa1e58e63fbb9.

Fix: correct  the use of DenseMap.

Reviewers: davidxl, hans, wmi

Reviewed By: wmi

Subscribers: mgorny, eraman, llvm-commits

Differential Revision: https://reviews.llvm.org/D55088

llvm-svn: 347938
2018-11-30 01:01:52 +00:00
Sanjay Patel 8d27144251 [DAGCombiner] narrow truncated binops
The motivating case for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=32023
and the corresponding rot16.ll regression tests.

Because x86 scalar shift amounts are i8 values, we can end up with trunc-binop-trunc 
sequences that don't get folded in IR.

As the TODO comments suggest, there will be regressions if we extend this (for x86, 
we mostly seem to be missing LEA opportunities, but there are likely vector folds 
missing too). I think those should be considered existing bugs because this is the 
same transform that we do as an IR canonicalization in instcombine. We just need 
more tests to make those visible independent of this patch.

Differential Revision: https://reviews.llvm.org/D54640

llvm-svn: 347917
2018-11-29 20:58:26 +00:00
Craig Topper 129d529ab3 [SelectionDAG][AArch64][X86] Move legalization of vector MULHS/MULHU from LegalizeDAG to LegalizeVectorOps
I believe we should be legalizing these with the rest of vector binary operations. If any custom lowering is required for these nodes, this will give the DAG combine between LegalizeVectorOps and LegalizeDAG to run on the custom code before constant build_vectors are lowered in LegalizeDAG.

I've moved MULHU/MULHS handling in AArch64 from Lowering to isel. Moving the lowering earlier caused build_vector+extract_subvector simplifications to kick in which made the generated code worse.

Differential Revision: https://reviews.llvm.org/D54276

llvm-svn: 347902
2018-11-29 19:36:17 +00:00
Craig Topper 6cd0b17078 [X86] Add a DAG combine pre type legalization to widen division by constant splat on narrow vectors to avoid scalarization
This is another patch for -x86-experimental-vector-widening. This pre widens narrow division by constants so that we can get pass the legal type check in the generic DAG combiner. Otherwise we end up scalarizing.

I've restricted this to splats for now because it was easy to just call DAG.getConstant. Not sure what we should do for non-splat? Increase the element size?Widen the constant vector by padding with 1?

Differential Revision: https://reviews.llvm.org/D54919

llvm-svn: 347898
2018-11-29 19:13:38 +00:00
Hans Wennborg 6e3be9d12e Revert r347596 "Support for inserting profile-directed cache prefetches"
It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for
repro.

This also reverts the follow-ups:
Revert r347724 "Do not insert prefetches with unsupported memory operands."
Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596"
Revert r347607 "Add new passes to X86 pipeline tests"

llvm-svn: 347864
2018-11-29 13:58:02 +00:00
Sanjay Patel 2de209313e [x86] try select simplification for target-specific nodes
This failed to select (which might be a separate bug) in
X86ISelDAGToDAG because we try to create a select node
that can be simplified away after rL347227.

This change avoids the problem by simplifying the SHRUNKBLEND
node sooner. In the test case, we manage to realize that the
true/false values of the select (SHRUNKBLEND) are the same thing,
so it simplifies away completely.

llvm-svn: 347818
2018-11-28 22:51:04 +00:00
Craig Topper f3b6f583e2 [X86] Add a combine for back to back VSRAI instructions
Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI

Differential Revision: https://reviews.llvm.org/D54959

llvm-svn: 347784
2018-11-28 18:03:38 +00:00
Mircea Trofin 35f0e5cd2d Do not insert prefetches with unsupported memory operands.
Summary:
Ignore advices where the memory operand of the 'anchor' instruction
uses unsupported register types.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54983

llvm-svn: 347724
2018-11-28 01:08:45 +00:00
Craig Topper 5fb34b5498 [X86] Add cascade lake arch in X86 target.
This is skylake-avx512 with the addition of avx512vnni ISA.

Patch by Jianping Chen

Differential Revision: https://reviews.llvm.org/D54785

llvm-svn: 347681
2018-11-27 18:05:00 +00:00
Sanjay Patel 3827aabe75 [x86] regenerate checks; NFC
llvm-svn: 347661
2018-11-27 15:52:17 +00:00
Craig Topper 587b981fca [X86] Add test cases for vector shifts of v2i32/v2i16/v4i16/v2i8/v4i8/v8i8 with promotion legalization and widening legalization. NFC
llvm-svn: 347643
2018-11-27 07:20:19 +00:00
Craig Topper 4325505f05 [X86] Prevent DAG combine from folding a bitcast from vXi1 to iX with a store on pre-AVX512 targets.
If we fold the bitcast into the store we'll end up creating a truncating store to vXi1 that will get scalarized. Instead allow the bitcast to be turned into a movmsk.

We probably need to do something if the store itself is a vXi1 type, but I'll leave that til a testcase appears.

llvm-svn: 347632
2018-11-27 02:57:27 +00:00