Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts of KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underly scalar constant. But when we went to create the combined store we didn't look through the same bitcast.
llvm-svn: 326677
X86 considers v1i1 a legal type under AVX512 and as such a truncate from a v1iX type to v1i1 can be turned into a scalar truncate plus a conversion to v1i1. We would much prefer a v1i1 SCALAR_TO_VECTOR over a one element BUILD_VECTOR.
During lowering we were detecting the v1i1 BUILD_VECTOR as a splat BUILD_VECTOR like we try to do for v2i1/v4i1/etc. In this case we create (select i1 splat_elt, v1i1 all-ones, v1i1 all-zeroes). That goes through some more legalization and we end up with a CMOV choosing between 0 and 1 in scalar and a scalar_to_vector.
Arguably we could detect the v1i1 BUILD_VECTOR and do this better in X86 target code. But just using a SCALAR_TO_VECTOR in legalization is much easier.
llvm-svn: 326637
The fast/linear DAG scheduler doesn't lower DBG_VALUEs except for
function entry nodes.
Patch by Joshua Cranmer!
Differential Revision: https://reviews.llvm.org/D43028
llvm-svn: 326631
Masking first, prevents the extend from being combine with loads. Its also interfering with some vXi1 extraction code.
Differential Revision: https://reviews.llvm.org/D42679
llvm-svn: 326500
This supports things like
(setcc ugt X, 0) -> (setcc ne X, 0)
I've restricted to only make changes to vectors before legalize ops because I doubt all targets have accurate condition code legality information for vectors given how little we did before.
Differential Revision: https://reviews.llvm.org/D42948
llvm-svn: 326495
AVX512 used to promote v32i1 to v32i8 during legalization when BWI was disabled. So this code was added to improve legalization of v32i1 concat_vectors of v16i1 by extending the v16i1 to v16i8 to avoid scalarization.
X86 has since switched to legalizing v32i1 by splitting to v16i1 instead. This has rendered this code unnecessary and its no longer exercised.
llvm-svn: 326153
Summary:
There are transformation that change setcc into other constructs, and transform that try to reconstruct a setcc from the brcond condition. Depending on what order these transform are done, the end result differs.
Most of the time, it is preferable to get a setcc as a brcond argument (and this is why brcond try to recreate the setcc in the first place) so we ensure this is done every time by also doing it at the setcc level when the only user is a brcond.
Reviewers: spatel, hfinkel, niravd, craig.topper
Subscribers: nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D41235
llvm-svn: 325892
isCondCodeLegal internally checked Legal or Custom which is misleading. Though no targets set any cond code action to Custom today.
So I've renamed isCondCodeLegal to isCondCodeLegalOrCustom and added a real isCondCodeLegal that only checks Legal.
I've changed legalization code to use isCondCodeLegalOrCustom and left things reachable via DAG combine as isCondCodeLegal. I've also changed some places that called getCondCodeAction and compared to Legal to just use isCondCodeLegal.
I'm looking at trying to keep SETCC all the way to isel for the AVX512 integer comparisons and I suspect I'll need to make some condition codes Custom to stop DAG combine from changing things post LegalizeOps. Prior to this only Expand stopped DAG combine, but that causes LegalizeOps to try to swap operands or invert rather than calling our Custom handler.
Differential Revision: https://reviews.llvm.org/D43607
llvm-svn: 325829
This patch reverts r325440 and r325438 because it triggers an
assertion in SelectionDAGBuilder.cpp. Also having debug enabled
may unintentionally affect code-gen. The patch is reverted until
we find a better solution.
llvm-svn: 325825
This allows us to improve vector constant matching in more DAG code (backends, TargetLowering etc.).
Differential Revision: https://reviews.llvm.org/D43466
llvm-svn: 325815
We looked through a BITCAST, but the bitcast might be a from a scalar type rather than a vector.
I don't have a test case. I stumbled onto it while prototyping another change that isn't ready yet.
llvm-svn: 325750
This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together.
This supports things like
(setcc uge X, 0) -> true
Differential Revision: https://reviews.llvm.org/D43489
llvm-svn: 325627
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.
This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.
Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplfiySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.
Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.
Fixes PR36250.
Differential Revision: https://reviews.llvm.org/D43449
llvm-svn: 325602
ExpandUINT_TO_FLOAT can accept vXi32 or vXi64 inputs, so we need to use a uint64_t shift to generate the 2^(BW/2) constant.
No test case unfortunately as no upstream target uses this, but its affecting a downstream target.
llvm-svn: 325578
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the LO and HI constants.
ComputeKnownBits equivalent of D43338.
Differential Revision: https://reviews.llvm.org/D43463
llvm-svn: 325521
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits will be at least the minimum of the LO and HI constants.
I haven't bothered with the UMIN/UMAX equivalent as (1) we don't have any current use cases and (2) I wonder if we'd be better off immediately falling back for ComputeKnownBits for UMIN/UMAX which already has optimization patterns useful for unsigned cases.
Differential Revision: https://reviews.llvm.org/D43338
llvm-svn: 325450
Summary:
https://llvm.org/PR36263 shows that when compiling at -O0 a dbg.value()
instruction (that remains from an original dbg.declare()) is dropped
by FastISel. Since FastISel selects instructions by iterating a basic
block backwards, it drops the dbg.value if one of its operands is not
yet instantiated by a previously selected instruction.
Instead of calling 'lookUpRegForValue()' we can call 'getRegForValue()'
instead that will insert a placeholder for the operand to be filled in
when continuing the instruction selection.
Reviewers: aprantl, dblaikie, probinson
Reviewed By: aprantl
Subscribers: llvm-commits, dstenb, JDevlieghere
Differential Revision: https://reviews.llvm.org/D43386
llvm-svn: 325438
Sadly, r324359 caused at least PR36312. There is a patch out for review
but it seems to be taking a bit and we've already had these crashers in
tree for too long. We're hitting this PR in real code now and are
blocked on shipping new compilers as a consequence so I'm reverting us
back to green.
Sorry for the churn due to the stacked changes that I had to revert. =/
llvm-svn: 325420
Based off the DemandedElts mask the and UNDEF elements returned from the SimplifyDemandedVectorElts calls to the shuffle operands, we can attempt to simplify the shuffle mask.
I had to be very conservative here as accepting post-legalized shuffle masks could cause problems for targets that legalize UNDEF mask elements back to inrange values (PowerPC), similarly combining to identity shuffle masks could cause too much UNDEF information to disappear for later combines.
llvm-svn: 325354
Summary:
Currently when expanding a SETCC node into a SELECT_CC, LLVM uses
an incorrect type for determining BooleanContent of the result. This
patch fixes the issue.
Fixes PR36079.
Reviewers: rogfer01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43282
llvm-svn: 325325
Same for the sign extend case.
Currently we check for multiple uses on the binop. Then we call ExtendUsesToFormExtLoad to capture SetCCs that use the load. So we only end up finding any setccs when the and has additional uses and the load is used by a setcc. I don't think the and having multiple uses is relevant here. I think we should only be checking for the load having multiple uses.
This changes an NVPTX test because we now find that the load has a second use by a truncate, but ExtendUsesToFormExtLoad only looks at setccs it can extend. All other operations just check isTruncateFree. Maybe we should allow widening of an existing truncate even if its not free?
Differential Revision: https://reviews.llvm.org/D43063
llvm-svn: 325289
This is mainly a move of simplifyShuffleOperands from DAGCombiner::visitVECTOR_SHUFFLE to create a more general purpose TargetLowering::SimplifyDemandedVectorElts implementation.
Further features can be moved/added in future patches.
Differential Revision: https://reviews.llvm.org/D42896
llvm-svn: 325232
When creating high MachineMemOperand for MSTORE/MLOAD we supply
it with the original PointerInfo, while the pointer itself had been incremented.
The patch adds the proper offset to the PointerInfo.
llvm-svn: 325135
Making a width of GEP Index, which is used for address calculation, to be one of the pointer properties in the Data Layout.
p[address space]:size:memory_size:alignment:pref_alignment:index_size_in_bits.
The index size parameter is optional, if not specified, it is equal to the pointer size.
Till now, the InstCombiner normalized GEPs and extended the Index operand to the pointer width.
It works fine if you can convert pointer to integer for address calculation and all registered targets do this.
But some ISAs have very restricted instruction set for the pointer calculation. During discussions were desided to retrieve information for GEP index from the Data Layout.
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120416.html
I added an interface to the Data Layout and I changed the InstCombiner and some other passes to take the Index width into account.
This change does not affect any in-tree target. I added tests to cover data layouts with explicitly specified index size.
Differential Revision: https://reviews.llvm.org/D42123
llvm-svn: 325102
Summary:
If the and has an additional use we shouldn't invert it. That creates an additional instruction.
While there add a one use check to the transform above that looked similar.
Reviewers: spatel, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43225
llvm-svn: 325019
The bug has been lying dormant, but apparently was never exposed, until
after rL324941 because we didn't return the correct result
for shifts with undef operands.
llvm-svn: 325010
This started by noticing that scalar and vector types were producing different results with div ops in PR36305:
https://bugs.llvm.org/show_bug.cgi?id=36305
...but the problem is bigger. I couldn't keep it straight without a table, so I'm attaching that as a PDF to
the review. The x86 tests in undef-ops.ll correspond to that table.
Green means that instsimplify and the DAG agree on the result for all types.
Red means the DAG was returning undef when IR was not.
Yellow means the DAG was returning a non-undef result when IR returned undef.
This patch assumes that we're currently doing the right thing in IR.
Note: I couldn't find any problems with lowering vector constants as the code comments were warning,
but those comments were written long ago in rL36413 .
Differential Revision: https://reviews.llvm.org/D43141
llvm-svn: 324941
Armv8.1-A added an atomic load-clear instruction (which performs bitwise
and with the complement of it's operand), but not a load-and
instruction. Our current code-generation for atomic load-and always
inserts an MVN instruction to invert its argument, even if it could be
folded into a constant or another instruction.
This adds lowering early in selection DAG to convert a load-and
operation into an xor with -1 and a load-clear, allowing the normal DAG
optimisations to work on it.
To do this, I've had to add a new ISD opcode, ATOMIC_LOAD_CLR. I don't
see any easy way to do this with an AArch64-specific ISD node, because
the code-generation for atomic operations assumes the SDNodes are of
type AtomicSDNode.
I've left the old tablegen patterns in because they are still needed for
global isel.
Differential revision: https://reviews.llvm.org/D42478
llvm-svn: 324908
This reverses instcombine's demanded bits' transform which always tries to clear bits in constants.
As noted in PR35792 and shown in the test diffs:
https://bugs.llvm.org/show_bug.cgi?id=35792
...we can do better in codegen by trying to form -1. The x86 sub test shows a missed opportunity.
I did investigate changing instcombine's behavior, but it would be more work to change
canonicalization in IR. Clearing bits / shrinking constants can allow killing instructions,
so we'd have to figure out how to not regress those cases.
Differential Revision: https://reviews.llvm.org/D42986
llvm-svn: 324839
This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT.
Differential Revision: https://reviews.llvm.org/D43014
llvm-svn: 324837
SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently.
llvm-svn: 324832
When adding operands to machine instructions in case of
RegisterSDNodes, generate a COPY node in case the register class
does not match the one in the instruction definition.
Differental Revision: https://reviews.llvm.org/D35561
llvm-svn: 324733
Many in SimplifySetCC and FoldSetCC try to create true or false constants. Some of them query getBooleanContents to figure out whether to use all ones or just 1 for true. But many places do not check and just use 1 without ensuring the VT has an i1 scalar type. Note sure if those places only trigger before type legalization so they only see an i1
type?
To cleanup the inconsistency and reduce some duplicated code, this patch adds a getBoolConstant method to SelectionDAG that takes are of querying getBooleanContents and doing the right thing.
Differential Revision: https://reviews.llvm.org/D43037
llvm-svn: 324634
We're passing the binary op that uses the load instead of the load.
Noticed by inspection. Not sure how to test this because this just prevents the introduction of an extend that will later be truncated and will probably be combined out.
llvm-svn: 324568