The CONCAT_VECTORS case was using the original mask element count to determine how to adjust the broadcast index. But if we looked through a bitcast the original mask size doesn't tell us anything about the concat_vectors.
This patch switchs to using the concat_vectors input element count directly instead.
Differential Revision: https://reviews.llvm.org/D53823
llvm-svn: 345626
The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.
RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.
This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.
Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671
llvm-svn: 345606
Narrowing vector binops came up in the demanded bits discussion in D52912.
I don't think we're going to be able to do this transform in IR as a canonicalization
because of the risk of creating unsupported widths for vector ops, but we already have
a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is
currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression
test diffs).
This is artificially limited to not look through bitcasts because there are so many
test diffs already, but that's marked with a TODO and is a small follow-up.
Differential Revision: https://reviews.llvm.org/D53784
llvm-svn: 345602
Summary:
Because of the D48768, that pattern is always unfolded into pattern d,
thus we had no test coverage.
Reviewers: RKSimon, craig.topper
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53574
llvm-svn: 345583
Similar to FoldCONCAT_VECTORS, this patch adds FoldBUILD_VECTOR to simplify cases that can avoid the creation of the BUILD_VECTOR - if all the operands are UNDEF or if the BUILD_VECTOR simplifies to a copy.
This exposed an assumption in some AMDGPU code that getBuildVector was guaranteed to be a BUILD_VECTOR node that I've tried to handle.
Differential Revision: https://reviews.llvm.org/D53760
llvm-svn: 345578
- Relex hard coded registers and stack frame sizes
- Some test cleanups
- Change phi-dbg.ll to match on mir output after phi elimination instead
of going through the whole codegen pipeline.
This is in preparation for https://reviews.llvm.org/D52010
I'm committing all the test changes upfront that work before and after
independently.
llvm-svn: 345532
The machine verifier was disabled for x86 by default. There are now only
9 tests failing, compared to what previously was between 20 and 30.
This is a good opportunity to file bugs for all the remaining issues,
then explicitly disable the failing tests and enabling the machine
verifier by default.
This allows us to avoid adding new tests that break the verifier.
PR27481
llvm-svn: 345513
Add an intrinsic that takes 2 integers and perform saturation subtraction on
them.
This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.
Differential Revision: https://reviews.llvm.org/D53783
llvm-svn: 345512
This test breaks the X86 MachineVerifier. It looks like the MIR part is
completely useless.
The original author suggests that it can be removed.
Differential Revision: https://reviews.llvm.org/D53767
llvm-svn: 345501
When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction.
Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that.
llvm-svn: 345488
Add vector support to TargetLowering::expandFP_TO_UINT.
This exposes an issue in X86TargetLowering::LowerVSELECT which was assuming that the select mask was the same width as the LHS/RHS ops - as long as the result is a sign splat we can easily sext/trunk this.
llvm-svn: 345473
Adding the baseline tests in a preparatory NFC commit,
so that the actual commit shows the *diff*.
Yes, i'm aware that a few of these codegen-based sched tests
are testing wrong instructions, i will fix that afterwards.
For https://reviews.llvm.org/D52779
llvm-svn: 345462
On GreenDragon, CodeGen/X86/cpus-no-x86_64.ll was still timing out even
after breaking up the original test. I further split off the intel and
AMD cpus which hopefully resolves this.
http://green.lab.llvm.org/green/job/clang-stage2-cmake-RgSan/
llvm-svn: 345438
Summary:
The main challenge here is that X86InstrInfo::AnalyzeBranch doesn't
understand the way we're using a CALL instruction as a branch, so we
can't list the CallTarget MBB as a successor of the entry block. If we
don't list it as a successor, then the AsmPrinter doesn't print a label
for the MBB.
Fix the issue by inserting our own label at the beginning of the call
target block. We can rely on the AsmPrinter to always emit it, even
though the block appears to be unreachable, but address-taken.
Fixes PR38391.
Reviewers: thegameg, chandlerc, echristo
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D53653
llvm-svn: 345426
These promotions add additional bitcasts to the SelectionDAG that can pessimize computeKnownBits/computeNumSignBits. It also seems to interfere with broadcast formation.
This patch removes the promotion and adds isel patterns instead.
The increased table size is more than I would like, but hopefully we can find some canonicalizations or other tricks to start pruning out patterns going forward.
Differential Revision: https://reviews.llvm.org/D53268
llvm-svn: 345408
This is a narrow fix for 1 of the problems mentioned in PR27780:
https://bugs.llvm.org/show_bug.cgi?id=27780
I looked at more general solutions, but it's a mess. We canonicalize shuffle masks
based on the number of elements accessed from each operand, and that's not optional.
If you remove that, we'll crash because we fail to match isel patterns. So I'm
waiting until we're sure that we have blendvb with constant condition and then
commuting based on the load potential. Other cases like blend-with-immediate are
already handled elsewhere, so this is probably not a common problem anyway.
I didn't use "MayFoldLoad" because that checks for one-use and in these cases, we've
screwed that up by creating a temporary PSHUFB using these operands that we're counting
on to be killed later. Undoing that didn't look like a simple task because it's
intertwined with determining if we actually use both operands of the shuffle or not.a
Differential Revision: https://reviews.llvm.org/D53737
llvm-svn: 345390
.debug_loclists is the DWARF 5 version of the .debug_loc.
With that patch, it will be emitted when DWARF 5 is used.
Differential revision: https://reviews.llvm.org/D53365
llvm-svn: 345377
The required-vector-width attribute was only used for backend testing and has never been generated by clang.
I believe clang is now generating min-legal-vector-width for vector uses in user code.
With this I believe passing -mprefer-vector-width=256 to clang should prevent use of zmm registers in the generated assembly unless the user used a 512-bit intrinsic in their source code.
llvm-svn: 345317
FENTRY_CALL is actually not taking any input / output operands. The
machine verifier complains now because the target description says that:
* It needs 1 unknown output
* It needs 1 or more variable inputs
llvm-svn: 345316
KNL is based on a modified Silvermont core so I don't think these features apply. I think the LEA flag is probably also wrong, but I'm less sure as I barely understand the 3 LEA flags we have currently.
Differential Revision: https://reviews.llvm.org/D53671
llvm-svn: 345285
The current state of the llc invocation is:
* Running all the passes from dwarfehprepare to stack coloring
(included)
* It runs it from the LLVM IR included in the file
* It *ADDS* the generated MI from ISel to the MI in the MIR file
* The machine verifier doesn't like it.
Differential Revision: https://reviews.llvm.org/D53698
llvm-svn: 345266
As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well.
Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets.
The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it.
Differential Revision: https://reviews.llvm.org/D53649
llvm-svn: 345256
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53287
llvm-svn: 345250
Instead of using the MOVGOT64r pseudo, use the existing
MO_PIC_BASE_OFFSET support on symbol operands. Now I don't have to
create a "scratch register operand" for the pseudo to use, and the
register allocator can make better decisions.
Fixes some X86 verifier errors tracked in PR27481.
llvm-svn: 345219
It's possible to do a tail call to a stack argument. LLVM already
calculates the right stack offset to call through.
Fixes the sibcall* and musttail* verifier failures tracked at PR27481.
llvm-svn: 345197
Add X86 SimplifyDemandedBitsForTargetNode and use it to simplify PMULDQ/PMULUDQ target nodes.
This enables us to repeatedly simplify the node's arguments after the previous approach had to be reverted due to PR39398.
Differential Revision: https://reviews.llvm.org/D53643
llvm-svn: 345182
This patch brings back the MOV64r0 pseudo instruction for zeroing a 64-bit register. This replaces the SUBREG_TO_REG MOV32r0 sequence we use today. Post register allocation we will rewrite the MOV64r0 to a 32-bit xor with an implicit def of the 64-bit register similar to what we do for the various XMM/YMM/ZMM zeroing pseudos.
My main motivation is to enable the spill optimization in foldMemoryOperandImpl. As we were seeing some code that repeatedly did "xor eax, eax; store eax;" to spill several registers with a new xor for each store. With this optimization enabled we get a store of a 0 immediate instead of an xor. Though I admit the ideal solution would be one xor where there are multiple spills. I don't believe we have a test case that shows this optimization in here. I'll see if I can try to reduce one from the code were looking at.
There's definitely some other machine CSE(and maybe other passes) behavior changes exposed by this patch. So it seems like there might be some other deficiencies in SUBREG_TO_REG handling.
Differential Revision: https://reviews.llvm.org/D52757
llvm-svn: 345165
A lifetime end intrinsic between a tail call and the return should not
prevent the call from being tail call optimized.
Differential Revision: https://reviews.llvm.org/D53519
llvm-svn: 345163