We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332550
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332549
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332548
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332547
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332539
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332538
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332537
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332534
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332533
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we correct FP undef folding.
llvm-svn: 332532
This breaks the code which saves and restores LR, so we can't outline
without doing something more complicated for stack adjustment.
Found by inspection; we get lucky in most cases because getMemOpInfo
only handles STRWpost, not any other pre/post-increment forms. But it
hits a couple of artificial testcases in the tree.
Differential Revision: https://reviews.llvm.org/D46920
llvm-svn: 332529
As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.
Some of this ideally would be done by combineTargetShuffle but its tricky to do as most of the shuffles are sharing inputs.
Differential Revision: https://reviews.llvm.org/D46959
llvm-svn: 332524
The cost computation assumes we do this correctly, but the actual
lowering was wrong.
Differential Revision: https://reviews.llvm.org/D46923
llvm-svn: 332514
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332501
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332500
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332499
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.
Fixes PR3163.
Differential Revision: https://reviews.llvm.org/D43441
llvm-svn: 332498
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482
We currently handle all aggregates by creating one large LLT, and letting the
legalizer deal with splitting them up. However using this approach means that
we can't support big endian code correctly.
This patch changes the way that the IRTranslator deals with aggregate values,
by splitting them up into their constituent element values. To do this, parts
of the translator need to be modified to deal with multiple VRegs for a single
Value.
A new Value to VReg mapper is introduced to help keep compile time under
control, currently there is no measurable impact on CTMark despite the extra
code being generated in some cases.
Patch is based on the original work of Tim Northover.
Differential Revision: https://reviews.llvm.org/D46018
llvm-svn: 332449
Add support for this target hook, covering MIPS, microMIPS and MIPSR6, along
with some tests. Also add missing getOppositeBranchOpc() cases exposed by the
tests.
Reviewers: atanasyan, abeserminji, smaksimovic
Differential Revision: https://reviews.llvm.org/D46794
llvm-svn: 332446
This patch re-introduces the "S" inline assembler constraint. This matches
an absolute symbolic address or a label reference. The primary use case is
asm("adrp %0, %1\n\t"
"add %0, %0, :lo12:%1" : "=r"(addr) : "S"(&var));
I say re-introduces as it seems like "S" was implemented in the original
AArch64 backend, but it looks like it wasn't carried forward to the merged
backend. The original implementation had A and L modifiers that could be
used to print ":lo12:" to the string. It looks like gcc doesn't use these
and :lo12: is expected to be written in the inline assembly string so I've
not implemented A and L. Clang already supports the S modifier.
Fixes PR37180
Differential Revision: https://reviews.llvm.org/D46745
llvm-svn: 332444
Summary:
SelectionDAGLegalize::ExpandNode() inserts an ISD::MUL when lowering a
BR_JT opcode. While many backends optimize this multiply into a shift, e.g.
the MIPS backend currently always lowers this into a sequence of
load-immediate+multiply+mflo in MipsSETargetLowering::lowerMulDiv().
I initially changed the multiply to a shift in the MIPS backend but it
turns out that would not have handled the MIPSR6 case and was a lot more
code than doing it in LegalizeDAG.
I believe performing this simple optimization in LegalizeDAG instead of
each individual backend is the better solution since this also fixes other
backeds such as MSP430 which calls the multiply runtime function
__mspabi_mpyi without this patch.
Reviewers: sdardis, atanasyan, pftbest, asl
Reviewed By: sdardis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45760
llvm-svn: 332439
It doesn't matter much this late in the pipeline, but one place that
does check for it is the function alignment code.
Differential Revision: https://reviews.llvm.org/D46373
llvm-svn: 332415
It is legal for the type passed to isLegalAddressingMode to be
unsized or, more specifically, VoidTy. In this case, we must
check the legality of load / stores for all legal types. Directly
trying to call getTypeStoreSize is incorrect, and leads to breakage
in e.g. Loop Strength Reduction. This change guards against that
behaviour.
Differential Revision: https://reviews.llvm.org/D40405
llvm-svn: 332409
When storing the 0th lane of a vector, use a simpler and usually more
efficient scalar store instead. In this case, also using the unscaled
offset.
Differential revision: https://reviews.llvm.org/D46762
llvm-svn: 332394
specially handle SETB_C* pseudo instructions.
Summary:
While the logic here is somewhat similar to the arithmetic lowering, it
is different enough that it made sense to have its own function.
I actually tried a bunch of different optimizations here and none worked
well so I gave up and just always do the arithmetic based lowering.
Looking at code from the PR test case, we actually pessimize a bunch of
code when generating these. Because SETB_C* pseudo instructions clobber
EFLAGS, we end up creating a bunch of copies of EFLAGS to feed multiple
SETB_C* pseudos from a single set of EFLAGS. This in turn causes the
lowering code to ruin all the clever code generation that SETB_C* was
hoping to achieve. None of this is needed. Whenever we're generating
multiple SETB_C* instructions from a single set of EFLAGS we should
instead generate a single maximally wide one and extract subregs for all
the different desired widths. That would result in substantially better
code generation. But this patch doesn't attempt to address that.
The test case from the PR is included as well as more directed testing
of the specific lowering pattern used for these pseudos.
Reviewers: craig.topper
Subscribers: sanjoy, mcrosier, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46799
llvm-svn: 332389
BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
The test case added in r332265 had incomplete livein information which
was caught by the EXPENSIVE_CHECKS bot. Fix the livein information and
add -verify-machineinstrs to the test case.
llvm-svn: 332367
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups.
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.
It should provide a superset of the functionality from D46563 - the extra tests show propagation and
codegen diffs for fcmp, vecreduce, and FP libcalls.
The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently,
so that shouldn't make any difference.
Differential Revision: https://reviews.llvm.org/D46854
llvm-svn: 332358
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:
(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.
(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:
(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)
Reviewers: RKSimon
Subscribers: llvm-commits, a.elovikov
Differential Revision: https://reviews.llvm.org/D45315
llvm-svn: 332336
Change relocation output so that relocation information follows
individual instructions rather than clustering them at the end
of packets.
This change required shifting block of code but the actual change
is in HexagonPrettyPrinter's PrintInst.
Differential Revision: https://reviews.llvm.org/D46728
llvm-svn: 332283
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses. This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.
This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.
Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar
Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46324
llvm-svn: 332265
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.
Differential revision: https://reviews.llvm.org/D46655
llvm-svn: 332251
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.
Differential Revision: https://reviews.llvm.org/D46781
llvm-svn: 332166
Summary:
We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
Reviewer:
arsenm
Differential Revision:
https://reviews.llvm.org/D45993
llvm-svn: 332147
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.
The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.
It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.
Since HIP uses comdat, this bug caused failures for HIP.
Differential Revision: https://reviews.llvm.org/D46770
llvm-svn: 332137
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).
Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.
ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.
This was suggested by Adrian in https://reviews.llvm.org/D45995.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46216
llvm-svn: 332119
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.
The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46158
llvm-svn: 332118
These directives allow the 'C' (compressed) extension to be enabled/disabled
within a single file.
Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng
llvm-svn: 332107
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction. The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:
- an assert when lowering the LD1LANE ISD node since it assumes an
constant operand
- an assert in isel if the lane index value depends on the
post-incremented base register
Both of these issues are avoided by simply checking that the lane index
is a constant.
Fixes bug 35822.
Reviewers: t.p.northover, javed.absar
Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46591
llvm-svn: 332103
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
Summary: The final -wasm component has been the default for some time now.
Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D46342
llvm-svn: 332007
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).
Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.
Note: it's possible that other targets have similar problems
as seen here.
Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403
The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.
llvm-svn: 331992
Clang 6.0 was updated to create these intrinsics rather than
libcalls or fcmp/select, but the backend wasn't prepared to
handle that optimally.
This bug is not the primary reason for PR37403:
https://bugs.llvm.org/show_bug.cgi?id=37403
...but it's probably more important for x86 perf.
llvm-svn: 331988
Summary:
The combine in rebuildSetCC may be combined to another
node leaving our references stale. Keep a handle on
it to avoid stale references.
Fixes PR36602.
Reviewers: dbabokin, RKSimon, eli.friedman, davide
Subscribers: hiraditya, uabelho, JesperAntonsson, qcolombet, llvm-commits
Differential Revision: https://reviews.llvm.org/D46404
llvm-svn: 331985
Const/local/shared address spaces are all < 4GB and we can always use
32-bit pointers to access them. This has substantial performance impact
on kernels that uses shared memory for intermediary results.
The feature is disabled by default.
Differential Revision: https://reviews.llvm.org/D46147
llvm-svn: 331941
The second source operand of G_SHL, G_ASHR, and G_LSHR must preserve its
value as a (small) unsigned integer, therefore its incorrect to widen it
in any way but by zero extending it.
G_SHL was using G_ANYEXT and G_ASHR - G_SEXT (which is correct for their
destination and first source operands, but not the "number of bits to
shift" operand).
Generally, shifts aren't as similar to regular binary operations as it
might seem, for instance, they aren't commutative nor associative and
the second source operand usually requires a special treatment.
Reviewers: bogner, javed.absar, aivchenk, rovka
Reviewed By: bogner
Subscribers: igorb, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D46413
llvm-svn: 331926
If a multiply is truncated, SimplifyDemandedBits
sometimes turns a zero_extend of the inputs into an
any_extend, which makes the known bits computation unhelpful.
Ignore these and compute known bits for the underlying value,
since we insert the correct extend type after.
llvm-svn: 331919
This is an extension of an existing combine to reduce wider
shls if the result fits in the final result type. This
introduces the same combine, but reduces the shift to a middle
sized type to avoid the slow 64-bit shift.
llvm-svn: 331916
If the truncate is only accessing the first element of the vector,
we can use the original source value.
This helps with some combine ordering issues after operations are
lowered to integer operations between bitcasts of build_vector.
In particular it stops unnecessarily materializing the unused
top half of a vector in some cases.
llvm-svn: 331909