BtVer2 - Fixes schedules for (V)CVTPS2PD instructions
A lot of the Intel models still have too many InstRW overrides for these new classes - this needs cleaning up but I wanted to get the classes in first
llvm-svn: 332376
The test case added in r332265 had incomplete livein information which
was caught by the EXPENSIVE_CHECKS bot. Fix the livein information and
add -verify-machineinstrs to the test case.
llvm-svn: 332367
This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups.
It gets us most of the FMF functionality that we want without adding any state bits to the flags. It
also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch.
It should provide a superset of the functionality from D46563 - the extra tests show propagation and
codegen diffs for fcmp, vecreduce, and FP libcalls.
The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last
node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently,
so that shouldn't make any difference.
Differential Revision: https://reviews.llvm.org/D46854
llvm-svn: 332358
Btver2 - VCVTPH2PSYrm needs to double pump the AGU
Broadwell - missing VCVTPS2PH*mr stores extra latency
Allows us to remove the WriteCvtF2FSt conversion store class
llvm-svn: 332357
Summary:
New unsigned saturation downconvert patterns detection was implemented in
X86 Codegen:
(truncate (smin (smax (x, C1), C2)) to dest_type),
where C1 >= 0 and C2 is unsigned max of destination type.
(truncate (smax (smin (x, C2), C1)) to dest_type)
where C1 >= 0, C2 is unsigned max of destination type and C1 <= C2.
These two patterns are equivalent to:
(truncate (umin (smax(x, C1), unsigned_max_of_dest_type)) to dest_type)
Reviewers: RKSimon
Subscribers: llvm-commits, a.elovikov
Differential Revision: https://reviews.llvm.org/D45315
llvm-svn: 332336
Change relocation output so that relocation information follows
individual instructions rather than clustering them at the end
of packets.
This change required shifting block of code but the actual change
is in HexagonPrettyPrinter's PrintInst.
Differential Revision: https://reviews.llvm.org/D46728
llvm-svn: 332283
Summary:
The BranchFolding pass is currently missing opportunities to hoist
common code if the hoisted-to block contains a single conditional branch
that has register uses. This occurs somewhat frequently on AArch64 with
CBZ/TBZ opcodes.
This change also eliminates some code differences when debug info is
present since the presence of e.g. DBG_VALUE instructions in the
hoisted-to block can enable hoisting that wouldn't have occurred without
them.
Reviewers: MatzeB, rnk, kparzysz, twoh, aprantl, javed.absar
Subscribers: kristof.beyls, JDevlieghere, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46324
llvm-svn: 332265
When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.
Differential revision: https://reviews.llvm.org/D46655
llvm-svn: 332251
We cannot query this attribute from a subtarget given a machine function.
At this point attribute itself is already unavailable and can only be
obtained through MFI.
Differential Revision: https://reviews.llvm.org/D46781
llvm-svn: 332166
Summary:
We have no logic to promote alloca to vector for an AddrSpaceCast instruction.
Reviewer:
arsenm
Differential Revision:
https://reviews.llvm.org/D45993
llvm-svn: 332147
Remove a useless SwitchSection which also causes compilation failure
when IR contains comdat.
The SwitchSection is useless because the current section is already
correct text section for the function therefore no need to switch.
It causes compilation failure for comdat because functions with comdat
has specific text section, not the default .text section.
Since HIP uses comdat, this bug caused failures for HIP.
Differential Revision: https://reviews.llvm.org/D46770
llvm-svn: 332137
ExtendSetCCUses updates SETCC nodes which use a load (OriginalLoad) to
reflect a simplification to the load (ExtLoad).
Based on my reading, ExtendSetCCUses may create new nodes to extend a
constant attached to a SETCC. It also creates fresh SETCC nodes which
refer to any updated operands.
ISTM that the location applied to the new constant and SETCC nodes
should be the same as the location of the ExtLoad.
This was suggested by Adrian in https://reviews.llvm.org/D45995.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46216
llvm-svn: 332119
This teaches tryToFoldExtOfLoad to set the right location on a
newly-created extload. With that in place, the logic for performing a
certain ([s|z]ext (load ...)) combine becomes identical for sexts and
zexts, and we can get rid of one copy of the logic.
The test case churn is due to dependencies on IROrders inherited from
the wrong SDLoc.
Part of: llvm.org/PR37262
Differential Revision: https://reviews.llvm.org/D46158
llvm-svn: 332118
These directives allow the 'C' (compressed) extension to be enabled/disabled
within a single file.
Differential Revision: https://reviews.llvm.org/D45864
Patch by Kito Cheng
llvm-svn: 332107
Summary:
performPostLD1Combine in AArch64ISelLowering looks for vector
insert_vector_elt of a loaded value which it can optimize into a single
LD1LANE instruction. The code checking for the pattern was not checking
if the lane index was a constant which could cause two problems:
- an assert when lowering the LD1LANE ISD node since it assumes an
constant operand
- an assert in isel if the lane index value depends on the
post-incremented base register
Both of these issues are avoided by simply checking that the lane index
is a constant.
Fixes bug 35822.
Reviewers: t.p.northover, javed.absar
Subscribers: rengolin, kristof.beyls, mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D46591
llvm-svn: 332103
Clang's codegen now uses 128-bit masked load/store intrinsics in IR. The backend will widen to 512-bits on AVX512F targets.
So this patch adds patterns to detect codegen's widening and patterns for AVX512VL that don't get widened.
We may be able to drop some of the old patterns, but I leave that for a future patch.
llvm-svn: 332049
Summary: The final -wasm component has been the default for some time now.
Subscribers: jfb, dschuff, jgravelle-google, eraman, aheejin, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D46342
llvm-svn: 332007
With nnan, there's no need for the masked merge / blend
sequence (that probably costs much more than the min/max
instruction).
Somewhere between clang 5.0 and 6.0, we started producing
these intrinsics for fmax()/fmin() in C source instead of
libcalls or fcmp/select. The backend wasn't prepared for
that, so we regressed perf in those cases.
Note: it's possible that other targets have similar problems
as seen here.
Noticed while investigating PR37403 and related bugs:
https://bugs.llvm.org/show_bug.cgi?id=37403
The IR FMF propagation cases still don't work. There's
a proposal that might fix those cases in D46563.
llvm-svn: 331992