This adds partial instruction selection support for llvm.aarch64.stlxr. It also
factors out selection for G_INTRINSIC_W_SIDE_EFFECTS into its own function. The
new function removes the restriction that the intrinsic ID on the
G_INTRINSIC_W_SIDE_EFFECTS be on operand 0.
Also add a test, and add a GISel line to arm64-ldxr-stxr.ll.
Differential Revision: https://reviews.llvm.org/D60100
llvm-svn: 357518
Both LLVM 8.0.0 and current trunk fail to compile on Solaris with GCC 8.1.0:
/vol/llvm/src/llvm/dist/utils/FileCheck/FileCheck.cpp: In function ‘void DumpAnnotatedInput(llvm::raw_ostream&, const llvm::FileCheckRequest&, llvm::StringRef, std::vector<InputAnnotation>&, unsigned int)’:
/vol/llvm/src/llvm/dist/utils/FileCheck/FileCheck.cpp:408:41: error: call of overloaded ‘log10(unsigned int&)’ is ambiguous
unsigned LineNoWidth = log10(LineCount) + 1;
^
In file included from /vol/gcc-8/lib/gcc/i386-pc-solaris2.11/8.1.0/include-fixed/math.h:24,
from /vol/gcc-8/include/c++/8.1.0/cmath:45,
from /vol/llvm/src/llvm/dist/include/llvm-c/DataTypes.h:28,
from /vol/llvm/src/llvm/dist/include/llvm/Support/DataTypes.h:16,
from /vol/llvm/src/llvm/dist/include/llvm/ADT/Hashing.h:47,
from /vol/llvm/src/llvm/dist/include/llvm/ADT/ArrayRef.h:12,
from /vol/llvm/src/llvm/dist/include/llvm/Support/CommandLine.h:22,
from /vol/llvm/src/llvm/dist/utils/FileCheck/FileCheck.cpp:18:
/vol/gcc-8/lib/gcc/i386-pc-solaris2.11/8.1.0/include-fixed/iso/math_iso.h:209:21: note: candidate: ‘long double std::log10(long double)’
inline long double log10(long double __X) { return __log10l(__X); }
^~~~~
/vol/gcc-8/lib/gcc/i386-pc-solaris2.11/8.1.0/include-fixed/iso/math_iso.h:170:15: note: candidate: ‘float std::log10(float)’
inline float log10(float __X) { return __log10f(__X); }
^~~~~
/vol/gcc-8/lib/gcc/i386-pc-solaris2.11/8.1.0/include-fixed/iso/math_iso.h:70:15: note: candidate: ‘double std::log10(double)’
extern double log10 __P((double));
^~~~~
Fixed by using std::log10 instead, which allowed the compilation on i386-pc-solaris2.11
and sparc-sun-solaris2.11 to continue.
Differential Revision: https://reviews.llvm.org/D60043
llvm-svn: 357509
Set the correct debug location on instructions which load arguments in
preparation for a call to an arg-promoted function.
This prevents location cascade from misattributing the line/scope of one
of these loads to the location of the instruction preceding the call.
Differential Revision: https://reviews.llvm.org/D60113
llvm-svn: 357500
Bug: https://bugs.llvm.org/show_bug.cgi?id=41180
In the bug test case the debug location was missing for the cmp instruction in
the "middle block" BB. This patch fixes the bug by copying the debug location
from the cmp of the scalar loop's terminator branch, if it exists.
The patch also fixes the debug location on the subsequent branch instruction.
It was previously using the location of the of the original loop's pre-header
block terminator. Both of these instructions will now map to the source line of
the conditional branch in the original loop.
A regression test has been added that covers these issues.
Patch by Orlando Cazalet-Hyams!
Differential Revision: https://reviews.llvm.org/D59944
llvm-svn: 357499
Did experiments on power 9 machine, checked the outputs for NaN & Infinity+
cases with corresponding DCMX bit set. Confirmed the DCMX mask bit for NaN and
infinity+ are reversed.
This patch fixes the issue.
Patch by Victor Huang.
Differential Revision: https://reviews.llvm.org/D59384
llvm-svn: 357494
The code was failing to actually check for the presence of the call to widenable_condition. The whole point of specifying the widenable_condition intrinsic was allowing widening transforms. A normal branch is not widenable. A normal branch leading to a deopt is not widenable (in general).
I added a test case via LoopPredication, but GuardWidening has an analogous bug. Those are the only two passes actually using this utility just yet. Noticed while working on LoopPredication for non-widenable branches; POC in D60111.
llvm-svn: 357493
Summary:
Some flags accepted by --set-section-flags and --rename-section can change a SHT_NOBITS section to a SHT_PROGBITS section. Note that none of them can change a SHT_PROGBITS to SHT_NOBITS.
The full list (found via experimentation of individually setting each flag) that does this is: contents, load, noload, code, data, rom, and debug.
This was found by testing llvm-objcopy with the gnu binutils test suite, specifically this test case: https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob;f=binutils/testsuite/binutils-all/copy-1.d;h=f2b0d9e90df738c2891b4d5c7b62f62894b556ca;hb=HEAD
Reviewers: jhenderson, grimar, jakehehrlich, alexshap, espindola
Reviewed By: jhenderson
Subscribers: emaste, arichardson, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59958
llvm-svn: 357492
When we're cross-compiling, build and use a native llvm-nm instead of
attempting to use the one from the target's build tree.
A nice follow-up would be to add a cache variable to allow specifying a
path to an external native llvm-nm instead of building one ourselves,
similar to LLVM_TABLEGEN and LLVM_CONFIG_PATH.
Differential Revision: https://reviews.llvm.org/D60025
llvm-svn: 357487
Instead of duplicating functionality for building native versions of
tblgen and llvm-config, add a function to set up a native tool build.
This will also be used for llvm-nm in a follow-up.
This should be NFC for tblgen, besides the slightly different COMMENT
for the custom command (it'll display the tablegen target name instead
of always saying TableGen). For the native llvm-config, it's a behavior
change in that we'll use llvm_ExternalProject_BuildCmd instead of
constructing the build command manually, always build in Release, and
reference the correct binary path for multi-config generators. I believe
all of these changes to be bug fixes.
Differential Revision: https://reviews.llvm.org/D60024
llvm-svn: 357486
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60080
llvm-svn: 357485
Summary: It is possible that multiple indirect call targets have been promoted for a single callsite from the profiled binary. Current implementation repeats promotion for all these targets as far as the callsite itself is hot (the callsite is assumed to be hot if any one of these targets was "hot" during the profiling). However, even when one of the ICPed target is hot other targets may not, and we should not repeat promotion for "cold" targets.
Reviewers: danielcdh, wmi
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59940
llvm-svn: 357484
Summary:
When inserting an `unreachable` after a noreturn call, we must ensure
that it's not a musttail call to avoid breaking the IR invariants for
musttail calls.
Reviewers: fedor.sergeev, majnemer
Reviewed By: majnemer
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60079
llvm-svn: 357483
For shift and rotate instructions that only use the last 6 bits of the shift
amount, a shift amount of (x*64-s) can be substituted with (-s). This saves
one instruction and a register:
lhi %r1, 64
sr %r1, %r3
sllg %r2, %r2, 0(%r1)
=>
lcr %r1, %r3
sllg %r2, %r2, 0(%r1)
Review: Ulrich Weigand
llvm-svn: 357481
`StoreInst::getValueOperand` is identical to `getOperand(0)`, so the call to
`getOperand(0)` can be replaced. Further, `SI->getValueOperand` is redundantly
called just a few lines down, despite its return value being stored in variable
`DV`. No functional change.
llvm-svn: 357479
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=41293 and
https://bugs.llvm.org/show_bug.cgi?id=41045. llvm-objcopy assumed that
it could always read a section header string table. This isn't the case
when the sections were previously all stripped, and the e_shstrndx field
was set to 0. This patch fixes this. It also fixes a double space in an
error message relating to this issue, and prevents llvm-objcopy from
adding extra space for non-existent section headers, meaning that
--strip-sections on the output of a previous --strip-sections run
produces identical output, simplifying the test.
Reviewed by: rupprecht, grimar
Differential Revision: https://reviews.llvm.org/D59989
llvm-svn: 357475
To disable using of odd floating-point registers (O32 ABI and
-mno-odd-spreg command line option) such registers and their
super-registers added to the set of reserved registers. In general, it
works. But there is at least one problem - in case of enabled machine
verifier pass some floating-point tests failed because live ranges of
register units that are reserved is not empty and verification pass
failed with "Live segment doesn't end at a valid instruction" error
message.
There is D35985 patch which tries to solve the problem by explicit
removing of register units. This solution did not get approval.
I would like to use another approach for prevent using odd floating
point registers - define `AltOrders` and `AltOrderSelect` for MIPS
floating point register classes. Such `AltOrders` contains reduced set
of registers. At first glance, such solution does not break any test
cases and allows enabling machine instruction verification for all MIPS
test cases.
Differential Revision: http://reviews.llvm.org/D59799
llvm-svn: 357472
This patch allows symbols appended with @plt to parse and assemble with the
R_RISCV_CALL_PLT relocation.
Differential Revision: https://reviews.llvm.org/D55335
Patch by Lewis Revill.
llvm-svn: 357470
Summary:
This patch adds the code needed to parse a minidump file into the
MinidumpYAML model, and the necessary glue code so that obj2yaml can
recognise the minidump files and process them.
Reviewers: jhenderson, zturner, clayborg
Subscribers: mgorny, lldb-commits, amccarth, markmentovai, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59634
llvm-svn: 357469
There are various places in LLVM where the definition of StackID is not
properly honoured, for example in PEI where objects with a StackID > 0 are
allocated on the default stack (StackID0). This patch enforces that PEI
only considers allocating objects to StackID 0.
Reviewers: arsenm, thegameg, MatzeB
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D60062
llvm-svn: 357460
The code was previously checking that candidates for sinking had exactly
one use or were a store instruction (which can't have uses). This meant
we could sink call instructions only if they had a use.
That limitation seemed a bit arbitrary, so this patch changes it to
"instruction has zero or one use" which seems more natural and removes
the need to special-case stores.
Differential revision: https://reviews.llvm.org/D59936
llvm-svn: 357452
The code doesn't actually need any of the information about the widenable condition at this level. The only thing we need is to ensure the WC call is the last thing anded in, and even that is a quirk we should really look to remove.
llvm-svn: 357448
There's an existing optimization for x != C, but somehow it was missing
a special case for 0.
While I'm here, also cleaned up the code/comments a bit: the second
value produced by the MERGE_VALUES was actually dead, since a CMOV only
produces one result.
Differential Revision: https://reviews.llvm.org/D59616
llvm-svn: 357437
It's a little tricky to make this issue show up because
prologue/epilogue emission normally likes to push at least two
registers... but it doesn't when lr is force-spilled due to function
length. Not sure if that really makes sense, but I decided not to touch
it for now.
Differential Revision: https://reviews.llvm.org/D59385
llvm-svn: 357436
This improves selection for vector stores into v2s64s. Before we just
scalarized them, but we can just use a STRQui instead.
Differential Revision: https://reviews.llvm.org/D60083
llvm-svn: 357432
This should allow llvm-exegesis to intelligently constrain the rounding mode.
The mask in the encoder shouldn't be necessary any more. We used to allow codegen to use 8-11 for rounding mode and the assembler would use 0-3 to mean the same thing so we masked here and in the printer. Codegen now matches the assembler and the printer was updated, but I forgot to update the encoder.
llvm-svn: 357419
We'd been optimizing the case where the predicate was obviously true, do the same for the false case. Mostly just for completeness sake, but also may improve compile time in loops which will exit through the guard. Such loops are presumed rare in fastpath code, but may be present down untaken paths, so optimizing for them is still useful.
llvm-svn: 357408
Summary:
Previously, we translate llvm.round to PTX cvt.rni, which rounds to the
even interger when the source is equidistant between two integers. This
is not correct as llvm.round should round away from zero. This change
replaces llvm.round with a round away from zero implementation through
target specific custom lowering.
Modify a few affected tests to not check for cvt.rni. Instead, we check
for the use of a few constants used in implementing round. We are also
adding CUDA runnable tests to check for the values produced by
llvm.round to test-suites/External/CUDA.
Reviewers: tra
Subscribers: jholewinski, sanjoy, jlebar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59947
llvm-svn: 357407
LoopPredication was replacing the original condition, but leaving the instructions to compute the old conditions around. This would get cleaned up by other passes of course, but we might as well do it eagerly. That also makes the test output less confusing.
llvm-svn: 357406
I'm about to make some changes to the pass which cause widespread - but uninteresting - test diffs. Prepare the tests for easy updating.
llvm-svn: 357404
As highlighted by tests, if one of the operands is loop variant, but guaranteed to have the same value on all iterations, we have a missed oppurtunity.
llvm-svn: 357403
This change incorporates an effort by Connor Abbot to change how we deal
with WWM operations potentially trashing valid values in inactive lanes.
Previously, the SIFixWWMLiveness pass would work out which registers
were being trashed within WWM regions, and ensure that the register
allocator did not have any values it was depending on resident in those
registers if the WWM section would trash them. This worked perfectly
well, but would cause sometimes severe register pressure when the WWM
section resided before divergent control flow (or at least that is where
I mostly observed it).
This fix instead runs through the WWM sections and pre allocates some
registers for WWM. It then reserves these registers so that the register
allocator cannot use them. This results in a significant register
saving on some WWM shaders I'm working with (130 -> 104 VGPRs, with just
this change!).
Differential Revision: https://reviews.llvm.org/D59295
llvm-svn: 357400
This instruction writes a block of allocation tags
and stores zero to the associated data locations.
It differs from STGM by 1 bit and has the same
arguments.
The specification can be found here:
https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60065
llvm-svn: 357397
This patch replaces the addition of VK_RISCV_CALL in RISCVMCCodeEmitter by
creating the RISCVMCExpr when tail/call are parsed, or in the codegen case
when the callee symbols are created.
This required adding a new CallSymbol operand to allow only adding
VK_RISCV_CALL to tail/call instructions.
This patch will allow further expansion of parsing and codegen to easily
include PLT symbols which must generate the R_RISCV_CALL_PLT relocation.
Differential Revision: https://reviews.llvm.org/D55560
Patch by Lewis Revill.
llvm-svn: 357396
The STGV/LDGV instructions were replaced with
STGM/LDGM. The encodings remain the same but there
is no longer writeback so there are no unpredictable
encodings to check for.
The specfication can be found here:
https://developer.arm.com/docs/ddi0596/c
Differential Revision: https://reviews.llvm.org/D60064
llvm-svn: 357395
This patch adds an implementation of a PC-relative addressing sequence to be
used when -mcmodel=medium is specified. With absolute addressing, a 'medium'
codemodel may cause addresses to be out of range. This is because while
'medium' implies a 2 GiB addressing range, this 2 GiB can be at any offset as
opposed to 'small', which implies the first 2 GiB only.
Note that LLVM/Clang currently specifies code models differently to GCC, where
small and medium imply the same functionality as GCC's medlow and medany
respectively.
Differential Revision: https://reviews.llvm.org/D54143
Patch by Lewis Revill.
llvm-svn: 357393
The latest version of the MTE spec added a system
register 'GMID_EL1'. It contains the block size used
by the LDGM and STGM instructions and is read only.
The specification can be found here:
https://developer.arm.com/docs/ddi0596/c
llvm-svn: 357392
Summary:
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
Reviewers: reames, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60058
llvm-svn: 357389
This reverts commit 75216a6dbcfe5fb55039ef06a07e419fa875f4a5.
I'll recommit with a better commit message with reference to the
phabricator review.
llvm-svn: 357387
This fixes PR41270.
The recursive function evaluateInDifferentElementOrder expects to be called
on a vector Value, so when we call it on a vector GEP's arguments, we must
first check that the argument is indeed a vector.
llvm-svn: 357385
If we have a commutable vector binop with inverted select-shuffles,
we don't care about the order of the operands in each vector lane:
LHS = shuffle V1, V2, <0, 5, 6, 3>
RHS = shuffle V2, V1, <0, 5, 6, 3>
LHS + RHS --> <V1[0]+V2[0], V2[1]+V1[1], V2[2]+V1[2], V1[3]+V2[3]> --> V1 + V2
PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...is currently titled as an SLP enhancement, but at least for the
given example, we can reduce that in instcombine because we are just
eliminating shuffles.
As noted in the TODO, this could be generalized, but I haven't thought
through those patterns completely, so this is limited to what appears
to be always safe.
Differential Revision: https://reviews.llvm.org/D60048
llvm-svn: 357382
Summary:
The missing `<` causes the lld command to override the test file, which fails in
environments marking the test files as readonly.
Reviewers: bkramer
Reviewed By: bkramer
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60060
llvm-svn: 357380
Adds a `seto` pattern expansion. Without it the lowerings of `fcmp one` and
`fcmp ord` would be inefficient due to an unoptimized double negation.
Differential Revision: https://reviews.llvm.org/D59699
llvm-svn: 357378
A pcrel_lo will point to the associated pcrel_hi fixup which in turn points to
the real target. RISCVMCExpr::evaluatePCRelLo will work around this
indirection in order to allow the fixup to be evaluate properly. However, if
relocations are forced (e.g. due to linker relaxation is enabled) then its
evaluation is undesired and will result in a relocation with the wrong target.
This patch modifies evaluatePCRelLo so it will not try to evaluate if the
fixup will be forced as a relocation. A new helper method is added to
RISCVAsmBackend to query this.
Differential Revision: https://reviews.llvm.org/D59686
llvm-svn: 357374
One motivation for making this change is that the lack of using movmsk is likely
a main source of perf difference between clang and gcc on the C-Ray benchmark as
shown here:
https://www.phoronix.com/scan.php?page=article&item=gcc-clang-2019&num=5
...but this change alone isn't enough to solve that problem.
The 'all-of' examples show what is likely the worst case trade-off: we end up with
an extra instruction (or 2 if we count the 'xor' register clearing). The 'any-of'
examples look clearly better using movmsk because we've traded 2 vector instructions
for 2 scalar instructions, and movmsk may have better timing than the generic 'movq'.
If we examine the llvm-mca output for these cases, it appears that even though the
'all-of' movmsk variant looks worse on paper, it would perform better on both
Haswell and Jaguar.
$ llvm-mca -mcpu=haswell no_movmsk.s -timeline
Iterations: 100
Instructions: 400
Total Cycles: 504
Total uOps: 400
Dispatch Width: 4
uOps Per Cycle: 0.79
IPC: 0.79
Block RThroughput: 1.0
$ llvm-mca -mcpu=haswell movmsk.s -timeline
Iterations: 100
Instructions: 600
Total Cycles: 358
Total uOps: 600
Dispatch Width: 4
uOps Per Cycle: 1.68
IPC: 1.68
Block RThroughput: 1.5
$ llvm-mca -mcpu=btver2 no_movmsk.s -timeline
Iterations: 100
Instructions: 400
Total Cycles: 407
Total uOps: 400
Dispatch Width: 2
uOps Per Cycle: 0.98
IPC: 0.98
Block RThroughput: 2.0
$ llvm-mca -mcpu=btver2 movmsk.s -timeline
Iterations: 100
Instructions: 600
Total Cycles: 311
Total uOps: 600
Dispatch Width: 2
uOps Per Cycle: 1.93
IPC: 1.93
Block RThroughput: 3.0
Finally, there may be CPUs where movmsk is horribly slow (old AMD small cores?), but if
that's true, then we're also almost certainly making the wrong transform already for
reductions with >2 elements, so that should be fixed independently.
Differential Revision: https://reviews.llvm.org/D59997
llvm-svn: 357367
In PR41304:
https://bugs.llvm.org/show_bug.cgi?id=41304
...we have a case where we want to fold a binop of select-shuffle (blended) values.
Rather than try to match commuted variants of the pattern, we can canonicalize the
shuffles and check for mask equality with commuted operands.
We don't produce arbitrary shuffle masks in instcombine, but select-shuffles are a
special case that the backend is required to handle because we already canonicalize
vector select to this shuffle form.
So there should be no codegen difference from this change. It's possible that this
improves CSE in IR though.
Differential Revision: https://reviews.llvm.org/D60016
llvm-svn: 357366
Straightforward port of StatepointIRVerifier pass to new Pass Manager framework.
Fix By: skatkov
Reviewed By: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D59825
This is a re-land of r357147/r357148 with LLVM_ENABLE_MODULES build fixed.
Adding IR/SafepointIRVerifier.h into its own module.
llvm-svn: 357361
Negate updates flags like a subtract. We should be able to use the flags from the RMW form of negate when we have (store (X86ISD::SUB 0, load A), A)
Differential Revision: https://reviews.llvm.org/D60007
llvm-svn: 357353
This patch adds support for the RISC-V hard float ABIs, building on top of
rL355771, which added basic target-abi parsing and MC layer support. It also
builds on some re-organisations and expansion of the upstream ABI and calling
convention tests which were recently committed directly upstream.
A number of aspects of the RISC-V float hard float ABIs require frontend
support (e.g. flattening of structs and passing int+fp for fp+fp structs in a
pair of registers), and will be addressed in a Clang patch.
As can be seen from the tests, it would be worthwhile extending
RISCVMergeBaseOffsets to handle constant pool as well as global accesses.
Differential Revision: https://reviews.llvm.org/D59357
llvm-svn: 357352
Fixes PR41316 where the expanded PAVG intrinsic had had one of its ADDs turned into an OR due to its operands having no conflicting bits.
llvm-svn: 357351
vararg.ll previously missed RV64 tests. This patch also prepares for using
vararg.ll to test handling of varargs for the ilp32f/ilp32d/lp64f/lp64d hard
float ABIs. In these ABIs, varargs are passed as in either the ilp32 or lp64
ABI. Due to some slight codegen differences, different check lines are needed
for when RV32D is enabled.
llvm-svn: 357350
Summary:
BTW, STLExtras.h provides llvm::size() which is similar to std::size()
for random access iterators. However, if we prefer qualified
llvm::size(), the member function .size() will be more convenient.
Reviewers: jhenderson, jakehehrlich, rupprecht, grimar, alexshap, espindola
Reviewed By: grimar
Subscribers: emaste, arichardson, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60028
llvm-svn: 357347
Summary:
Linearing the control flow by placing `try`/`end_try` markers can create
mismatches in unwind destinations. This patch resolves these mismatches
by wrapping those instructions with an incorrect unwind destination with
a nested `try`/`catch`/`end_try` and branching to the right destination
within the new catch block.
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, chrib, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D48345
llvm-svn: 357343
Summary:
While this does not change any final output, this will greatly simplify
ixing unwind destination mismatches in CFGStackify (D48345), because we
have to create some new registers there.
Reviewers: dschuff
Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59652
llvm-svn: 357342
The SplitF64 node is used on RV32D to convert an f64 directly to a pair of i32
(necessary as bitcasting to i64 isn't legal). When performed on a ConstantFP,
this will result in a FP load from the constant pool followed by a store to
the stack and two integer loads from the stack (necessary as there is no way
to directly move between f64 FPRs and i32 GPRs on RV32D). It's always cheaper
to just materialise integers for the lo and hi parts of the FP constant, so do
that instead.
llvm-svn: 357341
This change adds hierarchical "time trace" profiling blocks that can be visualized in Chrome, in a "flame chart" style. Each profiling block can have a "detail" string that for example indicates the file being processed, template name being instantiated, function being optimized etc.
This is taken from GitHub PR: https://github.com/aras-p/llvm-project-20170507/pull/2
Patch by Aras Pranckevičius.
Differential Revision: https://reviews.llvm.org/D58675
llvm-svn: 357340
This minimises differences in output when compiling with hardware floating
point support, which will be done in a future patch (to demonstrate the same
vararg calling convention is used).
llvm-svn: 357339
Use $<CONFIG> instead of $<CONFIGURATION>, since the latter has been
deprecated since CMake 3.0, and the former is entirely equivalent.
llvm-svn: 357338
Summary:
Currently we create a routing block to the dispatch block for every
predecessor of every entry. So the total number of routing blocks
created will be (# of preds) * (# of entries). But we don't need to do
this: we need at most 2 routing blocks per loop entry, one for when the
predecessor is inside the loop and one for it is outside the loop. (We
can't merge these into one because this will creates another loop cycle
between blocks inside and blocks outside) This patch fixes this and
creates at most 2 routing blocks per entry.
This also renames variable `Split` to `Routing`, which I think is a bit
clearer.
Reviewers: kripken
Subscribers: sunfish, dschuff, sbc100, jgravelle-google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59462
llvm-svn: 357337
Summary:
On AIX, we can determine whether a filesystem is remote using `mntctl`.
If the information is not found, then claim that the file is remote
(since that is the more restrictive case). Testing for the associated
interface is restored with a modified version of the unit test from
rL295768.
Reviewers: jasonliu, xingxue
Reviewed By: xingxue
Subscribers: jsji, apaprocki, Hahnfeld, zturner, krytarowski, kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58801
llvm-svn: 357333
Summary:
This feature is not actually used for anything in the WebAssembly
backend, but adding it allows users to get it into the target features
sections of their objects, which makes these objects
future-compatible.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jdoerfert, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60013
llvm-svn: 357321
Summary:
This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've
already established that (A smax B) is the best we can do.
Fixes PR41225.
Reviewers: asbirlea
Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60010
llvm-svn: 357320
This adds support for v2s32 vector inserts, and updates the selection +
regbankselect tests for G_INSERT_VECTOR_ELT.
Differential Revision: https://reviews.llvm.org/D59910
llvm-svn: 357318
We need XMM registers to handle varargs with the Win64 ABI. Before we would
silently generate bad code resulting in an assertion failure elsewhere in the
backend.
llvm-svn: 357317
Summary:
This fixes crashes when a BB in which an END_LOOP is to be placed is
unreachable and does not have any predecessors. Fixes PR41307.
Reviewers: dschuff
Subscribers: yurydelendik, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D60004
llvm-svn: 357303
Since this can be set with s_setreg*, it should not be a subtarget
property. Set a default based on the calling convention, and Introduce
a new amdgpu-dx10-clamp attribute to override this if desired.
Also introduce a new amdgpu-ieee attribute to match.
The values need to match to allow inlining. I think it is OK for the
caller's dx10-clamp attribute to override the callee, but there
doesn't appear to be the infrastructure to do this currently without
definining the attribute in the generic Attributes.td.
Eventually the calling convention lowering will need to insert a mode
switch somewhere for these.
llvm-svn: 357302
Document the intended use of the `.amdgcn.next_free_{s,v}gpr` in the
context of multiple kernels and functions.
Differential Revision: https://reviews.llvm.org/D59949
llvm-svn: 357289
Summary:
Nodes that have no uses are eventually pruned when they are selected
from the worklist. Record nodes newly added to the worklist or DAG and
perform pruning after every combine attempt.
Reviewers: efriedma, RKSimon, craig.topper, spatel, jyknight
Reviewed By: jyknight
Subscribers: jdoerfert, jyknight, nemanjai, jvesely, nhaehnle, javed.absar, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58070
llvm-svn: 357283
Summary:
Various SelectionDAG non-combine operations (e.g. the getNode smart
constructor and legalization) may leave dangling nodes by applying
optimizations without fully pruning unused result values. This results
in nodes that are never added to the worklist and therefore can not be
pruned.
Add a node inserter for the combiner to make sure such nodes have the
chance of being pruned. This allows a number of additional peephole
optimizations.
Reviewers: efriedma, RKSimon, craig.topper, jyknight
Reviewed By: jyknight
Subscribers: msearles, jyknight, sdardis, nemanjai, javed.absar, hiraditya, jrtc27, atanasyan, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D58068
llvm-svn: 357279
This may not be NFC, but I'm not sure how to expose any diffs in
tests. In theory, it should be slightly more efficient and possibly
more profitable to do the canonicalizations (which can increase the
undef elements in the mask) ahead of SimplifyDemandedVectorElts().
llvm-svn: 357272
Summary: Support reading notes that don't have a standard note name.
Reviewers: MaskRay
Reviewed By: MaskRay
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59969
llvm-svn: 357271
For the cases where the icmp/fcmp predicate is commutative, use reorderInputsAccordingToOpcode to collect and commute the operands.
This requires a helper to recognise commutativity in both general Instruction and CmpInstr types - the CmpInst::isCommutative doesn't overload the Instruction::isCommutative method for reasons I'm not clear on (maybe because its based on predicate not opcode?!?).
Differential Revision: https://reviews.llvm.org/D59992
llvm-svn: 357266
The `lowerMSASplatImm` function zero-extends `i32` immediates while
building constant. If target type is `i64`, negative immediate loses
the sign. As a result, for example `__builtin_msa_ldi_d(-1)` lowered
to series of instruction loads incorrect value 0xffffffff to the `$w0`
register instead of single `ldi.d $w0, -1` instruction.
The fix zero-extends unsigned immediates and signed-extend signed
immediates.
Differential Revision: http://reviews.llvm.org/D59884
llvm-svn: 357264
Summary:
It doesn't need anything from Analysis::SchedClassCluster class,
and takes ResolvedSchedClass as param, so this seems rather fitting.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59994
llvm-svn: 357263
Summary:
`ResolvedSchedClass` will need to be used outside of `Analysis`
(before `InstructionBenchmarkClustering` even), therefore promote
it into a non-private top-level class, and while there also
move all of the functions that are only called by `ResolvedSchedClass`
into that same new file.
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: mgorny, tschuett, mgrang, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59993
llvm-svn: 357259
After investigating the examples from D59777 targeting an SSE4.1 machine,
it looks like a very different problem due to how we map illegal types (256-bit in these cases).
We're missing a shuffle simplification that maps elements of a vector back to a shuffled operand.
We have a more general version of this transform in DAGCombiner::visitVECTOR_SHUFFLE(), but that
generality means it is limited to patterns with a one-use constraint, and the examples here have
2 uses. We don't need any uses or legality limitations for a simplification (no new value is
created).
It looks like we miss this pattern in IR too.
In one of the zext examples here, we have shuffle masks like this:
Shuf0 = vector_shuffle<0,u,3,7,0,u,3,7>
Shuf = vector_shuffle<4,u,6,7,u,u,u,u>
...so that's moving the high half of the 1st vector into the low half. But the high half of the
1st vector is already identical to the low half.
Differential Revision: https://reviews.llvm.org/D59961
llvm-svn: 357258
Updated to use DenseMap::insert instead of [] operator for insertion, to
avoid a crash caused by epoch checks.
This reverts commit 2b85de4383.
llvm-svn: 357257
This is a sibling to rL357178 that I noticed we'd hit if we chose
an alternate transform in D59818.
%z = zext i8 %x to i32
%dec = add i32 %z, -1
%r = sext i32 %dec to i64
=>
%z2 = zext i8 %x to i64
%r = add i64 %z2, -1
https://rise4fun.com/Alive/kPP
The x86 vector diffs show a slight regression, so there's a chance
that we should limit this and the previous transform to scalars.
But given that we allowed vectors before, I'm matching that behavior
here. We should change both transforms together if that's the right
thing to do.
llvm-svn: 357254
In the example below, we would previously emit two range checks, one for cases
1--3 and one for 4--6. This patch makes us exploit the fact that the
fall-through is unreachable and only one range check is necessary.
switch i32 %i, label %default [
i32 1, label %bb1
i32 2, label %bb1
i32 3, label %bb1
i32 4, label %bb2
i32 5, label %bb2
i32 6, label %bb2
]
default: unreachable
llvm-svn: 357252
This patch adds an experimental stage named MicroOpQueueStage.
MicroOpQueueStage can be used to simulate a hardware micro-op queue (basically,
a decoupling queue between 'decode' and 'dispatch'). Users can specify a queue
size, as well as a optional MaxIPC (which - in the absence of a "Decoders" stage
- can be used to simulate a different throughput from the decoders).
This stage is added to the default pipeline between the EntryStage and the
DispatchStage only if PipelineOption::MicroOpQueue is different than zero. By
default, llvm-mca sets PipelineOption::MicroOpQueue to the value of hidden flag
-micro-op-queue-size.
Throughput from the decoder can be simulated via another hidden flag named
-decoder-throughput. That flag allows us to quickly experiment with different
frontend throughputs. For targets that declare a loop buffer, flag
-decoder-throughput allows users to do multiple runs, each time simulating a
different throughput from the decoders.
This stage can/will be extended in future. For example, we could add a "buffer
full" event to notify bottlenecks caused by backpressure. flag
-decoder-throughput would probably go away if in future we delegate to another
stage (DecoderStage?) the simulation of a (potentially variable) throughput from
the decoders. For now, flag -decoder-throughput is "good enough" to run some
simple experiments.
Differential Revision: https://reviews.llvm.org/D59928
llvm-svn: 357248
The majority of the printRelocation and printDynamicRelocation functions
were identical. This patch factors this all out into a new function.
There are a couple of minor differences to do with printing of symbols
without names, but I think these are harmless, and in some cases a small
improvement.
Reviewed by: grimar, rupprecht, Higuoxing
Differential Revision: https://reviews.llvm.org/D59823
llvm-svn: 357246
Summary:
The diff looks scary but it really isn't:
1. I moved the check for the number of measurements into `SchedClassClusterCentroid::validate()`
2. While there, added a check that we can only have a single inverse throughput measurement. I missed that when adding it initially.
3. In `Analysis::SchedClassCluster::measurementsMatch()` is called with the current LLVM values from schedule class and the values from Centroid.
3.1. The values from centroid we can already get from `SchedClassClusterCentroid::getAsPoint()`.
This isn't 100% a NFC, because previously for inverse throughput we used `min()`. I have asked whether i have done that correctly in
https://reviews.llvm.org/D57647?id=184939#inline-510384 but did not hear back. I think `avg()` should be used too, thus it is a fix.
3.2. Finally, refactor the computation of the LLVM-specified values into `Analysis::SchedClassCluster::getSchedClassPoint()`
I will need that function for [[ https://bugs.llvm.org/show_bug.cgi?id=41275 | PR41275 ]]
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59951
llvm-svn: 357245
We should be able to match elements with the swapped predicate as well - as long as we commute the source operands.
Differential Revision: https://reviews.llvm.org/D59956
llvm-svn: 357243
Summary:
PowerPC64/PowerPC64le supports the builtin function __builtin_setrnd to set the floating point rounding mode. This function will use the least significant two bits of integer argument to set the floating point rounding mode.
double __builtin_setrnd(int mode);
The effective values for mode are:
0 - round to nearest
1 - round to zero
2 - round to +infinity
3 - round to -infinity
Note that the mode argument will modulo 4, so if the int argument is greater than 3, it will only use the least significant two bits of the mode. Namely, builtin_setrnd(102)) is equal to builtin_setrnd(2).
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D59405
llvm-svn: 357241
Some DAG mutations can only be applied to `ScheduleDAGMI`, and have to
internally cast a `ScheduleDAGInstrs` to `ScheduleDAGMI`.
There is nothing actually specific to `ScheduleDAGMI` in `Topo`.
llvm-svn: 357239
The register index can only really be an SGPR. Lie that a VGPR index
is legal, and then rewrite the instruction in a waterfall loop to
handle the index.
llvm-svn: 357235
A shift and add/sub sequence combination is faster in place of a multiply by constant.
Because the cycle or latency of multiply is not huge, we only consider such following
worthy patterns.
```
(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))
```
And the cycles or latency is subtarget-dependent so that we need consider the
subtarget to determine to do or not do such transformation.
Also data type is considered for different cycles or latency to do multiply.
Differential Revision: https://reviews.llvm.org/D58950
llvm-svn: 357233
Only runs the clang-tools-extra lit tests; not yet the unit tests.
Add a build file for clangd-indexer too, since it's needed for
the tests.
Differential Revision: https://reviews.llvm.org/D59955
llvm-svn: 357232
Summary:
It does not currently make sense to use WebAssembly features in some functions
but not others, so this CL adds an IR pass that takes the union of all used
feature sets and applies it to each function in the module. This allows us to
prevent atomics from being lowered away if some function has opted in to using
them. When atomics is not enabled anywhere, we detect whether there exists any
atomic operations or thread local storage that would be stripped and disallow
linking with objects that contain atomics if and only if atomics or tls are
stripped. When atomics is enabled, mark it as used but do not require it of
other objects in the link. These changes allow libraries that do not use atomics
to be built once and linked into both single-threaded and multithreaded
binaries.
Reviewers: aheejin, sbc100, dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59625
llvm-svn: 357226
Essentially echo "" | yaml2obj crashes. This patch attempts to trim whitespace
and determine if the yaml string in the file is empty or not. If the input is
empty then it will not properly print out an error message and return an error
code.
Differential Revision: https://reviews.llvm.org/D59964
A test/tools/yaml2obj/empty.yaml
M tools/yaml2obj/yaml2obj.cpp
llvm-svn: 357219
For the attached test case, unchecked addition of immediate starts and
ends overflows, as they can be arbitrary i64 constants.
Proof: https://rise4fun.com/Alive/Plqc
Reviewers: qcolombet, gilr, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59218
llvm-svn: 357217
For multi-dimensional array like below
int a[2][3];
the previous implementation generates BTF_KIND_ARRAY type
like below:
. element_type: int
. index_type: unsigned int
. number of elements: 6
This is not the best way to represent arrays, esp.,
when converting BTF back to headers and users will see
int a[6];
instead.
This patch generates proper support for multi-dimensional arrays.
For "int a[2][3]", the two BTF_KIND_ARRAY types will be
generated:
Type #n:
. element_type: int
. index_type: unsigned int
. number of elements: 3
Type #(n+1):
. element_type: #n
. index_type: unsigned int
. number of elements: 2
The linux kernel already supports such a multi-dimensional
array representation properly.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D59943
llvm-svn: 357215
This patch has three related fixes to improve float literal lexing:
1. Make AsmLexer::LexDigit handle floats without a decimal point more
consistently.
2. Make AsmLexer::LexFloatLiteral print an error for floats which are
apparently missing an "e".
3. Make APFloat::convertFromString use binutils-compatible exponent
parsing.
Together, this fixes some cases where a float would be incorrectly
rejected, fixes some cases where the compiler would crash, and improves
diagnostics in some cases.
Patch by Brandon Jones.
Differential Revision: https://reviews.llvm.org/D57321
llvm-svn: 357214
Even if the interleaving transform would otherwise be legal, we shouldn't
introduce an interleaved load that is wider than the original load: it might
have undefined behavior.
It might be possible to perform some sort of mask-narrowing transform in
some cases (using a narrower interleaved load, then extending the
results using shufflevectors). But I haven't tried to implement that,
at least for now.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41245 .
Differential Revision: https://reviews.llvm.org/D59954
llvm-svn: 357212
By extending OrderedBB to allow removing and replacing cached
instructions, we can preserve OrderedBBs in DSE easily. This eliminates
one source of quadratic compile time in DSE.
Fixes PR38829.
Reviewers: rnk, efriedma, hfinkel
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D59789
llvm-svn: 357208
If the caller can preserve the OBB, we can avoid recomputing the order
for each getDependency call.
Reviewers: efriedma, rnk, hfinkel
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D59788
llvm-svn: 357206
to unbreak the modular bots and its follow-up commit.
This reverts commit https://reviews.llvm.org/D59825
because it introduced a
fatal error: cyclic dependency in module 'LLVM_intrinsic_gen': LLVM_intrinsic_gen -> LLVM_IR -> LLVM_intrinsic_gen
llvm-svn: 357201
For 64-bit operations we should consider if the immediate can be made to fit
in an unsigned 32-bits immedate. For OR/XOR this allows us to load the immediate
with MOV32ri instead of movabsq. For AND this allows us to fold the immediate.
Differential Revision: https://reviews.llvm.org/D59867
llvm-svn: 357196
This avoids allocating a few KB of heap memory on startup, and instead
allocates these maps lazily. I noticed this while profiling LLD.
llvm-svn: 357192
Enough to build the clangd binaries, but this is still missing build
files for:
- fuzzer
- indexer
- index/dex/dexp
- benchmarks
- xpc
Differential Revision: https://reviews.llvm.org/D59899
llvm-svn: 357182
Summary:
The current git-svnrevert script only works with git-svn repos (e.g. using "git svn find-rev" to find the commit to revert). This adds a similar implementation that works with the llvm git command handler.
Usage:
```
// Revert by svn id
$ git llvm revert r123456
// See what commands would be run instead of actually reverting
$ git llvm revert -n r123456
<full git revert + git commit commands>
// Git commit hash also fine
$ git llvm revert abc123456
// For convenience, the git->svn method can be used directly:
$ git llvm svn-lookup abc123456
r123456
// Push revert upstream (drop the -n when ready)
$ git llvm push -n
```
Regardless of how the command is invoked (with a svn revision or git hash), the message is:
```
Revert [LibFoo] Change Foo implementation
This reverts r123456 (git commit abc123)
```
Reviewers: jyknight, mehdi_amini, jlebar
Reviewed By: jlebar
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59837
llvm-svn: 357180
Ensure Code Object V2 documentation is complete, but always contains a
warning and a link to the equivalent Code Object V3 documentation.
Explicitly indicate that any note records present in a code object that
are not documented must be considered deprecated and ignored.
Differential Revision: https://reviews.llvm.org/D59782
llvm-svn: 357176
This is probably the least important of our movmsk problems, but I'm starting
at the bottom to reduce distractions.
We were creating a select_cc which bypasses the select and bitmask codegen
optimizations that we have now. If we produce a compare+negate instead, we
allow things like neg/sbb carry bit hacks, and in all cases we avoid a cmov.
There's no partial register update danger in these sequences because we always
produce the zero-register xor ahead of the 'set' if needed.
There seems to be a missing fold for sext of a bool bit here:
negl %ecx
movslq %ecx, %rax
...but that's an independent transform.
Differential Revision: https://reviews.llvm.org/D59818
llvm-svn: 357172
Summary:
This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.
See D59688 for context.
Reviewers: andreadb, lebedev.ri
Subscribers: hiraditya, jdoerfert, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59872
llvm-svn: 357171
Based on llvm-exegesis measurements.
Now that llvm-exegesis is ~2 magnitudes faster, and is a bit smarter,
it is now possible to continue cleanup of the scheduler model.
With this, there are no more latency inconsistencies for the
opcodes that produce stable measurements, and only a few inconsistencies
for unstable measurements (MMX_* opcodes, opcodes that llvm-exegesis
measures by chaining - CMP, TEST, BT, SETcc, CVT, MOV, etc.)
llvm-svn: 357169
Summary: When implementing `GNU style` dumper for `.gnu.version` section, we should find symbol version name by `vs_index`.
Reviewers: jhenderson, rupprecht
Reviewed By: rupprecht
Subscribers: arphaman, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59545
llvm-svn: 357164
If scalar truncates are free, attempt to pre-truncate build_vectors source operands.
Only attempt to do this before legalization as we often end up with truncations/extensions during build_vector lowering.
Differential Revision: https://reviews.llvm.org/D59654
llvm-svn: 357161
This is in preparation to a driver patch to add gcc 8's -fsanitize=pointer-compare and -fsanitize=pointer-subtract.
Disabled by default as this is still an experimental feature.
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D59220
llvm-svn: 357157
With this change, the VPlan native path is triggered with the directive:
#pragma clang loop vectorize(enable)
There is no need to specify the vectorize_width(N) clause.
Patch by Francesco Petrogalli <francesco.petrogalli@arm.com>
Differential Revision: https://reviews.llvm.org/D57598
llvm-svn: 357156
G_SELECT uses a 1-bit scalar for the condition, and is currently
implemented with a plain CMPri against 0. This means that values such as
0x1110 are interpreted as true, when instead the higher bits should be
treated as undefined and therefore ignored. Replace the CMPri with a
TSTri against 0x1, which performs an implicit AND, yielding the expected
result.
llvm-svn: 357153
Summary:
This is an alternative to D59539.
Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.
But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.
The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.
The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.
But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.
Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.
This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..
Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].
Reviewers: courbet, gchatelet
Reviewed By: courbet
Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59820
llvm-svn: 357152
Summary:
Add tests for selection across basic block boundary:
* one test containing a buffer load, where part of the offset
computation is placed in the predecessor of the load
* similar test, but containing two buffer loads and shared
computations
Please note that the behaviour being tested will be updated in
a subsequent commit.
This commit was extracted from https://reviews.llvm.org/D59535.
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: jvesely, nhaehnle, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59690
llvm-svn: 357149
These fixup kinds are not explicitly related to the code section. They
are there to signal how to apply the fixup.
Also, a couple of other minor wasm cleanups.
Differential Revision: https://reviews.llvm.org/D59908
llvm-svn: 357145
The issue here is that we actually allow CGSCC passes to mutate IR (and
therefore invalidate analyses) outside of the current SCC. At a minimum,
we need to support mutating parent and ancestor SCCs to support the
ArgumentPromotion pass which rewrites all calls to a function.
However, the analysis invalidation infrastructure is heavily based
around not needing to invalidate the same IR-unit at multiple levels.
With Loop passes for example, they don't invalidate other Loops. So we
need to customize how we handle CGSCC invalidation. Doing this without
gratuitously re-running analyses is even harder. I've avoided most of
these by using an out-of-band preserved set to accumulate the cross-SCC
invalidation, but it still isn't perfect in the case of re-visiting the
same SCC repeatedly *but* it coming off the worklist. Unclear how
important this use case really is, but I wanted to call it out.
Another wrinkle is that in order for this to successfully propagate to
function analyses, we have to make sure we have a proxy from the SCC to
the Function level. That requires pre-creating the necessary proxy.
The motivating test case now works cleanly and is added for
ArgumentPromotion.
Thanks for the review from Philip and Wei!
Differential Revision: https://reviews.llvm.org/D59869
llvm-svn: 357137
The last reference to this function was removed from the ARM
td files in 2015 in rL225266.
Differential Revision: https://reviews.llvm.org/D59868
llvm-svn: 357130
If we know the 2 halves of an oversized zext-in-reg are the same,
don't create those halves independently.
I tried several different approaches to fold this, but it's difficult
to get right during legalization. In the default path, we are creating
a generic shuffle that looks like an unpack high, but it can get
transformed into a different mask (a blend), so it's not
straightforward to match that. If we try to fold after it actually
becomes an X86ISD::UNPCKH node, we can't be sure what the operand node
is - it might be a generic shuffle, or it could be some x86-specific op.
From the test output, we should be doing something like this for SSE4.1
as well, but I'd rather leave that as a follow-up since it involves
changing lowering actions.
Differential Revision: https://reviews.llvm.org/D59777
llvm-svn: 357129
This is not exactly NFC because it should make further combines
of MOVMSK easier to match, but there should be no outward differences
because we have isel patterns in place specifically to allow this. See:
// Also support integer VTs to avoid a int->fp bitcast in the DAG.
llvm-svn: 357128
This makes more sense as a place to initialize these. I don't think runOnMachineFunction was overriden when these cached values were originally created.
llvm-svn: 357123
When lowering a load or store for TypeWidenVector, the type legalizer
would use a single load or store if the associated integer type was legal
or promoted. E.g. it loads a v4i8 as an i32 if i32 is legal/promotable.
(See https://reviews.llvm.org/rL236528 for reference.)
This applies that behaviour to vector types. If the vector type is
TypePromoteInteger, the element type is going to be TypePromoteInteger
as well, which will lead to have a single promoting load rather than N
individual promoting loads. For instance, if we have a v3i1, we would
now have a load of v4i1 instead of 3 loads of i1.
Patch by Guillaume Marques. Thanks!
Differential Revision: https://reviews.llvm.org/D56201
llvm-svn: 357120
Add a test that checks the intersectWith() implementation against
all 4-bit range pairs. The test uses a more explicit way of
calculating the possible intersections, and checks that the right
one is picked out according to the smallest set heuristic.
This is in preparation for introducing intersectWith() variants that
use different heuristics to pick an intersection range, if there are
multiple possibilities.
llvm-svn: 357119
Summary:
Follow-up for D56743.
* Add more "--" in llvm-rc invocations.
* Add llvm-rc to the tools list. This uses full path to llvm-rc in test
RUN lines (llvm-lit -v), making them copy-pasteable.
Reviewers: mstorsjo, zturner
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59858
llvm-svn: 357118
Split off from D59749. This adds isWrappedSet() and
isUpperSignWrapped() set with the same behavior as isSignWrappedSet()
and isUpperWrapped() for the respectively other domain.
The methods isWrappedSet() and isSignWrappedSet() will not consider
ranges of the form [X, Max] == [X, 0) and [X, SignedMax] == [X, SignedMin)
to be wrapping, while isUpperWrapped() and isUpperSignWrapped() will.
Also replace the checks in getUnsignedMin() and friends with method
calls that implement the same logic.
llvm-svn: 357112
Summary:
A recent fix (r355751) caused a compile time regression because setting
the ModifiedDT flag in optimizeSelectInst means that each time a select
instruction is optimized the function walk in runOnFunction stops and
restarts again (which was needed to build a new DT before we started
building it lazily in r356937). Now that the DT is built lazily, a
simple fix is to just reset the DT at this point, rather than restarting
the whole function walk.
In the future other places that set ModifiedDT may want to switch to
just resetting the DT directly. But that will require an evaluation to
ensure that they don't otherwise need to restart the function walk.
Reviewers: spatel
Subscribers: jdoerfert, llvm-commits, xur
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59889
llvm-svn: 357111
WarnMissedTransforms.cpp produces remarks that use !Failure tags.
These weren't supported in optrecord.py, so if you encountered one in any of
the tools, the tool would crash.
Add them as a type of missed optimization.
Differential Revision: https://reviews.llvm.org/D59895
llvm-svn: 357110
ARMBaseInstrInfo::getNumLDMAddresses is making bad assumptions about the
memory operands of load and store-multiple operations. This doesn't
really fix the problem properly, but it's enough to prevent crashing,
at least.
Fixes https://bugs.llvm.org/show_bug.cgi?id=41231 .
Differential Revision: https://reviews.llvm.org/D59834
llvm-svn: 357109
Split out from D59749. The current implementation of isWrappedSet()
doesn't do what it says on the tin, and treats ranges like
[X, Max] as wrapping, because they are represented as [X, 0) when
using half-inclusive ranges. This also makes it inconsistent with
the semantics of isSignWrappedSet().
This patch renames isWrappedSet() to isUpperWrapped(), in preparation
for the introduction of a new isWrappedSet() method with corrected
behavior.
llvm-svn: 357107
Right now, if you try to use optdiff.py on any opt records, it will fail because
its calls to gather_results weren't updated to support filtering.
Since filters are supposed to be optional, this makes them None by default in
get_remarks and in gather_results. This allows other tools that don't support
filtering to still use the functions as is.
Differential Revision: https://reviews.llvm.org/D59894
llvm-svn: 357106
If there were only dbg_values in the block, recede would hit the
beginning of the block and try to use thet dbg_value as a real
instruction.
llvm-svn: 357105
Start using the uadd.sat and usub.sat intrinsics for the existing
canonicalizations. These intrinsics should optimize better than
expanded IR, have better handling in the X86 backend and should
be no worse than expanded IR in other backends, as far as we know.
rL357012 already introduced use of uadd.sat for the add+umin pattern.
Differential Revision: https://reviews.llvm.org/D58872
llvm-svn: 357103
The artifact combiners push instructions which have been marked for deletion
onto an list for the legalizer to deal with on return. However, for trunc(ext)
combines the combiner routine recursively calls itself. When it does this the
dead instructions list may not be empty, and the other combiners don't expect
to be dealing with essentially invalid MIR (multiple vreg defs etc).
This change fixes it by ensuring that the dead instructions are processed on
entry into tryCombineInstruction.
As a result, this fix exposed a few places in tests where G_TRUNC instructions
were not being deleted even though they were dead.
Differential Revision: https://reviews.llvm.org/D59892
llvm-svn: 357101
This reapplies r356149, using the correct overload of findUnusedReg
which passes the current iterator.
This worked most of the time, because the scavenger iterator was moved
at the end of the frame index loop in PEI. This would fail if the
spill was the first instruction. This was further hidden by the fact
that the scavenger wasn't passed in for normal frame index
elimination.
llvm-svn: 357098
Haswell CPUs have special support for SHLD/SHRD with the same register for both sources. Such an instruction will go to the rotate/shift unit on port 0 or 6. This gives it 1 cycle latency and 0.5 cycle reciprocal throughput. When the register is not the same, it becomes a 3 cycle operation on port 1. Sandybridge and Ivybridge always have 1 cyc latency and 0.5 cycle reciprocal throughput for any SHLD.
When FastSHLDRotate feature flag is set, we try to use SHLD for rotate by immediate unless BMI2 is enabled. But MachineCopyPropagation can look through a copy and change one of the sources to be different. This will break the hardware optimization.
This patch adds psuedo instruction to hide the second source input until after register allocation and MachineCopyPropagation. I'm not sure if this is the best way to do this or if there's some other way we can make this work.
Fixes PR41055
Differential Revision: https://reviews.llvm.org/D59391
llvm-svn: 357096
This patch removes an overly conservative check that would prevent
simplifying copies when the value we were tracking would go through
several subregister indices.
Indeed, the intend of this check was to not track values whenever
we have to compose subregister, but actually what the check was
doing was bailing anytime we see a second subreg, even if that
second subreg would actually be the new source of truth (as opposed
to a part of that subreg).
Differential Revision: https://reviews.llvm.org/D59891
llvm-svn: 357095
This patch fixes an assembler bug that allowed SVE vector registers to contain a
type suffix when not expected. The SVE unpredicated movprfx instruction is the
only instruction affected.
The following are examples of what was previously valid:
movprfx z0.b, z0.b
movprfx z0.b, z0.s
movprfx z0, z0.s
These instructions are now erroneous.
Patch by Cullen Rhodes (c-rhodes)
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D59636
llvm-svn: 357094
Also includes one example of how this transform is unsound. This isn't
verifying the copies are used in the control flow intrinisic patterns.
Also add option to disable exec mask opt pass. Since this pass is
unsound, it may be useful to turn it off until it is fixed.
llvm-svn: 357091