Now local value sinking only scans and numbers instructions added
between the current flush point and the last flush point. This ensures
that ISel is overall linear in the size of the BB.
Fixes PR37010 and re-enables local value sinking by default.
llvm-svn: 331087
This patch adds support for fragment expressions
TryToShrinkGlobalToBoolean() which were previously just dropped.
Thanks to Reid Kleckner for providing me a reproducer!
llvm-svn: 331086
Summary:
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the `LHS` and `RHS` matchers:
1. match `RHS` matcher to the `first` operand of binary operator,
2. and then match `LHS` matcher to the `second` operand of binary operator.
This works ok.
But it complicates writing of commutative matchers, where one would like to match
(`m_Value()`) the value on one side, and use (`m_Specific()`) it on the other side.
This is additionally complicated by the fact that `m_Specific()` stores the `Value *`,
not `Value **`, so it won't work at all out of the box.
The last problem is trivially solved by adding a new `m_c_Specific()` that stores the
`Value **`, not `Value *`. I'm choosing to add a new matcher, not change the existing
one because i guess all the current users are ok with existing behavior,
and this additional pointer indirection may have performance drawbacks.
Also, i'm storing pointer, not reference, because for some mysterious-to-me reason
it did not work with the reference.
The first one appears trivial, too.
Currently, we
1. match `LHS` matcher to the `first` operand of binary operator,
2. and then match `RHS` matcher to the `second` operand of binary operator.
If that does not match, we swap the ~~`LHS` and `RHS` matchers~~ **operands**:
1. match ~~`RHS`~~ **`LHS`** matcher to the ~~`first`~~ **`second`** operand of binary operator,
2. and then match ~~`LHS`~~ **`RHS`** matcher to the ~~`second`~ **`first`** operand of binary operator.
Surprisingly, `$ ninja check-llvm` still passes with this.
But i expect the bots will disagree..
The motivational unittest is included.
I'd like to use this in D45664.
Reviewers: spatel, craig.topper, arsenm, RKSimon
Reviewed By: craig.topper
Subscribers: xbolva00, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D45828
llvm-svn: 331085
As suggested in D45842
(although still not sure if we're going to advance that),
we must invalidate references to instructions that have
been recycled (operands were changed, so result is different).
llvm-svn: 331083
Some of the bots were failing in a different way to the others. These were
unable to compare tuples. Fix this by changing to a struct, thereby avoiding
the quirks of tuples.
llvm-svn: 331081
We currently have a hard to solve analysis problem around the order of instructions within a potentially throwing block. We can't cheaply determine whether a given instruction is before the first potential throw in the block. While we're working on that in the background, special case the first instruction within the header.
why this particular special case? Well, headers are guaranteed to execute if the loop does, and it turns out we tend to produce this form in practice.
In a follow on patch, I tend to extend LICM with an alternate approach which works for any instruction in the header before the first throw, but this is the best I can come up with other users of the analysis (such as store promotion.)
Note: I can't show the difference in the analysis result since we're ORing in the expensive instruction walk used by SCEV. Using the full walk is not suitable for a general solution.
llvm-svn: 331079
Summary:
Only allow a single unique .symver alias per symbol. This matches the
behavior of gas. I noticed that we ignored multiple mismatched symver
directives looking at https://reviews.llvm.org/D45798
Reviewers: pcc, tejohnson, espindola
Reviewed By: pcc
Subscribers: emaste, arichardson, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D45845
llvm-svn: 331078
Extend the live-in check for all aliased registers so that we can
allow sinking Copy instructions when only implicit def is in successor's
live-in.
llvm-svn: 331072
Summary:
Currently only the memory size is supported but others can be added as
needed.
narrowScalar for G_LOAD and G_STORE now correctly update the
MachineMemOperand and will refuse to legalize atomics since those need more
careful expansions to maintain atomicity.
Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D45466
llvm-svn: 331071
The idea is to have a pass which performs the same transformation as GuardWidening, but can be run within a loop pass manager without disrupting the pass manager structure. As demonstrated by the test case, this doesn't quite get there because of issues with post dom, but it gives a good step in the right direction. the motivation is purely to reduce compile time since we can now preserve locality during the loop walk.
This patch only includes a legacy pass. A follow up will add a new style pass as well.
llvm-svn: 331060
These branches were previously unanalyzable and unselectable. Add them and
recognize how to generate their inverses.
Reviewers: smaksimovic, atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D46113
llvm-svn: 331050
Put the first ldp at the end, so that the load-store optimizer can run
and merge the ldp and the add into a post-index ldp.
This didn't work in case no frame was needed and resulted in code size
regressions.
llvm-svn: 331044
Summary:
`HandleLLVMOptions` adds `-w` to the cflags if `LLVM_ENABLE_WARNINGS` is not on.
With `-w`, `check_cxx_compiler_flag` doesn't error out for unsupported flags
(for example `-mcrc` on x86_64), and those flags end up being detected as
working - and really they aren't.
I am not entirely sure what the best way to solve this is, but setting
`LLVM_ENABLE_WARNINGS` prior to including `HandleLLVMOptions` does the job.
Reviewers: phosek, beanz
Reviewed By: phosek
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46079
llvm-svn: 331042
If the MachineInstr uses a custom inserter and is then erased after
instruction selection, there is no use for mapping it to a sched class.
Review: Ulrich Weigand
llvm-svn: 331040
We currently support LCSSA PHI nodes in the outer loop exit, if their
incoming values do not come from the outer loop latch or if the
outer loop latch has a single predecessor. In that case, the outer loop latch
will be executed only if the inner loop gets executed. If we have multiple
predecessors for the outer loop latch, it may be executed even if the inner
loop does not get executed.
This is a first step to support the case described in
https://bugs.llvm.org/show_bug.cgi?id=30472
Reviewers: efriedma, karthikthecool, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D43237
llvm-svn: 331037
This adds IR intrinsics for the AArch64 dot-product instructions introduced in
v8.2-A.
Differential revisioon: https://reviews.llvm.org/D46107
llvm-svn: 331036
Since PTX has grown a <2 x half> datatype vectorization has become more
important. The late LoadStoreVectorizer intentionally only does loads
and stores, but now arithmetic has to be vectorized for optimal
throughput too.
This is still very limited, SLP vectorization happily creates <2 x half>
if it's a legal type but there's still a lot of register moving
happening to get that fed into a vectorized store. Overall it's a small
performance win by reducing the amount of arithmetic instructions.
I haven't really checked what the loop vectorizer does to PTX code, the
cost model there might need some more tweaks. I didn't see it causing
harm though.
Differential Revision: https://reviews.llvm.org/D46130
llvm-svn: 331035
This patch makes compiler does not fuse fmul and fadd/fsub into
fmadd/fmsub by default. Instead, -fp-contract=fast option can
be used when such behavior is desired.
Differential Revision: https://reviews.llvm.org/D46057
llvm-svn: 331033
This adds IR intrinsics for the ARM dot-product instructions introduced in
v8.2-A.
Differential revision: https://reviews.llvm.org/D46106
llvm-svn: 331032
Back when the R52 schedule was added in rL286949, there was no way
to enable machine schedules in ARM for specific cores. Since then a
target feature has been added. This enables the feature for R52,
removing the need to manually specify compiler flags.
llvm-svn: 331027
Summary:
The value tracking analysis uses function alignment to infer that the
least significant bits of function pointers are known to be zero.
Unfortunately, this is not correct for ARM targets: the least
significant bit of a function pointer stores the ARM/Thumb state
information (i.e., the LSB is set for Thumb functions and cleared for
ARM functions).
The original approach (https://reviews.llvm.org/D44781) introduced a
new field for function pointer alignment in the DataLayout structure
to address this. But it seems unlikely that optimizations based on
function pointer alignment would bring much benefit in practice to
justify the additional maintenance burden, so this patch simply
assumes that function pointer alignment is always unknown.
Reviewers: javed.absar, efriedma
Reviewed By: efriedma
Subscribers: kristof.beyls, llvm-commits, hfinkel, rogfer01
Differential Revision: https://reviews.llvm.org/D46110
llvm-svn: 331025
Add new umin creation method which accepts a list of operands.
SCEV does not represents umin which is required in getExact, so
it transforms umin to umax with not. As a result the transformation of
tree of max to max with several operands does not work.
We just use the new introduced method for creation umin from several operands.
Reviewers: sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46047
llvm-svn: 331015
It doesn't unwind, and the wrong marking leads to the creation of an
.eh_frame section when it isn't necessary.
Differential Revision: https://reviews.llvm.org/D46082
llvm-svn: 331008
The program might have unusual expectations for functions; for example,
the Linux kernel's build system warns if it finds references from .text
to .init.data.
I'm not sure this is something we actually want to make any guarantees
about (there isn't any explicit rule that would disallow outlining
in this case), but we might want to be conservative anyway.
Differential Revision: https://reviews.llvm.org/D46091
llvm-svn: 331007
Summary: Also test for symbols information in test/MC/WebAssembly/debug-info.ll.
Subscribers: jfb, dschuff, jgravelle-google, aheejin, sunfish, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D46160
llvm-svn: 331005
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 331002
Summary: The instruction index was never referenced in the body. Just a minor cleanup.
Reviewers: andreadb
Reviewed By: andreadb
Subscribers: javed.absar, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D46142
llvm-svn: 331001
Summary:
The '@' character is a special character in Doxygen. In a handful of cases we were not escaping this character which resulted in llvm intrinsics not being rendered properly. Specifically, the @llvm part was removed.
For example, see https://llvm.org/doxygen/classllvm_1_1AssumptionCache.html. There are a few references to '.assume' without the @llvm. prefix. This patch corrects this.
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits, sanjoy
Differential Revision: https://reviews.llvm.org/D45981
llvm-svn: 330998
Summary:
Simplify integer add expression X % C0 + (( X / C0 ) % C1) * C0 to
X % (C0 * C1). This is a common pattern seen in code generated by the XLA
GPU backend.
Add test cases for this new optimization.
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: efriedma, craig.topper, lebedev.ri, llvm-commits, jlebar
Differential Revision: https://reviews.llvm.org/D45976
llvm-svn: 330992
remainder expressions as operands.
Summary:
Add test cases to prepare for the new optimization that Simplifies integer add
expression X % C0 + (( X / C0 ) % C1) * C0 to X % (C0 * C1).
Patch by Bixia Zheng!
Reviewers: sanjoy
Reviewed By: sanjoy
Subscribers: jlebar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46017
llvm-svn: 330991
The main goal of this change is to make it much easier to track which
rules are actually covered by Testgen'erated regression tests.
Reviewers: aemerson, dsanders
Differential Revision: https://reviews.llvm.org/D46095
llvm-svn: 330988
`lb` and `lbu` commands accepts 16-bit signed offsets. But GAS accepts
larger offsets for these commands. If an offset does not fit in 16-bit
range, `lb` command is translated into lui/lb or lui/addu/lb series.
It's interesting that initially LLVM assembler supported this feature,
but later it was broken.
This patch restores support for 32-bit offsets. It replaces `mem_simm16`
operand for `LB` and `LBu` definitions by the new `mem_simmptr` operand.
This operand is intended to check that offset fits to the same size as
using for pointers. Later we will be able to extend this rule and
accepts 64-bit offsets when it is possible.
Some issues remain:
- The regression also affects LD, SD, LH, LHU commands. I'm going
to fix them by a separate patch.
- GAS accepts any 32-bit values as an offset. Now LLVM accepts signed
16-bit values and this patch extends the range to signed 32-bit offsets.
In other words, the following code accepted by GAS and still triggers
an error by LLVM:
```
lb $4, 0x80000004
# gas
lui a0, 0x8000
lb a0, 4(a0)
```
- In case of 64-bit pointers GAS accepts a 64-bit offset and translates
it to the li/dsll/lb series of commands. LLVM still rejects it.
Probably this feature has never been implemented in LLVM. This issue
is for a separate patch.
```
lb $4, 0x800000001
# gas
li a0, 0x8000
dsll a0, a0, 0x14
lb a0, 4(a0)
```
Differential Revision: https://reviews.llvm.org/D45020
llvm-svn: 330983
Summary:
Use the FP for scavenged spill slot accesses to prevent corruption of
the callee-save region when the SP is re-aligned.
Based on problem and patch reported by @paulwalker-arm
This is an alternative to solution proposed in D45770
Reviewers: t.p.northover, paulwalker-arm, thegameg, javed.absar
Subscribers: qcolombet, mcrosier, paulwalker-arm, kristof.beyls, rengolin, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D46063
llvm-svn: 330976
This diff implements --redefine-sym option
for changing the name of a symbol.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D46029
llvm-svn: 330973
For local variables the first DW_OP_deref is consumed by turning the
location kind into a memeory location, but that only makes sense for
values that are in a register to begin with, which cannot happen for
global variables that are attached to a symbol.
rdar://problem/39741860
llvm-svn: 330970
Summary:
The old comment referred to llvm/IR/Writer.h which doesn't longer exist.
This patch replaces it with an up-to-date description of AsmWriter library.
Patch by Alex Yursha.
Reviewers: gribozavr, vlad.tsyrklevich
Reviewed By: vlad.tsyrklevich
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45895
llvm-svn: 330962
Summary:
Follow-up to D43690, the EliminateAvailableExternally pass currently
runs under -O0 and -O2 and up. Under -O1 we would still want to drop
available_externally symbols to reduce space without inlining having
run.
Reviewers: tejohnson
Reviewed By: tejohnson
Subscribers: mehdi_amini, llvm-commits, kcc
Differential Revision: https://reviews.llvm.org/D46093
llvm-svn: 330961
Correct the definitions of ei, di, eret, deret, wait, syscall and break.
Also provide microMIPS specific aliases to match the MIPS aliases.
Additionally correct the definition of the wait instruction so that
it is present in the instruction mapping tables.
Reviewers: smaksimovic, abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D45939
llvm-svn: 330952
As noted, the attribute name is subject to change once we have
the clang side implemented, but it's clear that we need some
kind of attribute-based predication here based on the discussion
for:
rL330437
llvm-svn: 330951
This is another preliminary step for disabling this transform as
discussed in the post-commit thread for:
rL330437
I'm using one of the names suggested there for the attribute, but
we can fix that up as needed once the clang side of this is sorted
out.
llvm-svn: 330950
This causes some slight shuffling but no meaningful codegen differences on the
corpus I used for testing, but it has a larger impact when combined with e.g.
rematerialisation. Regardless, it makes sense to report as accurate
target-specific information as possible.
llvm-svn: 330949
There's no direct instruction for this, but it's trivially implemented
with two movs. Without this the code generator just dies when
encountering a shufflevector.
Differential Revision: https://reviews.llvm.org/D46116
llvm-svn: 330948
As discussed in the post-review comments for rL330437,
we need to guard this fold to allow existing code to
keep working with the undefined behavior that they've
come to rely on.
That would mean duplicating more code than we already
have, so let's fix that first.
llvm-svn: 330947
This patch adds a new shuffle kind useful for transposing a 2xn matrix. These
transpose shuffle masks read corresponding even- or odd-numbered vector
elements from two n-dimensional source vectors and write each result into
consecutive elements of an n-dimensional destination vector. The transpose
shuffle kind is meant to model the TRN1 and TRN2 AArch64 instructions. As such,
this patch also considers transpose shuffles in the AArch64 implementation of
getShuffleCost.
Differential Revision: https://reviews.llvm.org/D45982
llvm-svn: 330941
I'm unable to construct a representative test case that demonstrates the
advantage, but it seems sensible to report accurate target-specific
information regardless.
llvm-svn: 330938
This patch extends the PredicateMethod of AsmOperands used in SVE's
LD1 instructions with a DiagnosticPredicate. This makes them 'context
sensitive' to the operand that has been parsed and tells the user to
use the right register (with expected shift/extend), rather than telling
the immediate is out of range when it actually parsed a register.
Patch [2/2] in a series to improve assembler diagnostics for SVE:
- Patch [1/2]: https://reviews.llvm.org/D45879
- Patch [2/2]: https://reviews.llvm.org/D45880
Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D45880
llvm-svn: 330934
This has no impact on codegen for the current RISC-V unit tests or my small
benchmark set and very minor changes in a few programs in the GCC torture
suite. Based on this, I haven't been able to produce a representative test
program that demonstrates a benefit from isLegalAddressingMode. I'm committing
the patch anyway, on the basis that presenting accurate information to the
target-independent code is preferable to relying on incorrect generic
assumptions.
llvm-svn: 330932
An optional, light-weight and backward-compatible mechanism to allow
specifying that a diagnostic _only_ applies to a partial mismatch (NearMiss),
rather than a full mismatch.
Patch [1/2] in a series to improve assembler diagnostics for SVE.
- Patch [1/2]: https://reviews.llvm.org/D45879
- Patch [2/2]: https://reviews.llvm.org/D45880
Reviewers: olista01, stoklund, craig.topper, mcrosier, rengolin, echristo, fhahn, SjoerdMeijer, evandro, javed.absar
Reviewed By: olista01
Differential Revision: https://reviews.llvm.org/D45879
llvm-svn: 330930
LLVM might be compiled using a toolchain file which controls the linker
to use via flags (e.g. `-B` or `-fuse-ld=`). Take these flags into
account for linker detection. We can also correct the detection by
manually passing LLVM_USE_LINKER, of course, but it seems more
convenient to have the detection take flags into account.
Differential Revision: https://reviews.llvm.org/D45464
llvm-svn: 330924
instructions.
These have special permission according to the x86 manual to read
unaligned memory, and this folding is done by ICC and GCC as well.
This corrects one of the issues identified in PR37246.
llvm-svn: 330896
comparison instructions (pcmp[ei]stri*).
These will help show improvements from fixes to PR37246.
I've not really covered the mask forms of this intrinsic as I don't have
as good of an intuition about the likely usage patterns there. Happy for
someone to extend this with tests covering the mask form.
llvm-svn: 330895
This reverts commit 023c8be90980e0180766196cba86f81608b35d38.
This patch triggers miscompile of zlib on PowerPC platform. Most likely it is
caused by some pre-backend PPC-specific pass, but we don't clearly know the
reason yet. So we temporally revert this patch with intention to return it
once the problem is resolved. See bug 37229 for details.
llvm-svn: 330893
If no data or instructions are emitted after a location directive, we
should clear the cv_loc when we change sections, or it will be emitted
at the beginning of the next section. This violates our invariant that
all .cv_loc directives belong to the same section. Add clearer
assertions for this.
llvm-svn: 330884
This makes it possible to reverse a filtered range. For example, here's
a way to visit memory accesses in a BasicBlock in reverse order:
auto MemInsts = reverse(make_filter_range(BB, [](Instruction &I) {
return isa<StoreInst>(&I) || isa<LoadInst>(&I);
}));
for (auto &MI : MemInsts)
...
To implement this functionality, I factored out forward iteration
functionality into filter_iterator_base, and added a specialization of
filter_iterator_impl which supports bidirectional iteration. Thanks to
Tim Shen, Zachary Turner, and others for suggesting this design and
providing feedback! This version of the patch supersedes the original
(https://reviews.llvm.org/D45792).
This was motivated by a problem we encountered in D45657: we'd like to
visit the non-debug-info instructions in a BasicBlock in reverse order.
Testing: check-llvm, check-clang
Differential Revision: https://reviews.llvm.org/D45853
llvm-svn: 330875
Algorithmically compute the 'x20' SDIV/UDIV vector costs - this is necessary for PR36550 when DIV costs will be driven from the scheduler models.
llvm-svn: 330870
lit is picking up a stale executable in the unittests tree, which is
failing on Windows.
To simplify the CMake and avoid problems like this in the future, now we
always compile the test, but the test exits successfully when plugins
are not enabled.
llvm-svn: 330867
- Add "amdgpu-waitcnt-forcezero" to force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
- Add debug counters to control force emit of s_waitcnt instrs; debug counters:
si-insert-waitcnts-forceexp: force emit s_waitcnt expcnt(0) instrs
si-insert-waitcnts-forcevm: force emit s_waitcnt lgkmcnt(0) instrs
si-insert-waitcnts-forcelgkm: force emit s_waitcnt vmcnt(0) instrs
- Add some debug statements
Note that a variant of this patch was previously committed/reverted.
Differential Revision: https://reviews.llvm.org/D45888
llvm-svn: 330862
Debug var, expr and loc were only supported for non-fixed stack objects.
This patch adds the following fields to the "fixedStack:" entries, and
renames the ones from "stack:" to:
* debug-info-variable
* debug-info-expression
* debug-info-location
Differential Revision: https://reviews.llvm.org/D46032
llvm-svn: 330859
Imports in a wasm module can have custom module name. This change
adds the module name to the WasmSymbol structure so that the linker
can preserve this module name.
This is needed to fix: https://bugs.llvm.org/show_bug.cgi?id=37168
Differential Revision: https://reviews.llvm.org/D45797
llvm-svn: 330854
Previously we only formed MUL_IMM when we split a constant. This blocked load folding on those cases. We should also form MUL_IMM for 3/5/9 to favor LEA over load folding.
Differential Revision: https://reviews.llvm.org/D46040
llvm-svn: 330850
Previously `call zero`, `call f0` etc would fail. This leads to compilation
failures if building programs that define functions with those names and using
-save-temps.
llvm-svn: 330846
Summary:
When performing indirect call promotion, current implementation inspects "all" parameters of the callsite and attemps to match with the formal argument type of the callee function. However, it is not possible to find the type for variable length arguments, and the compiler crashes when it attemps to match the type for variable lenght argument.
It seems that the bug is introduced with D40658. Prior to that, the type matching is performed only for the parameters whose ID is less than callee->getFunctionNumParams(). The attached test case will crash without the patch.
Reviewers: mssimpso, davidxl, davide
Reviewed By: mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46026
llvm-svn: 330844
Virtually all other tablegen outputs are called .inc, not .gen, so rename these two too for consistency.
No behavior change.
https://reviews.llvm.org/D46058
llvm-svn: 330843
As discussed in D45862, we want to delete parts of
this code because it can create more instructions
than it removes. But we also want to preserve some
folds that are winners, so tidy up what's here to
make splitting the good from bad a bit easier.
llvm-svn: 330841
As discussed in D45862, we want these folds sometimes
because they're good improvements.
But as we can see here, the current logic doesn't
check uses and doesn't produce optimal code in all
cases.
llvm-svn: 330837
To do this:
1. Change GlobalAddress SDNode to TargetGlobalAddress to avoid legalizer
split the symbol.
2. Change ExternalSymbol SDNode to TargetExternalSymbol to avoid legalizer
split the symbol.
3. Let PseudoCALL match direct call with target operand TargetGlobalAddress
and TargetExternalSymbol.
Differential Revision: https://reviews.llvm.org/D44885
llvm-svn: 330827
To do this:
1. Add PseudoCALLIndirct to match indirect function call.
2. Add PseudoCALL to support parsing and print pseudo `call` in assembly
3. Expand PseudoCALL to the following form with R_RISCV_CALL relocation type
while encoding:
auipc ra, func
jalr ra, ra, 0
If we expand PseudoCALL before emitting assembly, we will see auipc and jalr
pair when compile with -S. It's hard for assembly parser to parsing this
pair and identify it's semantic is function call and then insert R_RISCV_CALL
relocation type. Although we could insert R_RISCV_PCREL_HI20 and
R_RISCV_PCREL_LO12_I relocation types instead of R_RISCV_CALL.
Due to RISCV relocation design, auipc and jalr pair only can relax to jal with
R_RISCV_CALL + R_RISCV_RELAX relocation types.
We expand PseudoCALL as late as encoding(RISCVMCCodeEmitter) instead of before
emitting assembly(RISCVAsmPrinter) because we want to preserve call
pseudoinstruction in assembly code. It's more readable and assembly parser
could identify call assembly and insert R_RISCV_CALL relocation type.
Differential Revision: https://reviews.llvm.org/D45859
llvm-svn: 330826
ISel is currently picking 'JAL' over 'JAL_MM' for calling a function when
targeting microMIPS. A later patch will correct this behaviour.
This patch extends the mechanism for transforming instructions into their short
delay to recognise 'JAL_MM' for transforming into 'JALS_MM'.
llvm-svn: 330825
With this patch, options to add/tweak views are all grouped together in the
-help output.
The new "View Options" category looks like this:
```
View Options:
-dispatch-stats - Print dispatch statistics
-instruction-info - Print the instruction info view
-instruction-tables - Print instruction tables
-register-file-stats - Print register file statistics
-resource-pressure - Print the resource pressure view
-retire-stats - Print retire control unit statistics
-scheduler-stats - Print scheduler statistics
-timeline - Print the timeline view
-timeline-max-cycles=<uint> - Maximum number of cycles in the timeline view. Defaults to 80 cycles
-timeline-max-iterations=<uint> - Maximum number of iterations to print in timeline view
```
llvm-svn: 330816
This also means we have to check if the latch is the exiting block now,
as `transform` expects the latches to be the exiting blocks too.
https://bugs.llvm.org/show_bug.cgi?id=36586
Reviewers: efriedma, davide, karthikthecool
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45279
llvm-svn: 330806
Summary:
When Reassociate is rewriting an expression tree it may
reuse old binary expression nodes, for new expressions.
Whenever an expression node is reused, but with a non-trivial
change in the result, we need to invalidate any debug info
that is associated with the node.
If for example rewriting
x = mul a, b
y = mul c, x
into
x = mul c, b
y = mul a, x
we still get the same result for 'y', but 'x' is a new expression.
All debug info referring to 'x' must be invalidated (marked as
optimized out) since we no longer calculate the expected value.
As a side-effect this patch avoid (at least some) problems where
reassociate could end up creating IR with debug-use before def.
Earlier the dbg.value nodes where left untouched in the IR, while
the reused binary nodes where sinked to just before the root node
of the rewritten expression tree. See PR27273 for more info about
such problems.
Reviewers: dblaikie, aprantl, dexonsmith
Reviewed By: aprantl
Subscribers: JDevlieghere, llvm-commits
Tags: #debug-info
Differential Revision: https://reviews.llvm.org/D45975
llvm-svn: 330804
Summary:
Use a MapVector instead of a DenseMap for RemMap since it is iteratated
over and the order of iteration can effect the order that new
instructions are created. This can in turn effect the use list order of
div/rem input values if multiple new instructions are created that share
any input values.
Reviewers: spatel
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D45858
llvm-svn: 330792
update API for dominators rather than doing manual, hacky updates.
This is just the first step, but in some ways the most important as it
moves the non-trivial unswitching to update the domtree rather than
fully recalculating it each time.
Subsequent patches should remove the custom update logic used by the
trivial unswitch and replace it with uses of the update API.
This also fixes a number of bugs I was seeing when testing non-trivial
unswitch due to it querying the quasi-correct dominator tree. Now the
tree is 100% correct and safe to query. That said, there are still more
bugs I can see with non-trivial unswitch just running over the test
suite, so more bugfix patches are needed as well.
Thanks to both Sanjoy and Fedor for reviews and testing!
Differential Revision: https://reviews.llvm.org/D45943
llvm-svn: 330787
Before, the outliner would grab ADRPs that used LR/W30. This patch fixes
that by checking for explicit uses of those registers before the special-casing
for ADRPs.
This also adds a test that ensures that those sorts of ADRPs won't be outlined.
llvm-svn: 330783
We were previously prefering ZEXTLOAD over EXTLOAD if it is legal. This triggers during X86's promotion of i16->i32. Not sure about other targets.
Using ZEXTLOAD can prevent folding it to SEXTLOAD later if we were to promote a sign extended operand like we would need for SRA. However, X86 doesn't currently promote i16 SRA. I was looking into doing that which is how I found this issue.
This is also blocking our ability to fold 4 byte aligned EXTLOADs with "loadi32". This is what caused most of the test changes here.
Differential Revision: https://reviews.llvm.org/D45585#inline-402825
llvm-svn: 330781
Previously, _any_ store or load instruction was considered to be
operating on a spill if it had a frameindex as an operand, and thus
was fair game for optimisations such as "StackSlotColoring". This
usually works, except on architectures where spills can be partially
restored, for example on X86 where a spilt vector can have a single
component loaded (zeroing the rest of the target register). This can be
mis-interpreted and the zero extension unsoundly eliminated, see
pr30821.
To avoid this, this commit optionally provides the caller to
isLoadFromStackSlot and isStoreToStackSlot with the number of bytes
spilt/loaded by the given instruction. Optimisations can then determine
that a full spill followed by a partial load (or vice versa), for
example, cannot necessarily be commuted.
Patch by Jeremy Morse!
Differential Revision: https://reviews.llvm.org/D44782
llvm-svn: 330778
Summary:
It was removed about a year ago in r300477. Bring it back, along with
its unittest, when the MSVC STL is in use. The MSVC STL performs
self-assignment in std::shuffle. These days, llvm::sort calls
std::shuffle when expensive checks are enabled to help find
non-determinism bugs.
Reviewers: craig.topper, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D46028
llvm-svn: 330776
Summary: This is no longer used by mesa since its 18.0.0 release.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arsenm, kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
Differential Revision: https://reviews.llvm.org/D45988
llvm-svn: 330775
Summary:
The PointerMayBeCapturedBefore function's DomTree arg should be
const instead of non-const. There are no non-const uses of it
in the function.
llvm-svn: 330769
These are all but 1 of the select-of-constant tests that appear
to be transformed within foldSelectICmpAnd() and the block above
it predicated by decomposeBitTestICmp().
As discussed in D45862 (and can be seen in several tests here),
we probably want to stop doing those transforms because they
can increase the instruction count without benefitting other
passes or codegen.
The 1 test not included here is a urem test where the bit hackery
allows us to remove a urem. To preserve killing that urem, we
should do some stronger known-bits analysis or pattern matching of
'urem x, (select-of-pow2-constants)'.
llvm-svn: 330768
Method BugDriver::performFinalCleanups(...) would delete Module object
it worked on, which was also deleted by its caller
(e.g. TestCodeGenerator(...)). Changed the code to avoid double delete
and make Module ownership slightly clearer.
Patch by Andrzej Janik.
llvm-svn: 330763
-stdlib=libc++ is added to both the compilation and the link flags, but
the logic for adding it was only checking if it was supported during
compilation and not linking. This could lead to false positives, for
example when using clang with libstdc++ (where the compiler would
support -stdlib=libc++ but then linking would fail because of libc++
actually being unavailable).
llvm-svn: 330761
The output of update_llc_test_checks.py on this test file has changed,
so the test file should be updated to minimize source changes in future
patches.
The test updates for this file appear to be limited to relaxations of
the form:
-; SSE2-NEXT: movq %rdi, -{{[0-9]+}}(%rsp) # 8-byte Spill
+; SSE2-NEXT: movq %rdi, {{[-0-9]+}}(%r{{[sb]}}p) # 8-byte Spill
This was suggested in https://reviews.llvm.org/D45995.
llvm-svn: 330758
When debugging test failures with -vv (or -v in the case of the
internal shell), this makes it easier to locate the RUN line that
failed. For example, clang's test/Driver/linux-ld.c has 892 total RUN
lines, and clang's test/Driver/arm-cortex-cpus.c has 424 RUN lines
after concatenation for line continuations.
When reading the generated shell script, this also makes it easier to
locate the RUN line that produced each command.
To support reporting RUN line numbers in the case of the internal
shell, this patch extends the internal shell to support the null
command, ":", except pipelines are not supported.
Reviewed By: asmith, delcypher
Differential Revision: https://reviews.llvm.org/D44598
llvm-svn: 330755
If a packed inline constant is sign extended it must be truncated
after the shift. I.e. a constant (0xH0000, 0xHBC00), will be represented
as 0xFFFFFFFFBC000000 in the IR because the immediate is sign extended
to 64 bit. After the value shifted right by 16 to use it in a low part
with op_sel_hi it becomes 0xFFFFFFFFBC00 and does not qualify as inline
constant any longer.
Fixed the error and added verification code. Without the fix and with
the verification bug is causing pk_max_f16_literal.ll to fail.
Differential Revision: https://reviews.llvm.org/D45987
llvm-svn: 330752
Removes one subprocess and one temp file from the build for each tablegen
invocation.
No intended behavior change.
https://reviews.llvm.org/D45899
llvm-svn: 330742
This is part of fixing the instruction predicates for MIPS.
Reviewers: atanasyan, abeserminji
Differential Revision: https://reviews.llvm.org/D44212
This patch relands r327409, hopefully without the problematic part of the
tests that cause FileCheck to assert on the windows expensive checks bot.
llvm-svn: 330741
Patch #2 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
This patch introduces the basic infrastructure to detect, legality check
and process outer loops annotated with hints for explicit vectorization.
All these changes are protected under the feature flag
-enable-vplan-native-path. This should make this patch NFC for the existing
inner loop vectorizer.
Reviewers: hfinkel, mkuper, rengolin, fhahn, aemerson, mssimpso.
Differential Revision: https://reviews.llvm.org/D42447
llvm-svn: 330739
After D43236, we started interchanging loops with empty dependence
matrices. In isProfitableForVectorization, we try to determine if
interchanging makes the loop dependences more friendly to the
vectorizer. If there are no dependences, we should not interchange,
based on that heuristic.
Reviewers: efriedma, mcrosier, karthikthecool, blitz.opensource
Reviewed By: mcrosier
Differential Revision: https://reviews.llvm.org/D45208
llvm-svn: 330738
The instruction printer used by llvm-mca to generate the performance report now
defaults the output assembly format to the format used for the input assembly
file.
On x86, the asm format can be either AT&T or Intel, depending on the
presence/absence of directive `.intel_syntax`.
Users can still specify a different assembly dialect with the command line flag
-output-asm-variant=<uint>.
llvm-svn: 330733
Current code does not check that a register number is in the 0-31 range.
Sometimes the parser checks that later for some kinds of instructions,
but that leads to unclear / incorrect error messages like that:
% cat test.s
.text
lb $4, 8($32)
% llvm-mc test.s -triple=mips64-unknown-linux
test.s:2:10: error: expected memory with 16-bit signed offset
lb $4, 8($32)
^
Sometimes the parser just crashes:
% cat test.s
.text
lw $4, 8($32)
% llvm-mc test.s -triple=mips64-unknown-linux
This patch resolves the problem by checking that register number after
'$' sign is in the 0-31 range. If the number is out of the range the
parser shows the `invalid register number` error, but treats invalid
register number as a normal one to continue parsing and catch other
possible errors.
Differential Revision: https://reviews.llvm.org/D45919
llvm-svn: 330732
The first step in fixing problems raised in D45862
is to make the problems visible. Now we can more easily
see/update cases where selects have been turned into
multiple instructions with no apparent improvement in
analysis or benefits for other passes (vectorization).
llvm-svn: 330731
The current version of the script uses regex for params.
This could mask a bug (param values got wrongly swapped),
but it seems unlikely in practice, so let's just update
the whole file to reduce diffs when there is a meaningful
change here.
llvm-svn: 330729
`shtest-xunit-output.py` test.
Although there is no `-` file Jeremy Morse has reported to me that it
causes problems in their setup because lit tries to find it and ends up
loading an out of tree lit configuration file.
llvm-svn: 330728
It used to symlink dsymutil to llvm-dsymutil, but after r327790 llvm's dsymutil
binary is now called dsymutil without prefix.
r327792 then reversed the direction of the symlink if
LLVM_INSTALL_CCTOOLS_SYMLINKS was set, but that looks like a buildfix and not
like something anyone should need.
https://reviews.llvm.org/D45966
llvm-svn: 330727
The memory location an invariant load is using can never be clobbered by
any store, so it's safe to move the load ahead of the store.
Differential Revision: https://reviews.llvm.org/D46011
llvm-svn: 330725
Zero latency instructions are now scheduled the same way as other instructions.
Before this patch, there was a specialzed code path for those instructions.
All scheduler events are now generated from method `scheduleInstruction()` and
from method `cycleEvent()`. This will make easier to implement a "execution
stage", and let that stage publish all the scheduler events.
No functional change intended.
llvm-svn: 330723
While not necessary for correctness, it is preferable for
performance reasons on all architectures we currently support
to align functions to 16-byte boundaries by default.
llvm-svn: 330718
Split off pinsr/pextr and extractps instructions.
(Mostly) fixes PR36887.
Note: It might be worth adding a WriteFInsertLd class as well in the future.
Differential Revision: https://reviews.llvm.org/D45929
llvm-svn: 330714
Summary:
If attribute "use-soft-float"="true" is set then X86ISelLowering.cpp sets
'Promote' action for ISD::SINT_TO_FP operation on type i32.
But 'Promote' action is not proper in this case since lib function
__floatsidf is available for casting from signed int to float type.
Thus Expand action is more suitable here.
The Expand action should be set for ISD::UINT_TO_FP for soft float as well.
If function attribute "use-soft-float"="true" is set then infinite looping
can happen in DAG combining, function visitSINT_TO_FP() replaces SINT_TO_FP
node with UINT_TO_FP node and function combineUIntToFP() replace vice versa in cycle.
The fix prevents it.
Patch by vrybalov
Differential Revision: https://reviews.llvm.org/D45572
llvm-svn: 330711
loop unswitch.
This code incorrectly added the header to the loop block set early. As
a consequence we would incorrectly conclude that a nested loop body had
already been visited when the header of the outer loop was the preheader
of the nested loop. In retrospect, adding the header eagerly doesn't
really make sense. It seems nicer to let the cycle be formed naturally.
This will catch crazy bugs in the CFG reconstruction where we can't
correctly form the cycle earlier rather than later, and makes the rest
of the logic just fall out.
I've also added various asserts that make these issues *much* easier to
debug.
llvm-svn: 330707
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.
It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.
The second part is platform independent and ensures that:
* CFI instructions do not affect code generation (they are not counted as
instructions when tail duplicating or tail merging)
* Unwind information remains correct when a function is modified by
different passes. This is done in a late pass by analyzing information
about cfa offset and cfa register in BBs and inserting additional CFI
directives where necessary.
Added CFIInstrInserter pass:
* analyzes each basic block to determine cfa offset and register are valid
at its entry and exit
* verifies that outgoing cfa offset and register of predecessor blocks match
incoming values of their successors
* inserts additional CFI directives at basic block beginning to correct the
rule for calculating CFA
Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.
CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.
Patch by Violeta Vukobrat.
Differential Revision: https://reviews.llvm.org/D42848
llvm-svn: 330706
Guard the MIPS64 variant correctly for i64, mark the MIPS32 version as not
in microMIPS and provide the microMIPS version.
Additionally, remove a related stale XFAIL'd test as bswap has its own test
case providing coverage.
Reviewers: smaksimovic, abeserminji, atanasyan
Differential Revision: https://reviews.llvm.org/D45816
llvm-svn: 330705
Summary:
The pass is supposed to scalarize such intrinsics if the target does not support
them natively, so if the scalarization does not happen instruction selection
crashes due to inability to lower these intrinsics.
Reviewers: andrew.w.kaylor, craig.topper
Reviewed By: andrew.w.kaylor
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45947
llvm-svn: 330700
By checking that none of the child loops contain a BB we make sure BBMap
contains the innermost loop defining BB. This invariant was violated in
LoopInterchange and got caught by this assertion.
Reviewers: chandlerc, mzolotukhin, sanjoy, mehdi_amini, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D45971
llvm-svn: 330698
/usr/local/bin/ld.lld: error: undefined symbol: llvm::createAggressiveInstCombinerPass()
>>> referenced by cc1_main.cpp
>>> tools/clang/tools/driver/CMakeFiles/clang.dir/cc1_main.cpp.o:(_GLOBAL__sub_I_cc1_main.cpp)
And so on
The bot coverage is clearly missing.
llvm-svn: 330693
Summary:
I am preparing a patch to the path function. While working on it, I
noticed that some of the areas are lacking test coverage (e.g. filename
and parent_path functions), so I add more tests to guard against
regressions there.
I have also found the failure messages hard to understand, so I rewrote
some existing test to give more actionable messages when they fail:
- for tests which run over multiple inputs, I use SCOPED_TRACE, to show
which of the inputs caused the actual failure.
- for comparisons of vectors, I use gmock's container matchers, which
will print out the full container contents (and the elements that
differ) when they fail to match.
Reviewers: zturner, espindola
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45941
llvm-svn: 330691
Add explicit dependency on ObjcopyTableGen
and rerun the tests on Windows.
I will double-check the build bots
and revert this commit if necessary.
llvm-svn: 330685
This encoding is recognized by the CPU, but the behavior is undefined. This makes the disassembler handle it correctly so we don't print bswapl with a 16-bit register.
llvm-svn: 330682
This code path can very clearly be called in a context where we have
baselined all the cloned blocks to a particular loop and are trying to
handle nested subloops. There is no harm in this, so just relax the
assert. I've added a test case that will make sure we actually exercise
this code path.
llvm-svn: 330680
The test is apparently needed e.g. for check-cfi on Windows where we get
'C:/b/slave/sanitizer-windows/build/./bin/clang.exe': command not found
without it. Try to fix the problem that was fixed by r330672 by also checking
for isabs() instead.
llvm-svn: 330673
lit's util.which() would check if the passed-in path existed directly,
and if so return it as-is. This is never the case when running llvm's, clang's,
or lld's tests normally. But when running `./llvm-lit path/to/clang/test`
with a cwd of llvm-build/bin, this if would detect that clang exists at path
'clang' and return 'clang' as the discovered clang binary -- and then lit would
use the " clang " -> "*** Do not use 'clang' in tests, use '%clang'. ***"
substitution to replace that with a broken test. By removing this early
return, lit ends up with the usual absolute path and everything works even
in this uncommon case.
llvm-svn: 330672
(notionally Scalar.h is part of libLLVMScalarOpts, so it shouldn't be
included by InstCombine which doesn't/shouldn't need to depend on
ScalarOpts)
llvm-svn: 330669
I was reminded today that this patch got reverted in r301885. I can no
longer reproduce the failure that caused the revert locally (...almost
one year later), and the patch applied pretty cleanly, so I guess we'll
see if the bots still get angry about it.
The original breakage was InstSimplify complaining (in "assertion
failed" form) about getting passed some crazy IR when running `ninja
check-sanitizer`. I'm unable to find traces of what, exactly, said crazy
IR was. I suppose we'll find out pretty soon if that's still the case.
:)
Original commit:
Author: gbiv
Date: Mon May 1 18:12:08 2017
New Revision: 301880
URL: http://llvm.org/viewvc/llvm-project?rev=301880&view=rev
Log:
[InstSimplify] Handle selects of GEPs with 0 offset
In particular (since it wouldn't fit nicely in the summary):
(select (icmp eq V 0) P (getelementptr P V)) -> (getelementptr P V)
Differential Revision: https://reviews.llvm.org/D31435
llvm-svn: 330667
As we're becoming stricter w/ respect to not allowing vregs having LLTs
and regclasses assigned both mid-globalisel pipeline, the number of
extra copies grows, some of which separate G_UNMERGE's from their
corresponding G_MERGE's, becoming a performance concern.
It's worth mentioning that we're already looking through copies while
combining legalization artifacts for every kind of artifact but
G_UNMERGE.
Reviewed By: aditya_nandakumar
Reviewers: ab, t.p.northover, volkan, javed.absar
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45644
llvm-svn: 330660