When we know the value we're extending is a negative constant, it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
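As a minimal standalone illustration of the difference (plain C++, not the
LLVM code itself):

```
// Plain C++ sketch of the two ways of widening (i32 -1) to i64.
#include <cstdint>
#include <cstdio>

int main() {
  int32_t C = -1;
  uint64_t SExt = (uint64_t)(int64_t)C;  // sign-extend: 0xFFFFFFFFFFFFFFFF, i.e. (i64 -1)
  uint64_t ZExt = (uint64_t)(uint32_t)C; // zero-extend: 0x00000000FFFFFFFF
  printf("sext: %#llx, zext: %#llx\n", (unsigned long long)SExt,
         (unsigned long long)ZExt);
  return 0;
}
```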
New tests added here:
CodeGen/AArch64/sve-vector-splat.ll
I believe we see some code quality improvements in these existing
tests too:
CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll
The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.
Differential Revision: https://reviews.llvm.org/D114357
This patch fixes a crash when despeculating ctlz/cttz intrinsics with
scalable-vector types. It is not safe to speculatively query the vector
type's size in bits, because the type may not be fixed-length. As it
happens the query isn't required anyway, since vector types are skipped.
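A hedged sketch of the safe ordering (illustrative names, not the exact
CodeGenPrepare code):

```
// Bail out on vector types before querying any bit width, so a scalable
// vector never reaches a fixed-size query.
if (Ty->isVectorTy())
  return false; // vectors are skipped anyway
unsigned SizeInBits = Ty->getScalarSizeInBits(); // now safe: scalar type only
```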
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D112141
Stop using APInt constructors and methods that were soft-deprecated in
D109483. This fixes all the uses I found in llvm, except for the APInt
unit tests, which should still test the deprecated methods.
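For example (a hedged sample; see D109483 for the authoritative list of
renamings):

```
#include "llvm/ADT/APInt.h"
using llvm::APInt;

APInt Zero = APInt::getZero(32);    // instead of APInt::getNullValue(32)
APInt Ones = APInt::getAllOnes(32); // instead of APInt::getAllOnesValue(32)
bool IsZero = Zero.isZero();        // instead of Zero.isNullValue()
```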
Differential Revision: https://reviews.llvm.org/D110807
This requires a minor change to CodeGenPrepare to ensure that
shouldSinkOperands will be called for And.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D110106
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.
In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
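A hedged sketch of the load/store cases (the atomicrmw/cmpxchg cases are
analogous; the MemoryUses element type here is illustrative):

```
// Derive the access type from the instruction itself rather than from
// the pointer's pointee type.
Type *AccessTy = nullptr;
Value *Addr = nullptr;
if (auto *LI = dyn_cast<LoadInst>(I)) {
  AccessTy = LI->getType();
  Addr = LI->getPointerOperand();
} else if (auto *SI = dyn_cast<StoreInst>(I)) {
  AccessTy = SI->getValueOperand()->getType();
  Addr = SI->getPointerOperand();
}
if (Addr)
  MemoryUses.push_back({Addr, AccessTy}); // address + type, no Operand pair
```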
In the combination of addressing modes, when replacing the matched phi
nodes, the phi node to be replaced has sometimes already been modified.
For example, with the matcher sets [A, B] and [C, A] there is a cyclic
dependency: A is replaced by B, and then C would be replaced by A. Because
the problem comes from matching a new phi node to another new phi node,
we should ignore new phi nodes when mapping a new phi node to an old one.
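A hedged sketch of the rule (hypothetical names, not the exact matcher
code):

```
// When mapping a matched (new) phi onto its replacement, skip any
// candidate that is itself a newly created phi, so a cycle such as
// A->B, C->A can never chain through an already-replaced phi.
PHINode *Repl = Matched.lookup(NewPN);
if (Repl && !NewPhiNodes.count(Repl))
  NewPN->replaceAllUsesWith(Repl);
```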
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D108635
In the current implementation, the instruction to be sunk is inserted
before the target instruction without considering the def-use chain, which
may cause an "Instruction does not dominate all uses" error. We need to
choose a suitable insertion point according to the use chain.
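A hedged sketch of choosing that insertion point (illustrative names, not
the exact code from the patch):

```
// Start from the block's first insertion point and push the sunk
// instruction past any of its operand defs that live in the same block,
// so every def still dominates its use.
Instruction *InsertPt = &*TargetBB->getFirstInsertionPt();
for (Use &U : Sunk->operands())
  if (auto *Def = dyn_cast<Instruction>(U.get()))
    if (Def->getParent() == TargetBB && InsertPt->comesBefore(Def))
      InsertPt = Def->getNextNode();
Sunk->moveBefore(InsertPt);
```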
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107262
If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from).
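A hedged sketch using the usual ValueTracking helper (surrounding names
are illustrative):

```
// If value tracking proves the source non-zero, the zero-check branch is
// unnecessary, so leave the intrinsic alone.
Value *Src = CountZeros->getArgOperand(0);
if (isKnownNonZero(Src, *DL))
  return false; // don't despeculate; no branch needed
```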
Differential Revision: https://reviews.llvm.org/D106685
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen patterns
match them and convert them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D104797
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.
This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.
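A hedged sketch of the fix (the map name is illustrative):

```
// Collect the location operands into a set first, so each distinct old
// value is replaced exactly once even if it appears several times in the
// location list.
SmallPtrSet<Value *, 4> LocationOps(DVI->location_ops().begin(),
                                    DVI->location_ops().end());
for (Value *Old : LocationOps)
  if (Instruction *New = InstMap.lookup(Old))
    DVI->replaceVariableLocationOp(Old, New);
```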
Differential Revision: https://reviews.llvm.org/D105129
Reland of 31859f896.
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen patterns
match them and convert them to Wasm global.get/set.
Differential Revision: https://reviews.llvm.org/D104797
This optimization pre-promotes the input and constants for a
switch instruction to a legal type so that all the generated compares
share the same extend. Since RISCV prefers sext for i32 to i64
extends, we should honor that preference and use sext.w instead of a pair
of shifts.
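A hedged sketch of the shape of the transform (illustrative names; TLI is
the TargetLowering instance):

```
// Choose the extension kind the target prefers, then widen the condition
// once and the case constants to match, so every compare reuses WideCond.
bool UseSExt = TLI->isSExtCheaperThanZExt(SrcVT, DstVT); // true on RISCV for i32->i64
Value *WideCond = UseSExt ? Builder.CreateSExt(Cond, WideTy)
                          : Builder.CreateZExt(Cond, WideTy);
for (auto Case : SI->cases()) {
  const APInt &V = Case.getCaseValue()->getValue();
  Case.setValue(ConstantInt::get(Ctx, UseSExt ? V.sext(WideBits)
                                              : V.zext(WideBits)));
}
```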
Reviewed By: jrtc27
Differential Revision: https://reviews.llvm.org/D104612
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for loads and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen patterns
match them and convert them to Wasm global.get/set.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D95425
This adds a simple fold to codegenprepare that converts the comparison
feeding a branch into a comparison with zero, if possible. For example:
```
%c = icmp ult %x, 8
br %c, bla, blb
%tc = lshr %x, 3
```
becomes
```
%tc = lshr %x, 3
%c = icmp eq %tc, 0
br %c, bla, blb
```
The fold is valid because, for unsigned x, x u< 8 holds exactly when
x >> 3 is zero. As a first-order approximation, this can reduce the number
of instructions needed to perform the branch, as the shift is (often)
needed anyway. At the moment this does not affect very much, as llvm tends
to prefer the opposite form. But it can protect against regressions from
commits like rG9423f78240a2.
Simple cases of Add and Sub are handled along with Shift, since the
comparison with zero can equally often be folded using the cpsr flags.
Differential Revision: https://reviews.llvm.org/D101778
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.
D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The buildbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.
Reviewed By: dblaikie, rnk
Differential Revision: https://reviews.llvm.org/D100632
Pseudo probes, when scattered in a block, can become chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.
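A hedged IR-level sketch of the grouping idea (illustrative; the actual
change may be implemented elsewhere in the pipeline):

```
// Collect the block's pseudo probes, then move them (in order) to the
// front of the block so they no longer interleave with other nodes.
SmallVector<Instruction *, 4> Probes;
for (Instruction &I : BB)
  if (isa<PseudoProbeInst>(&I))
    Probes.push_back(&I);
// First non-probe, non-phi instruction becomes the insertion point.
Instruction *IP = &*BB.getFirstInsertionPt();
while (isa<PseudoProbeInst>(IP))
  IP = IP->getNextNode();
for (Instruction *P : Probes)
  P->moveBefore(IP); // preserves the probes' relative order
```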
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D100002
Follow-up to a6d2a8d6f5. These were found simply by grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
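A hedged sketch of the preferred pattern:

```
// Let the cast identify the intrinsic instead of comparing intrinsic
// IDs by hand.
if (auto *Assume = dyn_cast<AssumeInst>(I)) {
  // ... use Assume directly, rather than checking
  // cast<IntrinsicInst>(I)->getIntrinsicID() == Intrinsic::assume.
}
```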
This is no-functional-change intended (NFC), but needed to allow
optimizer passes to use the API. See D98898 for a proposed usage
by SimplifyCFG.
I'm simplifying the code by removing the cl::opt. That was added
back with the original commit in D19488, but I don't see any
evidence in regression tests that it was used. Target-specific
overrides can use the usual patterns to adjust as necessary.
We could also restore that cl::opt, but it was not clear to me
exactly how to do it in the convoluted TTI class structure.
Fixed a section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic output; replaced
the SmallDenseMap with a MapVector to prevent the non-determinism.
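A hedged sketch of the difference (hypothetical names):

```
#include "llvm/ADT/MapVector.h"
// MapVector iterates in insertion order, so instructions created while
// walking it come out in the same order on every run; SmallDenseMap's
// iteration order depends on hashing and allocation.
llvm::MapVector<llvm::Value *, llvm::Instruction *> Sunk; // was SmallDenseMap
for (auto &Entry : Sunk)
  rewriteUses(Entry.first, Entry.second); // hypothetical per-entry work
```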
This reverts commit 01ac6d1587.
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
Change was reverted in commit 8d20f2c2c6 because it was causing an infinite loop. 9228f2f32 fixed the root issue in the code structure; this change just reapplies the original change, adapted to the new code structure.
This fixes the bug demonstrated by the test case in the commit message of 8d20f2c2 (which was a revert of cf82700). The root issue was that we have two transforms which are inverses of each other. We use one for simple induction variables (where we can use the post-inc form), and the other for everything else. The problem was that the two transforms could disagree about whether something was an induction variable.
The reverted commit made a change to one of the matcher routines which was used for one of the two transforms without updating the other matcher. However, it's worth noting the existing code w/o the reverted change also has cases where the decision could differ between the two paths.
The fix is simply to consolidate the code such that two paths must agree by construction, and to add an assert to catch any potential future re-divergence.
Triggering the infinite loop requires side stepping the SunkAddrs cache. The SunkAddrs cache has the effect of suppressing the iteration in the common case, but there are codepaths through CGP which restart iteration and clear this cache.
Unfortunately, I have not been able to construct a standalone IR test case for this. The original test case is a c++ program which when compiled by clang demonstrates the infinite loop, but all of my attempts at extracting an IR test case runnable through opt/llc have failed to reproduce. (Including capturing the IR at point of the transform itself!) I have no idea what weird state clang is creating here.
I also tried creating a test case by hand, but gave up after about an hour of trying to find the right combination to dance through multiple transforms to create the end result needed to trip the bug.
This reverts commit cf82700af8 due to a compile timeout when building the following with `clang -O2`:
```
template <class, class = int> class a;
struct b {
using d = int *;
};
struct e {
using f = b::d;
};
class g {
public:
e::f h;
e::f i;
};
template <class, class> class a : g {
public:
long j() const { return i - h; }
long operator[](long) const noexcept;
};
template <class c, class k> long a<c, k>::operator[](long l) const noexcept {
return h[l];
}
template <typename m, typename n> int fn1(m, n, const char *);
int o, p;
class D {
void q(const a<long> &);
long r;
};
void D::q(const a<long> &l) {
int s;
if (l[0])
for (; l.j(); ++s) {
if (l[s])
while (fn1(o, 0, ""))
;
r = l[s] / p;
}
}
```
This removes some (but not all) uses of type-less CreateGEP()
and CreateInBoundsGEP() APIs, which are incompatible with opaque
pointers.
There are still a number of tricky uses left, as well as many more
CreateGEP API variants.
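For example (hedged sketch; `EltTy`, `Ptr`, and `Idx` are illustrative):

```
// The element type is now passed explicitly rather than derived from the
// pointer's pointee type, which no longer exists with opaque pointers.
Value *GEP = Builder.CreateInBoundsGEP(EltTy, Ptr, {Idx});
// was: Builder.CreateInBoundsGEP(Ptr, {Idx}); // type-less, reads the
//                                             // pointee type off Ptr
```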
LSR prefers to schedule iv increments just before the latch. The recent 80511565 broadened this to moving increments in the original IR. This pointed out a robustness problem with the CGP transform.
When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens, e.g., with unanalyzable loops) we'd avoid performing the uadd/usub transform. Interestingly, all of these cases involve moving the increment closer to its operands, so there's no concern about dominating all uses. We can handle that case cheaply, resulting in a more robust transform.
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
In the NFC commit 8d835f42a5, the check for `!L` is
moved to a separate function `getIVIncrement` which, instead of using `BO->getParent()`,
uses `PN->getParent()`. However, these two basic blocks are not necessarily the same.
https://bugs.llvm.org/show_bug.cgi?id=49466 demonstrates a case where `PN` is contained in
a loop while `BO` is not, causing the null-pointer dereference in `L->getLoopLatch()`.
This patch checks whether both `BO` and `PN` belong to the same loop before entering `getIVIncrement`.
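A hedged sketch of the added guard, using the names from the message:

```
// Only treat BO as an IV increment of PN when both sit in the same loop;
// otherwise getIVIncrement can end up dereferencing a null loop pointer
// in L->getLoopLatch().
auto InSameLoop = [&](BinaryOperator *BO, PHINode *PN) {
  return LI->getLoopFor(BO->getParent()) == LI->getLoopFor(PN->getParent());
};
if (!InSameLoop(BO, PN))
  return false; // bail out before calling getIVIncrement
```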
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D98144
This patch updates DbgVariableIntrinsics to support use of a DIArgList for the
location operand, resulting in a significant change to its interface. This patch
does not update all IR passes to support multiple location operands in a
dbg.value; the only change is to update the DbgVariableIntrinsic interface and
its uses. All code outside of the intrinsic classes assumes that an intrinsic
will always have exactly one location operand; they will still support
DIArgLists, but only if they contain exactly one Value.
Among other changes, the setOperand and setArgOperand functions in
DbgVariableIntrinsic have been made private. This is to prevent code from
setting the operands of these intrinsics directly, which could easily result in
incorrect/invalid operands being set. This does not prevent these functions from
being called on a debug intrinsic at all, as they can still be called on any
CallInst pointer; it is assumed that any code directly setting the operands on a
generic call instruction is doing so safely. The intention for making these
functions private is to prevent DIArgLists from being overwritten by code that's
naively trying to replace one of the Values it points to, and also to fail fast
if a DbgVariableIntrinsic is updated to use a DIArgList without a valid
corresponding DIExpression.
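A hedged sketch of how callers are expected to interact with the intrinsic
after this change:

```
// Iterate the (possibly multiple) location operands rather than assuming
// a single one, and replace values through the dedicated API.
for (Value *V : DVI->location_ops()) {
  // inspect each location operand V; with a DIArgList there may be several
}
DVI->replaceVariableLocationOp(OldV, NewV); // safe replacement
// DVI->setOperand(0, NewV); // now private: would clobber a DIArgList
```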
This is a compile time optimization for d9e93e8e5. Not sure whether this matters, but why not do it just in case.
This does involve querying TLI with a potentially invalid addressing mode for the using instruction, but since we don't actually pass the using instruction to the TLI callback, that should be fine.
This is a compile time optimization for d9e93e8e5. As pointed out in post-commit review of the original change (D96399), there was a moderately large compile-time regression with this patch, and the eager computation of the domtree on matcher construction is the first obvious candidate for why.
CodeGenPrepare currently first removes empty blocks, then in a loop
performs other optimizations. One of those optimizations is the removal
of call instructions that invoke @llvm.assume, which can create new
empty blocks.
This means that when a branch only contains a call to __builtin_assume(),
the empty branch will survive into MIR, and will then only be
half-removed by MIR-level optimizations (e.g. removing the branch but
leaving the condition intact).
Fix it by eliminating @llvm.assume builtin calls before removing empty
blocks.
Reviewed By: bkramer
Differential Revision: https://reviews.llvm.org/D97848
This patch enables the case where we do not completely eliminate the offset.
Supposedly, in this case we reduce live range overlap, which should never
hurt; but since there are doubts that this is true, it goes in as a
separate change.
Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames
While optimizing the memory instruction, we sometimes need to add
offset to the value of `IV`. We could avoid doing so if the `IV.next` is
already defined at the point of interest. In this case, we may get two
possible advantages from this:
- If the `IV` step happens to match with the offset, we don't need to add
the offset at all;
- We reduce the overlap of the live ranges of `IV` and `IV.next`. They may
stop overlapping, which will lead to better register allocation. Even if
the overlap persists, we are not introducing a new one, so it should be a
neutral transform (disabled in this patch, will come with a follow-up).
Currently I've only added support for IVs that get decremented using `usub`
intrinsic. We could also support `AddInstr`; however, there is some weird
interaction with another transform that may lead to infinite compilation
in this case (it seems the same transform is done and undone over and
over). I need to investigate why that happens, but generally we could
support that too.
The first part only handles the case where this reuse fully eliminates the offset.
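A hedged sketch of the idea with hypothetical names (the actual code lives
in CodeGenPrepare's addressing-mode matcher):

```
// If the addressing mode wants IV + Offset and the decrement
// IV.next = usub(IV, Step) already exists, an Offset equal to -Step
// means IV.next can stand in for the whole expression.
if (IVNext && AddrMode.BaseOffs == -(int64_t)Step) {
  AddrMode.BaseReg = IVNext; // reuse the existing iv.next
  AddrMode.BaseOffs = 0;     // offset fully eliminated
}
```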
Differential Revision: https://reviews.llvm.org/D96399
Reviewed By: reames