There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.
This patch temporarily switches back to the legacy DSE
implementation, while we investigate.
This reverts commit 9d172c8e9c.
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.
This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.
This fixes all tests under Transforms/LowerTypeTests using NPM.
Reviewed By: ychen, pcc
Differential Revision: https://reviews.llvm.org/D87845
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.
Differential Revision: https://reviews.llvm.org/D87452
Pulled from D87452, this is a fixed version of the collectBitParts fshl/fshr handling which as @nikic noticed wasn't checking for different providers or had correct bit ordering (which was hid by only testing shift amounts of bitwidth/2).
Differential Revision: https://reviews.llvm.org/D88292
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI Nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if required if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.
Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.
Reviewed By: GorNishanov, lxfind
Differential Revision: https://reviews.llvm.org/D88059
Summary:
This patch add support for printing analysis messages relating to data
globalization on the GPU. This occurs when data is shared between the
threads in a GPU context and must be pushed to global or shared memory.
Reviewers: jdoerfert
Subscribers: guansong hiraditya llvm-commits ormris sstefan1 yaxunl
Tags: #OpenMP #LLVM
Differential Revision: https://reviews.llvm.org/D88243
Introduce a helper which can be used to update the debug location of an
Instruction after the instruction is hoisted. This can be used to safely
drop a source location as recommended by the docs.
For more context, see the discussion in https://reviews.llvm.org/D60913.
Differential Revision: https://reviews.llvm.org/D85670
These were only really used for 2 things. One was to check if the operand matches the phi if it exists. The other was for the createOp method to build the reduction.
For the first case we still have the operation we just need to know how to index its operands. So I've modified getLHS/getRHS to just use the opcode/kind to know how to find the right operands on an instruction that is now passed in.
For the other case we had to create an OperationData object to set the LHS/RHS values and copy the opcode/kind from another object. We would then just call createOp on that temporary object. Instead I've made LHS/RHS arguments to createOp and removed all these temporary objects.
Differential Revision: https://reviews.llvm.org/D88193
This seems to fit the CGSCC updates model better than calling
addNewFunctionInto{Ref,}SCC() on newly created/outlined functions.
Now addNewFunctionInto{Ref,}SCC() are no longer necessary.
However, this doesn't work on newly outlined functions that aren't
referenced by the original function. e.g. if a() was outlined into b()
and c(), but c() is only referenced by b() and not by a(), this will
trigger an assert.
This also fixes an issue I was seeing with newly created functions not
having passes run on them.
Ran check-llvm with expensive checks.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87798
All of the callers already have an Instruction *. Many of them
from a dyn_cast.
Also update the OperationData constructor to use a Instruction&
to remove a dyn_cast and make it clear that the pointer is non-null.
Differential Revision: https://reviews.llvm.org/D88132
D87691 reordered some checks, which turned out to be unsafe. More
specifically, when examining a store instruction, the check against
getOrCreateResult should be done before attempting to call
isSameMemGeneration. Otherwise a crash in MSSA walker can occur.
This patch restores the order of these calls to what it was originally.
This refactors VPuser to not inherit from VPValue to facilitate
introducing operations that introduce multiple VPValues (e.g.
VPInterleaveRecipe).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D84679
In this patch I've fixed some warnings that arose from the implicit
cast of TypeSize -> uint64_t. I tried writing a variety of different
cases to show how this optimisation might work for scalable vectors
and found:
1. The optimisation does not work for cases where the cast type
is scalable and the allocated type is not. This because we need to
know how many times the cast type fits into the allocated type.
2. If we pass all the various checks for the case when the allocated
type is scalable and the cast type is not, then when creating the
new alloca we have to take vscale into account. This leads to
sub-optimal IR that is worse than the original IR.
3. For the remaining case when both the alloca and cast types are
scalable it is hard to find examples where the optimisation would
kick in, except for simple bitcasts, because we typically fail the
ABI alignment checks.
For now I've changed the code to bail out if only one of the alloca
and cast types is scalable. This means we continue to support the
existing cases where both types are fixed, and also the specific case
when both types are scalable with the same size and alignment, for
example a simple bitcast of an alloca to another type.
I've added tests that show we don't attempt to promote the alloca,
except for simple bitcasts:
Transforms/InstCombine/AArch64/sve-cast-of-alloc.ll
Differential revision: https://reviews.llvm.org/D87378
A conversion from `pow` to `sqrt` shall not call an `errno`-setting
`sqrt` with -//infinity//: the `sqrt` will set `EDOM` where the `pow`
call need not.
This patch avoids the erroneous (pun not intended) transformation by
applying the restrictions discussed in the thread for
https://lists.llvm.org/pipermail/llvm-dev/2020-September/145051.html.
The existing tests are updated (depending on emphasis in the checks for
library calls, avoidance of overlap, and overall coverage):
- to add `ninf`, retaining the intended library call,
- to use the intrinsic, retaining the use of `select`, or
- to expect the replacement to not occur.
The following is tested:
- The pow intrinsic folds to a `select` instruction to
handle -//infinity//.
- The pow library call folds, with `ninf`, to `sqrt` without the
`select` instruction associated with handling -//infinity//.
- The pow library call does not fold to `sqrt` without `ninf`.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87877
As an exhaustive test shows, this logic is fully identical to the old
implementation, with exception of the case where both of the operands
had empty ranges:
```
TEST_F(ConstantRangeTest, CVP_UDiv) {
unsigned Bits = 4;
EnumerateConstantRanges(Bits, [&](const ConstantRange &CR0) {
if(CR0.isEmptySet())
return;
EnumerateConstantRanges(Bits, [&](const ConstantRange &CR1) {
if(CR0.isEmptySet())
return;
unsigned MaxActiveBits = 0;
for (const ConstantRange &CR : {CR0, CR1})
MaxActiveBits = std::max(MaxActiveBits, CR.getActiveBits());
ConstantRange OperandRange(Bits, /*isFullSet=*/false);
for (const ConstantRange &CR : {CR0, CR1})
OperandRange = OperandRange.unionWith(CR);
unsigned NewWidth = OperandRange.getUnsignedMax().getActiveBits();
EXPECT_EQ(MaxActiveBits, NewWidth) << CR0 << " " << CR1;
});
});
}
```
This is a continuation of 8d487668d0,
the logic is pretty much identical for SRem:
Name: pos pos
Pre: C0 >= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, C1
Name: pos neg
Pre: C0 >= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%r = urem i8 C0, -C1
Name: neg pos
Pre: C0 <= 0 && C1 >= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, C1
%r = sub i8 0, %t0
Name: neg neg
Pre: C0 <= 0 && C1 <= 0
%r = srem i8 C0, C1
=>
%t0 = urem i8 -C0, -C1
%r = sub i8 0, %t0
https://rise4fun.com/Alive/Vd6
Now, this new logic does not result in any new catches
as of vanilla llvm test-suite + RawSpeed.
but it should be virtually compile-time free,
and it may be important to be consistent in their handling,
because if we had a pair of sdiv-srem, and only converted one of them,
-divrempairs will no longer see them as a pair,
and thus not "merge" them.
The current code for handling pow(x, y) where y is an integer plus 0.5
is not explicitly guarded against attempting to transform the case where
abs(y) is exactly 0.5.
The latter case is meant to be handled by `replacePowWithSqrt`. Indeed,
if the pow(x, integer+0.5) case proceeds past a certain point, it will
hit an assertion by attempting to form pow(x, 0) using `getPow`.
This patch adds an explicit check to prevent attempting the
pow(x, integer+0.5) transformation on pow(x, +/-0.5) as suggested during
the review of D87877. This has the effect of retaining the shrinking of
`pow` to `powf` when the `sqrt` libcall cannot be formed.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D88066
Refactored __tgt_target_data_begin_mapper_<issue|wait> to receive the handle as an input/output argument.
This given the compiler warning of returning the handle as copy.
Differential Revision: https://reviews.llvm.org/D88029
This provides a convenient way to print VPValues and recipes in a
debugger. In particular it saves the user from instantiating
VPSlotTracker to print recipes or values.
Changes TTI function getIntImmCostInst to take an additional Instruction parameter,
which enables us to be able to check it is part of a min(max())/max(min()) pattern that will match SSAT.
We can then mark the constant used as free to prevent it being hoisted so SSAT can still be generated.
Required minor changes in some non-ARM backends to allow for the optional parameter to be included.
Differential Revision: https://reviews.llvm.org/D87457
Extend the handling of memory intrinsics to also include non-
target-specific intrinsics, in particular masked loads and stores.
Invent "isHandledNonTargetIntrinsic" to distinguish between intrin-
sics that should be handled natively from intrinsics that can be
passed to TTI.
Add code that handles masked loads and stores and update the
testcase to reflect the results.
Differential Revision: https://reviews.llvm.org/D87340
SimplifyCFG's options should always be overridden by command line flags,
but they mistakenly weren't in the default constructor.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D87718
1. Store intrinsic ID in ParseMemoryInst instead of a boolean flag
"IsTargetMemInst". This will make it easier to add support for
target-independent intrinsics.
2. Extract the complex multiline conditions from EarlyCSE::processNode
into a new function "getMatchingValue".
Differential Revision: https://reviews.llvm.org/D87691