The warning would fire when calling canReplaceGEPIdxWithZero on a GEP
whose source element type is a scalable vector. The size of scalable
vector types is not known, so this optimization cannot be performed.
This patch fixes the issue by:
- bailing out early in this routine if the GEP instruction's source
element type is a scalable vector.
- making use of getFixedSize -- this removes the dependency on the
deprecated interface.
Reviewed By: fpetrogalli
Differential Revision: https://reviews.llvm.org/D89968
This allows using annotation in a much more contexts than it currently has.
especially when annotation with template or constexpr.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D88645
I'm not certain InstCombinerImpl::matchBSwapOrBitReverse needs to filter the or(op0(),op1()) ops - there are just too many cases that recognizeBSwapOrBitReverseIdiom/collectBitParts handle now (and quickly).
As discussed on PR35155, this extends narrowFunnelShift (recently renamed from narrowRotate) to support basic funnel shift patterns.
Unlike matchFunnelShift we don't include the computeKnownBits limitation as extracting the pattern from the zext/trunc layers should be a indicator of reasonable funnel shift codegen, in D89139 we demonstrated how to efficiently promote funnel shifts to wider types.
Differential Revision: https://reviews.llvm.org/D89542
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.
Differential Revision: https://reviews.llvm.org/D90028
As discussed in D89952,
instcombine can sometimes find a way to reduce similar patterns,
but it is incomplete.
InstSimplify uses the computeConstantRange() ValueTracking analysis
via simplifyICmpWithConstant(), so we just need to fill in the max
value of cttz to process any "icmp pred cttz(X), C" pattern (the
min value is initialized to zero automatically).
https://alive2.llvm.org/ce/z/Z_SLWZ
Follow-up to D89976.
When InstCombine removes an alloca, it erases the dbg.{addr,declare}
instructions which refer to the alloca. It would be better to instead
remove all debug intrinsics which describe the contents of the dead
alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s.
This effectively undoes work performed in an InstCombine run earlier in
the pipeline by LowerDbgDeclare, which inserts DW_OP_deref dbg.values
before CallInst users of an alloca. The motivating example looks like:
```
define void @foo(i32 %0) {
%a = alloca i32 ; This alloca is erased.
store i32 %0, i32* %a
dbg.value(i32 %0, "arg0") ; This dbg.value survives.
dbg.value(i32* %a, "arg0", DW_OP_deref)
call void @trivially_inlinable_no_op(i32* %a)
ret void
}
```
If the DW_OP_deref dbg.value is not erased, it becomes dbg.value(undef)
after inlining, making "arg0" unavailable. But we already have dbg.value
descriptions of the alloca's value (from LowerDbgDeclare), so the
DW_OP_deref dbg.value cannot serve its purpose of describing an
initialization of the alloca by some callee. It invalidates other useful
dbg.values, causing large gaps in location coverage, so we should delete
it (even though doing so may cause stale dbg.values to appear, if
there's a dead store to `%a` in @trivially_inlinable_no_op).
OTOH, it wouldn't be correct to delete all dbg.value descriptions of an
alloca. Note that it's possible to describe a variable that takes on
different pointer values, e.g.:
```
void use(int *);
void t(int a, int b) {
int *local = &a; // dbg.value(i32* %a.addr, "local")
local = &b; // dbg.value(i32* undef, "local")
use(&a); // (note: %b.addr is optimized out)
local = &a; // dbg.value(i32* %a.addr, "local")
}
```
In this example, the alloca for "b" is erased, but we need to describe
the value of "local" as <unavailable> before the call to "use". This
prevents "local" from appearing to be equal to "&a" at the callsite.
rdar://66592859
Differential Revision: https://reviews.llvm.org/D85555
Prior to this patch, computeKnownBits would only try to deduce trailing zeros
bits for getelementptrs. This patch adds the logic to treat geps as a series
of add * scaling factor.
Thanks to this patch, using a gep or performing an address computation
directly "by hand" (ptrtoint followed by adds and mul followed by inttoptr)
offers the same computeKnownBits information.
Previously, the "by hand" approach would have given more information.
This is related to https://llvm.org/PR47241.
Differential Revision: https://reviews.llvm.org/D86364
This is to simplify icmp instructions in the form like:
%cmp = icmp eq i32 (i8*, i8*)* bitcast (i32 (i32**, i32**)* @f32 to i32
%(i8*, i8*)), bitcast (i32 (i64**, i64**) @f64 to i32 (i8*, i8*)*)
Here @f32 and @f64 are two functions.
Differential Revision: https://reviews.llvm.org/D87850
D70365 allows us to make attributes default. This is a follow up to
actually make nosync, nofree and willreturn default. The approach we
chose, for now, is to opt-in to default attributes to avoid introducing
problems to target specific intrinsics. Intrinsics with default
attributes can be created using `DefaultAttrsIntrinsic` class.
Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.
This reverts commit 25a97c3a43.
We have other constant folds that fold undef funnel shift amounts to 0 - so we need to be consistent.
If we end up with regressions where we lose a splat shift amount pattern we'll have to investigate other canonicalizations, but matchFunnelShift currently protects us from that.
This was broken by 16295d521e, when
instructions started being handled and not just constant
expressions. This was re-inserting an equivalent bitcast to the
original memcpy operand, which made a non-functional IR change on
every iteration.
This also fixes a secondary problem where it was inserting
addrspacecasts which may not have been legal (i.e. it changed the
source address space). Start visiting all pointer users and fail out
if we can't process them. Also start handling the relevant memory
intrinsic users. These cases can be dealt with by running
InferAddressSpaces separately.
By always performing a modulo on the shift amount constants this was causing undef amounts being replaced with zero, meaning we were losing funnel shift by splat (with undef) patterns.
Tweaked the shift amount bounds check to support (passthrough) undefs, and use Constant::mergeUndefsWith to preserve the undefs after folding.
Recently we started looking into sret parameters, though the issue could crop
up elsewhere. If the pointee type is opaque, we should not try to compute its
size because that leads to an assertion failure.
Replace m_SpecificInt with m_APIntAllowUndef to matching splats containing undefs, then use ConstantExpr::mergeUndefsWith to merge the undefs together in the result.
The undef funnel shift amounts are getting replaced with zero later on - I'll address this in a later patch, otherwise we lose potential shift by splat value patterns.
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.
I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.
Differential Revision: https://reviews.llvm.org/D88687