Add an emitter for the memrchr common extension and simplify the strrchr
call handler to use it. This enables transforming calls with the empty
string to the test C ? S : 0.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128954
This function is well-defined for an instruction that doesn't access
memory (and thus trivially doesn't alias anything in the AST), so
drop the assert. We can end up with a readnone call here if we
originally created a MemoryDef for an indirect call, which was
later replaced with a direct readnone call.
Fixes https://github.com/llvm/llvm-project/issues/51333.
Differential Revision: https://reviews.llvm.org/D127947
LoopSimplify only requires that the loop predecessor has a single
successor and is safe to hoist into -- it doesn't necessarily have
to be an unconditional BranchInst.
Adjust LoopDeletion to assert conditions closer to what it actually
needs for correctness, namely a single successor and a
side-effect-free terminator (as the terminator is getting dropped).
Fixes https://github.com/llvm/llvm-project/issues/56266.
At the moment, the same VPlan can be used code generation of both the
main vector and epilogue vector loop. This can lead to wrong results, if
the plan is optimized based on the VF of the main vector loop and then
re-used for the epilogue loop.
One example where this is problematic is if the scalar loops need to
execute at least one iteration, e.g. due to interleave groups.
To prevent mis-compiles in the short-term, disable optimizing exit
conditions for VPlans when using epilogue vectorization. The proper fix
is to avoid re-using the same plan for both loops, which will require
support for cloning plans first.
Fixes#56319.
When converting strchr(p, '\0') to p + strlen(p) we know that
strlen() must return an offset that is inbounds of the allocated
object (otherwise it would be UB), so we can use an inbounds GEP.
An equivalent argument can be made for the other cases.
Handle denormal constant input for fcmp instructions based on the
denormal handling mode.
Reviewed By: spatel, dcandler
Differential Revision: https://reviews.llvm.org/D128647
This is a minor refinement of resolvedUndefsIn(), mostly for clarity.
If the value of an instruction is undef, then that's already a legal
final result -- we can safely rauw such an instruction with undef.
We only need to mark unknown values as overdefined, as that's the
result we get for an instruction that has not been processed because
it has an undef operand.
Differential Revision: https://reviews.llvm.org/D128251
The unidentified objects recognized in `getUnderlyingObjects` may
still alias to the noalias parameter because `getUnderlyingObjects`
may not check deep enough to get the underlying object because of
`MaxLookup`. The real underlying object for the unidentified object
may still be the noalias parameter.
Originally Patched By: tingwang
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D127202
When merging GEP of GEP with constant indices, if the second GEP's offset is not divisible by the first GEP's element size, convert both type to i8* and merge.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D125934
I looked at canonicalizing in the other direction, but that causes
many potential regressions and infinite loops because we already
(possibly wrongly) canonicalize "trunc X to i1" into an and+icmp.
This has a data layout restriction to avoid creating illegal
mask instructions, but we could remove that if we can show
that the backend can undo this when needed.
The motivating example from issue #56119 is modeled by the
PhaseOrdering test.
Building on the work on D124284, this patch tags v4i8 and v2i16 vector loads as custom, enabling SLP to try to vectorize these types ending in a partial store (using the SSE MOVD instruction) - we already do something similar for 64-bit vector types.
Differential Revision: https://reviews.llvm.org/D127604
Correct a logic bug in the memrchr enhancement added in D123629 that
makes it ineffective in a subset of cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128856
At the moment LoopVersioning is only created for inner-loop
vectorization. This patch moves it to LVP::execute, which means it will
also be added for epilogue vectorization. As a consequence, the proper
noalias metadata is now also added to epilogue vector loops.
LVer will be moved to VPTransformState as follow-up.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D127966
The assert was added with 0399473de8 and is correct for that
pattern, but it is off-by-1 with the enhancement in d4f39d8333.
The transforms are still correct with the new pre-condition:
https://alive2.llvm.org/ce/z/6_6ghmhttps://alive2.llvm.org/ce/z/_GTBUt
And as shown in the new test, the transform is expected with
'ult' - in that case, the icmp reduces to test if the shift
amount is 0.
The ArgumentPromotion pass uses Mem2Reg promotion at the end to cutting
down generated alloca instructions as well as meaningless stores and
this behavior can leave unused (dead) arguments.
The test shows that the arguments are not removed in the current
optimization pipeline.
Use a common ConstantFoldInstOperands-based constant folding
implementation, instead of specifying the folding function for
each function individually. Going through the generic handling
doesn't appear to have any significant compile-time impact.
As the test change shows, this is not NFC, because we now use
DataLayout-aware constant folding, which can do slightly better
in some cases (e.g. those involving GEPs).
In some cases, there may be widened users of inductions even though the
plan includes the scalar VF. In those cases, make sure we still replace
the VPWidenIntOrFpInductionRecipe with scalar steps, as otherwise we may
try to execute a VPWidenIntOrFpInductionRecipe with a scalar VF.
Alternatively the patch could also split the range if needed.
This fixes a crash exposed by D123720.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D128755
Currently, we only remove dead blocks and non-feasible edges in
IPSCCP, but not in SCCP. I'm not aware of any strong reason for
that difference, so this patch updates SCCP to perform the CFG
cleanup as well.
Compile-time impact seems to be pretty minimal, in the 0.05%
geomean range on CTMark.
For the test case from https://reviews.llvm.org/D126962#3611579
the result after -sccp now looks like this:
define void @test(i1 %c) {
entry:
br i1 %c, label %unreachable, label %next
next:
unreachable
unreachable:
call void @bar()
unreachable
}
-jump-threading does nothing on this, but -simplifycfg will produce
the optimal result.
Differential Revision: https://reviews.llvm.org/D128796
enabled
The C++20 Coroutines couldn't be compiled to WebAssembly due to an
optimization named symmetric transfer requires the support for musttail
calls but WebAssembly doesn't support it yet.
This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D128794
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.
This patch add the check for the ConstantArray as well as a debug message that
outputs the contents of MustKeepGlobalVariables.
Reviewed By: lei, amyk, scui
Differential Revision: https://reviews.llvm.org/D128287
ConnectProlog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.
Fix is analog to cfc741bc0e.
Fixes#56286.
ConnectEpilog adds new incoming values to exit phi nodes which can
change the SCEV for the phi after 20d798bd47.
Fix is analog to cfc741bc0e.
Fixes#56282.
This test demonstrates how sinking down gc.relocate may lead to breach
of LCSSA form by tokens and, consecutively, end up with SSA breach by
LoopSimplifyCFG which creates fake edges and is unable to update missing
LCSSA phis for tokens used outside of the loop.
The first test shows that combineRangeChecks may choose to keep only two poison conditions.
And we cannot do simple arithmetic or logical and in guard.
The second test shows that keeping two poison conditions in the widened guard may allow
execution of side-effect instruction even if just freeze these conditions.
The third test shows that even in simple test we can hoist a poison and even logical and does not help here.
Reviewed By: mkazantsev
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D128779
Extend the solution accepted in D127766 to strncmp and simplify
strncmp(A, B, N) calls with constant A and B and variable N to
the equivalent of
N <= Pos ? 0 : (A < B ? -1 : B < A ? +1 : 0)
where Pos is the offset of either the first mismatch between A
and B or the terminating null character if both A and B are equal
strings.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D128089
Enhance getConstantDataArrayInfo to let the memchr and memcmp library
call folders look through arbitrarily long sequences of bitcast and
GEP instructions.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D128364
If the root order itself does not require reordering, we can just
remove its reorder mask safely (e.g., if the root node is a vector of
phis). But if this node is used as an operand in the graph, we cannot
delete the reordering, need to keep it. Otherwise the graph nodes are
not synchronized with the operands. It may cause an extra gather
instruction(s) or a compiler crash.
Also, need to be very careful when selecting the gather nodes for
reordering since there might several gather nodes with the same scalars
and we can try to reorder just the same node many times instead of
different nodes.
Differential Revision: https://reviews.llvm.org/D128680
It makes sense to handle byval promotion in the same way as non-byval
but also allowing `store` instructions. However, these should
use the same checks as the `load` instructions do, i.e. be part of the
`ArgsToPromote` collection. For these instructions, the check for
interfering modifications can be disabled, though. The promotion
algorithm itself has been modified a lot: all the accesses (i.e. loads
and stores) are rewritten to the emitted `alloca` instructions. To
optimize these new `alloca`s out, the `PromoteMemToReg` function from
`Transforms/Utils/PromoteMemoryToRegister.cpp` file is invoked after
promotion.
In order to let the `PromoteMemToReg` promote as many `alloca`s as it
is possible, there should be no `GEP`s from the `alloca`s. To
eliminate the `GEP`s, its own `alloca` is generated for every argument
part because a single `alloca` for the whole argument (that
significantly simplifies the code of the pass though) unfortunately
cannot be used.
The idea comes from the following discussion:
https://reviews.llvm.org/D124514#3479676
Differential Revision: https://reviews.llvm.org/D125485
This removes the extractvalue constant expression, as part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
extractvalue is already not supported in bitcode, so we do not need
to worry about bitcode auto-upgrade.
Uses of ConstantExpr::getExtractValue() should be replaced with
IRBuilder::CreateExtractValue() (if the fact that the result is
constant is not important) or ConstantFoldExtractValueInstruction()
(if it is). Though for this particular case, it is also possible
and usually preferable to use getAggregateElement() instead.
The C API function LLVMConstExtractValue() is removed, as the
underlying constant expression no longer exists. Instead,
LLVMBuildExtractValue() should be used (which will constant fold
or create an instruction). Depending on the use-case,
LLVMGetAggregateElement() may also be used instead.
Differential Revision: https://reviews.llvm.org/D125795
This is another attempt to land this patch.
The patch proposed to use a new cost model for loop interchange,
which is obtained from loop cache analysis.
Given a loopnest, what loop cache analysis returns is a vector of
loops [loop0, loop1, loop2, ...] where loop0 should be replaced as
the outermost loop, loop1 should be placed one more level inside, and
loop2 one more level inside, etc. What loop cache analysis does is not
only more comprehensive than the current cost model, it is also a "one-shot"
query which means that we only need to query it once during the entire
loop interchange pass, which is better than the current cost model where
we query it every time we check whether it is profitable to interchange
two loops. Thus complexity is reduced, especially after D120386 where we
do more interchanges to get the globally optimal loop access pattern.
Updates made to test cases are mostly minor changes and some
corrections. One change that applies to all tests is that we added an option
`-cache-line-size=64` to the RUN lines. This is ensure that loop
cache analysis receives a valid number of cache line size for correct
analysis. Test coverage for loop interchange is not reduced.
Currently we did not completely remove the legacy cost model, but
keep it as fall-back in case the new cost model did not run successfully.
This is because currently we have some limitations in delinearization, which
sometimes makes loop cache analysis bail out. The longer term goal is to
enhance delinearization and eventually remove the legacy cost model
compeletely.
Reviewed By: bmahjour, #loopoptwg
Differential Revision: https://reviews.llvm.org/D124926
This change is a bit subtle. If we have a type like <vscale x 1 x i64>, the vectorizer will currently reject vectorization. The reason is that a type like <1 x i64> is likely to get simply rescalarized, and the vectorizer doesn't want to be in the game of simple unrolling.
(I've given the example in terms of 1 x types which use a single register, but the same issue exists for any N x types which use N registers. e.g. RISCV LMULs.)
This change distinguishes scalable types from fixed types under the reasoning that converting to a scalable type isn't unrolling. Because the actual vscale isn't known until runtime, using a vscale type is potentially very profitable.
This makes an important, but unchecked, assumption. Specifically, the scalable type is assumed to only be legal per the cost model if there's actually a scalable register class which is distinct from the scalar domain. This is, to my knowledge, true for all targets which return non-invalid costs for scalable vector ops today, but in theory, we could have a target decide to lower scalable to fixed length vector or even scalar registers. If that ever happens, we'd need to revisit this code.
In practice, this patch unblocks scalable vectorization for ELEN types on RISCV.
Let me sketch one alternate implementation I considered. We could have restricted this to when we know a minimum value for vscale. Specifically, for the default +v extension for RISCV, we actually know that vscale >= 2 for ELEN types. However, doing it this way means we can't generate scalable vectors when using the various embedded vector extensions which have a minimum vscale of 1.
Differential Revision: https://reviews.llvm.org/D128542