Summary: masked_load and masked_store instructions require the alignment to be specified and a power of two. It seems to me that this requirement applies to masked_gather and masked_scatter as well.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73179
Patch by Chris Chrulski
When generating value profiling instrumentation, ensure the call gets the
correct funclet token, otherwise WinEHPrepare will turn the call (and all
subsequent instructions) into unreachable.
Differential Revision: https://reviews.llvm.org/D73221
Patch by Chris Chrulski
This fixes a problem with the current behavior when assertions are enabled.
A loop that exits to a catchswitch instruction is skipped for the counter
promotion, however this check was being done after the PGOCounterPromoter
tried to collect an insertion point for the exit block. A call to
getFirstInsertionPt() on a block that begins with a catchswitch instruction
triggers an assertion. This change performs a check whether the counter
promotion is possible prior to collecting the ExitBlocks and InsertPts.
Differential Revision: https://reviews.llvm.org/D73222
The codegen for splitting a llvm.vector.reduction intrinsic into parts
will be better than the codegen for the generic reductions. This will
only directly effect when vectorization factors are specified by the
user.
Also added tests to make sure the codegen for larger reductions is OK.
Differential Revision: https://reviews.llvm.org/D72257
This fixes a bug where a PHI node that is only referenced by a lifetime.end intrinsic in an otherwise empty cleanuppad can cause SimplyCFG to create an SSA violation while removing the empty cleanuppad. Theoretically the same problem can occur with debug intrinsics.
Differential Revision: https://reviews.llvm.org/D72540
When we use information only to short-cut deduction or improve it, we
can use OPTIONAL dependences instead of REQUIRED ones to avoid cascading
pessimistic fixpoints.
We also need to track dependences only when we use assumed information,
e.g., we act on assumed liveness information.
When we follow uses, e.g., in AAMemoryBehavior or AANoCapture, we need
to make sure the value is a pointer before we ask for abstract
attributes only valid for pointers. This happens because we follow
pointers through calls that do not capture but may return the value.
Summary:
Third part in series to support Safe Whole Program Devirtualization
Enablement, see RFC here:
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
This patch adds type test metadata under -fwhole-program-vtables,
even for classes without hidden visibility. It then changes WPD to skip
devirtualization for a virtual function call when any of the compatible
vtables has public vcall visibility.
Additionally, internal LLVM options as well as lld and gold-plugin
options are added which enable upgrading all public vcall visibility
to linkage unit (hidden) visibility during LTO. This enables the more
aggressive WPD to kick in based on LTO time knowledge of the visibility
guarantees.
Support was added to all flavors of LTO WPD (regular, hybrid and
index-only), and to both the new and old LTO APIs.
Unfortunately it was not simple to split the first and second parts of
this part of the change (the unconditional emission of type tests and
the upgrading of the vcall visiblity) as I needed a way to upgrade the
public visibility on legacy WPD llvm assembly tests that don't include
linkage unit vcall visibility specifiers, to avoid a lot of test churn.
I also added a mechanism to LowerTypeTests that allows dropping type
test assume sequences we now aggressively insert when we invoke
distributed ThinLTO backends with null indexes, which is used in testing
mode, and which doesn't invoke the normal ThinLTO backend pipeline.
Depends on D71907 and D71911.
Reviewers: pcc, evgeny777, steven_wu, espindola
Subscribers: emaste, Prazek, inglorion, arichardson, hiraditya, MaskRay, dexonsmith, dang, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71913
We currently use integer ranges to merge concrete function arguments.
We use the ParamState range for those, but we only look up concrete
values in the regular state. For concrete function arguments that are
themselves arguments of the containing function, we can use the param
state directly and improve the precision in some cases.
Besides improving the results in some cases, this is also a small step towards
switching to ValueLatticeElement, by allowing D60582 to be a NFC.
Reviewers: efriedma, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D71836
Summary:
First patch to support Safe Whole Program Devirtualization Enablement,
see RFC here: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html
Always emit !vcall_visibility metadata under -fwhole-program-vtables,
and not just for -fvirtual-function-elimination. The vcall visibility
metadata will (in a subsequent patch) be used to communicate to WPD
which vtables are safe to devirtualize, and we will optionally convert
the metadata to hidden visibility at link time. Subsequent follow on
patches will help enable this by adding vcall_visibility metadata to the
ThinLTO summaries, and always emit type test intrinsics under
-fwhole-program-vtables (and not just for vtables with hidden
visibility).
In order to do this safely with VFE, since for VFE all vtable loads must
be type checked loads which will no longer be the case, this patch adds
a new "Virtual Function Elim" module flag to communicate to GlobalDCE
whether to perform VFE using the vcall_visibility metadata.
One additional advantage of using the vcall_visibility metadata to drive
more WPD at LTO link time is that we can use the same mechanism to
enable more aggressive VFE at LTO link time as well. The link time
option proposed in the RFC will convert vcall_visibility metadata to
hidden (aka linkage unit visibility), which combined with
-fvirtual-function-elimination will allow it to be done more
aggressively at LTO link time under the same conditions.
Reviewers: pcc, ostannard, evgeny777, steven_wu
Subscribers: mehdi_amini, Prazek, hiraditya, dexonsmith, davidxl, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71907
Calling `operator*` on a WeakVH with a null value yields a null
reference, which is UB. Avoid this by implicitly converting the WeakVH
to a `Value *` rather than dereferencing and then taking the address
for the type conversion.
Differential Revision: https://reviews.llvm.org/D73280
In case of loops with multiple exit where all-but-one exit are deoptimizing
it might happen that the first rotation will end up with latch having a deoptimizing
exit. This makes the loop unsuitable for trip-count analysis (say, getLoopEstimatedTripCount)
as well as for loop transformations that know how to handle multple deoptimizing exits.
It pretty much means that canonical form in multple-deoptimizing-exits case should be
with non-deoptimizing exit at latch.
Teach loop-rotation to reach this canonical form by repeating rotation.
-loop-rotate-multi option introduced to control this behavior, currently disabled by default.
Reviewers: skatkov, asbirlea, reames, fhahn
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73058
It is incorrect to call ValueHandleBase::ValueIsRAUWd when only one use
is replaced since it simply violates semantics of the callback and leads
to bugs like PR44320.
Previously this call was used specifically to keep LICM's cache of
AliasSetTrackers up to date across passes (as PR36801 showed, even for
that purpose it didn't work properly), but since LICM doesn't have that
cache anymore, we can safely remove this incorrect call with no
repercussions.
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=44320
Reviewers: asbirlea, fhahn, efriedma, reames
Reviewed-By: asbirlea
Differential Revision: https://reviews.llvm.org/D73089
Currently due to the edge caching, we create wrong predicates for
branches with matching true and false successors. We will cache the
condition for the edge from the true successor, and then lookup the same
edge (src and dst are the same) for the edge to the false successor.
If both successors match, the condition should always be true. At the
moment, we cannot really create constant VPValues, but we can just
create a true condition as X | !X. Later passes will clean that up.
Fixes PR44488.
Reviewers: rengolin, hsaito, fhahn, Ayal, dorit, gilr
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D73079
This patch adds a custom implementation of isLegalNTStore to AArch64TTI
that supports vector types that can be directly stored by STNP. Note
that the implementation may not catch all valid cases (e.g. because the
vector is a multiple of 256 and could be broken down to multiple valid 256 bit
stores), but it is good enough for LV to vectorize loops with NT stores,
as LV only passes in a vector with 2 elements to check. LV seems to also
be the only user of isLegalNTStore.
We should also do the same for NT loads, but before that we need to
ensure that we properly lower LDNP of vectors, similar to D72919.
Reviewers: dmgreen, samparker, t.p.northover, ab
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D73158
Fixes https://bugs.llvm.org/show_bug.cgi?id=44529. We already have
a combine to sink a negation through a left-shift, but it currently
only works if the shift operand is negatable without creating any
instructions. This patch introduces freelyNegateValue() as a more
powerful extension of dyn_castNegVal(), which allows negating a
value as long as this doesn't end up increasing instruction count.
Specifically, this patch adds support for negating A-B to B-A.
This mechanism could in the future be extended to handle general
negation chains that a) start at a proper 0-X negation and b) only
require one operand to be freely negatable. This would end up as a
weaker form of D68408 aimed at the most obviously profitable subset
that eliminates a negation entirely.
Differential Revision: https://reviews.llvm.org/D72978
This is 1 of the potential folds uncovered by extending D72521.
We don't seem to do this in the backend either (unless I'm not
seeing some target-specific transform).
icc and gcc (appears to be target-specific) do this transform.
Differential Revision: https://reviews.llvm.org/D73057
Summary:
We don't have control/verify what will be the RHS of the division, so it might
happen to be zero, causing UB.
Reviewers: Vasilis, RKSimon, ABataev
Reviewed By: ABataev
Subscribers: vporpo, ABataev, hiraditya, llvm-commits, vdmitrie
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72740
Summary:
New `@test13` in `Attributor/align.ll` is the main motivation - `null` pointer
really does not limit our alignment knowledge, in fact it is fully aligned
since it has no bits set.
Here we don't special-case `null` pointer because it is somewhat controversial
to add one more place where we enforce that `null` pointer is zero,
but instead we do the more general thing of trying to perform constant-fold
of pointer constant to an integer, and perform alignment inferrment on that.
Reviewers: jdoerfert, gchatelet, courbet, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, arphaman, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73131
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":
(bitcast X) < 0 ? -TC : TC --> copysign(TC, X)
Differential Revision: https://reviews.llvm.org/D72643
As mentioned in D72643, we'd like to be able to assert that any select
of equivalent constants has been removed before we're deep into InstCombine.
But there's a loophole in that assertion for vectors with undef elements
that don't match exactly.
This patch should close that gap. If we have undefs, we can't safely
propagate those unless both constants elements for that lane are undef.
Differential Revision: https://reviews.llvm.org/D72958
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.
Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov
Reviewed By: Ayal, DaniilSuchkov
Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67905
Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.
Reviewers: Ayal, fhahn
Reviewed By: Ayal
Subscribers: mgorny, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71990
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
During the SeparateConstOffsetFromGEP pass, signed extensions are distributed
to the values that feed into them and then later recombined. The recombination
stage is somewhat problematic- it doesn't differ add and sub instructions
from another when matching the sext(a) +/- sext(b) -> sext(a +/- b) pattern
in some instances.
An example- the IR contains:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extA + extB
The problematic optimization will transform that into:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extend subuAuB ; Obviously not semantically equivalent to the IR input.
This patch fixes that.
Patch by Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D65967
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.
There is a slight caveat here though: We need to make sure that we
add back the store the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.
Differential Revision: https://reviews.llvm.org/D72807
There are two related bugs here: First, we don't add the operand
we're replacing to the worklist, which means it may not get DCEd
(see test change). Second, usually this would just get picked up
in the next iteration, but we also do not report the instruction
as changed. This means that we do not get that extra instcombine
iteration, and more importantly, may break the pass pipeline, as
the function is not marked as changed.
Differential Revision: https://reviews.llvm.org/D72864
Currently, there is no way to disable ExpensiveCombines when doing
a standalone opt -instcombine run, as that's the default, and the
opt option can currently only be used to force enable, not to force
disable. The only way to disable expensive combines is via -O1 or -O2,
but that of course also runs the rest of the kitchen sink...
This patch allows using opt -instcombine -expensive-combines=0 to
run InstCombine without ExpensiveCombines.
Differential Revision: https://reviews.llvm.org/D72861
Blindly following unique-successors chain appeared to be a bad idea.
In a degenerate case when block jumps to itself that goes into endless loop.
Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.
Fix by checking a "visited" set while iterating through unique successors.
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72908
Introduce parsing, add a few instances of parameter use into GVN-PRE tests.
Reviewers: skatkov, asbirlea
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72752