This solves a problem with non-deterministic output from opt due
to not performing dominator tree updates in a deterministic order.
The problem that was analysed indicated that JumpThreading was using
the DomTreeUpdater via llvm::MergeBasicBlockIntoOnlyPred. When
preparing the list of updates to send to DomTreeUpdater::applyUpdates
we iterated over a SmallPtrSet, which didn't give a well-defined
order of updates to perform.
The added domtree-updates.ll test case is an example that would
result in non-deterministic printouts of the domtree. Semantically
those domtree:s are equivalent, but it show the fact that when we
use the domtree iterator the order in which nodes are visited depend
on the order in which dominator tree updates are performed.
Since some passes (at least EarlyCSE) are iterating over nodes in the
dominator tree in a similar fashion as the domtree printer, then the
order in which transforms are applied by such passes, transitively,
also depend on the order in which dominator tree updates are
performed. And taking EarlyCSE as an example the end result could be
different depending on in which order the transforms are applied.
Reviewed By: nikic, kuhar
Differential Revision: https://reviews.llvm.org/D110292
In most common cases the @llvm.get.active.lane.mask intrinsic maps directly
to the SVE whilelo instruction, which already takes overflow into account.
However, currently in SelectionDAGBuilder::visitIntrinsicCall we always lower
this immediately to a generic sequence of instructions that explicitly
take overflow into account. This makes it very difficult to then later
transform back into a single whilelo instruction. Therefore, this patch
introduces a new TLI function called shouldExpandGetActiveLaneMask that asks if
we should lower/expand this to a sequence of generic ISD nodes, or instead
just leave it as an intrinsic for the target to lower.
You can see the significant improvement in code quality for some of the
tests in this file:
CodeGen/AArch64/active_lane_mask.ll
Differential Revision: https://reviews.llvm.org/D114542
With the recently introduced tracking as well as D113349, we can
greatly simplify forgetSymbolicName(). In fact, we can simply
replace it with forgetMemoizedResults().
What forgetSymbolicName() used to do is to walk the IR use-def
chain to find all SCEVs that mention the SymbolicName. However,
thanks to use tracking, we can now determine the relevant SCEVs
in a more direct way. D113349 is needed to also clear out the
actual IR to SCEV mapping in ValueExprMap.
Differential Revision: https://reviews.llvm.org/D114263
After backedge taken counts have been calculated, we want to
invalidate all addrecs and dependent expressions in the loop,
because we might compute better results with the newly available
backedge taken counts. Previously this was done with a forgetLoop()
style use-def walk. With recent improvements to SCEV invalidation,
we can instead directly invalidate any SCEVs using addrecs in this
loop. This requires a great deal less subtlety to avoid invalidating
more than necessary, and in particular gets rid of the hack from
D113349. The change is similar to D114263 in spirit.
Relative to the previous landing attempt, this introduces an additional
flag on forgetMemoizedResults() to not remove SCEVUnknown phis from
the value map. The invalidation after BECount calculation wants to
leave these alone and skips them in its own use-def walk, but we can
still end up invalidating them via forgetMemoizedResults() if there
is another IR value with the same SCEV. This is intended as a temporary
workaround only, and the need for this should go away once the
getBackedgeTakenInfo() invalidation is refactored in the spirit of
D114263.
-----
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
Relative to the previous landing attempt, this makes
insertValueToMap() resilient against the value already being
present in the map -- previously I only checked this for the
createSimpleAffineAddRec() case, but the same issue can also
occur for the general createNodeForPHI(). In both cases, the
addrec may be constructed and added to the map in a recursive
query trying to create said addrec. In this case, this happens
due to the invalidation when the BE count is computed, which
ends up clearing out the symbolic name as well.
-----
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
This change folds a basic funnel shift idiom:
- (or (shl x, amt), (lshr y, sub(bw, amt))) -> fshl(x, y, amt)
- (or (shl x, sub(bw, amt)), (lshr y, amt)) -> fshr(x, y, amt)
This also helps in folding to rotate shift if x and y are equal since we
already have a funnel shift to rotate combine.
Differential Revision: https://reviews.llvm.org/D114499
This reverts commit 71a7c55f0f.
The revert broken building llvm-reduce and it is not clear it fixes an
issue with LLVM_ENABLE_THREADS=OFF.
See discussion in https://reviews.llvm.org/D114183 for more details.
Introduced/discussed in https://reviews.llvm.org/D38719
The header validation logic was also explicitly building the DWARFUnits
to validate. But then other calls, like "Units.getUnitForOffset" creates
the DWARFUnits again in the DWARFContext proper - so, let's avoid
creating the DWARFUnits twice by walking the DWARFContext's units rather
than building a new list explicitly.
This does reduce some verifier power - it means that any unit with a
header parsing failure won't get further validation, whereas the
verifier-created units were getting some further validation despite
invalid headers. I don't think this is a great loss/seems "right" in
some ways to me that if the header's invalid we should stop there.
Exposing the raw DWARFUnitVectors from DWARFContext feels a bit
sub-optimal, but gave simple access to the getUnitForOffset to keep the
rest of the code fairly similar.
Almost all of the time, call instructions don't actually lead to SP being
different after they return. An exception is win32's _chkstk, which which
implements stack probes. We need to recognise that as modifying SP, so
that copies of the value are tracked as distinct vla pointers.
This patch adds a target frame-lowering hook to see whether stack probe
functions will modify the stack pointer, store that in an internal flag,
and if it's true then scan CALL instructions to see whether they're a
stack probe. If they are, recognise them as defining a new stack-pointer
value.
The added test exercises this behaviour: two calls to _chkstk should be
considered as producing two different values.
Differential Revision: https://reviews.llvm.org/D114443
In-loop vector reductions which use the llvm.fmuladd intrinsic involve
the creation of two recipes; a VPReductionRecipe for the fadd and a
VPInstruction for the fmul. If the call to llvm.fmuladd has fast-math flags
these should be propagated through to the fmul instruction, so an
interface setFastMathFlags has been added to the VPInstruction class to
enable this.
Differential Revision: https://reviews.llvm.org/D113125
AMDGPU is unusual in that the both stack is indexed in the same
direction as stack growth (up). We therefore always need the emergency
stack slots placed as low as possible to ensure they are in range of
load/store instruction immediate offsets. The existing logic is mostly
OK, but failed if we required stack realignment.
I don't understand what the existing control isFPCloseToIncomingSP is
supposed to mean, but can only be used to stop placing the scavenge
slots earlier. Make this explicit so that targets can opt-in rather
than opt-out only.
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
Support for builtins that use bcdadd./bcdsub. to add/subtract
Binary Coded Decimal values as well as to determine validity
and compare BCD values.
Differential revision: https://reviews.llvm.org/D114088
The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.
Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.
The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.
The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D109860
Usage and naming of macros in VPIntrinsics.def has been inconsistent. Rename all property macros to VP_PROPERTY_<name>. Use BEGIN/END scope macros to attach properties to vp intrinsics and SDNodes (instead of specifying either directly with the property macro).
A follow-up patch has documentation on how the macros are (intended) to be used.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D114144
The only users of returned futures from ThreadPool is llvm-reduce after
D113857.
There should be no cases where multiple threads wait on the same future,
so there should be no need to return std::shared_future<>. Instead return
plain std::future<>.
If users need to share a future between multiple threads, they can share
the futures themselves.
Reviewed By: Meinersbur, mehdi_amini
Differential Revision: https://reviews.llvm.org/D114363
A masked load or store will load a potentially unknown number of bytes
from a memory location - that is not generally known at compile time.
They do not necessarily load/store the entire vector width, and treating
them as such can lead to incorrect aliasing information (for example, if
the underlying object is smaller than the size of the vector).
This makes sure that the MMO is given an unknown size to represent this.
which is less accurate that "may load/store from up to 16 bytes", but
less incorrect that "will load/store from 16 bytes".
Differential Revision: https://reviews.llvm.org/D113888
This patch adjusts ThreadPool::async to return futures that wrap
the result type of the passed in callable.
To do so, ThreadPool::asyncImpl first creates a shared promise. The
result of the promise is set in a new callable that first executes the
task. The callable is added to the task queue.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114183
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `IntrinsicsNVVM.td`.
Reviewed By: steffenlarsen
Differential Revision: https://reviews.llvm.org/D114193
For tagged-globals, we only need to disable relaxation for globals that
we actually tag. With this patch function pointer relocations, which
we do not instrument, can be relaxed.
This patch also makes tagged-globals work properly with LTO, as
-Wa,-mrelax-relocations=no doesn't work with LTO.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D113220
Not only RISCV but also other target such as CSKY, there are compressed instructions mixed with normal instructions.
To reuse the basic infra to compress/uncompress and predict instruction, we need reconstruct the RISCVCompressInstEmitter
and make it more general and suitable for other target.
Differential Revision: https://reviews.llvm.org/D113475