Summary: with this patch the assume salvageKnowledge will not generate assume if all knowledge is already available in an assume with valid context. assume bulider can also in some cases update an existing assume with better information.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78014
Currently LAA's uses of ScalarEvolutionExpander blocks moving the
expander from Analysis to Transforms. Conceptually the expander does not
fit into Analysis (it is only used for code generation) and
runtime-check generation also seems to be better suited as a
transformation utility.
Reviewers: Ayal, anemet
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78460
In order to reduce the API surface area (preparation for D78460), remove
a addRuntimeChecks() function and do the additional check in the single
caller.
Reviewers: Ayal, anemet
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D79679
them in a special text section.
For sampleFDO, because the optimized build uses profile generated from
previous release, previously we couldn't tell a function without profile
was truely cold or just newly created so we had to treat them conservatively
and put them in .text section instead of .text.unlikely. The result was when
we persuing the best performance by locking .text.hot and .text in memory,
we wasted a lot of memory to keep cold functions inside.
In https://reviews.llvm.org/D66374, we introduced profile symbol list to
discriminate functions being cold versus functions being newly added.
This mechanism works quite well for regular use cases in AutoFDO. However,
in some case, we can only have a partial profile when optimizing a target.
The partial profile may be an aggregated profile collected from many targets.
The profile symbol list method used for regular sampleFDO profile is not
applicable to partial profile use case because it may be too large and
introduce many false positives.
To solve the problem for partial profile use case, we provide an option called
--profile-unknown-in-special-section. For functions without profile, we will
still treat them conservatively in compiler optimizations -- for example,
treat them as warm instead of cold in inliner. When we use profile info to
add section prefix for functions, we will discriminate functions known to be
not cold versus functions without profile (being unknown), and we will put
functions being unknown in a special text section called .text.unknown.
Runtime system will have the flexibility to decide where to put the special
section in order to achieve a balance between performance and memory saving.
Differential Revision: https://reviews.llvm.org/D62540
No changes relative to last time, but after a mitigation for
an AMDGPU regression landed.
---
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
Summary:
This patch fix the following issues with visitExtractElementInst:
1. Restrict VectorUtils::findScalarElement to fixed-length vector.
For scalable type, the number of elements in shuffle mask is
unknown at compile-time.
2. Fix out-of-range calculation for fixed-length vector.
3. Skip scalable type when analysis rely on fixed number of elements.
4. Add unit tests to check functionality of extractelement for scalable type.
Reviewers: sdesmalen, efriedma, spatel, nikic
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78267
Summary:
This helps detect some missed BFI updates during CodeGenPrepare.
This is debug build only and disabled behind a flag.
Fix a missed update in CodeGenPrepare::dupRetToEnableTailCallOpts().
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77417
Summary:
Any function in this module that make use of DemandedElts laregely does
not work with scalable vectors. DemandedElts is used to define which
elements of the vector to look at. At best, for scalable vectors, we can
express the first N elements of the vector. However, in practice, most
code that uses these functions expect to be able to talk about the
entire vector. In principle, this module should be able to be extended
to work with scalable vectors. However, before we can do that, we should
ensure that it does not cause code with scalable vectors to miscompile.
All functions that use a DemandedElts will bail out if the vector is
scalable. Usages of getNumElements() are updated to go through
FixedVectorType pointers.
Reviewers: rengolin, efriedma, sdesmalen, c-rhodes, spatel
Reviewed By: efriedma
Subscribers: david-arm, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79053
The 'nsz' flag is different than 'nnan' or 'ninf' in that it does not create poison.
Make that explicit in the LangRef and fix ValueTracking analysis that misinterpreted
the definition.
This manifests as bugs in InstSimplify shown in the test diffs and as discussed in
PR45778:
https://bugs.llvm.org/show_bug.cgi?id=45778
Differential Revision: https://reviews.llvm.org/D79422
getScalarizationOverhead is only ever called with vectors (and we already had a load of cast<VectorType> calls immediately inside the functions).
Followup to D78357
Reviewed By: @samparker
Differential Revision: https://reviews.llvm.org/D79341
Make the kind of cost explicit throughout the cost model which,
apart from making the cost clear, will allow the generic parts to
calculate better costs. It will also allow some backends to
approximate and correlate the different costs if they wish. Another
benefit is that it will also help simplify the cost model around
immediate and intrinsic costs, where we currently have multiple APIs.
RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2020-April/141263.html
Differential Revision: https://reviews.llvm.org/D79002
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. Those cases where we do succeed to compute
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away), on the GCC torture
test a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function.)
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
The more general fold was not poison-safe, so it was removed:
rG5486e00
...but it is ok to have this transform if analysis can determine
the vector contains no poison. The test shows a simple example
of that: constant integer elements are not poison.
PR45481:
https://bugs.llvm.org/show_bug.cgi?id=45481
SDAG has an identical transform to this, so there's little
chance of any real-world impact. OTOH, that means we are
effectively sweeping the bug out of sight because poison
exists in codegen too.
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.
This is a recommit as the original commit fail due to the absence of REQUIRES: assert in the test.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D79107
Summary:
By design 'BranchProbabilityInfo:: getEdgeProbability(const BasicBlock *Src, const BasicBlock *Dst) const' should return sum of probabilities over all edges from Src to Dst. Current implementation is buggy and returns 1/num_of_successors if probabilities are not explicitly set.
Note current implementation of BPI printing has an issue as well and annotates each edge with sum of probabilities over all ages from one basic block to another. That's why 30% probability reported (instead of 10%) in the lit test. This is not urgent issue since only printing is affected.
Note also current implementation assumes that either all or none edges have probabilities set. This is not the only place which uses such assumption. At least we should assert that in verifier. In addition we can think on a more robust API of BPI which would prevent situations.
Reviewers: skatkov, yrouban, taewookoh
Reviewed By: skatkov
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79071
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
Summary:
A valid DominatorTree is needed to do PhiTranslation.
Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop.
This can lead to a miscompile.
Reviewers: george.burgess.iv
Subscribers: Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79068
The assert checks that every instruction must be annotated by this point while it is not
necessary. If the inlining process was interrupted because the threshold was reached, the rest
of the instructions would not be annotated which triggers the assert.
The added test shows the situation in which it can happen.
Reviewed-By: mtrofin
Diff: https://reviews.llvm.org/D79107
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.
This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216
This allows forward declarations of PointerCheck, which in turn reduce
the number of times LoopAccessAnalysis needs to be included.
Ultimately this helps with moving runtime check generation to
Transforms/Utils/LoopUtils.h, without having to include it there.
Reviewers: anemet, Ayal
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78458
There are several different types of cost that TTI tries to provide
explicit information for: throughput, latency, code size along with
a vague 'intersection of code-size cost and execution cost'.
The vectorizer is a keen user of RecipThroughput and there's at least
'getInstructionThroughput' and 'getArithmeticInstrCost' designed to
help with this cost. The latency cost has a single use and a single
implementation. The intersection cost appears to cover most of the
rest of the API.
getUserCost is explicitly called from within TTI when the user has
been explicit in wanting the code size (also only one use) as well
as a few passes which are concerned with a mixture of size and/or
a relative cost. In many cases these costs are closely related, such
as when multiple instructions are required, but one evident diverging
cost in this function is for div/rem.
This patch adds an argument so that the cost required is explicit,
so that we can make the important distinction when necessary.
Differential Revision: https://reviews.llvm.org/D78635
This method has been commented as deprecated for a while. Remove
it and replace all uses with the equivalent getCalledOperand().
I also made a few cleanups in here. For example, to removes use
of getElementType on a pointer when we could just use getFunctionType
from the call.
Differential Revision: https://reviews.llvm.org/D78882
Summary:
Extending the Function::viewCFG prototypes to allow for printing block probability info in form of .dot files during debugging.
Also avoiding an AV when no BFI/BPI available.
Reviewers: wenlei, davidxl, knaumov
Reviewed By: wenlei, davidxl
Subscribers: MaskRay, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77978
The change in D78624 had a noticeable negative compile-time impact.
It seems that going through a function call for the MaxUsesToExplore
default is fairly expensive, at least if LLVM is not built with LTO.
This patch makes MaxUsesToExpore default to 0 and assigns the actual
default in the implementation instead. This recovers most of the
regression.
Differential Revision: https://reviews.llvm.org/D78734
Summary:
This patch helps isGuaranteedNotToBeUndefOrPoison look into more constants and instructions (bitcast/alloca/gep/fcmp).
To deal with bitcast, Depth is added to isGuaranteedNotToBeUndefOrPoison.
This patch is splitted from https://reviews.llvm.org/D75808.
Checked with Alive2
Reviewers: reames, jdoerfert
Reviewed By: jdoerfert
Subscribers: sanwou01, spatel, llvm-commits, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76010
Currently an unknown/undef value is marked as overdefined when merged
with an empty range. An empty range can occur in unreachable/dead code.
When merging the new unknown state (= no value known yet) with an empty
range, there still isn't any information about the value yet and we can
stay in unknown.
This gives a few nice improvements on the number of instructions removed
by IPSCCP:
Same hash: 170 (filtered out)
Remaining: 67
Metric: sccp.IPNumInstRemoved
Program base patch diff
test-suite...rks/FreeBench/mason/mason.test 3.00 6.00 100.0%
test-suite...nchmarks/McCat/18-imp/imp.test 3.00 5.00 66.7%
test-suite...C/CFP2000/179.art/179.art.test 2.00 3.00 50.0%
test-suite...ijndael/security-rijndael.test 2.00 3.00 50.0%
test-suite...ks/Prolangs-C/agrep/agrep.test 40.00 58.00 45.0%
test-suite...ce/Applications/Burg/burg.test 26.00 37.00 42.3%
test-suite...cCat/03-testtrie/testtrie.test 3.00 4.00 33.3%
test-suite...Source/Benchmarks/sim/sim.test 29.00 36.00 24.1%
test-suite.../Applications/spiff/spiff.test 9.00 11.00 22.2%
test-suite...s/FreeBench/neural/neural.test 5.00 6.00 20.0%
test-suite...pplications/treecc/treecc.test 66.00 79.00 19.7%
test-suite...langs-C/football/football.test 85.00 101.00 18.8%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 90.00 105.00 16.7%
test-suite...oxyApps-C++/miniFE/miniFE.test 37.00 43.00 16.2%
test-suite...rks/FreeBench/pifft/pifft.test 26.00 30.00 15.4%
test-suite...lications/sqlite3/sqlite3.test 481.00 548.00 13.9%
test-suite...marks/7zip/7zip-benchmark.test 4875.00 5522.00 13.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1117.00 1197.00 7.2%
test-suite...0.perlbench/400.perlbench.test 1618.00 1732.00 7.0%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78667
Summary:
refactor assume bulider for the next patch.
the assume builder now generate only one assume per attribute kind and per value they are on. to do this it takes the highest. this is desirable because currently, for all attributes the higest value is the most valuable.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78013
Summary:
llvm::getInlineCost starts off by determining whether inlining should
happen or not because of user directives or easily determinable
unviability. This CL refactors this functionality as a reusable API.
Reviewers: davidxl, eraman
Reviewed By: davidxl, eraman
Subscribers: hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73825
The motivation is to be able to play with the option and change if it is required.
Reviewers: fedor.sergeev, apilipenko, rnk, jdoerfert
Reviewed By: fedor.sergeev
Subscribers: hiraditya, dantrushin, llvm-commits
Differential Revision: https://reviews.llvm.org/D78624
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
Summary:
This is RFC for fixes in poison-related functions of ValueTracking.
These functions assume that a value can be poison bitwisely, but the semantics
of bitwise poison is not clear at the moment.
Allowing a value to have bitwise poison adds complexity to reasoning about
correctness of optimizations.
This patch makes the analysis functions simply assume that a value is
either fully poison or not, which has been used to understand the correctness
of a few previous optimizations.
The bitwise poison semantics seems to be only used by these functions as well.
In terms of implementation, using value-wise poison concept makes existing
functions do more precise analysis, which is what this patch contains.
Reviewers: spatel, lebedev.ri, jdoerfert, reames, nikic, nlopes, regehr
Reviewed By: nikic
Subscribers: fhahn, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78503
This API call has been used recently with, a very valid, expectation
that it would do something useful but it doesn't actually query any
backend information. So, remove this method and merge its
functionality into getUserCost. As well as that, also use
getCastInstrCost to get a proper cost from the backend for the
concerned instructions though we only currently return the answer if
it's considered free. The default implementation now also checks
int/ptr conversions too, as well as truncs and bitcasts.
Differential Revision: https://reviews.llvm.org/D76124
Use Instruction::comesBefore() instead of OrderedInstructions
inside InstructionPrecedenceTracking. This also removes the
dominator tree dependency.
Differential Revision: https://reviews.llvm.org/D78461
DelSet was used to avoid invalidating the current iterator while
modifying the map we are iterating over.
By using an early_inc_range, (which increments to iterator 'early',
allowing us to remove the current element), we can get rid of DelSet.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78420
The API for shuffles and reductions uses generic Type parameters,
instead of VectorType, and so assertions and casts are used a lot.
This patch makes those types explicit, which means that the clients
can't be lazy, but results in less ambiguity, and that can only be a
good thing.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45562
Differential Revision: https://reviews.llvm.org/D78357
Drop visitCallInst/visitInvokeInst since the generic implementation
will delegate to visitCallBase. They appear to be from a time
before visitCallSite existed in the InstVisitor system.
Differential Revision: https://reviews.llvm.org/D78448
As suggested on D76788, this switches the LVI implementation to
return Optional<ValueLatticeElement> from various methods, instead
of passing in a ValueLatticeElement reference and returning a boolean.
Differential Revision: https://reviews.llvm.org/D78383
Cost-modeling decisions are tied to the compute interleave groups
(widening decisions, scalar and uniform values). When invalidating the
interleave groups, those decisions also need to be invalidated.
Otherwise there is a mis-match during VPlan construction.
VPWidenMemoryRecipes created initially are left around w/o converting them
into VPInterleave recipes. Such a conversion indeed should not take place,
and these gather/scatter recipes may in fact be right. The crux is leaving around
obsolete CM_Interleave (and dependent) markings of instructions along with
their costs, instead of recalculating decisions, costs, and recipes.
Alternatively to forcing a complete recompute later on, we could try
to selectively invalidate the decisions connected to the interleave
groups. But we would likely need to run the uniform/scalar value
detection parts again anyways and the extra complexity is probably not
worth it.
Fixes PR45572.
Reviewers: gilr, rengolin, Ayal, hsaito
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D78298
This patch combines the "has" and "get" parts of the cache access.
getCachedValueInfo() now both sets the BBLV return argument, and
returns whether the value was found.
Additionally, the management of the work stack is now integrated
into getBlockValue(). If the value is not cached yet, we try to
push to the stack (and return false, indicating that we need to
solve first), or return overdefined in case of a cycle.
These changes a) avoid a duplicate cache lookup for has & get and
b) ensure that the logic is uniform everywhere. For this reason
this change is also not quite NFC, because previously overdefined
values from the cache, and overdefined values from a cycle received
different treatment when it came to assumption intersection.
Differential Revision: https://reviews.llvm.org/D76788
I uploaded the old version accidentally instead of the one with these
minor adjustments requested by the reviewers.
Differential Revision: https://reviews.llvm.org/D77855
Summary:
We can and should remove deleted nodes from their respective SCCs. We
did not do this before and this was a potential problem even though I
couldn't locally trigger an issue. Since the `DeleteNode` would assert
if the node was not in the SCC, we know we only remove nodes from their
SCC and only once (when run on all the Attributor tests).
Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku
Subscribers: hiraditya, bollu, uenoku, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77855
Summary:
While it is uncommon that the ExternalCallingNode needs to be updated,
it can happen. It is uncommon because most functions listed as callees
have external linkage, modifying them is usually not allowed. That said,
there are also internal functions that have, or better had, their
"address taken" at construction time. We conservatively assume various
uses cause the address "to be taken". Furthermore, the user might have
become dead at some point. As a consequence, transformations, e.g., the
Attributor, might be able to replace a function that is listed
as callee of the ExternalCallingNode.
Since there is no function corresponding to the ExternalCallingNode, we
did just remove the node from the callee list if we replaced it (so
far). Now it would be preferable to replace it if needed and remove it
otherwise. However, removing the node has implications on the CGSCC
iteration. Locally, that caused some other nodes to be never visited
but it is for sure possible other (bad) side effects can occur. As it
seems conservatively safe to keep the new node in the callee list we
will do that for now.
Reviewers: lebedev.ri, hfinkel, fhahn, probinson, wristow, loladiro, sstefan1, uenoku
Subscribers: hiraditya, bollu, uenoku, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77854
It can be used to avoid passing the begin and end of a range.
This makes the code shorter and it is consistent with another
wrappers we already have.
Differential revision: https://reviews.llvm.org/D78016
ModuleSummaryAnalysis is the only file in libAnalysis that brings a
dependency on the CodeGen layer from libAnalysis, moving it breaks this
dependency.
Differential Revision: https://reviews.llvm.org/D77994
Summary:
Share logic to strip debugify metadata between the IR and MIR level
debugify passes. This makes it simpler to hunt for bugs by diffing IR
with vs. without -debugify-each turned on.
As a drive-by, fix an issue causing CallGraphNodes to become invalid
when a dead llvm.dbg.value prototype is deleted.
Reviewers: dsanders, aprantl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77915
Remove SmallPtrSet include, replace with forward declaration and include SmallPtrSet.h in CodeMetrics.cpp directly.
Remove unused llvm::DataLayout/Instruction forward declarations.
Summary: change assumption cache to store an assume along with an index to the operand bundle containing the knowledge.
Reviewers: jdoerfert, hfinkel
Reviewed By: jdoerfert
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77402
This is similar to the recent move/addition of "scaleShuffleMask" (D76508),
but there are a couple of differences:
1. The existing x86 helper (canWidenShuffleElements) always tries to
divide-by-2, so it gets called iteratively and wouldn't handle the
general case of non-pow-2 length.
2. The existing x86 code handles "SM_SentinelZero" - we don't have
that in IR, but this code should be safe to use with that or other
special (negative) values.
The motivation is to enable shuffle folds in instcombine/vector-combine
that are similar to D76844 and D76727, but in the reverse-bitcast direction.
Those patterns are visible in the tests for D40633.
Differential Revision: https://reviews.llvm.org/D77881
As proposed in D77881, we'll have the related widening operation,
so this name becomes too vague.
While here, change the function signature to take an 'int' rather
than 'size_t' for the scaling factor, add an assert for overflow of
32-bits, and improve the documentation comments.
This reverts commit 60c642e74b.
This patch is making the TLI "closed" for a predefined set of VecLib
while at the moment it is extensible for anyone to customize when using
LLVM as a library.
Reverting while we figure out a way to re-land it without losing the
generality of the current API.
Differential Revision: https://reviews.llvm.org/D77925
Summary:
Don't attempt to analyze the decomposed GEP for scalable type.
GEP index scale is not compile-time constant for scalable type.
Be conservative, return MayAlias.
Explicitly call TypeSize::getFixedSize() to assert on places where
scalable type doesn't make sense.
Add unit tests to check functionality of -basicaa for scalable type.
This patch is needed for D76944.
Reviewers: sdesmalen, efriedma, spatel, bjope, ctetreau
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77828
Summary:
Encode `-fveclib` setting as per-function attribute so it can threaded through to LTO backends. Accordingly per-function TLI now reads
the attributes and select available vector function list based on that. Now we also populate function list for all supported vector
libraries for the shared per-module `TargetLibraryInfoImpl`, so each function can select its available vector list independently but without
duplicating the vector function lists. Inlining between incompatbile vectlib attributed is also prohibited now.
Subscribers: hiraditya, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77632
Summary:
Remove usages of asserting vector getters in Type in preparation for the
VectorType refactor. The existence of these functions complicates the
refactor while adding little value.
Reviewers: sunfish, sdesmalen, efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77273
Ensured initialized fields; encapsulad delta calulations and evaluation
of threshold having had changed; assertion for CostThresholdMap
dereference, to indicate design intent.
Differential Revision: https://reviews.llvm.org/D77762
Summary:
There are at least three clients for KnownBits calculations:
ValueTracking, SelectionDAG and GlobalISel. To reduce duplication the
common logic should be moved out of these clients and into KnownBits
itself.
This patch does this for AND, OR and XOR calculations by implementing
and using appropriate operator overloads KnownBits::operator& etc.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74060
Use the simpler BitWidth constructor instead of the copy constructor to
make it clear when we don't actually need to copy an existing KnownBits
value. Split out from D74539. NFC.
Now compiler defines 5 sets of constants to represent rounding mode.
These are:
1. `llvm::APFloatBase::roundingMode`. It specifies all 5 rounding modes
defined by IEEE-754 and is used in `APFloat` implementation.
2. `clang::LangOptions::FPRoundingModeKind`. It specifies 4 of 5 IEEE-754
rounding modes and a special value for dynamic rounding mode. It is used
in clang frontend.
3. `llvm::fp::RoundingMode`. Defines the same values as
`clang::LangOptions::FPRoundingModeKind` but in different order. It is
used to specify rounding mode in in IR and functions that operate IR.
4. Rounding mode representation used by `FLT_ROUNDS` (C11, 5.2.4.2.2p7).
Besides constants for rounding mode it also uses a special value to
indicate error. It is convenient to use in intrinsic functions, as it
represents platform-independent representation for rounding mode. In this
role it is used in some pending patches.
5. Values like `FE_DOWNWARD` and other, which specify rounding mode in
library calls `fesetround` and `fegetround`. Often they represent bits
of some control register, so they are target-dependent. The same names
(not values) and a special name `FE_DYNAMIC` are used in
`#pragma STDC FENV_ROUND`.
The first 4 sets of constants are target independent and could have the
same numerical representation. It would simplify conversion between the
representations. Also now `clang::LangOptions::FPRoundingModeKind` and
`llvm::fp::RoundingMode` do not contain the value for IEEE-754 rounding
direction `roundTiesToAway`, although it is supported natively on
some targets.
This change defines all the rounding mode type via one `llvm::RoundingMode`,
which also contains rounding mode for IEEE rounding direction `roundTiesToAway`.
Differential Revision: https://reviews.llvm.org/D77379
This patch introduces the heat coloring of the Control Flow Graph which is based
on the relative "hotness" of each BB. The patch is a part of sequence of three
patches, related to graphs Heat Coloring.
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D77161
Now that we have scalable vectors, there's a distinction that isn't
getting captured in the original SequentialType: some vectors don't have
a known element count, so counting the number of elements doesn't make
sense.
In some cases, there's a better way to express the commonality using
other methods. If we're dealing with GEPs, there's GEP methods; if we're
dealing with a ConstantDataSequential, we can query its element type
directly.
In the relatively few remaining cases, I just decided to write out
the type checks. We're talking about relatively few places, and I think
the abstraction doesn't really carry its weight. (See thread "[RFC]
Refactor class hierarchy of VectorType in the IR" on llvmdev.)
Differential Revision: https://reviews.llvm.org/D75661
The patch introduces the system to distinctively store the information
needed for the Control Flow Graph as well as the instrumentary needed for
the follow-up changes: BlockFrequencyInfo and BranchProbabilityInfo.
The patch is a part of sequence of three patches, related to graphs Heat Coloring.
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D76820
The cmyk test is based on the known regression that resulted from:
rGf2fbdf76d8d0
This improves on the equivalent signed min/max change:
rG867f0c3c4d8c
The underlying icmp equivalence is:
~X pred ~Y --> Y pred X
For an icmp with constant, canonicalization results in a swapped pred:
~X < C --> X > ~C
The cmyk tests are based on the known regression that resulted from:
rGf2fbdf76d8d0
So this improvement in analysis might be enough to restore that commit.
D51664 added Instruction::comesBefore which should provide better
performance than the manual check.
Reviewers: rnk, nikic, spatel
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D76228
Forward declare DemandedBits in IVDescriptors, and move include
into the cpp file. Also drop the include from LoopUtils, which
does not need it at all.
Summary:
This patch includes two extensions:
1. It extends the GraphDiff to also keep the original list of updates
after legalization, not just the deletes/insert vectors.
It also provides an API to pop the first update (the updates are store
in reverse, such that the first update is at the end of the list)
2. It adds a bool to mark whether the given updates should be applied as
given, or applied in reverse. This moves the task of reversing the
updates (when the caller needs this) to a functionality inside
GraphDiff, versus having the caller do this.
The two changes could be split into two patches, but they seemed
reasonably small to be reviewed together.
Reviewers: kuhar, dblaikie
Subscribers: hiraditya, george.burgess.iv, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77167
Summary:
Splitting Knowledge retention into Queries in Analysis and Builder into Transform/Utils
allows Queries and Transform/Utils to use Analysis.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: mgorny, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77171
This patch adds
- New arguments to getMinPrefetchStride() to let the target decide on a
per-loop basis if software prefetching should be done even with a stride
within the limit of the hw prefetcher.
- New TTI hook enableWritePrefetching() to let a target do write prefetching
by default (defaults to false).
- In LoopDataPrefetch:
- A search through the whole loop to gather information before emitting any
prefetches. This way the target can get information via new arguments to
getMinPrefetchStride() and emit prefetches more selectively. Collected
information includes: Does the loop have a call, how many memory
accesses, how many of them are strided, how many prefetches will cover
them. This is NFC to before as long as the target does not change its
definition of getMinPrefetchStride().
- If a previous access to the same exact address was 'read', and the
current one is 'write', make it a 'write' prefetch.
- If two accesses that are covered by the same prefetch do not dominate
each other, put the prefetch in a block that dominates both of them.
- If a ConstantMaxTripCount is less than ItersAhead, then skip the loop.
- A SystemZ implementation of getMinPrefetchStride().
Review: Ulrich Weigand, Michael Kruse
Differential Revision: https://reviews.llvm.org/D70228
getCallCost is only used within the different layers of TTI, with no
backend implementing it so fold the base implementation into
getUserCost. I think this is an NFC.
Differential Revision: https://reviews.llvm.org/D77050
Summary: These were templated due to SelectionDAG using int masks for shuffles and IR using unsigned masks for shuffles. But now that D72467 has landed we have an int mask version of IRBuilder::CreateShuffleVector. So just use int instead of a template
Reviewers: spatel, efriedma, RKSimon
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D77183
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.
This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.
I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors. Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it. Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.
Differential Revision: https://reviews.llvm.org/D72467
This patch updates ValueLattice to distinguish between ranges that are
guaranteed to not include undef and ranges that may include undef.
A constant range guaranteed to not contain undef can be used to simplify
instructions to arbitrary values. A constant range that may contain
undef can only be used to simplify to a constant. If the value can be
undef, it might take a value outside the range. For example, consider
the snipped below
define i32 @f(i32 %a, i1 %c) {
br i1 %c, label %true, label %false
true:
%a.255 = and i32 %a, 255
br label %exit
false:
br label %exit
exit:
%p = phi i32 [ %a.255, %true ], [ undef, %false ]
%f.1 = icmp eq i32 %p, 300
call void @use(i1 %f.1)
%res = and i32 %p, 255
ret i32 %res
}
In the exit block, %p would be a constant range [0, 256) including undef as
%p could be undef. We can use the range information to replace %f.1 with
false because we remove the compare, effectively forcing the use of the
constant to be != 300. We cannot replace %res with %p however, because
if %a would be undef %cond may be true but the second use might not be
< 256.
Currently LazyValueInfo uses the new behavior just when simplifying AND
instructions and does not distinguish between constant ranges with and
without undef otherwise. I think we should address the remaining issues
in LVI incrementally.
Reviewers: efriedma, reames, aqjune, jdoerfert, sstefan1
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D76931
For the PHI node
%1 = phi [%A, %entry], [%X, %latch]
it is incorrect to use SCEV of backedge val %X as an exit value
of PHI unless %X is loop invariant.
This is because exit value of %1 is value of %X at one-before-last
iteration of the loop.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D73181
Optimize the common case of splat vector constant. For large vector
going through all elements is expensive. For splatr/broadcast cases we
can skip going through all elements.
Differential Revision: https://reviews.llvm.org/D76664
In past, isGuaranteedToTransferExecutionToSuccessor contained some weird logic
for volatile loads/stores that was ultimately removed by patch D65375. It's time to
remove a piece of dependent logic that used to be a workaround for the code which
is now deleted.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D76918
Aligned_alloc is a standard lib function and has been in glibc since
2.16 and in the C11 standard. It has semantics similar to malloc/calloc
for several analyses/transforms. This patch introduces aligned_alloc
in target library info and memory builtins. Subsequent ones will
make other passes aware and fix https://bugs.llvm.org/show_bug.cgi?id=44062
This change will also be useful to LLVM generators that need to allocate
buffers of vector elements larger than 16 bytes (for eg. 256-bit ones),
element boundary alignment for which is not typically provided by glibc malloc.
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D76970
This change implements constant folding to constrained versions of
intrinsics, implementing rounding: floor, ceil, trunc, round, rint and
nearbyint.
Differential Revision: https://reviews.llvm.org/D72930
Summary:
The method is used where TypeSize is implicitly cast to integer for
being checked against 0.
Reviewers: sdesmalen, efriedma
Reviewed By: sdesmalen, efriedma
Subscribers: efriedma, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76748
Summary:
Rename `succ_const_iterator` to `const_succ_iterator` and
`succ_const_range` to `const_succ_range` for consistency with the
predecessor iterators, and the corresponding iterators in
MachineBasicBlock.
Reviewers: nicholas, dblaikie, nlewycky
Subscribers: hiraditya, bmahjour, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75952
Summary:
Avoid blind cast from Operator to ExtractElementInst in
computeKnownBitsFromOperator. This resulted in some crashes
in downstream fuzzy testing. Instead we use getOperand directly
on the Operator when accessing the vector/index operands.
Haven't seen any problems with InsertElement and ShuffleVector,
but I believe those could be used in constant expressions as well.
So the same kind of fix as for ExtractElement was also applied for
InsertElement.
When it comes to ShuffleVector we now simply bail out if a dynamic
cast of the Operator to ShuffleVectorInst fails. I've got no
reproducer indicating problems for ShuffleVector, and a fix would be
slightly more complicated as getShuffleDemandedElts is involved.
Reviewers: RKSimon, nikic, spatel, efriedma
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76564
If one operand is unknown (and we don't have nowrap), don't compute
the second operand.
Also don't create an unnecessary extra KnownBits variable, it's
okay to reuse KnownOut.
This reduces instructions on libclamav_md5.c by 40%.
Summary:
DataLayout::getTypeStoreSize() returns TypeSize.
For cases where it can not be scalable vector (e.g., GlobalVariable),
explicitly call TypeSize::getFixedSize().
For cases where scalable property doesn't matter, (e.g., check for
zero-sized type), use TypeSize::isNonZero().
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76454
Summary:
Return None if GEP index type is scalable vector. Size of scalable vectors
are multiplied by a runtime constant.
Avoid transforming:
%a = bitcast i8* %p to <vscale x 16 x i8>*
%tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp0
%tmp1 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 1
store <vscale x 16 x i8> zeroinitializer, <vscale x 16 x i8>* %tmp1
into:
%a = bitcast i8* %p to <vscale x 16 x i8>*
%tmp0 = getelementptr <vscale x 16 x i8>, <vscale x 16 x i8>* %a, i64 0
%1 = bitcast <vscale x 16 x i8>* %tmp0 to i8*
call void @llvm.memset.p0i8.i64(i8* align 16 %1, i8 0, i64 32, i1 false)
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, arphaman, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76464
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
This is fixing up various places that use the implicit
TypeSize->uint64_t conversion.
The new overloads in MemoryLocation.h are already used in various places
that construct a MemoryLocation from a TypeSize, including MemorySSA.
(They were using the implicit conversion before.)
Differential Revision: https://reviews.llvm.org/D76249
Summary:
As noted in [[ https://bugs.llvm.org/show_bug.cgi?id=45201 | PR45201 ]],
[[ https://bugs.llvm.org/show_bug.cgi?id=10090 | PR10090 ]] SCEV doesn't
always avoid recursive algorithms, and that causes issues with
large expression depths and/or smaller stack sizes.
In `SCEVExpander::isHighCostExpansion*()` case, the refactoring to avoid
recursion is rather idiomatic. We simply need to place the root expr
into a vector, and iterate over vector elements accounting for the cost
of each one, adding new exprs at the end of the vector,
thus achieving recursion-less traversal.
The order in which we will visit exprs doesn't matter here,
so we will be fine with the most basic approach of using SmallVector
and inserting/extracting from the back, which accidentally is the same
depth-first traversal that we were doing previously recursively.
Reviewers: mkazantsev, reames, wmi, ekatz
Reviewed By: mkazantsev
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76273
Summary:
DataLayout::getTypeAllocSize() return TypeSize. For cases where the
scalable property doesn't matter, we should explicitly call getKnownMinSize()
to avoid implicit type conversion to uint64_t, which is not valid for scalable
vector type.
Reviewers: sdesmalen, efriedma, apazos, reames
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76260
Summary:
Run StackSafetyAnalysis at the end of the IR pipeline and annotate
proven safe allocas with !stack-safe metadata. Do not instrument such
allocas in the AArch64StackTagging pass.
Reviewers: pcc, vitalybuka, ostannard
Reviewed By: vitalybuka
Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, gilang, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D73513
Summary:
Skip folds that rely on DataLayout::getTypeAllocSize(). For scalable
vector, only minimal type alloc size is known at compile-time.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75892
This patch adds a new singlecrfromundef lattice value, indicating a
single element constant range which was merge with undef at some point.
Merging it with another constant range results in overdefined, as we
won't be able to replace all users with a single value.
This patch uses a ConstantRange instead of a Constant*, because regular
integer constants are represented as single element constant ranges as
well and this allows the existing code working without additional
changes.
Reviewers: efriedma, nikic, reames, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75845
This patch adds a new undef lattice state, which is used to represent
UndefValue constants or instructions producing undef.
The main difference to the unknown state is that merging undef values
with constants (or single element constant ranges) produces the
constant/constant range, assuming all uses of the merge result will be
replaced by the found constant.
Contrary, merging non-single element ranges with undef needs to go to
overdefined. Using unknown for UndefValues currently causes mis-compiles
in CVP/LVI (PR44949) and will become problematic once we use
ValueLatticeElement for SCCP.
Reviewers: efriedma, reames, davide, nikic
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D75120