Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.
Differential Revision: https://reviews.llvm.org/D79835
As mentioned in the post-commit comments of D81013 -
the mask check API has to assume the shuffle is
not length-changing, but we have not ruled that out
in this code. Use the ShuffleVectorInst call instead.
Follow the model used on Linux, where the clang driver passes the
linker a -u switch to force the profile runtime to be linked in,
rather than having every TU emit a dead function with a reference.
Patch By: mcgrathr
Differential Revision: https://reviews.llvm.org/D79835
Remove the requirement, that when performing accumulator elimination,
all other cases must return the same dynamic constant. We can do this by
initializing the accumulator with the identity value of the accumulation
operation, and inserting an additional operation before any return.
Differential Revision: https://reviews.llvm.org/D80844
Summary:
In SCEVExpander FactorOutConstant(), when GEP indexing into/over scalable vector,
it is legal for the 'Factor' in a MulExpr to be the size of a scalable vector
instead of a compile-time constant.
Current upstream crash with the test attached.
Reviewers: efriedma, sdesmalen, sanjoy.google, mkazantsev
Reviewed By: efriedma
Subscribers: hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80973
Allow InvokeInst to have the second optional prof branch weight for
its unwind branch. InvokeInst is a terminator with two successors.
It might have its unwind branch taken many times. If so
the BranchProbabilityInfo unwind branch heuristic can be inaccurate.
This patch allows a higher accuracy calculated with both branch
weights set.
Changes:
- A new section about InvokeInst is added to
the BranchWeightMetadata page. It states the old information that
missed in the doc and adds new about the second branch weight.
- Verifier is changed to allow either 1 or 2 branch weights
for InvokeInst.
- A new test is written for BranchProbabilityInfo to demonstrate
the main improvement of the simple fix in calcMetadataWeights().
- Several new testcases are created for Inliner. Those check that
both weights are accounted for invoke instruction weight
calculation.
- PGOUseFunc::setBranchWeights() is fixed to be applicable to
InvokeInst.
Reviewers: davidxl, reames, xur, yamauchi
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80618
Remove the function Instruction::setProfWeight() and make
use of Instruction::copyMetadata(.., {LLVMContext::MD_prof}).
This is correct for all use cases of setProfWeight() as it
is applied to CallBase instructions only.
This change results in prof metadata copied intact even if
the source has "VP". The old pair of calls
extractProfTotalWeight() + setProfWeight() resulted in
setting branch_weights if the source had "VP" data.
Reviewers: yamauchi, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80987
Summary:
For retcon and retcon.once coroutines we assume that all uses of spills
can be sunk past coro.begin. This simplifies handling of instructions
that escape the address of an alloca.
The current implementation would have issues if the address of the
alloca is escaped before coro.begin. (It also has issues with casts before and
uses of those casts after the coro.begin instruction)
%alloca_addr = alloca ...
%escape = ptrtoint %alloca_addr
coro.begin
store %escape to %alloca_addr
rdar://60272809
Subscribers: hiraditya, modocache, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81023
Currently extracting a lane for a VPValue def is not supported, if it is
managed directly by VPTransformState (e.g. because it is created by a
VPInstruction or an external VPValue def).
For now, simply extract the requested lane. In the future, we should
also cache the extracted scalar values, similar to LV.
Reviewers: Ayal, rengolin, gilr, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D80787
Summary:
This patch simplifies FindMostPopularDest without changing the
functionality.
Given a list of jump threading destinations, the function finds the
most popular destination. To ensure determinism when there are
multiple destinations with the highest popularity, the function picks
the first one in the successor list with the highest popularity.
Without this patch:
- The function populates DestPopularity -- a histogram mapping
destinations to their respective occurrence counts.
- Then we iterate over DestPopularity, looking for the highest
popularity while building a vector of destinations with the highest
popularity.
- Finally, we iterate the successor list, looking for the destination
with the highest popularity.
With this patch:
- We implement DestPopularity with MapVector instead of DenseMap. We
populate the map with popularity 0 for all successors in the order
they appear in the successor list.
- We build the histogram in the same way as before.
- We simply use std::max_element on DestPopularity to find the most
popular destination. The use of MapVector ensures determinism.
Reviewers: wmi, efriedma
Reviewed By: wmi
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81030
When sampleFDO is enabled, people may expect they can use
-fno-profile-sample-use to opt-out using sample profile for a certain file.
That could be either for debugging purpose or for performance tuning purpose.
However, when thinlto is enabled, if a function in file A compiled with
-fno-profile-sample-use is imported to another file B compiled with
-fprofile-sample-use, the inlined copy of the function in file B may still
get its profile annotated.
The inconsistency may even introduce profile unused warning because if the
target is not compiled with explicit debug information flag, the function
in file A won't have its debug information enabled (debug information will
be enabled implicitly only when -fprofile-sample-use is used). After it is
imported into file B which is compiled with -fprofile-sample-use, profile
annotation for the outline copy of the function will fail because the
function has no debug information, and that will trigger profile unused
warning.
We add a new attribute use-sample-profile to control whether a function
will use its sample profile no matter for its outline or inline copies.
That will make the behavior of -fno-profile-sample-use consistent.
Differential Revision: https://reviews.llvm.org/D79959
LV currently only supports power of 2 vectorization factors, which has
been made explicit with the assertion added in
840450549c.
However, if the widest type is not a power-of-2 the computed MaxVF won't
be a power-of-2 either. This patch updates computeFeasibleMaxVF to
ensure the returned value is a power-of-2 by rounding down to the
nearest power-of-2.
Fixes PR46139.
Reviewers: Ayal, gilr, rengolin
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D80870
gc.relocate intrinsic is special in that its second and third operands
are not real values, but indices into relocate's parent statepoint list
of GC pointers.
To be CSE'd, they need special handling in `isEqual()` and `getHashCode()`.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D80445
Summary:
This simplifies the interface by storing the function analysis manager
with the InlineAdvisor, and, thus, not requiring it be passed each time
we inquire for an advice.
Reviewers: davidxl, asbirlea
Subscribers: eraman, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80405
SimplifyDemandedVectorElts() bails out on ScalableVectorType
anyway, but we can exit faster with the external check.
Move this to a helper function because there are likely other
vector folds that we can try here.
Summary:
The working set size heuristics (ProfileSummaryInfo::hasHugeWorkingSetSize)
under the partial sample PGO may not be accurate because the profile is partial
and the number of hot profile counters in the ProfileSummary may not reflect the
actual working set size of the program being compiled.
To improve this, the (approximated) ratio of the the number of profile counters
of the program being compiled to the number of profile counters in the partial
sample profile is computed (which is called the partial profile ratio) and the
working set size of the profile is scaled by this ratio to reflect the working
set size of the program being compiled and used for the working set size
heuristics.
The partial profile ratio is approximated based on the number of the basic
blocks in the program and the NumCounts field in the ProfileSummary and computed
through the thin LTO indexing. This means that there is the limitation that the
scaled working set size is available to the thin LTO post link passes only.
Reviewers: davidxl
Subscribers: mgorny, eraman, hiraditya, steven_wu, dexonsmith, arphaman, dang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79831
As discussed in https://bugs.llvm.org/show_bug.cgi?id=45951 and
D80584, the name 'tmp' is almost always a bad choice, but we have
a legacy of regression tests with that name because it was baked
into utils/update_test_checks.py.
This change makes -instnamer more consistent (already using "arg"
and "bb", the common LLVM shorthand). And it avoids the conflict
in telling users of the FileCheck script to run "-instnamer" to
create a better regression test and having that cause a warn/fail
in update_test_checks.py.
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
relevant aggregate build instructions only (UserCost).
Users are detected with findBuildAggregate routine and the trick is
that following SLP vectorization may end up vectorizing entire list
with smaller chunks. Cost adjustment then is applied for individual
chunks and these adjustments obviously have to be smaller than the
entire aggregate build cost.
Differential Revision: https://reviews.llvm.org/D80773
Prevent `invertCondition` from creating the inversion instruction, in
case the given value is an argument which has already been inverted.
Note that this approach has already been taken in case the given value
is an instruction (and not an argument).
Differential Revision: https://reviews.llvm.org/D80399
Currently SCCP does not widen PHIs, stores or along call edges
(arguments/return values), but on operations that directly extend ranges
(like binary operators).
This means PHIs, stores and call edges are not pessimized by widening
currently, while binary operators are. The main reason for widening
operators initially was that opting-out for certain operations was
more straight-forward in the initial implementation (and it did not
matter too much, as range support initially was only implemented for a
very limited set of operations.
During the discussion in D78391, it was suggested to consider flipping
widening to PHIs, stores and along call edges. After adding support for
tracking the number of range extensions in ValueLattice, limiting the
number of range extensions per value is straight forward.
This patch introduces a MaxWidenSteps option to the MergeOptions,
limiting the number of range extensions per value. For PHIs, it seems
natural allow an extension for each (active) incoming value plus 1. For
the other cases, a arbitrary limit of 10 has been chosen initially. It would
potentially make sense to set it depending on the users of a
function/global, but that still needs investigating. This potentially
leads to more state-changes and longer compile-times.
The results look quite promising (MultiSource, SPEC):
Same hash: 179 (filtered out)
Remaining: 58
Metric: sccp.IPNumInstRemoved
Program base widen-phi diff
test-suite...ks/Prolangs-C/agrep/agrep.test 58.00 82.00 41.4%
test-suite...marks/SciMark2-C/scimark2.test 32.00 43.00 34.4%
test-suite...rks/FreeBench/mason/mason.test 6.00 8.00 33.3%
test-suite...langs-C/football/football.test 104.00 128.00 23.1%
test-suite...cations/hexxagon/hexxagon.test 36.00 42.00 16.7%
test-suite...CFP2000/177.mesa/177.mesa.test 214.00 249.00 16.4%
test-suite...ngs-C/assembler/assembler.test 14.00 16.00 14.3%
test-suite...arks/VersaBench/dbms/dbms.test 10.00 11.00 10.0%
test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 47.00 9.3%
test-suite...ications/JM/ldecod/ldecod.test 179.00 195.00 8.9%
test-suite...CFP2006/433.milc/433.milc.test 249.00 265.00 6.4%
test-suite.../CINT2000/175.vpr/175.vpr.test 98.00 104.00 6.1%
test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 74.00 5.7%
test-suite...CFP2000/188.ammp/188.ammp.test 71.00 75.00 5.6%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 111.00 117.00 5.4%
test-suite...ce/Applications/Burg/burg.test 41.00 43.00 4.9%
test-suite...000/197.parser/197.parser.test 66.00 69.00 4.5%
test-suite...tions/lambda-0.1.3/lambda.test 23.00 24.00 4.3%
test-suite...urce/Applications/lua/lua.test 301.00 313.00 4.0%
test-suite...TimberWolfMC/timberwolfmc.test 76.00 79.00 3.9%
test-suite...lications/ClamAV/clamscan.test 991.00 1030.00 3.9%
test-suite...plications/d/make_dparser.test 53.00 55.00 3.8%
test-suite...fice-ispell/office-ispell.test 83.00 86.00 3.6%
test-suite...lications/obsequi/Obsequi.test 28.00 29.00 3.6%
test-suite.../Prolangs-C/bison/mybison.test 56.00 58.00 3.6%
test-suite.../CINT2000/254.gap/254.gap.test 170.00 176.00 3.5%
test-suite.../Applications/lemon/lemon.test 30.00 31.00 3.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1202.00 1240.00 3.2%
test-suite...pplications/treecc/treecc.test 79.00 81.00 2.5%
test-suite...chmarks/MallocBench/gs/gs.test 357.00 366.00 2.5%
test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9%
test-suite...T2006/445.gobmk/445.gobmk.test 1697.00 1724.00 1.6%
test-suite...006/453.povray/453.povray.test 1812.00 1839.00 1.5%
test-suite.../Benchmarks/Bullet/bullet.test 337.00 342.00 1.5%
test-suite.../CINT2000/252.eon/252.eon.test 426.00 432.00 1.4%
test-suite...T2000/300.twolf/300.twolf.test 214.00 217.00 1.4%
test-suite...pplications/oggenc/oggenc.test 244.00 247.00 1.2%
test-suite.../CINT2006/403.gcc/403.gcc.test 4008.00 4055.00 1.2%
test-suite...T2006/456.hmmer/456.hmmer.test 175.00 177.00 1.1%
test-suite...nal/skidmarks10/skidmarks.test 430.00 434.00 0.9%
test-suite.../Applications/sgefa/sgefa.test 115.00 116.00 0.9%
test-suite...006/447.dealII/447.dealII.test 1082.00 1091.00 0.8%
test-suite...6/482.sphinx3/482.sphinx3.test 141.00 142.00 0.7%
test-suite...ocBench/espresso/espresso.test 152.00 153.00 0.7%
test-suite...3.xalancbmk/483.xalancbmk.test 4003.00 4025.00 0.5%
test-suite...lications/sqlite3/sqlite3.test 548.00 551.00 0.5%
test-suite...marks/7zip/7zip-benchmark.test 5522.00 5551.00 0.5%
test-suite...nsumer-lame/consumer-lame.test 208.00 209.00 0.5%
test-suite...:: External/Povray/povray.test 1556.00 1563.00 0.4%
test-suite...000/186.crafty/186.crafty.test 298.00 299.00 0.3%
test-suite.../Applications/SPASS/SPASS.test 2019.00 2025.00 0.3%
test-suite...ications/JM/lencod/lencod.test 8427.00 8449.00 0.3%
test-suite...6/464.h264ref/464.h264ref.test 6797.00 6813.00 0.2%
test-suite...6/471.omnetpp/471.omnetpp.test 431.00 430.00 -0.2%
test-suite...006/450.soplex/450.soplex.test 446.00 447.00 0.2%
test-suite...0.perlbench/400.perlbench.test 1729.00 1727.00 -0.1%
test-suite...000/255.vortex/255.vortex.test 3815.00 3819.00 0.1%
Reviewers: efriedma, nikic, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D79036
Whilst trying to compile this test to assembly:
CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
I discovered some warnings were firing in InstCombiner::visitBitCast
due to calls to getNumElements() for scalable vector types. These
calls only really made sense for fixed width vectors so I have fixed
up the code appropriately.
Differential Revision: https://reviews.llvm.org/D80559
These are the two operand sets which are expected to survive more than another week or so. Instead of bothering to update the deopt and gc-transition operands, we'll just wait until those are removed and delete the code.
For those following along, this is likely to be the last (major) change in this sequence for about a week. I want to wait until all of this has been merged downstream to ensure I haven't introduced any bugs (and migrate some downstream code to the new interfaces). Once that's done, we should be able to delete Statepoint/ImmutableStatepoint without too much work.
Summary:
Only column-major was supported so far. This adds row-major support as well.
Note that we probably also want very efficient SIMD implementations for the
various target platforms.
Bug:
https://bugs.llvm.org/show_bug.cgi?id=46085
Reviewers: nicolasvasilache, reidtatge, bkramer, fhahn, ftynse, andydavis1, craig.topper, dcaballe, mehdi_amini, anemet
Reviewed By: fhahn
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80673
Continues from D80598.
The key point of the change is to default to using operand bundles instead of the inline length prefix argument lists for statepoint nodes. An important subtlety to note is that the presence of a bundle has semantic meaning, even if it is empty. As such, we need to make a somewhat deeper change to the interface than is first obvious.
Existing code treats statepoint deopt arguments and the deopt bundle operands differently during inlining. The former is ignored (resulting in caller state being dropped), the later is merged.
We can't preserve the old behaviour for calls with deopt fed to RS4GC and then inlining, but we can avoid the no-deopt case changing. At least in internal testing, that seem to be the important one. (I'd argue the "stop merging after RS4GC" behaviour for the former was always "unexpected", but that the behaviour for non-deopt calls actually make sense.)
Differential Revision: https://reviews.llvm.org/D80674
Summary:
Follow up D79751 and put the instrumentation / value collection side (in
addition to the optimization side) behind the flag as well.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80646
Summary: The following code from
/llvm/lib/Transforms/Utils/LoopUnrollAndJam.cpp can be used by other
transformations:
while (!MergeBlocks.empty()) {
BasicBlock *BB = *MergeBlocks.begin();
BranchInst *Term = dyn_cast<BranchInst>(BB->getTerminator());
if (Term && Term->isUnconditional() &&
L->contains(Term->getSuccessor(0))) {
BasicBlock *Dest = Term->getSuccessor(0);
BasicBlock *Fold = Dest->getUniquePredecessor();
if (MergeBlockIntoPredecessor(Dest, &DTU, LI)) {
// Don't remove BB and add Fold as they are the same BB
assert(Fold == BB);
(void)Fold;
MergeBlocks.erase(Dest);
} else
MergeBlocks.erase(BB);
} else
MergeBlocks.erase(BB);
}
Hence it should be separated into its own utility function.
Authored By: sidbav
Reviewer: Whitney, Meinersbur, asbirlea, dmgreen, etiotto
Reviewed By: asbirlea
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D80583
This one is slightly odd since it counts as an address expression,
which previously could never fail. Allow the existing TTI hook to
return the value to use, and re-use it for handling how to handle
ptrmask.
Handles the no-op addrspacecasts for AMDGPU. We could probably do
something better based on analysis of the mask value based on the
address space, but leave that for now.