The use list is a linked list so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end.
llvm-svn: 300505
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.
llvm-svn: 298791
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.
Reviewers: aprantl, andreadb, danielcdh
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29062
llvm-svn: 293372
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.
For example, SimplifyCFG used to do the following transformation:
(before)
if.then:
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
br label %if.end
if.else:
%1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
br label %if.end
(after)
%.sink = select i1 %tobool, i32 6, i32 8
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".
rdar://problem/30110806
Differential Revision: https://reviews.llvm.org/D29111
llvm-svn: 293025
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.
Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.
Differential Revision: https://reviews.llvm.org/D27804
llvm-svn: 289862
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info.
Reviewers: davidxl, mzolotukhin, jmolloy
Subscribers: aprantl, llvm-commits
Differential Revision: https://reviews.llvm.org/D25286
llvm-svn: 284415
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.
Differential Revision: https://reviews.llvm.org/D25550
llvm-svn: 284379
BasicBlock::size is O(insts), making this loop O(blocks*insts), which
can be really slow on generated code. getPrevNode already checks if
we're at the beginning of the block and returns nullptr if so, just use
that instead. No functionality change intended.
llvm-svn: 284303
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness. I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.
It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.
Reviewers: jmolloy
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D25502
llvm-svn: 284028
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.
Differential Revision: https://reviews.llvm.org/D24462
llvm-svn: 283530
GetCaseResults assumed that a terminator with one successor was an
unconditional branch. This is not necessarily the case, it could be a
cleanupret.
Strengthen the check by querying whether or not the terminator is
exceptional.
llvm-svn: 283517
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.
Fixes PR30373.
llvm-svn: 281889
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.
The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).
llvm-svn: 281162
Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example.
So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.
Should fix PR30244.
llvm-svn: 281160
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.
Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo
Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D24164
llvm-svn: 280995
We can't create metadata-valued PHIs; don't try to do so when sinking.
I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).
Previously, running the test case would crash with:
Invalid use of metadata!
%.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
LLVM ERROR: Broken function found, compilation aborted!
llvm-svn: 280866
In failure cases it's not guaranteed that the PHI we're inspecting is actually in the successor block! In this case we need to bail out early, and never query getIncomingValueForBlock() as that will cause an assert.
llvm-svn: 280794
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.
Fixes PR30188 (again).
llvm-svn: 280792
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.
Fixes PR30292.
llvm-svn: 280790
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.
The real fix is in SROA, which I'll be looking into.
llvm-svn: 280470
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.
As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:
if (a)
x(1);
else if (b)
x(2);
This produces the following CFG:
[if]
/ \
[x(1)] [if]
| | \
| | \
| [x(2)] |
\ | /
[ end ]
[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.
We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).
We are now able to detect this case and split off the unconditional arcs to a common successor:
[if]
/ \
[x(1)] [if]
| | \
| | \
| [x(2)] |
\ / |
[sink.split] |
\ /
[ end ]
Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.
llvm-svn: 280364
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.
This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.
This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:
%a = load i32* %b %d = load i32* %b
%c = gep i32* %a, i32 0 %e = gep i32* %d, i32 1
Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).
This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.
Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.
In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.
In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.
This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.
llvm-svn: 280351
We iterate over the result from SafeToMergeTerminators, so make it a SmallSetVector instead of a SmallPtrSet.
Should fix stage3 convergence builds.
llvm-svn: 280342
A very important case is not handled here: multiple arcs to a single block with a PHI. Consider:
a:
%1 = icmp %b, 1
br %1, label %c, label %e
c:
%2 = icmp %b, 2
br %2, label %d, label %e
d:
br %e
e:
phi [0, %a], [1, %c], [2, %d]
FoldValueComparisonIntoPredecessors will refuse to fold this, as it doesn't know how to deal with two arcs to a common destination with different PHI values. The answer is obvious - just split all conflicting arcs.
llvm-svn: 280338
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.
llvm-svn: 280228