with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal, this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exception to this rule are LoopPass instances which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loop holes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
Summary:
Add a `cleanupendpad` instruction, used to mark exceptional exits out of
cleanups (for languages/targets that can abort a cleanup with another
exception). The `cleanupendpad` instruction is similar to the `catchendpad`
instruction in that it is an EH pad which is the target of unwind edges in
the handler and which itself has an unwind edge to the next EH action.
The `cleanupendpad` instruction, similar to `cleanupret` has a `cleanuppad`
argument indicating which cleanup it exits. The unwind successors of a
`cleanuppad`'s `cleanupendpad`s must agree with each other and with its
`cleanupret`s.
Update WinEHPrepare (and docs/tests) to accomodate `cleanupendpad`.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12433
llvm-svn: 246751
This makes RemoveDuplicatePHINodes more effective and fixes an assertion
failure. Triggering the assertions requires a DenseSet reallocation
so this change only contains a constructive test.
I'll explain the issue with a small example. In the following function
there's a duplicate PHI, %4 and %5 are identical. When this is found
the DenseSet in RemoveDuplicatePHINodes contains %2, %3 and %4.
define void @F() {
br label %1
; <label>:1 ; preds = %1, %0
%2 = phi i32 [ 42, %0 ], [ %4, %1 ]
%3 = phi i32 [ 42, %0 ], [ %5, %1 ]
%4 = phi i32 [ 42, %0 ], [ 23, %1 ]
%5 = phi i32 [ 42, %0 ], [ 23, %1 ]
br label %1
}
after RemoveDuplicatePHINodes runs the function looks like this. %3 has
changed and is now identical to %2, but RemoveDuplicatePHINodes never
saw this.
define void @F() {
br label %1
; <label>:1 ; preds = %1, %0
%2 = phi i32 [ 42, %0 ], [ %4, %1 ]
%3 = phi i32 [ 42, %0 ], [ %4, %1 ]
%4 = phi i32 [ 42, %0 ], [ 23, %1 ]
br label %1
}
If the DenseSet does a reallocation now it will reinsert all
keys and stumble over %3 now having a different hash value than it had
when inserted into the map for the first time. This change clears the
set whenever a PHI is deleted and starts the progress from the
beginning, allowing %3 to be deleted and avoiding inconsistent DenseSet
state. This potentially has a negative performance impact because
it rescans all PHIs, but I don't think that this ever makes a difference
in practice.
llvm-svn: 246694
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.
In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.
This fixes PR21369.
http://reviews.llvm.org/D12202
Patch by Matt Simpson <mssimpso@codeaurora.org>!
llvm-svn: 246149
... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.
NFC
llvm-svn: 246145
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.
Note: Duplicate case values are disallowed by the LangRef and verifier.
Differential Revision: http://reviews.llvm.org/D11995
llvm-svn: 246125
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.
The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.
Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.
Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12108
llvm-svn: 245797
Gets a bit tricky in the ValueMapper, of course - not sure if we should
just expose a list of explicit types for each Value so that the
ValueMapper can be neutral to these special cases (it's OK for things
like load, where the explicit type is the result type - but when that's
not the case, it means plumbing through another "special" type... )
llvm-svn: 245728
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.
This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.
Differential Revision: http://reviews.llvm.org/D12132
llvm-svn: 245662
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This is addressing a post-commit review feedback for r245589.
NFC (inside the LLVM tree).
llvm-svn: 245622
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.
Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly. With that there is no reason to expose addPHINodes anymore.
Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.
llvm-svn: 245579
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).
This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
Differential Revision: http://reviews.llvm.org/D11866
llvm-svn: 245187
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
llvm-svn: 245149
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Differential Revision: http://reviews.llvm.org/D11861
llvm-svn: 245029
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.
Code generated before the patch was applied:
0: 0f bc c7 bsf %edi,%eax
3: b9 20 00 00 00 mov $0x20,%ecx
8: 0f 45 c8 cmovne %eax,%ecx
b: 83 c1 02 add $0x2,%ecx
e: b8 01 00 00 00 mov $0x1,%eax
13: 85 ff test %edi,%edi
15: 0f 45 c1 cmovne %ecx,%eax
18: c3 retq
Code generated after the patch was applied:
0: 0f bc cf bsf %edi,%ecx
3: 83 c1 02 add $0x2,%ecx
6: 85 ff test %edi,%edi
8: b8 01 00 00 00 mov $0x1,%eax
d: 0f 45 c1 cmovne %ecx,%eax
10: c3 retq
It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.
Differential Revision: http://reviews.llvm.org/D11989
llvm-svn: 244947
r243382 changed the behavior to always require a set of memchecks to be
passed to LoopVer. This change restores the prior behavior as an
alternative to the new behavior. This allows the checks to be
implicitly taken from the LAA object.
Patch by Ashutosh Nema!
llvm-svn: 244763
Summary: This patch adds check for dead blocks and skip them for processSwitchInst(). This will help reduce compilation time.
Reviewers: reames, hans
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11953
llvm-svn: 244656
Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block being deleted, it contains no instruction or terminator, and it should no be traversed anymore. However, since the iterator is advanced before processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and delete them after all processSwitchInst() has been done. It still possible to visit dead default blocks and waste time process them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks.
Reviewers: kariddi, resistor, hans, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11852
llvm-svn: 244642
This adds somewhat basic preparation functionality including:
- Formation of funclets via coloring basic blocks.
- Cloning of polychromatic blocks to ensure that funclets have unique
program counters.
- Demotion of values used between different funclets.
- Some amount of cleanup once we have removed predecessors from basic
blocks.
- Verification that we are left with a CFG that makes some amount of
sense.
N.B. Arguments and numbering still need to be done.
Differential Revision: http://reviews.llvm.org/D11750
llvm-svn: 244558
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
As a follow-up to r244181, resolve uniquing cycles underneath distinct
nodes on the fly. This prevents uniquing cycles in early operands from
affecting later operands. It also removes an iteration through distinct
nodes' operands.
No real functional change here, just more prompt resolution of temporary
nodes.
llvm-svn: 244302
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst. This commit changes a bunch
of eligible loops to use it.
llvm-svn: 244260
Rotate the algorithm for remapping distinct nodes in order to simplify
how uniquing cycles get resolved. This removes some of the recursion,
and, most importantly, exposes all uniquing cycles at the top-level.
Besides being a little more efficient -- temporary MDNodes won't live as
long -- the clearer logic should help protect against bugs like those
fixed in r243961 and r243976.
What are uniquing cycles? Why do they present challenges when remapping
metadata?
!0 = !{!1}
!1 = !{!0}
!0 and !1 form a simple uniquing cycle. When remapping from one
metadata graph to another, every uniquing cycle gets "duplicated"
through a dance:
!0-temp = !{!1?} ; map(!0): clone !0, VM[!0] = !0-temp
!1-temp = !{!0?} ; ..map(!1): clone !1, VM[!1] = !1-temp
!1-temp = !{!0-temp} ; ..map(!1): remap !1's operands
!2 = !{!0-temp} ; ..map(!1): uniquify: !1-temp => !2
!0-temp = !{!2} ; map(!0): remap !0's operands
!3 = !{!2} ; map(!0): uniquify: !0-temp => !3
; Result
!2 = !{!3}
!3 = !{!2}
(In the two "uniquify" steps above, the operands of !X-temp are compared
to the operands of !X. If they're the same, then !X-temp gets RAUW'ed
to !X; if they're different, then !X-temp is promoted to a new unique
node. The latter case always hits in for uniquing cycles, so we
duplicate all the nodes involved.)
Why is this a problem? Uniquable Metadata nodes that have temporary
node as transitive operands keep RAUW support until the temporary nodes
get finalized. With non-cycles, this happens automatically: when a
uniquable node's count of unresolved operands drops to zero, it
immediately sheds its own RAUW support (possibly triggering the same in
any node that references it). However, uniquing cycles create a
reference cycle, and uniqued nodes that transitively reference a
uniquing cycle are "stuck" in an unresolved state until someone calls
`MDNode::resolveCycles()` on a node in the unresolved subgraph.
Distinct nodes should help here (and mostly do): since they aren't
uniqued anywhere, they are guaranteed not to be RAUW'ed. They
effectively form a barrier between uniqued nodes, breaking some uniquing
cycles, and shielding uniqued nodes from uniquing cycles.
Unfortunately, with this barrier in place, the unresolved subgraph(s)
can be disjoint from the top-level node. The mapping algorithm needs to
find at least one representative from each disjoint subgraph. But which
nodes are *stuck*, and which will get resolved automatically? And which
nodes are in the unresolved subgraph? The old logic was conservative.
This commit rotates the logic for distinct nodes, so that we have access
to unresolved nodes at the top-level call to `llvm::MapMetadata()`.
Each time we return to the top-level, we know that all temporaries have
been RAUW'ed away. Here, it's safe (and necessary) to call
`resolveCycles()` immediately on unresolved operands.
This should also perform better than the old algorithm. The recursion
stack is shorter, temporary nodes don't live as long, and there are
fewer tracking references to unresolved nodes. As the debug info graph
introduces more 'distinct' nodes, remapping should incrementally get
cheaper and cheaper.
Aside from possible performance improvements (and reduced cruft in the
`LLVMContext`), there should be no functionality change here.
llvm-svn: 244181
Rename `remap()` to `remapOperands()`, and restrict its contract to
remapping operands. Previously, it also called `mapToMetadata()`, but
this logic is hard to reason about externally. In particular, this
refactors `mapUniquedNode()` to avoid redundant mapping calls, taking
advantage of the RAUWs that are already in place.
llvm-svn: 244168
r243883 and r243961 made a use-after-free far more likely:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/6041/steps/check-llvm%20asan/logs/stdio
Unresolved nodes get inserted into the `Cycles` array. If they later
get resolved through RAUW, we need to update the reference. It's
interesting that this never hit before (maybe an asan-ified clang
bootstrap with `-flto -g` would have hit it, but I admit I haven't tried
anything quite that crazy).
llvm-svn: 243976
This change was done as an audit and is by inspection. The new EH
system is still very much a work in progress. NFC for the landingpad
case.
llvm-svn: 243965
r243883 started moving 'distinct' nodes instead of duplicated them in
lib/Linker. This had the side-effect of sometimes not cloning uniqued
nodes that reference them. I missed a corner case:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{!0}
!0 is the entry point for "remapping", and a temporary clone (say,
!0-temp) is created and mapped in case we need to model a uniquing
cycle.
Recursive descent into !1. !1 is distinct, so we leave it alone,
but update its operand to !0-temp.
Pop back out to !0. Its only operand, !1, hasn't changed, so we don't
need to use !0-temp. !0-temp goes out of scope, and we're finished
remapping, but we're left with:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{null} ; uh oh...
Previously, if !0 and !0-temp ended up with identical operands, then
!0-temp couldn't have been referenced at all. Now that distinct nodes
don't get duplicated, that assumption is invalid. We need to
!0-temp->replaceAllUsesWith(!0) before freeing !0-temp.
I found this while running an internal `-flto -g` bootstrap. Strangely,
there was no case of this in the open source bootstrap I'd done before
commit...
llvm-svn: 243961
Instead of cloning distinct `MDNode`s when linking in a module, just
move them over. The module linker destroys the source module, so the
old node would otherwise just be leaked on the context. Create the new
node in place. This also reduces the number of cloned uniqued nodes
(since it's less likely their operands have changed).
This mapping strategy is only correct when we're discarding the source,
so the linker turns it on via a ValueMapper flag, `RF_MoveDistinctMDs`.
There's nothing observable in terms of `llvm-link` output here: the
linked module should be semantically identical.
I'll be adding more 'distinct' nodes to the debug info metadata graph in
order to break uniquing cycles, so the benefits of this will partly come
in future commits. However, we should get some gains immediately, since
we have a fair number of 'distinct' `DILocation`s being linked in.
llvm-svn: 243883
This is a minor optimization to only check for unresolved operands
inside `mapDistinctNode()` if the operands have actually changed. This
shouldn't really cause any change in behaviour. I didn't actually see a
slowdown in a profile, I was just poking around nearby and saw the
opportunity.
llvm-svn: 243866
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Differential Revision: http://reviews.llvm.org/D11097
llvm-svn: 243766
The reason I was passing this vector by value in the constructor so that
I wouldn't have to copy when initializing the corresponding member but
then I forgot the std::move.
The use-case is LoopDistribution which filters the checks then
std::moves it to LoopVersioning's constructor. With this interface we
can avoid any copies.
llvm-svn: 243616
Before the patch, the checks were generated internally in
addRuntimeCheck. Now, we use the new overloaded version of
addRuntimeCheck that takes the ready-made set of checks as a parameter.
The checks are now generated by the client (LoopDistribution) with the
new RuntimePointerChecking::generateChecks API.
Also the new printChecks API is used to print out the checks for
debugging.
This is to continue the transition over to the new model whereby clients
will get the full set of checks from LAA, filter it and then pass it to
LoopVersioning and in turn to addRuntimeCheck.
llvm-svn: 243382
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Reviewers: reames, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11278
llvm-svn: 243331
Instead of the pattern
for (auto I = x.rbegin(), E = x.end(); I != E; ++I)
we can use make_range to construct the reverse range and iterate using
that instead.
llvm-svn: 243163
We currently version `__asan_init` and when the ABI version doesn't match, the linker gives a `undefined reference to '__asan_init_v5'` message. From this, it might not be obvious that it's actually a version mismatch error. This patch makes the error message much clearer by changing the name of the undefined symbol to be `__asan_version_mismatch_check_xxx` (followed by the version string). We obviously don't want the initializer to be named like that, so it's a separate symbol that is used only for the purpose of version checking.
Reviewed at http://reviews.llvm.org/D11004
llvm-svn: 243003
preparation for de-coupling the AA implementations.
In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're bitfields and thus inherently we use
them as integers.
I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.
I think there is a significantly cleaner API for all of this, but
I don't want to try to do really substantive changes for now, I just
want to refactor the things away from analysis groups so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.
Differential Revision: http://reviews.llvm.org/D10564
llvm-svn: 242963
Currently, a load from an alloca that is used in as single block and is not preceded
by a store is replaced by undef. This is not always correct if the single block is
inside a loop.
Fix the logic so that:
1) If there are no stores in the block, replace the load with an undef, as before.
2) If there is a store (regardless of where it is in the block w.r.t the load), bail
out, and let the rest of mem2reg handle this alloca.
Patch by: gil.rapaport@intel.com
Differential Revision: http://reviews.llvm.org/D11355
llvm-svn: 242884
through APIs that are no longer necessary now that the update API has
been removed.
This will make changes to the AA interfaces significantly less
disruptive (I hope). Either way, it seems like a really nice cleanup.
llvm-svn: 242882
part of simplifying its interface and usage in preparation for porting
to work with the new pass manager.
Note that this will likely expose that we have dead arguments, members,
and maybe even pass requirements for AA. I'll be cleaning those up in
seperate patches. This just zaps the actual update API.
Differential Revision: http://reviews.llvm.org/D11325
llvm-svn: 242881
I am planning to add more nested classes inside RuntimePointerCheck so
all these triple-nesting would be hard to follow.
Also rename it to RuntimePointerChecking (i.e. append 'ing').
llvm-svn: 242218
Previously we would refrain from attempting to increase the linkage of
available_externally globals because they were considered weak for the
linker. Now they are treated more like a declaration instead of a weak
definition.
This was causing SSE alignment faults in Chromuim, when some code
assumed it could increase the alignment of a dllimported global that it
didn't control. http://crbug.com/509256
llvm-svn: 242091
No in-tree alias analysis used this facility, and it was not called in
any particularly rigorous way, so it seems unlikely to be correct.
Note that one of the only stateful AA implementations in-tree,
GlobalsModRef is completely broken currently (and any AA passes like it
are equally broken) because Module AA passes are not effectively
invalidated when a function pass that fails to update the AA stack runs.
Ultimately, it doesn't seem like we know how we want to build stateful
AA, and until then trying to support and maintain correctness for an
untested API is essentially impossible. To that end, I'm planning to rip
out all of the update API. It can return if and when we need it and know
how to build it on top of the new pass manager and as part of *tested*
stateful AA implementations in the tree.
Differential Revision: http://reviews.llvm.org/D10889
llvm-svn: 241975
Summary:
The class will obviously need improvement down the road. For one, there
is no reason that addPHINodes would have to be exposed like that. I
will make this and other improvements in follow-up patches.
The main goal is to be able to share this functionality. The
LoopLoadElimination pass I am working on needs it too. Later we can
move other clients as well (LV and Ashutosh's LICMVer).
Reviewers: hfinkel, ashutosh.nema
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10577
llvm-svn: 241932
Summary:
This makes them available to the LoopVersioning class as that is moved
to its own module in the next patch.
Reviewers: ashutosh.nema, hfinkel
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10576
llvm-svn: 241931
Summary:
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Reviewers: rnk, JosephTremoulet, reames, nlewycky, rjmccall
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11041
llvm-svn: 241888
We would create a phi node with a zero initialized operand instead of
undef in the case where no value was originally available. This was
problematic for x86_mmx which has no null value.
llvm-svn: 241143
Set debug location for terminator instruction in loop backedge block
(which is an unconditional jump to loop header). We can't copy debug
location from original backedges, as there can be several of them,
with different debug info locations. So, we follow the approach of
SplitBlockPredecessors, and copy the debug info from first non-PHI
instruction in the header (i.e. destination block).
This is yet another change for PR23837.
llvm-svn: 240999
Currently some users of this function do this explicitly, and all the
rest forget to do this.
ThreadSanitizer was one of such users, and had missing debug
locations for calls into TSan runtime handling atomic operations,
eventually leading to poorly symbolized stack traces and malfunctioning
suppressions.
This is another change relevant to PR23837.
llvm-svn: 240460
When a case of INT64_MIN was followed by a case that was greater than
zero, we were overflowing a signed integer here. Since we've sorted
the cases here anyway (and thus currentValue must be greater than
nextValue) it's simple enough to avoid this by using addition rather
than subtraction.
Found by UBSAN on existing tests.
llvm-svn: 240201
The patch is generated using this command:
tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py -fix \
-checks=-*,llvm-namespace-comment -header-filter='llvm/.*|clang/.*' \
llvm/lib/
Thanks to Eugene Kosov for the original patch!
llvm-svn: 240137
The personality routine currently lives in the LandingPadInst.
This isn't desirable because:
- All LandingPadInsts in the same function must have the same
personality routine. This means that each LandingPadInst beyond the
first has an operand which produces no additional information.
- There is ongoing work to introduce EH IR constructs other than
LandingPadInst. Moving the personality routine off of any one
particular Instruction and onto the parent function seems a lot better
than have N different places a personality function can sneak onto an
exceptional function.
Differential Revision: http://reviews.llvm.org/D10429
llvm-svn: 239940
A reduction is a special kind of recurrence. In the loop vectorizer we currently
identify basic reductions. Future patches will extend this to identifying basic
recurrences.
llvm-svn: 239835
Use IRBuilder::Create(Cond)?Br instead of constructing instructions
manually with BranchInst::Create(). It's consistent with other
uses of IRBuilder in this pass, and has an additional important
benefit:
Using IRBuilder will ensure that new branch instruction will get
the same debug location as original terminator instruction it will
eventually replace.
For now I'm not adding a testcase, as currently original terminator
instruction also lack debug location due to missing debug location
propagation in BasicBlock::splitBasicBlock. That is, the testcase
will accompany the fix for the latter I'm going to mail soon.
llvm-svn: 239550
Test Plan: regression test suite
Reviewers: eugenis, dblaikie
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10343
llvm-svn: 239438
isInductionPHI wants to calculate the stride based on the pointee size.
However, this is not possible when the pointee is zero sized.
This fixes PR23763.
llvm-svn: 239143
Alternatively, this type could be derived on-demand whenever
getResultElementType is called - if someone thinks that's the better
choice (simple time/space tradeoff), I'm happy to give it a go.
llvm-svn: 238716
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.
This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.
llvm-svn: 237810
The array passed to LoadAndStorePromoter's constructor was a constant reference to a SmallVectorImpl, which is just the same as passing an ArrayRef.
Also, the data in the array can be 'const Instruction*' instead of 'Instruction*'. Its not possible to convert a SmallVectorImpl<T*> to SmallVectorImpl<const T*>, but ArrayRef does provide such a method.
Currently this added calls to makeArrayRef which should be a nop, but i'm going to kick off a discussion about improving ArrayRef to not need these.
llvm-svn: 237226
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
This helper function creates a ctor function, which calls sanitizer's
init function with given arguments. This constructor is then expected
to be added to module's ctors. The patch helps unifying how sanitizer
constructor functions are created, and how init functions are called
across all sanitizers.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8777
llvm-svn: 236627
Many of the callers already have the pointer type anyway, and for the
couple of callers that don't it's pretty easy to call PointerType::get
on the pointee type and address space.
This avoids LLParser from using PointerType::getElementType when parsing
GlobalAliases from IR.
llvm-svn: 236160
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`. The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.
Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one. It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs. YMMV of
course.
Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py. I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three. It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).
Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.
llvm-svn: 236120
When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.
Test case by Mark Lacey.
llvm-svn: 235771
(reverted in r235533)
Original commit message:
"Calls to llvm::Value::mutateType are becoming extra-sensitive now that
instructions have extra type information that will not be derived from
operands or result type (alloca, gep, load, call/invoke, etc... ). The
special-handling for mutateType will get more complicated as this work
continues - it might be worth making mutateType virtual & pushing the
complexity down into the classes that need special handling. But with
only two significant uses of mutateType (vectorization and linking) this
seems OK for now.
Totally open to ideas/suggestions/improvements, of course.
With this, and a bunch of exceptions, we can roundtrip an indirect call
site through bitcode and IR. (a direct call site is actually trickier...
I haven't figured out how to deal with the IR deserializer's lazy
construction of Function/GlobalVariable decl's based on the type of the
entity which means looking through the "pointer to T" type referring to
the global)"
The remapping done in ValueMapper for LTO was insufficient as the types
weren't correctly mapped (though I was using the post-mapped operands,
some of those operands might not have been mapped yet so the type
wouldn't be post-mapped yet). Instead use the pre-mapped type and
explicitly map all the types.
llvm-svn: 235651
This patch refactors the definition of common utility function "isInductionPHI" to LoopUtils.cpp.
This fixes compilation error when configured with -DBUILD_SHARED_LIBS=ON
llvm-svn: 235577
This reverts commit r235458.
It looks like this might be breaking something LTO-ish. Looking into it
& will recommit with a fix/test case/etc once I've got more to go on.
llvm-svn: 235533
Calls to llvm::Value::mutateType are becoming extra-sensitive now that
instructions have extra type information that will not be derived from
operands or result type (alloca, gep, load, call/invoke, etc... ). The
special-handling for mutateType will get more complicated as this work
continues - it might be worth making mutateType virtual & pushing the
complexity down into the classes that need special handling. But with
only two significant uses of mutateType (vectorization and linking) this
seems OK for now.
Totally open to ideas/suggestions/improvements, of course.
With this, and a bunch of exceptions, we can roundtrip an indirect call
site through bitcode and IR. (a direct call site is actually trickier...
I haven't figured out how to deal with the IR deserializer's lazy
construction of Function/GlobalVariable decl's based on the type of the
entity which means looking through the "pointer to T" type referring to
the global)
llvm-svn: 235458
Summary:
This lets us use range based for loops.
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9169
llvm-svn: 235416
Summary:
MemorySSA uses this algorithm as well, and this enables us to reuse the code in both places.
There are no actual algorithm or datastructure changes in here, just code movement.
Reviewers: qcolombet, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9118
llvm-svn: 235406
Remove early returns for when `getVariable()` is null, and just assert
that it never happens. The Verifier already confirms that there's a
valid variable on these intrinsics, so we should assume the debug info
isn't broken. I also updated a check for a `!dbg` attachment, which the
Verifier similarly guarantees.
llvm-svn: 235400
This commit fixes the code which adds lifetime markers in InlineFunction to skip
zero-sized allocas instead of asserting on them.
rdar://problem/20531155
llvm-svn: 235312
This patch refactors reduction identification code out of LoopVectorizer and
exposes them as common utilities.
No functional change.
Review: http://reviews.llvm.org/D9046
llvm-svn: 235284
Remove 'inlinedAt:' from MDLocalVariable. Besides saving some memory
(variables with it seem to be single largest `Metadata` contributer to
memory usage right now in -g -flto builds), this stops optimization and
backend passes from having to change local variables.
The 'inlinedAt:' field was used by the backend in two ways:
1. To tell the backend whether and into what a variable was inlined.
2. To create a unique id for each inlined variable.
Instead, rely on the 'inlinedAt:' field of the intrinsic's `!dbg`
attachment, and change the DWARF backend to use a typedef called
`InlinedVariable` which is `std::pair<MDLocalVariable*, MDLocation*>`.
This `DebugLoc` is already passed reliably through the backend (as
verified by r234021).
This commit removes the check from r234021, but I added a new check
(that will survive) in r235048, and changed the `DIBuilder` API in
r235041 to require a `!dbg` attachment whose 'scope:` is in the same
`MDSubprogram` as the variable's.
If this breaks your out-of-tree testcases, perhaps the script I used
(mdlocalvariable-drop-inlinedat.sh) will help; I'll attach it to PR22778
in a moment.
llvm-svn: 235050
Change `DIBuilder::insertDeclare()` and `insertDbgValueIntrinsic()` to
take an `MDLocation*`/`DebugLoc` parameter which it attaches to the
created intrinsic. Assert at creation time that the `scope:` field's
subprogram matches the variable's. There's a matching `clang` commit to
use the API.
The context for this is PR22778, which is removing the `inlinedAt:`
field from `MDLocalVariable`, instead deferring to the `!dbg` location
attached to the debug info intrinsic. The best way to ensure we always
have a `!dbg` attachment is to require one at creation time. I'll be
adding verifier checks next, but this API change is the best way to
shake out frontend bugs.
Note: I added an `llvm_unreachable()` in `bindings/go` and passed in
`nullptr` for the `DebugLoc`. The `llgo` folks will eventually need to
pass a valid `DebugLoc` here.
llvm-svn: 235041
Change `DICompileUnit::replaceSubprograms()` and
`DICompileUnit::replaceGlobalVariables()` to match the `MDCompileUnit`
equivalents that they're wrapping.
llvm-svn: 234852
Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note
that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`.
llvm-svn: 234850
Summary:
Runtime unrolling of loops needs to emit an expression to compute the
loop's runtime trip-count. Avoid runtime unrolling if this computation
will be expensive.
Depends on D8993.
Reviewers: atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8994
llvm-svn: 234846
This is along the same lines as r234832, but for `DILocation`. Clean
out all accessors from `DILocation`. Any callers should be using
`MDLocation` directly (e.g., via `operator->()`).
llvm-svn: 234835
Completely gut `DIExpression`, turning it into a simple wrapper around
`MDExpression *`. There are two bits of magic left:
- It's constructed from `const MDExpression*` but convertible to
`MDExpression*`.
- It's default-constructed to `nullptr`.
Otherwise, it should behave quite like a raw pointer. Once I've done
the same to the rest of the `DIDescriptor` subclasses, I'll come back to
delete them entirely (and update call sites as necessary to deal with
the missing magic).
llvm-svn: 234832
As documented in PR23200 (and the FIXMEs I've added to the code here),
this logic is fairly broken: it modifies the `LLVMContext` in a way that
affects other modules and cannot be serialized to assembly/bitcode. For
now, move it over to `MDLocation::computeNewDiscriminators()` anyway.
llvm-svn: 234825
I don't see a reason to add the `copyWithNewScope()` API over to
`MDLocation` -- it seems to be a holdover from when creating locations
required knowing details of operand layout -- so change
`AddDiscriminators` to call `MDLocation::get()` directly. Should be no
functionality change here.
llvm-svn: 234824
Clean up a predicate I added in r229731, fix the relevant comment and
add a test case. The earlier version is confusing to read and was also
buggy (probably not a coincidence) till Alexey fixed it in r233881.
llvm-svn: 234701
Rewrite `DILocation::atSameLineAs()` as `MDLocation::canDiscriminate()`
with a doxygen comment explaining its purpose. I've added a few FIXMEs
where I think this check is too weak; fixing that is tracked by PR23199.
llvm-svn: 234674
WinEH currently turns invokes into calls. Long term, we will reconsider
this, but for now, make sure we remap the operands and clone the
successors of the new terminator.
llvm-svn: 234608
CallSite roughly behaves as a common base CallInst and InvokeInst. Bring
the behavior closer to that model by making upcasts explicit. Downcasts
remain implicit and work as before.
Following dyn_cast as a mental model checking whether a Value *V isa
CallSite now looks like this:
if (auto CS = CallSite(V)) // think dyn_cast
instead of:
if (CallSite CS = V)
This is an extra token but I think it is slightly clearer. Making the
ctor explicit has the advantage of not accidentally creating nullptr
CallSites, e.g. when you pass a Value * to a function taking a CallSite
argument.
llvm-svn: 234601
The code uses a priority queue and a worklist, which share the same
visited set, but the visited set is only updated when inserting into
the priority queue. Instead, switch to using separate visited sets
for the priority queue and worklist.
llvm-svn: 234425
Replace all uses of `DITypedArray<>` with `MDTupleTypedArrayWrapper<>`
and `MDTypeRefArray`. The APIs are completely different, but the
provided functionality is the same: treat an `MDTuple` as if it's an
array of a particular element type.
To simplify this patch a bit, I've temporarily typedef'ed
`DebugNodeArray` to `DIArray` and `MDTypeRefArray` to `DITypeArray`.
I've also temporarily conditionalized the accessors to check for null --
eventually these should be changed to asserts and the callers should
check for null themselves.
There's a tiny accompanying patch to clang.
llvm-svn: 234290
Summary:
Instead of making a local copy of `checkInterfaceFunction` for each
sanitizer, move the function in a common place.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8775
llvm-svn: 234220