This patch adds support for dfsan on aarch64-linux with 42-bit VMA
(current default config for 64K pagesize kernels). The support is
enabled by defining the SANITIZER_AARCH64_VMA to 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.
llvm-svn: 245840
TL-DR: SROA is followed by EarlyCSE which requires the DominatorTree.
There is no reason not to require it up-front for SROA.
Some history is necessary to understand why we ended-up here.
r123437 switched the second (Legacy)SROA in the optimizer pipeline to
use SSAUpdater in order to avoid recomputing the costly
DominanceFrontier. The purpose was to speed-up the compile-time.
Later r123609 removed the need for the DominanceFrontier in
(Legacy)SROA.
Right after, some cleanup was made in r123724 to remove any reference
to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and
SROA_DT (the latter replacing SROA_DF).
The second argument of `createScalarReplAggregatesPass` was renamed
from `UseDomFrontier` to `UseDomTree`.
I believe this is were a mistake was made. The pipeline was not
updated and the call site was still:
PM->add(createScalarReplAggregatesPass(-1, false));
At that time, SROA was immediately followed in the pipeline by
EarlyCSE which required alread the DominatorTree. Not requiring
the DominatorTree in SROA didn't save anything, but unfortunately
it was lost at this point.
When the new SROA Pass was introduced in r163965, I believe the goal
was to have an exact replacement of the existing SROA, this bug
slipped through.
You can see currently:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
SROA
Dominator Tree Construction
Early CSE
After this patch:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
Dominator Tree Construction
SROA
Early CSE
This improves the compile time from 88s to 23s for PR17855.
https://llvm.org/bugs/show_bug.cgi?id=17855
And from 113s to 12s for PR16756
https://llvm.org/bugs/show_bug.cgi?id=16756
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D12267
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245820
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.
The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.
Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.
Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12108
llvm-svn: 245797
Summary:
Merge functions previously relied on unsigned comparisons of pointer values to
order functions. This caused observable non-determinism in the compiler for
large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces
different hashes when run repeatedly on the same machine. Differing output was
observed on three large bitcodes, but it was less frequent on the smallest file.
It is possible that this only manifests on the large inputs, hence remaining
undetected until now.
This patch fixes this by removing (almost, see below) all places where
comparisons between pointers are used to order functions. Most of these changes
are local, but the comparison of global values requires assigning an identifier
to each local in the order it is visited. This is very similar to the way the
comparison function identifies Value*'s defined within a function. Because the
order of visiting the functions and their subparts is deterministic, the
identifiers assigned to the globals will be as well, and the order of functions
will be deterministic.
With these changes, there is no more observed non-determinism. There is also
only minor slowdowns (negligible to 4%) compared to the baseline, which is
likely a result of the fact that global comparisons involve hash lookups and not
just pointer comparisons.
The one caveat so far is that programs containing BlockAddress constants can
still be non-deterministic. It is not clear what the right solution is here. In
particular, even if the global numbers are used to order by function, we still
need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out
and fail to order the functions or consider them equal, because we require a
total order over functions. Note that programs with BlockAddress constants are
relatively rare, so the impact of leaving this in is minor as long as this pass
is opt-in.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: jevinskie, llvm-commits, chapuni
Differential revision: http://reviews.llvm.org/D12168
llvm-svn: 245762
The original checkin was buggy, this change has a fix.
Original commit message:
[InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245753
Gets a bit tricky in the ValueMapper, of course - not sure if we should
just expose a list of explicit types for each Value so that the
ValueMapper can be neutral to these special cases (it's OK for things
like load, where the explicit type is the result type - but when that's
not the case, it means plumbing through another "special" type... )
llvm-svn: 245728
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.
This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.
Differential Revision: http://reviews.llvm.org/D12132
llvm-svn: 245662
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potentiall payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245635
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This is addressing a post-commit review feedback for r245589.
NFC (inside the LLVM tree).
llvm-svn: 245622
This patch adds support for asan on aarch64-linux with 42-bit VMA
(current default config for 64K pagesize kernels). The support is
enabled by defining the SANITIZER_AARCH64_VMA to 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.
llvm-svn: 245594
Summary:
Refactor, NFC
Extracts computeOverflowForSignedAdd and isKnownNonNegative from NaryReassociate to ValueTracking in case
others need it.
Reviewers: reames
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D11313
llvm-svn: 245591
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.
llvm-svn: 245589
Caught by the famous "DebugLoc describes the currect SubProgram" assertion.
When GVN is removing a nonlocal load it updates the debug location of the
SSA value it replaced the load with with the one of the load. In the
testcase this actually overwrites a valid debug location with an empty one.
In reality GVN has to make an arbitrary choice between two equally valid
debug locations. This patch changes to behavior to only update the
location if the value doesn't already have a debug location.
llvm-svn: 245588
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.
Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly. With that there is no reason to expose addPHINodes anymore.
Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.
llvm-svn: 245579
Summary: We know that -x & 1 is equivalent to x & 1, avoid using negation for testing if a negative integer is even or odd.
Reviewers: majnemer
Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D12156
llvm-svn: 245569
Usually DSE is not supposed to remove lifetime intrinsics, but it's
actually ok to remove them for dead objects in terminating blocks,
because they convey no extra information there. Until we hit a lifetime
start that cannot be removed, that is. Because from that point on the
lifetime intrinsics become interesting again, e.g. for stack coloring.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11710
llvm-svn: 245542
analyses into LLVM's Analysis library rather than having them in
a Transforms library.
This is motivated by the need to have the core AliasAnalysis
infrastructure be aware of the ObjCARCAliasAnalysis. However, it also
seems like a nice and clean separation. Everything was very easy to move
and this doesn't create much clutter in the analysis library IMO.
Differential Revision: http://reviews.llvm.org/D12133
llvm-svn: 245541
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
After hitting @llvm.assume(X) we can:
- propagate equality that X == true
- if X is icmp/fcmp (with eq operation), and one of operand
is constant we can change all variables with constants in the same BasicBlock
http://reviews.llvm.org/D11918
llvm-svn: 245265
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion was
deleting the next basic block iterator. Fixed the same by resetting the basic block iterator
post call to DeleteDeadInstruction.
llvm-svn: 245195
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK as everything in SCEV that can uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't realy called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).
This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
Differential Revision: http://reviews.llvm.org/D11866
llvm-svn: 245187
Bitwise arithmetic can obscure a simple sign-test. If replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
llvm-svn: 245171
ByteSize and BitSize should not be size_t but unsigned, considering
1) They are at most 2^16 and 2^19, respectively.
2) BitSize is an argument to Type::getIntNTy which takes unsigned.
Also, use the correct utostr instead itostr and cache the string result.
Thanks to James Touton for reporting this!
llvm-svn: 245167
Some personality routines require funclet exit points to be clearly
marked, this is done by producing a token at the funclet pad and
consuming it at the corresponding ret instruction. CleanupReturnInst
already had a spot for this operand but CatchReturnInst did not.
Other personality routines don't need to use this which is why it has
been made optional.
llvm-svn: 245149
This patch makes the Merge Functions pass faster by calculating and comparing
a hash value which captures the essential structure of a function before
performing a full function comparison.
The hash is calculated by hashing the function signature, then walking the basic
blocks of the function in the same order as the main comparison function. The
opcode of each instruction is hashed in sequence, which means that different
functions according to the existing total order cannot have the same hash, as
the comparison requires the opcodes of the two functions to be the same order.
The hash function is a static member of the FunctionComparator class because it
is tightly coupled to the exact comparison function used. For example, functions
which are equivalent modulo a single variant callsite might be merged by a more
aggressive MergeFunctions, and the hash function would need to be insensitive to
these differences in order to exploit this.
The hashing function uses a utility class which accumulates the values into an
internal state using a standard bit-mixing function. Note that this is a different interface
than a regular hashing routine, because the values to be hashed are scattered
amongst the properties of a llvm::Function, not linear in memory. This scheme is
fast because only one word of state needs to be kept, and the mixing function is
a few instructions.
The main runOnModule function first computes the hash of each function, and only
further processes functions which do not have a unique function hash. The hash
is also used to order the sorted function set. If the hashes differ, their
values are used to order the functions, otherwise the full comparison is done.
Both of these are helpful in speeding up MergeFunctions. Together they result in
speedups of 9% for mysqld (a mostly C application with little redundancy), 46%
for libxul in Firefox, and 117% for Chromium. (These are all LTO builds.) In all
three cases, the new speed of MergeFunctions is about half that of the module
verifier, making it relatively inexpensive even for large LTO builds with
hundreds of thousands of functions. The same functions are merged, so this
change is free performance.
Author: jrkoenig
Reviewers: nlewycky, dschuff, jfb
Subscribers: llvm-commits, aemerson
Differential revision: http://reviews.llvm.org/D11923
llvm-svn: 245140
This seems to only work some of the time. In some situations,
this seems to use a nonsensical type and isn't actually aware of the
memory being accessed. e.g. if branch condition is an icmp of a pointer,
it checks the addressing mode of i1.
llvm-svn: 245137
Summary: Similar to the change we applied to ASan. The same test case works.
Reviewers: samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11961
llvm-svn: 245067
This introduces the basic functionality to support "token types".
The motivation stems from the need to perform operations on a Value
whose provenance cannot be obscured.
There are several applications for such a type but my immediate
motivation stems from WinEH. Our personality routine enforces a
single-entry - single-exit regime for cleanups. After several rounds of
optimizations, we may be left with a terminator whose "cleanup-entry
block" is not entirely clear because control flow has merged two
cleanups together. We have experimented with using labels as operands
inside of instructions which are not terminators to indicate where we
came from but found that LLVM does not expect such exotic uses of
BasicBlocks.
Instead, we can use this new type to clearly associate the "entry point"
and "exit point" of our cleanup. This is done by having the cleanuppad
yield a Token and consuming it at the cleanupret.
The token type makes it impossible to obscure or otherwise hide the
Value, making it trivial to track the relationship between the two
points.
What is the burden to the optimizer? Well, it turns out we have already
paid down this cost by accepting that there are certain calls that we
are not permitted to duplicate, optimizations have to watch out for
such instructions anyway. There are additional places in the optimizer
that we will probably have to update but early examination has given me
the impression that this will not be heroic.
Differential Revision: http://reviews.llvm.org/D11861
llvm-svn: 245029
its creation function.
This required shifting a bunch of method definitions to be out-of-line
so that we could leave most of the implementation guts in the .cpp file.
llvm-svn: 245021
I've used forward declarations and reorderd the source code some to make
this reasonably clean and keep as much of the code as possible in the
source file, including all the stratified set details. Just the basic AA
interface and the create function are in the header file, and the header
file is now included into the relevant locations.
llvm-svn: 245009
Summary:
This patch implements my promised optimization to reunites certain sexts from
operands after we extract the constant offset. See the header comment of
reuniteExts for its motivation.
One key building block that enables this optimization is Bjarke's poison value
analysis (D11212). That helps to prove "a +nsw b" can't overflow.
Reviewers: broune
Subscribers: jholewinski, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D12016
llvm-svn: 245003
AliasAnalysis in LoopIdiomRecognize.
The previous commit to LIR, r244879, exposed some scary bug in the loop
pass pipeline with an assert failure that showed up on several bots.
This patch got reverted as part of getting that revision reverted, but
they're actually independent and unrelated. This patch has no functional
change and should be completely safe. It is also useful for my current
work on the AA infrastructure.
llvm-svn: 244993
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.
Code generated before the patch was applied:
0: 0f bc c7 bsf %edi,%eax
3: b9 20 00 00 00 mov $0x20,%ecx
8: 0f 45 c8 cmovne %eax,%ecx
b: 83 c1 02 add $0x2,%ecx
e: b8 01 00 00 00 mov $0x1,%eax
13: 85 ff test %edi,%edi
15: 0f 45 c1 cmovne %ecx,%eax
18: c3 retq
Code generated after the patch was applied:
0: 0f bc cf bsf %edi,%ecx
3: 83 c1 02 add $0x2,%ecx
6: 85 ff test %edi,%edi
8: b8 01 00 00 00 mov $0x1,%eax
d: 0f 45 c1 cmovne %ecx,%eax
10: c3 retq
It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.
Differential Revision: http://reviews.llvm.org/D11989
llvm-svn: 244947
We used to be over-conservative about preserving inbounds. Actually, the second
GEP (which applies the constant offset) can inherit the inbounds attribute of
the original GEP, because the resultant pointer is equivalent to that of the
original GEP. For example,
x = GEP inbounds a, i+5
=>
y = GEP a, i // inbounds removed
x = GEP inbounds y, 5 // inbounds preserved
llvm-svn: 244937
DeadStoreElimination does eliminate a store if it stores a value which was loaded from the same memory location.
So far this worked only if the store is in the same block as the load.
Now we can also handle stores which are in a different block than the load.
Example:
define i32 @test(i1, i32*) {
entry:
%l2 = load i32, i32* %1, align 4
br i1 %0, label %bb1, label %bb2
bb1:
br label %bb3
bb2:
; This store is redundant
store i32 %l2, i32* %1, align 4
br label %bb3
bb3:
ret i32 0
}
Differential Revision: http://reviews.llvm.org/D11854
llvm-svn: 244901
Consider this code:
BB:
%i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
%add = add nsw i32 %i, %b
...
In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go though %if.then branch it's
always a win, because add is not executed; if not, the number of instructions
stays the same.
This pattern applies also to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.
Patch by Jakub Kuderski!
llvm-svn: 244887
simplified form to remove redundant checks and simplify the code for
popcount recognition. We don't actually need to handle all of these
cases.
I've left a FIXME for one in particular until I finish inspecting to
make sure we don't actually *rely* on the predicate in any way.
llvm-svn: 244879
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.
I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code, it means that SimplifyX86immshift now (re)decodes the type of shift.
Differential Revision: http://reviews.llvm.org/D11938
llvm-svn: 244872
Summary: This patch moves the check of OptimizeForSize before traversing over all basic blocks in current loop. If OptimizeForSize is set to true, no non-trivial unswitch is ever allowed. Therefore, the early exit will help reduce compilation time. This patch should be NFC.
Reviewers: reames, weimingz, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11997
llvm-svn: 244868
code into methods on LoopIdiomRecognize.
This simplifies the code somewhat and also makes it much easier to move
the analyses around. Ultimately, the separate class wasn't providing
significant value over methods -- it contained the precondition basic
block and the current loop. The current loop is already available and
the precondition block wasn't needed everywhere and is easy to pass
around.
In several cases I just moved things to be static functions because they
already accepted most of their inputs as arguments.
This doesn't fix the way we manage analyses yet, that will be the next
patch, but it already makes the code over 50 lines shorter.
No functionality changed.
llvm-svn: 244851
complexity.
There is only one function that was called from multiple locations, and
that was 'getBranch' which has a reasonable one-line spelling already:
dyn_cast<BranchInst>(BB->getTerminator). We could make this shorter, but
it doesn't seem to add much value. Instead, we should avoid calling it
so many times on the same basic blocks, but that will be in a subsequent
patch.
The other functions are only called in one location, so inline them
there, and take advantage of this to use direct early exit and reduce
indentation. This makes it much more clear what is being tested for, and
in fact makes it clear now to me that there are simpler ways to do this
work. However, this patch just does the mechanical inlining. I'll clean
up the functionality of the code to leverage loop simplified form more
effectively in a follow-up.
Despite lots of early line breaks due to early-exit, this is still
shorter than it was before.
llvm-svn: 244841
a significant code cleanup here.
The handling of analyses in this pass is overly complex and can be
simplified significantly, but the right way to do that is to simplify
all of the code not just the analyses, and that'll require pretty
extensive edits that would be noisy with formatting changes mixed into
them.
llvm-svn: 244828
To be clear: this is an *optimization* not a correctness change.
CodeGenPrep likes to duplicate icmps feeding branch instructions to take advantage of x86's ability to fuze many comparison/branch patterns into a single micro-op and to reduce the need for materializing i1s into general registers. PlaceSafepoints likes to place safepoint polls right at the end of basic blocks (immediately before terminators) when inserting entry and backedge safepoints. These two heuristics interact in a somewhat unfortunate way where the branch terminating the original block will be controlled by a condition driven by unrelocated pointers. This forces the register allocator to keep both the relocated and unrelocated values of the pointers feeding the icmp alive over the safepoint poll.
One simple fix would have been to just adjust PlaceSafepoints to move one back in the basic block, but you can reach similar cases as a result of LICM or other hoisting passes. As a result, doing a post insertion fixup seems to be more robust.
I considered doing this in CodeGenPrep itself, but having to update the live sets of already rewritten safepoints gets complicated fast. In particular, you can't just use def/use information because by moving the icmp, we're extending the live range of it's inputs potentially.
Instead, this patch teaches RewriteStatepointsForGC to make the required adjustments before making the relocations explicit in the IR. This change really highlights the fact that RSForGC is a CodeGenPrep-like pass which is performing target specific lowering. In the long run, we may even want to combine the two though this would require a lot more smarts to be integrated into RSForGC first. We currently rely on being able to run a set of cleanup passes post rewriting because the IR RSForGC generates is pretty damn ugly.
Differential Revision: http://reviews.llvm.org/D11819
llvm-svn: 244821
When rewriting the IR such that base pointers are available for every live pointer, we potentially need to duplicate instructions to propagate the base. The original code had only handled PHI and Select under the belief those were the only instructions which would need duplicated. When I added support for vector instructions, I'd added a collection of hacks for ExtractElement which caught most of the common cases. Of course, I then found the one test case my hacks couldn't cover. :)
This change removes all of the early hacks for extract element. By defining extractelement as a BDV (rather than trying to look through it), we can extend the rewriting algorithm to duplicate the extract as needed. Note that a couple of peephole optimizations were left in for the moment, because while we now handle extractelement as a first class citizen, we're not yet handling insertelement. That change will follow in the near future.
llvm-svn: 244808
AliasAnalysis.
Same as the other commits, the TLI access from an alias analysis is
going away and isn't very clean -- it is better to explicitly mark the
dependencies.
llvm-svn: 244785
just depend on it directly.
This was particularly frustrating because there was a really wide
mixture of using a member variable and re-extracting it from the AA that
happened to be around. I think the result is much more clear.
I've also deleted all of the pointless null checks and used references
across the APIs where I could to make it explicit that this cannot be
null in a useful fashion.
llvm-svn: 244780
r243382 changed the behavior to always require a set of memchecks to be
passed to LoopVer. This change restores the prior behavior as an
alternative to the new behavior. This allows the checks to be
implicitly taken from the LAA object.
Patch by Ashutosh Nema!
llvm-svn: 244763
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
I also moved all the relevant combine tests into InstCombine/blend_x86.ll
Differential Revision: http://reviews.llvm.org/D11934
llvm-svn: 244723
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant. Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.
llvm-svn: 244676
Summary: This patch adds check for dead blocks and skip them for processSwitchInst(). This will help reduce compilation time.
Reviewers: reames, hans
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11953
llvm-svn: 244656
Summary: LowerSwitch crashed with the attached test case after deleting the default block. This happened because the current implementation of deleting dead blocks is wrong. After the default block being deleted, it contains no instruction or terminator, and it should no be traversed anymore. However, since the iterator is advanced before processSwitchInst() function is executed, the block advanced to could be deleted inside processSwitchInst(). The deleted block would then be visited next and crash dyn_cast<SwitchInst>(Cur->getTerminator()) because Cur->getTerminator() returns a nullptr. This patch fixes this problem by recording dead default blocks into a list, and delete them after all processSwitchInst() has been done. It still possible to visit dead default blocks and waste time process them. But it is a compile time issue, and I plan to have another patch to add support to skip dead blocks.
Reviewers: kariddi, resistor, hans, reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11852
llvm-svn: 244642
Summary:
For LTO we need to enable this pass in the LTO pipeline,
as it is skipped during the "-flto -c" compile step (when PrepareForLTO is
set).
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11919
llvm-svn: 244622
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.
matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.
C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.
ARM and AArch64 at least have different instructions for these different cases.
llvm-svn: 244580
This adds somewhat basic preparation functionality including:
- Formation of funclets via coloring basic blocks.
- Cloning of polychromatic blocks to ensure that funclets have unique
program counters.
- Demotion of values used between different funclets.
- Some amount of cleanup once we have removed predecessors from basic
blocks.
- Verification that we are left with a CFG that makes some amount of
sense.
N.B. Arguments and numbering still need to be done.
Differential Revision: http://reviews.llvm.org/D11750
llvm-svn: 244558
This patch and a relatec clang patch solve the problem of having to explicitly enable analysis when specifying a loop hint pragma to get the diagnostics. Passing AlwasyPrint as the pass name (see below) causes the front-end to print the diagnostic if the user has specified '-Rpass-analysis' without an '=<target-pass>’. Users of loop hints can pass that compiler option without having to specify the pass and they will get diagnostics for only those loops with loop hints.
llvm-svn: 244555
This patch moves checking the threshold of runtime pointer checks to the vectorization requirements (late diagnostics) and emits a diagnostic that infroms the user the loop would be vectorized if not for exceeding the pointer-check threshold. Clang will also append the options that can be used to allow vectorization.
llvm-svn: 244523
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
Differential Revision: http://reviews.llvm.org/D11886
llvm-svn: 244495
This patch moves the verification of fast-math to just before vectorization is done. This way we can tell clang to append the command line options would that allow floating-point commutativity. Specifically those are enableing fast-math or specifying a loop hint.
llvm-svn: 244489
Sometimes interleaving is not beneficial, as determined by the cost-model and sometimes it is disabled by a loop hint (by the user). This patch modifies the diagnostic messages to make it clear why interleaving wasn't done.
llvm-svn: 244485
This change adds the unroll metadata "llvm.loop.unroll.enable" which directs
the optimizer to unroll a loop fully if the trip count is known at compile time, and
unroll partially if the trip count is not known at compile time. This differs from
"llvm.loop.unroll.full" which explicitly does not unroll a loop if the trip count is not
known at compile time.
The "llvm.loop.unroll.enable" is intended to be added for loops annotated with
"#pragma unroll".
llvm-svn: 244466
Summary:
This adds a hook to TTI which enables us to selectively turn on by default
interleaved access vectorization for targets on which we have have performed
the required benchmarking.
Reviewers: rengolin
Subscribers: rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D11901
llvm-svn: 244449
The scalarizer can cache incorrect entries when walking up a chain of
insertelement instructions. This occurs when it encounters more than one
instruction that it is not actively searching for, as it unconditionally caches
every element it finds. The fix is to only cache the first element that it
isn't searching for so we don't overwrite correct entries.
Reviewers: hfinkel
Differential Revision: http://reviews.llvm.org/D11559
llvm-svn: 244448
This is the full set of checks that clients can further filter. IOW,
it's client-agnostic. This makes LAA complete in the sense that it now
provides the two main results of its analysis precomputed:
1. memory dependences via getDepChecker().getInsterestingDependences()
2. run-time checks via getRuntimePointerCheck().getChecks()
However, as a consequence we now compute this information pro-actively.
Thus if the client decides to skip the loop based on the dependences
we've computed the checks unnecessarily. In order to see whether this
was a significant overhead I checked compile time on SPEC2k6 LTO bitcode
files. The change was in the noise.
The checks are generated in canCheckPtrAtRT, at the same place where we
used to call groupChecks to merge checks.
llvm-svn: 244368
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.
Reviewers: sanjoy, weimingz, chenli
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D11841
llvm-svn: 244348
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.
e.g.
%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)
In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.
In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).
Differential Revision: http://reviews.llvm.org/D11760
llvm-svn: 244341
As a follow-up to r244181, resolve uniquing cycles underneath distinct
nodes on the fly. This prevents uniquing cycles in early operands from
affecting later operands. It also removes an iteration through distinct
nodes' operands.
No real functional change here, just more prompt resolution of temporary
nodes.
llvm-svn: 244302
After r244074, we now have a successors() method to iterate over
all the successors of a TerminatorInst. This commit changes a bunch
of eligible loops to use it.
llvm-svn: 244260
iisUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by introducing a new Instruction::mayBeMemoryDependent()
instead of maintaining another instruction list.
Patch by Matthias Braun <matze@braunis.de>.
Differential Revision: http://reviews.llvm.org/D11577
rdar://problem/22118647
llvm-svn: 244244
This is the first mechanical step in preparation for making this and all
the other alias analysis passes available to the new pass manager. I'm
factoring out all the totally boring changes I can so I'm moving code
around here with no other changes. I've even minimized the formatting
churn.
I'll reformat and freshen comments on the interface now that its located
in the right place so that the substantive changes don't triger this.
llvm-svn: 244197
around a DataLayout interface in favor of directly querying DataLayout.
This wrapper specifically helped handle the case where this no
DataLayout, but LLVM now requires it simplifynig all of this. I've
updated callers to directly query DataLayout. This in turn exposed
a bunch of places where we should have DataLayout readily available but
don't which I've fixed. This then in turn exposed that we were passing
DataLayout around in a bunch of arguments rather than making it readily
available so I've also fixed that.
No functionality changed.
llvm-svn: 244189
Rotate the algorithm for remapping distinct nodes in order to simplify
how uniquing cycles get resolved. This removes some of the recursion,
and, most importantly, exposes all uniquing cycles at the top-level.
Besides being a little more efficient -- temporary MDNodes won't live as
long -- the clearer logic should help protect against bugs like those
fixed in r243961 and r243976.
What are uniquing cycles? Why do they present challenges when remapping
metadata?
!0 = !{!1}
!1 = !{!0}
!0 and !1 form a simple uniquing cycle. When remapping from one
metadata graph to another, every uniquing cycle gets "duplicated"
through a dance:
!0-temp = !{!1?} ; map(!0): clone !0, VM[!0] = !0-temp
!1-temp = !{!0?} ; ..map(!1): clone !1, VM[!1] = !1-temp
!1-temp = !{!0-temp} ; ..map(!1): remap !1's operands
!2 = !{!0-temp} ; ..map(!1): uniquify: !1-temp => !2
!0-temp = !{!2} ; map(!0): remap !0's operands
!3 = !{!2} ; map(!0): uniquify: !0-temp => !3
; Result
!2 = !{!3}
!3 = !{!2}
(In the two "uniquify" steps above, the operands of !X-temp are compared
to the operands of !X. If they're the same, then !X-temp gets RAUW'ed
to !X; if they're different, then !X-temp is promoted to a new unique
node. The latter case always hits in for uniquing cycles, so we
duplicate all the nodes involved.)
Why is this a problem? Uniquable Metadata nodes that have temporary
node as transitive operands keep RAUW support until the temporary nodes
get finalized. With non-cycles, this happens automatically: when a
uniquable node's count of unresolved operands drops to zero, it
immediately sheds its own RAUW support (possibly triggering the same in
any node that references it). However, uniquing cycles create a
reference cycle, and uniqued nodes that transitively reference a
uniquing cycle are "stuck" in an unresolved state until someone calls
`MDNode::resolveCycles()` on a node in the unresolved subgraph.
Distinct nodes should help here (and mostly do): since they aren't
uniqued anywhere, they are guaranteed not to be RAUW'ed. They
effectively form a barrier between uniqued nodes, breaking some uniquing
cycles, and shielding uniqued nodes from uniquing cycles.
Unfortunately, with this barrier in place, the unresolved subgraph(s)
can be disjoint from the top-level node. The mapping algorithm needs to
find at least one representative from each disjoint subgraph. But which
nodes are *stuck*, and which will get resolved automatically? And which
nodes are in the unresolved subgraph? The old logic was conservative.
This commit rotates the logic for distinct nodes, so that we have access
to unresolved nodes at the top-level call to `llvm::MapMetadata()`.
Each time we return to the top-level, we know that all temporaries have
been RAUW'ed away. Here, it's safe (and necessary) to call
`resolveCycles()` immediately on unresolved operands.
This should also perform better than the old algorithm. The recursion
stack is shorter, temporary nodes don't live as long, and there are
fewer tracking references to unresolved nodes. As the debug info graph
introduces more 'distinct' nodes, remapping should incrementally get
cheaper and cheaper.
Aside from possible performance improvements (and reduced cruft in the
`LLVMContext`), there should be no functionality change here.
llvm-svn: 244181
Rename `remap()` to `remapOperands()`, and restrict its contract to
remapping operands. Previously, it also called `mapToMetadata()`, but
this logic is hard to reason about externally. In particular, this
refactors `mapUniquedNode()` to avoid redundant mapping calls, taking
advantage of the RAUWs that are already in place.
llvm-svn: 244168
The only place that tries to return a CallGraph by value
(CallGraphAnalysis::run) doesn't seem to be used right now, but it's a
reasonable bit of cleanup anyway.
llvm-svn: 244122
Create wrapper methods in the Function class for the OptimizeForSize and MinSize
attributes. We want to hide the logic of "or'ing" them together when optimizing
just for size (-Os).
Currently, we are not consistent about this and rely on a front-end to always set
OptimizeForSize (-Os) if MinSize (-Oz) is on. Thus, there are 18 FIXME changes here
that should be added as follow-on patches with regression tests.
This patch is NFC-intended: it just replaces existing direct accesses of the attributes
by the equivalent wrapper call.
Differential Revision: http://reviews.llvm.org/D11734
llvm-svn: 243994
r243883 and r243961 made a use-after-free far more likely:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/6041/steps/check-llvm%20asan/logs/stdio
Unresolved nodes get inserted into the `Cycles` array. If they later
get resolved through RAUW, we need to update the reference. It's
interesting that this never hit before (maybe an asan-ified clang
bootstrap with `-flto -g` would have hit it, but I admit I haven't tried
anything quite that crazy).
llvm-svn: 243976
This change was done as an audit and is by inspection. The new EH
system is still very much a work in progress. NFC for the landingpad
case.
llvm-svn: 243965
r243883 started moving 'distinct' nodes instead of duplicated them in
lib/Linker. This had the side-effect of sometimes not cloning uniqued
nodes that reference them. I missed a corner case:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{!0}
!0 is the entry point for "remapping", and a temporary clone (say,
!0-temp) is created and mapped in case we need to model a uniquing
cycle.
Recursive descent into !1. !1 is distinct, so we leave it alone,
but update its operand to !0-temp.
Pop back out to !0. Its only operand, !1, hasn't changed, so we don't
need to use !0-temp. !0-temp goes out of scope, and we're finished
remapping, but we're left with:
!named = !{!0}
!0 = !{!1}
!1 = distinct !{null} ; uh oh...
Previously, if !0 and !0-temp ended up with identical operands, then
!0-temp couldn't have been referenced at all. Now that distinct nodes
don't get duplicated, that assumption is invalid. We need to
!0-temp->replaceAllUsesWith(!0) before freeing !0-temp.
I found this while running an internal `-flto -g` bootstrap. Strangely,
there was no case of this in the open source bootstrap I'd done before
commit...
llvm-svn: 243961
through PHI nodes across iterations.
This patch teaches the new advanced loop unrolling heuristics to propagate
constants into the loop from the preheader and around the backedge after
simulating each iteration. This lets us brute force solve simple recurrances
that aren't modeled effectively by SCEV. It also makes it more clear why we
need to process the loop in-order rather than bottom-up which might otherwise
make much more sense (for example, for DCE).
This came out of an attempt I'm making to develop a principled way to account
for dead code in the unroll estimation. When I implemented
a forward-propagating version of that it produced incorrect results due to
failing to propagate *cost* between loop iterations through the PHI nodes, and
it occured to me we really should at least propagate simplifications across
those edges, and it is quite easy thanks to the loop being in canonical and
LCSSA form.
Differential Revision: http://reviews.llvm.org/D11706
llvm-svn: 243900
Instead of cloning distinct `MDNode`s when linking in a module, just
move them over. The module linker destroys the source module, so the
old node would otherwise just be leaked on the context. Create the new
node in place. This also reduces the number of cloned uniqued nodes
(since it's less likely their operands have changed).
This mapping strategy is only correct when we're discarding the source,
so the linker turns it on via a ValueMapper flag, `RF_MoveDistinctMDs`.
There's nothing observable in terms of `llvm-link` output here: the
linked module should be semantically identical.
I'll be adding more 'distinct' nodes to the debug info metadata graph in
order to break uniquing cycles, so the benefits of this will partly come
in future commits. However, we should get some gains immediately, since
we have a fair number of 'distinct' `DILocation`s being linked in.
llvm-svn: 243883
This is a minor optimization to only check for unresolved operands
inside `mapDistinctNode()` if the operands have actually changed. This
shouldn't really cause any change in behaviour. I didn't actually see a
slowdown in a profile, I was just poking around nearby and saw the
opportunity.
llvm-svn: 243866
This introduces new instructions neccessary to implement MSVC-compatible
exception handling support. Most of the middle-end and none of the
back-end haven't been audited or updated to take them into account.
Differential Revision: http://reviews.llvm.org/D11097
llvm-svn: 243766
The patch changes the SLPVectorizer::vectorizeStores to choose the immediate
succeeding or preceding candidate for a store instruction when it has multiple
consecutive candidates. In this way it has better chance to find more slp
vectorization opportunities.
Differential Revision: http://reviews.llvm.org/D10445
llvm-svn: 243666
The reason I was passing this vector by value in the constructor so that
I wouldn't have to copy when initializing the corresponding member but
then I forgot the std::move.
The use-case is LoopDistribution which filters the checks then
std::moves it to LoopVersioning's constructor. With this interface we
can avoid any copies.
llvm-svn: 243616
Before, we were passing the pointer partitions to LAA. Now, we get all
the checks from LAA and filter out the checks within partitions in
LoopDistribution.
This effectively concludes the steps to move filtering memchecks from
LAA into its clients. There is still some cleanup left to remove the
unused interfaces in LAA that still take PtrPartition.
(Moving this functionality to LoopDistribution also requires
needsChecking on pointers to be made public.)
llvm-svn: 243613
Bonus change to remove emacs major mode marker from SystemZMachineFunctionInfo.cpp because emacs already knows it's C++ from the extension. Also fix typo "appeary" in AMDGPUMCAsmInfo.h.
llvm-svn: 243585
Summary:
returns_twice (most importantly, setjmp) functions are
optimization-hostile: if local variable is promoted to register, and is
changed between setjmp() and longjmp() calls, this update will be
undone. This is the reason why "man setjmp" advises to mark all these
locals as "volatile".
This can not be enough for ASan, though: when it replaces static alloca
with dynamic one, optionally called if UAR mode is enabled, it adds a
whole lot of SSA values, and computations of local variable addresses,
that can involve virtual registers, and cause unexpected behavior, when
these registers are restored from buffer saved in setjmp.
To fix this, just disable dynamic alloca and UAR tricks whenever we see
a returns_twice call in the function.
Reviewers: rnk
Subscribers: llvm-commits, kcc
Differential Revision: http://reviews.llvm.org/D11495
llvm-svn: 243561
ASan shadow on Android starts at address 0 for both historic and
performance reasons. This is possible because the platform mandates
-pie, which makes lower memory region always available.
This is not such a good idea on 64-bit platforms because of MAP_32BIT
incompatibility.
This patch changes Android/AArch64 mapping to be the same as that of
Linux/AAarch64.
llvm-svn: 243548
Summary:
As added initially, statepoints required their call targets to be a
constant pointer null if ``numPatchBytes`` was non-zero. This turns out
to be a problem ergonomically, since there is no way to mark patchable
statepoints as calling a (readable) symbolic value.
This change remove the restriction of requiring ``null`` call targets
for patchable statepoints, and changes PlaceSafepoints to maintain the
symbolic call target through its transformation.
Reviewers: reames, swaroop.sridhar
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11550
llvm-svn: 243502
Before the patch, the checks were generated internally in
addRuntimeCheck. Now, we use the new overloaded version of
addRuntimeCheck that takes the ready-made set of checks as a parameter.
The checks are now generated by the client (LoopDistribution) with the
new RuntimePointerChecking::generateChecks API.
Also the new printChecks API is used to print out the checks for
debugging.
This is to continue the transition over to the new model whereby clients
will get the full set of checks from LAA, filter it and then pass it to
LoopVersioning and in turn to addRuntimeCheck.
llvm-svn: 243382
Summary:
If a scale or a base register can be rewritten as "Zext({A,+,1})" then
LSR will now consider a formula of that form in its normal cost
computation.
Depends on D9180
Reviewers: qcolombet, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9181
llvm-svn: 243348
Summary:
Was D9784: "Remove loop variant range check when induction variable is
strictly increasing"
This change re-implements D9784 with the two differences:
1. It does not use SCEVExpander and does not generate new
instructions. Instead, it does a quick local search for existing
`llvm::Value`s that it needs when modifying the `icmp`
instruction.
2. It is more general -- it deals with both increasing and decreasing
induction variables.
I've added all of the tests included with D9784, and two more.
As an example on what this change does (copied from D9784):
Given C code:
```
for (int i = M; i < N; i++) // i is known not to overflow
if (i < 0) break;
a[i] = 0;
}
```
This transformation produces:
```
for (int i = M; i < N; i++)
if (M < 0) break;
a[i] = 0;
}
```
Which can be unswitched into:
```
if (!(M < 0))
for (int i = M; i < N; i++)
a[i] = 0;
}
```
I went back and forth on whether the top level logic should live in
`SimplifyIndvar::eliminateIVComparison` or be put into its own
routine. Right now I've put it under `eliminateIVComparison` because
even though the `icmp` is not *eliminated*, it no longer is an IV
comparison. I'm open to putting it in its own helper routine if you
think that is better.
Reviewers: reames, nicholas, atrick
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11278
llvm-svn: 243331
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.
Differential Revision: http://reviews.llvm.org/D11503
llvm-svn: 243303
This reverts commit r243167.
Duncan pointed out that dyn_cast can return null in these cases, so this
was an unsafe commit to make. Sorry for the noise.
Worryingly there were no tests which fail...
llvm-svn: 243302
r243250 appeared to break clang/test/Analysis/dead-store.c on one of the build
slaves, but I couldn't reproduce this failure locally. Probably a false
positive as I saw this test was broken by r243246 or r243247 too but passed
later without people fixing anything.
llvm-svn: 243253
Summary:
This patch updates TargetTransformInfoImplCRTPBase::getGEPCost to consider
addressing modes. It now returns TCC_Free when the GEP can be completely folded
to an addresing mode.
I started this patch as I refactored SLSR. Function isGEPFoldable looks common
and is indeed used by some WIP of mine. So I extracted that logic to getGEPCost.
Furthermore, I noticed getGEPCost wasn't directly tested anywhere. The best
testing bed seems CostModel, but its getInstructionCost method invokes
getAddressComputationCost for GEPs which provides very coarse estimation. So
this patch also makes getInstructionCost call the updated getGEPCost for GEPs.
This change inevitably breaks some tests because the cost model changes, but
nothing looks seriously wrong -- if we believe the new cost model is the right
way to go, these tests should be updated.
This patch is not perfect yet -- the comments in some tests need to be updated.
I want to know whether this is a right approach before fixing those details.
Reviewers: chandlerc, hfinkel
Subscribers: aschwaighofer, llvm-commits, aemerson
Differential Revision: http://reviews.llvm.org/D9819
llvm-svn: 243250
Summary:
This patch improves trivial loop unswitch.
The current trivial loop unswitch only checks if loop header's terminator contains a trivial unswitch condition. But if the loop header only has one reachable successor (due to intentionally or unintentionally missed code simplification), we should consider the successor as part of the loop header. Therefore, instead of stopping at loop header's terminator, we should keep traversing its successors within loop until reach a *real* conditional branch or switch (whose condition can not be constant folded). This change will enable a single -loop-unswitch pass to unswitch multiple trivial conditions (unswitch one trivial condition could open opportunity to unswitch another one in the same loop), while the old implementation can unswitch only one per pass.
Reviewers: reames, broune
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11481
llvm-svn: 243203
This patch extend LoopReroll pass to hand the loops which
is similar to the following:
while (len > 1) {
sum4 += buf[len];
sum4 += buf[len-1];
len -= 2;
}
llvm-svn: 243171
Since both places which set this variable do so with dyn_cast, and not
dyn_cast_or_null, its impossible to get a nullptr here, so we can remove
the check.
llvm-svn: 243167
Instead of the pattern
for (auto I = x.rbegin(), E = x.end(); I != E; ++I)
we can use make_range to construct the reverse range and iterate using
that instead.
llvm-svn: 243163
Summary:
This threshold limited FunctionAttrs ability to prove arguments to be read-only.
In NVPTX, a specialized instruction ld.global.nc can be used to load memory
with non-coherent texture cache. We notice that in SHOC [1] benchmark, some
function arguments are not marked with readonly because FunctionAttrs reaches
a hardcoded threshold when analysis uses.
Removing this threshold won't cause significant regression in compilation time, because the worst-case time complexity of the algorithm is still O(# of instructions) for each parameter.
Patched by Xuetian Weng.
[1] https://github.com/vetter/shoc
Reviewers: nlewycky, jingyue, nicholas
Subscribers: nicholas, test, llvm-commits
Differential Revision: http://reviews.llvm.org/D11311
llvm-svn: 243141
The names for instructions inserted were previous dependent on iteration order. By deriving the names from the original instructions, we can avoid instability in tests without resorting to ordered traversals. It also makes the IR mildly easier to read at large scale.
llvm-svn: 243140
We had a few places where we did
for (unsigned i = 0, e = STy->getNumElements(); i != e; ++i) {
but those could instead do
for (auto *EltTy : STy->elements()) {
llvm-svn: 243136
Summary:
Resolving a branch allows us to ignore blocks that won't be executed, and thus make our estimate more accurate.
This patch is intended to be applied after D10205 (though it could be applied independently).
Reviewers: chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10206
llvm-svn: 243084
The new code should hopefully be equivalent to the old code; it just uses a worklist to track instructions which need to visited rather than iterating over all instructions visited each time. This should be faster, but the primary benefit is that the purpose should be more clear and the diff of adding another instruction type (forthcoming) much more obvious.
Differential Revision: http://reviews.llvm.org/D11480
llvm-svn: 243071
Deleting much of the code using trace-rewrite-statepoints and use idiomatic DEBUG statements instead. This includes adding operator<< to a helper class.
llvm-svn: 243054
We don't need to pass in the map from BDV to PhiStates; we can instead handle that externally and let the MeetPhiStates helper class just meet PhiStates.
llvm-svn: 243045
Summary:
Scalarizer has two data structures that hold information about changes
to the function, Gathered and Scattered. These are cleared in finish()
at the end of runOnFunction() if finish() detects any changes to the
function.
However, finish() was checking for changes by only checking if
Gathered was non-empty. The function visitStore() only modifies
Scattered without touching Gathered. As a result, Scattered could have
ended up having stale data if Scalarizer only scalarized store
instructions. Since the data in Scattered is used during the execution
of the pass, this introduced dangling pointer errors.
The fix is to check whether both Scattered and Gathered are empty
before deciding what to do in finish(). This also fixes a problem
where the Function can be modified although the pass returns false.
Reviewers: rnk
Subscribers: rnk, srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D10459
llvm-svn: 243040
We currently version `__asan_init` and when the ABI version doesn't match, the linker gives a `undefined reference to '__asan_init_v5'` message. From this, it might not be obvious that it's actually a version mismatch error. This patch makes the error message much clearer by changing the name of the undefined symbol to be `__asan_version_mismatch_check_xxx` (followed by the version string). We obviously don't want the initializer to be named like that, so it's a separate symbol that is used only for the purpose of version checking.
Reviewed at http://reviews.llvm.org/D11004
llvm-svn: 243003
the general GMR-in-non-LTO flag.
Without this, we have the global information during the CGSCC pipeline
for GVN and such, but don't have it available during the late loop
optimizations such as the vectorizer. Moreover, after the CGSCC pipeline
has finished we have substantially more accurate and refined call graph
information, function annotations, etc, which will make GMR even more
powerful than it is early in the pipelien.
Note that we have to play silly games with preserving AliasAnalysis
(which is now trivially preserved) in order to let a module analysis
magically be preserved into the entire function pass pipeline.
Simultaneously we have to not make GMR an immutable pass in order to be
able to re-run it and collect fresh data on the final call graph.
llvm-svn: 242999
preparation for de-coupling the AA implementations.
In order to do this, they had to become fake-scoped using the
traditional LLVM pattern of a leading initialism. These can't be actual
scoped enumerations because they're bitfields and thus inherently we use
them as integers.
I've also renamed the behavior enums that are specific to reasoning
about the mod/ref behavior of functions when called. This makes it more
clear that they have a very narrow domain of applicability.
I think there is a significantly cleaner API for all of this, but
I don't want to try to do really substantive changes for now, I just
want to refactor the things away from analysis groups so I'm preserving
the exact original design and just cleaning up the names, style, and
lifting out of the class.
Differential Revision: http://reviews.llvm.org/D10564
llvm-svn: 242963