Extend signed relational comparison instrumentation with a special
case for comparisons with -1. This fixes an MSan false positive when
such a comparison is used as a sign bit test.
https://llvm.org/bugs/show_bug.cgi?id=24561
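For illustration, a minimal IR sketch of the kind of pattern involved
(the function and value names are invented): a signed greater-than
comparison against -1 depends only on the sign bit of its operand.

  define i1 @is_nonnegative(i32 %x) {
    ; Only the sign bit of %x decides the result, so MSan should only
    ; report if the shadow of that bit is uninitialized.
    %cmp = icmp sgt i32 %x, -1
    ret i1 %cmp
  }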
llvm-svn: 245980
The minimum-iterations check below ensures the loop has a large enough trip
count that the generated vector loop will likely be executed, and it also
covers the overflow check.
Differential Revision: http://reviews.llvm.org/D12107.
llvm-svn: 245952
Eventually, we will need sample profiles to be incorporated into the
inliner's cost models. To do this, we need the sample profile pass to
be a module pass.
This patch makes no functional changes beyond the mechanical adjustments
needed to run SampleProfile as a module pass.
llvm-svn: 245940
It doesn't solve the problem where, for example, we load something and
then assume that it equals some constant value, because globalopt will
fail on the unknown load instruction. The proposed solution is to skip
instructions that we can't evaluate but that are safe to skip (e.g.
load, assume and many others) and then check whether they are required
to perform the optimization (e.g. we don't care about ephemeral
instructions that may appear via @llvm.assume()).
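A minimal IR sketch of the problematic pattern (global and value names
are invented for illustration):

  @g = global i32 0

  declare void @llvm.assume(i1)

  define void @init() {
    ; The evaluator bails out on the load below, even though the assume
    ; only states that the loaded value equals a constant.
    %v = load i32, i32* @g
    %c = icmp eq i32 %v, 42
    call void @llvm.assume(i1 %c)
    ret void
  }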
http://reviews.llvm.org/D12266
llvm-svn: 245919
This patch adds support for dfsan on aarch64-linux with a 42-bit VMA
(the current default config for 64K pagesize kernels). The support is
enabled by defining SANITIZER_AARCH64_VMA to 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.
llvm-svn: 245840
TL;DR: SROA is followed by EarlyCSE, which requires the DominatorTree.
There is no reason not to require it up front for SROA.
Some history is necessary to understand why we ended up here.
r123437 switched the second (Legacy)SROA in the optimizer pipeline to
use SSAUpdater in order to avoid recomputing the costly
DominanceFrontier. The purpose was to speed up compile time.
Later r123609 removed the need for the DominanceFrontier in
(Legacy)SROA.
Right after, some cleanup was made in r123724 to remove any reference
to the DominanceFrontier. SROA existed in two flavors: SROA_SSAUp and
SROA_DT (the latter replacing SROA_DF).
The second argument of `createScalarReplAggregatesPass` was renamed
from `UseDomFrontier` to `UseDomTree`.
I believe this is where a mistake was made. The pipeline was not
updated and the call site was still:
PM->add(createScalarReplAggregatesPass(-1, false));
At that time, SROA was immediately followed in the pipeline by
EarlyCSE, which already required the DominatorTree. Not requiring
the DominatorTree in SROA didn't save anything, but unfortunately
the requirement was lost at this point.
When the new SROA Pass was introduced in r163965, I believe the goal
was to have an exact replacement of the existing SROA, and this bug
slipped through.
You can see currently:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
SROA
Dominator Tree Construction
Early CSE
After this patch:
$ echo "" | clang -x c++ -O3 -c - -mllvm -debug-pass=Structure
...
...
FunctionPass Manager
Dominator Tree Construction
SROA
Early CSE
This improves the compile time from 88s to 23s for PR17855.
https://llvm.org/bugs/show_bug.cgi?id=17855
And from 113s to 12s for PR16756
https://llvm.org/bugs/show_bug.cgi?id=16756
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D12267
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245820
Summary:
WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.
The catchpad argument of a `CatchReturnInst` must be a `CatchPadInst` (and
similarly for `CleanupReturnInst`/`CleanupPadInst`). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.
Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.
Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.
Reviewers: rnk, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12108
llvm-svn: 245797
Summary:
Merge functions previously relied on unsigned comparisons of pointer values to
order functions. This caused observable non-determinism in the compiler for
large bitcode programs. Basically, opt -mergefuncs program.bc | md5sum produces
different hashes when run repeatedly on the same machine. Differing output was
observed on three large bitcodes, but it was less frequent on the smallest file.
It is possible that this only manifests on the large inputs, hence remaining
undetected until now.
This patch fixes this by removing (almost, see below) all places where
comparisons between pointers are used to order functions. Most of these changes
are local, but the comparison of global values requires assigning an identifier
to each local in the order it is visited. This is very similar to the way the
comparison function identifies Value*'s defined within a function. Because the
order of visiting the functions and their subparts is deterministic, the
identifiers assigned to the globals will be as well, and the order of functions
will be deterministic.
With these changes, there is no more observed non-determinism. There are also
only minor slowdowns (negligible to 4%) compared to the baseline, which is
likely a result of the fact that global comparisons involve hash lookups and not
just pointer comparisons.
The one caveat so far is that programs containing BlockAddress constants can
still be non-deterministic. It is not clear what the right solution is here. In
particular, even if the global numbers are used to order by function, we still
need a way to order the BasicBlock*'s. Unfortunately, we cannot just bail out
and fail to order the functions or consider them equal, because we require a
total order over functions. Note that programs with BlockAddress constants are
relatively rare, so the impact of leaving this in is minor as long as this pass
is opt-in.
Author: jrkoenig
Reviewers: nlewycky, jfb, dschuff
Subscribers: jevinskie, llvm-commits, chapuni
Differential revision: http://reviews.llvm.org/D12168
llvm-svn: 245762
The original checkin was buggy; this change has a fix.
Original commit message:
[InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potential payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
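A minimal IR sketch of the transform (names invented):

  define i1 @range_check(i32 %i, i32 %len) {
    %lenm1 = sub i32 %len, 1
    %idx = and i32 %i, %lenm1
    ; If %len != 0 then %idx <= %len - 1 < %len; if %len == 0 the mask
    ; is all ones and nothing is u< 0. So this folds to:
    ;   %inbounds = icmp ne i32 %len, 0
    %inbounds = icmp ult i32 %idx, %len
    ret i1 %inbounds
  }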
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245753
Gets a bit tricky in the ValueMapper, of course - not sure if we should
just expose a list of explicit types for each Value so that the
ValueMapper can be neutral to these special cases (it's OK for things
like load, where the explicit type is the result type - but when that's
not the case, it means plumbing through another "special" type... )
llvm-svn: 245728
The module splitter splits a module into linkable partitions. It will
be used to implement parallel LTO code generation.
This initial version of the splitter does not attempt to deal with the
somewhat subtle symbol visibility issues around module splitting. These
will be dealt with in a future change.
Differential Revision: http://reviews.llvm.org/D12132
llvm-svn: 245662
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potential payoffs:
1. It may make the `icmp` fold away or become loop invariant.
2. It may make the `A & (L - 1)` computation dead.
This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
Reviewers: reames, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12210
llvm-svn: 245635
and make it always preserve debug locations, since all callers wanted this
behavior anyway.
This addresses post-commit review feedback for r245589.
NFC (inside the LLVM tree).
llvm-svn: 245622
This patch adds support for asan on aarch64-linux with a 42-bit VMA
(the current default config for 64K pagesize kernels). The support is
enabled by defining SANITIZER_AARCH64_VMA to 42 at build time
for both clang/llvm and compiler-rt. The default VMA is 39 bits.
llvm-svn: 245594
Summary:
Refactor, NFC
Extracts computeOverflowForSignedAdd and isKnownNonNegative from
NaryReassociate into ValueTracking in case others need them.
Reviewers: reames
Subscribers: majnemer, llvm-commits
Differential Revision: http://reviews.llvm.org/D11313
llvm-svn: 245591
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.
Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnownIDs.
This is now made explicit.
llvm-svn: 245589
Caught by the famous "DebugLoc describes the correct SubProgram" assertion.
When GVN removes a nonlocal load, it updates the debug location of the
SSA value that replaces the load with the debug location of the load. In the
testcase this actually overwrites a valid debug location with an empty one.
In reality GVN has to make an arbitrary choice between two equally valid
debug locations. This patch changes the behavior to only update the
location if the value doesn't already have a debug location.
llvm-svn: 245588
Since Ashutosh made findDefsUsedOutsideOfLoop public, we can clean this
up.
Now clients that don't compute DefsUsedOutsideOfLoop can just call
versionLoop() and computing DefsUsedOutsideOfLoop will happen
implicitly. With that there is no reason to expose addPHINodes anymore.
Ashutosh, you can now drop the calls to findDefsUsedOutsideOfLoop and
addPHINodes in LVerLICM and things should just work.
llvm-svn: 245579
Summary: We know that -x & 1 is equivalent to x & 1, so avoid using negation when testing whether a negative integer is even or odd.
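A minimal IR sketch (names invented):

  define i1 @is_odd(i32 %x) {
    %neg = sub i32 0, %x
    ; The low bit of -%x equals the low bit of %x, so this can be
    ; simplified to: and i32 %x, 1
    %bit = and i32 %neg, 1
    %odd = icmp ne i32 %bit, 0
    ret i1 %odd
  }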
Reviewers: majnemer
Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D12156
llvm-svn: 245569
Usually DSE is not supposed to remove lifetime intrinsics, but it's
actually OK to remove them for dead objects in terminating blocks,
because they convey no extra information there. That holds until we hit
a lifetime start that cannot be removed, because from that point on the
lifetime intrinsics become interesting again, e.g. for stack coloring.
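A minimal IR sketch of the situation (names invented, using the
intrinsic signatures of this era): %buf is dead in this terminating
block, so the markers add no information.

  declare void @llvm.lifetime.start(i64, i8* nocapture)
  declare void @llvm.lifetime.end(i64, i8* nocapture)

  define void @f() {
    %buf = alloca [16 x i8]
    %p = bitcast [16 x i8]* %buf to i8*
    call void @llvm.lifetime.start(i64 16, i8* %p)
    store i8 0, i8* %p
    call void @llvm.lifetime.end(i64 16, i8* %p)
    ret void
  }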
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11710
llvm-svn: 245542
analyses into LLVM's Analysis library rather than having them in
a Transforms library.
This is motivated by the need to have the core AliasAnalysis
infrastructure be aware of the ObjCARCAliasAnalysis. However, it also
seems like a nice and clean separation. Everything was very easy to move
and this doesn't create much clutter in the analysis library IMO.
Differential Revision: http://reviews.llvm.org/D12133
llvm-svn: 245541
folding the code into the main Analysis library.
There already wasn't much of a distinction between Analysis and IPA.
A number of the passes in Analysis are actually IPA passes, and there
doesn't seem to be any advantage to separating them.
Moreover, it makes it hard to have interactions between analyses that
are both local and interprocedural. In trying to make the Alias Analysis
infrastructure work with the new pass manager, it becomes particularly
awkward to navigate this split.
I've tried to find all the places where we referenced this, but I may
have missed some. I have also adjusted the C API to continue to be
equivalently functional after this change.
Differential Revision: http://reviews.llvm.org/D12075
llvm-svn: 245318
After hitting @llvm.assume(X) we can:
- propagate the equality X == true
- if X is an icmp/fcmp with an eq predicate and one of its operands is a
constant, replace the variable with that constant throughout the same
BasicBlock (see the sketch below)
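A minimal IR sketch (names invented):

  declare void @llvm.assume(i1)

  define i32 @f(i32 %x) {
    %c = icmp eq i32 %x, 42
    call void @llvm.assume(i1 %c)
    ; Uses of %x below the assume in this block can be rewritten to use
    ; the constant, i.e. this becomes add i32 42, 1.
    %r = add i32 %x, 1
    ret i32 %r
  }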
http://reviews.llvm.org/D11918
llvm-svn: 245265
PR24469 resulted because DeleteDeadInstruction in handleNonLocalStoreDeletion
was invalidating the next basic block iterator. This is fixed by resetting the
basic block iterator after the call to DeleteDeadInstruction.
llvm-svn: 245195
This change makes ScalarEvolution a stand-alone object and just produces
one from a pass as needed. Making this work well requires making the
object movable, using references instead of overwritten pointers in
a number of places, and other refactorings.
I've also wired it up to the new pass manager and added a RUN line to
a test to exercise it under the new pass manager. This includes basic
printing support much like with other analyses.
But there is a big and somewhat scary change here. Prior to this patch
ScalarEvolution was never *actually* invalidated!!! Re-running the pass
just re-wired up the various other analyses and didn't remove any of the
existing entries in the SCEV caches or clear out anything at all. This
might seem OK, as everything in SCEV uses ValueHandles to track
updates to the values that serve as SCEV keys. However, this still means
that as we ran SCEV over each function in the module, we kept
accumulating more and more SCEVs into the cache. At the end, we would
have a SCEV cache with every value that we ever needed a SCEV for in the
entire module!!! Yowzers. The releaseMemory routine would dump all of
this, but that isn't really called during normal runs of the pipeline as
far as I can see.
To make matters worse, there *is* actually a key that we don't update
with value handles -- there is a map keyed off of Loop*s. Because
LoopInfo *does* release its memory from run to run, it is entirely
possible to run SCEV over one function, then over another function, and
then lookup a Loop* from the second function but find an entry inserted
for the first function! Ouch.
To make matters still worse, there are plenty of updates that *don't*
trip a value handle. It seems incredibly unlikely that today GVN or
another pass that invalidates SCEV can update values in *just* such
a way that a subsequent run of SCEV will incorrectly find lookups in
a cache, but it is theoretically possible and would be a nightmare to
debug.
With this refactoring, I've fixed all this by actually destroying and
recreating the ScalarEvolution object from run to run. Technically, this
could increase the amount of malloc traffic we see, but then again it is
also technically correct. ;] I don't actually think we're suffering from
tons of malloc traffic from SCEV because if we were, the fact that we
never clear the memory would seem more likely to have come up as an
actual problem before now. So, I've made the simple fix here. If in fact
there are serious issues with too much allocation and deallocation,
I can work on a clever fix that preserves the allocations (while
clearing the data) between each run, but I'd prefer to do that kind of
optimization with a test case / benchmark that shows why we need such
cleverness (and that can test that we actually make it faster). It's
possible that this will make some things faster by making the SCEV
caches have higher locality (due to being significantly smaller) so
until there is a clear benchmark, I think the simple change is best.
Differential Revision: http://reviews.llvm.org/D12063
llvm-svn: 245193
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).
This IR should then be optimized in the backend to whatever is best for
any given target. E.g., x86 can use minss/maxss instructions.
This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314
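A minimal IR sketch of the rewrite (names invented; assuming no-NaNs
fast-math):

  declare float @fminf(float, float)

  define float @my_min(float %a, float %b) {
    ; The libcall below can become:
    ;   %cmp = fcmp olt float %a, %b
    ;   %min = select i1 %cmp, float %a, float %b
    %min = call float @fminf(float %a, float %b)
    ret float %min
  }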
Differential Revision: http://reviews.llvm.org/D11866
llvm-svn: 245187
Bitwise arithmetic can obscure a simple sign test. Replacing the
mask with a truncate is preferable if the type is legal, because it
permits us to rephrase the comparison more explicitly.
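A minimal IR sketch of the kind of rewrite described (names invented;
the exact pattern handled may differ): a mask that happens to be the
sign bit of a smaller legal type is really a sign test of the truncated
value.

  define i1 @bit7_set(i32 %x) {
    ; Equivalent, and more explicit about the intent:
    ;   %t = trunc i32 %x to i8
    ;   %r = icmp slt i8 %t, 0
    %m = and i32 %x, 128
    %r = icmp ne i32 %m, 0
    ret i1 %r
  }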
llvm-svn: 245171