There were several SelectInst combines that always returned an existing
instruction instead of modifying an old one or creating a new one.
These are prime candidates for moving to InstSimplify.
llvm-svn: 239229
Summary:
canUnrollCompletely takes `unsigned` values for `UnrolledCost` and
`RolledDynamicCost` but is passed in `uint64_t`s that are silently
truncated. Because of this, when `UnrolledSize` is a large integer
that has a small remainder with UINT32_MAX, LLVM tries to completely
unroll loops with high trip counts.
Reviewers: mzolotukhin, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10293
llvm-svn: 239218
CVP wants to analyze the condition operand of a select along an edge.
It succeeds in getting back a Constant but not a ConstantInt. Instead,
it gets a ConstantExpr. It then assumes that the Constant must be equal
to false because it isn't equal to true.
Instead, perform an additional comparison.
This fixes PR23752.
llvm-svn: 239217
If we have (select a, b, c), it is sometimes valid to simplify this to a
single select operand. However, doing so is only valid if the
computation doesn't inject poison into the computation.
It might be helpful to consider the following example:
(select (icmp ne %i, INT_MAX), (add nsw %i, 1), INT_MIN)
The select is equivalent to (add %i, 1) but not (add nsw %i, 1).
Self hosting on x86_64 revealed that this occurs very, very rarely so
bailing out is hopefully pretty reasonable.
llvm-svn: 239215
This reverts commit r239141. This commit was an attempt to reintroduce
a previous patch that broke many self-hosting bots with clang timeouts,
but it still has slowdown issues, at least on ARM, increasing the
compilation time (stage 2, clang's) by 5x.
llvm-svn: 239175
This change is NFC because both the ``break;`` and the fall through end
up returning immediately. However, this helps clarify intent and also
ensures correctness in case more ``case`` blocks are added later.
llvm-svn: 239172
The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:
- '*Threshold' is the threshold for full unrolling. It is measured
against the estimated unrolled cost as computed by getUserCost in TTI
(or CodeMetrics, etc). We will exceed this threshold when unrolling
loops where unrolling exposes a significant degree of simplification
of the logic within the loop.
- '*PercentDynamicCostSavedThreshold' is the percentage of the loop's
estimated dynamic execution cost which needs to be saved by unrolling
to apply a discount to the estimated unrolled cost.
- '*DynamicCostSavingsDiscount' is the discount applied to the estimated
unrolling cost when the dynamic savings are expected to be high.
When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
While we're still working to build up the infrastructure for making
these estimates, to me it is much more clear *how* to make them better
when they have reasonably descriptive names. For example, we may want to
apply estimated (from heuristics or profiles) dynamic execution weights
to the *dynamic* cost estimates. If we start doing that, we would also
need to track the static unrolled cost and the dynamic unrolled cost, as
only the latter could reasonably be weighted by profile information.
This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count off when comparing against the unroll heursistics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting of things, this is much easier to keep track of
and we use the post-simplification cost estimate to compare to the
thresholds, and use the dynamic cost reduction ratio to select whether
we can exceed the baseline threshold.
The old values of these flags also don't necessarily make sense. My
impression is that none of these thresholds or discounts have been tuned
yet, and so they're just arbitrary placehold numbers. As such, I've not
bothered to adjust for the fact that this is now a discount and not
a tow-tier threshold model. We need to tune all these values once the
logic is ready to be enabled.
Differential Revision: http://reviews.llvm.org/D9966
llvm-svn: 239164
isInductionPHI wants to calculate the stride based on the pointee size.
However, this is not possible when the pointee is zero sized.
This fixes PR23763.
llvm-svn: 239143
I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
it's simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
the value. One of the instructions is the original select, causing
LLVM to never reach fixpoint.
Instead, strip the flags only when we are sure we are going to perform
the simplification.
llvm-svn: 239141
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison. However, the other operand cannot
have flags.
This fixes PR23757.
llvm-svn: 239115
port it to the new pass manager.
All this does is extract the inner "location" class used by AA into its
own full fledged type. This seems *much* cleaner as MemoryDependence and
soon MemorySSA also use this heavily, and it doesn't make much sense
being inside the AA infrastructure.
This will also make it much easier to break apart the AA infrastructure
into something that stands on its own rather than using the analysis
group design.
There are a few places where this makes APIs not make sense -- they were
taking an AliasAnalysis pointer just to build locations. I'll try to
clean those up in follow-up commits.
Differential Revision: http://reviews.llvm.org/D10228
llvm-svn: 239003
Summary:
Once a gc.statepoint has been rewritten to relocate live references, the
SSA values represent physical pointers instead of logical references.
Logical dereferencability does not imply physical dereferencability and
after RewriteStatepointsForGC has run any attributes that imply
dereferencability of the logical references need to be stripped.
This current approach is conservative, and can be made more precise
later if needed. For starters, we need to strip dereferencable
attributes only from pointers that live in the GC address space.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10105
llvm-svn: 238883
Summary:
A later change that has RewriteStatepointsForGC change function
attributes throughout the module depends on this.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D10104
llvm-svn: 238882
Alternatively, this type could be derived on-demand whenever
getResultElementType is called - if someone thinks that's the better
choice (simple time/space tradeoff), I'm happy to give it a go.
llvm-svn: 238716
If the type isn't trivially moveable emplace can skip a potentially
expensive move. It also saves a couple of characters.
Call sites were found with the ASTMatcher + some semi-automated cleanup.
memberCallExpr(
argumentCountIs(1), callee(methodDecl(hasName("push_back"))),
on(hasType(recordDecl(has(namedDecl(hasName("emplace_back")))))),
hasArgument(0, bindTemporaryExpr(
hasType(recordDecl(hasNonTrivialDestructor())),
has(constructExpr()))),
unless(isInTemplateInstantiation()))
No functional change intended.
llvm-svn: 238602
The patch evaluates the expansion cost of exitValue in indVarSimplify pass, and only does the rewriting when the expansion cost is low or loop can be deleted with the rewriting. It provides an option "-replexitval=" to control the default aggressiveness of the exitvalue rewriting. It also fixes some missing cases in SCEVExpander::isHighCostExpansionHelper to enhance the evaluation of SCEV expansion cost.
Differential Revision: http://reviews.llvm.org/D9800
llvm-svn: 238507
Currently we only fold a BitCast into a Load when the BitCast is its
only user.
Do the same for any no-op cast.
Differential Revision: http://reviews.llvm.org/D9152
llvm-svn: 238452
Canonicalizing 'x [+-] (-Constant * y)' is not a win if we don't *know*
we will open up CSE opportunities.
If the multiply was 'nsw', then negating 'y' requires us to clear the
'nsw' flag. If this is actually worth pursuing, it is probably more
appropriate to do so in GVN or EarlyCSE.
This fixes PR23675.
llvm-svn: 238397
Summary:
This patch made two improvements to NaryReassociate and the NVPTX pipeline
1. Run EarlyCSE/GVN after NaryReassociate to get rid of redundant common
expressions.
2. When adding an instruction to SeenExprs, maps both the SCEV before and after
reassociation to that instruction.
Test Plan: updated @reassociate_gep_nsw in nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: dberlin, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9947
llvm-svn: 238396
This fixes a bit I forgot in r238335. In addition to the data record and
the counter, we can also move the name of the counter to the comdat for
the associated function.
I'm also adding an IR test case to check that these three elements are
placed in the proper comdat.
llvm-svn: 238351
Counter symbols created for linkonce functions are not discarded by ELF
linkers unless the symbols are placed in the same comdat section as its
associated function.
llvm-svn: 238335
Long ago, the poll insertion code assumed that the insertion site was a terminator. As a result, the entry selection code would split a basic block to ensure it could pass a terminator. The insertion code was updated quite a while ago - possibly before it ever landed upstream - but the now redundant work was never removed.
While I'm at it, remove a comment which doesn't apply to the upstreamed code.
NFC intended.
llvm-svn: 238254
While working on another change, I noticed that the naming in this function was mildly deceptive. While fixing that, I took the oppurtunity to modernize some of the code. NFC intended.
llvm-svn: 238252
Summary:
In case of functions that have a pointer argument and only pass it to
each other, the function attributes pass deduces that the pointer should
get the readnone attribute, but fails to remove a readonly attribute
that may already have been present.
Reviewers: nlewycky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9995
llvm-svn: 238152
lazily built.
Also, make it a much more generic SCEV cache, which today exposes only
a reduced GEP model description but could be extended in the future to
do other profitable caching of SCEV information.
llvm-svn: 238124
This patch extends EarlyCSE to take advantage of the information that a controlling branch gives us about the value of a Value within this and dominated basic blocks. If the current block has a single predecessor with a controlling branch, we can infer what the branch condition must have been to execute this block. The actual change to support this is downright simple because EarlyCSE's existing scoped hash table logic deals with most of the complexity around merging.
The patch actually implements two optimizations.
1) The first is analogous to JumpThreading in that it enables EarlyCSE's CSE handling to fold branches which are exactly redundant due to a previous branch to branches on constants. (It doesn't actually replace the branch or change the CFG.) This is pretty clearly a win since it enables substantial CFG simplification before we start trying to inline.
2) The second is analogous to CVP in that it exploits the knowledge gained to replace dominated *uses* of the original value. EarlyCSE does not otherwise reason about specific uses, so this is the more arguable one. It does enable further simplication and constant folding within the rest of the visit by EarlyCSE.
In both cases, the added code only handles the easy dominance based case of each optimization. The general case is deferred to the existing passes.
Differential Revision: http://reviews.llvm.org/D9763
llvm-svn: 238071
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.
We need to first make sure that we won't multiple by INT_SMAX + 1.
Test case `add_of_mul` contributed by Sanjoy Das.
This fixes PR23635.
Differential Revision: http://reviews.llvm.org/D9629
llvm-svn: 238066
accumulating estimated cost, and other loop-centric logic from the logic
used to analyze instructions in a particular iteration.
This makes the visitor very narrow in scope -- all it does is visit
instructions, update a map of simplified values, and return whether it
is able to optimize away a particular instruction.
The two cost metrics are now returned as an optional struct. When the
optional is left unengaged, there is no information about the unrolled
cost of the loop, when it is engaged the cost metrics are available to
run against the thresholds.
No functionality changed.
llvm-svn: 238033
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.
llvm-svn: 237995
simplified model for use simulating each iteration into a separate
helper function that just returns the cache.
Building this cache had nothing to do with the rest of the unroll
analysis and so this removes an unnecessary coupling, etc. It should
also make it easier to think about the concept of providing fast cached
access to basic SCEV models as an orthogonal concept to the overall
unroll simulation.
I'd really like to see this kind of caching logic folded into SCEV
itself, it seems weird for us to provide it at this layer rather than
making repeated queries into SCEV fast all on their own.
No functionality changed.
llvm-svn: 237993
a single location.
This reduces code duplication a bit and will also pave the way for
a better separation between the visitation algorithm and the unroll
analysis.
No functionality changed.
llvm-svn: 237990
PR23608 pointed out that using the preheader to gain a context instruction isn't always legal because a loop might not have a preheader. When looking into that, I realized that using the preheader to determine legality for sinking is questionable at best. Given no test covers that case and the original commit didn't seem to intend it, I restructured the code to only ask context sensative queries for hoising of loads and stores. This is effectively a partial revert of 237593.
llvm-svn: 237985
Summary:
x = &a[i];
y = &a[i + j];
=>
y = x + j;
along with some refactoring work such as extracting method
findClosestMatchingDominator.
Depends on D9786 which provides the ScalarEvolution::getGEPExpr interface.
Test Plan: nary-gep.ll
Reviewers: meheff, broune
Reviewed By: broune
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9802
llvm-svn: 237971
Correct assertion would be that there is no other uses from chain we are currently cloning. It is ok to have other uses of values not from this chain.
Differential Revision: http://reviews.llvm.org/D9882
llvm-svn: 237899
In effect a partial revert of r237858, which was a dumb shortcut.
Looking at the dependencies of the destination should be the proper
fix: if the new memset would depend on anything other than itself,
the transformation isn't correct.
llvm-svn: 237874
Fixes PR23599, another miscompile introduced by r235232: when there is
another dependency on the destination of the created memset (i.e., the
part of the original destination that the memcpy doesn't depend on)
between the memcpy and the original memset, we would insert the created
memset after the memcpy, and thus after the other dependency.
Instead, insert the created memset right after the old one.
llvm-svn: 237858
Make sure if we're truncating a constant that would then be sign extended
that the sign extension of the truncated constant is the same as the
original constant.
> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majenemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.
llvm-svn: 237821
Now that Intrinsic::ID is a typed enum, we can forward declare it and so return it from this method.
This updates all users which were either using an unsigned to store it, or had a now unnecessary cast.
llvm-svn: 237810
This change adds a new GC strategy for supporting the CoreCLR runtime.
This strategy is currently identical to Statepoint-example GC,
but is necessary for several upcoming changes specific to CoreCLR, such as:
1. Base-pointers not explicitly reported for interior pointers
2. Different format for stack-map encoding
3. Location of Safe-point polls: polls are only needed before loop-back edges and before tail-calls (not needed at function-entry)
4. Runtime specific handshake between calls to managed/unmanaged functions.
llvm-svn: 237753
We were special casing a handful of intrinsics as not needing a safepoint before them. After running into another valid case - memset - I took a closer look and realized that almost no intrinsics need to have a safepoint poll before them. Restructure the code to make that apparent so that we stop hitting these bugs. The only intrinsics which need a safepoint poll before them are ones which can run arbitrary code.
llvm-svn: 237744
InstructionCombiningPass was added after LoopUnrollPass in r237395. Because
InstructionCombiningPass is strictly more powerful than InstructionSimplifierPass,
remove the unnecessary InstructionSimplifierPass.
Differential Revision: http://reviews.llvm.org/D9838
llvm-svn: 237702
Summary: When PlaceSafepoints pass replaces old return result with gc_result from statepoint, it asserts that gc_result can not have preceding phis in its parent block. This is only true on invoke statepoint, which terminates the block and puts its result at the beginning of the normal successor block. Call statepoint does not terminate the block and thus its result is in the same block with it. There should be no restriction on whether there are phis or not.
Reviewers: reames, igor-laevsky
Reviewed By: igor-laevsky
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9803
llvm-svn: 237597
Summary:
Allow hoisting of loads from values marked with dereferenceable_or_null
attribute. For values marked with the attribute perform
context-sensitive analysis to determine whether it's known-non-null or
not.
Patch by Artur Pilipenko!
Reviewers: hfinkel, sanjoy, reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9253
llvm-svn: 237593
Summary:
This allows other passes (such as SLSR) to compute the SCEV expression for an
imaginary GEP.
Test Plan: no regression
Reviewers: atrick, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9786
llvm-svn: 237589
Don't replace a phi with an identical phi. This was done long ago to
"preserve" IVUsers analysis. The code has already called
SE->forgetValue(PN) so I see no purpose in creating a new value for
the phi.
llvm-svn: 237587
SimplifyDemandedBits was "simplifying" a constant by removing just sign bits.
This caused a canonicalization race between different parts of instcombine.
Fix and regression test added - third time lucky?
llvm-svn: 237539
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimpliftDemandedBits, but that's going to require another code review
so reverting in the meantime.
llvm-svn: 237528
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:
Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
%1 = icmp slt i32 %a, i32 0
%2 = sext i32 %a to i64
%3 = select i1 %1, i64 %2, i64 0
Would now be canonicalized into:
%1 = icmp slt i32 %a, i32 0
%2 = select i1 %1, i32 %a, i32 0
%3 = sext i32 %2 to i64
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
llvm-svn: 237520
There's no point in copying around constants, so, when all else fails,
we can still transform memcpy of memset into two independent memsets.
To quote the example, we can turn:
memset(dst1, c, dst1_size);
memcpy(dst2, dst1, dst2_size);
into:
memset(dst1, c, dst1_size);
memset(dst2, c, dst2_size);
When dst2_size <= dst1_size.
Like r235232 for copy constructors, this can occur in move constructors.
Differential Revision: http://reviews.llvm.org/D9682
llvm-svn: 237506
Summary:
This is a pass for speculative execution of instructions for simple if-then (triangle) control flow. It's aimed at GPUs, but could perhaps be used in other contexts. Enabling this pass gives us a 1.0% geomean improvement on Google benchmark suites, with one benchmark improving 33%.
Credit goes to Jingyue Wu for writing an earlier version of this pass.
Patched by Bjarke Roune.
Test Plan:
This patch adds a set of tests in test/Transforms/SpeculativeExecution/spec.ll
The pass is controlled by a flag which defaults to having the pass not run.
Reviewers: eliben, dberlin, meheff, jingyue, hfinkel
Reviewed By: jingyue, hfinkel
Subscribers: majnemer, jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9360
llvm-svn: 237459
This reverts r237453 - it was causing timeouts on some bots. Reverting
while I investigate (it's probably InstCombine fighting itself...)
llvm-svn: 237458
Summary:
Consider (B | i) * S as (B + i) * S if B and i have no bits set in
common.
Test Plan: @or in slsr-mul.ll
Reviewers: broune, meheff
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9788
llvm-svn: 237456
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:
%1 = icmp slt i32 %a, i32 0
%2 = sext i32 %a to i64
%3 = select i1 %1, i64 %2, i64 0
Would now be canonicalized into:
%1 = icmp slt i32 %a, i32 0
%2 = select i1 %1, i32 %a, i32 0
%3 = sext i32 %2 to i64
This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.
Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.
llvm-svn: 237453
Transfer the calling convention from the invoke being replaced by
PlaceStatepoints to the new invoke to gc.statepoint created. Add a test
case that would have caught this issue.
llvm-svn: 237414
rL236672 would generate all invoke statepoints with deopt args set to a
list containing the single element "0", instead of an empty list.
Also add a test case that would have caught this.
llvm-svn: 237413
Summary:
Extract method haveNoCommonBitsSet so that we don't have to duplicate this logic in
InstCombine and SeparateConstOffsetFromGEP.
This patch also makes SeparateConstOffsetFromGEP more precise by passing
DominatorTree to computeKnownBits.
Test Plan: value-tracking-domtree.ll that tests ValueTracking indeed leverages dominating conditions
Reviewers: broune, meheff, majnemer
Reviewed By: majnemer
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D9734
llvm-svn: 237407
This is to cleanup some redundency generated by LoopUnroll pass. Such redundency may not be cleaned up by existing passes after LoopUnroll.
Differential Revision: http://reviews.llvm.org/D9777
llvm-svn: 237395
Summary:
This implements the initial version as was proposed earlier this year
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/080462.html).
Since then Loop Access Analysis was split out from the Loop Vectorizer
and was made into a separate analysis pass. Loop Distribution becomes
the second user of this analysis.
The pass is off by default and can be enabled
with -enable-loop-distribution. There is currently no notion of
profitability; if there is a loop with dependence cycles, the pass will
try to split them off from other memory operations into a separate loop.
I decided to remove the control-dependence calculation from this first
version. This and the issues with the PDT are actively discussed so it
probably makes sense to treat it separately. Right now I just mark all
terminator instruction required which keeps identical CFGs for each
distributed loop. This seems to be working pretty well for 456.hmmer
where even though there is an empty if-then block in the distributed
loop initially, it gets completely removed.
The pass keeps DominatorTree and LoopInfo updated. I've tested this
with -loop-distribute-verify with the testsuite where we distribute ~90
loops. SimplifyLoop is violated in some cases and I have a FIXME
covering this.
Reviewers: hfinkel, nadav, aschwaighofer
Reviewed By: aschwaighofer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8831
llvm-svn: 237358
ArrayRef already has a SFINAE constructor which can construct ArrayRef<const T*> from ArrayRef<T*>.
This adds methods to do the same directly from SmallVector and std::vector. This avoids an intermediate step through the use of makeArrayRef.
Also update the users of this in LICM and SROA to remove the now unnecessary makeArrayRef call.
Reviewed by David Blaikie.
llvm-svn: 237309
Summary:
This patch teaches the PlaceSafepoints pass about two `CallSite`
function attributes:
* "statepoint-id": if the string value of this attribute can be parsed
as an integer, then it is propagated to the ID parameter of the
statepoint created.
* "statepoint-num-patch-bytes": if the string value of this attribute
can be parsed as an integer, then it is propagated to the `num patch
bytes` parameter of the statepoint created.
This change intentionally does not assert on a malformed value for these
attributes, given that they're not "official" attributes.
Reviewers: reames, pgavlin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9735
llvm-svn: 237286
Avoid running forever by checking we are not reassociating an expression into
the same form.
Tested with @avoid_infinite_loops in nary-add.ll
llvm-svn: 237269
This patch uses the new function profile metadata "function_entry_count"
to annotate entry counts from sample profiles.
In a sampling profile, the total samples collected at the function entry
are an approximation for the number of times that function was invoked.
llvm-svn: 237265
The array passed to LoadAndStorePromoter's constructor was a constant reference to a SmallVectorImpl, which is just the same as passing an ArrayRef.
Also, the data in the array can be 'const Instruction*' instead of 'Instruction*'. Its not possible to convert a SmallVectorImpl<T*> to SmallVectorImpl<const T*>, but ArrayRef does provide such a method.
Currently this added calls to makeArrayRef which should be a nop, but i'm going to kick off a discussion about improving ArrayRef to not need these.
llvm-svn: 237226
Reduce recalculation of the dominator tree by identifying all sites that will need a safepoint poll before doing any of the insertion. This allows us to invalidate the dominator info once, rather than once per safepoint poll inserted.
While I'm at it, update findLocationForEntrySafepoint to properly update the dom tree now that the interface has been made easy. When first written, it wasn't per comment in the code.
Differential Revision: http://reviews.llvm.org/D9727
llvm-svn: 237220
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`. `id` gets propagated to the ID field
in the generated StackMap section. If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.
This change brings statepoints one step closer to patchpoints. With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.
PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`. This can be made more sophisticated
later.
Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9546
llvm-svn: 237214
Responding to review feedback from http://reviews.llvm.org/D9585
1) Remove a variable shadow by converting the outer loop to a range for loop. We never really used the 'i' variable which was being shadowed.
2) Reduce DominatorTree recalculations by passing the DT to SplitEdge.
llvm-svn: 237212
checking and make the cache faster and smaller.
I had thought that using an APInt here would be useful, but I think
I was just wrong. Notably, we don't have to do any fancy overflow
checking, we can just bound the values as quite small and do the math in
a higher precision integer. I've switched to a signed integer so that
UBSan will even point out if we ever have integer overflow. I've added
various asserts to try to catch things as well and hoisted the overflow
checks so that we just leave the too-large offsets out of the SCEV-GEP
cache. This makes the value in the cache quite a bit smaller which is
probably worthwhile.
No functionality changed here (for trip counts under 1 billion).
llvm-svn: 237209
Summary:
If the branch that leads to the PHI node and the Select instruction
depend on correlated conditions, we might be able to directly use the
corresponding value from the Select instruction as the incoming value
for the PHI node, allowing later removal of the select instruction.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9051
llvm-svn: 237201
When relocating a pointer, we need to determine a base pointer for the derived pointer being relocated. We have limited support for handling a pointer extracted from a vector; the current code only handled the case where the entire vector was known to contain base pointers. This patch extends the reasoning to handle chains of insertelements where the indices are constants. This case turns out to be fairly common in vectorized code. We can now handle vectors which contains mixtures of base and derived pointers provided the insertelements use constant indices.
Note that this doesn't solve the general problem. To handle variable indexed insertelements, we'd need to scalarize and introduce conditional branching based on the index. Alternatively, we could eagerly scalarize, but the code structure doesn't currently make either fix easy. The patch also doesn't handle shufflevector or other vector manipulation for much the same reasons. I plan to defer this work until I have a motivating test case.
Differential Revision: http://reviews.llvm.org/D9676
llvm-svn: 237200
The pass doesn't actually modify the module outside of the function being processed. The only confusing piece is that it both inserts calls and then inlines the resulting calls. Given that, it definitely invalidates module level analysis results, but many FunctionPasses do that.
Differential Revision: http://reviews.llvm.org/D9590
llvm-svn: 237185
Switch from using a LoopPass to using a FunctionPass for the internal helper analysis pass. The next step is going to be to make this a true analysis pass which is required by the PlaceSafepoints pass itself.
p.s. The interesting semantic part here is that we're changing the iteration order over the loops. It shouldn't matter, but that's the reason to separate this into it's own distinct patch.
Differential Revision: http://reviews.llvm.org/D9588
llvm-svn: 237180
The old code computed dominators for every loop. This was terribly slow with no good reason. Just use the standard infrastructure for analysis passes.
Differential Revision: http://reviews.llvm.org/D9586
llvm-svn: 237176
As a step towards getting rid of internal pass manager hack entirely, remove the need for loop simplify to run in the inner pass manager. The new code does produce slightly different loop structures, so this isn't technically NFC.
Differential Revision: http://reviews.llvm.org/D9585
llvm-svn: 237172
We already had a method to iterate over all the incoming values of a PHI. This just changes all eligible code to use it.
Ineligible code included anything which cared about the index, or was also trying to get the i'th incoming BB.
llvm-svn: 237169
Summary:
This patch reimplements heuristic that tries to estimate optimization beneftis
from complete loop unrolling.
In this patch I kept the minimal changes - e.g. I removed code handling
branches and folding compares. That's a promising area, but now there
are too many questions to discuss before we can enable it.
Test Plan: Tests are included in the patch.
Reviewers: hfinkel, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8816
llvm-svn: 237156
Summary:
This patch is to rename some variables to CamelCase in gc_relocate
related functions. There is no functionality change.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9681
llvm-svn: 237069
This fixes another miscompile introduced by r235232: when there was a
dependency on the memcpy destination other than the memset, we would
ignore it, because we only looked at the source dependency.
It was a mistake to use SrcDepInfo. Instead, just use DepInfo.
llvm-svn: 237066
runOnCountable() allowed the caller to call on a loop without a
predictable backedge-taken count. Change the code so that only loops
with computable backdge-count can call this function, in order to catch
abuses.
llvm-svn: 237044
Summary:
In RewriteStatepointsForGC pass, we create a gc_relocate intrinsic for
each relocated pointer, and the gc_relocate has the same type with the
pointer. During the creation of gc_relocate intrinsic, llvm requires to
mangle its type. However, llvm does not support mangling of all possible
types. RewriteStatepointsForGC will hit an assertion failure when it
tries to create a gc_relocate for pointer to vector of pointers because
mangling for vector of pointers is not supported.
This patch changes the way RewriteStatepointsForGC pass creates
gc_relocate. For each relocated pointer, we erase the type of pointers
and create an unified gc_relocate of type i8 addrspace(1)*. Then a
bitcast is inserted to convert the gc_relocate to the correct type. In
this way, gc_relocate does not need to deal with different types of
pointers and the unsupported type mangling is no longer a problem. This
change would also ease further merge when LLVM erases types of pointers
and introduces an unified pointer type.
Some minor changes are also introduced to gc_relocate related part in
InstCombineCalls, CodeGenPrepare, and Verifier accordingly.
Patch by Chen Li!
Reviewers: reames, AndyAyers, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9592
llvm-svn: 237009
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).
llvm-svn: 236973
Second attempt; instead of using a named local variable, passing
arguments directly to `createSanitizerCtorAndInitFunctions` worked
on Windows.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8780
llvm-svn: 236951
Clang regressions were caused by more stringent assertion checking
introduced by this change. Small fix needed to clang has been committed
in r236751.
llvm-svn: 236752
Summary:
One step further getting aggregate loads and store being optimized
properly. This will only handle struct with one element at this point.
Test Plan: Added unit tests for the new supported cases.
Reviewers: chandlerc, joker-eph, joker.eph, majnemer
Reviewed By: majnemer
Subscribers: pete, llvm-commits
Differential Revision: http://reviews.llvm.org/D8339
Patch by Amaury Sechet.
From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
Summary:
This gives frontend more precise control over collected coverage
information. User can still override these options by passing
-mllvm flags.
No functionality change.
Test Plan: regression test suite.
Reviewers: kcc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D9539
llvm-svn: 236687
If we have recognized that a conditional is constant at a particular location in the code (while trying to decide if we can simplify a conditional branch), we can eagerly replace that condition with a constant if it's definition is post dominated by the branch in question.
In practice, this ends up being a compile time savings at most. JumpThreading would have visited each using branch anyways. CVP would have visited the cmp itself again. Unless LVI gives up early, we shouldn't gain any addition power by doing this transformation early. What we do gain is simplicity and compile time.
Differential Revision: http://reviews.llvm.org/D9312
llvm-svn: 236684
Renames the original CreateGCStatepoint to CreateGCStatepointCall, and
moves invoke creating functionality from PlaceSafepoints.cpp to
IRBuilder.cpp.
This changes the labels generated for PlaceSafepoints/invokes.ll so use
a regex there to make the basic block labels more resilient.
llvm-svn: 236672
This makes use of the new API which can remove attributes from a set given a builder.
This is much faster than creating a temporary set and reduces llc time by about 0.3% which was all spent creating temporary attributes sets on the context.
llvm-svn: 236668
Summary:
This helper function creates a ctor function, which calls sanitizer's
init function with given arguments. This constructor is then expected
to be added to module's ctors. The patch helps unifying how sanitizer
constructor functions are created, and how init functions are called
across all sanitizers.
Reviewers: kcc, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D8777
llvm-svn: 236627
The patch disabled unrolling in loop vectorization pass when VF==1 on x86 architecture,
by setting MaxInterleaveFactor to 1. Unrolling in loop vectorization pass may introduce
the cost of overflow check, memory boundary check and extra prologue/epilogue code when
regular unroller will unroll the loop another time. Disable it when VF==1 remove the
unnecessary cost on x86. The same can be done for other platforms after verifying
interleaving/memory bound checking to be not perf critical on those platforms.
Differential Revision: http://reviews.llvm.org/D9515
llvm-svn: 236613
I folded the check for the flag -verify-dom-info into the only caller
where I think it is supposed to be checked: verifyAnalysis. (The idea
of the flag is to enable this expensive verification in
verifyPreservedAnalysis.)
I'm assuming that when manually scheduling the verification pass
with -passes=verify<domtree>, we do want to perform the verification.
llvm-svn: 236575
COMDAT groups which have become rendered unused because of inline are
discardable if we can prove that we've made the group empty.
This fixes PR22285.
llvm-svn: 236539
This change is the second of 3 patches to add support for specifying
the profile output from the command line via -fprofile-instr-generate=<path>,
where the specified output path/file will be overridden by the
LLVM_PROFILE_FILE environment variable.
This patch adds the necessary support to the llvm instrumenter, specifically
a new member of GCOVOptions for clang to save the specified filename, and
support for calling the new compiler-rt interface from __llvm_profile_init.
Patch by Teresa Johnson. Thanks!
llvm-svn: 236288