Lifetime markers which reference inputs to the extraction region are not
safe to extract. Example ('rhs' will be extracted):
```
entry:
+------------+
| x = alloca |
| y = alloca |
+------------+
/ \
lhs: rhs:
+-------------------+ +-------------------+
| lifetime_start(x) | | lifetime_start(x) |
| use(x) | | lifetime_start(y) |
| lifetime_end(x) | | use(x, y) |
| lifetime_start(y) | | lifetime_end(y) |
| use(y) | | lifetime_end(x) |
| lifetime_end(y) | +-------------------+
+-------------------+
```
Prior to extraction, the stack coloring pass sees that the slots for 'x'
and 'y' are in-use at the same time. After extraction, the coloring pass
infers that 'x' and 'y' are *not* in-use concurrently, because markers
from 'rhs' are no longer available to help decide otherwise.
This leads to a miscompile, because the stack slots actually are in-use
concurrently in the extracted function.
Fix this by moving lifetime start/end markers for memory regions defined
in the calling function around the call to the extracted function.
Fixes llvm.org/PR39671 (rdar://45939472).
Differential Revision: https://reviews.llvm.org/D55967
llvm-svn: 350420
NFC: This adds the dom tree verification under debug mode at a point
just before we start unrolling the loop. This allows us to verify dom
tree at a state where it is much smaller and before the unrolling
actually happens.
This also implies we do not need to run -verify-dom-info everytime to
see if the DT is in a valid state when we transform the loop for runtime
unrolling.
llvm-svn: 350334
Summary:
Keeping msan a function pass requires replacing the module level initialization:
That means, don't define a ctor function which calls __msan_init, instead just
declare the init function at the first access, and add that to the global ctors
list.
Changes:
- Pull the actual sanitizer and the wrapper pass apart.
- Add a newpm msan pass. The function pass inserts calls to runtime
library functions, for which it inserts declarations as necessary.
- Update tests.
Caveats:
- There is one test that I dropped, because it specifically tested the
definition of the ctor.
Reviewers: chandlerc, fedor.sergeev, leonardchan, vitalybuka
Subscribers: sdardis, nemanjai, javed.absar, hiraditya, kbarton, bollu, atanasyan, jsji
Differential Revision: https://reviews.llvm.org/D55647
llvm-svn: 350305
Summary:
Added a pair of APIs for encoding/decoding the 3 components of a DWARF discriminator described in http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html: the base discriminator, the duplication factor (useful in profile-guided optimization) and the copy index (used to identify copies of code in cases like loop unrolling)
The encoding packs 3 unsigned values in 32 bits. This CL addresses 2 issues:
- communicates overflow back to the user
- supports encoding all 3 components together. Current APIs assume a sequencing of events. For example, creating a new discriminator based on an existing one by changing the base discriminator was not supported.
Reviewers: davidxl, danielcdh, wmi, dblaikie
Reviewed By: dblaikie
Subscribers: zzheng, dmgreen, aprantl, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D55681
llvm-svn: 349973
Instruction::isLifetimeStartOrEnd() checks whether an Instruction is an
llvm.lifetime.start or an llvm.lifetime.end intrinsic.
This was suggested as a cleanup in D55967.
Differential Revision: https://reviews.llvm.org/D56019
llvm-svn: 349964
Currently, runtime unrolling does not support loops where multiple
exiting blocks exit to the latchExit. Added TODO and other code
clarifications for ConnectProlog code.
llvm-svn: 349944
This verification is linear in the size of the function, so it can cause
a quadratic compile-time explosion in a function with many loops to
unroll.
Differential Revision: https://reviews.llvm.org/D54732
llvm-svn: 349871
The current llvm.mem.parallel_loop_access metadata has a problem in that
it uses LoopIDs. LoopID unfortunately is not loop identifier. It is
neither unique (there's even a regression test assigning the some LoopID
to multiple loops; can otherwise happen if passes such as LoopVersioning
make copies of entire loops) nor persistent (every time a property is
removed/added from a LoopID's MDNode, it will also receive a new LoopID;
this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even
if it just to mark that a loop should not be processed again as
llvm.loop.isvectorized does, for the versioned and unversioned loop),
the parallel access information is lost for any subsequent pass.
This patch unlinks LoopIDs and parallel accesses.
llvm.mem.parallel_loop_access metadata on instruction is replaced by
llvm.access.group metadata. llvm.access.group points to a distinct
MDNode with no operands (avoiding the problem to ever need to add/remove
operands), called "access group". Alternatively, it can point to a list
of access groups. The LoopID then has an attribute
llvm.loop.parallel_accesses with all the access groups that are parallel
(no dependencies carries by this loop).
This intentionally avoid any kind of "ID". Loops that are clones/have
their attributes modifies retain the llvm.loop.parallel_accesses
attribute. Access instructions that a cloned point to the same access
group. It is not necessary for each access to have it's own "ID" MDNode,
but those memory access instructions with the same behavior can be
grouped together.
The behavior of llvm.mem.parallel_loop_access is not changed by this
patch, but should be considered deprecated.
Differential Revision: https://reviews.llvm.org/D52116
llvm-svn: 349725
When converting dbg.declares, if the described value is a [s|z]ext,
refer to the ext directly instead of referring to its operand.
This fixes a narrowing bug (the debugger got the sign of a variable
wrong, see llvm.org/PR35400).
The main reason to refer to the ext's operand was that an optimization
may remove the ext itself, leading to a dropped variable. Now that
InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this
is less of a concern. Other passes can/should adopt this API as needed
to fix dropped variable bugs.
Differential Revision: https://reviews.llvm.org/D51813
llvm-svn: 349214
The `changeToCall` function did not preserve the invoke's metadata.
Currently, there is probably no metadata that depends on being applied
on a CallInst or InvokeInst. Therefore we can replace the instruction's
metadata.
This fixes http://llvm.org/PR39994
Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com>
Differential Revision: https://reviews.llvm.org/D55666
llvm-svn: 349170
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.
Reviewers: tejohnson
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D43521
llvm-svn: 349076
The actual type of the first argument of the @dbg intrinsic
doesn't really matter as we're setting it to `undef`, but the
bitcode reader is picky about `void` types.
llvm-svn: 349069
When loops are deleted, we don't keep track of variables modified inside
the loops, so the DI will contain the wrong value for these.
e.g.
int b() {
int i;
for (i = 0; i < 2; i++)
;
patatino();
return a;
-> 6 patatino();
7 return a;
8 }
9 int main() { b(); }
(lldb) frame var i
(int) i = 0
We mark instead these values as unavailable inserting a
@llvm.dbg.value(undef to make sure we don't end up printing an incorrect
value in the debugger. We could consider doing something fancier,
for, e.g. constants, in the future.
PR39868.
rdar://problem/46418795)
Differential Revision: https://reviews.llvm.org/D55299
llvm-svn: 348988
When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g.
#pragma clang loop unroll_and_jam(enable)
#pragma clang loop distribute(enable)
is the same as
#pragma clang loop distribute(enable)
#pragma clang loop unroll_and_jam(enable)
and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.
This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,
!0 = !{!0, !1, !2}
!1 = !{!"llvm.loop.unroll_and_jam.enable"}
!2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
!3 = !{!"llvm.loop.distribute.enable"}
defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.
Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.
For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations.
Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.
To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.
With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).
Reviewed By: hfinkel, dmgreen
Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288
llvm-svn: 348944
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.
This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:
The unwind destination does not have an exception handling instruction!
%call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
to label %invoke.cont34 unwind label %lpad33, !dbg !4183
The unwind destination does not have an exception handling instruction!
invoke void @objc_exception_throw(i8* %call35) #12
to label %invoke.cont36 unwind label %lpad33, !dbg !4184
LandingPadInst not the first non-PHI instruction in the block.
%3 = landingpad { i8*, i32 }
catch i8* null, !dbg !1411
rdar://46540815
llvm-svn: 348562
Treat terminators which resume exception propagation as returning instructions
(at least, for the purposes of marking outlined functions `noreturn`). This is
to avoid inserting traps after calls to outlined functions which unwind.
rdar://46129950
llvm-svn: 348404
If a PHI node out of extracted region has multiple incoming values from it,
split this PHI on two parts. First PHI has incomings only from region and
extracts with it (they are placed to the separate basic block that added to the
list of outlined), and incoming values in original PHI are replaced by first
PHI. Similar solution is already used in CodeExtractor for PHIs in entry block
(severSplitPHINodes method). It covers PR39433 bug.
Patch by Sergei Kachkov!
Differential Revision: https://reviews.llvm.org/D55018
llvm-svn: 348205
We were duplicating code around the existing isImpliedCondition() that
checks for a predecessor block/dominating condition, so make that a
wrapper call.
llvm-svn: 348088
Summary:
When mem2reg inserts phi nodes in blocks with unreachable predecessors,
it adds undef operands for those incoming edges. When there are
multiple such predecessors, the order is currently based on the address
of the BasicBlocks. This change fixes that by using the BBNumbers in
the sort/search predicates, as is done elsewhere in mem2reg to ensure
determinism.
Also adds a testcase with a bunch of unreachable preds, which
(nodeterministically) fails without the fix.
Reviewers: majnemer
Reviewed By: majnemer
Subscribers: mgrang, llvm-commits
Differential Revision: https://reviews.llvm.org/D55077
llvm-svn: 348024
In PR39807 we incorrectly handle circumstances where calls are common'd
from conditional blocks into the parent BB. Calls that can be inlined
must always have DebugLocs, however we strip them during commoning, which
the IR verifier asserts on.
Fix this by using applyMergedLocation: it will perform the same DebugLoc
stripping of conditional Locs, but will also generate an unknown location
DebugLoc that satisfies the requirement for inlinable calls to always have
locations.
Some of the prior logic for selecting a DebugLoc is now likely redundant;
I'll generate a follow-up to remove it (involves editing more regression
tests).
Differential Revision: https://reviews.llvm.org/D54997
llvm-svn: 347782
Summary:
This is a NFC as we do not import non-odr vague linkage when computing
for import list for a module.
Reviewers: tejohnson, pcc
Subscribers: inglorion, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D54928
llvm-svn: 347763
Summary:
Removing ncompatible attributes at indirect-call promoted callsites, not removing it results in
at least a IR verification error.
Reviewers: davidxl, xur, mssimpso
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D54913
llvm-svn: 347605
Put 'static' on three functions in an anonymous namespace as per our
coding style.
Remove the 'namespace llvm {}' around the .cpp file and explicitly
declare the free function 'llvm::optimizeGlobalCtorsList' in 'llvm::'.
I prefer this style for free functions because the compiler will error
out if the .h and .cpp files don't agree on the function name or
prototype.
llvm-svn: 347269
Add methods to BasicBlock which make it easier to efficiently check
whether a block has N (or more) predecessors.
This can be more efficient than using pred_size(), which is a linear
time operation.
We might consider adding similar methods for successors. I haven't done
so in this patch because succ_size() is already O(1).
With this patch applied, I measured a 0.065% compile-time reduction in
user time for running `opt -O3` on the sqlite3 amalgamation (30 trials).
The change in mergeStoreIntoSuccessor alone saves 45 million linked list
iterations in a stage2 Release build of llc.
See llvm.org/PR39702 for a harder but more general way of achieving
similar results.
Differential Revision: https://reviews.llvm.org/D54686
llvm-svn: 347256
An attempt to recommit r346584 after failure on OSX build bot.
Fixed cache key computation in ThinLTOCodeGenerator and added
test case
llvm-svn: 347033
LoopUtils.cpp contains a utility that splits an loop exit block, so that the new block contains only edges coming from the loop. In the case of nested loops, the exit path for the inner loop might also be the back-edge of the outer loop. The new block which is inserted on this path, is now a latch for the outer loop, and it needs to hold the loop metadata for the outer loop. (The test case gives a more concrete view of the situation.)
Patch by Chang Lin (clin1)
Differential Revision: https://reviews.llvm.org/D53876
llvm-svn: 346810
This patch updates DuplicateInstructionsInSplitBetween to update a DTU
instead of applying updates to the DT directly.
Given that there only are 2 users, also updated them in this patch to
avoid churn.
I slightly moved the code in CallSiteSplitting around to reduce the
places where we have to pass in DTU. If necessary, I could split those
changes in a separate patch.
This fixes missing DT updates when dealing with musttail calls in
CallSiteSplitting, by using DTU->deleteBB.
Reviewers: junbuml, kuhar, NutshellySima, indutny, brzycki
Reviewed By: NutshellySima
llvm-svn: 346769
This patch allows internalising globals if all accesses to them
(from live functions) are from non-volatile load instructions
Differential revision: https://reviews.llvm.org/D49362
llvm-svn: 346584
In SimplifyCFG when given a conditional branch that goes to BB1 and BB2, the hoisted common terminator instruction in the two blocks, caused debug line records associated with subsequent select instructions to become ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53390
llvm-svn: 346481
This eliminates the outlining penalty for llvm.trap/unreachable, because
callers no longer have to emit cleanup/ret instructions after calling an
outlined `noreturn` function.
rdar://45523626
llvm-svn: 346421
The lowering for a call to eh_typeid_for changes when it's moved from
one function to another.
There are several proposals for fixing this issue in llvm.org/PR39545.
Until some solution is in place, do not allow CodeExtractor to extract
calls to eh_typeid_for, as that results in serious miscompilations.
llvm-svn: 346256
When CodeExtractor moves instructions to a new function, debug
intrinsics referring to those instructions within the parent function
become invalid.
This results in the same verifier failure which motivated r344545, about
function-local metadata being used in the wrong function.
llvm-svn: 346255
Clang's -Wimplicit-fallthrough implementation warns on this. I built
clang with GCC 7.3 in +asserts and -asserts mode, and GCC doesn't warn
on this in either configuration. I think it is unnecessary. I separated
it from the large mechanical patch (https://reviews.llvm.org/D53950) in
case I am wrong and it has to be reverted.
llvm-svn: 345876
As K has to dominate I, IIUC I's range metadata must be a subset of
K's. After Eli's recent clarification to the LangRef, loading a value
outside of the range is undefined behavior.
Therefore if I's range contains elements outside of K's range and we would load
one such value, K would cause undefined behavior.
In cases like hoisting/sinking, we still want the most generic range
over all code paths to/from the hoist/sink point. As suggested in the
patches related to D47339, I will refactor the handling of those
scenarios and try to decouple it from this function as follow up, once
we switched to a similar handling of metadata in most of
combineMetadata.
I updated some tests checking mostly the merging of metadata to keep the
metadata of to dominating load. The most interesting one is probably test8 in
test/Transforms/JumpThreading/thread-loads.ll. It contained a comment
about the alias metadata preventing us to eliminate the branch, but it
seem like the actual problem currently is that we merge the ranges of
both loads and cannot eliminate the icmp afterwards. With this patch, we
manage to eliminate the icmp, as the range of the first load excludes 8.
Reviewers: efriedma, nlopes, davide
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D51629
llvm-svn: 345456
When SimplifyCFG changes the PHI node into a select instruction, the debug line records becomes ambiguous. It causes the debugger to display unreachable source lines.
Differential Revision: https://reviews.llvm.org/D53287
llvm-svn: 345250
The current splitting algorithm works in three stages:
1) Identify cold blocks, then
2) Use forward/backward propagation to mark hot blocks, then
3) Grow a SESE region of blocks *outside* of the set of hot blocks and
start outlining.
While testing this pass on Apple internal frameworks I noticed that some
kinds of control flow (e.g. loops) are never outlined, even though they
unconditionally lead to / follow cold blocks. I noticed two other issues
related to how cold regions are identified:
- An inconsistency can arise in the internal state of the hotness
propagation stage, as a block may end up in both the ColdBlocks set
and the HotBlocks set. Further inconsistencies can arise as these sets
do not match what's in ProfileSummaryInfo.
- It isn't necessary to limit outlining to single-exit regions.
This patch teaches the splitting algorithm to identify maximal cold
regions and outline them. A maximal cold region is defined as the set of
blocks post-dominated by a cold sink block, or dominated by that sink
block. This approach can successfully outline loops in the cold path. As
a side benefit, it maintains less internal state than the current
approach.
Due to a limitation in CodeExtractor, blocks within the maximal cold
region which aren't dominated by a single entry point (a so-called "max
ancestor") are filtered out.
Results:
- X86 (LNT + -Os + externals): 134KB of TEXT were outlined compared to
47KB pre-patch, or a ~3x improvement. Did not see a performance impact
across two runs.
- AArch64 (LNT + -Os + externals + Apple-internal benchmarks): 149KB
of TEXT were outlined. Ditto re: performance impact.
- Outlining results improve marginally in the internal frameworks I
tested.
Follow-ups:
- Outline more than once per function, outline large single basic
blocks, & try to remove unconditional branches in outlined functions.
Differential Revision: https://reviews.llvm.org/D53627
llvm-svn: 345209
Summary:
The current default of appending "_"+entry block label to the new
extracted cold function breaks demangling. Change the deliminator from
"_" to "." to enable demangling. Because the header block label will
be empty for release compile code, use "extracted" after the "." when
the label is empty.
Additionally, add a mechanism for the client to pass in an alternate
suffix applied after the ".", and have the hot cold split pass use
"cold."+Count, where the Count is currently 1 but can be used to
uniquely number multiple cold functions split out from the same function
with D53588.
Reviewers: sebpop, hiraditya
Subscribers: llvm-commits, erik.pilkington
Differential Revision: https://reviews.llvm.org/D53534
llvm-svn: 345178
Summary:
In several places in the code we use the following pattern:
if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
[...]
Value *Res = emitUnaryFloatFnCall(X, TLI.getName(LibFunc_tan), B, Attrs);
[...]
}
In short, we check if there is a lib-function for a certain type, and then
we _always_ fetch the name of the "double" version of the lib function and
construct a call to the appropriate function, that we just checked exists,
using that "double" name as a basis.
This is of course a problem in cases where the target doesn't support the
"double" version, but e.g. only the "float" version.
In that case TLI.getName(LibFunc_tan) returns "", and
emitUnaryFloatFnCall happily appends an "f" to "", and we erroneously end
up with a call to a function called "f".
To solve this, the above pattern is changed to
if (hasUnaryFloatFn(&TLI, Ty, LibFunc_tan, LibFunc_tanf, LibFunc_tanl)) {
[...]
Value *Res = emitUnaryFloatFnCall(X, &TLI, LibFunc_tan, LibFunc_tanf,
LibFunc_tanl, B, Attrs);
[...]
}
I.e instead of first fetching the name of the "double" version and then
letting emitUnaryFloatFnCall() add the final "f" or "l", we let
emitUnaryFloatFnCall() fetch the right name from TLI.
Reviewers: eli.friedman, efriedma
Reviewed By: efriedma
Subscribers: efriedma, bjope, llvm-commits
Differential Revision: https://reviews.llvm.org/D53370
llvm-svn: 344725