This patch appends the name of the function to the switch generated lookup
table. This will ease the visual debugging in identifying the function the table
is generated from.
Differential Revision: https://reviews.llvm.org/D34817
llvm-svn: 306867
Summary:
Runtime unrolling is done for loops with a single exit block and a
single exiting block (and this exiting block should be the latch block).
This patch adds logic to support unrolling in the presence of multiple exit
blocks (which also means multiple exiting blocks).
Currently this is under an off-by-default option and is supported when
epilog code is generated. Support in presence of prolog code will be in
a future patch (we just need to add more tests, and update comments).
This patch is essentially an implementation patch. I have not added any
heuristic (in terms of branches added or code size) to decide when
this should be enabled.
Reviewers: mkuper, sanjoy, reames, evstupac
Reviewed by: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33001
llvm-svn: 306846
In rL300494 there was an attempt to deal with excessive compile time on
invocations of getSign/ZeroExtExpr using local caching. This approach only
helps if we request the same SCEV multiple times throughout recursion. But
in the bug PR33431 we see a case where we request different values all the time,
so caching does not help and the size of the cache grows enormously.
In this patch we remove the local cache for this methods and add the recursion
depth limit instead, as we do for arithmetics. This gives us a guarantee that the
invocation sequence is limited and reasonably short.
Differential Revision: https://reviews.llvm.org/D34273
llvm-svn: 306785
Summary:
I was testing using this expansion logic in other cases besides
NVPTX, and found some runtime failures due to the lack of a check
for a zero length memcpy/memset before the loop. There is already
such a check in the memmove expansion code though.
Reviewers: hfinkel
Subscribers: jholewinski, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D34707
llvm-svn: 306541
When simplifying an instruction that has been re-mapped, it should never
simplify to an instruction in the original function. In the edge case
where we are inlining a function into itself, the existing code led to
incorrect behavior. Replace the incorrect code with an assert verifying
that we never expect simplification to produce an instruction in the old
function, unless the functions are the same.
Differential Revision: https://reviews.llvm.org/D33850
llvm-svn: 306495
BlockAddress are only valid within their function context, which does not
interact well with CodeExtractor. Detect this case and prevent it.
Differential Revision: https://reviews.llvm.org/D33839
llvm-svn: 306448
Instead of getBackEdgeTakenCount, use getExitCount on the latch exiting block
(which is proven to be the only exiting block in the loop to be unrolled).
llvm-svn: 306410
metadata out of InstCombine and into helpers.
NFC, this just exposes the logic used by InstCombine when propagating
metadata from one load instruction to another. The plan is to use this
in SROA to address PR32902.
If anyone has better ideas about how to factor this or name variables,
I'm all ears, but this seemed like a pretty good start and lets us make
progress on the PR.
This is based on a patch by Ariel Ben-Yehuda (D34285).
llvm-svn: 306267
This was reverted in r306252, but I already had the bug fixed and was
just trying to form a test case.
The original commit factored the logic for forming dedicated exits
inside of LoopSimplify into a helper that could be used elsewhere and
with an approach that required fewer intermediate data structures. See
that commit for full details including the change to the statistic, etc.
The code looked fine to me and my reviewers, but in fact didn't handle
indirectbr correctly -- it left the 'InLoopPredecessors' vector dirty.
If you have code that looks *just* right, you can end up leaking these
predecessors into a subsequent rewrite, and crash deep down when trying
to update PHI nodes for predecessors that don't exist.
I've added an assert that makes the bug much more obvious, and then
changed the code to reliably clear the vector so we don't get this bug
again in some other form as the code changes.
I've also added a test case that *does* manage to catch this while also
giving some nice positive coverage in the face of indirectbr.
The real code that found this came out of what I think is CPython's
interpreter loop, but any code with really "creative" interpreter loops
mixing indirectbr and other exit paths could manage to tickle the bug.
I was hard to reduce the original test case because in addition to
having a particular pattern of IR, the whole thing depends on the order
of the predecessors which is in turn depends on use list order. The test
case added here was designed so that in multiple different predecessor
orderings it should always end up going down the same path and tripping
the same bug. I hope. At least, it tripped it for me without
manipulating the use list order which is better than anything bugpoint
could do...
llvm-svn: 306257
I want to use the same logic as LoopSimplify to form dedicated exits in
another pass (SimpleLoopUnswitch) so I wanted to factor it out here.
I also noticed that there is a pretty significantly more efficient way
to implement this than the way the code in LoopSimplify worked. We don't
need to actually retain the set of unique exit blocks, we can just
rewrite them as we find them and use only a set to deduplicate.
This did require changing one part of LoopSimplify to not re-use the
unique set of exits, but it only used it to check that there was
a single unique exit. That part of the code is about to walk the exiting
blocks anyways, so it seemed better to rewrite it to use those exiting
blocks to compute this property on-demand.
I also had to ditch a statistic, but it doesn't seem terribly valuable.
Differential Revision: https://reviews.llvm.org/D34049
llvm-svn: 306081
Summary:
This allows strlen to be moved out of the loop in case its argument is
not modified in the loop in LICM.
Reviewers: hfinkel, davide, sanjoy, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34323
llvm-svn: 305641
This is a fix for PR33292 that shows a case of extremely long compilation
of a single .c file with clang, with most time spent within SCEV.
We have a mechanism of limiting recursion depth for getAddExpr to avoid
long analysis in SCEV. However, there are calls from getAddExpr to getMulExpr
and back that do not propagate the info about depth. As result of this, a chain
getAddExpr -> ... .> getAddExpr -> getMulExpr -> getAddExpr -> ... -> getAddExpr
can be extremely long, with every segment of getAddExpr's being up to max depth long.
This leads either to long compilation or crash by stack overflow. We face this situation while
analyzing big SCEVs in the test of PR33292.
This patch applies the same limit on max expression depth for getAddExpr and getMulExpr.
Differential Revision: https://reviews.llvm.org/D33984
llvm-svn: 305463
I did this a long time ago with a janky python script, but now
clang-format has built-in support for this. I fed clang-format every
line with a #include and let it re-sort things according to the precise
LLVM rules for include ordering baked into clang-format these days.
I've reverted a number of files where the results of sorting includes
isn't healthy. Either places where we have legacy code relying on
particular include ordering (where possible, I'll fix these separately)
or where we have particular formatting around #include lines that
I didn't want to disturb in this patch.
This patch is *entirely* mechanical. If you get merge conflicts or
anything, just ignore the changes in this patch and run clang-format
over your #include lines in the files.
Sorry for any noise here, but it is important to keep these things
stable. I was seeing an increasing number of patches with irrelevant
re-ordering of #include lines because clang-format was used. This patch
at least isolates that churn, makes it easy to skip when resolving
conflicts, and gets us to a clean baseline (again).
llvm-svn: 304787
Summary:
This problem stems from the fact that instructions are allocated using new
in LLVM, i.e. there is no relationship that can be derived by just looking
at the pointer value.
This interface dispatches to appropriate dominance check given 2 instructions,
i.e. in case the instructions are in the same basic block, ordered basicblock
(with instruction numbering and caching) are used. Otherwise, dominator tree
is used.
This is a preparation patch for https://reviews.llvm.org/D32720
Reviewers: dberlin, hfinkel, davide
Subscribers: davide, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D33380
llvm-svn: 304764
This was rL304226, reverted in 304228 due to a clang assertion failure
on the build bots. That problem should have been addressed by clang
commit rL304470.
llvm-svn: 304488
Summary:
Sort OpsToRename before iterating to make iteration order deterministic.
Thanks to Daniel Berlin for the sorting logic.
Reviewers: dberlin, RKSimon, efriedma, davide
Reviewed By: dberlin, davide
Subscribers: sanjoy, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D33265
llvm-svn: 304447
This patch does an inline expansion of memcmp.
It changes the memcmp library call into an inline expansion when the size is
known at compile time and is under a target specified threshold.
This expansion is implemented in CodeGenPrepare and expands into straight line
code. The target specifies a maximum load size and the expansion works by using
this size to load the two sources, compare, and exit early if a difference is
found. It also has a special case when the memcmp result is used in a compare
to zero equality.
Differential Revision: https://reviews.llvm.org/D28637
llvm-svn: 304313
Summary:
In rL302576, DISubprograms gained the constraint that a !dbg attachments to functions must
have a 1:1 mapping to DISubprograms. As part of that change, the function cloning support
was adjusted to attempt to enforce this invariant during cloning. However, there
were several problems with the implementation. Part of these were fixed in rL304079.
However, there was a more fundamental problem with these changes, namely that it
bypasses the matadata value map, causing the cloned metadata to be a mix of metadata
pointing to the new suprogram (where manual code was added to fix those up) and the
old suprogram (where this was not the case). This mismatch could cause a number of
different assertion failures in the DWARF emitter. Some of these are given at
https://github.com/JuliaLang/julia/issues/22069, but some others have been observed
as well. Attempt to rectify this by partially reverting the manual DI metadata fixup,
and instead using the standard value map approach. To retain the desired semantics
of not duplicating the compilation unit and inlined subprograms, explicitly freeze
these in the value map.
Reviewers: dblaikie, aprantl, GorNishanov, echristo
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33655
llvm-svn: 304226
Summary:
I believe https://reviews.llvm.org/rL302576 introduced two bugs:
1) it produces duplicate distinct variables for every: dbg.value describing the same variable.
To fix the problme I switched form getDistinct() to get() in DebugLoc.cpp: auto reparentVar = [&](DILocalVariable *Var) {
return DILocalVariable::getDistinct(
2) It passes NewFunction plain name as a linkagename parameter to Subprogram constructor. Breaks assert in:
|| DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.
#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3
#
(Edit: reproducer added)
Here how https://reviews.llvm.org/rL302576 broke coroutine debug info.
Coroutine body of the original function is split into several parts by cloning and removing unneeded code.
All parts describe the original function and variables present in the original function.
For a simple case, prior to Split, original function has these two blocks:
```
PostSpill: ; preds = %AllocaSpillBB
call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !14, metadata !15), !dbg !13
store i32 %x, i32* %x.addr, align 4
...
and
sw.epilog: ; preds = %sw.bb
%x.addr.reload.addr = getelementptr inbounds %f.Frame, %f.Frame* %FramePtr, i32 0, i32 4, !dbg !20
%4 = load i32, i32* %x.addr.reload.addr, align 4, !dbg !20
call void @llvm.dbg.value(metadata i32 %4, i64 0, metadata !14, metadata !15), !dbg !13!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)
```
Note that in two blocks different expression represent the same original user variable X.
Before rL302576, for every cloned function there was exactly one cloned DILocalVariable(name: "x" as in:
```
define i8* @f(i32 %x) #0 !dbg !6 {
...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,
...
!14 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 55, type: !11)
define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
!28 = !DILocalVariable(name: "x", arg: 1, scope: !25, file: !7, line: 55, type: !11)
```
After rL302576, for every cloned function there were as many DILocalVariable(name: "x" as there were "call void @llvm.dbg.value" for that variable.
This was causing asserts in VerifyDebugInfo and AssemblyPrinter.
Example:
```
!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55,
!29 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!39 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
!41 = distinct !DILocalVariable(name: "x", arg: 1, scope: !27, file: !7, line: 55, type: !11)
```
Second problem:
Prior to rL302576, all clones were described by DISubprogram referring to original function.
```
define i8* @f(i32 %x) #0 !dbg !6 {
...
!6 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,
define internal fastcc void @f.resume(%f.Frame* %FramePtr) #0 !dbg !25 {
...
!25 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55, flags: DIFlagPrototyped,
```
After rL302576, DISubprogram for clones is of two minds, plain name refers to the original name, linkageName refers to plain name of the clone.
```
!27 = distinct !DISubprogram(name: "f", linkageName: "f.resume", scope: !7, file: !7, line: 55, type: !8, isLocal: false, isDefinition: true, scopeLine: 55,
```
I think the assumption in AsmPrinter is that both name and linkageName should refer to the same entity. It asserts here when they are not:
```
|| DeclLinkageName.empty()) || LinkageName == DeclLinkageName) && "decl has a linkage name and it is different"' failed.
#9 0x00007f5010261b75 llvm::DwarfUnit::applySubprogramDefinitionAttributes(llvm::DISubprogram const*, llvm::DIE&) /home/gor/llvm/lib/CodeGen/AsmPrinter/DwarfUnit.cpp:1173:3
```
After this fix, behavior (with respect to coroutines) reverts to exactly as it was before and therefore making them debuggable again, or even more importantly, compilable, with "-g"
Reviewers: dblaikie, echristo, aprantl
Reviewed By: dblaikie
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33614
llvm-svn: 304079
This patch provides an initial prototype for a pass that sinks instructions based on GVN information, similar to GVNHoist. It is not yet ready for commiting but I've uploaded it to gather some initial thoughts.
This pass attempts to sink instructions into successors, reducing static
instruction count and enabling if-conversion.
We use a variant of global value numbering to decide what can be sunk.
Consider:
[ %a1 = add i32 %b, 1 ] [ %c1 = add i32 %d, 1 ]
[ %a2 = xor i32 %a1, 1 ] [ %c2 = xor i32 %c1, 1 ]
\ /
[ %e = phi i32 %a2, %c2 ]
[ add i32 %e, 4 ]
GVN would number %a1 and %c1 differently because they compute different
results - the VN of an instruction is a function of its opcode and the
transitive closure of its operands. This is the key property for hoisting
and CSE.
What we want when sinking however is for a numbering that is a function of
the *uses* of an instruction, which allows us to answer the question "if I
replace %a1 with %c1, will it contribute in an equivalent way to all
successive instructions?". The (new) PostValueTable class in GVN provides this
mapping.
This pass has some shown really impressive improvements especially for codesize already on internal benchmarks, so I have high hopes it can replace all the sinking logic in SimplifyCFG.
Differential revision: https://reviews.llvm.org/D24805
llvm-svn: 303850
This continues the changes started when computeSignBit was replaced with this new version of computeKnowBits.
Differential Revision: https://reviews.llvm.org/D33431
llvm-svn: 303773
Summary:
Before this change, AttributeLists stored a pair of index and
AttributeSet. This is memory efficient if most arguments do not have
attributes. However, it requires doing a search over the pairs to test
an argument or function attribute. Profiling shows that this loop was
0.76% of the time in 'opt -O2' of sqlite3.c, because LLVM constantly
tests values for nullability.
This was worth about 2.5% of mid-level optimization cycles on the
sqlite3 amalgamation. Here are the full perf results:
https://reviews.llvm.org/P7995
Here are just the before and after cycle counts:
```
$ perf stat -r 5 ./opt_before -O2 sqlite3.bc -o /dev/null
13,274,181,184 cycles # 3.047 GHz ( +- 0.28% )
$ perf stat -r 5 ./opt_after -O2 sqlite3.bc -o /dev/null
12,906,927,263 cycles # 3.043 GHz ( +- 0.51% )
```
This patch *does not* change the indices used to query attributes, as
requested by reviewers. Tracking whether an index is usable for array
indexing is a huge pain that affects many of the internal APIs, so it
would be good to come back later and do a cleanup to remove this
internal adjustment.
Reviewers: pete, chandlerc
Subscribers: javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D32819
llvm-svn: 303654
This patch builds over https://reviews.llvm.org/rL303349 and replaces
the use of the condition only if it is safe to do so.
We should not blindly RAUW the condition if experimental.guard or assume
is a use of that
condition. This is because LVI may have used the guard/assume to
identify the
value of the condition, and RUAWing will fold the guard/assume and uses
before the guards/assumes.
Reviewers: sanjoy, reames, trentxintong, mkazantsev
Reviewed by: sanjoy, reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D33257
llvm-svn: 303633
Summary:
With instrumentation profiling, when updating the VP metadata after
an inline, VP metadata on the inlined copy was inadvertantly having
all counts zeroed out. This was causing indirect calls from code inlined
during the call step to be marked as cold in the ThinLTO summaries and
not imported.
The CallerBFI needs to be passed down so that the CallSiteCount can be
computed from the profile summary info. With Sample PGO this was working
since the count is extracted from the branch weight metadata on the
call being inlined (even before we stopped looking at metadata for
non-sample PGO in r302844 this largely wasn't working for instrumentation
PGO since only promoted indirect calls would be getting inlined and have
the metadata).
Added an instrumentation PGO test and renamed the sample PGO test.
Reviewers: danielcdh, eraman
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D33389
llvm-svn: 303574
Summary: This allows pthread_self to be pulled out of a loop by LICM.
Reviewers: hfinkel, arsenm, davide
Reviewed By: davide
Subscribers: davide, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D32782
llvm-svn: 303495
Refactor the strlen optimization code to work for both strlen and wcslen.
This especially helps with programs in the wild where people pass
L"string"s to const std::wstring& function parameters and the wstring
constructor gets inlined.
This also fixes a lingerind API problem/bug in getConstantStringInfo()
where zeroinitializers would always give you an empty string (without a
length) back regardless of the actual length of the initializer which
did not work well in the TrimAtNul==false causing the PR mentioned
below.
Note that the fixed getConstantStringInfo() needed fixes to SelectionDAG
memcpy lowering and may lead to some cases for out-of-bounds
zeroinitializer accesses not getting optimized anymore. So some code
with UB may produce out of bound memory reads now instead of just
producing zeros.
The refactoring "accidentally" fixes http://llvm.org/PR32124
Differential Revision: https://reviews.llvm.org/D32839
llvm-svn: 303461
Summary:
Implements PR889
Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:
https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing
This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal. However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue. I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.
I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.
Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv
Reviewed By: chandlerc
Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits
Differential Revision: https://reviews.llvm.org/D31261
llvm-svn: 303362
There's no need (& a bit incorrect) to mask off the high bits of the
register reference when describing a simple bool value.
Reviewers: aprantl
Differential Revision: https://reviews.llvm.org/D31062
llvm-svn: 303117
This patch adds min/max population count, leading/trailing zero/one bit counting methods.
The min methods return answers based on bits that are known without considering unknown bits. The max methods give answers taking into account the largest count that unknown bits could give.
Differential Revision: https://reviews.llvm.org/D32931
llvm-svn: 302925
This pass uses a new target hook to decide whether or not to expand a particular
intrinsic to the shuffevector sequence.
Differential Revision: https://reviews.llvm.org/D32245
llvm-svn: 302631
This change is required because the notion of count is different for
sample profiling and getProfileCount will need to determine the
underlying profile type.
Differential revision: https://reviews.llvm.org/D33012
llvm-svn: 302597
Summary:
This fixes the immediate crash caused by introducing an incorrect inttoptr
before attempting the conversion. There may still be a legality
check missing somewhere earlier for non-integral pointers, but this change
seems necessary in any case.
Reviewers: sanjoy, dberlin
Reviewed By: dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32623
llvm-svn: 302587
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>
Differential Revision: https://reviews.llvm.org/D32975
This reapplies r302469 with a fix for a bot failure (reparentDebugInfo
now checks for the case the orig and new function are identical).
llvm-svn: 302576
Summary:
Since I will post patch with some changes to
replaceDominatedUsesWith, it would be good to avoid
duplicating code again.
Reviewers: davide, dberlin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32798
llvm-svn: 302575
Use variadic templates instead of relying on <cstdarg> + sentinel.
This enforces better type checking and makes code more readable.
Differential Revision: https://reviews.llvm.org/D32541
llvm-svn: 302571
This caused PR32977.
Original commit message:
> Make it illegal for two Functions to point to the same DISubprogram
>
> As recently discussed on llvm-dev [1], this patch makes it illegal for
> two Functions to point to the same DISubprogram and updates
> FunctionCloner to also clone the debug info of a function to conform
> to the new requirement. To simplify the implementation it also factors
> out the creation of inlineAt locations from the Inliner into a
> general-purpose utility in DILocation.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
> <rdar://problem/31926379>
>
> Differential Revision: https://reviews.llvm.org/D32975
llvm-svn: 302533
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
As recently discussed on llvm-dev [1], this patch makes it illegal for
two Functions to point to the same DISubprogram and updates
FunctionCloner to also clone the debug info of a function to conform
to the new requirement. To simplify the implementation it also factors
out the creation of inlineAt locations from the Inliner into a
general-purpose utility in DILocation.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-May/112661.html
<rdar://problem/31926379>
Differential Revision: https://reviews.llvm.org/D32975
llvm-svn: 302469
Previously SimplifyCFG used getSetSize which returns an APInt that is 1 bit wider than the ConstantRange's bit width. In the reasonably common case that the ConstantRange is 64-bits wide, this requires returning a 65-bit APInt. APInt's can only store 64-bits without a memory allocation so this is inefficient.
The new method takes the 8 as an input and tells if the range contains more than that many elements without requiring any wider math.
llvm-svn: 302385
wcslen is part of the C99 and C++98 standards.
- This introduces the function to TargetLibraryInfo.
- Also set attributes for wcslen in llvm::inferLibFuncAttributes().
Differential Revision: https://reviews.llvm.org/D32837
llvm-svn: 302278
Summary:
Do three things to help with that:
- Add AttributeList::FirstArgIndex, which is an enumerator currently set
to 1. It allows us to change the indexing scheme with fewer changes.
- Add addParamAttr/removeParamAttr. This just shortens addAttribute call
sites that would otherwise need to spell out FirstArgIndex.
- Remove some attribute-specific getters and setters from Function that
take attribute list indices. Most of these were only used from
BuildLibCalls, and doesNotAlias was only used to test or set if the
return value is malloc-like.
I'm happy to split the patch, but I think they are probably easier to
review when taken together.
This patch should be NFC, but it sets the stage to change the indexing
scheme to this, which is more convenient when indexing into an array:
0: func attrs
1: retattrs
2...: arg attrs
Reviewers: chandlerc, pete, javed.absar
Subscribers: david2050, llvm-commits
Differential Revision: https://reviews.llvm.org/D32811
llvm-svn: 302060
Summary:
Cloning basic blocks in the loop for runtime loop unroller depends on loop being
in rotated form (i.e. loop latch target is the exit block).
Assert that this is true, so that callers of runtime loop unroller pass in
canonical loops.
The single caller of this function has that check recently added:
https://reviews.llvm.org/rL301239
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32801
llvm-svn: 302058
The method is called "get *Param* Alignment", and is only used for
return values exactly once, so it should take argument indices, not
attribute indices.
Avoids confusing code like:
IsSwiftError = CS->paramHasAttr(ArgIdx, Attribute::SwiftError);
Alignment = CS->getParamAlignment(ArgIdx + 1);
Add getRetAlignment to handle the one case in Value.cpp that wants the
return value alignment.
This is a potentially breaking change for out-of-tree backends that do
their own call lowering.
llvm-svn: 301682
This is a follow up to the fix in r298360 to improve the handling of debug
values when redundant LEAs are removed. The fix in r298360 effectively
discarded the debug values. This patch now attempts to preserve the debug
values by using the DWARF DW_OP_stack_value operation via prependDIExpr.
Moved functions appendOffset and prependDIExpr from Local.cpp to
DebugInfoMetadata.cpp and made them available as static member functions of
DIExpression.
Differential Revision: https://reviews.llvm.org/D31604
llvm-svn: 301630
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a second re-land of r298158. This time, this feature is
limited to -fdata-sections builds.
llvm-svn: 301587
This patch introduces a new KnownBits struct that wraps the two APInt used by computeKnownBits. This allows us to treat them as more of a unit.
Initially I've just altered the signatures of computeKnownBits and InstCombine's simplifyDemandedBits to pass a KnownBits reference instead of two separate APInt references. I'll do similar to the SelectionDAG version of computeKnownBits/simplifyDemandedBits as a separate patch.
I've added a constructor that allows initializing both APInts to the same bit width with a starting value of 0. This reduces the repeated pattern of initializing both APInts. Once place default constructed the APInts so I added a default constructor for those cases.
Going forward I would like to add more methods that will work on the pairs. For example trunc, zext, and sext occur on both APInts together in several places. We should probably add a clear method that can be used to clear both pieces. Maybe a method to check for conflicting information. A method to return (Zero|One) so we don't write it out everywhere. Maybe a method for (Zero|One).isAllOnesValue() to determine if all bits are known. I'm sure there are many other methods we can come up with.
Differential Revision: https://reviews.llvm.org/D32376
llvm-svn: 301432
Commits were:
"Use WeakVH instead of WeakTrackingVH in AliasSetTracker's UnkownInsts"
"Add a new WeakVH value handle; NFC"
"Rename WeakVH to WeakTrackingVH; NFC"
The changes assumed pointers are 8 byte aligned on all architectures.
llvm-svn: 301429
Summary:
I plan to use WeakVH to mean "nulls itself out on deletion, but does
not track RAUW" in a subsequent commit.
Reviewers: dblaikie, davide
Reviewed By: davide
Subscribers: arsenm, mehdi_amini, mcrosier, mzolotukhin, jfb, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D32266
llvm-svn: 301424
This patch uses various APInt methods to reduce temporary APInt creation.
This should be all of the unrelated cleanups that got buried in D32376(creating a KnownBits struct) as well as some pointed out by Simon during the review of that. Plus a few improvements to use counting instead of masking.
I've left out any places where we do something like (KnownZero & KnownOne) != 0 as I plan to add a helper method to KnownBits to ask that question and didn't want to thrash that code an additional time.
Differential Revision: https://reviews.llvm.org/D32495
llvm-svn: 301338
One of the fast-math optimizations is to replace calls to standard double
functions with their float equivalents, e.g. exp -> expf. However, this can
cause infinite loops for the following:
float expf(float val) { return (float) exp((double) val); }
A similar inline declaration exists in the MinGW-w64 math.h header file which
when compiled with -O2/3 and fast-math generates infinite loops.
So this fix checks that the calling function to the standard double function
that is being replaced does not match the float equivalent.
Differential Revision: https://reviews.llvm.org/D31806
llvm-svn: 301304
The current Loop Unroll implementation works with loops having a
single latch that contains a conditional branch to a block outside
the loop (the other successor is, by defition of latch, the header).
If this precondition doesn't hold, avoid unrolling the loop as
the code is not ready to handle such circumstances.
Differential Revision: https://reviews.llvm.org/D32261
llvm-svn: 301239
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.
This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.
rdar://problem/30725338
This reapplies r301093 without modifications.
llvm-svn: 301210
Summary:
Instead of keeping a variable indicating whether there are early exits
in the loop. We keep all the early exits. This improves LICM's ability to
move instructions out of the loop based on is-guaranteed-to-execute.
I am going to update compilation time as well soon.
Reviewers: hfinkel, sanjoy, efriedma, mkuper
Reviewed By: hfinkel
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D32433
llvm-svn: 301196
When the location description of a source variable involves arithmetic
on the value itself, it needs to be marked with DW_OP_stack_value since it
is not describing the variable's location, but rather its value.
This is a follow-up to r297971 and fixes the source testcase quoted in
the comment in debuginfo-dce.ll.
rdar://problem/30725338
llvm-svn: 301093
Currently we choose PostBB as the single successor of QFB, but its possible that QTB's single successor is QFB which would make QFB the correct choice.
Differential Revision: https://reviews.llvm.org/D32323
llvm-svn: 300992
CodeExtractor looks up the dominator node corresponding to return blocks
when splitting them. If one of these blocks is unreachable, there's no
node in the Dom and CodeExtractor crashes because it doesn't check
for domtree node validity.
In theory, we could add just a check for skipping null DTNodes in
`splitReturnBlock` but the fix I propose here is slightly different. To the
best of my knowledge, unreachable blocks are irrelevant for the algorithm,
therefore we can just skip them when building the candidate set in the
constructor.
Differential Revision: https://reviews.llvm.org/D32335
llvm-svn: 300946
getSignBit is a static function that creates an APInt with only the sign bit set. getSignMask seems like a better name to convey its functionality. In fact several places use it and then store in an APInt named SignMask.
Differential Revision: https://reviews.llvm.org/D32108
llvm-svn: 300856
Summary:
See http://llvm.org/docs/LangRef.html#non-integral-pointer-type
The NewGVN test does not fail without these changes (perhaps it does
try to coerce pointers <-> integers to begin with?), but I added the
test case anyway.
Reviewers: dberlin
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D32208
llvm-svn: 300730
The DWARF specification knows 3 kinds of non-empty simple location
descriptions:
1. Register location descriptions
- describe a variable in a register
- consist of only a DW_OP_reg
2. Memory location descriptions
- describe the address of a variable
3. Implicit location descriptions
- describe the value of a variable
- end with DW_OP_stack_value & friends
The existing DwarfExpression code is pretty much ignorant of these
restrictions. This used to not matter because we only emitted very
short expressions that we happened to get right by accident. This
patch makes DwarfExpression aware of the rules defined by the DWARF
standard and now chooses the right kind of location description for
each expression being emitted.
This would have been an NFC commit (for the existing testsuite) if not
for the way that clang describes captured block variables. Based on
how the previous code in LLVM emitted locations, DW_OP_deref
operations that should have come at the end of the expression are put
at its beginning. Fixing this means changing the semantics of
DIExpression, so this patch bumps the version number of DIExpression
and implements a bitcode upgrade.
There are two major changes in this patch:
I had to fix the semantics of dbg.declare for describing function
arguments. After this patch a dbg.declare always takes the *address*
of a variable as the first argument, even if the argument is not an
alloca.
When lowering a DBG_VALUE, the decision of whether to emit a register
location description or a memory location description depends on the
MachineLocation — register machine locations may get promoted to
memory locations based on their DIExpression. (Future) optimization
passes that want to salvage implicit debug location for variables may
do so by appending a DW_OP_stack_value. For example:
DBG_VALUE, [RBP-8] --> DW_OP_fbreg -8
DBG_VALUE, RAX --> DW_OP_reg0 +0
DBG_VALUE, RAX, DIExpression(DW_OP_deref) --> DW_OP_reg0 +0
All testcases that were modified were regenerated from clang. I also
added source-based testcases for each of these to the debuginfo-tests
repository over the last week to make sure that no synchronized bugs
slip in. The debuginfo-tests compile from source and run the debugger.
https://bugs.llvm.org/show_bug.cgi?id=32382
<rdar://problem/31205000>
Differential Revision: https://reviews.llvm.org/D31439
llvm-svn: 300522
The use list is a linked list so getNumUses requires a linear scan through the whole list. hasNUses will stop scanning at N and see if that is the end.
llvm-svn: 300505
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:
Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].
If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.
Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31613
llvm-svn: 300446
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.
Reviewers: sanjoy, anna, reames, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31753
llvm-svn: 300441
Add hasParamAttribute() and use it instead of hasAttribute(ArgNo+1,
Kind) everywhere.
The fact that the AttributeList index for an argument is ArgNo+1 should
be a hidden implementation detail.
NFC
llvm-svn: 300272
For LCSSA purposes, loop BBs not dominating any of the exits aren't
interesting, as none of the values defined in these blocks can be
used outside the loop.
The way the code computed this information was by comparing each
BB of the loop with each of the exit blocks and ask the dominator tree
about their dominance relation. This is slow.
A more efficient way, implemented here, is that of starting from the
exit blocks and walking the dom upwards until we hit an header. By
transitivity, all the blocks we encounter in our path dominate an exit.
For the testcase provided in PR31851, this reduces compile time on
`opt -O2` by ~25%, going from 1m47s to 1m22s.
Thanks to Dan/MichaelZ for discussions/suggesting the approach/review.
Differential Revision: https://reviews.llvm.org/D31843
llvm-svn: 300255
Summary:
In first order recurrences where phi's are used outside the loop,
we should generate an additional vector.extract of the second last element from
the vectorized phi update.
This is because we require the phi itself (which is the value at the second last
iteration of the vector loop) and not the phi's update within the loop.
Also fix the code gen when we just unroll, but don't vectorize.
Fixes PR32396.
Reviewers: mssimpso, mkuper, anemet
Subscribers: llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D31979
llvm-svn: 300238
This seems like a much more natural API, based on Derek Schuff's
comments on r300015. It further hides the implementation detail of
AttributeList that function attributes come last and appear at index
~0U, which is easy for the user to screw up. git diff says it saves code
as well: 97 insertions(+), 137 deletions(-)
This also makes it easier to change the implementation, which I want to
do next.
llvm-svn: 300153
and to expose a handle to represent the actual case rather than having
the iterator return a reference to itself.
All of this allows the iterator to be used with common STL facilities,
standard algorithms, etc.
Doing this exposed some missing facilities in the iterator facade that
I've fixed and required some work to the actual iterator to fully
support the necessary API.
Differential Revision: https://reviews.llvm.org/D31548
llvm-svn: 300032
Summary:
For now, it just wraps AttributeSetNode*. Eventually, it will hold
AvailableAttrs as an inline bitset, and adding and removing enum
attributes will be super cheap.
This sinks AttributeSetNode back down to lib/IR/AttributeImpl.h.
Reviewers: pete, chandlerc
Subscribers: llvm-commits, jfb
Differential Revision: https://reviews.llvm.org/D31940
llvm-svn: 300014
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.
A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.
Differential Revision: https://reviews.llvm.org/D31910
Reviewers: mssimpso
llvm-svn: 299985
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.
llvm-svn: 299980
Before this patch, pass AddDiscriminators always avoided to assign
discriminators to intrinsic calls. This was done mainly for two reasons:
1) We wanted to minimize the number of based discriminators used.
2) We wanted to avoid non-deterministic discriminator assignment for
different debug levels.
Unfortunately, that approach was problematic for MemIntrinsic calls.
MemIntrinsic calls can be split by SROA into loads and stores, and each new
load/store instruction would obtain the debug location from the original
intrinsic call.
If we don't assign a discriminator to MemIntrinsic calls, then we cannot
correctly set the discriminator for the newly created loads and stores.
This may have a negative impact on the basic block weight computation
performed by the SampleLoader.
This patch fixes the issue by letting MemIntrinsic calls have a discriminator.
Differential Revision: https://reviews.llvm.org/D31900
llvm-svn: 299972
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299949
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
llvm-svn: 299925
This re-lands r299875.
I introduced a bug in Clang code responsible for replacing K&R, no
prototype declarations with a real function definition with a prototype.
The bug was here:
// Collect any return attributes from the call.
- if (oldAttrs.hasAttributes(llvm::AttributeList::ReturnIndex))
- newAttrs.push_back(llvm::AttributeList::get(newFn->getContext(),
- oldAttrs.getRetAttributes()));
+ newAttrs.push_back(oldAttrs.getRetAttributes());
Previously getRetAttributes() carried AttributeList::ReturnIndex in its
AttributeList. Now that we return the AttributeSetNode* directly, it no
longer carries that index, and we call this overload with a single node:
AttributeList::get(LLVMContext&, ArrayRef<AttributeSetNode*>)
That aborted with an assertion on x86_32 targets. I added an explicit
triple to the test and added CHECKs to help find issues like this in the
future sooner.
llvm-svn: 299899
LLVM makes several assumptions about address space 0. However,
alloca is presently constrained to always return this address space.
There's no real way to avoid using alloca, so without this
there is no way to opt out of these assumptions.
The problematic assumptions include:
- That the pointer size used for the stack is the same size as
the code size pointer, which is also the maximum sized pointer.
- That 0 is an invalid, non-dereferencable pointer value.
These are problems for AMDGPU because alloca is used to
implement the private address space, which uses a 32-bit
index as the pointer value. Other pointers are 64-bit
and behave more like LLVM's notion of generic address
space. By changing the address space used for allocas,
we can change our generic pointer type to be LLVM's generic
pointer type which does have similar properties.
llvm-svn: 299888
Summary:
AttributeList::get(Fn|Ret|Param)Attributes no longer creates a temporary
AttributeList just to hide the AttributeSetNode type.
I've also added a factory method to create AttributeLists from a
parallel array of AttributeSetNodes. I think this simplifies
construction of AttributeLists when rewriting function prototypes.
Previously we would test if a particular index had attributes, and
conditionally add a temporary attribute list to a vector. Now the
attribute set vector is parallel to the argument vector already that
these passes already construct.
My long term vision is to wrap AttributeSetNode* inside an AttributeSet
type that holds the enum attributes, but that will come in a follow up
change.
I haven't done any performance measurements for this change because
profiling hasn't shown that any of the affected code is hot.
Reviewers: pete, chandlerc, sanjoy, hfinkel
Reviewed By: pete
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D31198
llvm-svn: 299875
Summary:
While we don't want them aliasing with other pointers, there seems to
be no point in not having them clobber must-aliased'd pointers.
If some day, we split the aliasing and ordering chains, we'd make this
not aliasing but an ordering barrier (IE it doesn't affect it's
memory, but we can't hoist it above it).
Reviewers: hfinkel, george.burgess.iv
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31865
llvm-svn: 299865
In isUseTriviallyOptimizableToLiveOnEntry, pointsToConstantMemory needs to be
called on the load's pointer operand, not on the result of the load (which
might not even be a pointer).
llvm-svn: 299823
Summary:
getModRefInfo is meant to answer the question "what impact does this
instruction have on a given memory location" (not even another
instruction).
Long debate on this on IRC comes to the conclusion the answer should be "nothing special".
That is, a noalias volatile store does not affect a memory location
just by being volatile. Note: DSE and GVN and memdep currently
believe this, because memdep just goes behind AA's back after it says
"modref" right now.
see line 635 of memdep. Prior to this patch we would get modref there, then check aliasing,
and if it said noalias, we would continue.
getModRefInfo *already* has this same AA check, it just wasn't being used because volatile was
lumped in with ordering.
(I am separately testing whether this code in memdep is now dead except for the invariant load case)
Reviewers: jyknight, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31726
llvm-svn: 299741
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.
From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.
Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>
Differential Revision: https://reviews.llvm.org/D31070
llvm-svn: 299699
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
This is a re-land of r298158 rebased on D31358. This time,
asan.module_ctor is put in a comdat as well to avoid quadratic
behavior in Gold.
llvm-svn: 299697
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
This is a rebase of r298731 which was reverted due to a false alarm.
llvm-svn: 299695
Summary:
Remove all the caching the clobber walker does, and that the
caching walker does. With the patch to enable storing clobbering
access results for stores, i can find no improvement with the cache
turned on (and a number of degradations, both time and memory, from
the cost of caching. For a large program i have, we do millions of
lookups and inserts with zero hits).
I haven't tried to rename or simplify the walker otherwise yet.
(Appreciate some perf testing on this past my own testing)
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31576
llvm-svn: 299578
stores with some fixes.
Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.
Early testing shows that, after this change, for larger testcases, it
will be a significant net positive (memory and time) to remove the
walker caching.
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31567
llvm-svn: 299486
Disable bypassing if one of the operands looks like a hash value. Slow
division often occurs in hashtable implementations and fast division is
never taken there because a hash value is extremely unlikely to have
enough upper bits set to zero.
A value is considered to be hash-like if it is produced by
1) XOR operation
2) Multiplication by a constant wider than the shorter type
3) PHI node with all incoming values being hash-like
Differential Revision: https://reviews.llvm.org/D28200
llvm-svn: 299329
Summary:
This enables us to cache the clobbering access for stores, despite the
fact that we can't rewrite the use-def chains themselves.
Early testing shows that, after this change, for larger testcases, it will be a significant net positive (memory and time) to remove the walker caching.
Reviewers: george.burgess.iv, davide
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31567
llvm-svn: 299322
A common way to implement nearbyint is by fiddling with the floating
point environment and calling rint. This is used at least by the BSD
libm and musl. As such, canonicalizing the latter to the former will
create infinite loops for libm and generally pessimize performance, at
least when the generic C versions are used.
This change preserves the rint in the libcall translation and also
handles the domain truncation logic, so that rint with float argument
will be reduced to rintf etc.
llvm-svn: 299247
Since there is no sdiv in SCEV, an 'udiv' is a better canonical form than an 'sdiv' as the user of induction variable
Differential Revision: https://reviews.llvm.org/D31488
llvm-svn: 299118
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.
The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.
Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.
Differential Revision: https://reviews.llvm.org/D30333
llvm-svn: 298799
This moves it to the iterator facade utilities giving it full random
access semantics, etc. It can also now be used with standard algorithms
like std::all_of and std::any_of and range adaptors like llvm::reverse.
Also make the semantics of iterating match what every other iterator
uses and forbid decrementing past the begin iterator. This was used as
a hacky way to work around iterator invalidation. However, every
instance trying to do this failed to actually avoid touching invalid
iterators despite the clear documentation that the removed and all
subsequent iterators become invalid including the end iterator. So I've
added a return of the next iterator to removeCase and rewritten the
loops that were doing this to correctly follow the iterator pattern of
either incremneting or removing and assigning fresh values to the
iterator and the end.
In one case we were trying to go backwards to make this cleaner but it
doesn't actually work. I've made that code match the code we use
everywhere else to remove cases as we iterate. This changes the order of
cases in one test output and I moved that test to CHECK-DAG so it
wouldn't care -- the order isn't semantically meaningful anyways.
llvm-svn: 298791
Create the constructor in the module pass.
This in needed for the GC-friendly globals change, where the constructor can be
put in a comdat in some cases, but we don't know about that in the function
pass.
llvm-svn: 298731
Summary: Declarations need to be filtered out when counting functions.
Reviewers: eraman
Subscribers: Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D31336
llvm-svn: 298720
Library functions can have specific semantics that affect the behavior of
certain passes. DSE, for instance, gives special treatment to malloc-ed pointers
but not to pointers returned from an equivalently typed (but differently named)
function.
MetaRenamer ought not to alter program semantics, so library functions must
remain untouched.
Reviewers: mehdi_amini, majnemer, chandlerc, davide
Reviewed By: davide
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D31304
llvm-svn: 298659
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.
Rename AttributeSetImpl to AttributeListImpl to follow suit.
It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.
Reviewers: sanjoy, javed.absar, chandlerc, pete
Reviewed By: pete
Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits
Differential Revision: https://reviews.llvm.org/D31102
llvm-svn: 298393
Summary: Inliner should update the branch_weights annotation to scale it to proper value.
Reviewers: davidxl, eraman
Reviewed By: eraman
Subscribers: zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D30767
llvm-svn: 298270
NFCI.
Summary:
This is ground work for the changes to enable coercion in NewGVN.
GVN doesn't care if they end up constant because it eliminates as it goes.
NewGVN cares.
IRBuilder and ConstantFolder deliberately present the same interface,
so we use this to our advantage to templatize our functions to make
them either constant only or not.
Reviewers: davide
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30928
llvm-svn: 298262
When InstCombine calls into SimplifyLibCalls and it createa putChar calls, we don't infer the attributes. And since SimplifyLibCalls doesn't use InstCombine's IRBuilder the calls doesn't end up in the worklist on this iteration of InstCombine. So it gets picked up on the next iteration where it causes an IR change. This of course causes InstCombine to run another iteration.
So this patch just gets the attributes right the first time. We already did this for puts and some other libcalls.
Differential Revision: https://reviews.llvm.org/D31094
llvm-svn: 298171
Use a combination of !associated, comdat, @llvm.compiler.used and
custom sections to allow dead stripping of globals and their asan
metadata. Sometimes.
Currently this works on LLD, which supports SHF_LINK_ORDER with
sh_link pointing to the associated section.
This also works on BFD, which seems to treat comdats as
all-or-nothing with respect to linker GC. There is a weird quirk
where the "first" global in each link is never GC-ed because of the
section symbols.
At this moment it does not work on Gold (as in the globals are never
stripped).
Differential Revision: https://reviews.llvm.org/D30121
llvm-svn: 298158
Users often call getArgumentList().size(), which is a linear way to get
the number of function arguments. arg_size(), on the other hand, is
constant time.
In general, the fact that arguments are stored in an iplist is an
implementation detail, so I've removed it from the Function interface
and moved all other users to the argument container APIs (arg_begin(),
arg_end(), args(), arg_size()).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D31052
llvm-svn: 298010
[Reapplies r297971 and punting on finding a better API for findDbgValues()]
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297994
This patch improves debug info quality in InstCombine by looking at
values that are about to be deleted, checking whether there are any
dbg.value instrinsics referring to them, and potentially encoding the
semantics of the deleted instruction into the dbg.value's
DIExpression.
In the example in the testcase (which was extracted from XNU) there is a sequence of
%4 = load %struct.entry*, %struct.entry** %next2, align 8, !dbg !41
%5 = bitcast %struct.entry* %4 to i8*, !dbg !42
%add.ptr4 = getelementptr inbounds i8, i8* %5, i64 -8, !dbg !43
%6 = bitcast i8* %add.ptr4 to %struct.entry*, !dbg !44
call void @llvm.dbg.value(metadata %struct.entry* %6, i64 0, metadata !20, metadata !21), !dbg 34
When these instructions are eliminated by instcombine one after
another, we can still salvage the otherwise dead debug info:
- Bitcasts have no effect, so have the dbg.value point to operand(0)
- Loads can be expressed via a DW_OP_deref
- Constant gep instructions can be replaced by DWARF expression arithmetic
The API introduced by this patch is not specific to instcombine and
can be useful in other places, too.
rdar://problem/30725338
Differential Revision: https://reviews.llvm.org/D30919
llvm-svn: 297971
Summary:
These are the functions used to determine when values of loads can be
extracted from stores, etc, and to perform the necessary insertions to
do this. There are no changes to the functions themselves except
reformatting, and one case where memdep was informed of a removed load
(which was pushed into the caller).
Reviewers: davide
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30478
llvm-svn: 297438
This reverts commit r296488.
As noted by David Blaikie on llvm-commits, I overlooked the case of a
debug function being inlined into a nodebug function being inlined
into a debug function.
llvm-svn: 297163
Summary:
We should check if loop size allows us to peel at least one iteration
before we do so.
Patch by Max Kazantsev!
Reviewers: sanjoy, mkuper, efriedma
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30632
llvm-svn: 297122
LoopInfo::getLoopFor returns nullptr if a BB is not in a loop and only
then can the loop be updated to contain the newly created BBs. Add the
missing nullptr check to SplitBlockAndInsertIfThen.
Within LLVM, the only user of this function that also passes a LoopInfo
to be updated is InnerLoopVectorizer::predicateInstructions().
As the method's name implies, the BB operataten on will always be within
a loop, but out-of-tree users may also use it differently (here: Polly).
All other uses of LoopInfo::getLoopFor in the file properly check its
return value for nullptr.
llvm-svn: 297016
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Patch by Max Kazantsev!
Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames
Reviewed By: mkuper
Subscribers: mkuper, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30161
llvm-svn: 296898
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Patch by Max Kazantsev!
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Reviewed By: mkuper
Subscribers: mkuper, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30243
llvm-svn: 296897
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.
With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:
1) Both operands are known to be short enough. Then, the long division can be
simply replaced with a short one without CFG modification.
2) If a division is unsigned and the dividend is known to be short then the
long division is not needed at all. Because if the divisor is too big for
short division then the quotient is obviously zero (and the remainder is
equal to the dividend). Actually, the division is not needed when
(divisor > dividend).
Differential Revision: https://reviews.llvm.org/D29897
llvm-svn: 296832
The most important goal of the patch is to break large insertFastDiv function
into separate pieces, so that later a different fast insertion logic can be
implemented using some of these pieces.
Differential Revision: https://reviews.llvm.org/D29896
llvm-svn: 296828
The LLVM backend cannot produce any debug info for an llvm::Function
without a DISubprogram attachment. When inlining a debug-info-carrying
function into a nodebug function, there is therefore no reason to keep
any debug info intrinsic calls or debug locations on the instructions.
This fixes a problem discovered in PR32042.
rdar://problem/30679307
llvm-svn: 296488
This was suggested in D27855: have the inliner add assumptions, so we don't
lose nonnull info provided by argument attributes.
This still doesn't solve PR28430 (dyn_cast), but this gets us closer.
https://reviews.llvm.org/D29999
llvm-svn: 296366
Summary:
Depends on D29606 and D29682
Makes us pass GVN's edge.ll (we also will pass a few other testcases
they just need cleaning up).
Thoughts on the Predicate* hiearchy of classes especially welcome :)
(it's not clear to me how best to organize it, and currently, the getBlock* seems ... uglier than maybe wasting a field somewhere or something).
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29747
llvm-svn: 295889
Add updater to passes that now need it.
Move around code in MemorySSA to expose needed functions.
Summary: Mostly cleanup
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30221
llvm-svn: 295887
Summary:
This lets one add aliasing stores to the updater.
(i'm next going to move the creation/etc functions to the updater)
Reviewers: george.burgess.iv
Subscribers: llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D30154
llvm-svn: 295677
Summary:
JumpThreading for guards feature has been reverted at https://reviews.llvm.org/rL295200
due to the following problem: the feature used the following algorithm for detection of
diamond patters:
1. Find a block with 2 predecessors;
2. Check that these blocks have a common single parent;
3. Check that the parent's terminator is a branch instruction.
The problem is that these checks are insufficient. They may pass for a non-diamond
construction in case if those two predecessors are actually the same block. This may
happen if parent's terminator is a br (either conditional or unconditional) to a block
that ends with "switch" instruction with exactly two branches going to one block.
This patch re-enables the JumpThreading for guards and fixes this issue by adding the
check that those found predecessors are actually different blocks. This guarantees that
parent's terminator is a conditional branch with exactly 2 different successors, which
is now ensured by assertions. It also adds two more tests for this situation (with parent's
terminator being a conditional and an unconditional branch).
Patch by Max Kazantsev!
Reviewers: anna, sanjoy, reames
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D30036
llvm-svn: 295410
Multiple blocks in the callee can be mapped to a single cloned block
since we prune the callee as we clone it. The existing code
iterates over the value map and clones the block frequency (and
eventually scales the frequencies of the cloned blocks). Value map's
iteration is not deterministic and so the cloned block might get the
frequency of any of the original blocks. The fix is to set the max of
the original frequencies to the cloned block. The first block in the
sequence must have this max frequency and, in the call context,
subsequent blocks must have its frequency.
Differential Revision: https://reviews.llvm.org/D29696
llvm-svn: 295115
Summary:
When setting debugloc for instructions created in SplitBlockPredecessors, current implementation copies debugloc from the first-non-phi instruction of the original basic block. However, if the first-non-phi instruction is a call for @llvm.dbg.value, the debugloc of the instruction may point the location outside of the block itself. For the example code of
```
1 typedef struct _node_t {
2 struct _node_t *next;
3 } node_t;
4
5 extern node_t *root;
6
7 int foo() {
8 node_t *node, *tmp;
9 int ret = 0;
10
11 node = tmp = root->next;
12 while (node != root) {
13 while (node) {
14 tmp = node;
15 node = node->next;
16 ret++;
17 }
18 }
19
20 return ret;
21 }
```
, below is the basicblock corresponding to line 12 after Reassociate expressions pass:
```
while.cond: ; preds = %while.cond2, %entry
%node.0 = phi %struct._node_t* [ %1, %entry ], [ null, %while.cond2 ]
%ret.0 = phi i32 [ 0, %entry ], [ %ret.1, %while.cond2 ]
tail call void @llvm.dbg.value(metadata i32 %ret.0, i64 0, metadata !19, metadata !20), !dbg !21
tail call void @llvm.dbg.value(metadata %struct._node_t* %node.0, i64 0, metadata !11, metadata !20), !dbg !31
%cmp = icmp eq %struct._node_t* %node.0, %0, !dbg !33
br i1 %cmp, label %while.end5, label %while.cond2, !dbg !35
```
As you can see, the first-non-phi instruction is a call for @llvm.dbg.value, and the debugloc is
```
!21 = !DILocation(line: 9, column: 7, scope: !6)
```
, which is a definition of 'ret' variable and outside of the scope of the basicblock itself. However, current implementation picks up this debugloc for the instructions created in SplitBlockPredecessors. This patch addresses this problem by picking up debugloc from the first-non-phi-non-dbg instruction.
Reviewers: dblaikie, samsonov, eugenis
Reviewed By: eugenis
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29867
llvm-svn: 295106
Summary:
This adds support for placing predicateinfo such that it affects critical edges.
This fixes the issues mentioned by Nuno on the mailing list.
Depends on D29519
Reviewers: davide, nlopes
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29606
llvm-svn: 294921
Summary:
This patch starts the implementation as discuss in the following RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html
When optimization duplicates code that will scale down the execution count of a basic block, we will record the duplication factor as part of discriminator so that the offline process tool can find the duplication factor and collect the accurate execution frequency of the corresponding source code. Two important optimization that fall into this category is loop vectorization and loop unroll. This patch records the duplication factor for these 2 optimizations.
The recording will be guarded by a flag encode-duplication-in-discriminators, which is off by default.
Reviewers: probinson, aprantl, davidxl, hfinkel, echristo
Reviewed By: hfinkel
Subscribers: mehdi_amini, anemet, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26420
llvm-svn: 294782
Summary:
This patch allows JumpThreading also thread through guards.
Virtually, guard(cond) is equivalent to the following construction:
if (cond) { do something } else {deoptimize}
Yet it is not explicitly converted into IFs before lowering.
This patch enables early threading through guards in simple cases.
Currently it covers the following situation:
if (cond1) {
// code A
} else {
// code B
}
// code C
guard(cond2)
// code D
If there is implication cond1 => cond2 or !cond1 => cond2, we can transform
this construction into the following:
if (cond1) {
// code A
// code C
} else {
// code B
// code C
guard(cond2)
}
// code D
Thus, removing the guard from one of execution branches.
Patch by Max Kazantsev!
Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29620
llvm-svn: 294617
Summary:
This patch adds a utility to build extended SSA (see "ABCD: eliminating
array bounds checks on demand"), and an intrinsic to support it. This
is then used to get functionality equivalent to propagateEquality in
GVN, in NewGVN (without having to replace instructions as we go). It
would work similarly in SCCP or other passes. This has been talked
about a few times, so i built a real implementation and tried to
productionize it.
Copies are inserted for operands used in assumes and conditional
branches that are based on comparisons (see below for more)
Every use affected by the predicate is renamed to the appropriate
intrinsic result.
E.g.
%cmp = icmp eq i32 %x, 50
br i1 %cmp, label %true, label %false
true:
ret i32 %x
false:
ret i32 1
will become
%cmp = icmp eq i32, %x, 50
br i1 %cmp, label %true, label %false
true:
; Has predicate info
; branch predicate info { TrueEdge: 1 Comparison: %cmp = icmp eq i32 %x, 50 }
%x.0 = call @llvm.ssa_copy.i32(i32 %x)
ret i32 %x.0
false:
ret i23 1
(you can use -print-predicateinfo to get an annotated-with-predicateinfo dump)
This enables us to easily determine what operations are affected by a
given predicate, and how operations affected by a chain of
predicates.
Reviewers: davide, sanjoy
Subscribers: mgorny, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D29519
Update for review comments
Fix a bug Nuno noticed where we are giving information about and/or on edges where the info is not useful and easy to use wrong
Update for review comments
llvm-svn: 294351
The importer was previously using ModuleLinker in a sort of "IRMover mode". Use
IRMover directly instead in order to remove a level of indirection.
I will remove all importing support from ModuleLinker in a separate
change.
Differential Revision: https://reviews.llvm.org/D29468
llvm-svn: 294014
Summary:
I have a similar patch up for review already (D29173). If you prefer I
can squash them both together.
Also I think there more potential for code sharing between
LoopUnroll.cpp and LoopUnrollRuntime.cpp. Do you think patches for
that would be worthwhile?
Reviewers: mkuper, mzolotukhin
Reviewed By: mkuper, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29311
llvm-svn: 293758
Summary:
rL293124 added the necessary infrastructure to properly add the cloned
top level loop to LoopInfo, which means we do not have to do it manually
in CloneLoopBlocks.
@mkuper sorry for not pointing this out during my review of D29156, I just
realized that today.
Reviewers: mzolotukhin, chandlerc, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits, mkuper
Differential Revision: https://reviews.llvm.org/D29173
llvm-svn: 293615
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.
Reviewers: aprantl, andreadb, danielcdh
Reviewed By: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29062
llvm-svn: 293372
Summary: Extend the MemorySSAUpdater API to allow movement to arbitrary places
Reviewers: davide, george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29239
llvm-svn: 293363
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html
For reference:
- Public headers should just declare the dump() method but not use
LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void MyClass::dump() {
// print stuff to dbgs()...
}
#endif
llvm-svn: 293359
insertUse, moveBefore and moveAfter operations.
Summary:
This creates a basic MemorySSA updater that handles arbitrary
insertion of uses and defs into MemorySSA, as well as arbitrary
movement around the CFG. It replaces the current splice API.
It can be made to handle arbitrary control flow changes.
Currently, it uses the same updater algorithm from D28934.
The main difference is because MemorySSA is single variable, we have
the complete def and use list, and don't need anyone to give it to us
as part of the API. We also have to rename stores below us in some
cases.
If we go that direction in that patch, i will merge all the updater
implementations (using an updater_traits or something to provide the
get* functions we use, called read*/write* in that patch).
Sadly, the current SSAUpdater algorithm is way too slow to use for
what we are doing here.
I have updated the tests we have to basically build memoryssa
incrementally using the updater api, and make sure it still comes out
the same.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29047
llvm-svn: 293356
Summary:
Some frontends emit a speculate-and-select idiom for sqrt, wherein they compute
sqrt(x), check if x is negative, and select NaN if it is:
%cmp = fcmp olt double %a, -0.000000e+00
%sqrt = call double @llvm.sqrt.f64(double %a)
%ret = select i1 %cmp, double 0x7FF8000000000000, double %sqrt
This is technically UB as the LangRef is written today if %a is ever less than
-0. But emitting code that's compliant with the current definition of sqrt
would require a branch, which would then prevent us from matching this idiom in
SelectionDAG (which we do today -- ISD::FSQRT has defined behavior on negative
inputs), because SelectionDAG looks at one BB at a time.
Nothing in LLVM takes advantage of this undefined behavior, as far as we can
tell, and the fact that llvm.sqrt has UB dates from its initial addition to the
LangRef.
Reviewers: arsenm, mehdi_amini, hfinkel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D28797
llvm-svn: 293242
Even when we don't create a remainder loop (that is, when we unroll by 2), we
may duplicate nested loops into the remainder. This is complicated by the fact
the remainder may itself be either inserted into an outer loop, or at the top
level. In the latter case, we may need to create new top-level loops.
Differential Revision: https://reviews.llvm.org/D29156
llvm-svn: 293124
Summary:
This is the first in a series of patches to add a simple, generalized updater to MemorySSA.
For MemorySSA, every def is may-def, instead of the normal must-def.
(the best way to think of memoryssa is "everything is really one variable, with different versions of that variable at different points in the program).
This means when updating, we end up having to do a bunch of work to touch defs below and above us.
In order to support this quickly, i have ilist'd all the defs for each block. ilist supports tags, so this is quite easy. the only slightly messy part is that you can't have two iplists for the same type that differ only whether they have the ownership part enabled or not, because the traits are for the value type.
The verifiers have been updated to test that the def order is correct.
Reviewers: george.burgess.iv
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29046
llvm-svn: 293085
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.
For example, SimplifyCFG used to do the following transformation:
(before)
if.then:
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
br label %if.end
if.else:
%1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
br label %if.end
(after)
%.sink = select i1 %tobool, i32 6, i32 8
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".
rdar://problem/30110806
Differential Revision: https://reviews.llvm.org/D29111
llvm-svn: 293025
a lazy-asserting PoisoningVH.
AssertVH is fundamentally incompatible with cache-invalidation of
analysis results. The invaliadtion happens after the AssertingVH has
already fired. Instead, use a PoisoningVH that will assert if the
dangling handle is ever used rather than merely be assigned or
destroyed.
This patch also removes all of the (numerous) doomed attempts to work
around this fundamental incompatibility. It is a pretty significant
simplification IMO.
The most interesting change is in the Inliner where we still do some
clearing because we don't want to rely on the coarse grained
invalidation strategy of the containing pass manager. However, I prefer
the approach that contains this logic to the cleanup phase of the
Inliner, and I think we could enhance the CGSCC analysis management
layer to make this even better in the future if desired.
The rest is straight cleanup.
I've also added a test for one of the harder cases to work around: when
a *module analysis* contains many AssertingVHes pointing at functions.
Differential Revision: https://reviews.llvm.org/D29006
llvm-svn: 292928
With this change dominator tree remains in sync after each step of loop
peeling.
Differential Revision: https://reviews.llvm.org/D29029
llvm-svn: 292895
Running non-LCSSA-preserving LoopSimplify followed by LCSSA on (roughly) the
same loop is incorrect, since LoopSimplify may break LCSSA arbitrarily higher
in the loop nest. Instead, run LCSSA first, and then run LCSSA-preserving
LoopSimplify on the result.
This fixes PR31718.
Differential Revision: https://reviews.llvm.org/D29055
llvm-svn: 292854
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
the library routine shared with the new PM and other code.
This assert checks that when LCSSA preservation is requested we start in
LCSSA form. Without this early assert, given *very* complex test cases
we can hit an assert or crash much later on when trying to preserve
LCSSA.
The new PM's loop simplify doesn't need to (and indeed can't) preserve
LCSSA as the new PM doesn't deal in transforms in the dependency graph.
But we asked the library to and shockingly, this didn't work very well!
Stop doing that. Now the assert will tell us immediately with existing
test cases. Before this, it took a pretty convoluted input to trigger
this.
However, sinking the assert also found a bug in LoopUnroll where we
asked simplifyLoop to preserve LCSSA *right before we reform it*. That's
kinda silly and unsurprising that it wasn't available. =D Stop doing
that too.
We also would assert that the unrolled loop was in LCSSA even if
preserving LCSSA was never requested! I don't have a test case or
anything here. I spotted it by inspection and it seems quite obvious. No
logic change anyways, that's just avoiding a spurrious assert.
llvm-svn: 292710
This adds the following to the new PM based inliner in PGO mode:
* Use block frequency analysis to derive callsite's profile count and use
that to adjust thresholds of hot and cold callsites.
* Incrementally update the BFI of the caller after a callee gets inlined
into it. This incremental update is only within an invocation of the run
method - BFI is not preserved across calls to run.
Update the function entry count of the callee after inlining it into a
caller.
* I've tuned the thresholds for the hot and cold callsites using a hacked
up version of the old inliner that explicitly computes BFI on a set of
internal benchmarks and spec. Once the new PM based pipeline stabilizes
(IIRC Chandler mentioned there are known issues) I'll benchmark this
again and adjust the thresholds if required.
Inliner PGO support.
Differential revision: https://reviews.llvm.org/D28331
llvm-svn: 292666
Mostly straightforward changes; we just didn't do the computation before.
One sort of interesting change in LoopUnroll.cpp: we weren't handling
dominance for children of the loop latch correctly, but
foldBlockIntoPredecessor hid the problem for complete unrolling.
Currently punting on loop peeling; made some minor changes to isolate
that problem to LoopUnrollPeel.cpp.
Adds a flag -unroll-verify-domtree; it verifies the domtree immediately
after we finish updating it. This is on by default for +Asserts builds.
Differential Revision: https://reviews.llvm.org/D28073
llvm-svn: 292447
We currently check whether a reduction has a single outside user. We don't
really need to require that - we just need to make sure a single value is
used externally. The number of external users of that value shouldn't actually
matter.
Differential Revision: https://reviews.llvm.org/D28830
llvm-svn: 292424
Add missing fabs(fpext) optimzation that worked with the call,
and also fixes it creating a second fpext when there were multiple
uses.
llvm-svn: 292172
a function's CFG when that CFG is unchanged.
This allows transformation passes to simply claim they preserve the CFG
and analysis passes to check for the CFG being preserved to remove the
fanout of all analyses being listed in all passes.
I've gone through and removed or cleaned up as many of the comments
reminding us to do this as I could.
Differential Revision: https://reviews.llvm.org/D28627
llvm-svn: 292054
Summary:
This fixes Transforms/LoopUnroll/runtime-loop3.ll which failed with
EXTENSIVE_DEBUG, because the cloned basic blocks were not added to the
correct sub-loops in LoopUnrollRuntime.cpp.
Reviewers: dexonsmith, mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28482
llvm-svn: 291619
Move the code to update LoopInfo for cloned basic blocks to
addClonedBlockToLoopInfo, as suggested in
https://reviews.llvm.org/D28482.
llvm-svn: 291614
Bail out instead of asserting when we encounter this situation,
which can actually happen.
The reason the test uses the new PM is that the "bad" phi, incidentally, gets
cleaned up by LoopSimplify. But LICM can create this kind of phi and preserve
loop simplify form, so the cleanup has no chance to run.
This fixes PR31190.
We may want to solve this in a less conservative manner, since this phi is
actually uniform within the inner loop (or we may want LICM to output a cleaner
promotion to begin with).
Differential Revision: https://reviews.llvm.org/D28490
llvm-svn: 291589
fabs(x * x) is not generally safe to assume x is positive if x is a NaN.
This is also less general than it could be, so this will be replaced
with a transformation on the intrinsic.
llvm-svn: 291359
Summary:
r285871 introduced an assert that was overly aggressive in the case
of a same-named local in different same-named files (in different
directories), where the source name and therefore the GUID ended up
the same because the files were compiled in their own directory without
any leading path. Change the handling in the promotion logic to get
the summary for the version in that module.
This also exposed an issue where we are not always importing the
right copy, which is a performance not correctness issue (because
the renaming is based on the module hash which must be different,
see the bug report for details). I will fix that as a follow-on.
Fixes PR31561.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28411
llvm-svn: 291304
Summary:
This adds a new summary flag NotEligibleToImport that subsumes
several existing flags (NoRename, HasInlineAsmMaybeReferencingInternal
and IsNotViableToInline). It also subsumes the checking of references
on the summary that was being done during the thin link by
eligibleForImport() for each candidate. It is much more efficient to
do that checking once during the per-module summary build and record
it in the summary.
Reviewers: mehdi_amini
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28169
llvm-svn: 291108
This reapplies r289828 (reverted in r289833 as it broke the address sanitizer). The
debugloc is now only set when the instruction is not a call, as this causes the
verifier to assert (the inliner requires an inlinable callsite to have a debug loc
if the caller and callee have debug info).
Original commit message:
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Original review: https://reviews.llvm.org/D27590
llvm-svn: 290973
Summary:
Regardless how the loop body weight is distributed, we should preserve
total loop body weight. i.e. we should have same weight reaching the body of the loop
or its duplicates in peeled and unpeeled case.
Reviewers: mkuper, davidxl, anemet
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28179
llvm-svn: 290833
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.
Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.
I've also added a short circuit to the scan for call sites based on what
other code is doing.
With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.
I'll make my way through most of the other inliner test cases
just to get some easy coverage next.
llvm-svn: 290562
removing fully-dead comdats without removing dead entries in comdats
with live members.
This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).
llvm-svn: 290557
Make it clear that TripCount is the upper bound of the iteration on which
control exits LatchBlock.
Differential Revision: https://reviews.llvm.org/D26675
llvm-svn: 290199
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilties. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
Lowering to llvm.cttz() will result in constant folding anyway
if the argument to ffs is a constant. Pointed out by Eli for
fls() in D14590.
llvm-svn: 289888
Now that a new API to merge debug locations has been committed at r289661 (see
review D26256 for more details), we can use it to "improve" the code added by
revision r280995.
Instead of nulling the debugloc of a commoned instruction, we use the 'merged'
debug location. At the moment, this is just a no functional change since
function `DILocation::getMergedLocation()` is just a stub and would always
return a null location.
Differential Revision: https://reviews.llvm.org/D27804
llvm-svn: 289862
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
After r289755, the AssumptionCache is no longer needed. Variables affected by
assumptions are now found by using the new operand-bundle-based scheme. This
new scheme is more computationally efficient, and also we need much less
code...
llvm-svn: 289756
At least the plugin used by the LibreOffice build
(<https://wiki.documentfoundation.org/Development/Clang_plugins>) indirectly
uses those members (through inline functions in LLVM/Clang include files in turn
using them), but they are not exported by utils/extract_symbols.py on Windows,
and accessing data across DLL/EXE boundaries on Windows is generally
problematic.
Differential Revision: https://reviews.llvm.org/D26671
llvm-svn: 289647
When a function F is inlined, InlineFunction extends the debug location of every
instruction inlined from F by adding an InlinedAt.
However, if an instruction has a 'null' debug location, InlineFunction would
propagate the callsite debug location to it. This behavior existed since
revision 210459.
Revision 210459 was originally committed specifically to workaround the lack of
debug information for instructions inlined from intrinsic functions (which are
usually declared with attributes `__always_inline__, __nodebug__`).
The problem with revision 210459 is that it doesn't make any sort of distinction
between instructions inlined from a 'nodebug' function and instructions which
are inlined from a function built with debug info. This issue may lead to
incorrect stepping in the debugger.
This patch works under the assumption that a nodebug function does not have a
DISubprogram. When a function F is inlined into another function G,
InlineFunction checks if F has debug info associated with it.
For nodebug functions, the InlineFunction logic is unchanged (i.e. it would
still propagate the callsite debugloc to the inlined instructions). Otherwise,
InlineFunction no longer propagates the callsite debug location.
Differential Revision: https://reviews.llvm.org/D27462
llvm-svn: 288895
so we can stop using DW_OP_bit_piece with the wrong semantics.
The entire back story can be found here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161114/405934.html
The gist is that in LLVM we've been misinterpreting DW_OP_bit_piece's
offset field to mean the offset into the source variable rather than
the offset into the location at the top the DWARF expression stack. In
order to be able to fix this in a subsequent patch, this patch
introduces a dedicated DW_OP_LLVM_fragment operation with the
semantics that we used to apply to DW_OP_bit_piece, which is what we
actually need while inside of LLVM. This patch is complete with a
bitcode upgrade for expressions using the old format. It does not yet
fix the DWARF backend to use DW_OP_bit_piece correctly.
Implementation note: We discussed several options for implementing
this, including reserving a dedicated field in DIExpression for the
fragment size and offset, but using an custom operator at the end of
the expression works just fine and is more efficient because we then
only pay for it when we need it.
Differential Revision: https://reviews.llvm.org/D27361
rdar://problem/29335809
llvm-svn: 288683
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.
Differential Revision: https://reviews.llvm.org/D27122
llvm-svn: 288464
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458
Summary:
We were doing an optimization in the ThinLTO backends of importing
constant unnamed_addr globals unconditionally as a local copy (regardless
of whether the thin link decided to import them). This should be done in
the thin link instead, so that resulting exported references are marked
and promoted appropriately, but will need a summary enhancement to mark
these variables as constant unnamed_addr.
The function import logic during the thin link was trying to handle
this proactively, by conservatively marking all values referenced in
the initializer lists of exported global variables as also exported.
However, this only handled values referenced directly from the
initializer list of an exported global variable. If the value is itself
a constant unnamed_addr variable, we could end up exporting its
references as well. This caused multiple issues. The first is that the
transitively exported references weren't promoted. Secondly, some could
not be promoted/renamed (e.g. they had a section or other constraint).
recursively, instead of just adding the first level of initializer list
references to the ExportList directly.
Remove this optimization and the associated handling in the function
import backend. SPEC measurements indicate we weren't getting much
from it in any case.
Fixes PR31052.
Reviewers: mehdi_amini
Subscribers: krasin, llvm-commits
Differential Revision: https://reviews.llvm.org/D26880
llvm-svn: 288446
This implements PGO-driven loop peeling.
The basic idea is that when the average dynamic trip-count of a loop is known,
based on PGO, to be low, we can expect a performance win by peeling off the
first several iterations of that loop.
Unlike unrolling based on a known trip count, or a trip count multiple, this
doesn't save us the conditional check and branch on each iteration. However,
it does allow us to simplify the straight-line code we get (constant-folding,
etc.). This is important given that we know that we will usually only hit this
code, and not the actual loop.
This is currently disabled by default.
Differential Revision: https://reviews.llvm.org/D25963
llvm-svn: 288274
analyses to have a common type which is enforced rather than using
a char object and a `void *` type when used as an identifier.
This has a number of advantages. First, it at least helps some of the
confusion raised in Justin Lebar's code review of why `void *` was being
used everywhere by having a stronger type that connects to documentation
about this.
However, perhaps more importantly, it addresses a serious issue where
the alignment of these pointer-like identifiers was unknown. This made
it hard to use them in pointer-like data structures. We were already
dodging this in dangerous ways to create the "all analyses" entry. In
a subsequent patch I attempted to use these with TinyPtrVector and
things fell apart in a very bad way.
And it isn't just a compile time or type system issue. Worse than that,
the actual alignment of these pointer-like opaque identifiers wasn't
guaranteed to be a useful alignment as they were just characters.
This change introduces a type to use as the "key" object whose address
forms the opaque identifier. This both forces the objects to have proper
alignment, and provides type checking that we get it right everywhere.
It also makes the types somewhat less mysterious than `void *`.
We could go one step further and introduce a truly opaque pointer-like
type to return from the `ID()` static function rather than returning
`AnalysisKey *`, but that didn't seem to be a clear win so this is just
the initial change to get to a reliably typed and aligned object serving
is a key for all the analyses.
Thanks to Richard Smith and Justin Lebar for helping pick plausible
names and avoid making this refactoring many times. =] And thanks to
Sean for the super fast review!
While here, I've tried to move away from the "PassID" nomenclature
entirely as it wasn't really helping and is overloaded with old pass
manager constructs. Now we have IDs for analyses, and key objects whose
address can be used as IDs. Where possible and clear I've shortened this
to just "ID". In a few places I kept "AnalysisID" to make it clear what
was being identified.
Differential Revision: https://reviews.llvm.org/D27031
llvm-svn: 287783
This patch fixes the non-determinism caused due to iterating SmallPtrSet's
which was uncovered due to the experimental "reverse iteration order " patch:
https://reviews.llvm.org/D26718
The following unit tests failed because of the undefined order of iteration.
LLVM :: Transforms/Util/MemorySSA/cyclicphi.ll
LLVM :: Transforms/Util/MemorySSA/many-dom-backedge.ll
LLVM :: Transforms/Util/MemorySSA/many-doms.ll
LLVM :: Transforms/Util/MemorySSA/phi-translation.ll
Reviewers: dberlin, mgrang
Subscribers: dberlin, llvm-commits, david2050
Differential Revision: https://reviews.llvm.org/D26704
llvm-svn: 287563