This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.
The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.
Review: http://reviews.llvm.org/D3462
Patch by Jingyue Wu.
llvm-svn: 207783
=[
Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.
llvm-svn: 207746
Not all address taken blocks get inlined. The reason is
that a blocks new address is known only when it is cloned. But e.g.
a branch instruction in a different block could need that address earlier
while it gets cloned. The solution is to collect the set of all
blocks that can potentially get inlined and compute a new block address
up front. Then clone and cleanup.
rdar://16427209
llvm-svn: 207713
The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect.
The _256 variants have indexes into the individual 128 bit lanes and in all
cases it also has to mask out unused bits.
llvm-svn: 207577
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.
Added tests to exercise some loop remarks.
llvm-svn: 207574
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D3456
llvm-svn: 207528
clang directly from the LLVM test suite! That doesn't work. I've
followed up on the review thread to try and get a viable solution sorted
out, but trying to get the tree clean here.
llvm-svn: 207462
mapping from a basic block to an incoming value, either for removal or
just lookup, is linear in the number of predecessors, and we were doing
this for every entry in the 'Preds' list which is in many cases almost
all of them!
Unfortunately, the fixes are quite ugly. PHI nodes just don't make this
operation easy. The efficient way to fix this is to have a clever
'remove_if' operation on PHI nodes that lets us do a single pass over
all the incoming values of the original PHI node, extracting the ones we
care about. Then we could quickly construct the new phi node from this
list. This would remove the remaining underlying quadratic movement of
unrelated incoming values and the need for silly backwards looping to
"minimize" how often we hit the quadratic case.
This is the last obvious fix for PR19499. It shaves another 20% off the
compile time for me, and while UpdatePHINodes remains in the profile,
most of the time is now stemming from the well known inefficiencies of
LVI and jump threading.
llvm-svn: 207409
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.
rdar://16679376
Repaired r207302.
llvm-svn: 207309
right intrinsics.
A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.
llvm-svn: 207299
Consider this use from the new testcase:
LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
reg({1000,+,-1}<nw><%for.body>)
-3003 + reg({3,+,3}<nw><%for.body>)
-1001 + reg({1,+,1}<nuw><nsw><%for.body>)
-1000 + reg({0,+,1}<nw><%for.body>)
-3000 + reg({0,+,3}<nuw><%for.body>)
reg({-1000,+,1}<nw><%for.body>)
reg({-3000,+,3}<nsw><%for.body>)
This is the last use we consider for a solution in SolveRecurse, so CurRegs is
a large set. (CurRegs is the set of registers that are needed by the
previously visited uses in the in-progress solution.)
ReqRegs is {
{3,+,3}<nw><%for.body>,
{1,+,1}<nuw><nsw><%for.body>
}
This is the intersection of the regs used by any of the formulas for the
current use and CurRegs.
Now, the code requires a formula to contain *all* these regs (the comment is
simply wrong), otherwise the formula is immediately disqualified. Obviously,
no formula for this use contains two regs so they will all get disqualified.
The fix modifies the check to allow the formula in this case. The idea is
that neither of these formulae is introducing any new registers which is the
point of this early pruning as far as I understand.
In terms of set arithmetic, we now allow formulas whose used regs are a subset
of the required regs not just the other way around.
There are few more loops in the test-suite that are now successfully LSRed. I
have benchmarked those and found very minimal change.
Fixes <rdar://problem/13965777>
llvm-svn: 207271
buildbot - do not insert debug intrinsics before phi nodes.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207269
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit. First, update the users.
<rdar://problem/14292693>
llvm-svn: 207252
override the default cold threshold.
When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.
llvm-svn: 207245
Debug info: Let dbg.values inserted by LowerDbgDeclare inherit the location
of the dbg.value. This gets rid of tons of redundant variable DIEs in
subscopes.
rdar://problem/14874886, rdar://problem/16679936
llvm-svn: 207236
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207235
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207165
described by DBG_VALUEs during their lifetime.
Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.
This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine-intrinsics testcase and included source
rdar://problem/16679879
http://reviews.llvm.org/D3374
llvm-svn: 207130
This excludes avx512 as I don't have hardware to verify. It excludes _dq
variants because they are represented in the IR as <{2,4} x i64> when it's
actually a byte shift of the entire i{128,265}.
This also excludes _dq_bs as they aren't at all supported by the backend.
There are also no corresponding instructions in the ISA. I have no idea why
they exist...
llvm-svn: 207058
Summary:
Since the upper 64 bits of the destination register are undefined when
performing this operation, we can substitute it and let the optimizer
figure out that only a copy is needed.
Also added range merging, if an instruction copies a range that can be
merged with a previous copied range.
Added test cases for both optimizations.
Reviewers: grosbach, nadav
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3357
llvm-svn: 207055
Don't replace shifts greater than the type with the maximum shift.
This isn't hit anywhere in the tests, and somewhere else is replacing
these with undef.
llvm-svn: 207000
Use -stats to see how many loops were analyzed for possible vectorization and how many of them were actually vectorized.
Patch by Zinovy Nis
Differential Revision: http://reviews.llvm.org/D3438
llvm-svn: 206956
In the case where the constant comes from a cloned cast instruction, the
materialization code has to go before the cloned cast instruction.
This commit fixes the method that finds the materialization insertion point
by making it aware of this case.
This fixes <rdar://problem/15532441>
llvm-svn: 206913
definition below all of the header #include lines, lib/Transforms/...
edition.
This one is tricky for two reasons. We again have a couple of passes
that define something else before the includes as well. I've sunk their
name macros with the DEBUG_TYPE.
Also, InstCombine contains headers that need DEBUG_TYPE, so now those
headers #define and #undef DEBUG_TYPE around their code, leaving them
well formed modular headers. Fixing these headers was a large motivation
for all of these changes, as "leaky" macros of this form are hard on the
modules implementation.
llvm-svn: 206844
behavior based on other files defining DEBUG_TYPE, which means it cannot
define DEBUG_TYPE at all. This is actually better IMO as it forces folks
to define relevant DEBUG_TYPEs for their files. However, it requires all
files that currently use DEBUG(...) to define a DEBUG_TYPE if they don't
already. I've updated all such files in LLVM and will do the same for
other upstream projects.
This still leaves one important change in how LLVM uses the DEBUG_TYPE
macro going forward: we need to only define the macro *after* header
files have been #include-ed. Previously, this wasn't possible because
Debug.h required the macro to be pre-defined. This commit removes that.
By defining DEBUG_TYPE after the includes two things are fixed:
- Header files that need to provide a DEBUG_TYPE for some inline code
can do so by defining the macro before their inline code and undef-ing
it afterward so the macro does not escape.
- We no longer have rampant ODR violations due to including headers with
different DEBUG_TYPE definitions. This may be mostly an academic
violation today, but with modules these types of violations are easy
to check for and potentially very relevant.
Where necessary to suppor headers with DEBUG_TYPE, I have moved the
definitions below the includes in this commit. I plan to move the rest
of the DEBUG_TYPE macros in LLVM in subsequent commits; this one is big
enough.
The comments in Debug.h, which were hilariously out of date already,
have been updated to reflect the recommended practice going forward.
llvm-svn: 206822
With a constant mask a vpermil* is just a shufflevector. This patch implements
that simplification. This allows us to produce denser code. It should also
allow more folding down the line.
llvm-svn: 206801
The -tailcallelim pass should be checking if byval or inalloca args can
be captured before marking calls as tail calls. This was the real root
cause of PR7272.
With a better fix in place, revert the inliner change from r105255. The
test case it introduced still passes and has been moved to
test/Transforms/Inline/byval-tail-call.ll.
Reviewers: chandlerc
Differential Revision: http://reviews.llvm.org/D3403
llvm-svn: 206789
header files and into the cpp files.
These files will require more touches as the header files actually use
DEBUG(). Eventually, I'll have to introduce a matched #define and #undef
of DEBUG_TYPE for the header files, but that comes as step N of many to
clean all of this up.
llvm-svn: 206777
This flag replaces inline instrumentation for checks and origin stores with
calls into MSan runtime library. This is a workaround for PR17409.
Disabled by default.
llvm-svn: 206585
Summary:
This prevents the discriminator generation pass from triggering if
the DWARF version being used in the module is prior to 4.
Reviewers: echristo, dblaikie
CC: llvm-commits
Differential Revision: http://reviews.llvm.org/D3413
llvm-svn: 206507
Implement DebugInfoVerifier, which steals verification relying on
DebugInfoFinder from Verifier.
- Adds LegacyDebugInfoVerifierPassPass, a ModulePass which wraps
DebugInfoVerifier. Uses -verify-di command-line flag.
- Change verifyModule() to invoke DebugInfoVerifier as well as
Verifier.
- Add a call to createDebugInfoVerifierPass() wherever there was a
call to createVerifierPass().
This implementation as a module pass should sidestep efficiency issues,
allowing us to turn debug info verification back on.
<rdar://problem/15500563>
llvm-svn: 206300
If multiplication involves zero-extended arguments and the result is
compared as in the patterns:
%mul32 = trunc i64 %mul64 to i32
%zext = zext i32 %mul32 to i64
%overflow = icmp ne i64 %mul64, %zext
or
%overflow = icmp ugt i64 %mul64 , 0xffffffff
then the multiplication may be replaced by call to umul.with.overflow.
This change fixes PR4917 and PR4918.
Differential Revision: http://llvm-reviews.chandlerc.com/D2814
llvm-svn: 206137
Also updated as many loops as I could find using df_begin/idf_begin -
strangely I found no uses of idf_begin. Is that just used out of tree?
Also a few places couldn't use df_begin because either they used the
member functions of the depth first iterators or had specific ordering
constraints (I added a comment in the latter case).
Based on a patch by Jim Grosbach. (Jim - you just had iterator_range<T>
where you needed iterator_range<idf_iterator<T>>)
llvm-svn: 206016
Summary:
This patch adds backend support for -Rpass=, which indicates the name
of the optimization pass that should emit remarks stating when it
made a transformation to the code.
Pass names are taken from their DEBUG_NAME definitions.
When emitting an optimization report diagnostic, the lack of debug
information causes the diagnostic to use "<unknown>:0:0" as the
location string.
This is the back end counterpart for
http://llvm-reviews.chandlerc.com/D3226
Reviewers: qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3227
llvm-svn: 205774