Commit Graph

11410 Commits

Author SHA1 Message Date
Benjamin Kramer 9130cb8547 LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
Otherwise we use the same threshold as for complete unrolling, which is
way too high. This made us unroll any loop smaller than 150 instructions
by 8 times, but only if someone specified -march=core2 or better,
which happens to be the default on darwin.

llvm-svn: 207940
2014-05-04 19:12:38 +00:00
Arnold Schwaighofer cd566c423a SLPVectorizer: Bring back the insertelement patch (r205965) with fixes
When can't assume a vectorized tree is rooted in an instruction. The IRBuilder
could have constant folded it. When we rebuild the build_vector (the series of
InsertElement instructions) use the last original InsertElement instruction. The
vectorized tree root is guaranteed to be before it.

Also, we can't assume that the n-th InsertElement inserts the n-th element into
a vector.

This reverts r207746 which reverted the revert of the revert of r205018 or so.

Fixes the test case in PR19621.

llvm-svn: 207939
2014-05-04 17:10:15 +00:00
Benjamin Kramer 64425fe875 SLPVectorizer: Lazily allocate the map for block numbering.
There is no point in creating it if we're not going to vectorize
anything. Creating the map is expensive as it creates large values.
No functionality change.

llvm-svn: 207916
2014-05-03 15:50:37 +00:00
Karthik Bhat ddd0cb5ecf Vectorize intrinsic math function calls in SLPVectorizer.
This patch adds support to recognize and vectorize intrinsic math functions in SLPVectorizer.
Review: http://reviews.llvm.org/D3560 and http://reviews.llvm.org/D3559

llvm-svn: 207901
2014-05-03 09:59:54 +00:00
Eric Christopher 6c26beb770 Clean up constructor logic and member access for LoopVectorizeHints.
There are public functions that mutate various members as well as
another private member already, so make all the members private to
avoid the discontinuity and add accessors for the values. Should
be no functional change.

llvm-svn: 207868
2014-05-02 20:40:04 +00:00
Nico Weber 4b2acde21a Teach GlobalDCE how to remove empty global_ctor entries.
This moves most of GlobalOpt's constructor optimization
code out of GlobalOpt into Transforms/Utils/CDtorUtils.{h,cpp}. The
public interface is a single function OptimizeGlobalCtorsList() that
takes a predicate returning which constructors to remove.

GlobalOpt calls this with a function that statically evaluates all
constructors, just like it did before. This part of the change is
behavior-preserving.

Also add a call to this from GlobalDCE with a filter that removes global
constructors that contain a "ret" instruction and nothing else – this
fixes PR19590.

llvm-svn: 207856
2014-05-02 18:35:25 +00:00
Akira Hatanaka f76388dd7e [GVN] Pass the phi-translated address of a load instead of the untranslated
address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where
PRE is applied to a load that is not partially redundant.

<rdar://problem/16638765>.

llvm-svn: 207853
2014-05-02 17:59:17 +00:00
Nick Lewycky 718ada97bc Fold strlen(expr ? "str1" : "str2") to x ? len1 : len2. This fires about 330 times in a bootstrap of clang.
llvm-svn: 207828
2014-05-02 04:11:45 +00:00
Benjamin Kramer cd1a98bf74 Update and sort CMakeLists.
llvm-svn: 207785
2014-05-01 18:59:11 +00:00
Eli Bendersky a108a65df2 Add an optimization that does CSE in a group of similar GEPs.
This optimization merges the common part of a group of GEPs, so we can compute
each pointer address by adding a simple offset to the common part.

The optimization is currently only enabled for the NVPTX backend, where it has
a large payoff on some benchmarks.

Review: http://reviews.llvm.org/D3462

Patch by Jingyue Wu.

llvm-svn: 207783
2014-05-01 18:38:36 +00:00
Chandler Carruth 18c2fbb143 Revert r205965, which essentially reverts r205018 for the second time.
=[

Turns out that this was the root cause of PR19621. We found a crasher
only recently (likely due to improvements elsewhere in the SLP
vectorizer) but the reduced test case failed all the way back to here.
I've confirmed that reverting this patch both fixes the reduced test
case in PR19621 and the actual source file that led to it, so it seems
to really be rooted here. I've replied to the commit thread with
discussion of my (feeble) attempts to debug this. Didn't make it very
far, so reverting now that we have a good test case so that things can
get back to healthy while the debugging carries on.

llvm-svn: 207746
2014-05-01 11:24:11 +00:00
Gerolf Hoflehner 3282af13d4 Patch for function cloning to inline all blocks whose address is taken
Not all address taken blocks get inlined. The reason is
that a blocks new address is known only when it is cloned. But e.g.
a branch instruction in a different block could need that address earlier
while it gets cloned. The solution is to collect the set of all
blocks that can potentially get inlined and compute a new block address
up front. Then clone and cleanup.

rdar://16427209

llvm-svn: 207713
2014-04-30 22:05:02 +00:00
Yi Jiang e2d5f29c2f Revert r207571 - Add slp vectorization to LTO passes
llvm-svn: 207693
2014-04-30 19:27:24 +00:00
Carlo Kok 307625c974 [IPO/MergeFunctions] changes so it doesn't try to bitcast a struct return type but instead recreates it with insert/extract value.
llvm-svn: 207679
2014-04-30 17:53:04 +00:00
Benjamin Kramer bf2368d94b Add a <tuple> include to more files that aren't getting it transitively on MSVC.
llvm-svn: 207617
2014-04-30 07:21:01 +00:00
NAKAMURA Takumi 99aa6e156a ConstantHoisting.cpp: Add <tuple> for std::tie, since r207593 removed FileSystem.h, it includes <tuple>.
llvm-svn: 207614
2014-04-30 06:44:50 +00:00
Jim Grosbach 708f80f783 Tidy up.
llvm-svn: 207585
2014-04-29 22:41:58 +00:00
Jim Grosbach 4a7d496059 Spelling.
llvm-svn: 207584
2014-04-29 22:41:55 +00:00
Rafael Espindola 85f3610222 Also handle ConstantAggregateZero when optimizing vpermilvar*.
llvm-svn: 207582
2014-04-29 22:20:40 +00:00
Rafael Espindola 152ee213a4 Remove tabs.
Sorry, new machine and I forgot to change the editor setting.

llvm-svn: 207578
2014-04-29 21:02:37 +00:00
Rafael Espindola eb7bdbd0ce Two fixes to the vpermilvar optimization.
The instcomine logic to handle vpermilvar's pd and 256 variants was incorrect.
The _256 variants have indexes into the individual 128 bit lanes and in all
cases it also has to mask out unused bits.

llvm-svn: 207577
2014-04-29 20:41:54 +00:00
Diego Novillo cd64780d18 Fix vectorization remarks.
This patch changes the vectorization remarks to also inform when
vectorization is possible but not beneficial.

Added tests to exercise some loop remarks.

llvm-svn: 207574
2014-04-29 20:06:10 +00:00
Yi Jiang 1a3f18b161 Continue slp vectorization even the BB already has vectorized store radar://16641956
llvm-svn: 207572
2014-04-29 19:37:20 +00:00
Yi Jiang 4e234aa790 Add slp vectorization to LTO passes
llvm-svn: 207571
2014-04-29 19:35:39 +00:00
Adam Nemet deab6f945c Reapply r207271 without the testcase
PR19608 was filed to find a suitable testcase.

llvm-svn: 207569
2014-04-29 18:25:28 +00:00
Diego Novillo 34fc8a7c4c Add optimization remarks to the loop unroller and vectorizer.
Summary:
This calls emitOptimizationRemark from the loop unroller and vectorizer
at the point where they make a positive transformation. For the
vectorizer, it reports vectorization and interleave factors. For the
loop unroller, it reports all the different supported types of
unrolling.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3456

llvm-svn: 207528
2014-04-29 14:27:31 +00:00
Zinovy Nis 487268574a [BUG] Fix -Wunused-variable warning in Release mode. Thnx to Kostya Serebryany for pointing.
llvm-svn: 207516
2014-04-29 09:45:08 +00:00
Kostya Serebryany dc8e551d84 fix -Wunused-variable warning in Release mode
llvm-svn: 207514
2014-04-29 09:33:02 +00:00
Zinovy Nis d373fec199 [OPENMP][LV][D3423] Respect Hints.Force meta-data for loops in LoopVectorizer
llvm-svn: 207512
2014-04-29 08:55:11 +00:00
Michael Zolotukhin a93fec040a Fix a typo in comment
llvm-svn: 207499
2014-04-29 07:35:33 +00:00
Chandler Carruth c71b2c3c7f Revert r207271 for now. This commit introduced a test case that ran
clang directly from the LLVM test suite! That doesn't work. I've
followed up on the review thread to try and get a viable solution sorted
out, but trying to get the tree clean here.

llvm-svn: 207462
2014-04-28 23:07:49 +00:00
Hans Wennborg e36e116826 InstCombine: don't drop 'inalloca' in PromoteCastOfAllocation (PR19569)
llvm-svn: 207426
2014-04-28 17:40:03 +00:00
Chandler Carruth 5bdf72cef6 Fix rampant quadratic behavior in UpdatePHINodes. The operation of
mapping from a basic block to an incoming value, either for removal or
just lookup, is linear in the number of predecessors, and we were doing
this for every entry in the 'Preds' list which is in many cases almost
all of them!

Unfortunately, the fixes are quite ugly. PHI nodes just don't make this
operation easy. The efficient way to fix this is to have a clever
'remove_if' operation on PHI nodes that lets us do a single pass over
all the incoming values of the original PHI node, extracting the ones we
care about. Then we could quickly construct the new phi node from this
list. This would remove the remaining underlying quadratic movement of
unrelated incoming values and the need for silly backwards looping to
"minimize" how often we hit the quadratic case.

This is the last obvious fix for PR19499. It shaves another 20% off the
compile time for me, and while UpdatePHINodes remains in the profile,
most of the time is now stemming from the well known inefficiencies of
LVI and jump threading.

llvm-svn: 207409
2014-04-28 10:37:30 +00:00
Craig Topper e73658ddbb [C++] Use 'nullptr'.
llvm-svn: 207394
2014-04-28 04:05:08 +00:00
Gerolf Hoflehner af7a87d2e3 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

Repaired r207302.

llvm-svn: 207309
2014-04-26 05:58:11 +00:00
Gerolf Hoflehner 1da7cbd584 Restore CloneFunction.cpp which got accidently
overwritten by previous backout of r207303

llvm-svn: 207308
2014-04-26 05:43:41 +00:00
Gerolf Hoflehner c46e9b0423 Revert commit r207302 since build failures
have been reported.

llvm-svn: 207303
2014-04-26 02:03:17 +00:00
Gerolf Hoflehner 34210108b3 RecursivelyDeleteTriviallyDeadInstructions() could remove
more than 1 instruction. The caller need to be aware of this
and adjust instruction iterators accordingly.

rdar://16679376

llvm-svn: 207302
2014-04-26 01:19:16 +00:00
Andrea Di Biagio 8cc9059ce8 [InstCombine][X86] Teach how to fold calls to SSE2/AVX2 packed logical shift
right intrinsics.

A packed logical shift right with a shift count bigger than or equal to the
element size always produces a zero vector. In all other cases, it can be
safely replaced by a 'lshr' instruction.

llvm-svn: 207299
2014-04-26 01:03:22 +00:00
Adrian Prantl 232897feaa Unbreak the gdb buildbot by not lowering dbg.declare intrinsics for arrays.
llvm-svn: 207284
2014-04-25 23:00:25 +00:00
Adam Nemet 03d91c51e4 [LoopStrengthReduce] Don't trim formula that uses a subset of required registers
Consider this use from the new testcase:

  LSR Use: Kind=ICmpZero, Offsets={0}, widest fixup type: i32
    reg({1000,+,-1}<nw><%for.body>)
    -3003 + reg({3,+,3}<nw><%for.body>)
    -1001 + reg({1,+,1}<nuw><nsw><%for.body>)
    -1000 + reg({0,+,1}<nw><%for.body>)
    -3000 + reg({0,+,3}<nuw><%for.body>)
    reg({-1000,+,1}<nw><%for.body>)
    reg({-3000,+,3}<nsw><%for.body>)

This is the last use we consider for a solution in SolveRecurse, so CurRegs is
a large set.  (CurRegs is the set of registers that are needed by the
previously visited uses in the in-progress solution.)

ReqRegs is {
  {3,+,3}<nw><%for.body>,
  {1,+,1}<nuw><nsw><%for.body>
}

This is the intersection of the regs used by any of the formulas for the
current use and CurRegs.

Now, the code requires a formula to contain *all* these regs (the comment is
simply wrong), otherwise the formula is immediately disqualified.  Obviously,
no formula for this use contains two regs so they will all get disqualified.

The fix modifies the check to allow the formula in this case.  The idea is
that neither of these formulae is introducing any new registers which is the
point of this early pruning as far as I understand.

In terms of set arithmetic, we now allow formulas whose used regs are a subset
of the required regs not just the other way around.

There are few more loops in the test-suite that are now successfully LSRed.  I
have benchmarked those and found very minimal change.

Fixes <rdar://problem/13965777>

llvm-svn: 207271
2014-04-25 21:02:21 +00:00
Adrian Prantl 32da88923a This reapplies r207235 with an additional bugfixes caught by the msan
buildbot - do not insert debug intrinsics before phi nodes.

Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207269
2014-04-25 20:49:25 +00:00
Duncan P. N. Exon Smith d2b2facb07 SCC: Change clients to use const, NFC
It's fishy to be changing the `std::vector<>` owned by the iterator, and
no one actual does it, so I'm going to remove the ability in a
subsequent commit.  First, update the users.

<rdar://problem/14292693>

llvm-svn: 207252
2014-04-25 18:24:50 +00:00
Adrian Prantl d2d9b76e48 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
This reverts commit 207235 to investigate msan buildbot breakage.

llvm-svn: 207250
2014-04-25 18:18:09 +00:00
Manman Ren 3c44067a30 [inline cold threshold] Command line argument for inline threshold will
override the default cold threshold.

When we use command line argument to set the inline threshold, the default
cold threshold will not be used. This is in line with how we use
OptSizeThreshold. When we want a higher threshold for all functions, we
do not have to set both inline threshold and cold threshold.

llvm-svn: 207245
2014-04-25 17:34:55 +00:00
Adrian Prantl 0840a22452 Reapply r207135 without modifications.
Debug info: Let dbg.values inserted by LowerDbgDeclare inherit the location
of the dbg.value. This gets rid of tons of redundant variable DIEs in
subscopes.

rdar://problem/14874886, rdar://problem/16679936

llvm-svn: 207236
2014-04-25 17:01:04 +00:00
Adrian Prantl f5834a4b49 This reapplies r207130 with an additional testcase+and a missing check for
AllocaInst that was missing in one location.
Debug info for optimized code: Support variables that are on the stack and
described by DBG_VALUEs during their lifetime.

Previously, when a variable was at a FrameIndex for any part of its
lifetime, this would shadow all other DBG_VALUEs and only a single
fbreg location would be emitted, which in fact is only valid for a small
range and not the entire lexical scope of the variable. The included
dbg-value-const-byref testcase demonstrates this.

This patch fixes this by
Local
- emitting dbg.value intrinsics for allocas that are passed by reference
- dropping all dbg.declares (they are now fully lowered to dbg.values)
SelectionDAG
- renamed constructors for SDDbgValue for better readability.
- fix UserValue::match() to handle indirect values correctly
- not inserting an MMI table entries for dbg.values that describe allocas.
- lowering dbg.values that describe allocas into *indirect* DBG_VALUEs.
CodeGenPrepare
- leaving dbg.values for an alloca were they are (see comment)
Other
- regenerated/updated instcombine.ll testcase and included source

rdar://problem/16679879
http://reviews.llvm.org/D3374

llvm-svn: 207235
2014-04-25 17:01:00 +00:00
Craig Topper f40110f4d8 [C++] Use 'nullptr'. Transforms edition.
llvm-svn: 207196
2014-04-25 05:29:35 +00:00
Karthik Bhat 6a48f7d66e Allow vectorization of bit intrinsics in BB Vectorizer.
This patch adds support for vectorization of  bit intrinsics such as bswap,ctpop,ctlz,cttz.

llvm-svn: 207174
2014-04-25 03:33:48 +00:00
Adrian Prantl 6e5de2ea06 Revert "This reapplies r207130 with an additional testcase+and a missing check for"
Typo in testcase.

llvm-svn: 207166
2014-04-25 00:42:50 +00:00