Commit Graph

7555 Commits

Author SHA1 Message Date
Chad Rosier a8bc512be5 Fix whitespace. NFC.
llvm-svn: 272409
2016-06-10 17:58:01 +00:00
Easwaran Raman e12c487b8c [PM] Port LCSSA to the new PM.
Differential Revision: http://reviews.llvm.org/D21090

llvm-svn: 272294
2016-06-09 19:44:46 +00:00
Xinliang David Li ecde1c7f3d Revert r272194 No need for it if loop Analysis Manager is used
llvm-svn: 272243
2016-06-09 03:22:39 +00:00
Davide Italiano 02861d8695 [PM] Add missing caching of GlobalsAA to EarlyCSE.
llvm-svn: 272204
2016-06-08 21:31:55 +00:00
Evgeny Stupachenko 3e2f389a7e The patch set unroll disable pragma when unroll
with user specified count has been applied.

Summary:
Previously SetLoopAlreadyUnrolled() set the disable pragma only if
there was some loop metadata.
Now it set the pragma in all cases. This helps to prevent multiple
unroll when -unroll-count=N is given.

Reviewers: mzolotukhin

Differential Revision: http://reviews.llvm.org/D20765

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 272195
2016-06-08 20:21:24 +00:00
Xinliang David Li 572135f717 [PM] Refector LoopAccessInfo analysis code
This is the preparation patch to port the analysis to new PM

Differential Revision: http://reviews.llvm.org/D20560

llvm-svn: 272194
2016-06-08 20:15:37 +00:00
Tim Shen 7aa0ad65ce [MemCpyOpt] Do not exchange llvm.lifetime.start and llvm.memcpy
Reviewers: iteratee

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D21087

llvm-svn: 272192
2016-06-08 19:42:32 +00:00
Benjamin Kramer c321e53402 Apply most suggestions of clang-tidy's performance-unnecessary-value-param
Avoids unnecessary copies. All changes audited & pass tests with asan.
No functional change intended.

llvm-svn: 272190
2016-06-08 19:09:22 +00:00
Davide Italiano d8d83f4773 [PM/SimplifyCFG] Preserve GlobalsAA even if the IR is mutated.
llvm-svn: 272139
2016-06-08 13:32:23 +00:00
Benjamin Kramer 46e38f3678 Avoid copies of std::strings and APInt/APFloats where we only read from it
As suggested by clang-tidy's performance-unnecessary-copy-initialization.
This can easily hit lifetime issues, so I audited every change and ran the
tests under asan, which came back clean.

llvm-svn: 272126
2016-06-08 10:01:20 +00:00
Davide Italiano 16e96d4b16 [PM] Preserve GlobalsAA for SROA.
Differential Revision:  http://reviews.llvm.org/D21040

llvm-svn: 272009
2016-06-07 13:21:17 +00:00
Davide Italiano fea0a4c5b2 [PM] Preserve the correct set of analyses for GVN.
llvm-svn: 271934
2016-06-06 20:01:50 +00:00
Davide Italiano 82c447823b [GVN] Switch dump() definition over to LLVM_DUMP_METHOD.
llvm-svn: 271932
2016-06-06 19:24:27 +00:00
Geoff Berry 43e5160d0e Reapply [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Originally reviewed in http://reviews.llvm.org/D18001

Reviewers: atrick

Subscribers: llvm-commits, mzolotukhin, mcrosier

Differential Revision: http://reviews.llvm.org/D18480

llvm-svn: 271929
2016-06-06 19:10:46 +00:00
Eli Friedman ee89505799 LICM: Don't sink stores out of loops that may throw.
Summary:
This hasn't been caught before because it requires noalias or similarly
strong alias analysis to actually reproduce.

Fixes http://llvm.org/PR27952 .

Reviewers: hfinkel, sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20944

llvm-svn: 271858
2016-06-05 22:13:52 +00:00
Sanjoy Das 4d4339d1e8 [PM] Port IndVarSimplify to the new pass manager
Summary:
There are some rough corners, since the new pass manager doesn't have
(as far as I can tell) LoopSimplify and LCSSA, so I've updated the
tests to run them separately in the old pass manager in the lit tests.
We also don't have an equivalent for AU.setPreservesCFG() in the new
pass manager, so I've left a FIXME.

Reviewers: bogner, chandlerc, davide

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20783

llvm-svn: 271846
2016-06-05 18:01:19 +00:00
Sanjoy Das f90e28d6fd [IndVars] Remove -liv-reduce
It is an off-by-default option that no one seems to use[0], and given
that SCEV directly understands the overflow instrinsics there is no real
need for it anymore.

[0]: http://lists.llvm.org/pipermail/llvm-dev/2016-April/098181.html

llvm-svn: 271845
2016-06-05 18:01:12 +00:00
Michael Zolotukhin 585649895f [LoopUnroll] Set correct thresholds for new recently enabled unrolling heuristic.
In r270478, where I enabled the new heuristic I posted testing results,
which I got when explicitly passed the thresholds values via CL options.
However, setting the CL options init-values is not enough to change the
default values of thresholds, so I'm changing them in another place now.

llvm-svn: 271615
2016-06-03 00:16:46 +00:00
Davide Italiano 8738363339 [TailRecursionElimination] Refactor/cleanup.
In preparation for porting to the new PM.
Patch by Jake VanAdrighem! (review mainly by me/Justin)

Differential Revision:  http://reviews.llvm.org/D20610

llvm-svn: 271607
2016-06-02 23:02:44 +00:00
Davide Italiano 6dfdbf1f46 [PM] LoadCombine preserves GlobalsAA, doesn't depend on it.
llvm-svn: 271601
2016-06-02 22:05:59 +00:00
Davide Italiano 84e1414522 [PM/LoadCombine] Inline getAnalysisUsage(). NFCI.
llvm-svn: 271600
2016-06-02 22:04:43 +00:00
Davide Italiano bdc2971434 [PM] BDCE: Fix caching of analyses.
Another chapter in the story. GlobalsAA should be preserved, as
 well as the CFG.

llvm-svn: 271307
2016-05-31 17:53:22 +00:00
Davide Italiano 688616ff74 [PM] ADCE: Fix caching of analyses.
When this pass was originally ported, AA wasn't available for the
new PM. Now it is, so we can cache properly.

llvm-svn: 271303
2016-05-31 17:39:39 +00:00
Craig Topper 8287fd8abd [X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them. Auto upgrade to native unaligned store instructions.
llvm-svn: 271236
2016-05-30 23:15:56 +00:00
Sanjoy Das 3e5ce2b737 [IndVars] Assert that the incoming IR is in LCSSA
Since we already assert that the outgoing IR is in LCSSA, it is easy to
get misled into thinking that -indvars broke LCSSA if the incoming IR is
non-LCSSA.  Checking this pre-condition will make such cases break in
more obvious ways.

Inspired by (but does _not_ fix) PR26682.

llvm-svn: 271196
2016-05-30 01:37:39 +00:00
Sanjoy Das 496f274257 [IndVarSimplify] Extract the logic of `-indvars` out into a class; NFC
This will be used later to port IndVarSimplify to the new pass manager.

llvm-svn: 271190
2016-05-29 21:42:00 +00:00
Davide Italiano 39893bd41c [PM] Reassociate: cache analyses more aggressively.
While here, add a FIXME for setPreserveCFG().

llvm-svn: 271159
2016-05-29 00:41:17 +00:00
Davide Italiano 484b5ab39d [PM] SCCP should preserve GlobalsAA even if the IR is mutated.
llvm-svn: 271149
2016-05-29 00:31:15 +00:00
Evgeny Stupachenko b787522d28 The patch fixes r271071
Summary:
unused variables in Release mode:
  BasicBlock *Header
  unsigned OrigCount
put under DEBUG

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271076
2016-05-28 00:14:58 +00:00
Evgeny Stupachenko ea2aef4a1d The patch refactors unroll pass.
Summary:
Unroll factor (Count) calculations moved to a new function.
Early exits on pragma and "-unroll-count" defined factor added.
New type of unrolling "Force" introduced (previously used implicitly).
New unroll preference "AllowRemainder" introduced and set "true" by default.
(should be set to false for architectures that suffers from it).

Reviewers: hfinkel, mzolotukhin, zzheng

Differential Revision: http://reviews.llvm.org/D19553

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 271071
2016-05-27 23:15:06 +00:00
Sanjoy Das 6fff9dc932 [GVN] Preserve !range metadata when PRE'ing loads
Reviewers: dberlin, reames, george.burgess.iv

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20743

llvm-svn: 271034
2016-05-27 19:03:10 +00:00
Benjamin Kramer 82de7d323d Apply clang-tidy's misc-move-constructor-init throughout LLVM.
No functionality change intended, maybe a tiny performance improvement.

llvm-svn: 270997
2016-05-27 14:27:24 +00:00
Igor Laevsky df9db45c94 [RewriteStatepointsForGC] All constant should have null base pointer
Currently we consider that each constant has itself as a base value. I.e "base(const) = const". 
This introduces couple of problems when we are trying to avoid reporting constants in statepoint live sets:

1. When querying "base( phi(const1, const2) )" we will get "phi(const1, const2)" as a base pointer. Since 
   it's not a constant we will record it in a stack map. However on practice we don't want this to happen
   (constant are never relocated).
2. base( phi(const, gc ptr) ) = phi( const, base(gc ptr) ). This particular case imposes challenge on our 
   runtime - we don't expect to see constant base pointers other than null. This problems can be avoided 
   by treating all constant as if they were derived from null pointer base. I.e in a first case we will 
   not include constant pointer in a stack map at all. In a second case we will get "phi(null, base(gc ptr))" 
   as a base pointer which is a lot more convenient.

Differential Revision: http://reviews.llvm.org/D20584

llvm-svn: 270993
2016-05-27 13:13:59 +00:00
Michael Zolotukhin 1ecdedad8d [LoopUnrollAnalyzer] Fix a crash in analyzeLoopUnrollCost.
Condition might be simplified to a Constant, but it doesn't have to be
ConstantInt, so we should dyn_cast, instead of cast.

This fixes PR27886.

llvm-svn: 270924
2016-05-26 21:42:51 +00:00
David Majnemer d99068d26d [MemCpyOpt] Don't perform callslot optimization across may-throw calls
An exception could prevent a store from occurring but MemCpyOpt's
callslot optimization would fire anyway, causing the store to occur.

This fixes PR27849.

llvm-svn: 270892
2016-05-26 19:24:24 +00:00
David Majnemer 474512576e [MergedLoadStoreMotion] Don't transform across may-throw calls
It is unsafe to hoist a load before a function call which may throw, the
throw might prevent a pointer dereference.

Likewise, it is unsafe to sink a store after a call which may throw.
The caller might be able to observe the difference.

This fixes PR27858.

llvm-svn: 270828
2016-05-26 07:11:09 +00:00
David Majnemer 8cce333abd [MergedLoadStoreMotion] Small cleanup
No functional change is intended.

llvm-svn: 270824
2016-05-26 05:43:12 +00:00
Craig Topper a423aa4642 [X86] Add the AVX storeu intrinsics to InstCombine and LoopStrengthReduce in the same places that the SSE/SSE2 storeu intrinsics appear.
I don't really know how to test this. Just seemed like we should be consistent.

llvm-svn: 270819
2016-05-26 04:28:45 +00:00
Sanjoy Das ee77a4828e [IRCE] Use C++11 style initializers; NFC
llvm-svn: 270815
2016-05-26 01:50:18 +00:00
Sanjoy Das a099268e85 [IRCE] Optimize conjunctions of range checks
After this change, we do the expected thing for cases like

```
Check0Passed = /* range check IRCE can optimize */
Check1Passed = /* range check IRCE can optimize */
if (!(Check0Passed && Check1Passed))
  throw_Exception();
```

llvm-svn: 270804
2016-05-26 00:09:02 +00:00
Sanjoy Das 8fe8892c2d [IRCE] Refactor out a parseRangeCheckFromCond; NFC
This will later hold more general logic to parse conjunctions of range
checks.

llvm-svn: 270802
2016-05-26 00:08:24 +00:00
Davide Italiano 1021c68e92 [PM] Port PartiallyInlineLibCalls to the new pass manager.
llvm-svn: 270798
2016-05-25 23:38:53 +00:00
Davide Italiano d85ac997b8 [PM] CorrelatedValuePropagation: pass state to function. NFCI.
While here, convert the logic of the pass to use static function(s).
This is in preparation for porting this pass to the new PM.

llvm-svn: 270734
2016-05-25 17:39:54 +00:00
Craig Topper 12e322a8cf [X86] Remove the llvm.x86.sse2.storel.dq intrinsic. It hasn't been used in a long time.
llvm-svn: 270677
2016-05-25 06:56:32 +00:00
Davide Italiano 655a145e83 [PM] Port BDCE to the new pass manager.
llvm-svn: 270647
2016-05-25 01:57:04 +00:00
Michael Zolotukhin 8f7a242c7b Re-enable "[LoopUnroll] Enable advanced unrolling analysis by default" one more time.
This reverts commit r270577.

llvm-svn: 270630
2016-05-24 23:00:05 +00:00
Sanjoy Das be99153aca [GuardWidening] Tighten the interface of the RangeCheck struct; NFC
Make `GuardWideningImpl::RangeCheck` into a class and add accessors.

llvm-svn: 270611
2016-05-24 20:54:45 +00:00
Sanjoy Das 5fd7ac452e [IRCE] Return a Value*, not SCEV* from parseRangeCheck; NFC
This is better layering, since the caller needs to check if the index
was an add-rec anyway.

llvm-svn: 270582
2016-05-24 17:19:56 +00:00
Hans Wennborg b64e4390a3 Revert r270518, which re-enabled "[LoopUnroll] Enable advanced unrolling analysis by default.
Chromium builds are still hitting the assert in PR27874.

llvm-svn: 270577
2016-05-24 16:10:12 +00:00
Michael Zolotukhin 96c150d154 Revert "Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default.""
This reverts commit r270512 and reapplies r270478. Originally it caused
PR27847, but it was fixed in r270517.

llvm-svn: 270518
2016-05-24 01:22:20 +00:00
Hans Wennborg 6951028b61 Revert r270478 "[LoopUnroll] Enable advanced unrolling analysis by default."
This caused PR27847.

llvm-svn: 270512
2016-05-23 23:42:35 +00:00
Sanjoy Das aa83c47bab [IRCE] Optimize "uses" not branches; NFCI
This changes IRCE to optimize uses, and not branches.  This change is
NFCI since the uses we do inspect are in practice only ever going to be
the condition use in conditional branches; but this flexibility will
later allow us to analyze more complex expressions than just a direct
branch on a range check.

llvm-svn: 270500
2016-05-23 22:16:45 +00:00
Michael Zolotukhin be080fc51d [LoopUnroll] Enable advanced unrolling analysis by default.
Summary:
This patch turns on LoopUnrollAnalyzer by default. To mitigate compile
time regressions, I chose very conservative thresholds for now. Later we
can make them more aggressive, but it might require being smarter in
which loops we're optimizing. E.g. currently the biggest issue is that
with more agressive thresholds we unroll many cold loops, which
increases compile time for no performance benefit (performance of those
loops is improved, but it doesn't matter since they are cold).

Test results for compile time(using 4 samples to reduce noise):
```
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.19%
SingleSource/Benchmarks/Polybench/medley/reg_detect/reg_detect  4.19%
MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow  3.39%
MultiSource/Applications/JM/lencod/lencod 1.47%
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 -6.06%
```

I didn't see any performance changes in the testsuite, but it improves
some internal tests.

Reviewers: hfinkel, chandlerc

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D20482

llvm-svn: 270478
2016-05-23 19:10:19 +00:00
Sanjoy Das c5b1169de2 [IRCE] Don't use an allocator for range checks; NFC
The InductiveRangeCheck struct is only five words long; so passing these
around value is fine.  The allocator makes the code look more complex
than it is.

llvm-svn: 270309
2016-05-21 02:52:13 +00:00
Sanjoy Das 59776734a3 [IRCE] Don't pass IRBuilder<> where unnecessary; NFC
llvm-svn: 270308
2016-05-21 02:31:51 +00:00
Sanjoy Das be6c7a12cb [GuardWidening] Fix incorrect use of remove_if
I had used `std::remove_if` under the assumption that it moves the
predicate matching elements to the end, but actaully the elements
remaining towards the end (after the iterator returned by
`std::remove_if`) are indeterminate.  Fix the bug (and make the code
more straightforward) by using a temporary SmallVector, and add a test
case demonstrating the issue.

llvm-svn: 270306
2016-05-21 02:24:44 +00:00
Davide Italiano f7211fd44d [PM/PartiallyInlineLibCalls] Fix pass dependencies.
Inline getAnalysisUsage() while I'm here.

llvm-svn: 270231
2016-05-20 16:23:14 +00:00
Davide Italiano 8749dfd1bf [PartiallyInlineLibCalls] Remove dead includes. NFC.
llvm-svn: 270228
2016-05-20 15:52:23 +00:00
Davide Italiano 08713bd1ed [PM/PartiallyInlineLibCalls] Convert to static function in preparation for porting this pass to the new PM.
llvm-svn: 270225
2016-05-20 15:43:39 +00:00
Sanjoy Das 2351975860 Add const qualifiers to appease bots; NFC
llvm-svn: 270155
2016-05-19 23:15:59 +00:00
Sanjoy Das f5f0331a3b [GuardWidening] Introduce range check merging
Sequences of range checks expressed using guards, like

  guard((I - 2) u< L)
  guard((I - 1) u< L)
  guard((I + 0) u< L)
  guard((I + 1) u< L)
  guard((I + 2) u< L)

can sometimes be combined into a smaller sequence:

  guard((I - 2) u< L AND (I + 2) u< L)

if we can prove that (I - 2) u< L AND (I + 2) u< L implies all of checks
expressed in the previous sequence.

This change teaches GuardWidening to do this kind of merging when
feasible.

llvm-svn: 270151
2016-05-19 22:55:46 +00:00
Davide Italiano 46f249b4cd [SCCP] Prefer class to struct.
llvm-svn: 270074
2016-05-19 15:58:02 +00:00
Sanjoy Das b784ed36c0 [GuardWidening] Use getEquivalentICmp to fold constant compares
`ConstantRange::getEquivalentICmp` is more general, and better
factored.

llvm-svn: 270019
2016-05-19 03:53:17 +00:00
Sanjoy Das 52bbde2bbc [LowerGuards] Rename variable; NFC
PredicatePassProbability is a better name for what LikelyBranchWeight
was trying to express.

llvm-svn: 269999
2016-05-18 23:16:27 +00:00
Sanjoy Das 083f38939b New pass: guard widening
Summary:
Implement guard widening in LLVM. Description from GuardWidening.cpp:

The semantics of the `@llvm.experimental.guard` intrinsic lets LLVM
transform it so that it fails more often that it did before the
transform.  This optimization is called "widening" and can be used hoist
and common runtime checks in situations like these:

```
%cmp0 = 7 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
%cmp1 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp1) [ "deopt"(...) ]
...
```

to

```
%cmp0 = 9 u< Length
call @llvm.experimental.guard(i1 %cmp0) [ "deopt"(...) ]
call @unknown_side_effects()
...
```

If `%cmp0` is false, `@llvm.experimental.guard` will "deoptimize" back
to a generic implementation of the same function, which will have the
correct semantics from that point onward.  It is always _legal_ to
deoptimize (so replacing `%cmp0` with false is "correct"), though it may
not always be profitable to do so.

NB! This pass is a work in progress.  It hasn't been tuned to be
"production ready" yet.  It is known to have quadriatic running time and
will not scale to large numbers of guards

Reviewers: reames, atrick, bogner, apilipenko, nlewycky

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20143

llvm-svn: 269997
2016-05-18 22:55:34 +00:00
Michael Zolotukhin d2268a73bc [LoopUnrollAnalyzer] Take into account cost of instructions controlling branches, along with their operands.
Previously, we didn't add their and their operands cost, which could've
resulted in unrolling loops for no actual benefit.

llvm-svn: 269985
2016-05-18 21:20:12 +00:00
Davide Italiano 98f7e0e790 [PM] Port per-function SCCP to the new pass manager.
llvm-svn: 269937
2016-05-18 15:18:25 +00:00
Justin Bogner 594e07bd78 [PM] Port DSE to the new pass manager
Patch by JakeVanAdrighem. Thanks!

llvm-svn: 269847
2016-05-17 21:38:13 +00:00
Sanjoy Das fd67038c8b [Guards] Add branch metadata when lowering
Guards are expected to basically never fail.  Reflect this in the branch
probabilities in their lowered form.

llvm-svn: 269791
2016-05-17 17:51:19 +00:00
Igor Laevsky 953f2d2a54 [RewriteStatepointsForGC] Remove obsolete assertion
This is assertion is no longer necessary since we never record
constants in the live set anyway. (They are never recorded in 
the initial live set, and constant bases are removed near line 2119)

Differential Revision: http://reviews.llvm.org/D20293

llvm-svn: 269764
2016-05-17 13:54:10 +00:00
Davide Italiano 6f852eedbf [PM] RewriterStatepointForGC: add missing dependency.
llvm-svn: 269624
2016-05-16 02:29:53 +00:00
Benjamin Kramer a65b610bd2 Move helper classes into anonymous namespaces. NFC.
llvm-svn: 269591
2016-05-15 15:18:11 +00:00
Davide Italiano e62c54375d [PM/SCCP] Fix pass dependencies.
TargetLibraryInfoWrapperPass is a dependency of
SCCP but it's not listed as such. Chandler pointed
out this is an easy mistake to make which only
surfaces in weird crashes with some flag combinations.
This code will go away anyway at some point in the
future, but as long as it's (still) exercised, try
to make it correct.

llvm-svn: 269589
2016-05-15 08:04:28 +00:00
Davide Italiano e7c56c5c4f [SCCP] Use range-based for loops. NFC.
llvm-svn: 269578
2016-05-14 20:59:09 +00:00
Davide Italiano 9922344178 [PM] Port LowerAtomic to the new pass manager.
llvm-svn: 269511
2016-05-13 22:52:35 +00:00
Michael Zolotukhin 963a6d9c69 Revert "Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...""
This reverts commit r269395.

Try to reapply with a fix from chapuni.

llvm-svn: 269486
2016-05-13 21:23:25 +00:00
Jun Bum Lim be11bdc4b0 Rename getLargestLegalIntTypeSize to getLargestLegalIntTypeSizeInBits(). NFC.
Summary: Rename DataLayout::getLargestLegalIntTypeSize to DataLayout::getLargestLegalIntTypeSizeInBits() to prevent similar mistakes  fixed in r269433.

Reviewers: joker.eph, mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20248

llvm-svn: 269456
2016-05-13 18:38:35 +00:00
Geoff Berry 2f64c20284 [EarlyCSE] Change key type of AvailableCalls to Instruction*. NFCI.
llvm-svn: 269445
2016-05-13 17:54:58 +00:00
Jun Bum Lim f28beac419 [MemCpyOpt] Use MaxIntSize in byte instead of bit
Summary: This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.

Reviewers: arsenm, joker.eph, mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20176

llvm-svn: 269433
2016-05-13 16:52:24 +00:00
Michael Zolotukhin 9be3b8b9bb Revert "[Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the..."
This reverts commit r269388.

It caused some bots to fail, I'm reverting it until I investigate the
issue.

llvm-svn: 269395
2016-05-13 06:32:25 +00:00
Adam Nemet eff76646f5 [LoopDist] Only run LAA for loops with the pragma
This should fix some compile-time regressions after r267672.  Thanks to
Chris Matthews for bisecting it.

llvm-svn: 269392
2016-05-13 04:20:31 +00:00
Michael Zolotukhin b7b8052982 [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...
Summary:
...loop after the last iteration.

This is really hard to do correctly. The core problem is that we need to
model liveness through the induction PHIs from iteration to iteration in
order to get the correct results, and we need to correctly de-duplicate
the common subgraphs of instructions feeding some subset of the
induction PHIs. All of this can be driven either from a side effect at
some iteration or from the loop values used after the loop finishes.

This patch implements this by storing the forward-propagating analysis
of each instruction in a cache to recall whether it was free and whether
it has become live and thus counted toward the total unroll cost. Then,
at each sink for a value in the loop, we recursively walk back through
every value that feeds the sink, including looping back through the
iterations as needed, until we have marked the entire input graph as
live. Because we cache this, we never visit instructions more than twice
-- once when we analyze them and put them into the cache, and once when
we count their cost towards the unrolled loop. Also, because the cache
is only two bits and because we are dealing with relatively small
iteration counts, we can store all of this very densely in memory to
avoid this from becoming an excessively slow analysis.

The code here is still pretty gross. I would appreciate suggestions
about better ways to factor or split this up, I've stared too long at
the algorithmic side to really have a good sense of what the design
should probably look at.

Also, it might seem like we should do all of this bottom-up, but I think
that is a red herring. Specifically, the simplification power is *much*
greater working top-down. We can forward propagate very effectively,
even across strange and interesting recurrances around the backedge.
Because we use data to propagate, this doesn't cause a state space
explosion. Doing this level of constant folding, etc, would be very
expensive to do bottom-up because it wouldn't be until the last moment
that you could collapse everything. The current solution is essentially
a top-down simplification with a bottom-up cost accounting which seems
to get the best of both worlds. It makes the simplification incremental
and powerful while leaving everything dead until we *know* it is needed.

Finally, a core property of this approach is its *monotonicity*. At all
times, the current UnrolledCost is a conservatively low estimate. This
ensures that we will never early-exit from the analysis due to exceeding
a threshold when if we had continued, the cost would have gone back
below the threshold. These kinds of bugs can cause incredibly hard to
track down random changes to behavior.

We could use a techinque similar (but much simpler) within the inliner
as well to avoid considering speculated code in the inline cost.

Reviewers: chandlerc

Subscribers: sanjoy, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D11758

llvm-svn: 269388
2016-05-13 01:42:39 +00:00
Chandler Carruth 49c22190d0 [PM] Port of the DepndenceAnalysis to the new PM.
Ported DA to the new PM by splitting the former DependenceAnalysis Pass
into a DependenceInfo result type and DependenceAnalysisWrapperPass type
and adding a new PM-style DependenceAnalysis analysis pass returning the
DependenceInfo.

Patch by Philip Pfaffe, most of the review by Justin.

Differential Revision: http://reviews.llvm.org/D18834

llvm-svn: 269370
2016-05-12 22:19:39 +00:00
Davide Italiano 851f879f32 [PM] Make LowerAtomic a FunctionPass.
Differential Revision: http://reviews.llvm.org/D20025

llvm-svn: 269322
2016-05-12 18:49:32 +00:00
David Majnemer 96f0d383a7 [SCCP] Resolve shifts beyond the bitwidth to undef
Shifts beyond the bitwidth are undef but SCCP resolved them to zero.
Instead, DTRT and resolve them to undef.

This reimplements the transform which caused PR27712.

llvm-svn: 269269
2016-05-12 03:07:40 +00:00
Davide Italiano cd7c84bd8b Revert "[SCCP] Partially propagate informations when the input is not fully defined."
This reverts commit r269105 as it caused PR27712.

llvm-svn: 269252
2016-05-11 23:06:10 +00:00
Tim Northover 3961735f03 Revert "MemCpyOpt: combine local load/store sequences into memcpy."
This reverts commit r269125. It was in my tree when I ran "git svn dcommit".
It's really still under review.

llvm-svn: 269127
2016-05-10 21:49:40 +00:00
Tim Northover 6c65c71639 MemCpyOpt: combine local load/store sequences into memcpy.
Sort of the BB-local equivalent to idiom-recognizer: if we have a basic-block
that really implements a memcpy operation, backends can benefit from seeing
this.

llvm-svn: 269125
2016-05-10 21:48:11 +00:00
Hans Wennborg 719b26ba54 Loop unroller: set thresholds for optsize and minsize functions to zero
Before r268509, Clang would disable the loop unroll pass when optimizing
for size. That commit enabled it to be able to support unroll pragmas
in -Os builds. However, this regressed binary size in one of Chromium's
DLLs with ~100 KB.

This restores the original behaviour of no unrolling at -Os, but doing it
in LLVM instead of Clang makes more sense, and also allows the pragmas to
keep working.

Differential revision: http://reviews.llvm.org/D20115

llvm-svn: 269124
2016-05-10 21:45:55 +00:00
Lawrence Hu e58a814c07 Enable loopreroll for sext of loop control only IV
This patch extend loopreroll to allow the instruction chain
        of loop control only IV has sext.

        Differential Revision: http://reviews.llvm.org/D19820

llvm-svn: 269121
2016-05-10 21:16:49 +00:00
Lawrence Hu fe7c87beac Revert r26084: Enable loopreroll for sext of loop control only IV
llvm-svn: 269119
2016-05-10 21:11:09 +00:00
Davide Italiano 7860c9bbf4 [SCCP] Partially propagate informations when the input is not fully defined.
With this patch:
%r1 = lshr i64 -1, 4294967296 -> undef

Before this patch:
%r1 = lshr i64 -1, 4294967296 -> 0

llvm-svn: 269105
2016-05-10 19:49:47 +00:00
Lawrence Hu 8cc3b37d2c Enable loopreroll for sext of loop control only IV
This patch extend loopreroll to allow the instruction chain
    of loop control only IV has sext.

llvm-svn: 269084
2016-05-10 17:42:27 +00:00
Chuang-Yu Cheng 175741d5a7 Update Debug Intrinsics in RewriteUsesOfClonedInstructions in LoopRotation
Loop rotation clones instruction from the old header into the preheader. If
there were uses of values produced by these instructions that were outside
the loop, we have to insert PHI nodes to merge the two values. If the values
are used by DbgIntrinsics they will be used as a MetadataAsValue of a
ValueAsMetadata of the original values, and iterating all of the uses of the
original value will not update the DbgIntrinsics. The new code checks if the
values are used by DbgIntrinsics and if so, updates them using essentially
the same logic as the original code.

The attached testcase demonstrates the issue. Without the fix, the
DbgIntrinic outside the loop uses values computed inside the loop, even
though these values do not dominate the DbgIntrinsic.

Author: Thomas Jablin (tjablin)
Reviewers: dblaikie aprantl kbarton hfinkel cycheng

http://reviews.llvm.org/D19564

llvm-svn: 269034
2016-05-10 09:45:44 +00:00
Denis Zobnin 15d1e64b2b [LAA] Rename "isStridedPtr" with "getPtrStride". NFC.
Changing misleading function name was approved in http://reviews.llvm.org/D17268.
Patch by Roman Shirokiy.

llvm-svn: 269021
2016-05-10 05:55:16 +00:00
Philip Reames 4a3c3b66d7 [GVN] PRE of unordered loads
Again, fairly simple.  Only change is ensuring that we actually copy the property of the load correctly.  The aliasing legality constraints were already handled by the FRE patches.  There's nothing special about unorder atomics from the perspective of the PRE algorithm itself.

llvm-svn: 268804
2016-05-06 21:43:51 +00:00
Sanjoy Das 091fcfa3a7 [RS4GC] Fix typo in comment
llvm-svn: 268790
2016-05-06 20:39:33 +00:00
Philip Reames 1fdce639d2 [GVN] Handle unordered atomics in cross block FRE
You'll note there are essentially no code changes here.  Cross block FRE heavily reuses code from the block local FRE.  All of the tricky parts were done as part of the previous patch and the refactoring that removed the original code duplication.  

llvm-svn: 268775
2016-05-06 18:46:45 +00:00
Philip Reames ae8997f496 [GVN] Do local FRE for unordered atomic loads
This patch is the first in a small series teaching GVN to optimize unordered loads aggressively. This change just handles block local FRE because that's the simplest thing which lets me test MDA, and the AvailableValue pieces. Somewhat suprisingly, MDA appears fine and only a couple of small changes are needed in GVN.

Once this is in, I'll tackle non-local FRE and PRE. The former looks like a natural extension of this, the later will require a couple of minor changes.

Differential Revision: http://reviews.llvm.org/D19440

llvm-svn: 268770
2016-05-06 18:17:13 +00:00
Philip Reames 32b55181fa [EarlyCSE] Rename a variable for clarity [NFC]
llvm-svn: 268701
2016-05-06 01:13:58 +00:00
Davide Italiano f54f2f0893 [PM] Port Interprocedural SCCP to the new pass manager.
llvm-svn: 268684
2016-05-05 21:05:36 +00:00
Dehao Chen f50c67ce7c Revert http://reviews.llvm.org/D19926 as it breaks tests.
llvm-svn: 268681
2016-05-05 20:47:53 +00:00
Dehao Chen e48b4ee98c Simplify CFG before assigning discriminator.
Summary: We need to clean up CFG before assigning discriminator to minimize the impact of optimization on debug info.

Reviewers: davidxl, dblaikie, dnovillo

Subscribers: dnovillo, danielcdh, llvm-commits

Differential Revision: http://reviews.llvm.org/D19926

llvm-svn: 268675
2016-05-05 20:18:49 +00:00
Chad Rosier 799e4c6fc3 Remove dead include. NFC.
llvm-svn: 268654
2016-05-05 17:53:43 +00:00
Dehao Chen d55bc4c7ab clang-format some files in preparation of coming patch reviews.
llvm-svn: 268583
2016-05-05 00:54:54 +00:00
Adam Nemet 3c5eabfcbc [LoopDataPrefetch] Add optimization remark
With -Rpass=loop-data-prefetch, show the memory access that got
prefetched.

llvm-svn: 268578
2016-05-05 00:08:15 +00:00
Davide Italiano a7f5e88932 Revert "[SCCP] Throw away dead code. NFC."
This reverts commit r268568, as it broke the bots.

llvm-svn: 268570
2016-05-04 23:27:13 +00:00
Davide Italiano fc1214fee2 [SCCP] Throw away dead code. NFC.
llvm-svn: 268568
2016-05-04 23:05:59 +00:00
Chad Rosier 7ab9a7b203 Use a uniform name for the load combine pass. NFC.
llvm-svn: 268507
2016-05-04 15:19:02 +00:00
Igor Laevsky fb1811d3a0 [RS4GC] Use SetVector/MapVector instead of DenseSet/DenseMap to guarantee stable ordering
Goal of this change is to guarantee stable ordering of the statepoint arguments and other 
newly inserted values such as gc.relocates. Previously we had explicit sorting in a couple
of places. However for unnamed values ordering was partial and overall we didn't have any 
strong invariant regarding it. This change switches all data structures to use SetVector's
and MapVector's which provide possibility for deterministic iteration over them.
Explicit sorting is now redundant and was removed.

Differential Revision: http://reviews.llvm.org/D19669

llvm-svn: 268502
2016-05-04 14:55:36 +00:00
Andrew Kaylor 50271f787e Add opt-bisect support to additional passes that can be skipped
Differential Revision: http://reviews.llvm.org/D19882

llvm-svn: 268457
2016-05-03 22:32:30 +00:00
Justin Bogner d0d2341f30 PM: Port LoopRotation to the new loop pass manager
llvm-svn: 268452
2016-05-03 22:02:31 +00:00
Justin Bogner ab6a513b4e PM: Port LoopSimplifyCFG to the new pass manager
llvm-svn: 268446
2016-05-03 21:47:32 +00:00
Jack Liu f101c0f7a1 [SROA] Function canConvertValue needs to check whether both NewTy and OldTy pointers are
pointing to the same addr space. This can prevent SROA from creating a bitcast
between pointers with different addr spaces.

Differential Revision: http://reviews.llvm.org/D19697

llvm-svn: 268424
2016-05-03 19:30:48 +00:00
Jack Liu 430e2c2140 Revert 268409 due to missing comment.
llvm-svn: 268421
2016-05-03 19:15:02 +00:00
Jack Liu 1ff4a0b7ee (no commit message)
llvm-svn: 268409
2016-05-03 18:01:43 +00:00
Sanjoy Das 4ae3920c5b [LICM] Kill SCEV loop dispositions if needed
SCEV caches whether SCEV expressions are loop invariant, variant or
computable.  LICM breaks this cache, almost by definition; so clear the
SCEV disposition cache if LICM changed anything.

llvm-svn: 268408
2016-05-03 17:50:11 +00:00
Sanjoy Das 7e7a5a050a Use all_of instead of a raw loop; NFC
Added some tests despite being NFC, since it looks like nothing was
exercising the "all incoming values to exit PHIs are same" logic.

llvm-svn: 268407
2016-05-03 17:50:06 +00:00
Sanjoy Das 905fc27ebf [LoopDeletion] Clear SCEV loop dispositions
`Loop::makeLoopInvariant` can hoist instructions out of loops, so loop
dispositions for the loop it operated on may need to be cleared.  We can
be smarter here (especially around how `forgetLoopDispositions` is
implemented), but let's be correct first.

Fixes PR27570.

llvm-svn: 268406
2016-05-03 17:50:02 +00:00
Kristof Beyls c08f70588d Mark that SpeculativeExecution preserves Globals Alias Analysis.
A few benchmarks with lots of accesses to global variables in the hot
loops regressed a lot since r266399, which added the
SpeculativeExecution pass to the default pipeline. The problem is that
this pass doesn't mark Globals Alias Analysis as preserved. Globals
Alias Analysis is computed in a module pass, whereas
SpeculativeExecution is a function pass, and a lot of passes dependent
on the Globals Alias Analysis to optimize these benchmarks are also
function passes. As such, the Globals Alias Analysis information cannot
be recomputed between SpeculativeExecution and the following function
passes needing that information.

SpeculativeExecution doesn't invalidate Globals Alias Analysis, so mark
it as such to fix those performance regressions.

Differential Revision: http://reviews.llvm.org/D19806

llvm-svn: 268370
2016-05-03 08:33:26 +00:00
David Majnemer 3d90bb79c4 [LoopUnroll] Unroll loops which have exit blocks to EH pads
We were overly cautious in our analysis of loops which have invokes
which unwind to EH pads.  The loop unroll transform is safe because it
only clones blocks in the loop body, it does not try to split critical
edges involving EH pads.  Instead, move the necessary safety check to
LoopUnswitch.

N.B. The safety check for loop unswitch is covered by an existing test
which fails without it.

llvm-svn: 268357
2016-05-03 03:57:40 +00:00
Chad Rosier fcb2210812 Typo. NFC.
llvm-svn: 268280
2016-05-02 19:06:04 +00:00
Chad Rosier 4466ff50eb Use false rather than 0 for a boolean value. NFC.
llvm-svn: 268279
2016-05-02 19:06:02 +00:00
Adam Nemet d02872c7b4 [LLE] Fix typo from r263058
This was meant to check unit stride for both the load and the store.

Thanks to Roman Shirokiy for noticing this.

llvm-svn: 268251
2016-05-02 16:52:00 +00:00
Sanjoy Das 47cf2affbd [LowerGuardIntrinsics] Keep track of !make.implicit metadata
If a guard call being lowered by LowerGuardIntrinsics has the
`!make.implicit` metadata attached, then reattach the metadata to the
branch in the resulting expanded form of the intrinsic.  This allows us
to implement null checks as guards and still get the benefit of implicit
null checks.

llvm-svn: 268148
2016-04-30 00:55:59 +00:00
Lawrence Hu 1befea2bdc Reroll loops with multiple IV and negative step part 3
support multiple induction variables

    This patch enable loop reroll for the following case:
        for(int i=0;  i<N; i += 2) {
           S += *a++;
           S += *a++;
        };

Differential Revision: http://reviews.llvm.org/D16550

llvm-svn: 268147
2016-04-30 00:51:22 +00:00
Sanjoy Das 52c68bb0f5 [LowerGuardIntrinsics] Preserve calling conv when lowering
llvm-svn: 268142
2016-04-30 00:17:47 +00:00
Sanjoy Das 107aefc2fc Mark guards on true as "trivially dead"
This moves some logic added to EarlyCSE in rL268120 into
`llvm::isInstructionTriviallyDead`.  Adds a test case for DCE to
demonstrate that passes other than EarlyCSE can now pick up on the new
information.

llvm-svn: 268126
2016-04-29 22:23:16 +00:00
Sanjoy Das ee81b23fe7 [EarlyCSE] Simplify guard intrinsics
Summary:
This change teaches EarlyCSE some basic properties of guard intrinsics:

 - Guard intrinsics read all memory, but don't write to any memory
 - After a guard has executed, the condition it was guarding on can be
   assumed to be true
 - Guard intrinsics on a constant `true` are no-ops

Reviewers: reames, hfinkel

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19578

llvm-svn: 268120
2016-04-29 21:52:58 +00:00
Filipe Cabecinhas 0da9937517 Unify XDEBUG and EXPENSIVE_CHECKS (into the latter), and add an option to the cmake build to enable them.
Summary:
Historically, we had a switch in the Makefiles for turning on "expensive
checks". This has never been ported to the cmake build, but the
(dead-ish) code is still around.

This will also make it easier to turn it on in buildbots.

Reviewers: chandlerc

Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits

Differential Revision: http://reviews.llvm.org/D19723

llvm-svn: 268050
2016-04-29 15:22:48 +00:00
Adam Nemet 88ec491830 [LoopDist] Also emit optimization remark on success (-Rpass=)
The option -Rpass=loop-distribute now reports the loops that were
distributed.

llvm-svn: 268006
2016-04-29 07:10:46 +00:00
Adam Nemet 4338d6769e [LoopDist] Pass 'Function' to main class. NFC
Next patch will add another use for 'Function' inside the class.

llvm-svn: 268005
2016-04-29 07:10:39 +00:00
Adam Nemet 0ba164bbcb [LoopDist] Emit optimization remarks (-Rpass*)
I closely followed the precedents set by the vectorizer:

* With -Rpass-missed, the loop is reported with further details pointing
to -Rpass--analysis.

* -Rpass-analysis reports the details why distribution has failed.

* Regardless of -Rpass*, when distribution fails for a loop where
distribution was forced with the pragma, a warning is produced according
to -Wpass-failed.  In this case the analysis info is also printed even
without -Rpass-analysis.

llvm-svn: 267952
2016-04-28 23:08:32 +00:00
Adam Nemet adeccf7658 [LoopDist] Improve debug messages
The next patch will start using these for -Rpass-analysis so they won't
be internal-only anymore.

Move the 'Skipping; ' prefix that some of the message are using into the
'fail' function.  We don't want to include this prefix in
the -Rpass-analysis report.

llvm-svn: 267951
2016-04-28 23:08:30 +00:00
Adam Nemet 7f38e1199a [LoopDist] Add helper to print debug message when distribution fails. NFC
This will form the basis to emit optimization remarks (-Rpass*).

llvm-svn: 267950
2016-04-28 23:08:27 +00:00
Chad Rosier 712b7d7630 [GVN] Minor code cleanup. NFC.
Differential Revision: http://reviews.llvm.org/D18828
Patch by Aditya Kumar!

llvm-svn: 267898
2016-04-28 16:00:15 +00:00
Geoff Berry 5ae272c2c1 [EarlyCSE] Change LoadValue field Value *Data to Instruction *Inst. NFC.
Made in preparation for adding MemorySSA support to EarlyCSE.

llvm-svn: 267893
2016-04-28 15:22:37 +00:00
Geoff Berry 354fac2a69 [EarlyCSE] Sort includes. NFC.
Reviewers: mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19617

llvm-svn: 267890
2016-04-28 14:59:27 +00:00
Ahmed Bougacha ace97c1f7d [LIR] Set attributes on memset_pattern16.
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.

Differential Revision: http://reviews.llvm.org/D17598

llvm-svn: 267762
2016-04-27 19:04:50 +00:00
Ahmed Bougacha 7f97193dd7 [LIR] Reuse variable. NFCI.
llvm-svn: 267761
2016-04-27 19:04:46 +00:00
Artur Pilipenko 9bb6beabf4 isSafeToLoadUnconditionally support queries without a context
This is required to use this function from isSafeToSpeculativelyExecute

Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D16231

llvm-svn: 267692
2016-04-27 11:00:48 +00:00
Adam Nemet d2fa414718 [LoopDist] Add llvm.loop.distribute.enable loop metadata
Summary:
D19403 adds a new pragma for loop distribution.  This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.

As part of this I had to rethink the flag -enable-loop-distribute.  My
goal was to be backward compatible with the existing behavior:

  A1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute is specified

  A2. pass is on when invoked directly from opt (e.g. for unit-testing)

The new pragma/metadata overrides these defaults so the new behavior is:

  B1. A1 + enable distribution for individual loop with the pragma/metadata

  B2. A2 + disable distribution for individual loop with the pragma/metadata

The default value whether the pass is on or off comes from the initiator
of the pass.  From the PassManagerBuilder the default is off, from opt
it's on.

I moved -enable-loop-distribute under the pass.  If the flag is
specified it overrides the default from above.

Then the pragma/metadata can further modifies this per loop.

As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline.  So to be precise
this is the new behavior:

  C1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute or the pragma/metadata enables it

  C2. pass is on when invoked directly from opt
  unless -enable-loop-distribute=0 or the pragma/metadata disables it

Reviewers: hfinkel

Subscribers: joker.eph, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19431

llvm-svn: 267672
2016-04-27 05:28:18 +00:00
Sanjoy Das 5253a089ba Fix typo in comment; NFC
llvm-svn: 267653
2016-04-27 01:44:31 +00:00
Matt Arsenault ba437c67d2 SLSR: Use UnknownAddressSpace instead of 0 for pure arithmetic.
In the case where isLegalAddressingMode is used for cases
not related to addressing modes, such as pure adds and muls,
it should not be using address space 0. LSR already passes -1
as the address space in these cases.

llvm-svn: 267645
2016-04-27 00:32:09 +00:00
Adam Nemet 61399ac424 [LoopDist] Split main class. NFC
This splits out the per-loop functionality from the Pass class.

With this the fact whether the loop is forced-distribute with the new
metadata/pragma can be cached in the per-loop class rather than passed
around.

llvm-svn: 267643
2016-04-27 00:31:03 +00:00
Justin Bogner c2bf63d29d PM: Port Reassociate to the new pass manager
llvm-svn: 267631
2016-04-26 23:39:29 +00:00
Justin Bogner cb8a21c88e Reassociate: Convert another functor into a lambda. NFC
Also move the explanatory comment with it.

llvm-svn: 267628
2016-04-26 23:32:00 +00:00
Sanjay Patel d2d2aa52cd [LowerExpectIntrinsic] make default likely/unlikely ratio bigger
We need the default ratio to be sufficiently large that it triggers transforms 
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.

Differential Revision: http://reviews.llvm.org/D19435

llvm-svn: 267615
2016-04-26 22:23:38 +00:00
Justin Bogner 90744d215b Reassociate: Simplify using lambdas. NFC
llvm-svn: 267614
2016-04-26 22:22:18 +00:00
David Majnemer 30ffc4ce45 [SROA] Don't falsely report that changes have occured
We would report that the function changed despite creating no new
allocas or performing any promotion.

This fixes PR27316.

llvm-svn: 267507
2016-04-26 01:05:00 +00:00
Chad Rosier e2cbd13e56 [ValueTracking] Improve isImpliedCondition when the dominating cond is false.
llvm-svn: 267430
2016-04-25 17:23:36 +00:00
Andrew Kaylor aa641a5171 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Justin Bogner b93949089e PM: Port SinkingPass to the new pass manager
llvm-svn: 267199
2016-04-22 19:54:10 +00:00
Justin Bogner 82077c4ab0 PM: Reorder the functions used for SinkingPass. NFC
This will make the port to the new PM easier to follow.

llvm-svn: 267198
2016-04-22 19:54:04 +00:00
Jun Bum Lim d29a24e4fd [DeadStoreElimination] Shorten beginning of memset overwritten by later stores
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.

Reviewers: hfinkel, eeckstein, dberlin, mcrosier

Subscribers: mgrang, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18906

llvm-svn: 267197
2016-04-22 19:51:29 +00:00
Justin Bogner 395c2127ed PM: Port DCE to the new pass manager
Also add a very basic test, since apparently there aren't any tests
for DCE whatsoever to add the new pass version to.

llvm-svn: 267196
2016-04-22 19:40:41 +00:00
Chad Rosier 1a4bc110f5 [EarlyCSE/CVP] Add stats for CVPs and make sure to account for any Changes.
llvm-svn: 267187
2016-04-22 18:47:21 +00:00
David Majnemer bfd695d591 [EarlyCSE] Don't add the overflow flags to the hash
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.

llvm-svn: 267153
2016-04-22 14:12:50 +00:00
Vedant Kumar 6013f45f92 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
David Majnemer d0ce8f1485 [GVN] Respect fast-math-flags on fcmps
We assumed that flags were only present on binary operators.  This is
not true, they may also be present on calls and fcmps.

llvm-svn: 267113
2016-04-22 06:37:51 +00:00
David Majnemer 9554c1339c [EarlyCSE] Take the intersection of flags on instructions
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
  different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
  which had no flags with an instruction which has flags.

Fix this by being more consistent with our flag handling by utilizing
andIRFlags.

llvm-svn: 267111
2016-04-22 06:37:45 +00:00
Andrew Kaylor f0f279291c Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
Adam Nemet 963341c872 [LoopUtils] Move def of findStringMetadataForLoop to LoopUtils.cpp. NFC
The decl is in LoopUtils.h.  I think that this was added to
LoopVersioningLICM.cpp by mistake.

llvm-svn: 267014
2016-04-21 17:33:17 +00:00
Adam Nemet f787826b46 [LoopUtils] Rename {check->find}StringMetadata{Into->For}Loop. NFC
"Into" was misleading.  I am also planning to use this helper to look
for loop metadata and return the argument, so find seems like a better
name.

llvm-svn: 267013
2016-04-21 17:33:12 +00:00
Chad Rosier b346dcbc25 Typo.
llvm-svn: 266905
2016-04-20 19:16:23 +00:00
Chad Rosier 41dd31f0b0 [ValueTracking] Make isImpliedCondition return an Optional<bool>. NFC.
Phabricator Revision: http://reviews.llvm.org/D19277

llvm-svn: 266904
2016-04-20 19:15:26 +00:00
Chad Rosier b7dfbb40a3 [ValueTracking] Improve isImpliedCondition for conditions with matching operands.
This patch improves SimplifyCFG to catch cases like:

  if (a < b) {
    if (a > b) <- known to be false
      unreachable;
  }

Phabricator Revision: http://reviews.llvm.org/D18905

llvm-svn: 266767
2016-04-19 17:19:14 +00:00
Michael Kuperstein de16b44f74 Port DemandedBits to the new pass manager.
Differential Revision: http://reviews.llvm.org/D18679

llvm-svn: 266699
2016-04-18 23:55:01 +00:00
Mehdi Amini b550cb1750 [NFC] Header cleanup
Removed some unused headers, replaced some headers with forward class declarations.

Found using simple scripts like this one:
clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' | xargs grep -L 'IndexedMap[<]' | xargs grep -n --color=auto 'IndexedMap'

Patch by Eugene Kosov <claprix@yandex.ru>

Differential Revision: http://reviews.llvm.org/D19219

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266595
2016-04-18 09:17:29 +00:00
Duncan P. N. Exon Smith a71301befa Transforms: Fix bootstrap after r266565
Apparently there isn't test coverage for all of these.  I'd appreciate
if someone with could reproduce and send me something to reduce, but for
now I've just looked for users of RemapInstruction and MapValue and
ensured they don't accidentally insert nullptr.  Here is one of the
bootstraps that caught:

  http://lab.llvm.org:8011/builders/clang-x64-ninja-win7/builds/11494

llvm-svn: 266567
2016-04-17 19:26:49 +00:00
Justin Lebar cad81cf6b3 [Speculation] Add a SpeculativeExecution mode where the pass does nothing unless TTI::hasBranchDivergence() is true.
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.

Reviewers: tra

Subscribers: llvm-commits, jingyue, rnk, chandlerc

Differential Revision: http://reviews.llvm.org/D18625

llvm-svn: 266398
2016-04-15 00:32:09 +00:00
Nicolai Haehnle 05b127da06 [StructurizeCFG] Annotate branches that were treated as uniform
Summary:
This fully solves the problem where the StructurizeCFG pass does not
consider the same branches as uniform as the SIAnnotateControlFlow pass.
The patch in D19013 helps with this problem, but is not sufficient
(and, interestingly, causes a "regression" with one of the existing
test cases).

No tests included here, because tests in D19013 already cover this.

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19018

llvm-svn: 266346
2016-04-14 17:42:35 +00:00
Tim Northover 5c02f9ad28 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Betul Buyukkurt bf8554c279 [PGO] Remove redundant VP instrumentation
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.

llvm-svn: 266229
2016-04-13 18:52:19 +00:00
Sanjoy Das 5ce3272833 Don't IPO over functions that can be de-refined
Summary:
Fixes PR26774.

If you're aware of the issue, feel free to skip the "Motivation"
section and jump directly to "This patch".

Motivation:

I define "refinement" as discarding behaviors from a program that the
optimizer has license to discard.  So transforming:

```
void f(unsigned x) {
  unsigned t = 5 / x;
  (void)t;
}
```

to

```
void f(unsigned x) { }
```

is refinement, since the behavior went from "if x == 0 then undefined
else nothing" to "nothing" (the optimizer has license to discard
undefined behavior).

Refinement is a fundamental aspect of many mid-level optimizations done
by LLVM.  For instance, transforming `x == (x + 1)` to `false` also
involves refinement since the expression's value went from "if x is
`undef` then { `true` or `false` } else { `false` }" to "`false`" (by
definition, the optimizer has license to fold `undef` to any non-`undef`
value).

Unfortunately, refinement implies that the optimizer cannot assume
that the implementation of a function it can see has all of the
behavior an unoptimized or a differently optimized version of the same
function can have.  This is a problem for functions with comdat
linkage, where a function can be replaced by an unoptimized or a
differently optimized version of the same source level function.

For instance, FunctionAttrs cannot assume a comdat function is
actually `readnone` even if it does not have any loads or stores in
it; since there may have been loads and stores in the "original
function" that were refined out in the currently visible variant, and
at the link step the linker may in fact choose an implementation with
a load or a store.  As an example, consider a function that does two
atomic loads from the same memory location, and writes to memory only
if the two values are not equal.  The optimizer is allowed to refine
this function by first CSE'ing the two loads, and the folding the
comparision to always report that the two values are equal.  Such a
refined variant will look like it is `readonly`.  However, the
unoptimized version of the function can still write to memory (since
the two loads //can// result in different values), and selecting the
unoptimized version at link time will retroactively invalidate
transforms we may have done under the assumption that the function
does not write to memory.

Note: this is not just a problem with atomics or with linking
differently optimized object files.  See PR26774 for more realistic
examples that involved neither.

This patch:

This change introduces a new set of linkage types, predicated as
`GlobalValue::mayBeDerefined` that returns true if the linkage type
allows a function to be replaced by a differently optimized variant at
link time.  It then changes a set of IPO passes to bail out if they see
such a function.

Reviewers: chandlerc, hfinkel, dexonsmith, joker.eph, rnk

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18634

llvm-svn: 265762
2016-04-08 00:48:30 +00:00
Ulrich Weigand fc23907673 [GVN] Address review comments for D18662
As suggested by Chandler in his review comments for D18662, this
follow-on patch renames some variables in GetLoadValueForLoad and
CoerceAvailableValueToLoadType to hopefully make it more obvious
which variables hold value sizes and which hold load/store sizes.

No functional change intended.

llvm-svn: 265687
2016-04-07 15:55:11 +00:00
Ulrich Weigand 6e6966460a [GVN] Fix handling of sub-byte types in big-endian mode
When GVN wants to re-interpret an already available value in a smaller
type, it needs to right-shift the value on big-endian systems to ensure
the correct bytes are accessed.  The shift value is the difference of
the sizes of the two types.

This is correct as long as both types occupy multiples of full bytes.
However, when one of them is a sub-byte type like i1, this no longer
holds true: we still need to shift, but only to access the correct
*byte*.  Accessing bits within the byte requires no shift in either
endianness; e.g. an i1 resides in the least-significant bit of its
containing byte on both big- and little-endian systems.

Therefore, the appropriate shift value to be used is the difference of
the *storage* sizes of the two types.  This is already handled correctly
in one place where such a shift takes place (GetStoreValueForLoad), but
is incorrect in two other places: GetLoadValueForLoad and
CoerceAvailableValueToLoadType.

This patch changes both places to use the storage size as well.

Differential Revision: http://reviews.llvm.org/D18662

llvm-svn: 265684
2016-04-07 15:45:02 +00:00
Duncan P. N. Exon Smith da68cbc4ad IR: RF_IgnoreMissingValues => RF_IgnoreMissingLocals, NFC
Clarify what this RemapFlag actually means.

  - Change the flag name to match its intended behaviour.
  - Clearly document that it's not supposed to affect globals.
  - Add a host of FIXMEs to indicate how to fix the behaviour to match
    the intent of the flag.

RF_IgnoreMissingLocals should only affect the behaviour of
RemapInstruction for function-local operands; namely, for operands of
type Argument, Instruction, and BasicBlock.  Currently, it is *only*
passed into RemapInstruction calls (and the transitive MapValue calls
that it makes).

When I split Metadata from Value I didn't understand the flag, and I
used it in a bunch of places for "global" metadata.

This commit doesn't have any functionality change, but prepares to
cleanup MapMetadata and MapValue.

llvm-svn: 265628
2016-04-07 00:26:43 +00:00
JF Bastien 800f87a871 NFC: make AtomicOrdering an enum class
Summary:
In the context of http://wg21.link/lwg2445 C++ uses the concept of
'stronger' ordering but doesn't define it properly. This should be fixed
in C++17 barring a small question that's still open.

The code currently plays fast and loose with the AtomicOrdering
enum. Using an enum class is one step towards tightening things. I later
also want to tighten related enums, such as clang's
AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI'
enum).

This change touches a few lines of code which can be improved later, I'd
like to keep it as NFC for now as it's already quite complex. I have
related changes for clang.

As a follow-up I'll add:
  bool operator<(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>(AtomicOrdering, AtomicOrdering) = delete;
  bool operator<=(AtomicOrdering, AtomicOrdering) = delete;
  bool operator>=(AtomicOrdering, AtomicOrdering) = delete;
This is separate so that clang and LLVM changes don't need to be in sync.

Reviewers: jyknight, reames

Subscribers: jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D18775

llvm-svn: 265602
2016-04-06 21:19:33 +00:00
Fiona Glaser 045afc4f66 Loop Unroll: add options and tweak to make Partial unrolling more useful
1. Add FullUnrollMaxCount option that works like MaxCount, but also limits
   the unroll count for fully unrolled loops. So if a loop has an iteration
   count over this, it won't fully unroll.
2. Add CLI options for MaxCount and the new option, so they can be tested
   (plus a test).
3. Make partial unrolling obey MaxCount.

An example use-case (the out of tree one this is originally designed for) is
a target’s TTI can analyze a loop and decide on a max unroll count separate
from the size threshold, e.g. based on register pressure, then constrain
LoopUnroll to not exceed that, regardless of the size of the unrolled loop.

llvm-svn: 265562
2016-04-06 16:57:25 +00:00
Fiona Glaser 16332ba861 LoopUnroll: only allow non-modulo Partial unrolling when Runtime=true
Patch by Evgeny Stupachenko <evstupac@gmail.com>.

llvm-svn: 265558
2016-04-06 16:43:45 +00:00
Chad Rosier 074ce836f0 Simplify logic. NFC.
llvm-svn: 265537
2016-04-06 13:27:13 +00:00
Richard Trieu f35d4b0928 Add parentheses to silence warning.
llvm-svn: 265516
2016-04-06 04:22:00 +00:00
Sanjoy Das 99abb2728b [RS4GC] Add a comment
llvm-svn: 265503
2016-04-06 01:33:54 +00:00
Sanjoy Das 8d89a2b296 [RS4GC] NFC cleanup of the DeferredReplacement class
Instead of constructors use clearly named factory methods.

llvm-svn: 265486
2016-04-05 23:18:53 +00:00
Sanjoy Das 49e974b33b [RS4GC] Better codegen for deoptimize calls
Don't emit a gc.result for a statepoint lowered from
@llvm.experimental.deoptimize since the call into __llvm_deoptimize is
effectively noreturn.  Instead follow the corresponding gc.statepoint
with an "unreachable".

llvm-svn: 265485
2016-04-05 23:18:35 +00:00
Sanjay Patel e77c7de459 use range loop; NFCI
llvm-svn: 265360
2016-04-04 23:05:06 +00:00
Zia Ansari a82a58a4e5 Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.
Commit for Evgeny Stupachenko (evstupac@gmail.com)

Differential Revision: http://reviews.llvm.org/D18290

llvm-svn: 265337
2016-04-04 19:24:46 +00:00
Sanjoy Das 021de058df Introduce a @llvm.experimental.guard intrinsic
Summary:
As discussed on llvm-dev[1].

This change adds the basic boilerplate code around having this intrinsic
in LLVM:

 - Changes in Intrinsics.td, and the IR Verifier
 - A lowering pass to lower @llvm.experimental.guard to normal
   control flow
 - Inliner support

[1]: http://lists.llvm.org/pipermail/llvm-dev/2016-February/095523.html

Reviewers: reames, atrick, chandlerc, rnk, JosephTremoulet, echristo

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18527

llvm-svn: 264976
2016-03-31 00:18:46 +00:00
David Majnemer 5d518386b6 [IndVarSimplify] Don't insert after a catchswitch
Widening a PHI requires us to insert a trunc.
The logical place for this trunc is in the same BB as the PHI.
This is not possible if the BB is terminated by a catchswitch.

This fixes PR27133.

llvm-svn: 264926
2016-03-30 21:12:06 +00:00
Adam Nemet 1428d41f9a [LoopDataPrefetch] Centralize the tuning cl::opts under the pass
This is effectively NFC, minus the renaming of the options
(-cyclone-prefetch-distance -> -prefetch-distance).

The change was requested by Tim in D17943.

llvm-svn: 264806
2016-03-29 23:45:52 +00:00
Duncan P. N. Exon Smith e8eb94a9a5 ADCE: Remove debug info intrinsics in dead scopes
During ADCE, track which debug info scopes still have live references
from the code, and delete debug info intrinsics for the dead ones.

These intrinsics describe the locations of variables (in registers or
stack slots).  If there's no code left corresponding to a variable's
scope, then there's no way to reference the variable in the debugger and
it doesn't matter what its value is.

I add a DEBUG printout when the described location in an SSA register,
in case it helps some trying to track down why locations get lost.
However, we still delete these; the scope itself isn't attached to any
real code, so the ship has already sailed.

llvm-svn: 264800
2016-03-29 22:57:12 +00:00
Adam Nemet 85fba39390 [LoopDataPrefetch] Make more member functions private, NFC.
llvm-svn: 264798
2016-03-29 22:40:02 +00:00
Hyojin Sung 4673f10568 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being
applied (e.g., transforming nested loops into simplified forms and loop vectorization).
    
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually
belongs to a set of loop headers and not eliminating it if yes.

llvm-svn: 264697
2016-03-29 04:08:57 +00:00
Reid Kleckner ba85781f58 Revert "[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops"
This reverts commit r264596.

It does not compile.

llvm-svn: 264604
2016-03-28 18:07:40 +00:00
Hyojin Sung 0ada5b0d14 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being 
applied (e.g., transforming nested loops into simplified forms and loop vectorization).

The patch augments the existing PHI node-based check by adding a pre-test if the BB actually 
belongs to a set of loop headers and not eliminating it if yes. 

llvm-svn: 264596
2016-03-28 17:22:25 +00:00
Hal Finkel 5c83a090bc [SROA] Fix typo in comment
llvm-svn: 264573
2016-03-28 11:23:21 +00:00
Hal Finkel 29f5131daf C++11 is required, remove some preprocessor checks for it
We require C++11 to build, so remove a few remaining preprocessor checks for
'__cplusplus >= 201103L'. This should always be true.

llvm-svn: 264572
2016-03-28 11:13:03 +00:00
Sanjoy Das d4c783335b [RS4GC] Lower calls to @llvm.experimental.deoptimize
This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize``
to gc.statepoints wrapping ``__llvm_deoptimize``, and changes
``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize``
as a non GC leaf function.

I've had to hard code the ``"__llvm_deoptimize"`` name in
RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only
during codegen.  This isn't without precedent in the codebase, so I'm
not overtly concerned.

llvm-svn: 264456
2016-03-25 20:12:13 +00:00
David L Kreitzer 8d441eb936 Enable non-power-of-2 #pragma unroll counts.
Patch by Evgeny Stupachenko.

Differential Revision: http://reviews.llvm.org/D18202

llvm-svn: 264407
2016-03-25 14:24:52 +00:00
David Majnemer e09d035dad [LoopStrengthReduce] Don't hoist into a catchswitch
We try to hoist the insertion point as high as possible to encourage
sharing.  However, we must be careful not to hoist into a catchswitch as
it is both an EHPad and a terminator.

llvm-svn: 264344
2016-03-24 21:40:22 +00:00
Adam Nemet 7aba60c853 [LLE] Check for mismatching types between the store and the load earlier
isDependenceDistanceOfOne asserts that the store and the load access
through the same type.  This function is also used by
removeDependencesFromMultipleStores so we need to make sure we filter
out mismatching types before reaching this point.

Now we do this when the initial candidates are gathered.

This is a refinement of the fix made in r262267.

Fixes PR27048.

llvm-svn: 264313
2016-03-24 17:59:26 +00:00
Zinovy Nis 07ac2bd4d0 [PATCH] Force LoopReroll to reset the loop trip count value after reroll.
It's a bug fix. 
For rerolled loops SE trip count remains unchanged. It leads to incorrect work of the next passes.
My patch just resets SE info for rerolled loop forcing SE to re-evaluate it next time it requested.
I also added a verifier call in the exisitng test to be sure no invalid SE data remain. Without my fix this test would fail with -verify-scev.

Differential Revision: http://reviews.llvm.org/D18316

llvm-svn: 264051
2016-03-22 13:50:57 +00:00
Adam Nemet 709e3046ee [LoopDataPrefetch] Add TTI to limit the number of iterations to prefetch ahead
Summary:
It can hurt performance to prefetch ahead too much.  Be conservative for
now and don't prefetch ahead more than 3 iterations on Cyclone.

Reviewers: hfinkel

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17949

llvm-svn: 263772
2016-03-18 00:27:43 +00:00
Adam Nemet 6d8beeca53 [LoopDataPrefetch/Aarch64] Allow selective prefetching of large-strided accesses
Summary:
And use this TTI for Cyclone.  As it was explained in the original RFC
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758), the HW
prefetcher work up to 2KB strides.

I am also adding tests for this and the previous change (D17943):

* Cyclone prefetching accesses with a large stride
* Cyclone not prefetching accesses with a small stride
* Generic Aarch64 subtarget not prefetching either

Reviewers: hfinkel

Subscribers: aemerson, rengolin, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17945

llvm-svn: 263771
2016-03-18 00:27:38 +00:00
Adam Nemet 5eccf07df3 [LoopVersioning] Annotate versioned loop with noalias metadata
Summary:
If we decide to version a loop to benefit a transformation, it makes
sense to record the now non-aliasing accesses in the newly versioned
loop.  This allows non-aliasing information to be used by subsequent
passes.

One example is 456.hmmer in SPECint2006 where after loop distribution,
we vectorize one of the newly distributed loops.  To vectorize we
version this loop to fully disambiguate may-aliasing accesses.  If we
add the noalias markers, we can use the same information in a later DSE
pass to eliminate some dead stores which amounts to ~25% of the
instructions of this hot memory-pipeline-bound loop.  The overall
performance improves by 18% on our ARM64.

The scoped noalias annotation is added in LoopVersioning.  The patch
then enables this for loop distribution.  A follow-on patch will enable
it for the vectorizer.  Eventually this should be run by default when
versioning the loop but first I'd like to get some feedback whether my
understanding and application of scoped noalias metadata is correct.

Essentially my approach was to have a separate alias domain for each
versioning of the loop.  For example, if we first version in loop
distribution and then in vectorization of the distributed loops, we have
a different set of memchecks for each versioning.  By keeping the scopes
in different domains they can conveniently be defined independently
since different alias domains don't affect each other.

As written, I also have a separate domain for each loop.  This is not
necessary and we could save some metadata here by using the same domain
across the different loops.  I don't think it's a big deal either way.

Probably the best is to review the tests first to see if I mapped this
problem correctly to scoped noalias markers.  I have plenty of comments
in the tests.

Note that the interface is prepared for the vectorizer which needs the
annotateInstWithNoAlias API.  The vectorizer does not use LoopVersioning
so we need a way to pass in the versioned instructions.  This is also
why the maps have to become part of the object state.

Also currently, we only have an AA-aware DSE after the vectorizer if we
also run the LTO pipeline.  Depending how widely this triggers we may
want to schedule a DSE toward the end of the regular pass pipeline.

Reviewers: hfinkel, nadav, ashutosh.nema

Subscribers: mssimpso, aemerson, llvm-commits, mcrosier

Differential Revision: http://reviews.llvm.org/D16712

llvm-svn: 263743
2016-03-17 20:32:32 +00:00
Sanjoy Das c9058ca9e0 [Statepoints] Export a magic constant into a header; NFC
llvm-svn: 263733
2016-03-17 18:42:17 +00:00
Sanjoy Das 312038872d [Statepoints] Separate out logic for statepoint directives; NFC
This splits out the logic that maps the `"statepoint-id"` attribute into
the actual statepoint ID, and the `"statepoint-num-patch-bytes"`
attribute into the number of patchable bytes the statpeoint is lowered
into.  The new home of this logic is in IR/Statepoint.cpp, and this
refactoring will support similar functionality when lowering calls with
deopt operand bundles in the future.

llvm-svn: 263685
2016-03-17 01:56:10 +00:00
Geoff Berry 56fabf9b55 Revert "[LSR] Create fewer redundant instructions."
This reverts commit r263644.  Investigating bootstrap failures.

llvm-svn: 263655
2016-03-16 19:21:47 +00:00
Geoff Berry 459b750871 [LSR] Create fewer redundant instructions.
Summary:
Fix LSRInstance::HoistInsertPosition() to check the original insert
position block first for a canonical insertion point that is dominated
by all inputs.  This leads to SCEV being able to reuse more instructions
since it currently tracks the instructions it creates for reuse by
keeping a table of <Value, insert point> pairs.

Reviewers: atrick

Subscribers: mcrosier, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18001

llvm-svn: 263644
2016-03-16 17:29:49 +00:00
Haicheng Wu 7873857a88 [JumpThreading] See through Cast Instructions
To capture more jump-thread opportunity.

llvm-svn: 263618
2016-03-16 04:52:52 +00:00
Haicheng Wu 64d9d7c3f7 Revert "[JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()"
Not sure it handles undef properly.

llvm-svn: 263605
2016-03-15 23:38:47 +00:00
Justin Lebar 6827de19b2 [LoopUnroll] Respect the convergent attribute.
Summary:
Specifically, when we perform runtime loop unrolling of a loop that
contains a convergent op, we can only unroll k times, where k divides
the loop trip multiple.

Without this change, we'll happily unroll e.g. the following loop

  for (int i = 0; i < N; ++i) {
    if (i == 0) convergent_op();
    foo();
  }

into

  int i = 0;
  if (N % 2 == 1) {
    convergent_op();
    foo();
    ++i;
  }
  for (; i < N - 1; i += 2) {
    if (i == 0) convergent_op();
    foo();
    foo();
  }.

This is unsafe, because we've just added a control-flow dependency to
the convergent op in the prelude.

In general, runtime unrolling loops that contain convergent ops is safe
only if we don't have emit a prelude, which occurs when the unroll count
divides the trip multiple.

Reviewers: resistor

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17526

llvm-svn: 263509
2016-03-14 23:15:34 +00:00
Amaury Sechet bdb261b4c0 Imporove load to store => memcpy
Summary: This now try to reorder instructions in order to help create the optimizable pattern.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer

Differential Revision: http://reviews.llvm.org/D16523

llvm-svn: 263503
2016-03-14 22:52:27 +00:00
Chad Rosier 00bd82cade [CVP] Replace nonnegative with positive, per Philip's request. NFC.
llvm-svn: 263430
2016-03-14 13:48:00 +00:00
Haicheng Wu d60ae33d29 [CVP] Convert an SDiv to a UDiv if both operands are known to be nonnegative
The motivating example is this

for (j = n; j > 1; j = i) {
   i = j / 2;
}

The signed division is safely to be changed to an unsigned division (j is known
to be larger than 1 from the loop guard) and later turned into a single shift
without considering the sign bit.

llvm-svn: 263406
2016-03-14 03:24:28 +00:00
Mehdi Amini ba9fba81d6 Remove PreserveNames template parameter from IRBuilder
This reapplies r263258, which was reverted in r263321 because
of issues on Clang side.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263393
2016-03-13 21:05:13 +00:00
Eric Christopher 35abd051c0 Temporarily revert:
commit ae14bf6488e8441f0f6d74f00455555f6f3943ac
Author: Mehdi Amini <mehdi.amini@apple.com>
Date:   Fri Mar 11 17:15:50 2016 +0000

    Remove PreserveNames template parameter from IRBuilder

    Summary:
    Following r263086, we are now relying on a flag on the Context to
    discard Value names in release builds.

    Reviewers: chandlerc

    Subscribers: mzolotukhin, llvm-commits

    Differential Revision: http://reviews.llvm.org/D18023

    From: Mehdi Amini <mehdi.amini@apple.com>

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@263258
    91177308-0d34-0410-b5e6-96231b3b80d8

until we can figure out what to do about clang and Release build testing.

This reverts commit 263258.

llvm-svn: 263321
2016-03-12 01:47:22 +00:00
Mehdi Amini 99eab3dd06 Remove PreserveNames template parameter from IRBuilder
Summary:
Following r263086, we are now relying on a flag on the Context to
discard Value names in release builds.

Reviewers: chandlerc

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D18023

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263258
2016-03-11 17:15:50 +00:00
Mehdi Amini 1e9c925182 Do not specialize IRBuilder to strip names in SROA
Summary:
Following r263086, we are replacing this by a runtime check.
More cleanup will follow on the IRBuilder itself, but I submitted
this patch separately as SROA has a fancy "prefixInserter" class
that needs extra-love.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18022

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 263256
2016-03-11 17:15:34 +00:00
Chandler Carruth ace8c8f765 [PM] Sink the "Expression" type for GVN into the class as a private
member type.

Because of how this type is used by the ValueTable, it cannot actually
have hidden visibility. GCC actually nicely warns about this but Clang
just silently ... I don't even know. =/ We should do a better job either
way though.

This should resolve a bunch of the GCC warnings about visibility that
the port of GVN triggered and make the visibility story a bit more
correct.

llvm-svn: 263250
2016-03-11 16:25:19 +00:00
Chandler Carruth 3bc9c7fb45 [PM] The order of evaluation of these analyses is actually significant,
much to my horror, so use variables to fix it in place.

This terrifies me. Both basic-aa and memdep will provide more precise
information when the domtree and/or the loop info is available. Because
of this, if your pass (like GVN) requires domtree, and then queries
memdep or basic-aa, it will get more precise results. If it does this in
the other order, it gets less precise results.

All of the ideas I have for fixing this are, essentially, terrible. Here
I've just caused us to stop having unspecified behavior as different
implementations evaluate the order of these arguments differently. I'm
actually rather glad that they do, or the fragility of memdep and
basic-aa would have gone on unnoticed. I've left comments so we don't
immediately break this again. This should fix bots whose host compilers
evaluate the order of arguments differently from Clang.

llvm-svn: 263231
2016-03-11 13:26:47 +00:00
Chandler Carruth b47f8010a9 [PM] Make the AnalysisManager parameter to run methods a reference.
This was originally a pointer to support pass managers which didn't use
AnalysisManagers. However, that doesn't realistically come up much and
the complexity of supporting it doesn't really make sense.

In fact, *many* parts of the pass manager were just assuming the pointer
was never null already. This at least makes it much more explicit and
clear.

llvm-svn: 263219
2016-03-11 11:05:24 +00:00
Chandler Carruth 89c45a162f [PM] Port GVN to the new pass manager, wire it up, and teach a couple of
tests to run GVN in both modes.

This is mostly the boring refactoring just like SROA and other complex
transformation passes. There is some trickiness in that GVN's
ValueNumber class requires hand holding to get to compile cleanly. I'm
open to suggestions about a better pattern there, but I tried several
before settling on this. I was trying to balance my desire to sink as
much implementation detail into the source file as possible without
introducing overly many layers of abstraction.

Much like with SROA, the design of this system is made somewhat more
cumbersome by the need to support both pass managers without duplicating
the significant state and logic of the pass. The same compromise is
struck here.

I've also left a FIXME in a doxygen comment as the GVN pass seems to
have pretty woeful documentation within it. I'd like to submit this with
the FIXME and let those more deeply familiar backfill the information
here now that we have a nice place in an interface to put that kind of
documentaiton.

Differential Revision: http://reviews.llvm.org/D18019

llvm-svn: 263208
2016-03-11 08:50:55 +00:00
Adam Nemet efb234135c [LLE] Add missed LoopSimplify dependence
The code assumed that we always had a preheader without making the pass
dependent on LoopSimplify.

Thanks to Mattias Eriksson V for reporting this.

llvm-svn: 263173
2016-03-10 23:54:39 +00:00
Chandler Carruth 37f1f12226 [SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights out
of, and I misdiagnosed for months and months.

Andrea has had a patch for this forever, but I just couldn't see how
it was fixing the root cause of the problem. It didn't make sense to me,
even though the patch was perfectly good and the analysis of the actual
failure event was *fantastic*.

Well, I came back to it today because the patch has sat for *far* too
long and needs attention and decided I wouldn't let it go until I really
understood what was going on. After quite some time in the debugger,
I finally realized that in fact I had just missed an important case with
my previous attempt to fix PR22093 in r225149. Not only do we need to
handle loads that won't be split, but stores-of-loads that we won't
split. We *do* actually have enough logic in the presplitting to form
new slices for split stores.... *unless* we decided not to split them!

I'm so sorry that it took me this long to come to the realization that
this is the issue. It seems so obvious in hind sight (of course).
Anyways, the fix becomes *much* smaller and more focused. The fact that
we're left doing integer smashing is related to the FIXME in my original
commit: fundamentally, we're not aggressive about pre-splitting for
loads and stores to the same alloca. If we want to get aggressive about
this, it'll need both what Andrea had put into the proposed fix, but
also a *lot* more logic to essentially iteratively pre-split the alloca
until we can't do any more. As I said in that commit log, its really
unclear that this is the right call. Instead, the integer blending and
letting targets lower this to narrower stores seems slightly better. But
we definitely shouldn't really go down that path just to fix this bug.

Again, tons of thanks are owed to Andrea and others at Sony for working
on this bug. I really should have seen what was going on here and
re-directed them sooner. =////

llvm-svn: 263121
2016-03-10 15:31:17 +00:00
Chandler Carruth d94a5962cc [SROA] Clean up some really weird code, no functionality changed.
We already have the instruction extracted into 'I', just cast that to
a store the way we do for loads. Also, we don't enter the if unless SI
is non-null, so don't test it again for null.

I'm pretty sure the entire test there can be nuked, but this is just the
trivial cleanup.

llvm-svn: 263112
2016-03-10 14:16:18 +00:00
Chandler Carruth 7776377e62 [gvn] Fix more indenting and formatting in regions of code that will
need to be changed for porting to the new pass manager.

Also sink the comment on the ValueTable class back to that class instead
of it dangling on an anonymous namespace.

No functionality changed.

llvm-svn: 263084
2016-03-10 00:58:20 +00:00
Chandler Carruth 169c84f1cc [gvn] Reformat a chunk of the GVN code that is strangely indented prior
to restructuring it for porting to the new pass manager.

No functionality changed.

llvm-svn: 263083
2016-03-10 00:58:18 +00:00
Chandler Carruth 61440d225b [PM] Port memdep to the new pass manager.
This is a fairly straightforward port to the new pass manager with one
exception. It removes a very questionable use of releaseMemory() in
the old pass to invalidate its caches between runs on a function.
I don't think this is really guaranteed to be safe. I've just used the
more direct port to the new PM to address this by nuking the results
object each time the pass runs. While this could cause some minor malloc
traffic increase, I don't expect the compile time performance hit to be
noticable, and it makes the correctness and other aspects of the pass
much easier to reason about. In some cases, it may make things faster by
making the sets and maps smaller with better locality. Indeed, the
measurements collected by Bruno (thanks!!!) show mostly compile time
improvements.

There is sadly very limited testing at this point as there are only two
tests of memdep, and both rely on GVN. I'll be porting GVN next and that
will exercise this heavily though.

Differential Revision: http://reviews.llvm.org/D17962

llvm-svn: 263082
2016-03-10 00:55:30 +00:00
Philip Reames e0a5454df4 Fix the build
I screwed up rebasing 263072.  This change fixes the build and passes all make check.

llvm-svn: 263073
2016-03-09 23:07:53 +00:00
Philip Reames b54c8e6eea [LICM] Store promotion when memory is thread local
This patch teaches LICM's implementation of store promotion to exploit the fact that the memory location being accessed might be provable thread local. The fact it's thread local weakens the requirements for where we can insert stores since no other thread can observe the write. This allows us perform store promotion even in cases where the store is not guaranteed to execute in the loop.

Two key assumption worth drawing out is that this assumes a) no-capture is strong enough to imply no-escape, and b) standard allocation functions like malloc, calloc, and operator new return values which can be assumed not to have previously escaped.

In future work, it would be nice to generalize this so that it works without directly seeing the allocation site. I believe that the nocapture return attribute should be suitable for this purpose, but haven't investigated carefully. It's also likely that we could support unescaped allocas with similar reasoning, but since SROA and Mem2Reg should destroy those, they're less interesting than they first might seem.

Differential Revision: http://reviews.llvm.org/D16783

llvm-svn: 263072
2016-03-09 22:59:30 +00:00
Adam Nemet 660748ca8c [LLE] Add missing check for unit stride
I somehow missed this.  The case in GCC (global_alloc) was similar to
the new testcase except it had an array of structs rather than a two
dimensional array.

Fixes RP26885.

llvm-svn: 263058
2016-03-09 20:47:55 +00:00
Adam Nemet 34785ecff1 [LoopDataPrefetch] Add stats and debug output
llvm-svn: 262998
2016-03-09 05:33:21 +00:00
Sanjoy Das 2eac48de9e Return StringRef instead of a naked char*; NFC
llvm-svn: 262989
2016-03-09 02:34:19 +00:00
Sanjoy Das f13900f8ac [IRCE] Reflow comments; NFC
llvm-svn: 262988
2016-03-09 02:34:15 +00:00
Sanjay Patel f831fdb56a fix variable name; NFC
llvm-svn: 262953
2016-03-08 19:07:42 +00:00
Sanjay Patel 5c96723622 use range-based loop; NFCI
llvm-svn: 262952
2016-03-08 19:06:12 +00:00
Adam Nemet bb3680bd85 [LoopDataPrefetch] If prefetch distance is not set, skip pass
This lets select sub-targets enable this pass.  The patch implements the
idea from the recent llvm-dev thread:
http://thread.gmane.org/gmane.comp.compilers.llvm.devel/94925

The goal is to enable the LoopDataPrefetch pass for the Cyclone
sub-target only within Aarch64.

Positive and negative tests will be included in an upcoming patch that
enables selective prefetching of large-strided accesses on Cyclone.

llvm-svn: 262844
2016-03-07 18:35:42 +00:00
Adam Nemet efc091f457 [LLE] Fix a comment
llvm-svn: 262270
2016-02-29 23:21:12 +00:00
Adam Nemet 83be06e529 [LLE] Fix SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper with Polly
We can actually have dependences between accesses with different
underlying types.  Bail in this case.

A test will follow shortly.

llvm-svn: 262267
2016-02-29 22:53:59 +00:00
Chandler Carruth ad8cb382fa [LICM] Teach LICM how to handle cases where the alias set tracker was
merged into a loop that was subsequently unrolled (or otherwise nuked).

In this case it can't merge in the ASTs for any remaining nested loops,
it needs to re-add their instructions dircetly.

The fix is very isolated, but I've pulled the code for merging blocks
into the AST into a single place in the process. The only behavior
change is in the case which would have crashed before.

This fixes a crash reported by Mikael Holmen on the list after r261316
restored much of the loop pass pipelining and allowed us to actually do
this kind of nested transformation sequenc. I've taken that test case
and further reduced it into the somewhat twisty maze of loops in the
included test case. This does in fact trigger the bug even in this
reduced form.

llvm-svn: 262108
2016-02-27 04:34:07 +00:00
Haicheng Wu 5539f852ae [JumpThreading] Simplify Instructions first in ComputeValueKnownInPredecessors()
This change tries to find more opportunities to thread over basic blocks.

llvm-svn: 261981
2016-02-26 06:06:04 +00:00
Michael Zolotukhin 9f520ebc54 [LoopUnrollAnalyzer] Check that we're using SCEV for the same loop we're simulating.
Summary: Check that we're using SCEV for the same loop we're simulating. Otherwise, we might try to use the iteration number of the current loop in SCEV expressions for inner/outer loops IVs, which is clearly incorrect.

Reviewers: chandlerc, hfinkel

Subscribers: sanjoy, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17632

llvm-svn: 261958
2016-02-26 02:57:05 +00:00
Adam Nemet fb31d580ea [LoopDataPrefetch] Make it testable with opt
Summary:
Since this is an IR pass it's nice to be able to write tests without
llc.  This is the counterpart of the llc test under
CodeGen/PowerPC/loop-data-prefetch.ll.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17464

llvm-svn: 261578
2016-02-22 21:41:22 +00:00
Philip Reames ce38c2ddf6 [RS4GC] "Constant fold" the rs4gc-split-vector-values flag
This flag was part of a migration to a new means of handling vectors-of-points which was described in the llvm-dev thread "FYI: Relocating vector of pointers".  The old code path has been off by default for a while without complaints, so time to cleanup.

llvm-svn: 261569
2016-02-22 21:01:28 +00:00
Philip Reames 79fa9b75c0 [RS4GC] Revert optimization attempt due to memory corruption
This change reverts "246133 [RewriteStatepointsForGC] Reduce the number of new instructions for base pointers" and a follow on bugfix 12575.

As pointed out in pr25846, this code suffers from a memory corruption bug.  Since I'm (empirically) not going to get back to this any time soon, simply reverting the problematic change is the right answer.

llvm-svn: 261565
2016-02-22 20:45:56 +00:00
Elena Demikhovsky 9914dbd11b Allow setting MaxRerollIterations above 16
By Ayal Zaks.

Differential Revision http://reviews.llvm.org/D17258

llvm-svn: 261517
2016-02-22 09:38:28 +00:00
Duncan P. N. Exon Smith e9bc579c37 ADT: Remove == and != comparisons between ilist iterators and pointers
I missed == and != when I removed implicit conversions between iterators
and pointers in r252380 since they were defined outside ilist_iterator.

Since they depend on getNodePtrUnchecked(), they indirectly rely on UB.
This commit removes all uses of these operators.  (I'll delete the
operators themselves in a separate commit so that it can be easily
reverted if necessary.)

There should be NFC here.

llvm-svn: 261498
2016-02-21 20:39:50 +00:00
Sanjoy Das 979a11d5b2 [LoopDeletion] Add an assert that verifies LCSSA
This is inspired by PR24804 -- had this assert been there before,
isolating the root cause for PR24804 would have been far easier.

llvm-svn: 261481
2016-02-21 17:11:59 +00:00