Commit Graph

12660 Commits

Author SHA1 Message Date
Adam Nemet f219c64723 [LoopAccesses] Make VectorizerParams global + fix for cyclic dep
As LAA is becoming a pass, we can no longer pass the params to its
constructor.  This changes the command line flags to have external
storage.  These can now be accessed both from LV and LAA.

VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.

This commits also has the fix (D7731) to the break dependence cycle
between the analysis and vector libraries.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229890
2015-02-19 19:14:52 +00:00
Adam Nemet 04d4163e95 Revert "Reformat."
This reverts commit r229651.

I'd like to ultimately revert r229650 but this reformat stands in the
way.  I'll reformat the affected files once the the loop-access pass is
fully committed.

llvm-svn: 229889
2015-02-19 19:14:34 +00:00
Benjamin Kramer 1c2beed7fd LSR: Move set instead of copying. NFC.
llvm-svn: 229871
2015-02-19 17:19:43 +00:00
Benjamin Kramer ea68a944a1 Demote vectors to arrays. No functionality change.
llvm-svn: 229861
2015-02-19 15:26:17 +00:00
Michael Gottesman e5ad66f8a9 [objc-arc] Introduce the concept of RCIdentity and rename all relevant functions to use that name. NFC.
The RCIdentity root ("Reference Count Identity Root") of a value V is a
dominating value U for which retaining or releasing U is equivalent to
retaining or releasing V. In other words, ARC operations on V are
equivalent to ARC operations on U.

This is a useful property to ascertain since we can use this in the ARC
optimizer to make it easier to match up ARC operations by always mapping
ARC operations to RCIdentityRoots instead of pointers themselves. Then
we perform pairing of retains, releases which are applied to the same
RCIdentityRoot.

In general, the two ways that we see RCIdentical values in ObjC are via:

  1. PointerCasts
  2. Forwarding Calls that return their argument verbatim.

As such in ObjC, two RCIdentical pointers must always point to the same
memory location.

Previously this concept was implicit in the code and various methods
that dealt with this concept were given functional names that did not
conform to any name in the "ARC" model. This often times resulted in
code that was hard for the non-ARC acquanted to understand resulting in
unhappiness and confusion.

llvm-svn: 229796
2015-02-19 00:42:38 +00:00
Michael Gottesman dfa3e4b08a [objc-arc-contract] Rename contractRelease => tryToContractReleaseIntoStoreStrong.
NFC. Makes it clearer what this method is actually supposed to do.

llvm-svn: 229795
2015-02-19 00:42:34 +00:00
Michael Gottesman 1827973f80 [objc-arc-contract] Refactor out tryToPeepholeInstruction into its own method. NFC.
The main method of ObjCARCContract is really large and busy. By refactoring this
out, it becomes easier to reason about.

llvm-svn: 229794
2015-02-19 00:42:30 +00:00
Michael Gottesman 56bd6a077a [objc-arc-contract] Reorganize the code a bit and make the debug output easier to read.
llvm-svn: 229793
2015-02-19 00:42:27 +00:00
Sanjoy Das 11b279a832 Partial fix for bug 22589
Don't spend the entire iteration space in the scalar loop prologue if
computing the trip count overflows.  This change also gets rid of the
backedge check in the prologue loop and the extra check for
overflowing trip-count.

Differential Revision: http://reviews.llvm.org/D7715

llvm-svn: 229731
2015-02-18 19:32:25 +00:00
Andrew Kaylor 527c5dc68d Adding implementation to outline C++ catch handlers for native Windows 64 exception handling.
Differential Revision: http://reviews.llvm.org/D7363

llvm-svn: 229715
2015-02-18 18:31:51 +00:00
Mohit K. Bhakkad 518946e440 [MSan][MIPS] VarArgHelper for MIPS64
Reviewers: Reviewers: eugenis, kcc, samsonov, petarj

Subscribers: dsanders, sagar, llvm-commits

Differential Revision: http://reviews.llvm.org/D7182

llvm-svn: 229667
2015-02-18 11:41:24 +00:00
NAKAMURA Takumi a250484c4c Reformat.
llvm-svn: 229651
2015-02-18 08:36:14 +00:00
NAKAMURA Takumi fa520c5f49 Revert r229622: "[LoopAccesses] Make VectorizerParams global" and others. r229622 brought cyclic dependencies between Analysis and Vector.
r229622: "[LoopAccesses] Make VectorizerParams global"
  r229623: "[LoopAccesses] Stash the report from the analysis rather than emitting it"
  r229624: "[LoopAccesses] Cache the result of canVectorizeMemory"
  r229626: "[LoopAccesses] Create the analysis pass"
  r229628: "[LoopAccesses] Change debug messages from LV to LAA"
  r229630: "[LoopAccesses] Add canAnalyzeLoop"
  r229631: "[LoopAccesses] Add missing const to APIs in VectorizationReport"
  r229632: "[LoopAccesses] Split out LoopAccessReport from VectorizerReport"
  r229633: "[LoopAccesses] Add -analyze support"
  r229634: "[LoopAccesses] Change LAA:getInfo to return a constant reference"
  r229638: "Analysis: fix buildbots"

llvm-svn: 229650
2015-02-18 08:34:47 +00:00
Craig Topper 1348f17205 [X86] Remove AVX512 pslldq/psrldq shift intrinsics. They aren't implemented yet and when they are they should be done with shuffles like SSE2 and AVX2.
llvm-svn: 229641
2015-02-18 06:24:49 +00:00
Craig Topper b324e43aed [X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles.
llvm-svn: 229640
2015-02-18 06:24:44 +00:00
Adam Nemet 85fd9f8d09 [LoopAccesses] Change LAA:getInfo to return a constant reference
As expected, this required a few more const-correctness fixes.

Based on Hal's feedback on D7684.

llvm-svn: 229634
2015-02-18 03:44:33 +00:00
Adam Nemet d7350dbb85 [LoopAccesses] Split out LoopAccessReport from VectorizerReport
The only difference between these two is that VectorizerReport adds a
vectorizer-specific prefix to its messages.  When LAA is used in the
vectorizer context the prefix is added when we promote the
LoopAccessReport into a VectorizerReport via one of the constructors.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229632
2015-02-18 03:44:25 +00:00
Adam Nemet 8b12afbeee [LoopAccesses] Add missing const to APIs in VectorizationReport
When I split out LoopAccessReport from this, I need to create some temps
so constness becomes necessary.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229631
2015-02-18 03:44:20 +00:00
Adam Nemet d0db4c1395 [LoopAccesses] Change debug messages from LV to LAA
Also add pass name as an argument to VectorizationReport::emitAnalysis.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229628
2015-02-18 03:43:37 +00:00
Adam Nemet d6b7e29815 [LoopAccesses] Create the analysis pass
This is a function pass that runs the analysis on demand.  The analysis
can be initiated by querying the loop access info via LAA::getInfo.  It
either returns the cached info or runs the analysis.

Symbolic stride information continues to reside outside of this analysis
pass. We may move it inside later but it's not a priority for me right
now.  The idea is that Loop Distribution won't support run-time stride
checking at least initially.

This means that when querying the analysis, symbolic stride information
can be provided optionally.  Whether stride information is used can
invalidate the cache entry and rerun the analysis.  Note that if the
loop does not have any symbolic stride, the entry should be preserved
across Loop Distribution and LV.

Since currently the only user of the pass is LV, I just check that the
symbolic stride information didn't change when using a cached result.

On the LV side, LoopVectorizationLegality requests the info object
corresponding to the loop from the analysis pass.  A large chunk of the
diff is due to LAI becoming a pointer from a reference.

A test will be added as part of the -analyze patch.

Also tested that with AVX, we generate identical assembly output for the
testsuite (including the external testsuite) before and after.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229626
2015-02-18 03:43:24 +00:00
Adam Nemet 01abb2c355 [LoopAccesses] Make blockNeedsPredication static
blockNeedsPredication is in LoopAccess in order to share it with the
vectorizer.  It's a utility needed by LoopAccess not strictly provided
by it but it's a good place to share it.  This makes the function static
so that it no longer required to create an LoopAccessInfo instance in
order to access it from LV.

This was actually causing problems because it would have required
creating LAI much earlier that LV::canVectorizeMemory().

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229625
2015-02-18 03:43:19 +00:00
Adam Nemet 3cf32ad6db [LoopAccesses] Cache the result of canVectorizeMemory
LAA will be an on-demand analysis pass, so we need to cache the result
of the analysis.  canVectorizeMemory is renamed to analyzeLoop which
computes the result.  canVectorizeMemory becomes the query function for
the cached result.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229624
2015-02-18 03:42:57 +00:00
Adam Nemet 5474be2c80 [LoopAccesses] Stash the report from the analysis rather than emitting it
The transformation passes will query this and then emit them as part of
their own report.  The currently only user LV is modified to do just
that.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229623
2015-02-18 03:42:50 +00:00
Adam Nemet 4f3ede5a01 [LoopAccesses] Make VectorizerParams global
As LAA is becoming a pass, we can no longer pass the params to its
constructor.  This changes the command line flags to have external
storage.  These can now be accessed both from LV and LAA.

VectorizerParams is moved out of LoopAccessInfo in order to shorten the
code to access it.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229622
2015-02-18 03:42:43 +00:00
Adam Nemet 30f16e1696 [LoopAccesses] Rename LoopAccessAnalysis to LoopAccessInfo
LoopAccessAnalysis will be used as the name of the pass.

This is part of the patchset that converts LoopAccessAnalysis into an
actual analysis pass.

llvm-svn: 229621
2015-02-18 03:42:35 +00:00
Akira Hatanaka 1defd5afbd [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.

rdar://problem/19394964

llvm-svn: 229619
2015-02-18 03:30:11 +00:00
Hal Finkel 4393559621 [BDCE] Don't forget uses of root instructions seen before the instruction itself
When visiting the initial list of "root" instructions (those which must always
be alive), for those that are integer-valued (such as invokes returning an
integer), we mark their bits as (initially) all dead (we might, obviously, find
uses of those bits later, but all bits are assumed dead until proven
otherwise). Don't do so, however, if we're already seen a use of those bits by
another root instruction (such as a store).

Fixes a miscompile of the sanitizer unit tests on x86_64.

Also, add a debug line for visiting the root instructions, and remove a debug
line which tried to print instructions being removed (printing dead
instructions is dangerous, and can sometimes crash).

llvm-svn: 229618
2015-02-18 03:12:28 +00:00
Benjamin Kramer 6cd780ff21 Prefer SmallVector::append/insert over push_back loops.
Same functionality, but hoists the vector growth out of the loop.

llvm-svn: 229500
2015-02-17 15:29:18 +00:00
Elena Demikhovsky ef035bb974 Fixed a bug in store sinking.
The problem was in store-sink barrier check.

Store sink barrier should be checked for ModRef (read-write) mode.

http://llvm.org/bugs/show_bug.cgi?id=22613

llvm-svn: 229495
2015-02-17 13:10:05 +00:00
Hal Finkel 2bb61ba2fe [BDCE] Add a bit-tracking DCE pass
BDCE is a bit-tracking dead code elimination pass. It is based on ADCE (the
"aggressive DCE" pass), with the added capability to track dead bits of integer
valued instructions and remove those instructions when all of the bits are
dead.

Currently, it does not actually do this all-bits-dead removal, but rather
replaces the instruction's uses with a constant zero, and lets instcombine (and
the later run of ADCE) do the rest. Because we essentially get a run of ADCE
"for free" while tracking the dead bits, we also do what ADCE does and removes
actually-dead instructions as well (this includes instructions newly trivially
dead because all bits were dead, but not all such instructions can be removed).

The motivation for this is a case like:

int __attribute__((const)) foo(int i);
int bar(int x) {
  x |= (4 & foo(5));
  x |= (8 & foo(3));
  x |= (16 & foo(2));
  x |= (32 & foo(1));
  x |= (64 & foo(0));
  x |= (128& foo(4));
  return x >> 4;
}

As it turns out, if you order the bit-field insertions so that all of the dead
ones come last, then instcombine will remove them. However, if you pick some
other order (such as the one above), the fact that some of the calls to foo()
are useless is not locally obvious, and we don't remove them (without this
pass).

I did a quick compile-time overhead check using sqlite from the test suite
(Release+Asserts). BDCE took ~0.4% of the compilation time (making it about
twice as expensive as ADCE).

I've not looked at why yet, but we eliminate instructions due to having
all-dead bits in:
External/SPEC/CFP2006/447.dealII/447.dealII
External/SPEC/CINT2006/400.perlbench/400.perlbench
External/SPEC/CINT2006/403.gcc/403.gcc
MultiSource/Applications/ClamAV/clamscan
MultiSource/Benchmarks/7zip/7zip-benchmark

llvm-svn: 229462
2015-02-17 01:36:59 +00:00
Mehdi Amini b9a0fa4822 InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
Fixes radar 15486701.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229437
2015-02-16 21:47:54 +00:00
James Molloy 83570247f1 Run LICM as part of the cleanup phase from the scalar optimizer.
Things like LoopUnrolling can produce loop invariant values - make sure
we pick them up.

llvm-svn: 229419
2015-02-16 18:59:54 +00:00
Hal Finkel c64150b8f3 [ADCE] Don't indent inside an anonymous namespace
To be consistent with what clang-format does, don't add extra indentation
inside an anonymous namespace. NFC.

llvm-svn: 229412
2015-02-16 18:08:00 +00:00
James Molloy e32d806b5f [LoopReroll] Relax some assumptions a little.
We won't find a root with index zero in any loop that we are able to reroll.
However, we may find one in a non-rerollable loop, so bail gracefully instead
of failing hard.

llvm-svn: 229406
2015-02-16 17:02:00 +00:00
James Molloy 4c7deb2259 [LoopReroll] Don't crash on dead code
If a PHI has no users, don't crash; bail gracefully. This shouldn't
happen often, but we can make no guarantees that previous passes didn't leave
dead code around.

llvm-svn: 229405
2015-02-16 17:01:52 +00:00
Evgeniy Stepanov 292acab847 [asan] Reuse a common function.
Do not reimplement RoundUpToAlignment.

llvm-svn: 229397
2015-02-16 14:49:37 +00:00
Aaron Ballman f9a1897c72 Removing LLVM_DELETED_FUNCTION, as MSVC 2012 was the last reason for requiring the macro. NFC; LLVM edition.
llvm-svn: 229340
2015-02-15 22:54:22 +00:00
Hal Finkel 8626ed2eae [ADCE] Convert another loop for a range-based for
We can use a range-based for for the operands loop too; NFC.

llvm-svn: 229319
2015-02-15 15:51:25 +00:00
Hal Finkel 92fb2d3803 [ADCE] Use inst_range and range-based fors
Convert a few loops to range-based fors; NFC.

llvm-svn: 229318
2015-02-15 15:51:23 +00:00
Hal Finkel c6035cff55 [ADCE] Fix formatting of pointer types
We prefer to put the * with the variable, not with the type; NFC.

llvm-svn: 229317
2015-02-15 15:47:52 +00:00
Hal Finkel 234d8fea7b [ADCE] Fix capitalization of another local variable
Bring another local variable in compliance with our naming conventions, NFC.

llvm-svn: 229316
2015-02-15 15:45:30 +00:00
Hal Finkel 75901293a1 [ADCE] Fix capitalization of some local variables
Bring some local variables in compliance with our naming conventions, NFC.

llvm-svn: 229315
2015-02-15 15:45:28 +00:00
Elena Demikhovsky 6f5a859633 Enabled cost calculation for masked memory operations.
We already have implementation for cost calculation for
masked memory operations. I just call it from the loop vectorizer.

llvm-svn: 229290
2015-02-15 08:08:48 +00:00
Ramkumar Ramachandra 8fcb498a9a InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Andrea Di Biagio f54432388f [optnone] Skip pass Constant Hoisting on optnone functions.
Added test CodeGen/X86/constant-hoisting-optnone.ll to verify that
pass Constant Hoisting is not run on optnone functions.

llvm-svn: 229258
2015-02-14 15:11:48 +00:00
Duncan P. N. Exon Smith 2c79ad974c Transforms: Canonicalize access to function attributes, NFC
Canonicalize access to function attributes to use the simpler API.

getAttributes().getAttribute(AttributeSet::FunctionIndex, Kind)
  => getFnAttribute(Kind)

getAttributes().hasAttribute(AttributeSet::FunctionIndex, Kind)
  => hasFnAttribute(Kind)

llvm-svn: 229202
2015-02-14 01:11:29 +00:00
Philip Reames 9ae15209ad [InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead.  This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?).  We already apply a similar transform in DAGCombine, this just extends that to the IR level.

This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. 

Differential Revision: http://reviews.llvm.org/D7255

llvm-svn: 229189
2015-02-14 00:05:36 +00:00
Andrea Di Biagio 30d471f6aa [InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by a icmp plus select into a single
cttz/ctlz with flag 'is_zero_undef' cleared.

However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in a wrong location thus leaving the CFG in an inconsistent
state.

This patch fixes the problem and add two reproducible test cases to
existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

llvm-svn: 229124
2015-02-13 16:33:34 +00:00
James Molloy 1b6207e6eb [SimplifyCFG] Be more aggressive
Up the phi node folding threshold from a cheap "1" to a meagre "2".

Update tests for extra added selects and slight code churn.

llvm-svn: 229099
2015-02-13 10:48:30 +00:00
Chandler Carruth 30d69c2e36 [PM] Remove the old 'PassManager.h' header file at the top level of
LLVM's include tree and the use of using declarations to hide the
'legacy' namespace for the old pass manager.

This undoes the primary modules-hostile change I made to keep
out-of-tree targets building. I sent an email inquiring about whether
this would be reasonable to do at this phase and people seemed fine with
it, so making it a reality. This should allow us to start bootstrapping
with modules to a certain extent along with making it easier to mix and
match headers in general.

The updates to any code for users of LLVM are very mechanical. Switch
from including "llvm/PassManager.h" to "llvm/IR/LegacyPassManager.h".
Qualify the types which now produce compile errors with "legacy::". The
most common ones are "PassManager", "PassManagerBase", and
"FunctionPassManager".

llvm-svn: 229094
2015-02-13 10:01:29 +00:00
Chandler Carruth 1fbc316534 [unroll] Concede defeat and disable the unroll analyzer for now.
The issues with the new unroll analyzer are more fundamental than code
cleanup, algorithm, or data structure changes. I've sent an email to the
original commit thread with details and a proposal for how to redesign
things. I'm disabling this for now so that we don't spend time
debugging issues with it in its current state.

llvm-svn: 229064
2015-02-13 05:31:46 +00:00
Michael Liao d266b928ae [InstCombine] Fix a bug when combining `icmp` from `ptrtoint`
- First, there's a crash when we try to combine that pointers into `icmp`
  directly by creating a `bitcast`, which is invalid if that two pointers are
  from different address spaces.

- It's not always appropriate to cast one pointer to another if they are from
  different address spaces as that is not no-op cast. Instead, we only combine
  `icmp` from `ptrtoint` if that two pointers are of the same address space.

llvm-svn: 229063
2015-02-13 04:51:26 +00:00
Chandler Carruth 6c03dff7cc [unroll] Merge the simplification and DCE estimation methods on the
UnrollAnalyzer.

Now they share a single worklist and have less implicit state between
them. There was no real benefit to separating these two things out.

I'm going to subsequently refactor things to share even more code.

llvm-svn: 229062
2015-02-13 04:39:05 +00:00
Chandler Carruth d9591d8922 [unroll] Remove pointless dyn_cast<>s to Instruction - the users of an
instruction must by definition be instructions.

llvm-svn: 229061
2015-02-13 04:33:21 +00:00
Chandler Carruth 5457e20d27 [unroll] Don't check the loop set for whether an instruction is
contained in it each time we try to add it to the worklist, just check
this when pulling it off the worklist. That way we do it at most once
per instruction with the cost of the worklist set we would need to pay
anyways.

llvm-svn: 229060
2015-02-13 04:30:44 +00:00
Chandler Carruth e5c30e4e10 [unroll] Change the other worklist in the unroll analyzer to be a set
vector.

In addition to dramatically reducing the work required for contrived
example loops, this also has to correct some serious latent bugs in the
cost computation. Previously, we might add an instruction onto the
worklist once for every load which it used and was simplified. Then we
would visit it many times and accumulate "savings" each time.

I mean, fortunately this couldn't matter for things like calls with 100s
of operands, but even for binary operators this code seems like it must
be double counting the savings.

I just noticed this by inspection and due to the runtime problems it can
introduce, I don't have any test cases for cases where the cost produced
by this routine is unacceptable.

llvm-svn: 229059
2015-02-13 04:27:50 +00:00
Chandler Carruth 7824bc9241 [unroll] Replace a boolean, for loop, condition, and break with
std::all_of and a lambda. Much cleaner, no functionality
changed.

llvm-svn: 229058
2015-02-13 04:18:14 +00:00
Chandler Carruth 06d537cdd6 [unroll] Directly query for dead instructions.
In the unroll analyzer, it is checking each user to see if that user
will become dead. However, it first checked if that user was missing
from the simplified values map, and then if was also missing from the
dead instructions set. We add everything from the simplified values map
to the dead instructions set, so the first step is completely subsumed
by the second. Moreover, the first step requires *inserting* something
into the simplified value map which isn't what we want at all.

This also replaces a dyn_cast with a cast as an instruction cannot be
used by a non-instruction.

llvm-svn: 229057
2015-02-13 04:14:05 +00:00
Chandler Carruth 82cb30f10c [unroll] Replace a linear time check for no uses with a constant time
check.

Also hoist this into the enqueue process as it is faster even than
testing the worklist set, we should just directly filter these out much
like we filter out constants and such.

llvm-svn: 229056
2015-02-13 04:06:08 +00:00
Chandler Carruth 3b057b3216 [unroll] Rather than an operand set, use a setvector for the worklist.
We don't just want to handle duplicate operands within an instruction,
but also duplicates across operands of different instructions. I should
have gone straight to this, but I had convinced myself that it wasn't
going to be necessary briefly. I've come to my senses after chatting
more with Nick, and am now happier here.

llvm-svn: 229054
2015-02-13 03:57:40 +00:00
Chandler Carruth 17a0496b5a [unroll] Extract the code to enqueue operansd for the worklist in the
unroll analysis into a lambda and call it. That's much simpler than
duplicating all the code.

llvm-svn: 229053
2015-02-13 03:49:41 +00:00
Chandler Carruth 8c86375a10 [unroll] Use a small set to de-duplicate operands prior to putting them
into the worklist. This avoids allocating lots of worklist memory for
them when there are large numbers of repeated operands.

llvm-svn: 229052
2015-02-13 03:48:38 +00:00
Chandler Carruth 93063e6191 [unroll] Make the unroll cost analysis terminate deterministically and
reasonably quickly.

I don't have a reduced test case, but for a version of FFMPEG, this
makes the loop unroller start finishing at all (after over 15 minutes of
running, it hadn't terminated for me, no idea if it was a true infloop
or just exponential work).

The key thing here is to check the DeadInstructions set when pulling
things off the worklist. Without this, we would re-walk the user list of
already dead instructions again and again and again. Consider phi nodes
with many, many operands and other patterns.

The other important aspect of this is that because we would keep
re-visiting instructions that were already known dead, we kept adding
their cost savings to this! This would cause our cost savings to be
*insanely* inflated from this.

While I was here, I also rotated the operand walk out of the worklist
loop to make the code easier to read. There is still work to be done to
minimize worklist traffic because we don't de-duplicate operands. This
means we may add the same instruction onto the worklist 1000s of times
if it shows up in 1000s of operansd to a PHI node for example.

Still, with this patch, the ffmpeg testcase I have finishes quickly and
I can't measure the runtime impact of the unroll analysis any more. I'll
probably try to do a few more cleanups to this code, but not sure how
much cleanup I can justify right now.

llvm-svn: 229038
2015-02-13 03:40:58 +00:00
Chandler Carruth dd6029fc6e [unroll] Make range based for loops a bit more explicit and more
readable.

The biggest thing that was causing me problems is recognizing the
references vs. poniters here. I also found that for maps naming the loop
variable as KeyValue helps make it obvious why you don't actually use it
directly. Finally, using 'auto' instead of 'User *' doesn't seem like
a good tradeoff. Much like with the other cases, I like to know its
a pointer, and 'User' is just as long and tells the reader a lot more.

llvm-svn: 229033
2015-02-13 02:45:17 +00:00
Chandler Carruth 87fdafc7b2 [IC] Fix a bug with the instcombine canonicalizing of loads and
propagating of metadata.

We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.

This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!

We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.

llvm-svn: 229029
2015-02-13 02:30:01 +00:00
Chandler Carruth 415f41258f [unroll] Avoid the "Insn" abbreviation of Instruction. This is quite
hard to type and read for me, and is inconsistent with the other
abbreviation in the base class "Inst". For most of these (where they are
used widely) I prefer just spelling it out as Instruction. I've changed
two of the short-lived variables to use "Inst" to match the base class.

llvm-svn: 229028
2015-02-13 02:17:39 +00:00
Chandler Carruth 302a133b1e [unroll] Tidy up the integer we use to accumululate the number of
instructions optimized. NFC, just separating this out from the
functionality changing commit.

llvm-svn: 229026
2015-02-13 02:10:56 +00:00
Chandler Carruth 10a9926ab5 [unroll] Don't use a map from pointer to bool. Use a set.
This is much more efficient. In particular, the query with the user
instruction has to insert a false for every missing instruction into the
set. This is just a cleanup a long the way to fixing the underlying
algorithm problems here.

llvm-svn: 228994
2015-02-13 00:29:39 +00:00
Michael Zolotukhin 1b48019751 Prevent division by 0.
When we try to estimate number of potentially removed instructions in
loop unroller, we analyze first N iterations and then scale the
computed number by TripCount/N. We should bail out early if N is 0.

llvm-svn: 228988
2015-02-13 00:17:03 +00:00
Chandler Carruth 186ad60815 [unroll] Update the new analysis logic from r228265 to use modern coding
conventions for function names consistently. Some were already using
this but not all.

llvm-svn: 228987
2015-02-13 00:00:24 +00:00
Benjamin Kramer 443c7967ea InstCombine: Allow folding of xor into icmp by changing the predicate for vectors
The loop vectorizer can create this pattern.

llvm-svn: 228954
2015-02-12 20:26:46 +00:00
James Molloy e805ad95dc [LoopRerolling] Be more forgiving with instruction order.
We can't solve the full subgraph isomorphism problem. But we can
allow obvious cases, where for example two instructions of different
types are out of order. Due to them having different types/opcodes,
there is no ambiguity.

llvm-svn: 228931
2015-02-12 15:54:14 +00:00
Dmitry Vyukov 2e8d82e607 tsan: do not instrument not captured values
I've built some tests in WebRTC with and without this change. With this change number of __tsan_read/write calls is reduced by 20-40%, binary size decreases by 5-10% and execution time drops by ~5%. For example:

$ ls -l old/modules_unittests new/modules_unittests
-rwxr-x--- 1 dvyukov 41708976 Jan 20 18:35 old/modules_unittests
-rwxr-x--- 1 dvyukov 38294008 Jan 20 18:29 new/modules_unittests
$ objdump -d old/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
239871
$ objdump -d new/modules_unittests | egrep "callq.*__tsan_(read|write|unaligned)" | wc -l
148365

http://reviews.llvm.org/D7069

llvm-svn: 228917
2015-02-12 09:55:28 +00:00
Chandler Carruth 63aaa98d94 [slp] Fix a nasty bug in the SLP vectorizer that Joerg pointed out.
Apparently some code finally started to tickle this after my
canonicalization changes to instcombine.

The bug stems from trying to form a vector type out of scalars that
aren't compatible at all. In this example, from x86_mmx values. The code
in the vectorizer that checks for reasonable types whas checking for
aggregates or vectors, but there are lots of other types that should
just never reach the vectorizer.

Debugging this was made more confusing by the lie in an assert in
VectorType::get() -- it isn't that the types are *primitive*. The types
must be integer, pointer, or floating point types. No other types are
allowed.

I've improved the assert and added a helper to the vectorizer to handle
the element type validity checks. It now re-uses the VectorType static
function and then further excludes weird target-specific types that we
probably shouldn't be touching here (x86_fp80 and ppc_fp128). Neither of
these are really reachable anyways (neither 80-bit nor 128-bit things
will get vectorized) but it seems better to just eagerly exclude such
nonesense.

I've added a test case, but while it definitely covers two of the paths
through this code there may be more paths that would benefit from test
coverage. I'm not familiar enough with the SLP vectorizer to synthesize
test cases for all of these, but was able to update the code itself by
inspection.

llvm-svn: 228899
2015-02-12 02:30:56 +00:00
Tim Northover 02438033e8 DeadArgElim: aggregate Return assessment properly.
I mistakenly thought the liveness of each "RetVal(F, i)" depended only on F. It
actually depends on the index too, which means we need to be careful about how
the results are combined before return. In particular if a single Use returns
Live, that counts for the entire object, at the granularity we're considering.

llvm-svn: 228885
2015-02-11 23:13:11 +00:00
Mehdi Amini 9730116bd6 Reassociate: cannot negate a INT_MIN value
Summary:
When trying to canonicalize negative constants out of
multiplication expressions, we need to check that the
constant is not INT_MIN which cannot be negated.

Reviewers: mcrosier

Reviewed By: mcrosier

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7286

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 228872
2015-02-11 19:54:44 +00:00
James Molloy 7c336576a5 [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
2015-02-11 12:15:41 +00:00
James Molloy f147359376 [LoopReroll] Introduce the concept of DAGRootSets.
A DAGRootSet models an induction variable being used in a rerollable
loop. For example:

   x[i*3+0] = y1
   x[i*3+1] = y2
   x[i*3+2] = y3

   Base instruction -> i*3
                    +---+----+
                   /    |     \
               ST[y1]  +1     +2  <-- Roots
                        |      |
                      ST[y2] ST[y3]

There may be multiple DAGRootSets, for example:

   x[i*2+0] = ...   (1)
   x[i*2+1] = ...   (1)
   x[i*2+4] = ...   (2)
   x[i*2+5] = ...   (2)
   x[(i+1234)*2+5678] = ... (3)
   x[(i+1234)*2+5679] = ... (3)

This concept is similar to the "Scale" member used previously, but allows
multiple independent sets of roots based off the same induction variable.

llvm-svn: 228821
2015-02-11 09:19:47 +00:00
Zachary Turner 3bd47cee78 Use ADDITIONAL_HEADER_DIRS in all LLVM CMake projects.
This allows IDEs to recognize the entire set of header files for
each of the core LLVM projects.

Differential Revision: http://reviews.llvm.org/D7526
Reviewed By: Chris Bieneman

llvm-svn: 228798
2015-02-11 03:28:02 +00:00
Justin Bogner d24e185784 InstrProf: Lower coverage mappings by setting their sections appropriately
Add handling for __llvm_coverage_mapping to the InstrProfiling
pass. We need to make sure the constant and any profile names it
refers to are in the correct sections, which is easier and cleaner to
do here where we have to know about profiling sections anyway.

This is really tricky to test without a frontend, so I'm committing
the test for the fix in clang. If anyone knows a good way to test this
within LLVM, please let me know.

Fixes PR22531.

llvm-svn: 228793
2015-02-11 02:52:44 +00:00
Reid Kleckner 96d011315a Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
David Majnemer 7679300d93 EarlyCSE: It isn't safe to CSE across synchronization boundaries
This fixes PR22514.

llvm-svn: 228760
2015-02-10 23:09:43 +00:00
Tim Northover 43c0d2db50 DeadArgElim: arguments affect all returned sub-values by default.
Unless we meet an insertvalue on a path from some value to a return, that value
will be live if *any* of the return's components are live, so all of those
components must be added to the MaybeLiveUses.

Previously we were deleting arguments if sub-value 0 turned out to be dead.

llvm-svn: 228731
2015-02-10 19:49:18 +00:00
Chandler Carruth 2496910325 Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Philip Reames 7e7dc3e9df Adjust how we avoid poll insertion inside the poll function (NFC)
I realized that my early fix for this was overly complicated.  Rather than scatter checks around in a bunch of places, just exit early when we visit the poll function itself.

Thinking about it a bit, the whole inlining mechanism used with gc.safepoint_poll could probably be cleaned up a bit.  Originally, poll insertion was fused with gc relocation rewriting.  It might be worth going back to see if we can simplify the chain of events now that these two are seperated.  As one thought, maybe it makes sense to rewrite calls inside the helper function before inlining it to the many callers.  This would require us to visit the poll function before any other functions though..

llvm-svn: 228634
2015-02-10 00:04:53 +00:00
Adrian Prantl 34e7590e0d Debug info: When updating debug info during SROA, do not emit debug info
for any padding introduced by SROA. In particular, do not emit debug info
for an alloca that represents only the padding introduced by a previous
iteration.

Fixes PR22495.

llvm-svn: 228632
2015-02-09 23:57:22 +00:00
Adrian Prantl 27bd01f71c Debug info: Use DW_OP_bit_piece instead of DW_OP_piece in the
intermediate representation. This
- increases consistency by using the same granularity everywhere
- allows for pieces < 1 byte
- DW_OP_piece didn't actually allow storing an offset.

Part of PR22495.

llvm-svn: 228631
2015-02-09 23:57:15 +00:00
Ramkumar Ramachandra 3edf74fe29 [Statepoint] Improve two asserts, fix some style (NFC)
Summary:
It's important that our users immediately know what gc.safepoint_poll
is. Also fix the style of the declaration of CreateGCStatepoint, in
preparation for another change that will wrap it.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7517

llvm-svn: 228626
2015-02-09 23:02:10 +00:00
Ramkumar Ramachandra 2e4b9e0a37 PlaceSafepoints: modernize gc.result.* -> gc.result
Differential Revision: http://reviews.llvm.org/D7516

llvm-svn: 228625
2015-02-09 23:00:40 +00:00
Philip Reames d4a912fefd Update file comment to clarify points highlighted in review (NFC)
llvm-svn: 228621
2015-02-09 22:44:03 +00:00
Philip Reames a29de87ea4 Use range for loops in PlaceSafepoints (NFC)
llvm-svn: 228620
2015-02-09 22:26:11 +00:00
Duncan P. N. Exon Smith bd75ad4d0c IR: Take uint64_t in DIBuilder::createExpression()
`DIExpression` deals with `uint64_t`, so it doesn't make sense that
`createExpression()` is created from `int64_t`.  Switch to `uint64_t` to
unify them.

I've temporarily left in the `int64_t` version, which forwards to the
`uint64_t` version.  I'll delete it once I've updated the callers.

llvm-svn: 228619
2015-02-09 22:13:27 +00:00
Philip Reames b1ed02f728 Add basic tests for PlaceSafepoints
This is just adding really simple tests which should have been part of the original submission.  When doing so, I discovered that I'd mistakenly removed required pieces when preparing the patch for upstream submission.  I fixed two such bugs in this submission.

llvm-svn: 228610
2015-02-09 21:48:05 +00:00
Akira Hatanaka 8d3cb829ce Fix a bug in DemoteRegToStack where a reload instruction was inserted into the
wrong basic block.

This would happen when the result of an invoke was used by a phi instruction
in the invoke's normal destination block. An instruction to reload the invoke's
value would get inserted before the critical edge was split and a new basic
block (which is the correct insertion point for the reload) was created. This
commit fixes the bug by splitting the critical edge before all the reload
instructions are inserted.

Also, hoist up the code which computes the insertion point to the only place
that need that computation.

rdar://problem/15978721

llvm-svn: 228566
2015-02-09 06:38:23 +00:00
Tim Northover 705d2af9e1 DeadArgElim: fix mismatch in accounting of array return types.
Some parts of DeadArgElim were only considering the individual fields
of StructTypes separately, but others (where insertvalue &
extractvalue instructions occur) also looked into ArrayTypes.

This one is an actual bug; the mismatch can lead to an argument being
considered used by a return sub-value that isn't being tracked (and
hence is dead by default). It then gets incorrectly eliminated.

llvm-svn: 228559
2015-02-09 01:21:00 +00:00
Tim Northover 854c927de5 DeadArgElim: assess uses of entire return value aggregate.
Previously, a non-extractvalue use of an aggregate return value meant
the entire return was considered live (the algorithm gave up
entirely). This was correct, but conservative. It's better to actually
look at that Use, making the analysis results apply to all sub-values
under consideration.

E.g.

  %val = call { i32, i32 } @whatever()
  [...]
  ret { i32, i32 } %val

The return is using the entire aggregate (sub-values 0 and 1). We can
still simplify @whatever if we can prove that this return is itself
unused.

Also unifies the logic slightly between aggregate and non-aggregate
cases..

llvm-svn: 228558
2015-02-09 01:20:53 +00:00
Ramkumar Ramachandra a021ee62ca InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Bjorn Steinbrink 5ec7522771 Correctly combine alias.scope metadata by a union instead of intersecting
Summary:
The alias.scope metadata represents sets of things an instruction might
alias with. When generically combining the metadata from two
instructions the result must be the union of the original sets, because
the new instruction might alias with anything any of the original
instructions aliased with.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7490

llvm-svn: 228525
2015-02-08 17:07:14 +00:00
Benjamin Kramer f094d77de8 LoopIdiom: Use utility functions.
The only difference between deleteIfDeadInstruction and
RecursivelyDeleteTriviallyDeadInstructions is that the former also
manually invalidates SCEV. That's unnecessary because SCEV automatically
gets informed when an instruction is deleted via a ValueHandle. NFC.

llvm-svn: 228508
2015-02-07 21:37:08 +00:00
Bjorn Steinbrink 71bf3b800a Properly update AA metadata when performing call slot optimization
Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7482

llvm-svn: 228500
2015-02-07 17:54:36 +00:00
Evgeniy Stepanov 4e12057760 [msan] Fix "missing origin" in atomic store.
An atomic store always make the target location fully initialized (in the
current implementation). It should not store origin. Initialized memory can't
have meaningful origin, and, due to origin granularity (4 bytes) there is a
chance that this extra store would overwrite meaningfull origin for an adjacent
location.

llvm-svn: 228444
2015-02-06 21:47:39 +00:00
Michael Zolotukhin 7af83c1f39 Use estimated number of optimized insns in unroll-threshold computation.
If complete-unroll could help us to optimize away N% of instructions, we
might want to do this even if the final size would exceed loop-unroll
threshold. However, we don't want to unroll huge loop, and we are add
AbsoluteThreshold to avoid that - this threshold will never be crossed,
even if we expect to optimize 99% instructions after that.

llvm-svn: 228434
2015-02-06 20:20:40 +00:00
Michael Zolotukhin 4e8598eee3 [InstSimplify] Add SimplifyFPBinOp function.
It is a variation of SimplifyBinOp, but it takes into account
FastMathFlags.

It is needed in inliner and loop-unroller to accurately predict the
transformation's outcome (previously we dropped the flags and were too
conservative in some cases).

Example:
float foo(float *a, float b) {
 float r;
 if (a[1] * b)
   r = /* a lot of expensive computations */;
 else
   r = 1;
 return r;
}
float boo(float *a) {
 return foo(a, 0.0);
}

Without this patch, we don't inline 'foo' into 'boo'.

llvm-svn: 228432
2015-02-06 20:02:51 +00:00
Adam Nemet 7206d7a5d2 [LV] Move addRuntimeCheck to LoopAccessAnalysis
This will allow it to be shared with the new Loop Distribution pass.

getFirstInst is currently duplicated across LoopVectorize.cpp and
LoopAccessAnalysis.cpp.  This is a short-term work-around until we figure out
a better solution.

NFC.  (The code moved is adjusted a bit for the name of the Loop member and
that PtrRtCheck is now a reference rather than a pointer.)

llvm-svn: 228418
2015-02-06 18:31:04 +00:00
Benjamin Kramer 970eac40bf Make helper functions/classes/globals static. NFC.
llvm-svn: 228410
2015-02-06 17:51:54 +00:00
Matthias Braun 2e404597f4 InstCombine: Combine select sequences into a single select
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the other but can go back
easily in CodeGen.

Differential Revision: http://reviews.llvm.org/D7399

llvm-svn: 228409
2015-02-06 17:49:36 +00:00
Benjamin Kramer 39f76acb5c IRCE: Demote template to ArrayRef and SmallVector to array.
NFC.

llvm-svn: 228398
2015-02-06 14:43:49 +00:00
Alexey Samsonov 19763c48df [ASan] Enable -asan-stack-dynamic-alloca by default.
By default, store all local variables in dynamic alloca instead of
static one. It reduces the stack space usage in use-after-return mode
(dynamic alloca will not be called if the local variables are stored
in a fake stack), and improves the debug info quality for local
variables (they will not be described relatively to %rbp/%rsp, which
are assumed to be clobbered by function calls).

llvm-svn: 228336
2015-02-05 19:39:20 +00:00
Hans Wennborg 8b4dbdf15d LowerSwitch: Use ConstantInt for CaseRange::{Low,High}
Case values are always ConstantInt. This allows us to remove
a bunch of casts. NFC.

llvm-svn: 228312
2015-02-05 16:58:10 +00:00
Hans Wennborg 8c82fbcb73 LowerSwitch: remove default args from CaseRange ctor; NFC
llvm-svn: 228311
2015-02-05 16:50:27 +00:00
Aaron Ballman 94d4d33a38 Removing an unused variable warning I accidentally introduced with my last warning fix; NFC.
llvm-svn: 228295
2015-02-05 13:52:42 +00:00
Aaron Ballman 1b072b340b Silencing an MSVC warning about a switch statement with no cases; NFC.
llvm-svn: 228294
2015-02-05 13:40:04 +00:00
Michael Zolotukhin a9aadd2903 Implement new heuristic for complete loop unrolling.
Complete loop unrolling can make some loads constant, thus enabling a
lot of other optimizations. To catch such cases, we look for loads that
might become constants and estimate number of instructions that would be
simplified or become dead after substitution.

Example:
Suppose we have:
int a[] = {0, 1, 0};
v = 0;
for (i = 0; i < 3; i ++)
  v += b[i]*a[i];

If we completely unroll the loop, we would get:
v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]

Which then will be simplified to:
v = b[0]* 0 + b[1]* 1 + b[2]* 0

And finally:
v = b[1]

llvm-svn: 228265
2015-02-05 02:34:00 +00:00
Tom Stellard 080209d573 StructurizeCFG: Remove obsolete fix for loop backedge detection
This is no longer needed now that we are using a reverse post-order
traversal.

llvm-svn: 228187
2015-02-04 20:49:47 +00:00
Tom Stellard 071ec90b68 StructurizeCFG: Use a reverse post-order traversal
We were previously doing a post-order traversal and operating on the
list in reverse, however this would occasionaly cause backedges for
loops to be visited before some of the other blocks in the loop.

We know use a reverse post-order traversal, which avoids this issue.

The reverse post-order traversal is not completely ideal, so we need
to manually fixup the list to ensure that inner loop backedges are
visited before outer loop backedges.

llvm-svn: 228186
2015-02-04 20:49:44 +00:00
Duncan P. N. Exon Smith 920df5c1bb Utils: Resolve cycles under distinct MDNodes
Track unresolved nodes under distinct `MDNode`s during `MapMetadata()`,
and resolve them at the end.  Previously, these cycles wouldn't get
resolved.

llvm-svn: 228180
2015-02-04 19:44:34 +00:00
Reid Kleckner c26a17a822 Add range adapters predecessors() and successors() for BBs
Use them in two isolated transforms so we know they work and aren't dead
code.

llvm-svn: 228173
2015-02-04 19:14:57 +00:00
Alexey Samsonov b9b8027cee SpecialCaseList: Add support for parsing multiple input files.
Summary:
This change allows users to create SpecialCaseList objects from
multiple local files. This is needed to implement a proper support
for -fsanitize-blacklist flag (allow users to specify multiple blacklists,
in addition to default blacklist, see PR22431).

DFSan can also benefit from this change, as DFSan instrumentation pass now
accepts ABI-lists both from -fsanitize-blacklist= and -mllvm -dfsan-abilist flags.

Go bindings are fixed accordingly.

Test Plan: regression test suite

Reviewers: pcc

Subscribers: llvm-commits, axw, kcc

Differential Revision: http://reviews.llvm.org/D7367

llvm-svn: 228155
2015-02-04 17:39:48 +00:00
Aaron Ballman 34c325e749 Fixing a -Wsign-compare warning; NFC
llvm-svn: 228142
2015-02-04 14:01:08 +00:00
Philip Reames 72634d6af0 Fix a warning in non-asserts builds
llvm-svn: 228114
2015-02-04 05:11:20 +00:00
Kostya Serebryany 77cc729ad7 [sanitizer] add another workaround for PR 17409: when over a threshold emit coverage instrumentation as calls.
llvm-svn: 228102
2015-02-04 01:21:45 +00:00
Philip Reames 5a9685dba6 Clang format of a file introduced in 228090 (NFC)
llvm-svn: 228091
2015-02-04 00:39:57 +00:00
Philip Reames 47cc673e1f Add a pass for inserting safepoints into (nearly) arbitrary IR
This pass is responsible for figuring out where to place call safepoints and safepoint polls. It doesn't actually make the relocations explicit; that's the job of the RewriteStatepointsForGC pass (http://reviews.llvm.org/D6975).

Note that this code is not yet finalized.  Its moving in tree for incremental development, but further cleanup is needed and will happen over the next few days.  It is not yet part of the standard pass order.  

Planned changes in the near future:
 - I plan on restructuring the statepoint rewrite to use the functions add to the IRBuilder a while back. 
 - In the current pass, the function "gc.safepoint_poll" is treated specially but is not an intrinsic. I plan to make identifying the poll function a property of the GCStrategy at some point in the near future.
 - As follow on patches, I will be separating a collection of test cases we have out of tree and submitting them upstream. 
 - It's not explicit in the code, but these two patches are introducing a new state for a statepoint which looks a lot like a patchpoint. There's no a transient form which doesn't yet have the relocations explicitly represented, but does prevent reordering of memory operations. Once this is in, I need to update actually make this explicit by reserving the 'unused' argument of the statepoint as a flag, updating the docs, and making the code explicitly check for such a thing. This wasn't really planned, but once I split the two passes - which was done for other reasons - the intermediate state fell out. Just reminds us once again that we need to merge statepoints and patchpoints at some point in the not that distant future.

Future directions planned:
 - Identifying more cases where a backedge safepoint isn't required to ensure timely execution of a safepoint poll.
 - Tweaking the insertion process to generate easier to optimize IR. (For example, investigating making SplitBackedge) the default.
 - Adding opt-in flags for a GCStrategy to use this pass. Once done, add this pass to the actual pass ordering.

Differential Revision: http://reviews.llvm.org/D6981

llvm-svn: 228090
2015-02-04 00:37:33 +00:00
Adam Nemet 5add5d9d85 [LV] Split off memcheck block really at the first check
I've noticed this while trying to move addRuntimeCheck to LoopAccessAnalysis.

I think that the intention was to early exit from the overflow checking before
the code for the memchecks.  This is the entire reason why we compute
FirstCheckInst but then we don't use that as the splitting instruction but the
final check.  Looks like an oversight.

llvm-svn: 228056
2015-02-03 22:45:39 +00:00
Daniel Berlin 487aed0d77 Allow PRE to insert no-cost phi nodes
llvm-svn: 228024
2015-02-03 20:37:08 +00:00
Jingyue Wu d7966ff3b9 Add straight-line strength reduction to LLVM
Summary:
Straight-line strength reduction (SLSR) is implemented in GCC but not yet in
LLVM. It has proven to effectively simplify statements derived from an unrolled
loop, and can potentially benefit many other cases too. For example,

LLVM unrolls

  #pragma unroll
  foo (int i = 0; i < 3; ++i) {
    sum += foo((b + i) * s);
  }

into

  sum += foo(b * s);
  sum += foo((b + 1) * s);
  sum += foo((b + 2) * s);

However, no optimizations yet reduce the internal redundancy of the three
expressions:

  b * s
  (b + 1) * s
  (b + 2) * s

With SLSR, LLVM can optimize these three expressions into:

  t1 = b * s
  t2 = t1 + s
  t3 = t2 + s

This commit is only an initial step towards implementing a series of such
optimizations. I will implement more (see TODO in the file commentary) in the
near future. This optimization is enabled for the NVPTX backend for now.
However, I am more than happy to push it to the standard optimization pipeline
after more thorough performance tests.

Test Plan: test/StraightLineStrengthReduce/slsr.ll

Reviewers: eliben, HaoLiu, meheff, hfinkel, jholewinski, atrick

Reviewed By: jholewinski, atrick

Subscribers: karthikthecool, jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7310

llvm-svn: 228016
2015-02-03 19:37:06 +00:00
Adam Nemet b60295a525 [LoopVectorize] Fix rebase glitch in r227751
LoopVectorizationLegality::{getNumLoads,getNumStores} should forward to
LoopAccessAnalysis now.

Thanks to Takumi for noticing this!

llvm-svn: 227992
2015-02-03 17:59:53 +00:00
Renato Golin af213728cc Adding AArch64 support to ASan instrumentation
For the time being, it is still hardcoded to support only the 39 VA bits
variant, I plan to work on supporting 42 and 48 VA bits variants, but I
don't have access to such hardware at the moment.

Patch by Chrystophe Lyon.

llvm-svn: 227965
2015-02-03 11:20:45 +00:00
NAKAMURA Takumi c7f8bfc5e5 Resurrect initializers for NumLoads and NumStores in LoopVectorizationLegality to suppress undefined behavior.
FIXME: Shall they be managed in LAA?
llvm-svn: 227940
2015-02-03 03:55:06 +00:00
Jingyue Wu 49a766e468 Resurrect the assertion removed by r227717
Summary: MSVC can compile "LoopID->getOperand(0) == LoopID" when LoopID is MDNode*.

Test Plan: no regression

Reviewers: mkuper

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7327

llvm-svn: 227853
2015-02-02 20:41:11 +00:00
Erik Eckstein 7330e358f6 Fix: SLPVectorizer crashes with assertion when vectorizing a cmp instruction.
The commit r225977 uncovered this bug. The problem was that the vectorizer tried to
read the second operand of an already deleted instruction.
The bug didn't show up before r225977 because the freed memory still contained a non-null pointer.
With r225977 deletion of instructions is delayed and the read operand pointer is always null.

llvm-svn: 227800
2015-02-02 12:45:34 +00:00
Benjamin Kramer ad960019b0 LoopVectorize: Remove initializer list that blocks MSVC.
llvm-svn: 227766
2015-02-01 21:13:26 +00:00
Adam Nemet 0456327cfb [LoopVectorize] Move LoopAccessAnalysis to its own module
Other than moving code and adding the boilerplate for the new files, the code
being moved is unchanged.

There are a few global functions that are shared with the rest of the
LoopVectorizer.  I moved these to the new module as well (emitLoopAnalysis,
stripIntegerCast, replaceSymbolicStrideSCEV) along with the Report class used
by emitLoopAnalysis.  There is probably room for further improvement in this
area.

I kept DEBUG_TYPE "loop-vectorize" because it's used as the PassName with
emitOptimizationRemarkAnalysis.  This will obviously have to change.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227756
2015-02-01 16:56:15 +00:00
Adam Nemet 3f737d935d [LoopVectorize] Move RuntimePointerCheck under LoopAccessAnalysis
This class needs to remain public because it's used by
LoopVectorizationLegality::addRuntimeCheck.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227755
2015-02-01 16:56:11 +00:00
Adam Nemet c237f68fc9 [LoopVectorize] Pass parameters explicitly to MemoryDepChecker
Rather than using globals use a structure to pass parameters from the
vectorizer.  This prepares the class to be moved outside the LoopVectorizer.

It's not great how all this is passed through in LoopAccessAnalysis but this
is all expected to change once the class start servicing the Loop Distribution
pass as well where some of these parameters make no sense.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227754
2015-02-01 16:56:09 +00:00
Adam Nemet c28ffdcf35 [LoopVectorize] Split out LoopAccessAnalysis from LoopVectorizationLegality
Move the canVectorizeMemory functionality from LoopVectorizationLegality to a
new class LoopAccessAnalysis and forward users.

Currently the collection of the symbolic stride information is kept with
LoopVectorizationLegality and it becomes an input to LoopAccessAnalysis.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227751
2015-02-01 16:56:04 +00:00
Adam Nemet 5985971fa5 [LoopVectorize] Add accessors for Num{Stores,Loads,PredStores} in AccessAnalysis
These members are moving to LoopAccessAnalysis.  The accessors help to hide
this.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227750
2015-02-01 16:56:02 +00:00
Adam Nemet a2abc59dac [LoopVectorize] Rename the Report class to VectorizationReport
This class will become public in the new LoopAccessAnalysis header so the name
needs to be more global.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227749
2015-02-01 16:56:00 +00:00
Adam Nemet ac71cecdc6 [LoopVectorize] Factor out duplicated code into Report::emitAnalysis
The logic in emitAnalysis is duplicated across multiple functions.  This
splits it into a function.  Another use will be added by the patchset.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227748
2015-02-01 16:55:58 +00:00
Adam Nemet 6854b9203a [LoopVectorize] Split out RuntimePointerCheck from LoopVectorizationLegality
RuntimePointerCheck will be used through LoopAccessAnalysis in
LoopVectorizationLegality.  Later in the patchset it will become a local class
of LoopAccessAnalysis.

NFC.  This is part of the patchset that splits out the memory dependence logic
from LoopVectorizationLegality into a new class LoopAccessAnalysis.
LoopAccessAnalysis will be used by the new Loop Distribution pass.

llvm-svn: 227747
2015-02-01 16:55:56 +00:00
Chandler Carruth 21fc195c13 [multiversion] Kill FunctionTargetTransformInfo, TTI itself is now
per-function and supports the exact desired interface.

llvm-svn: 227743
2015-02-01 14:37:03 +00:00
Benjamin Kramer 6ab86b1bb6 EarlyCSE: Replace custom hash mixing with Hashing.h
Brings it in line with the other hashes in EarlyCSE.

llvm-svn: 227733
2015-02-01 12:30:59 +00:00
Chandler Carruth fdb9c573f7 [multiversion] Thread a function argument through all the callers of the
getTTI method used to get an actual TTI object.

No functionality changed. This just threads the argument and ensures
code like the inliner can correctly look up the callee's TTI rather than
using a fixed one.

The next change will use this to implement per-function subtarget usage
by TTI. The changes after that should eliminate the need for FTTI as that
will have become the default.

llvm-svn: 227730
2015-02-01 12:01:35 +00:00
Chandler Carruth fdffd87d68 [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Chandler Carruth e8c686aa86 [PM] Port EarlyCSE to the new pass manager.
I've added RUN lines both to the basic test for EarlyCSE and the
target-specific test, as this serves as a nice test that the TTI layer
in the new pass manager is in fact working well.

llvm-svn: 227725
2015-02-01 10:51:23 +00:00
Michael Kuperstein a691f3e921 Removed assert that doesn't typecheck and breaks debug MSVC build.
llvm-svn: 227717
2015-02-01 08:46:20 +00:00
Jingyue Wu 6c26bb63fe [SeparateConstOffsetFromGEP] skip optnone functions
llvm-svn: 227705
2015-02-01 02:34:41 +00:00
Jingyue Wu 6e091c8eab [SeparateConstOffsetFromGEP] set PreservesCFG flag
SeparateConstOffsetFromGEP does not change the shape of the control flow graph.

llvm-svn: 227704
2015-02-01 02:33:02 +00:00
Jingyue Wu 0220df0dfd [NVPTX] Emit .pragma "nounroll" for loops marked with nounroll
Summary:
CUDA driver can unroll loops when jit-compiling PTX. To prevent CUDA
driver from unrolling a loop marked with llvm.loop.unroll.disable is not
unrolled by CUDA driver, we need to emit .pragma "nounroll" at the
header of that loop.

This patch also extracts getting unroll metadata from loop ID metadata
into a shared helper function.

Test Plan: test/CodeGen/NVPTX/nounroll.ll

Reviewers: eliben, meheff, jholewinski

Reviewed By: jholewinski

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D7041

llvm-svn: 227703
2015-02-01 02:27:45 +00:00
Adrian Prantl 152ac396db Fix PR22393. When recursively replacing an aggregate with a smaller
aggregate or scalar, the debug info needs to refer to the absolute offset
(relative to the entire variable) instead of storing the offset inside
the smaller aggregate.

llvm-svn: 227702
2015-02-01 00:58:04 +00:00