Commit Graph

10996 Commits

Author SHA1 Message Date
Arnold Schwaighofer a2c8e008d2 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

llvm-svn: 195787
2013-11-26 22:11:23 +00:00
Diego Novillo c0dd1037c8 Refactor some code in SampleProfile.cpp
I'm adding new functionality in the sample profiler. This will
require more data to be kept around for each function, so I moved
the structure SampleProfile that we keep for each function into
a separate class.

There are no functional changes in this patch. It simply provides
a new home where to place all the new data that I need to propagate
weights through edges.

There are some other name and minor edits throughout.

llvm-svn: 195780
2013-11-26 20:37:33 +00:00
Nadav Rotem f9f8482e3a PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.

llvm-svn: 195773
2013-11-26 17:29:19 +00:00
Stepan Dyatkovskiy abb8505dc5 PR17925 bugfix.
Short description.

This issue is about case of treating pointers as integers.
We treat pointers as different if they references different address space.
At the same time, we treat pointers equal to integers (with machine address
width). It was a point of false-positive. Consider next case on 32bit machine:

void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)

foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.

As you can see it breaks transitivity. That means that result depends on order
of how functions are presented in module. Next order causes merging of foo0
and foo1: foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be
merged with foo2.
Depending on order, things could be merged we don't expect to.

The fix:
Forbid to treat any pointer as integer, except for those, who belong to address space 0.

llvm-svn: 195769
2013-11-26 16:11:03 +00:00
Chandler Carruth 6378cf539f [PM] Split the CallGraph out from the ModulePass which creates the
CallGraph.

This makes the CallGraph a totally generic analysis object that is the
container for the graph data structure and the primary interface for
querying and manipulating it. The pass logic is separated into its own
class. For compatibility reasons, the pass provides wrapper methods for
most of the methods on CallGraph -- they all just forward.

This will allow the new pass manager infrastructure to provide its own
analysis pass that constructs the same CallGraph object and makes it
available. The idea is that in the new pass manager, the analysis pass's
'run' method returns a concrete analysis 'result'. Here, that result is
a 'CallGraph'. The 'run' method will typically do only minimal work,
deferring much of the work into the implementation of the result object
in order to be lazy about computing things, but when (like DomTree)
there is *some* up-front computation, the analysis does it prior to
handing the result back to the querying pass.

I know some of this is fairly ugly. I'm happy to change it around if
folks can suggest a cleaner interim state, but there is going to be some
amount of unavoidable ugliness during the transition period. The good
thing is that this is very limited and will naturally go away when the
old pass infrastructure goes away. It won't hang around to bother us
later.

Next up is the initial new-PM-style call graph analysis. =]

llvm-svn: 195722
2013-11-26 04:19:30 +00:00
Chandler Carruth 57458517ef Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

llvm-svn: 195528
2013-11-23 00:48:34 +00:00
Yuchen Wu c87ca32163 llvm-cov: Split entry blocks in GCNOProfiling.cpp.
gcov expects every function to contain an entry block that
unconditionally branches into the next block. clang does not implement
basic blocks in this manner, so gcov did not output correct branch info
if the entry block branched to multiple blocks.

This change splits every function's entry block into an empty block and
a block with the rest of the instructions. The instrumentation code will
take care of the rest.

llvm-svn: 195513
2013-11-22 23:07:45 +00:00
Manman Ren cb14bbcc48 Debug Info: move StripDebugInfo from StripSymbols.cpp to DebugInfo.cpp.
We can share the implementation between StripSymbols and dropping debug info
for metadata versions that do not match.

Also update the comments to match the implementation. A follow-on patch will
drop the "Debug Info Version" module flag in StripDebugInfo.

llvm-svn: 195505
2013-11-22 22:06:31 +00:00
Matt Arsenault 6ea0aade26 StructurizeCFG: Fix verification failure with some loops.
If the beginning of the loop was also the entry block
of the function, branches were inserted to the entry block
which isn't allowed. If this occurs, create a new dummy
function entry block that branches to the start of the loop.

llvm-svn: 195493
2013-11-22 19:24:39 +00:00
Matt Arsenault 9fb6e0ba58 StructurizeCFG: Fix inverting a branch on an argument
llvm-svn: 195492
2013-11-22 19:24:37 +00:00
Rafael Espindola 6597992c69 Add a fixed version of r195470 back.
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing a invalid iterator.

original message:

Convert linkonce* to weak* instead of strong.

Also refactor the logic into a helper function. This is an important improve
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195477
2013-11-22 17:58:12 +00:00
Rafael Espindola 77aa674cc4 Revert "Convert linkonce* to weak* instead of strong."
This reverts commit r195470.
Debugging failure in some bots.

llvm-svn: 195472
2013-11-22 17:09:34 +00:00
Richard Sandiford 8ee1b77de3 Add a Scalarizer pass.
llvm-svn: 195471
2013-11-22 16:58:05 +00:00
Rafael Espindola 5574032575 Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195470
2013-11-22 16:14:30 +00:00
Arnold Schwaighofer 1756e1ea92 SLPVectorizer: Fix whitespace errors.
llvm-svn: 195468
2013-11-22 15:47:17 +00:00
Yi Jiang 79a2b0a6d1 SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses.
llvm-svn: 195406
2013-11-22 01:57:02 +00:00
Peter Collingbourne 0be79e1ade Introduce two command-line flags for the instrumentation pass to control whether the labels of pointers should be ignored in load and store instructions
The new command line flags are -dfsan-ignore-pointer-label-on-store and -dfsan-ignore-pointer-label-on-load. Their default value matches the current labelling scheme.

Additionally, the function __dfsan_union_load is marked as readonly.

Patch by Lorenzo Martignoni!

Differential Revision: http://llvm-reviews.chandlerc.com/D2187

llvm-svn: 195382
2013-11-21 23:20:54 +00:00
Evgeniy Stepanov cb5bdffc4e [msan] Propagate condition origin in select instruction.
llvm-svn: 195349
2013-11-21 12:00:24 +00:00
Yuchen Wu 2a9d96992d llvm-cov: Don't assume FileChecksum was generated.
For cases where emitProfileArcs() was called but emitProfileNotes() was
not, set the CfgChecksum to 0.

llvm-svn: 195311
2013-11-21 04:53:39 +00:00
Yuchen Wu 664dc7678b llvm-cov: Fixed some bugs related to file checksum.
Added call to update CfgChecksum. Made FileChecksum a vector, separate
for each source file.

llvm-svn: 195309
2013-11-21 04:01:05 +00:00
Yuchen Wu babe749125 llvm-cov: Added file checksum to gcno and gcda files.
Instead of permanently outputting "MVLL" as the file checksum, clang
will create gcno and gcda checksums by hashing the destination block
numbers of every arc. This allows for llvm-cov to check if the two gcov
files are synchronized.

Regenerated the test files so they contain the checksum. Also added
negative test to ensure error when the checksums don't match.

llvm-svn: 195191
2013-11-20 04:15:05 +00:00
Arnold Schwaighofer 8bc4a0ba14 SLPVectorizer: Fix stale for Value pointer array
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.

Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.

The test case will only fail when running with libgmalloc.

radar://15498655

llvm-svn: 195162
2013-11-19 22:20:20 +00:00
Arnold Schwaighofer 5f7c48ebff SLPVectorizer: Fix whitespace errors
llvm-svn: 195161
2013-11-19 22:20:18 +00:00
Chandler Carruth a126200665 Fix an issue where SROA computed different results based on the relative
order of slices of the alloca which have exactly the same size and other
properties. This was found by a perniciously unstable sort
implementation used to flush out buggy uses of the algorithm.

The fundamental idea is that findCommonType should return the best
common type it can find across all of the slices in the range. There
were two bugs here previously:

1) We would accept an integer type smaller than a byte-width multiple,
   and if there were different bit-width integer types, we would accept
   the first one. This caused an actual failure in the testcase updated
   here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use
   before an integer typed load or store we would bail, but if we found
   the integere typed load or store, we would use it. The correct
   behavior is to always use an integer typed operation which covers the
   partition if one exists.

While a clever debugging sort algorithm found problem #1 in our existing
test cases, I have no useful test case ideas for #2. I spotted in by
inspection when looking at this code.

llvm-svn: 195118
2013-11-19 09:03:18 +00:00
Michael Ilseman d930c19d20 Add support for software expansion of 64-bit integer division instructions.
Patch by Dmitri Shtilman!

llvm-svn: 195116
2013-11-19 06:54:19 +00:00
Adrian Prantl 8e10fdbc0f Debug info: Let LowerDbgDeclare perfom the dbg.declare -> dbg.value
lowering only for load/stores to scalar allocas. The resulting values
confuse the backend and don't add anything because we can describe
array-allocas with a dbg.declare intrinsic just fine.

rdar://problem/15464571

llvm-svn: 195052
2013-11-18 23:04:38 +00:00
Alexey Samsonov a788b940f7 [ASan] Fix PR17867 - make sure ASan doesn't crash if use-after-scope and use-after-return are combined.
llvm-svn: 195014
2013-11-18 14:53:55 +00:00
Arnold Schwaighofer b72cb4ec49 LoopVectorizer: Extend the induction variable to a larger type
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.

The problem is loops like:
int main ()
{
  int a = 1;
  char b = 0;
  lbl:
    a &= 4;
    b--;
    if (b) goto lbl;
  return a;
}

The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.

To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.

PR17532

llvm-svn: 195008
2013-11-18 13:14:32 +00:00
NAKAMURA Takumi f9c8339a4e Utils/LoopUnroll.cpp: Tweak (StringRef)OldName to be valid until it is used, since r194601.
eraseFromParent() invalidates OldName.

llvm-svn: 194970
2013-11-17 18:05:34 +00:00
Hal Finkel 29aeb20518 Add a loop rerolling flag to the PassManagerBuilder
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.

llvm-svn: 194966
2013-11-17 16:02:50 +00:00
Hal Finkel 66cd3f1ba3 Add the cold attribute to error-reporting call sites
Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.

The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).

llvm-svn: 194943
2013-11-17 02:06:35 +00:00
Hal Finkel 67107ea1af Fix ndebug-build unused variable in loop rerolling
llvm-svn: 194941
2013-11-17 01:21:54 +00:00
Hal Finkel bf45efde2d Add a loop rerolling pass
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:

for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}

and turn them into this:

for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}

and loops like this:

for (int i = 0; i < 500; ++i) {
  x[3*i] = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}

and turn them into this:

for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}

There are two motivations for this transformation:

  1. Code-size reduction (especially relevant, obviously, when compiling for
code size).

  2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.

This pass is not currently enabled by default at any optimization level.

llvm-svn: 194939
2013-11-16 23:59:05 +00:00
Hal Finkel 12100bf7e8 Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:

  (fptrunc (sqrt (fpext x))) -> (sqrtf x)

but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.

This change makes InstCombine apply this optimization to llvm.sqrt as well.

This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.

llvm-svn: 194935
2013-11-16 21:29:08 +00:00
Benjamin Kramer 03f3e248eb InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs.
This is common in bitfield code.

llvm-svn: 194925
2013-11-16 16:00:48 +00:00
Arnold Schwaighofer dbb7b87d7a LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

llvm-svn: 194876
2013-11-15 23:09:33 +00:00
Manman Ren bc37658a7f ArgumentPromotion: correctly transfer TBAA tags and alignments.
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we
try to promote two arguments, they will both write to OriginalLoads causing
created loads for the two arguments to have the same original load. And the same
tbaa tag and alignment will be put to the created loads for the two arguments.

The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.

PR17906

llvm-svn: 194846
2013-11-15 20:41:15 +00:00
Kostya Serebryany 0604c62d7b [asan] use GlobalValue::PrivateLinkage for coverage guard to save quite a bit of code size
llvm-svn: 194800
2013-11-15 09:52:05 +00:00
Bob Wilson da4147c743 Reapply "[asan] Poor man's coverage that works with ASan"
I was able to successfully run a bootstrapped LTO build of clang with
r194701, so this change does not seem to be the cause of our failing
buildbots.

llvm-svn: 194789
2013-11-15 07:16:09 +00:00
Matt Arsenault a9e95abcbf Add instcombine visitor for addrspacecast
llvm-svn: 194786
2013-11-15 05:45:08 +00:00
Bob Wilson ae73587c4b Revert "[asan] Poor man's coverage that works with ASan"
This reverts commit 194701. Apple's bootstrapped LTO builds have been failing,
and this change (along with compiler-rt 194702-194704) is the only thing on
the blamelist.  I will either reappy these changes or help debug the problem,
depending on whether this fixes the buildbots.

llvm-svn: 194780
2013-11-15 03:28:22 +00:00
Kostya Serebryany 6da3f74061 [asan] Poor man's coverage that works with ASan
llvm-svn: 194701
2013-11-14 13:27:41 +00:00
Evgeniy Stepanov 585813e33d [msan] Fast path optimization for wrap-indirect-calls feature of MemorySanitizer.
Indirect call wrapping helps MSanDR (dynamic instrumentation companion tool
for MSan) to catch all cases where execution leaves a compiler-instrumented
module by allowing the tool to rewrite targets of indirect calls.

This change is an optimization that skips wrapping for calls when target is
inside the current module. This relies on the linker providing symbols at the
begin and end of the module code (or code + data, does not really matter).
Gold linker provides such symbols by default. GNU (BFD) linker needs a link
flag: -Wl,--defsym=__executable_start=0.

More info:
https://code.google.com/p/memory-sanitizer/wiki/MSanDR#Native_exec

llvm-svn: 194697
2013-11-14 12:29:04 +00:00
Jakub Staszak 86a7492f0d Use StringRef instead of std::string
llvm-svn: 194601
2013-11-13 20:09:11 +00:00
Alexey Samsonov aa19c0a1c3 Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual
llvm-svn: 194568
2013-11-13 13:09:39 +00:00
Diego Novillo 8d6568b56b SampleProfileLoader pass. Initial setup.
This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emmited as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.

External profile information files have no fixed format, each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains.  This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support, I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.

The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.

This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.

Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.

This patch implements:

- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instructions that
  matches the profiles.
- A text loader class to assist the implementation of
  SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch
weights based on instruction samples.

This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:

Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.

Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.

I will adjust this assignment in subsequent patches.

llvm-svn: 194566
2013-11-13 12:22:21 +00:00
Nadav Rotem ea186b9515 Update the docs to match the function name.
llvm-svn: 194537
2013-11-13 01:12:01 +00:00
Nadav Rotem 0ed2fdb5af Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on).
llvm-svn: 194525
2013-11-12 22:38:59 +00:00
Nadav Rotem 53d32211b7 FoldBranchToCommonDest merges branches into a single branch with or/and of the condition. It has a heuristics for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristics that says that if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable.
llvm-svn: 194524
2013-11-12 22:37:16 +00:00
Rafael Espindola dd8757abbc Corruptly merge constants with explicit and implicit alignments.
Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.

Fixes pr17815.

Patch by Chris Smowton!

llvm-svn: 194506
2013-11-12 20:21:43 +00:00
Benjamin Kramer 7c30260ab3 SimplifyCFG: Use existing constant folding logic when forming switch tables.
Both simpler and more powerful than the hand-rolled folding logic.

llvm-svn: 194475
2013-11-12 12:24:36 +00:00
Shuxin Yang f1ec34bdfd Correct a glitch in r194424 which may invalidate iterator.
llvm-svn: 194457
2013-11-12 08:33:03 +00:00
Yuchen Wu 062f24c973 llvm-cov: Added call to update run/program counts.
Also updated test files that were generated from this change.

llvm-svn: 194453
2013-11-12 04:59:08 +00:00
Shuxin Yang 3168ab3376 Fix PR17952.
The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when value is propagated from dead blocks.
(We can certainly get rid of assertion; it is safe to do so, because propagating
 value from dead block to alive join node is certainly ok.)

  The root cause of this bug is : edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that 
split the critial edge needs to be flagged "dead" as well.

  There are 3 ways to fix this bug:
  1) Get rid of the assertion as I mentioned eariler 
  2) When an dead edge is split, flag the inserted block "dead".
  3) proactively split the critical edges connecting dead and live blocks when
     new dead blocks are revealed.

  This fix go for 3) with additional 2 LOC.

  Testing case was added by Rafael the other day.

llvm-svn: 194424
2013-11-11 22:00:23 +00:00
Renato Golin 3f67a7de36 Move debug message in vectorizer
No functional change, just better reporting.

llvm-svn: 194388
2013-11-11 16:27:35 +00:00
Evgeniy Stepanov 560e089355 [msan] Propagate origin for insertvalue, extractvalue.
llvm-svn: 194374
2013-11-11 13:37:10 +00:00
Bill Wendling fed6c220ec Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308."
This causes PR17852.

This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7.

Conflicts:
	test/Transforms/GVN/cond_br2.ll

llvm-svn: 194348
2013-11-10 07:34:34 +00:00
Matt Arsenault c900303e2f Use type form of getIntPtrType.
This should be inconsequential and is work
towards removing the default address space
arguments.

llvm-svn: 194347
2013-11-10 04:46:57 +00:00
Nadav Rotem 5ba1c6ced8 SimplifyCFG has a heuristics for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.
llvm-svn: 194346
2013-11-10 04:13:31 +00:00
Matt Arsenault 5bcefabcda Teach MergeFunctions about address spaces
llvm-svn: 194342
2013-11-10 01:44:37 +00:00
Hal Finkel 1a642aef37 Remove dead code from LoopUnswitch
LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back at 2006).

No functionality change intended.

llvm-svn: 194277
2013-11-08 19:58:21 +00:00
Michael Gottesman 24b2f6fdda [objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.
Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.

Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.

rdar://15377890

llvm-svn: 194083
2013-11-05 16:02:40 +00:00
Hal Finkel 081eaef6fa Add a runtime unrolling parameter to the LoopUnroll pass constructor
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).

No functionality change intended.

llvm-svn: 194027
2013-11-05 00:08:03 +00:00
Shuxin Yang d1382b6c31 Remove dead code
llvm-svn: 194017
2013-11-04 21:44:01 +00:00
Benjamin Kramer 9e7f7c7fdb SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering.
STL debug mode checks this.

llvm-svn: 194015
2013-11-04 21:34:55 +00:00
Matt Arsenault 243140f2fd Scalarize select vector arguments when extracted.
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.

llvm-svn: 194013
2013-11-04 20:36:06 +00:00
Benjamin Kramer 191ba00b83 SLPVectorizer: Add a missing pair of parens. No functionality change.
llvm-svn: 193958
2013-11-03 12:54:32 +00:00
Benjamin Kramer 91e8f3c348 SLPVectorizer: When CSEing generated gathers only scan blocks containing them.
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.

The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.

llvm-svn: 193956
2013-11-03 12:27:52 +00:00
David Majnemer 120f4a06fd Revert "Inliner: Handle readonly attribute per argument when adding memcpy"
This reverts commit r193356, it caused PR17781.

A reduced test case covering this regression has been added to the test suite.

llvm-svn: 193955
2013-11-03 12:22:13 +00:00
David Majnemer 927df85de0 Spell "Actual" correctly
llvm-svn: 193954
2013-11-03 11:09:39 +00:00
Bob Wilson d8d92d90fa Convert calls to __sinpi and __cospi into __sincospi_stret
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.

Patch by Tim Northover

llvm-svn: 193943
2013-11-03 06:48:38 +00:00
Benjamin Kramer 089c1e4f6d SLPVectorizer: Remove duplicated function.
llvm-svn: 193927
2013-11-02 14:46:27 +00:00
Benjamin Kramer 568a1cd9df LoopVectorize: Remove quadratic behavior the local CSE.
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.

llvm-svn: 193926
2013-11-02 13:39:00 +00:00
Arnold Schwaighofer d0789cdffe LoopVectorizer: Move cse code into its own function
llvm-svn: 193895
2013-11-01 23:28:54 +00:00
Arnold Schwaighofer a846a7f8f0 LoopVectorizer: Perform redundancy elimination on induction variables
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.

On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.

This should also help lpbench and paq8p.

I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.

radar://15339680

llvm-svn: 193891
2013-11-01 22:18:19 +00:00
Benjamin Kramer 1fbcdca9e3 LoopVectorize: Look for consecutive acces in GEPs with trailing zero indices
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).

llvm-svn: 193860
2013-11-01 14:09:50 +00:00
Arnold Schwaighofer 70a4665f55 LoopVectorizer: If dependency checks fail try runtime checks
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.

This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.

radar://15339680

llvm-svn: 193853
2013-11-01 03:05:07 +00:00
Arnold Schwaighofer 1ca922e296 LoopVectorizer: Clear all member data structures in RuntimeCheck.reset()
Clear all data structures when resetting the RuntimeCheck data structure.

No test case. This was exposed by an upcomming change.

llvm-svn: 193852
2013-11-01 03:05:04 +00:00
Manman Ren 87a2adc7fe Do not convert "call asm" to "invoke asm" in Inliner.
Given that backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".

If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.

Fix rdar://15317907

llvm-svn: 193808
2013-10-31 21:56:03 +00:00
Rafael Espindola 282a47037b Use LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN instead of the "dso list".
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
  files.
* The linker tells LLVM which symbols are not used from other object files,
  but will be put in the dso symbol table if present.

GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.

LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.

This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
  global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.

llvm-svn: 193800
2013-10-31 20:51:58 +00:00
Rafael Espindola 6554e5a94d Merge CallGraph and BasicCallGraph.
llvm-svn: 193734
2013-10-31 03:03:55 +00:00
Matt Arsenault 38b8ecf378 Teach scalarrepl about address spaces
llvm-svn: 193720
2013-10-30 22:54:58 +00:00
Matt Arsenault 614ea99da7 Fix GVN creating bitcast between address spaces
llvm-svn: 193710
2013-10-30 19:05:41 +00:00
Arnold Schwaighofer 77af0f6e82 ARM cost model: Account for zero cost scalar SROA instructions
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.

radar://15336950

llvm-svn: 193573
2013-10-29 01:33:53 +00:00
Arnold Schwaighofer 86252451c4 SLPVectorizer: Use vector type for vectorized memory operations
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.

radar://15332579

llvm-svn: 193572
2013-10-29 01:33:50 +00:00
Shuxin Yang 2e1890e18b Revert r193251 : Use address-taken to disambiguate global variable and indirect memops.
llvm-svn: 193489
2013-10-27 03:08:44 +00:00
Wan Xiaofei be640b28c0 Quick look-up for block in loop.
This patch implements quick look-up for block in loop by maintaining a hash set for blocks.
It improves the efficiency of loop analysis a lot, the biggest improvement could be 5-6%(458.sjeng).
Below are the compilation time for our benchmark in llc before & after the patch.

Benchmark	llc - trunk		llc - patched	
401.bzip2	0.339081	100.00%	0.329657	102.86%
403.gcc		19.853966	100.00%	19.605466	101.27%
429.mcf		0.049823	100.00%	0.048451	102.83%
433.milc	0.514898	100.00%	0.510217	100.92%
444.namd	1.109328	100.00%	1.103481	100.53%
445.gobmk	4.988028	100.00%	4.929114	101.20%
456.hmmer	0.843871	100.00%	0.825865	102.18%
458.sjeng	0.754238	100.00%	0.714095	105.62%
464.h264ref	2.9668		100.00%	2.90612		102.09%
471.omnetpp	4.556533	100.00%	4.511886	100.99%
bitmnp01	0.038168	100.00%	0.0357		106.91%
idctrn01	0.037745	100.00%	0.037332	101.11%
libquake2	3.78689		100.00%	3.76209		100.66%
libquake_	2.251525	100.00%	2.234104	100.78%
linpack		0.033159	100.00%	0.032788	101.13%
matrix01	0.045319	100.00%	0.043497	104.19%
nbench		0.333161	100.00%	0.329799	101.02%
tblook01	0.017863	100.00%	0.017666	101.12%
ttsprk01	0.054337	100.00%	0.053057	102.41%

Reviewer	: Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver	: Andrew Trick <atrick@apple.com>
Test		: Pass make check-all & llvm test-suite

llvm-svn: 193460
2013-10-26 03:08:02 +00:00
Andrew Trick 57243da70f Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop.
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)

When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.

llvm-svn: 193438
2013-10-25 21:35:56 +00:00
Rafael Espindola 7749d7ccc7 Handle calls and invokes in GlobalStatus.
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.

With this change internalize call handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in a LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.

llvm-svn: 193436
2013-10-25 21:29:52 +00:00
Hal Finkel 02f562df43 LoopVectorizer: Don't attempt to vectorize extractelement instructions
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:

  %58 = extractelement <2 x i32> %15, i32 0
  %59 = extractelement i32 %58, i32 0

where the second extractelement is illegal because its first operand is not a vector.

llvm-svn: 193434
2013-10-25 20:40:15 +00:00
Tom Stellard bc7d87f07c Inliner: Handle readonly attribute per argument when adding memcpy
Patch by: Vincent Lejeune

llvm-svn: 193356
2013-10-24 16:38:33 +00:00
Renato Golin 1ba143e140 Mark vector loops as already vectorized
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.

llvm-svn: 193349
2013-10-24 14:50:51 +00:00
Nuno Lopes 340b0463e6 fix PR17635: false positive with packed structures
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object

llvm-svn: 193317
2013-10-24 09:17:24 +00:00
Juergen Ributzka d04d096ecf Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks.
Reviewed by Andy

llvm-svn: 193303
2013-10-24 05:29:56 +00:00
Andrew Trick ada2356ac9 Clarify comments in genLoopLimit.
llvm-svn: 193292
2013-10-24 00:43:38 +00:00
Yuchen Wu 3197b25b27 Fixed comment typo in GCOVProfiling.cpp
llvm-svn: 193268
2013-10-23 20:35:00 +00:00
Shuxin Yang e4fb375995 Use address-taken to disambiguate global variable and indirect memops.
Major steps include:
 1). introduces a not-addr-taken bit-field in GlobalVariable
 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable 
    dosen't have its address taken.
 3). AA use this info for disambiguation. 

llvm-svn: 193251
2013-10-23 17:28:19 +00:00
Eric Christopher 874fa0f6c7 Fix spelling, grammar, and match naming convention for test files.
llvm-svn: 193130
2013-10-21 23:14:06 +00:00
Tom Stellard e1631ddf93 SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2
v2:
  - Use CI->cannotDuplicate()

llvm-svn: 193115
2013-10-21 20:07:30 +00:00
Matt Arsenault 404c60a7c3 Use more type helper functions
llvm-svn: 193109
2013-10-21 19:43:56 +00:00