Commit Graph

11451 Commits

Author SHA1 Message Date
Chandler Carruth 1bf38c6a71 Fix a really nasty SROA bug with how we handled out-of-bounds memcpy
intrinsics.

Reported on the list by Evan with a couple of attempts to fix, but it
took a while to dig down to the root cause. There are two overlapping
bugs here, both centering around the circumstance of discovering
a memcpy operand which is known to be completely outside the bounds of
the alloca.

First, we need to kill the *other* side of the memcpy if it was added to
this alloca. Otherwise we'll factor it into our slicing and try to
rewrite it even though we know for a fact that it is dead. This is made
more tricky because we can visit the sides in either order. So we have
to both kill the other side and skip instructions marked as dead. The
latter really should be goodness in every case, but here is a matter of
correctness.

Second, we need to actually remove the *uses* of the alloca by the
memcpy when queuing it for later deletion. Otherwise it may still be
using the alloca when we go to promote it (if the rewrite re-uses the
existing alloca instruction). Do this by factoring out the
use-clobbering used when for nixing a Phi argument and re-using it
across the operands of a to-be-deleted instruction.

llvm-svn: 199590
2014-01-19 12:16:54 +00:00
Arnold Schwaighofer cc742dd9e4 LoopVectorizer: A reduction that has multiple uses of the reduction value is not
a reduction.

Really. Under certain circumstances (the use list of an instruction has to be
set up right - hence the extra pass in the test case) we would not recognize
when a value in a potential reduction cycle was used multiple times by the
reduction cycle.

Fixes PR18526.
radar://15851149

llvm-svn: 199570
2014-01-19 03:18:31 +00:00
Nick Lewycky a6a17d77d2 Don't refuse to transform constexpr(call(arg, ...)) to call(constexpr(arg), ...)) just because the function has multiple return values even if their return types are the same. Patch by Eduard Burtescu!
llvm-svn: 199564
2014-01-18 22:47:12 +00:00
Benjamin Kramer fea9ac99b0 InstCombine: Make the (fmul X, -1.0) -> (fsub -0.0, X) transform handle vectors too.
PR18532.

llvm-svn: 199553
2014-01-18 16:43:14 +00:00
Owen Anderson 48b842ef7c Fix more instances of dropped fast math flags when optimizing FADD instructions. All found by inspection (aka grep).
llvm-svn: 199528
2014-01-18 00:48:14 +00:00
Kostya Serebryany 714c67c31e [asan] extend asan-coverage (still experimental).
- add a mode for collecting per-block coverage (-asan-coverage=2).
   So far the implementation is naive (all blocks are instrumented),
   the performance overhead on top of asan could be as high as 30%.
 - Make sure the one-time calls to __sanitizer_cov are moved to function buttom,
   which in turn required to copy the original debug info into the call insn.

Here is the performance data on SPEC 2006
(train data, comparing asan with asan-coverage={0,1,2}):

                             asan+cov0     asan+cov1      diff 0-1    asan+cov2       diff 0-2      diff 1-2
       400.perlbench,        65.60,        65.80,         1.00,        76.20,         1.16,         1.16
           401.bzip2,        65.10,        65.50,         1.01,        75.90,         1.17,         1.16
             403.gcc,         1.64,         1.69,         1.03,         2.04,         1.24,         1.21
             429.mcf,        21.90,        22.60,         1.03,        23.20,         1.06,         1.03
           445.gobmk,       166.00,       169.00,         1.02,       205.00,         1.23,         1.21
           456.hmmer,        88.30,        87.90,         1.00,        91.00,         1.03,         1.04
           458.sjeng,       210.00,       222.00,         1.06,       258.00,         1.23,         1.16
      462.libquantum,         1.73,         1.75,         1.01,         2.11,         1.22,         1.21
         464.h264ref,       147.00,       152.00,         1.03,       160.00,         1.09,         1.05
         471.omnetpp,       115.00,       116.00,         1.01,       140.00,         1.22,         1.21
           473.astar,       133.00,       131.00,         0.98,       142.00,         1.07,         1.08
       483.xalancbmk,       118.00,       120.00,         1.02,       154.00,         1.31,         1.28
            433.milc,        19.80,        20.00,         1.01,        20.10,         1.02,         1.01
            444.namd,        16.20,        16.20,         1.00,        17.60,         1.09,         1.09
          447.dealII,        41.80,        42.20,         1.01,        43.50,         1.04,         1.03
          450.soplex,         7.51,         7.82,         1.04,         8.25,         1.10,         1.05
          453.povray,        14.00,        14.40,         1.03,        15.80,         1.13,         1.10
             470.lbm,        33.30,        34.10,         1.02,        34.10,         1.02,         1.00
         482.sphinx3,        12.40,        12.30,         0.99,        13.00,         1.05,         1.06

llvm-svn: 199488
2014-01-17 11:00:30 +00:00
Quentin Colombet dc0b2ea2bc [opt][PassInfo] Allow opt to run passes that need target machine.
When registering a pass, a pass can now specify a second construct that takes as
argument a pointer to TargetMachine.
The PassInfo class has been updated to reflect that possibility.
If such a constructor exists opt will use it instead of the default constructor
when instantiating the pass.

Since such IR passes are supposed to be rare, no specific support has been
added to this commit to allow an easy registration of such a pass.
In other words, for such pass, the initialization function has to be
hand-written (see CodeGenPrepare for instance).

Now, codegenprepare can be tested using opt:
opt -codegenprepare -mtriple=mytriple input.ll

llvm-svn: 199430
2014-01-16 21:44:34 +00:00
Owen Anderson e7321660c1 Fix two cases where we could lose fast math flags when optimizing FADD expressions.
llvm-svn: 199427
2014-01-16 21:26:02 +00:00
Owen Anderson 4557a156e3 Fix an instance where we would drop fast math flags when performing an fdiv to reciprocal multiply transformation.
llvm-svn: 199425
2014-01-16 21:07:52 +00:00
Owen Anderson e8537fc7e0 Fix a bug in InstCombine where we failed to preserve fast math flags when optimizing an FMUL expression.
llvm-svn: 199424
2014-01-16 20:59:41 +00:00
Owen Anderson f74cfe031f Teach InstCombine that (fmul X, -1.0) can be simplified to (fneg X), which LLVM expresses as (fsub -0.0, X).
llvm-svn: 199420
2014-01-16 20:36:42 +00:00
Evgeniy Stepanov 13665367a0 [asan] Remove -fsanitize-address-zero-base-shadow command line
flag from clang, and disable zero-base shadow support on all platforms
where it is not the default behavior.

- It is completely unused, as far as we know.
- It is ABI-incompatible with non-zero-base shadow, which means all
objects in a process must be built with the same setting. Failing to
do so results in a segmentation fault at runtime.
- It introduces a backward dependency of compiler-rt on user code,
which is uncommon and complicates testing.

This is the LLVM part of a larger change.

llvm-svn: 199371
2014-01-16 10:19:12 +00:00
Hans Wennborg 4744ac1733 Switch-to-lookup tables: set threshold to 3 cases
There has been an old FIXME to find the right cut-off for when it's worth
analyzing and potentially transforming a switch to a lookup table.

The switches always have two or more cases. I could not measure any speed-up
by transforming a switch with two cases. A switch with three cases gets a nice
speed-up, and I couldn't measure any compile-time regression, so I think this
is the right threshold.

In a Clang self-host, this causes 480 new switches to be transformed,
and reduces the final binary size with 8 KB.

llvm-svn: 199294
2014-01-15 05:00:27 +00:00
Arnold Schwaighofer dc4c9460a2 LoopVectorize: Only strip casts from integer types when replacing symbolic
strides

Fixes PR18480.

llvm-svn: 199291
2014-01-15 03:35:46 +00:00
Matt Arsenault 2d353d1a10 Do pointer cast simplifications on addrspacecast
llvm-svn: 199254
2014-01-14 20:00:45 +00:00
Matt Arsenault f08a44f903 Remove a check for an illegal condition.
Bitcasts can't be between address spaces anymore.

llvm-svn: 199253
2014-01-14 19:56:57 +00:00
Matt Arsenault e55a2c2e6b Make nocapture analysis work with addrspacecast
llvm-svn: 199246
2014-01-14 19:11:52 +00:00
Duncan P. N. Exon Smith 93be7c4fb3 Reapply "LTO: add API to set strategy for -internalize"
Reapply r199191, reverted in r199197 because it carelessly broke
Other/link-opts.ll.  The problem was that calling
createInternalizePass("main") would select
createInternalizePass(bool("main")) instead of
createInternalizePass(ArrayRef<const char *>("main")).  This commit
fixes the bug.

The original commit message follows.

Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199244
2014-01-14 18:52:17 +00:00
Nico Rieck 7157bb765e Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199218
2014-01-14 15:22:47 +00:00
Nico Rieck 9d2e0df049 Revert "Decouple dllexport/dllimport from linkage"
Revert this for now until I fix an issue in Clang with it.

This reverts commit r199204.

llvm-svn: 199207
2014-01-14 12:38:32 +00:00
Nico Rieck e43aaf7967 Decouple dllexport/dllimport from linkage
Representing dllexport/dllimport as distinct linkage types prevents using
these attributes on templates and inline functions.

Instead of introducing further mixed linkage types to include linkonce and
weak ODR, the old import/export linkage types are replaced with a new
separate visibility-like specifier:

  define available_externally dllimport void @f() {}
  @Var = dllexport global i32 1, align 4

Linkage for dllexported globals and functions is now equal to their linkage
without dllexport. Imported globals and functions must be either
declarations with external linkage, or definitions with
AvailableExternallyLinkage.

llvm-svn: 199204
2014-01-14 11:55:03 +00:00
NAKAMURA Takumi 23c0ab53b2 Revert r199191, "LTO: add API to set strategy for -internalize"
Please update also Other/link-opts.ll, in next time.

llvm-svn: 199197
2014-01-14 09:40:18 +00:00
Duncan P. N. Exon Smith 43ea3478bf LTO: add API to set strategy for -internalize
Add API to LTOCodeGenerator to specify a strategy for the -internalize
pass.

This is a new attempt at Bill's change in r185882, which he reverted in
r188029 due to problems with the gold linker.  This puts the onus on the
linker to decide whether (and what) to internalize.

In particular, running internalize before outputting an object file may
change a 'weak' symbol into an internal one, even though that symbol
could be needed by an external object file --- e.g., with arclite.

This patch enables three strategies:

- LTO_INTERNALIZE_FULL: the default (and the old behaviour).
- LTO_INTERNALIZE_NONE: skip -internalize.
- LTO_INTERNALIZE_HIDDEN: only -internalize symbols with hidden
  visibility.

LTO_INTERNALIZE_FULL should be used when linking an executable.

Outputting an object file (e.g., via ld -r) is more complicated, and
depends on whether hidden symbols should be internalized.  E.g., for
ld -r, LTO_INTERNALIZE_NONE can be used when -keep_private_externs, and
LTO_INTERNALIZE_HIDDEN can be used otherwise.  However,
LTO_INTERNALIZE_FULL is inappropriate, since the output object file will
eventually need to link with others.

lto_codegen_set_internalize_strategy() sets the strategy for subsequent
calls to lto_codegen_write_merged_modules() and lto_codegen_compile*().

<rdar://problem/14334895>

llvm-svn: 199191
2014-01-14 06:37:26 +00:00
Chandler Carruth 73523021d0 [PM] Split DominatorTree into a concrete analysis result object which
can be used by both the new pass manager and the old.

This removes it from any of the virtual mess of the pass interfaces and
lets it derive cleanly from the DominatorTreeBase<> template. In turn,
tons of boilerplate interface can be nuked and it turns into a very
straightforward extension of the base DominatorTree interface.

The old analysis pass is now a simple wrapper. The names and style of
this split should match the split between CallGraph and
CallGraphWrapperPass. All of the users of DominatorTree have been
updated to match using many of the same tricks as with CallGraph. The
goal is that the common type remains the resulting DominatorTree rather
than the pass. This will make subsequent work toward the new pass
manager significantly easier.

Also in numerous places things became cleaner because I switched from
re-running the pass (!!! mid way through some other passes run!!!) to
directly recomputing the domtree.

llvm-svn: 199104
2014-01-13 13:07:17 +00:00
Chandler Carruth e509db410a [PM] Pull the generic graph algorithms and data structures for dominator
trees into the Support library.

These are all expressed in terms of the generic GraphTraits and CFG,
with no reliance on any concrete IR types. Putting them in support
clarifies that and makes the fact that the static analyzer in Clang uses
them much more sane. When moving the Dominators.h file into the IR
library I claimed that this was the right home for it but not something
I planned to work on. Oops.

So why am I doing this? It happens to be one step toward breaking the
requirement that IR verification can only be performed from inside of
a pass context, which completely blocks the implementation of
verification for the new pass manager infrastructure. Fixing it will
also allow removing the concept of the "preverify" step (WTF???) and
allow the verifier to cleanly flag functions which fail verification in
a way that precludes even computing dominance information. Currently,
that results in a fatal error even when you ask the verifier to not
fatally error. It's awesome like that.

The yak shaving will continue...

llvm-svn: 199095
2014-01-13 10:52:56 +00:00
Chandler Carruth 5ad5f15cff [cleanup] Move the Dominators.h and Verifier.h headers into the IR
directory. These passes are already defined in the IR library, and it
doesn't make any sense to have the headers in Analysis.

Long term, I think there is going to be a much better way to divide
these matters. The dominators code should be fully separated into the
abstract graph algorithm and have that put in Support where it becomes
obvious that evn Clang's CFGBlock's can use it. Then the verifier can
manually construct dominance information from the Support-driven
interface while the Analysis library can provide a pass which both
caches, reconstructs, and supports a nice update API.

But those are very long term, and so I don't want to leave the really
confusing structure until that day arrives.

llvm-svn: 199082
2014-01-13 09:26:24 +00:00
Chandler Carruth 07baed53e8 Re-sort #include lines again, prior to moving headers around.
llvm-svn: 199080
2014-01-13 08:04:33 +00:00
Hans Wennborg ac114a3ce7 Switch-to-lookup tables: Don't require a result for the default
case when the lookup table doesn't have any holes.

This means we can build a lookup table for switches like this:

  switch (x) {
    case 0: return 1;
    case 1: return 2;
    case 2: return 3;
    case 3: return 4;
    default: exit(1);
  }

The default case doesn't yield a constant result here, but that doesn't matter,
since a default result is only necessary for filling holes in the lookup table,
and this table doesn't have any holes.

This makes us transform 505 more switches in a clang bootstrap, and shaves 164 KB
off the resulting clang binary.

llvm-svn: 199025
2014-01-12 00:44:41 +00:00
Arnold Schwaighofer 66c742aeea LoopVectorizer: Enable strided memory accesses versioning per default
I saw no compile or execution time regressions on x86_64 -mavx -O3.

radar://13075509

llvm-svn: 199015
2014-01-11 20:40:34 +00:00
NAKAMURA Takumi 41c409ce0d LoopVectorize.cpp: Appease MSC16.
Excuse me, I hope msc16 builders would be fine till its end day.
Introduce nullptr then. ;)

llvm-svn: 199001
2014-01-11 09:59:27 +00:00
Diego Novillo 9518b63bfc Extend and simplify the sample profile input file.
1- Use the line_iterator class to read profile files.

2- Allow comments in profile file. Lines starting with '#'
   are completely ignored while reading the profile.

3- Add parsing support for discriminators and indirect call samples.

   Our external profiler can emit more profile information that we are
   currently not handling. This patch does not add new functionality to
   support this information, but it allows profile files to provide it.

   I will add actual support later on (for at least one of these
   features, I need support for DWARF discriminators in Clang).

   A sample line may contain the following additional information:

   Discriminator. This is used if the sampled program was compiled with
   DWARF discriminator support
   (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). This
   is currently only emitted by GCC and we just ignore it.

   Potential call targets and samples. If present, this line contains a
   call instruction. This models both direct and indirect calls. Each
   called target is listed together with the number of samples. For
   example,

                    130: 7  foo:3  bar:2  baz:7

   The above means that at relative line offset 130 there is a call
   instruction that calls one of foo(), bar() and baz(). With baz()
   being the relatively more frequent call target.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2355

4- Simplify format of profile input file.

   This implements earlier suggestions to simplify the format of the
   sample profile file. The symbol table is not necessary and function
   profiles do not need to know the number of samples in advance.

   Differential Revision: http://llvm-reviews.chandlerc.com/D2419

llvm-svn: 198973
2014-01-10 23:23:51 +00:00
Diego Novillo 0accb3d2bc Propagation of profile samples through the CFG.
This adds a propagation heuristic to convert instruction samples
into branch weights. It implements a similar heuristic to the one
implemented by Dehao Chen on GCC.

The propagation proceeds in 3 phases:

1- Assignment of block weights. All the basic blocks in the function
   are initial assigned the same weight as their most frequently
   executed instruction.

2- Creation of equivalence classes. Since samples may be missing from
   blocks, we can fill in the gaps by setting the weights of all the
   blocks in the same equivalence class to the same weight. To compute
   the concept of equivalence, we use dominance and loop information.
   Two blocks B1 and B2 are in the same equivalence class if B1
   dominates B2, B2 post-dominates B1 and both are in the same loop.

3- Propagation of block weights into edges. This uses a simple
   propagation heuristic. The following rules are applied to every
   block B in the CFG:

   - If B has a single predecessor/successor, then the weight
     of that edge is the weight of the block.

   - If all the edges are known except one, and the weight of the
     block is already known, the weight of the unknown edge will
     be the weight of the block minus the sum of all the known
     edges. If the sum of all the known edges is larger than B's weight,
     we set the unknown edge weight to zero.

   - If there is a self-referential edge, and the weight of the block is
     known, the weight for that edge is set to the weight of the block
     minus the weight of the other incoming edges to that block (if
     known).

Since this propagation is not guaranteed to finalize for every CFG, we
only allow it to proceed for a limited number of iterations (controlled
by -sample-profile-max-propagate-iterations). It currently uses the same
GCC default of 100.

Before propagation starts, the pass builds (for each block) a list of
unique predecessors and successors. This is necessary to handle
identical edges in multiway branches. Since we visit all blocks and all
edges of the CFG, it is cleaner to build these lists once at the start
of the pass.

Finally, the patch fixes the computation of relative line locations.
The profiler emits lines relative to the function header. To discover
it, we traverse the compilation unit looking for the subprogram
corresponding to the function. The line number of that subprogram is the
line where the function begins. That becomes line zero for all the
relative locations.

llvm-svn: 198972
2014-01-10 23:23:46 +00:00
Arnold Schwaighofer c2e9d759f2 LoopVectorizer: Handle strided memory accesses by versioning
for (i = 0; i < N; ++i)
   A[i * Stride1] += B[i * Stride2];

We take loops like this and check that the symbolic strides 'Strided1/2' are one
and drop to the scalar loop if they are not.

This is currently disabled by default and hidden behind the flag
'enable-mem-access-versioning'.

radar://13075509

llvm-svn: 198950
2014-01-10 18:20:32 +00:00
Chandler Carruth d48cdbf0c3 Put the functionality for printing a value to a raw_ostream as an
operand into the Value interface just like the core print method is.
That gives a more conistent organization to the IR printing interfaces
-- they are all attached to the IR objects themselves. Also, update all
the users.

This removes the 'Writer.h' header which contained only a single function
declaration.

llvm-svn: 198836
2014-01-09 02:29:41 +00:00
Hao Liu 26abebbb2c Fix a bug about generating undef operand when optimising shuffle vector and insert element in instruction combine.
llvm-svn: 198730
2014-01-08 03:06:15 +00:00
Chandler Carruth 9aca918df9 Move the LLVM IR asm writer header files into the IR directory, as they
are part of the core IR library in order to support dumping and other
basic functionality.

Rename the 'Assembly' include directory to 'AsmParser' to match the
library name and the only functionality left their -- printing has been
in the core IR library for quite some time.

Update all of the #includes to match.

All of this started because I wanted to have the layering in good shape
before I started adding support for printing LLVM IR using the new pass
infrastructure, and commandline support for the new pass infrastructure.

llvm-svn: 198688
2014-01-07 12:34:26 +00:00
Chandler Carruth 8a8cd2bab9 Re-sort all of the includes with ./utils/sort_includes.py so that
subsequent changes are easier to review. About to fix some layering
issues, and wanted to separate out the necessary churn.

Also comment and sink the include of "Windows.h" in three .inc files to
match the usage in Memory.inc.

llvm-svn: 198685
2014-01-07 11:48:04 +00:00
Andrew Trick e4a18605e0 Reapply r198654 "indvars: sink truncates outside the loop."
This doesn't seem to have actually broken anything. It was paranoia
on my part. Trying again now that bots are more stable.

This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198678
2014-01-07 06:59:12 +00:00
Andrew Trick 3c0ed08996 Revert "indvars: sink truncates outside the loop."
This reverts commit r198654.

One of the bots reported a SciMark failure.

llvm-svn: 198659
2014-01-07 01:50:58 +00:00
Andrew Trick 0b8e3b2cb4 indvars: sink truncates outside the loop.
This is a follow up of the r198338 commit that added truncates for
lcssa phi nodes. Sinking the truncates below the phis cleans up the
loop and simplifies subsequent analysis within the indvars pass.

llvm-svn: 198654
2014-01-07 01:02:55 +00:00
Andrew Trick b70d9780ac 80 col. comment.
llvm-svn: 198653
2014-01-07 01:02:52 +00:00
Andrew Trick 6796ab424c Reapply r198478 "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
Now with a fix for PR18384: ValueHandleBase::ValueIsDeleted.

We need to invalidate SCEV's loop info when we delete a block, even if no values are hoisted.

llvm-svn: 198631
2014-01-06 19:43:14 +00:00
Alp Toker f929e09b10 Add missed cleanup from r198456
All other uses of this macro in LLVM/clang have been moved to the function
definition so follow suite (and the usage advice) here too for consistency.

llvm-svn: 198516
2014-01-04 22:47:48 +00:00
Alp Toker 5e9f3265f8 Revert "Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things."
This commit was the source of crasher PR18384:

While deleting: label %for.cond127
An asserting value handle still pointed to this value!
UNREACHABLE executed at llvm/lib/IR/Value.cpp:671!

Reverting to get the builders green, feel free to re-land after fixing up.
(Renato has a handy isolated repro if you need it.)

This reverts commit r198478.

llvm-svn: 198503
2014-01-04 17:00:45 +00:00
Andrew Trick aceac9746d Fix PR18361: Invalidate LoopDispositions after LoopSimplify hoists things.
getSCEV for an ashr instruction creates an intermediate zext
expression when it truncates its operand.

The operand is initially inside the loop, so the narrow zext
expression has a non-loop-invariant loop disposition.

LoopSimplify then runs on an outer loop, hoists the ashr operand, and
properly invalidate the SCEVs that are mapped to value.

The SCEV expression for the ashr is now an AddRec with the hoisted
value as the now loop-invariant start value.

The LoopDisposition of this wide value was properly invalidated during
LoopSimplify.

However, if we later get the ashr SCEV again, we again try to create
the intermediate zext expression. We get the same SCEV that we did
earlier, and it is still cached because it was never mapped to a
Value. When we try to create a new AddRec we abort because we're using
the old non-loop-invariant LoopDisposition.

I don't have a solution for this other than to clear LoopDisposition
when LoopSimplify hoists things.

I think the long-term strategy should be to perform LoopSimplify on
all loops before computing SCEV and before running any loop opts on
individual loops. It's possible we may want to rerun LoopSimplify on
individual loops, but it should rarely do anything, so rarely require
invalidating SCEV.

llvm-svn: 198478
2014-01-04 05:52:49 +00:00
Nico Weber 7408c7066a Add a LLVM_DUMP_METHOD macro.
The motivation is to mark dump methods as used in debug builds so that they can
be called from lldb, but to not do so in release builds so that they can be
dead-stripped.

There's lots of potential follow-up work suggested in the thread
"Should dump methods be LLVM_ATTRIBUTE_USED only in debug builds?" on cfe-dev,
but everyone seems to agreen on this subset.

Macro name chosen by fair coin toss.

llvm-svn: 198456
2014-01-03 22:53:37 +00:00
David Peixotto ea9ba446d5 Fix loop rerolling pass failure with non-consant loop lower bound
The loop rerolling pass was failing with an assertion failure from a
failed cast on loops like this:

  void foo(int *A, int *B, int m, int n) {
    for (int i = m; i < n; i+=4) {
      A[i+0] = B[i+0] * 4;
      A[i+1] = B[i+1] * 4;
      A[i+2] = B[i+2] * 4;
      A[i+3] = B[i+3] * 4;
    }
  }

The code was casting the SCEV-expanded code for the new
induction variable to a phi-node. When the loop had a non-constant
lower bound, the SCEV expander would end the code expansion with an
add insted of a phi node and the cast would fail.

It looks like the cast to a phi node was only needed to get the
induction variable value coming from the backedge to compute the end
of loop condition. This patch changes the loop reroller to compare
the induction variable to the number of times the backedge is taken
instead of the iteration count of the loop. In other words, we stop
the loop when the current value of the induction variable ==
IterationCount-1. Previously, the comparison was comparing the
induction variable value from the next iteration == IterationCount.

This problem only seems to occur on 32-bit targets. For some reason,
the loop is not rerolled on 64-bit targets.

PR18290

llvm-svn: 198425
2014-01-03 17:20:01 +00:00
Hal Finkel decb024c86 Disable compare sinking in CodeGenPrepare when multiple condition registers are available
As noted in the comment above CodeGenPrepare::OptimizeInst, which aggressively
sinks compares to reduce pressure on the condition register(s), for targets
such as PowerPC with multiple condition registers, this may not be the right
thing to do. This adds an HasMultipleConditionRegisters boolean to TLI, and
CodeGenPrepare::OptimizeInst is skipped when HasMultipleConditionRegisters is
true.

This functionality will be used by the PowerPC backend in an upcoming commit.
Especially when the PowerPC backend starts tracking individual condition
register bits as separate allocatable entities (which will happen in this
upcoming commit), this sinking from CodeGenPrepare::OptimizeInst is
significantly suboptimial.

llvm-svn: 198354
2014-01-02 21:13:43 +00:00
Andrew Trick b6bc783060 indvars: cleanup the IV visitor. It does more than gather sext/zext info.
llvm-svn: 198353
2014-01-02 21:12:11 +00:00
Matt Arsenault 461c8e0a8c Delete unread globals through addrspacecast
llvm-svn: 198346
2014-01-02 20:01:43 +00:00
Matt Arsenault da1deabb16 Fix addrspacecast with metadata globals
llvm-svn: 198345
2014-01-02 19:53:49 +00:00
Andrew Trick 020dd898fc indvars: insert truncate at loop boundary to avoid redundant IVs.
When widening an IV to remove s/zext, we generally try to eliminate
the original narrow IV. However, LCSSA phi nodes outside the loop were
still using the original IV. Clean this up more aggressively to avoid
redundancy in generated code.

llvm-svn: 198338
2014-01-02 19:29:38 +00:00
Nico Weber 1226531099 Set LLVM_EXPORTED_SYMBOL_FILE in CMakeLists whose corresponding Makefiles do so.
(unittests/ExecutionEngine/JIT/CMakeLists.txt is still missing for now, since
it handles export files in a strange way: It generates a .exports file from a
.def file instead of the other way round.)

llvm-svn: 198183
2013-12-29 23:06:49 +00:00
Alexander Potapenko 4f0335f863 [ASan] Fix the test for __asan_gen_ globals and actually fix http://llvm.org/bugs/show_bug.cgi?id=17976
by setting the correct linkage (as stated in the bug).

llvm-svn: 198018
2013-12-25 16:46:27 +00:00
Alexander Potapenko daf96ae81b [ASan] Make sure none of the __asan_gen_ global strings end up in the symbol table, add a test.
This should fix http://llvm.org/bugs/show_bug.cgi?id=17976
Another test checking for the global variables' locations and prefixes on Darwin will be committed separately.

llvm-svn: 198017
2013-12-25 14:22:15 +00:00
Andrew Trick 0ba77a0740 Add support to indvars for optimizing sadd.with.overflow.
Split sadd.with.overflow into add + sadd.with.overflow to allow
analysis and optimization. This should ideally be done after
InstCombine, which can perform code motion (eventually indvars should
run after all canonical instcombines). We want ISEL to recombine the
add and the check, at least on x86.

This is currently under an option for reducing live induction
variables: -liv-reduce. The next step is reducing liveness of IVs that
are live out of the overflow check paths. Once the related
optimizations are fully developed, reviewed and tested, I do expect
this to become default.

llvm-svn: 197926
2013-12-23 23:31:49 +00:00
Richard Sandiford 1fb5c13e3a Fix Scalarizer insertion point when replacing PHIs with insertelements
If the Scalarizer scalarized a vector PHI but could not scalarize
all uses of it, it would insert a series of insertelements to reconstruct
the vector PHI value from the scalar ones.  The problem was that it would
emit these insertelements immediately after the PHI, even if there were
other PHIs after it.

llvm-svn: 197909
2013-12-23 14:51:56 +00:00
Richard Sandiford 3548cbb980 Fix Scalarizer handling of vector GEPs with multiple index operands
The old code only worked for one index operand.  Also handle "inbounds".

llvm-svn: 197908
2013-12-23 14:45:00 +00:00
Kostya Serebryany 530e207d8a [asan] don't unpoison redzones on function exit in use-after-return mode.
Summary:
Before this change the instrumented code before Ret instructions looked like:
  <Unpoison Frame Redzones>
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>

Now the instrumented code looks like:
  if (Frame != OriginalFrame) // I.e. Frame is fake
     <Poison Complete Frame>
  else
     <Unpoison Frame Redzones>

Reviewers: eugenis

Reviewed By: eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2458

llvm-svn: 197907
2013-12-23 14:15:08 +00:00
Kostya Serebryany ff7bde1582 [asan] produce fewer stores when poisoning stack shadow
llvm-svn: 197904
2013-12-23 09:24:36 +00:00
Justin Bogner 0ba3f211c4 Transforms: Don't create bad weights when eliminating dead cases
If we happen to eliminate every case in a switch that has branch
weights, we currently try to create metadata for the one remaining
branch, triggering an assert. Instead, we need to check that the
metadata we're trying to create is sensible.

llvm-svn: 197791
2013-12-20 08:21:30 +00:00
Kay Tiong Khoo e37d52095e Stay classy (and legal) LLVM. Remove links to 3rd party SMT solver whose links may not be permanent.
llvm-svn: 197713
2013-12-19 18:35:54 +00:00
Kay Tiong Khoo a570b5adb5 Improved fix for PR17827 (instcombine of shift/and/compare).
This change fixes the case of arithmetic shift right - do not attempt to fold that case.
This change also relaxes the conditions when attempting to fold the logical shift right and shift left cases.

No additional IR-level test cases included at this time. See http://llvm.org/bugs/show_bug.cgi?id=17827 for proofs that these are correct transformations.

llvm-svn: 197705
2013-12-19 18:07:17 +00:00
Evgeniy Stepanov a284e559d7 [dfsan] Simplify code after r197677.
llvm-svn: 197679
2013-12-19 14:37:03 +00:00
Evgeniy Stepanov a9164e9e2a Add an explicit insert point argument to SplitBlockAndInsertIfThen.
Currently SplitBlockAndInsertIfThen requires that branch condition is an
Instruction itself, which is very inconvenient, because it is sometimes an
Operator, or even a Constant.

llvm-svn: 197677
2013-12-19 13:29:56 +00:00
Arnold Schwaighofer 50b8302c55 LoopVectorizer: Don't if-convert constant expressions that can trap
A phi node operand or an instruction operand could be a constant expression that
can trap (division). Check that we don't vectorize such cases.

PR16729
radar://15653590

llvm-svn: 197449
2013-12-17 01:11:01 +00:00
Yi Jiang 6ab044ee35 Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121
llvm-svn: 197434
2013-12-16 22:42:40 +00:00
Hal Finkel f59fd7dcb4 Fix a use-after-free error in GlobalOpt CleanupConstantGlobalUsers
GlobalOpt's CleanupConstantGlobalUsers function uses a worklist array to manage
constant users to be visited. The pointers in this array need to be weak
handles because when we delete a constant array, we may also be holding a
pointer to one of its elements (or an element of one of its elements if we're
dealing with an array of arrays) in the worklist.

Fixes PR17347.

llvm-svn: 197178
2013-12-12 20:45:24 +00:00
Hal Finkel 26fc4c29c6 Initialize the barrier pass llvm::initializeIPO
The barrier pass is a temporary hack, and should go away soon. Nevertheless, if
we don't initialize it, then opt will not understand -barrier, and this will
break bugpoint (because when it dumps the passes from the default pass manager
-barrier will be there).

llvm-svn: 197177
2013-12-12 20:45:08 +00:00
Yi Jiang f92a574246 Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 197109
2013-12-12 01:55:04 +00:00
NAKAMURA Takumi 8bc9bfaa5a Prune redundant dependencies in LLVMBuild.txt.
llvm-svn: 196988
2013-12-11 00:30:57 +00:00
Reid Kleckner 30b2a9a59f [asan] Fix the coverage.cc test broken by r196939
It was failing because ASan was adding all of the following to one
function:
- dynamic alloca
- stack realignment
- inline asm

This patch avoids making the static alloca dynamic when coverage is
used.

ASan should probably not be inserting empty inline asm blobs to inhibit
duplicate tail elimination.

llvm-svn: 196973
2013-12-10 21:49:28 +00:00
NAKAMURA Takumi 396d4d3c7e Add proper dependencies to LLVMBuild.txt in llvm/lib.
I'll prune redundant deps in LLVMBuild.txt, later.

llvm-svn: 196881
2013-12-10 05:39:34 +00:00
NAKAMURA Takumi e3afe2ef62 Whitespaces.
llvm-svn: 196880
2013-12-10 05:39:12 +00:00
Justin Bogner a41a7b3ee5 Transforms: Don't create bad branch weights when folding a switch
This avoids creating branch weight metadata of length one when we fold
cases into the default of a switch instruction, which was triggering
an assert.

llvm-svn: 196845
2013-12-10 00:13:41 +00:00
Manman Ren 2e06c8c777 Revert 196544 due to internal bot failures.
llvm-svn: 196732
2013-12-08 20:28:33 +00:00
Mark Seaborn 1b3dd3527e Fix inlining to not lose the "cleanup" clause from landingpads
This fixes PR17872.  This bug can lead to C++ destructors not being
called when they should be, when an exception is thrown.

llvm-svn: 196711
2013-12-08 00:51:21 +00:00
Mark Seaborn ef3dbb93ec Fix inlining to not produce duplicate landingpad clauses
Before this change, inlining one "invoke" into an outer "invoke" call
site can lead to the outer landingpad's catch/filter clauses being
copied multiple times into the resulting landingpad.  This happens:

 * when the inlined function contains multiple "resume" instructions,
   because forwardResume() copies the clauses but is called multiple
   times;

 * when the inlined function contains a "resume" and a "call", because
   HandleCallsInBlockInlinedThroughInvoke() copies the clauses but is
   redundant with forwardResume().

Fix this by deduplicating the code.

This problem doesn't lead to any incorrect execution; it's only
untidy.

This change will make fixing PR17872 a little easier.

llvm-svn: 196710
2013-12-08 00:50:58 +00:00
Jakub Staszak 3ab283c157 Don't #include heavy Dominators.h file in LoopInfo.h. This change reduces
overall time of LLVM compilation by ~1%.

llvm-svn: 196667
2013-12-07 21:20:17 +00:00
Matt Arsenault bbf18c6958 Fix assert with copy from global through addrspacecast
llvm-svn: 196638
2013-12-07 02:58:45 +00:00
Duncan P. N. Exon Smith ce5f93efd5 Don't use isNullValue to evaluate ConstantExpr
ConstantExpr can evaluate to false even when isNullValue gives false.

Fixes PR18143.

llvm-svn: 196611
2013-12-06 21:48:36 +00:00
Kostya Serebryany 152d48d360 [asan] fix ndebug build with strict warnings (-Wunused-variable)
llvm-svn: 196574
2013-12-06 09:26:09 +00:00
Kostya Serebryany 4fb7801b3f [asan] rewrite asan's stack frame layout
Summary:
Rewrite asan's stack frame layout.
First, most of the stack layout logic is moved into a separte file
to make it more testable and (potentially) useful for other projects.
Second, make the frames more compact by using adaptive redzones
(smaller for small objects, larger for large objects).
Third, try to minimized gaps due to large alignments (this is hypothetical since
today we don't see many stack vars aligned by more than 32).

The frames indeed become more compact, but I'll still need to run more benchmarks
before committing, but I am sking for review now to get early feedback.

This change will be accompanied by a trivial change in compiler-rt tests
to match the new frame sizes.

Reviewers: samsonov, dvyukov

Reviewed By: samsonov

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D2324

llvm-svn: 196568
2013-12-06 09:00:17 +00:00
Yi Jiang 01cfa94212 Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x)
llvm-svn: 196544
2013-12-05 22:42:50 +00:00
Renato Golin 729a3ae90a Add #pragma vectorize enable/disable to LLVM
The intended behaviour is to force vectorization on the presence
of the flag (either turn on or off), and to continue the behaviour
as expected in its absence. Tests were added to make sure the all
cases are covered in opt. No tests were added in other tools with
the assumption that they should use the PassManagerBuilder in the
same way.

This patch also removes the outdated -late-vectorize flag, which was
on by default and not helping much.

The pragma metadata is being attached to the same place as other loop
metadata, but nothing forbids one from attaching it to a function
(to enable #pragma optimize) or basic blocks (to hint the basic-block
vectorizers), etc. The logic should be the same all around.

Patches to Clang to produce the metadata will be produced after the
initial implementation is agreed upon and committed. Patches to other
vectorizers (such as SLP and BB) will be added once we're happy with
the pass manager changes.

llvm-svn: 196537
2013-12-05 21:20:02 +00:00
Michael Gottesman 2bf0173b16 Change std::deque => std::vector. No functionality change.
There is no reason to use std::deque here over std::vector. Thus given the
performance differences inbetween the two it makes sense to change deque to
vector.

llvm-svn: 196524
2013-12-05 18:42:12 +00:00
Rafael Espindola cdbde3aacc Fix non-deterministic behavior.
We use CSEBlocks to initialize a worklist:

SmallVector<BasicBlock *, 8> CSEWorkList(CSEBlocks.begin(), CSEBlocks.end());

so it must have a deterministic order.

llvm-svn: 196520
2013-12-05 18:28:01 +00:00
Arnold Schwaighofer 7ee53cac80 SLPVectorizer: An in-tree vectorized entry cannot also be a scalar external use
We were creating external uses for scalar values in MustGather entries that also
had a ScalarToTreeEntry (they also are present in a vectorized tuple). This
meant we would keep a value 'alive' as a scalar and vectorized causing havoc.
This is not necessary because when we create a MustGather vector we explicitly
create external uses entries for the insertelement instructions of the
MustGather vector elements.

Fixes PR18129.

radar://15582184

llvm-svn: 196508
2013-12-05 15:14:40 +00:00
Kostya Serebryany 2460c3fc73 [tsan] fix PR18146: sometimes a variable written into vptr could have an integer type (after other optimizations)
llvm-svn: 196507
2013-12-05 15:03:02 +00:00
Alp Toker f907b891da Correct word hyphenations
This patch tries to avoid unrelated changes other than fixing a few
hyphen-related ambiguities and contractions in nearby lines.

llvm-svn: 196471
2013-12-05 05:44:44 +00:00
Yuchen Wu c15bf89122 llvm-cov: Replace size() with empty() in bool check.
llvm-svn: 196400
2013-12-04 19:18:23 +00:00
Daniel Jasper 87a24d5c27 Un-revert r196358: "llvm-cov: Added support for function checksums."
And add the proper fix.

llvm-svn: 196367
2013-12-04 08:57:17 +00:00
Daniel Jasper c176b5d1d6 Revert r196358: "llvm-cov: Added support for function checksums."
This currently breaks clang/test/CodeGen/code-coverage.c. The root cause
is that the newly introduced access to Funcs[j] is out of bounds.

llvm-svn: 196365
2013-12-04 08:23:33 +00:00
Yuchen Wu 06655f3570 llvm-cov: Added support for function checksums.
The function checksums are hashed from the concatenation of the function
name and line number.

llvm-svn: 196358
2013-12-04 06:00:17 +00:00
Yunzhong Gao 9163e8bce6 Teach the internalize pass to skip dllexported symbols because they could be
referenced in a way that even the linker does not see.

Differential Revision: http://llvm-reviews.chandlerc.com/D2280

llvm-svn: 196300
2013-12-03 18:05:14 +00:00
Kay Tiong Khoo d7b00cac10 Use local variable for repeated use rather than 'get' method. No functional change intended.
llvm-svn: 196164
2013-12-02 22:23:32 +00:00
Kay Tiong Khoo 64b732005f Move variables to where they are used and give them better names. No functional change intended.
llvm-svn: 196163
2013-12-02 22:20:40 +00:00
Kay Tiong Khoo 564560f911 Rename variables to be consistent (CST -> Cst). No functional change intended.
llvm-svn: 196161
2013-12-02 22:11:56 +00:00
Mark Seaborn d91fa22b06 InlineFunction.cpp: Remove a return value that is always false
Remove some associated dead code.

This cleanup is associated with PR17872.

llvm-svn: 196147
2013-12-02 20:50:59 +00:00
Kay Tiong Khoo 5389f74655 Conservative fix for PR17827 - don't optimize a shift + and + compare sequence where the shift is logical unless the comparison is unsigned
llvm-svn: 196129
2013-12-02 18:43:59 +00:00
Kostya Serebryany 08b9cf56be [tsan] fix instrumentation of vector vptr updates (https://code.google.com/p/thread-sanitizer/issues/detail?id=43)
llvm-svn: 196079
2013-12-02 08:07:15 +00:00
Bill Wendling cbcb02c35a Use accessor methods instead.
llvm-svn: 196006
2013-12-01 03:40:42 +00:00
Bill Wendling 2798f1ef58 Use 'unsigned char' to get this past gcc error message:
error: invalid conversion from 'unsigned char' to '{anonymous}::Sequence'

llvm-svn: 196004
2013-12-01 03:36:07 +00:00
Stephen Canon c454964c47 Rein in overzealous InstCombine of fptrunc(OP(fpextend, fpextend)).
llvm-svn: 195934
2013-11-28 21:38:05 +00:00
Nadav Rotem b0082d246a PR1860 - We can't save a list of ExtractElement instructions to CSE because some of these instructions
may be removed and optimized in future iterations. Instead we save a list of basic blocks that we need to CSE.

llvm-svn: 195791
2013-11-26 22:24:25 +00:00
Arnold Schwaighofer a2c8e008d2 LoopVectorizer: Truncate i64 trip counts of i32 phis if necessary
In signed arithmetic we could end up with an i64 trip count for an i32 phi.
Because it is signed arithmetic we know that this is only defined if the i32
does not wrap. It is therefore safe to truncate the i64 trip count to a i32
value.

Fixes PR18049.

llvm-svn: 195787
2013-11-26 22:11:23 +00:00
Diego Novillo c0dd1037c8 Refactor some code in SampleProfile.cpp
I'm adding new functionality in the sample profiler. This will
require more data to be kept around for each function, so I moved
the structure SampleProfile that we keep for each function into
a separate class.

There are no functional changes in this patch. It simply provides
a new home where to place all the new data that I need to propagate
weights through edges.

There are some other name and minor edits throughout.

llvm-svn: 195780
2013-11-26 20:37:33 +00:00
Nadav Rotem f9f8482e3a PR18060 - When we RAUW values with ExtractElement instructions in some cases
we generate PHI nodes with multiple entries from the same basic block but
with different values. Enabling CSE on ExtractElement instructions make sure
that all of the RAUWed instructions are the same.

llvm-svn: 195773
2013-11-26 17:29:19 +00:00
Stepan Dyatkovskiy abb8505dc5 PR17925 bugfix.
Short description.

This issue is about case of treating pointers as integers.
We treat pointers as different if they references different address space.
At the same time, we treat pointers equal to integers (with machine address
width). It was a point of false-positive. Consider next case on 32bit machine:

void foo0(i32 addrespace(1)* %p)
void foo1(i32 addrespace(2)* %p)
void foo2(i32 %p)

foo0 != foo1, while
foo1 == foo2 and foo0 == foo2.

As you can see it breaks transitivity. That means that result depends on order
of how functions are presented in module. Next order causes merging of foo0
and foo1: foo2, foo0, foo1
First foo0 will be merged with foo2, foo0 will be erased. Second foo1 will be
merged with foo2.
Depending on order, things could be merged we don't expect to.

The fix:
Forbid to treat any pointer as integer, except for those, who belong to address space 0.

llvm-svn: 195769
2013-11-26 16:11:03 +00:00
Chandler Carruth 6378cf539f [PM] Split the CallGraph out from the ModulePass which creates the
CallGraph.

This makes the CallGraph a totally generic analysis object that is the
container for the graph data structure and the primary interface for
querying and manipulating it. The pass logic is separated into its own
class. For compatibility reasons, the pass provides wrapper methods for
most of the methods on CallGraph -- they all just forward.

This will allow the new pass manager infrastructure to provide its own
analysis pass that constructs the same CallGraph object and makes it
available. The idea is that in the new pass manager, the analysis pass's
'run' method returns a concrete analysis 'result'. Here, that result is
a 'CallGraph'. The 'run' method will typically do only minimal work,
deferring much of the work into the implementation of the result object
in order to be lazy about computing things, but when (like DomTree)
there is *some* up-front computation, the analysis does it prior to
handing the result back to the querying pass.

I know some of this is fairly ugly. I'm happy to change it around if
folks can suggest a cleaner interim state, but there is going to be some
amount of unavoidable ugliness during the transition period. The good
thing is that this is very limited and will naturally go away when the
old pass infrastructure goes away. It won't hang around to bother us
later.

Next up is the initial new-PM-style call graph analysis. =]

llvm-svn: 195722
2013-11-26 04:19:30 +00:00
Chandler Carruth 57458517ef Migrate metadata information from scalar to vector instructions during
SLP vectorization. Based on the code in BBVectorizer.

Fixes PR17741.

Patch by Raul Silvera, reviewed by Hal and Nadav. Reformatted by my
driving of clang-format. =]

llvm-svn: 195528
2013-11-23 00:48:34 +00:00
Yuchen Wu c87ca32163 llvm-cov: Split entry blocks in GCNOProfiling.cpp.
gcov expects every function to contain an entry block that
unconditionally branches into the next block. clang does not implement
basic blocks in this manner, so gcov did not output correct branch info
if the entry block branched to multiple blocks.

This change splits every function's entry block into an empty block and
a block with the rest of the instructions. The instrumentation code will
take care of the rest.

llvm-svn: 195513
2013-11-22 23:07:45 +00:00
Manman Ren cb14bbcc48 Debug Info: move StripDebugInfo from StripSymbols.cpp to DebugInfo.cpp.
We can share the implementation between StripSymbols and dropping debug info
for metadata versions that do not match.

Also update the comments to match the implementation. A follow-on patch will
drop the "Debug Info Version" module flag in StripDebugInfo.

llvm-svn: 195505
2013-11-22 22:06:31 +00:00
Matt Arsenault 6ea0aade26 StructurizeCFG: Fix verification failure with some loops.
If the beginning of the loop was also the entry block
of the function, branches were inserted to the entry block
which isn't allowed. If this occurs, create a new dummy
function entry block that branches to the start of the loop.

llvm-svn: 195493
2013-11-22 19:24:39 +00:00
Matt Arsenault 9fb6e0ba58 StructurizeCFG: Fix inverting a branch on an argument
llvm-svn: 195492
2013-11-22 19:24:37 +00:00
Rafael Espindola 6597992c69 Add a fixed version of r195470 back.
The fix is simply to use CurI instead of I when handling aliases to
avoid accessing a invalid iterator.

original message:

Convert linkonce* to weak* instead of strong.

Also refactor the logic into a helper function. This is an important improve
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195477
2013-11-22 17:58:12 +00:00
Rafael Espindola 77aa674cc4 Revert "Convert linkonce* to weak* instead of strong."
This reverts commit r195470.
Debugging failure in some bots.

llvm-svn: 195472
2013-11-22 17:09:34 +00:00
Richard Sandiford 8ee1b77de3 Add a Scalarizer pass.
llvm-svn: 195471
2013-11-22 16:58:05 +00:00
Rafael Espindola 5574032575 Convert linkonce* to weak* instead of strong.
Also refactor the logic into a helper function. This is an important improvement
on mingw where the linker complains about mixed weak and strong symbols.
Converting to weak ensures that the symbol is not dropped, but keeps in a
comdat, making the linker happy.

llvm-svn: 195470
2013-11-22 16:14:30 +00:00
Arnold Schwaighofer 1756e1ea92 SLPVectorizer: Fix whitespace errors.
llvm-svn: 195468
2013-11-22 15:47:17 +00:00
Yi Jiang 79a2b0a6d1 SLP Vectorizer: Extract cost will only be added once even if the scalar has multiple external uses.
llvm-svn: 195406
2013-11-22 01:57:02 +00:00
Peter Collingbourne 0be79e1ade Introduce two command-line flags for the instrumentation pass to control whether the labels of pointers should be ignored in load and store instructions
The new command line flags are -dfsan-ignore-pointer-label-on-store and -dfsan-ignore-pointer-label-on-load. Their default value matches the current labelling scheme.

Additionally, the function __dfsan_union_load is marked as readonly.

Patch by Lorenzo Martignoni!

Differential Revision: http://llvm-reviews.chandlerc.com/D2187

llvm-svn: 195382
2013-11-21 23:20:54 +00:00
Evgeniy Stepanov cb5bdffc4e [msan] Propagate condition origin in select instruction.
llvm-svn: 195349
2013-11-21 12:00:24 +00:00
Yuchen Wu 2a9d96992d llvm-cov: Don't assume FileChecksum was generated.
For cases where emitProfileArcs() was called but emitProfileNotes() was
not, set the CfgChecksum to 0.

llvm-svn: 195311
2013-11-21 04:53:39 +00:00
Yuchen Wu 664dc7678b llvm-cov: Fixed some bugs related to file checksum.
Added call to update CfgChecksum. Made FileChecksum a vector, separate
for each source file.

llvm-svn: 195309
2013-11-21 04:01:05 +00:00
Yuchen Wu babe749125 llvm-cov: Added file checksum to gcno and gcda files.
Instead of permanently outputting "MVLL" as the file checksum, clang
will create gcno and gcda checksums by hashing the destination block
numbers of every arc. This allows for llvm-cov to check if the two gcov
files are synchronized.

Regenerated the test files so they contain the checksum. Also added
negative test to ensure error when the checksums don't match.

llvm-svn: 195191
2013-11-20 04:15:05 +00:00
Arnold Schwaighofer 8bc4a0ba14 SLPVectorizer: Fix stale for Value pointer array
We are slicing an array of Value pointers and process those slices in a loop.
The problem is that we might invalidate a later slice by vectorizing a former
slice.

Use a WeakVH to track the pointer. If the pointer is deleted or RAUW'ed we can
tell.

The test case will only fail when running with libgmalloc.

radar://15498655

llvm-svn: 195162
2013-11-19 22:20:20 +00:00
Arnold Schwaighofer 5f7c48ebff SLPVectorizer: Fix whitespace errors
llvm-svn: 195161
2013-11-19 22:20:18 +00:00
Chandler Carruth a126200665 Fix an issue where SROA computed different results based on the relative
order of slices of the alloca which have exactly the same size and other
properties. This was found by a perniciously unstable sort
implementation used to flush out buggy uses of the algorithm.

The fundamental idea is that findCommonType should return the best
common type it can find across all of the slices in the range. There
were two bugs here previously:

1) We would accept an integer type smaller than a byte-width multiple,
   and if there were different bit-width integer types, we would accept
   the first one. This caused an actual failure in the testcase updated
   here when the sort order changed.
2) If we found a bad combination of types or a non-load, non-store use
   before an integer typed load or store we would bail, but if we found
   the integere typed load or store, we would use it. The correct
   behavior is to always use an integer typed operation which covers the
   partition if one exists.

While a clever debugging sort algorithm found problem #1 in our existing
test cases, I have no useful test case ideas for #2. I spotted in by
inspection when looking at this code.

llvm-svn: 195118
2013-11-19 09:03:18 +00:00
Michael Ilseman d930c19d20 Add support for software expansion of 64-bit integer division instructions.
Patch by Dmitri Shtilman!

llvm-svn: 195116
2013-11-19 06:54:19 +00:00
Adrian Prantl 8e10fdbc0f Debug info: Let LowerDbgDeclare perfom the dbg.declare -> dbg.value
lowering only for load/stores to scalar allocas. The resulting values
confuse the backend and don't add anything because we can describe
array-allocas with a dbg.declare intrinsic just fine.

rdar://problem/15464571

llvm-svn: 195052
2013-11-18 23:04:38 +00:00
Alexey Samsonov a788b940f7 [ASan] Fix PR17867 - make sure ASan doesn't crash if use-after-scope and use-after-return are combined.
llvm-svn: 195014
2013-11-18 14:53:55 +00:00
Arnold Schwaighofer b72cb4ec49 LoopVectorizer: Extend the induction variable to a larger type
In some case the loop exit count computation can overflow. Extend the type to
prevent most of those cases.

The problem is loops like:
int main ()
{
  int a = 1;
  char b = 0;
  lbl:
    a &= 4;
    b--;
    if (b) goto lbl;
  return a;
}

The backedge count is 255. The induction variable type is i8. If we add one to
255 to get the exit count we overflow to zero.

To work around this issue we extend the type of the induction variable to i32 in
the case of i8 and i16.

PR17532

llvm-svn: 195008
2013-11-18 13:14:32 +00:00
NAKAMURA Takumi f9c8339a4e Utils/LoopUnroll.cpp: Tweak (StringRef)OldName to be valid until it is used, since r194601.
eraseFromParent() invalidates OldName.

llvm-svn: 194970
2013-11-17 18:05:34 +00:00
Hal Finkel 29aeb20518 Add a loop rerolling flag to the PassManagerBuilder
This adds a boolean member variable to the PassManagerBuilder to control loop
rerolling (just like we have for unrolling and the various vectorization
options). This is necessary for control by the frontend. Loop rerolling remains
disabled by default at all optimization levels.

llvm-svn: 194966
2013-11-17 16:02:50 +00:00
Hal Finkel 66cd3f1ba3 Add the cold attribute to error-reporting call sites
Generally speaking, control flow paths with error reporting calls are cold.
So far, error reporting calls are calls to perror and calls to fprintf,
fwrite, etc. with stderr as the stream. This can be extended in the future.

The primary motivation is to improve block placement (the cold attribute
affects the static branch prediction heuristics).

llvm-svn: 194943
2013-11-17 02:06:35 +00:00
Hal Finkel 67107ea1af Fix ndebug-build unused variable in loop rerolling
llvm-svn: 194941
2013-11-17 01:21:54 +00:00
Hal Finkel bf45efde2d Add a loop rerolling pass
This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The
transformation aims to take loops like this:

for (int i = 0; i < 3200; i += 5) {
  a[i]     += alpha * b[i];
  a[i + 1] += alpha * b[i + 1];
  a[i + 2] += alpha * b[i + 2];
  a[i + 3] += alpha * b[i + 3];
  a[i + 4] += alpha * b[i + 4];
}

and turn them into this:

for (int i = 0; i < 3200; ++i) {
  a[i] += alpha * b[i];
}

and loops like this:

for (int i = 0; i < 500; ++i) {
  x[3*i] = foo(0);
  x[3*i+1] = foo(0);
  x[3*i+2] = foo(0);
}

and turn them into this:

for (int i = 0; i < 1500; ++i) {
  x[i] = foo(0);
}

There are two motivations for this transformation:

  1. Code-size reduction (especially relevant, obviously, when compiling for
code size).

  2. Providing greater choice to the loop vectorizer (and generic unroller) to
choose the unrolling factor (and a better ability to vectorize). The loop
vectorizer can take vector lengths and register pressure into account when
choosing an unrolling factor, for example, and a pre-unrolled loop limits that
choice. This is especially problematic if the manual unrolling was optimized
for a machine different from the current target.

The current implementation is limited to single basic-block loops only. The
rerolling recognition should work regardless of how the loop iterations are
intermixed within the loop body (subject to dependency and side-effect
constraints), but the significant restriction is that the order of the
instructions in each iteration must be identical. This seems sufficient to
capture all current use cases.

This pass is not currently enabled by default at any optimization level.

llvm-svn: 194939
2013-11-16 23:59:05 +00:00
Hal Finkel 12100bf7e8 Apply the InstCombine fptrunc sqrt optimization to llvm.sqrt
InstCombine, in visitFPTrunc, applies the following optimization to sqrt calls:

  (fptrunc (sqrt (fpext x))) -> (sqrtf x)

but does not apply the same optimization to llvm.sqrt. This is a problem
because, to enable vectorization, Clang generates llvm.sqrt instead of sqrt in
fast-math mode, and because this optimization is being applied to sqrt and not
applied to llvm.sqrt, sometimes the fast-math code is slower.

This change makes InstCombine apply this optimization to llvm.sqrt as well.

This fixes the specific problem in PR17758, although the same underlying issue
(optimizations applied to libcalls are not applied to intrinsics) exists for
other optimizations in SimplifyLibCalls.

llvm-svn: 194935
2013-11-16 21:29:08 +00:00
Benjamin Kramer 03f3e248eb InstCombine: fold (A >> C) == (B >> C) --> (A^B) < (1 << C) for constant Cs.
This is common in bitfield code.

llvm-svn: 194925
2013-11-16 16:00:48 +00:00
Arnold Schwaighofer dbb7b87d7a LoopVectorizer: Use abi alignment for accesses with no alignment
When we vectorize a scalar access with no alignment specified, we have to set
the target's abi alignment of the scalar access on the vectorized access.
Using the same alignment of zero would be wrong because most targets will have a
bigger abi alignment for vector types.

This probably fixes PR17878.

llvm-svn: 194876
2013-11-15 23:09:33 +00:00
Manman Ren bc37658a7f ArgumentPromotion: correctly transfer TBAA tags and alignments.
We used to use std::map<IndicesVector, LoadInst*> for OriginalLoads, and when we
try to promote two arguments, they will both write to OriginalLoads causing
created loads for the two arguments to have the same original load. And the same
tbaa tag and alignment will be put to the created loads for the two arguments.

The fix is to use std::map<std::pair<Argument*, IndicesVector>, LoadInst*>
for OriginalLoads, so each Argument will write to different parts of the map.

PR17906

llvm-svn: 194846
2013-11-15 20:41:15 +00:00
Kostya Serebryany 0604c62d7b [asan] use GlobalValue::PrivateLinkage for coverage guard to save quite a bit of code size
llvm-svn: 194800
2013-11-15 09:52:05 +00:00
Bob Wilson da4147c743 Reapply "[asan] Poor man's coverage that works with ASan"
I was able to successfully run a bootstrapped LTO build of clang with
r194701, so this change does not seem to be the cause of our failing
buildbots.

llvm-svn: 194789
2013-11-15 07:16:09 +00:00
Matt Arsenault a9e95abcbf Add instcombine visitor for addrspacecast
llvm-svn: 194786
2013-11-15 05:45:08 +00:00
Bob Wilson ae73587c4b Revert "[asan] Poor man's coverage that works with ASan"
This reverts commit 194701. Apple's bootstrapped LTO builds have been failing,
and this change (along with compiler-rt 194702-194704) is the only thing on
the blamelist.  I will either reappy these changes or help debug the problem,
depending on whether this fixes the buildbots.

llvm-svn: 194780
2013-11-15 03:28:22 +00:00
Kostya Serebryany 6da3f74061 [asan] Poor man's coverage that works with ASan
llvm-svn: 194701
2013-11-14 13:27:41 +00:00
Evgeniy Stepanov 585813e33d [msan] Fast path optimization for wrap-indirect-calls feature of MemorySanitizer.
Indirect call wrapping helps MSanDR (dynamic instrumentation companion tool
for MSan) to catch all cases where execution leaves a compiler-instrumented
module by allowing the tool to rewrite targets of indirect calls.

This change is an optimization that skips wrapping for calls when target is
inside the current module. This relies on the linker providing symbols at the
begin and end of the module code (or code + data, does not really matter).
Gold linker provides such symbols by default. GNU (BFD) linker needs a link
flag: -Wl,--defsym=__executable_start=0.

More info:
https://code.google.com/p/memory-sanitizer/wiki/MSanDR#Native_exec

llvm-svn: 194697
2013-11-14 12:29:04 +00:00
Jakub Staszak 86a7492f0d Use StringRef instead of std::string
llvm-svn: 194601
2013-11-13 20:09:11 +00:00
Alexey Samsonov aa19c0a1c3 Fix -Wdelete-non-virtual-dtor warnings by making SampleProfile methods non-virtual
llvm-svn: 194568
2013-11-13 13:09:39 +00:00
Diego Novillo 8d6568b56b SampleProfileLoader pass. Initial setup.
This adds a new scalar pass that reads a file with samples generated
by 'perf' during runtime. The samples read from the profile are
incorporated and emmited as IR metadata reflecting that profile.

The profile file is assumed to have been generated by an external
profile source. The profile information is converted into IR metadata,
which is later used by the analysis routines to estimate block
frequencies, edge weights and other related data.

External profile information files have no fixed format, each profiler
is free to define its own. This includes both the on-disk representation
of the profile and the kind of profile information stored in the file.
A common kind of profile is based on sampling (e.g., perf), which
essentially counts how many times each line of the program has been
executed during the run.

The SampleProfileLoader pass is organized as a scalar transformation.
On startup, it reads the file given in -sample-profile-file to
determine what kind of profile it contains.  This file is assumed to
contain profile information for the whole application. The profile
data in the file is read and incorporated into the internal state of
the corresponding profiler.

To facilitate testing, I've organized the profilers to support two file
formats: text and native. The native format is whatever on-disk
representation the profiler wants to support, I think this will mostly
be bitcode files, but it could be anything the profiler wants to
support. To do this, every profiler must implement the
SampleProfile::loadNative() function.

The text format is mostly meant for debugging. Records are separated by
newlines, but each profiler is free to interpret records as it sees fit.
Profilers must implement the SampleProfile::loadText() function.

Finally, the pass will call SampleProfile::emitAnnotations() for each
function in the current translation unit. This function needs to
translate the loaded profile into IR metadata, which the analyzer will
later be able to use.

This patch implements the first steps towards the above design. I've
implemented a sample-based flat profiler. The format of the profile is
fairly simplistic. Each sampled function contains a list of relative
line locations (from the start of the function) together with a count
representing how many samples were collected at that line during
execution. I generate this profile using perf and a separate converter
tool.

Currently, I have only implemented a text format for these profiles. I
am interested in initial feedback to the whole approach before I send
the other parts of the implementation for review.

This patch implements:

- The SampleProfileLoader pass.
- The base ExternalProfile class with the core interface.
- A SampleProfile sub-class using the above interface. The profiler
  generates branch weight metadata on every branch instructions that
  matches the profiles.
- A text loader class to assist the implementation of
  SampleProfile::loadText().
- Basic unit tests for the pass.

Additionally, the patch uses profile information to compute branch
weights based on instruction samples.

This patch converts instruction samples into branch weights. It
does a fairly simplistic conversion:

Given a multi-way branch instruction, it calculates the weight of
each branch based on the maximum sample count gathered from each
target basic block.

Note that this assignment of branch weights is somewhat lossy and can be
misleading. If a basic block has more than one incoming branch, all the
incoming branches will get the same weight. In reality, it may be that
only one of them is the most heavily taken branch.

I will adjust this assignment in subsequent patches.

llvm-svn: 194566
2013-11-13 12:22:21 +00:00
Nadav Rotem ea186b9515 Update the docs to match the function name.
llvm-svn: 194537
2013-11-13 01:12:01 +00:00
Nadav Rotem 0ed2fdb5af Fold (iszero(A&K1) | iszero(A&K2)) -> (A&(K1|K2)) != (K1|K2) if we know that K1 and K2 are 'one-hot' (only one bit is on).
llvm-svn: 194525
2013-11-12 22:38:59 +00:00
Nadav Rotem 53d32211b7 FoldBranchToCommonDest merges branches into a single branch with or/and of the condition. It has a heuristics for estimating when some of the dependencies are processed by out-of-order processors. This patch adds another rule to the heuristics that says that if the "BonusInstruction" that we speculatively execute is used by the condition of the second branch then it is okay to hoist it. This change exposes more opportunities for other passes to transform the code. It does not matter that much that we if-convert the code because the selectiondag builder splits or/and branches into multiple branches when profitable.
llvm-svn: 194524
2013-11-12 22:37:16 +00:00
Rafael Espindola dd8757abbc Corruptly merge constants with explicit and implicit alignments.
Constant merge can merge a constant with implicit alignment with one that has
explicit alignment. Before this change it was assuming that the explicit
alignment was higher than the implicit one, causing the result to be under
aligned in some cases.

Fixes pr17815.

Patch by Chris Smowton!

llvm-svn: 194506
2013-11-12 20:21:43 +00:00
Benjamin Kramer 7c30260ab3 SimplifyCFG: Use existing constant folding logic when forming switch tables.
Both simpler and more powerful than the hand-rolled folding logic.

llvm-svn: 194475
2013-11-12 12:24:36 +00:00
Shuxin Yang f1ec34bdfd Correct a glitch in r194424 which may invalidate iterator.
llvm-svn: 194457
2013-11-12 08:33:03 +00:00
Yuchen Wu 062f24c973 llvm-cov: Added call to update run/program counts.
Also updated test files that were generated from this change.

llvm-svn: 194453
2013-11-12 04:59:08 +00:00
Shuxin Yang 3168ab3376 Fix PR17952.
The symptom is that an assertion is triggered. The assertion was added by
me to detect the situation when value is propagated from dead blocks.
(We can certainly get rid of assertion; it is safe to do so, because propagating
 value from dead block to alive join node is certainly ok.)

  The root cause of this bug is : edge-splitting is conducted on the fly,
the edge being split could be a dead edge, therefore the block that 
split the critial edge needs to be flagged "dead" as well.

  There are 3 ways to fix this bug:
  1) Get rid of the assertion as I mentioned eariler 
  2) When an dead edge is split, flag the inserted block "dead".
  3) proactively split the critical edges connecting dead and live blocks when
     new dead blocks are revealed.

  This fix go for 3) with additional 2 LOC.

  Testing case was added by Rafael the other day.

llvm-svn: 194424
2013-11-11 22:00:23 +00:00
Renato Golin 3f67a7de36 Move debug message in vectorizer
No functional change, just better reporting.

llvm-svn: 194388
2013-11-11 16:27:35 +00:00
Evgeniy Stepanov 560e089355 [msan] Propagate origin for insertvalue, extractvalue.
llvm-svn: 194374
2013-11-11 13:37:10 +00:00
Bill Wendling fed6c220ec Revert "Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308."
This causes PR17852.

This reverts commit d93e8a06b2ca09ab18f390cd514b7443e2e571f7.

Conflicts:
	test/Transforms/GVN/cond_br2.ll

llvm-svn: 194348
2013-11-10 07:34:34 +00:00
Matt Arsenault c900303e2f Use type form of getIntPtrType.
This should be inconsequential and is work
towards removing the default address space
arguments.

llvm-svn: 194347
2013-11-10 04:46:57 +00:00
Nadav Rotem 5ba1c6ced8 SimplifyCFG has a heuristics for out-of-order processors that decides when it is worthwhile to merge branches. It tries to estimate if the operands of the instruction that we want to hoist are ready. This commit marks function arguments as 'ready' because they require no calculation. This boosts libquantum and a few other workloads from the testsuite.
llvm-svn: 194346
2013-11-10 04:13:31 +00:00
Matt Arsenault 5bcefabcda Teach MergeFunctions about address spaces
llvm-svn: 194342
2013-11-10 01:44:37 +00:00
Hal Finkel 1a642aef37 Remove dead code from LoopUnswitch
LoopUnswitch's code simplification routine has logic to convert conditional
branches into unconditional branches, after unswitching makes the condition
constant, and then remove any blocks that renders dead. Unfortunately, this
code is dead, currently broken, and furthermore, has never been alive (at least
as far back at 2006).

No functionality change intended.

llvm-svn: 194277
2013-11-08 19:58:21 +00:00
Michael Gottesman 24b2f6fdda [objc-arc] Convert the one directional retain/release relation assert to a conditional check + fail.
Due to the previously added overflow checks, we can have a retain/release
relation that is one directional. This occurs specifically when we run into an
additive overflow causing us to drop state in only one direction. If that
occurs, we should bail and not optimize that retain/release instead of
asserting.

Apologies for the size of the testcase. It is necessary to cause the additive
cfg overflow to trigger.

rdar://15377890

llvm-svn: 194083
2013-11-05 16:02:40 +00:00
Hal Finkel 081eaef6fa Add a runtime unrolling parameter to the LoopUnroll pass constructor
As with the other loop unrolling parameters (the unrolling threshold, partial
unrolling, etc.) runtime unrolling can now also be controlled via the
constructor. This will be necessary for moving non-trivial unrolling late in
the pass manager (after loop vectorization).

No functionality change intended.

llvm-svn: 194027
2013-11-05 00:08:03 +00:00
Shuxin Yang d1382b6c31 Remove dead code
llvm-svn: 194017
2013-11-04 21:44:01 +00:00
Benjamin Kramer 9e7f7c7fdb SLPVectorizer: Use properlyDominates to satisfy the irreflexivity of a strict weak ordering.
STL debug mode checks this.

llvm-svn: 194015
2013-11-04 21:34:55 +00:00
Matt Arsenault 243140f2fd Scalarize select vector arguments when extracted.
When the elements are extracted from a select on vectors
or a vector select, do the select on the extracted scalars
from the input if there is only one use.

llvm-svn: 194013
2013-11-04 20:36:06 +00:00
Benjamin Kramer 191ba00b83 SLPVectorizer: Add a missing pair of parens. No functionality change.
llvm-svn: 193958
2013-11-03 12:54:32 +00:00
Benjamin Kramer 91e8f3c348 SLPVectorizer: When CSEing generated gathers only scan blocks containing them.
Instead of doing a RPO traversal of the whole function remember the blocks
containing gathers (typically <= 2) and scan them in dominator-first order.

The actual CSE is still quadratic, but I'm not confident that adding a
scoped hash table here is worth it as we're only looking at the generated
instructions and not arbitrary code.

llvm-svn: 193956
2013-11-03 12:27:52 +00:00
David Majnemer 120f4a06fd Revert "Inliner: Handle readonly attribute per argument when adding memcpy"
This reverts commit r193356, it caused PR17781.

A reduced test case covering this regression has been added to the test suite.

llvm-svn: 193955
2013-11-03 12:22:13 +00:00
David Majnemer 927df85de0 Spell "Actual" correctly
llvm-svn: 193954
2013-11-03 11:09:39 +00:00
Bob Wilson d8d92d90fa Convert calls to __sinpi and __cospi into __sincospi_stret
This adds an SimplifyLibCalls case which converts the special __sinpi and
__cospi (float & double variants) into a __sincospi_stret where appropriate to
remove duplicated work.

Patch by Tim Northover

llvm-svn: 193943
2013-11-03 06:48:38 +00:00
Benjamin Kramer 089c1e4f6d SLPVectorizer: Remove duplicated function.
llvm-svn: 193927
2013-11-02 14:46:27 +00:00
Benjamin Kramer 568a1cd9df LoopVectorize: Remove quadratic behavior the local CSE.
Doing this with a hash map doesn't change behavior and avoids calling
isIdenticalTo O(n^2) times. This should probably eventually move into a utility
class shared with EarlyCSE and the limited CSE in the SLPVectorizer.

llvm-svn: 193926
2013-11-02 13:39:00 +00:00
Arnold Schwaighofer d0789cdffe LoopVectorizer: Move cse code into its own function
llvm-svn: 193895
2013-11-01 23:28:54 +00:00
Arnold Schwaighofer a846a7f8f0 LoopVectorizer: Perform redundancy elimination on induction variables
When the loop vectorizer was part of the SCC inliner pass manager gvn would
run after the loop vectorizer followed by instcombine. This way redundancy
(multiple uses) were removed and instcombine could perform scalarization on the
induction variables. Having moved the loop vectorizer to later we no longer run
any form of redundancy elimination before we perform instcombine. This caused
vectorized induction variables to survive that did not before.

On a recent iMac this helps linpack back from 6000Mflops to 7000Mflops.

This should also help lpbench and paq8p.

I ran a Release (without Asserts) build over the test-suite and did not see any
negative impact on compile time.

radar://15339680

llvm-svn: 193891
2013-11-01 22:18:19 +00:00
Benjamin Kramer 1fbcdca9e3 LoopVectorize: Look for consecutive acces in GEPs with trailing zero indices
If we have a pointer to a single-element struct we can still build wide loads
and stores to it (if there is no padding).

llvm-svn: 193860
2013-11-01 14:09:50 +00:00
Arnold Schwaighofer 70a4665f55 LoopVectorizer: If dependency checks fail try runtime checks
When a dependence check fails we can still try to vectorize loops with runtime
array bounds checks.

This helps linpack to vectorize a loop in dgefa. And we are back to 2x of the
scalar performance on a corei7-avx.

radar://15339680

llvm-svn: 193853
2013-11-01 03:05:07 +00:00
Arnold Schwaighofer 1ca922e296 LoopVectorizer: Clear all member data structures in RuntimeCheck.reset()
Clear all data structures when resetting the RuntimeCheck data structure.

No test case. This was exposed by an upcomming change.

llvm-svn: 193852
2013-11-01 03:05:04 +00:00
Manman Ren 87a2adc7fe Do not convert "call asm" to "invoke asm" in Inliner.
Given that backend does not handle "invoke asm" correctly ("invoke asm" will be
handled by SelectionDAGBuilder::visitInlineAsm, which does not have the right
setup for LPadToCallSiteMap) and we already made the assumption that inline asm
does not throw in InstCombiner::visitCallSite, we are going to make the same
assumption in Inliner to make sure we don't convert "call asm" to "invoke asm".

If it becomes necessary to add support for "invoke asm" later on, we will need
to modify the backend as well as remove the assumptions that inline asm does
not throw.

Fix rdar://15317907

llvm-svn: 193808
2013-10-31 21:56:03 +00:00
Rafael Espindola 282a47037b Use LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN instead of the "dso list".
There are two ways one could implement hiding of linkonce_odr symbols in LTO:
* LLVM tells the linker which symbols can be hidden if not used from native
  files.
* The linker tells LLVM which symbols are not used from other object files,
  but will be put in the dso symbol table if present.

GOLD's API is the second option. It was implemented almost 1:1 in llvm by
passing the list down to internalize.

LLVM already had partial support for the first option. It is also very similar
to how ld64 handles hiding these symbols when *not* doing LTO.

This patch then
* removes the APIs for the DSO list.
* marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr
  global values and other linkonce_odr whose address is not used.
* makes the gold plugin responsible for handling the API mismatch.

llvm-svn: 193800
2013-10-31 20:51:58 +00:00
Rafael Espindola 6554e5a94d Merge CallGraph and BasicCallGraph.
llvm-svn: 193734
2013-10-31 03:03:55 +00:00
Matt Arsenault 38b8ecf378 Teach scalarrepl about address spaces
llvm-svn: 193720
2013-10-30 22:54:58 +00:00
Matt Arsenault 614ea99da7 Fix GVN creating bitcast between address spaces
llvm-svn: 193710
2013-10-30 19:05:41 +00:00
Arnold Schwaighofer 77af0f6e82 ARM cost model: Account for zero cost scalar SROA instructions
By vectorizing a series of srl, or, ... instructions we have obfuscated the
intention so much that the backend does not know how to fold this code away.

radar://15336950

llvm-svn: 193573
2013-10-29 01:33:53 +00:00
Arnold Schwaighofer 86252451c4 SLPVectorizer: Use vector type for vectorized memory operations
No test case, because with the current cost model we don't see a difference.
An upcoming ARM memory cost model change will expose and test this bug.

radar://15332579

llvm-svn: 193572
2013-10-29 01:33:50 +00:00
Shuxin Yang 2e1890e18b Revert r193251 : Use address-taken to disambiguate global variable and indirect memops.
llvm-svn: 193489
2013-10-27 03:08:44 +00:00
Wan Xiaofei be640b28c0 Quick look-up for block in loop.
This patch implements quick look-up for block in loop by maintaining a hash set for blocks.
It improves the efficiency of loop analysis a lot, the biggest improvement could be 5-6%(458.sjeng).
Below are the compilation time for our benchmark in llc before & after the patch.

Benchmark	llc - trunk		llc - patched	
401.bzip2	0.339081	100.00%	0.329657	102.86%
403.gcc		19.853966	100.00%	19.605466	101.27%
429.mcf		0.049823	100.00%	0.048451	102.83%
433.milc	0.514898	100.00%	0.510217	100.92%
444.namd	1.109328	100.00%	1.103481	100.53%
445.gobmk	4.988028	100.00%	4.929114	101.20%
456.hmmer	0.843871	100.00%	0.825865	102.18%
458.sjeng	0.754238	100.00%	0.714095	105.62%
464.h264ref	2.9668		100.00%	2.90612		102.09%
471.omnetpp	4.556533	100.00%	4.511886	100.99%
bitmnp01	0.038168	100.00%	0.0357		106.91%
idctrn01	0.037745	100.00%	0.037332	101.11%
libquake2	3.78689		100.00%	3.76209		100.66%
libquake_	2.251525	100.00%	2.234104	100.78%
linpack		0.033159	100.00%	0.032788	101.13%
matrix01	0.045319	100.00%	0.043497	104.19%
nbench		0.333161	100.00%	0.329799	101.02%
tblook01	0.017863	100.00%	0.017666	101.12%
ttsprk01	0.054337	100.00%	0.053057	102.41%

Reviewer	: Andrew Trick <atrick@apple.com>, Hal Finkel <hfinkel@anl.gov>
Approver	: Andrew Trick <atrick@apple.com>
Test		: Pass make check-all & llvm test-suite

llvm-svn: 193460
2013-10-26 03:08:02 +00:00
Andrew Trick 57243da70f Fix SCEVExpander: don't try to expand quadratic recurrences outside a loop.
Partial fix for PR17459: wrong code at -O3 on x86_64-linux-gnu
(affecting trunk and 3.3)

When SCEV expands a recurrence outside of a loop it attempts to scale
by the stride of the recurrence. Chained recurrences don't work that
way. We could compute binomial coefficients, but would hve to
guarantee that the chained AddRec's are in a perfectly reduced form.

llvm-svn: 193438
2013-10-25 21:35:56 +00:00
Rafael Espindola 7749d7ccc7 Handle calls and invokes in GlobalStatus.
This patch teaches GlobalStatus to analyze a call that uses the global value as
a callee, not as an argument.

With this change internalize call handle the common use of linkonce_odr
functions. This reduces the number of linkonce_odr functions in a LTO build of
clang (checked with the emit-llvm gold plugin option) from 1730 to 60.

llvm-svn: 193436
2013-10-25 21:29:52 +00:00
Hal Finkel 02f562df43 LoopVectorizer: Don't attempt to vectorize extractelement instructions
The loop vectorizer does not currently understand how to vectorize
extractelement instructions. The existing check, which excluded all
vector-valued instructions, did not catch extractelement instructions because
it checked only the return value. As a result, vectorization would proceed,
producing illegal instructions like this:

  %58 = extractelement <2 x i32> %15, i32 0
  %59 = extractelement i32 %58, i32 0

where the second extractelement is illegal because its first operand is not a vector.

llvm-svn: 193434
2013-10-25 20:40:15 +00:00
Tom Stellard bc7d87f07c Inliner: Handle readonly attribute per argument when adding memcpy
Patch by: Vincent Lejeune

llvm-svn: 193356
2013-10-24 16:38:33 +00:00
Renato Golin 1ba143e140 Mark vector loops as already vectorized
Make sure we mark all loops (scalar and vector) when vectorizing,
so that we don't try to vectorize them anymore. Also, set unroll
to 1, since this is what we check for on early exit.

llvm-svn: 193349
2013-10-24 14:50:51 +00:00
Nuno Lopes 340b0463e6 fix PR17635: false positive with packed structures
LLVM optimizers may widen accesses to packed structures that overflow the structure itself, but should be in bounds up to the alignment of the object

llvm-svn: 193317
2013-10-24 09:17:24 +00:00
Juergen Ributzka d04d096ecf Fix a bug in LinearFunctionTestReplace that created invalid loop exit checks.
Reviewed by Andy

llvm-svn: 193303
2013-10-24 05:29:56 +00:00
Andrew Trick ada2356ac9 Clarify comments in genLoopLimit.
llvm-svn: 193292
2013-10-24 00:43:38 +00:00
Yuchen Wu 3197b25b27 Fixed comment typo in GCOVProfiling.cpp
llvm-svn: 193268
2013-10-23 20:35:00 +00:00
Shuxin Yang e4fb375995 Use address-taken to disambiguate global variable and indirect memops.
Major steps include:
 1). introduces a not-addr-taken bit-field in GlobalVariable
 2). GlobalOpt pass sets "not-address-taken" if it proves a global varirable 
    dosen't have its address taken.
 3). AA use this info for disambiguation. 

llvm-svn: 193251
2013-10-23 17:28:19 +00:00
Eric Christopher 874fa0f6c7 Fix spelling, grammar, and match naming convention for test files.
llvm-svn: 193130
2013-10-21 23:14:06 +00:00
Tom Stellard e1631ddf93 SimplifyCFG: Don't duplicate calls to functions marked noduplicate v2
v2:
  - Use CI->cannotDuplicate()

llvm-svn: 193115
2013-10-21 20:07:30 +00:00
Matt Arsenault 404c60a7c3 Use more type helper functions
llvm-svn: 193109
2013-10-21 19:43:56 +00:00
Matt Arsenault fa64659bd8 Teach SimplifyCFG about address spaces
llvm-svn: 193104
2013-10-21 18:55:08 +00:00
Rafael Espindola 3d7fc25c7c Optimize more linkonce_odr values during LTO.
When a linkonce_odr value that is on the dso list is not unnamed_addr
we can still look to see if anything is actually using its address. If
not, it is safe to hide it.

This patch implements that by moving GlobalStatus to Transforms/Utils
and using it in Internalize.

llvm-svn: 193090
2013-10-21 17:14:55 +00:00
Michael Gottesman 63c63ac21e Fix the predecessor removal logic in r193045.
Additionally some small comment/stylistic fixes are included as well.

llvm-svn: 193068
2013-10-21 05:20:11 +00:00
Bill Wendling 90dd90afcb Don't eliminate a partially redundant load if it's in a landing pad.
A landing pad can be jumped to only by the unwind edge of an invoke
instruction. If we eliminate a partially redundant load in a landing pad, it
will create a basic block that violates this constraint. It then leads to other
problems down the line if it tries to merge that basic block with the landing
pad. Avoid this by not eliminating the load in a landing pad.

PR17621

llvm-svn: 193064
2013-10-21 04:09:17 +00:00
Michael Gottesman c024f3258a Teach simplify-cfg how to correctly create covered lookup tables for switches on iN with N >= 3.
One optimization simplify-cfg performs is the converting of switches to
lookup tables if the switch has > 4 cases. This is done by:

1. Finding the max/min case value and calculating the switch case range.
2. Create a lookup table basic block.
3. Perform a check in the switch's BB to see if the input value is in
the switch's case range. If the input value satisfies said predicate
branch to the lookup table BB, otherwise branch to the switch's default
destination BB using the default value as the result.

The conditional check consists of subtracting the min case value of the
table from any input iN value and then ensuring that said value is
unsigned less than the size of the lookup table represented as an iN
value.

If the lookup table is a covered lookup table, the size of the table will be N
which is 0 as an iN value. Thus the comparison will be an `icmp ult` of an iN
value against 0 which is always false yielding the incorrect result.

This patch fixes this problem by recognizing if we have a covered lookup table
and if we do, unconditionally jumps to the lookup table BB since the covering
property of the lookup table implies no input values could not be handled by
said BB.

rdar://15268442

llvm-svn: 193045
2013-10-20 07:04:37 +00:00
Bill Wendling 4fea22c63b Perform an intelligent splice of the predecessor with the single successor.
If the predecessor's being spliced into a landing pad, then we need the PHIs to
come first and the rest of the predecessor's code to come *after* the landing
pad instruction.

llvm-svn: 193035
2013-10-19 11:27:12 +00:00
Nadav Rotem 7f27e0b0ce Mark some command line flags as hidden
llvm-svn: 193013
2013-10-18 23:38:13 +00:00
Rafael Espindola 045a78fa7e Rename fields of GlobalStatus to match the coding style.
llvm-svn: 192910
2013-10-17 18:18:52 +00:00
Rafael Espindola 27797baee7 rename SafeToDestroyConstant to isSafeToDestroyConstant and clang-format.
llvm-svn: 192907
2013-10-17 18:06:32 +00:00
Rafael Espindola 026c9cbefe Simplify the interface of AnalyzeGlobal a bit and rename to analyzeGlobal.
No functionality change.

llvm-svn: 192906
2013-10-17 18:00:25 +00:00
Evgeniy Stepanov 21a9c93a4d [msan] Use zero-extension in shadow cast by default.
Switch to sign-extension in r192575 caused 7% perf loss on 482.sphinx3.

llvm-svn: 192882
2013-10-17 10:53:50 +00:00
Dmitry Vyukov b1ad5720a2 tsan: implement no_sanitize_thread attribute
If a function has no_sanitize_thread attribute,
do not instrument memory accesses in it.

llvm-svn: 192871
2013-10-17 07:20:06 +00:00
Arnold Schwaighofer a66582470b SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

Reapply r192799,
  http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang/builds/8226
showed that the bot is still broken even with this out.

llvm-svn: 192820
2013-10-16 17:52:40 +00:00
Arnold Schwaighofer 06a0324f6a Revert "SLPVectorizer: Don't vectorize volatile memory operations"
This speculatively reverts commit 192799. It might have broken a linux buildbot.

llvm-svn: 192816
2013-10-16 17:19:40 +00:00
Arnold Schwaighofer 5078ea2bd9 SLPVectorizer: Don't vectorize volatile memory operations
radar://15231682

llvm-svn: 192799
2013-10-16 16:09:00 +00:00
Kostya Serebryany d3d23bec66 [asan] Optimize accesses to global arrays with constant index
Summary:
Given a global array G[N], which is declared in this CU and has static initializer
avoid instrumenting accesses like G[i], where 'i' is a constant and 0<=i<N.
Also add a bit of stats.

This eliminates ~1% of instrumentations on SPEC2006
and also partially helps when asan is being run together with coverage.

Reviewers: samsonov

Reviewed By: samsonov

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1947

llvm-svn: 192794
2013-10-16 14:06:14 +00:00
Benjamin Kramer c97850be76 LoopVectorize: Properly reflect PODness in comments.
llvm-svn: 192717
2013-10-15 16:19:54 +00:00
Craig Topper ef9e993eaa Remove x86_sse42_crc32_64_8 intrinsic. It has no functional difference from x86_sse42_crc32_32_8 and was not mapped to a clang builtin. I'm not even sure why this form of the instruction is even called out explicitly in the docs. Also add AutoUpgrade support to convert it into the other intrinsic with appropriate trunc and zext.
llvm-svn: 192672
2013-10-15 05:20:47 +00:00
Rafael Espindola 8c1d78ad51 Remove lib/Transforms/Instrumentation/ProfilingUtils.*
They were leftover from the old profiling support.

Patch by Alastair Murray.

llvm-svn: 192605
2013-10-14 16:46:46 +00:00
Chris Lattner 94fc4bed1f Basic blocks typically have few predecessors. Use a SmallDenseMap to
avoid a heap allocation when this is the case.

llvm-svn: 192602
2013-10-14 16:05:55 +00:00
Evgeniy Stepanov be83d8f693 [msan] Instrument x86.*_cvt* intrinsics.
Currently MSan checks that arguments of *cvt* intrinsics are fully initialized.
That's too much to ask: some of them only operate on lower half, or even
quarter, of the input register.

llvm-svn: 192599
2013-10-14 15:16:25 +00:00
Evgeniy Stepanov 9b5517b127 [msan] Fix handling of scalar select of vectors.
llvm-svn: 192575
2013-10-14 09:52:09 +00:00
Arnold Schwaighofer 58864d2d5f SLPVectorizer: Sort PHINodes based on their opcode
Before this patch we relied on the order of phi nodes when we looked for phi
nodes of the same type. This could prevent vectorization of cases where there
was a phi node of a second type in between phi nodes of some type.

This is important for vectorization of an internal graphics kernel. On the test
suite + external on x86_64 (and on a run on armv7s) it showed no impact on
either performance or compile time.

radar://15024459

llvm-svn: 192537
2013-10-12 18:56:27 +00:00
Tobias Grosser 5cff1e2d78 LoopVectorize: Add missing INITIALIZE_PASS_DEPENDENCY macros
Contributed-by:  Peter Zotov  <whitequark@whitequark.org>
llvm-svn: 192536
2013-10-12 18:29:15 +00:00
Renato Golin dd943a8919 Better info when debugging vectorizer
llvm-svn: 192460
2013-10-11 16:14:39 +00:00
Shuxin Yang 1cab418ce2 Fix a bug in Dead Argument Elimination.
If a function seen at compile time is not necessarily the one linked to
the binary being built, it is illegal to change the actual arguments
passing to it. 

  e.g. 
   --------------------------
   void foo(int lol) {
     // foo() has linkage satisifying isWeakForLinker()
     // "lol" is not used at all.
   }

   void bar(int lo2) {
      // xform to foo(undef) is illegal, as compiler dose not know which
      // instance of foo() will be linked to the the binary being built.
      foo(lol2); 
   }
  -----------------------------

  Such functions can be captured by isWeakForLinker(). NOTE that
mayBeOverridden() is insufficient for this purpose as it dosen't include
linkage types like AvailableExternallyLinkage and LinkOnceODRLinkage.
Take link_odr* as an example, it indicates a set of *EQUIVALENT* globals
that can be merged at link-time. However, the semantic of 
*EQUIVALENT*-functions includes parameters. Changing parameters breaks
the assumption.

  Thank John McCall for help, especially for the explanation of subtle
difference between linkage types.

  rdar://11546243

llvm-svn: 192302
2013-10-09 17:21:44 +00:00
Arnold Schwaighofer 0caddfc731 LoopVectorize: External uses must use the last value in a reduction cycle
Otherwise, we don't perform operations that would have been performed on
the scalar version.

Fixes PR17498.

llvm-svn: 192133
2013-10-07 21:05:43 +00:00
Alexey Samsonov a1944e6d26 Revert r191834 until we measure the effect of this benchmarks and maybe find a better way to fix it
llvm-svn: 192121
2013-10-07 19:03:24 +00:00
Hal Finkel f5a3eaea55 UpdatePHINodes in BasicBlockUtils should not crash on duplicate predecessors
UpdatePHINodes has an optimization to reuse an existing PHI node, where it
first deletes all of its entries and then replaces them. Unfortunately, in the
case where we had duplicate predecessors (which are allowed so long as the
associated PHI entries have the same value), the loop removing the existing PHI
entries from the to-be-reused PHI would assert (if that PHI was not the one
which had the duplicates).

llvm-svn: 192001
2013-10-04 23:41:05 +00:00
Arnold Schwaighofer 698d4ac8a8 SLPVectorizer: Sort inputs to commutative binary operations
Sort the operands of the other entries in the current vectorization root
according to the first entry's operands opcodes.

%conv0 = uitofp ...
%load0 = load float ...

= fmul %conv0, %load0
= fmul %load0, %conv1
= fmul %load0, %conv2

Make sure that we recursively vectorize <%conv0, %conv1, %conv2> and <%load0,
%load0, %load0>.

This makes it more likely to obtain vectorizable trees. We have to be careful
when we sort that we don't destroy 'good' existing ordering implied by source
order.

radar://15080067

llvm-svn: 191977
2013-10-04 20:39:16 +00:00
Owen Anderson 5797bfd4a3 Pull fptrunc's upwards through selects when one of the select's selectands was a constant. This has a number of benefits, including producing small immediates (easier to materialize, smaller constant pools) as well as being more likely to allow the fptrunc to fuse with a preceding instruction (truncating selects are unusual).
llvm-svn: 191929
2013-10-03 21:08:05 +00:00
Rafael Espindola cda2911caa Optimize linkonce_odr unnamed_addr functions during LTO.
Generalize the API so we can distinguish symbols that are needed just for a DSO
symbol table from those that are used from some native .o.

The symbols that are only wanted for the dso symbol table can be dropped if
llvm can prove every other dso has a copy (linkonce_odr) and the address is not
important (unnamed_addr).

llvm-svn: 191922
2013-10-03 18:29:09 +00:00
Matt Arsenault bfa37e546d Make gep i8* X, -(ptrtoint Y) transform work with address spaces
llvm-svn: 191920
2013-10-03 18:15:57 +00:00
Matt Arsenault 0be1cb1c7b Don't use runtime bounds check between address spaces.
Don't vectorize with a runtime check if it requires a
comparison between pointers with different address spaces.
The values can't be assumed to be directly comparable.
Previously it would create an illegal bitcast.

llvm-svn: 191862
2013-10-02 22:38:17 +00:00
Yi Jiang 8fd1a806d5 Apply slp vectorization on fully-vectorizable tree of height 2
llvm-svn: 191852
2013-10-02 20:20:39 +00:00
Matt Arsenault 39d592fe48 Fix debug printing spacing.
Fix missing newlines, missing and extra spaces in printed messages.

llvm-svn: 191851
2013-10-02 20:04:29 +00:00
Matt Arsenault cccbe16785 Fix comment grammar and capitalization.
llvm-svn: 191850
2013-10-02 20:04:26 +00:00
Benjamin Kramer b9add84ef6 SLPVectorizer: Make store chain finding more aggressive with GetUnderlyingObject.
This recursively strips all GEPs like the existing code. It also handles bitcasts and
other operations that do not change the pointer value.

llvm-svn: 191847
2013-10-02 19:06:06 +00:00
Tom Stellard d3e916eb6a StructurizeCFG: Add dependency on LowerSwitch pass
Switch instructions were crashing the StructurizeCFG pass, and it's
probably easier anyway if we don't need to handle them in this pass.

Reviewed-by: Christian König <christian.koenig@amd.com>
llvm-svn: 191841
2013-10-02 17:04:59 +00:00
Chandler Carruth ea56494625 Remove the very substantial, largely unmaintained legacy PGO
infrastructure.

This was essentially work toward PGO based on a design that had several
flaws, partially dating from a time when LLVM had a different
architecture, and with an effort to modernize it abandoned without being
completed. Since then, it has bitrotted for several years further. The
result is nearly unusable, and isn't helping any of the modern PGO
efforts. Instead, it is getting in the way, adding confusion about PGO
in LLVM and distracting everyone with maintenance on essentially dead
code. Removing it paves the way for modern efforts around PGO.

Among other effects, this removes the last of the runtime libraries from
LLVM. Those are being developed in the separate 'compiler-rt' project
now, with somewhat different licensing specifically more approriate for
runtimes.

llvm-svn: 191835
2013-10-02 15:42:23 +00:00
Alexey Samsonov 31540172d0 Remove "localize global" optimization
Summary:
As discussed in http://llvm-reviews.chandlerc.com/D1754,
this optimization isn't really valid for C, and fires too rarely anyway.

Reviewers: rafael, nicholas

Reviewed By: nicholas

CC: rnk, llvm-commits, nicholas

Differential Revision: http://llvm-reviews.chandlerc.com/D1769

llvm-svn: 191834
2013-10-02 15:31:34 +00:00
Matt Arsenault 517d84e268 Don't merge tiny functions.
It's silly to merge functions like these:

define void @foo(i32 %x) {
  ret void
}

define void @bar(i32 %x) {
  ret void
}

to get

define void @bar(i32) {
  tail call void @foo(i32 %0)
  ret void
}

llvm-svn: 191786
2013-10-01 18:05:30 +00:00
Rafael Espindola 44fee4e0eb Remove several unused variables.
Patch by Alp Toker.

llvm-svn: 191757
2013-10-01 13:32:03 +00:00
Matt Arsenault 5ea37f8d89 Fix code duplication
llvm-svn: 191716
2013-10-01 00:01:14 +00:00
Matt Arsenault 8468062c6e Use right address space size in InstCombineCompares
The test's output doesn't change, but this ensures
this is actually hit with a different address space.

llvm-svn: 191701
2013-09-30 21:11:01 +00:00
Matt Arsenault 06adecabe7 Constant fold ptrtoint + compare with address spaces
llvm-svn: 191699
2013-09-30 21:06:18 +00:00
Benjamin Kramer f00472908a BoundsChecking: Fix refacto.
llvm-svn: 191676
2013-09-30 15:52:50 +00:00
Benjamin Kramer 6e931528fe Convert manual insert point restores to the new RAII object.
llvm-svn: 191675
2013-09-30 15:40:17 +00:00
Benjamin Kramer 6748576a0d InstCombine: Replace manual fast math flag copying with the new IRBuilder RAII helper.
Defines away the issue where cast<Instruction> would fail because constant
folding happened. Also slightly cleaner.

llvm-svn: 191674
2013-09-30 15:39:59 +00:00
Benjamin Kramer d36f1abefd IRBuilder: Add RAII objects to reset insertion points or fast math flags.
Inspired by the object from the SLPVectorizer. This found a minor bug in the
debug loc restoration in the vectorizer where the location of a following
instruction was attached instead of the location from the original instruction.

llvm-svn: 191673
2013-09-30 15:39:48 +00:00
Joey Gouly d51a35c6a0 Fix a bug in InstCombine where it attempted to cast a Value* to an Instruction*
when it was actually a Constant*.

There are quite a few other casts to Instruction that might have the same problem,
but this is the only one I have a test case for.

llvm-svn: 191668
2013-09-30 14:18:35 +00:00
Robert Wilhelm 2788d3ec99 Even more spelling fixes for "instruction".
llvm-svn: 191611
2013-09-28 13:42:22 +00:00
Robert Wilhelm f0cfb83bb4 Fix spelling intruction -> instruction.
llvm-svn: 191610
2013-09-28 11:46:15 +00:00
Matt Arsenault 31cfc78f81 Use right pointer type in DebugIR
llvm-svn: 191576
2013-09-27 22:26:25 +00:00
Matt Arsenault fa25272db9 Use type helper functions
llvm-svn: 191574
2013-09-27 22:18:51 +00:00
Matt Arsenault 29f31735a2 Fix SLPVectorizer using wrong address space for load/store
llvm-svn: 191564
2013-09-27 21:24:57 +00:00
Justin Bogner 4a9ac8cd75 InstCombine: Only foldSelectICmpAndOr for integer types
Currently foldSelectICmpAndOr asserts if the "or" involves a vector
containing several of the same power of two. We can easily avoid this by
only performing the fold on integer types, like foldSelectICmpAnd does.

Fixes <rdar://problem/15012516>

llvm-svn: 191552
2013-09-27 20:35:39 +00:00
Justin Bogner ca9bd8fac1 Transforms: Use getFirstNonPHI to set the insertion point for PHIs
We were previously using getFirstInsertionPt to insert PHI
instructions when vectorizing, but getFirstInsertionPt also skips past
landingpads, causing this to generate invalid IR.

We can avoid this issue by using getFirstNonPHI instead.

llvm-svn: 191526
2013-09-27 15:30:25 +00:00
Puyan Lotfi 74e38de492 First check in. Modified a comment.
llvm-svn: 191491
2013-09-27 07:36:10 +00:00
Arnold Schwaighofer 07520324f5 SLPVectorize: Put horizontal reductions feeding a store under separate flag
Put them under a separate flag for experimentation. They are more likely to
interfere with loop vectorization which happens later in the pass pipeline.

llvm-svn: 191371
2013-09-25 14:02:32 +00:00
Evgeniy Stepanov 32be0340f5 [msan] Fix -Wreturn-type warnings in non-self-hosted build.
llvm-svn: 191361
2013-09-25 08:56:00 +00:00
Yi Jiang edf2d9179e set the cost of tiny trees to INT_MAX in SLP vectorizer to disable vectorization on them
llvm-svn: 191314
2013-09-24 17:26:43 +00:00
Benjamin Kramer 30d249a1b3 Push analysis passes to InstSimplify when they're around anyways.
llvm-svn: 191309
2013-09-24 16:37:40 +00:00
Evgeniy Stepanov 5522a70674 [msan] Handling of atomic load/store, atomic rmw, cmpxchg.
llvm-svn: 191287
2013-09-24 11:20:27 +00:00
Arnold Schwaighofer 22639407d7 Revert "LoopVectorizer: Only allow vectorization of intrinsics."
Revert 191122 - with extra checks we are allowed to vectorize math library
function calls.

Standard library indentifiers are reserved names so functions with external
linkage must not overrided them. However, functions with internal linkage can.

Therefore, we can vectorize calls to math library functions with a check for
external linkage and matching signature. This matches what we do during
SelectionDAG building.

llvm-svn: 191206
2013-09-23 14:54:39 +00:00
Benjamin Kramer 8817cca5ce Provide basic type safety for array_pod_sort comparators.
This makes using array_pod_sort significantly safer. The implementation relies
on function pointer casting but that should be safe as we're dealing with void*
here.

llvm-svn: 191175
2013-09-22 14:09:50 +00:00
Benjamin Kramer 5626259506 Drop spurious handle in comment.
llvm-svn: 191172
2013-09-22 11:24:58 +00:00
Benjamin Kramer 90901a35ce SROA: Handle casts involving vectors of pointers and integer scalars.
SROA wants to convert any types of equivalent widths but it's not possible to
convert vectors of pointers to an integer scalar with a single cast. As a
workaround we add a bitcast to the corresponding int ptr type first. This type
of cast used to be an edge case but has become common with SLP vectorization.
Fixes PR17271.

llvm-svn: 191143
2013-09-21 20:36:04 +00:00
Arnold Schwaighofer d743feef81 SLPVectorizer: Fix multiline comment warning
llvm-svn: 191135
2013-09-21 05:37:30 +00:00
Arnold Schwaighofer 500242d4fe Reapply "SLPVectorizer: Handle more horizontal reductions (disabled)""
Reapply r191108 with a fix for a memory corruption error I introduced.  Of
course, we can't reference the scalars that we replace by vectorizing and then
call their eraseFromParent method. I only 'needed' the scalars to get the
DebugLoc. Just store the DebugLoc before actually vectorizing instead. As a nice
side effect, this also simplifies the interface between BoUpSLP and the
HorizontalReduction class to returning a value pointer (the vectorized tree
root).

radar://14607682

llvm-svn: 191123
2013-09-21 01:06:00 +00:00
Nadav Rotem 3371172a67 LoopVectorizer: Only allow vectorization of intrinsics. We can't know for sure that the functions 'abs' or 'round' are the functions from libm.
rdar://15012650

llvm-svn: 191122
2013-09-21 00:27:05 +00:00
Arnold Schwaighofer f1dfbfdde1 Revert "SLPVectorizer: Handle more horizontal reductions (disabled)"
This reverts commit r191108.

The horizontal.ll test case fails under libgmalloc. Thanks Shuxin for pointing
this out to me.

llvm-svn: 191121
2013-09-21 00:06:20 +00:00
Shuxin Yang 6e35094bbf Resurrect r191017 " GVN proceeds in the presence of dead code" plus a fix to PR17307 & 17308.
The problem of r191017 is that when GVN fabricate a val-number for a dead instruction (in order
to make following expr-PRE happy), it forget to fabricate a leader-table entry for it as well.

llvm-svn: 191118
2013-09-20 23:12:57 +00:00
Benjamin Kramer 0e2d162d1e InstCombine: Remove unused argument. No functionality change.
llvm-svn: 191112
2013-09-20 22:12:42 +00:00
Arnold Schwaighofer 4724963112 SLPVectorizer: Handle more horizontal reductions (disabled)
Match reductions starting at binary operation feeding into a phi. The code
handles trees like

 r += v1 + v2 + v3 ...

and

 r += v1
 r += v2
 ...

and

 r *= v1 + v2 + ...

We currently only handle associative operations (add, fadd fast).

The code can now also handle reductions feeding into stores.

 a[i] = v1 + v2 + v3 + ...

The code is currently disabled behind the flag "-slp-vectorize-hor".  The cost
model for most architectures is not there yet.

I found one opportunity of a horizontal reduction feeding a phi in TSVC
(LoopRerolling-flt) and there are several opportunities where reductions feed
into stores.

radar://14607682

llvm-svn: 191108
2013-09-20 21:18:20 +00:00
Joerg Sonnenberger 1fbe323649 Revert r191017, it results in segmentation faults in Qt.
llvm-svn: 191104
2013-09-20 20:33:57 +00:00
Benjamin Kramer e6461e3053 InstCombine: Canonicalize (gep i8* X, -(ptrtoint Y)) to (sub (ptrtoint X), (ptrtoint Y))
The GEP pattern is what SCEV expander emits for "ugly geps". The latter is what
you get for pointer subtraction in C code. The rest of instcombine already
knows how to deal with that so just canonicalize on that.

llvm-svn: 191090
2013-09-20 14:38:44 +00:00
Shuxin Yang 3a7ca6ec87 [Fast-math] Disable "(C1/X)*C2 => (C1*C2)/X" if C1/X has multiple uses.
If "C1/X" were having multiple uses, the only benefit of this
transformation is to potentially shorten critical path. But it is at the
cost of instroducing additional div.

  The additional div may or may not incur cost depending on how div is
implemented. If it is implemented using Newton–Raphson iteration, it dosen't
seem to incur any cost (FIXME). However, if the div blocks the entire
pipeline, that sounds to be pretty expensive. Let CodeGen to take care 
this transformation.

  This patch sees 6% on a benchmark.

rdar://15032743

llvm-svn: 191037
2013-09-19 21:13:46 +00:00
Benjamin Kramer 0b37cdf9af InstCombine: Don't allow turning vector-of-pointer loads into vector-of-integer.
The code below can't handle any pointers. PR17293.

llvm-svn: 191036
2013-09-19 20:59:04 +00:00
Shuxin Yang 74c9a170b8 GVN proceeds in the presence of dead code.
This is how it ignores the dead code:
1) When a dead branch target, say block B, is identified, all the
    blocks dominated by B is dead as well.

2) The PHIs of those blocks in dominance-frontier(B) is updated such
   that the operands corresponding to dead predecessors are replaced
   by "UndefVal".

   Using lattice's jargon, the "UndefVal" is the "Top" in essence.
   Phi node like this "phi(v1 bb1, undef xx)" will be optimized into
   "v1" if v1 is constant, or v1 is an instruction which dominate this
   PHI node.

3) When analyzing the availability of a load L, all dead mem-ops which
   L depends on disguise as a load which evaluate exactly same value as L.

4) The dead mem-ops will be materialized as "UndefVal" during code motion.

llvm-svn: 191017
2013-09-19 17:22:51 +00:00
Evgeniy Stepanov 37b8645480 [msan] Wrap indirect functions.
Adds a flag to the MemorySanitizer pass that enables runtime rewriting of
indirect calls. This is part of MSanDR implementation and is needed to return
control to the DynamiRio-based helper tool on transition between instrumented
and non-instrumented modules. Disabled by default.

llvm-svn: 191006
2013-09-19 15:22:35 +00:00
Kostya Serebryany f322382e22 [asan] call __asan_stack_malloc_N only if use-after-return detection is enabled with the run-time option
llvm-svn: 190939
2013-09-18 14:07:14 +00:00
Robert Lytton f637e2cb23 Prevent LoopVectorizer and SLPVectorizer running if the target has no vector registers.
XCore target: Add XCoreTargetTransformInfo
This is where getNumberOfRegisters() resides, which in turn returns the
number of vector registers (=0).

llvm-svn: 190936
2013-09-18 12:43:35 +00:00
Craig Topper be3e01e61f Revert accidental commit I had to make to get the test case in PR17268 to still work correctly.
llvm-svn: 190917
2013-09-18 04:10:17 +00:00
Craig Topper 98064b9f4d Lift alignment restrictions for load/store folding on VINSERTF128/VEXTRACTF128. Fixes PR17268.
llvm-svn: 190916
2013-09-18 03:55:53 +00:00
David Blaikie eacc287b49 ifndef NDEBUG-out an asserts-only constant committed in r190863
llvm-svn: 190905
2013-09-18 00:11:27 +00:00
Quentin Colombet 870b662779 Revert the load slicing done in r190870.
To avoid regressions with bitfield optimizations, this slicing should take place
later, like ISel time.

llvm-svn: 190891
2013-09-17 22:01:26 +00:00
Matt Arsenault e6952f28ca Cleanup handling of constant function casts.
Some of this code is no longer necessary since int<->ptr casts are no
longer occur as of r187444.

This also fixes handling vectors of pointers, and adds a bunch of new
testcases for vectors and address spaces.

llvm-svn: 190885
2013-09-17 21:10:14 +00:00
Arnold Schwaighofer 4a3dcaa193 SLPVectorizer: Don't vectorize phi nodes that use invoke values
We can't insert an insertelement after an invoke. We would have to split a
critical edge. So when we see a phi node that uses an invoke we just give up.

radar://14990770

llvm-svn: 190871
2013-09-17 17:03:29 +00:00
Quentin Colombet b8d672ef5b [InstCombiner] Slice a big load in two loads when the elements are next to each
other in memory.

The motivation was to get rid of truncate and shift right instructions that get
in the way of paired load or floating point load.
E.g.,
Consider the following example:
struct Complex {
  float real;
  float imm;
};

When accessing a complex, llvm was generating a 64-bits load and the imm field
was obtained by a trunc(lshr) sequence, resulting in poor code generation, at
least for x86.

The idea is to declare that two load instructions is the canonical form for
loading two arithmetic type, which are next to each other in memory.

Two scalar loads at a constant offset from each other are pretty
easy to detect for the sorts of passes that like to mess with loads. 

<rdar://problem/14477220>

llvm-svn: 190870
2013-09-17 16:57:34 +00:00
Kostya Serebryany bc86efb89d [asan] inline the calls to __asan_stack_free_* with small sizes. Yet another 10%-20% speedup for use-after-return
llvm-svn: 190863
2013-09-17 12:14:50 +00:00
Stepan Dyatkovskiy dc2c4b4462 Bugfix for PR17099:
Wrong cast operation.
MergeFunctions emits Bitcast instead of pointer-to-integer operation.
Patch fixes MergeFunctions::writeThunk function. It replaces
unconditional Bitcast creation with "Value* createCast(...)" method, that
checks operand types and selects proper instruction.
See unit-test as example.

llvm-svn: 190859
2013-09-17 09:36:11 +00:00
Matt Arsenault 899f7d2b00 MemCpyOptimizer: Use max legal int size instead of pointer size
If there are no legal integers, assume 1 byte.

This makes more sense than using the pointer size as
a guess for the maximum GPR width.

It is conceivable to want to use some 64-bit pointers
on a target where 64-bit integers aren't legal.

llvm-svn: 190817
2013-09-16 22:43:16 +00:00
Arnold Schwaighofer 53e622cef4 Don't vectorize if there are outside loop users of the induction variable.
We would have to compute the pre increment value, either by computing it on
every loop iteration or by splitting the edge out of the loop and inserting a
computation for it there.

For now, just give up vectorizing such loops.

Fixes PR17179.

llvm-svn: 190790
2013-09-16 16:17:24 +00:00
Evgeniy Stepanov 604293fbb4 [msan] Check return value of main().
llvm-svn: 190782
2013-09-16 13:24:32 +00:00
Peter Collingbourne 3fa50f9b05 Implement function prefix data as an IR feature.
Previous discussion:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/063909.html

Differential Revision: http://llvm-reviews.chandlerc.com/D1191

llvm-svn: 190773
2013-09-16 01:08:15 +00:00
Benjamin Kramer 7d6052687e Replace some unnecessary vector copies with references.
llvm-svn: 190770
2013-09-15 22:04:42 +00:00
Robert Wilhelm 042f10ce41 Fix spelling.
llvm-svn: 190750
2013-09-14 09:34:59 +00:00
Chandler Carruth ebeac5cb89 Remove the long, long defunct IR block placement pass.
This pass was based on the previous (essentially unused) profiling
infrastructure and the assumption that by ordering the basic blocks at
the IR level in a particular way, the correct layout would happen in the
end. This sometimes worked, and mostly didn't. It also was a really
naive implementation of the classical paper that dates from when branch
predictors were primarily directional and when loop structure wasn't
commonly available. It also didn't factor into the equation
non-fallthrough branches and other machine level details.

Anyways, for all of these reasons and more, I wrote
MachineBlockPlacement, which completely supercedes this pass. It both
uses modern profile information infrastructure, and actually works. =]

llvm-svn: 190748
2013-09-14 09:28:14 +00:00
Evgeniy Stepanov 0435ecd18f [msan] Add source file:line to stack origin reports.
Compiler part.

llvm-svn: 190689
2013-09-13 12:54:49 +00:00
Duncan Sands c9e95ad0db Avoid a compiler warning about Found not being used when assertions are
disabled.

llvm-svn: 190668
2013-09-13 08:16:06 +00:00
Hal Finkel 8f2e700522 Add getUnrollingPreferences to TTI
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 190542
2013-09-11 19:25:43 +00:00
Benjamin Kramer 079b96e6f7 Revert "Give internal classes hidden visibility."
It works with clang, but GCC has different rules so we can't make all of those
hidden. This reverts commit r190534.

llvm-svn: 190536
2013-09-11 18:05:11 +00:00
Benjamin Kramer 6a44af3629 Give internal classes hidden visibility.
Worth 100k on a linux/x86_64 Release+Asserts clang.

llvm-svn: 190534
2013-09-11 17:42:27 +00:00
Matt Arsenault d3471e9ea8 Use type form of getIntPtrType
This doesn't change anything since malloc always returns
address space 0.

llvm-svn: 190498
2013-09-11 07:29:40 +00:00
Matt Arsenault 009faed1be Teach loop-idiom about address space pointer sizes
llvm-svn: 190491
2013-09-11 05:09:42 +00:00
Matt Arsenault 5df49bd703 Add braces
llvm-svn: 190490
2013-09-11 05:09:35 +00:00
Eli Friedman 77d7fbb924 Get rid of unused isPodLike definitions.
llvm-svn: 190461
2013-09-11 00:36:54 +00:00
Eli Friedman 05906faa4d Don't assert on invalid loop vectorization hint.
llvm-svn: 190450
2013-09-10 23:45:25 +00:00
Eli Friedman c1f1f852d7 Fix mistake in r190442.
llvm-svn: 190446
2013-09-10 23:09:24 +00:00
Eli Friedman 1891f69323 Remove unused functions.
llvm-svn: 190442
2013-09-10 22:42:31 +00:00
Matt Arsenault a90a18e0ea Teach ScalarEvolution about pointer address spaces
llvm-svn: 190425
2013-09-10 19:55:24 +00:00
Benjamin Kramer 934f6f39f4 LoopVectorize: PHI nodes are always at the beginning of a block, no need to scan the whole block.
llvm-svn: 190422
2013-09-10 18:46:15 +00:00
Kostya Serebryany 6805de5467 [asan] refactor the use-after-return API so that the size class is computed at compile time instead of at run-time. llvm part
llvm-svn: 190407
2013-09-10 13:16:56 +00:00
Matt Arsenault f631f8c640 Use StringRef::npos for StringRef instead of std::string one
llvm-svn: 190375
2013-09-10 00:41:53 +00:00
Eli Friedman 33d3700716 Don't shrink atomic ops to bool in GlobalOpt.
LLVM IR doesn't currently allow atomic bool load/store operations, and the
transformation is dubious anyway because it isn't profitable on all platforms.

PR17163.

llvm-svn: 190357
2013-09-09 22:00:13 +00:00
Quentin Colombet 5ab555532b [InstCombiner] Expose opportunities to merge subtract and comparison.
Several architectures use the same instruction to perform both a comparison and
a subtract. The instruction selection framework does not allow to consider
different basic blocks to expose such fusion opportunities.

Therefore, these instructions are “merged” by CSE at MI IR level.

To increase the likelihood of CSE to apply in such situation, we reorder the
operands of the comparison, when they have the same complexity, so that they
matches the order of the most frequent subtract.
E.g.,

icmp A, B
...
sub B, A

<rdar://problem/14514580>

llvm-svn: 190352
2013-09-09 20:56:48 +00:00
Bob Wilson e407736a06 Revert patches to add case-range support for PR1255.
The work on this project was left in an unfinished and inconsistent state.
Hopefully someone will eventually get a chance to implement this feature, but
in the meantime, it is better to put things back the way the were.  I have
left support in the bitcode reader to handle the case-range bitcode format,
so that we do not lose bitcode compatibility with the llvm 3.3 release.

This reverts the following commits: 155464, 156374, 156377, 156613, 156704,
156757, 156804 156808, 156985, 157046, 157112, 157183, 157315, 157384, 157575,
157576, 157586, 157612, 157810, 157814, 157815, 157880, 157881, 157882, 157884,
157887, 157901, 158979, 157987, 157989, 158986, 158997, 159076, 159101, 159100,
159200, 159201, 159207, 159527, 159532, 159540, 159583, 159618, 159658, 159659,
159660, 159661, 159703, 159704, 160076, 167356, 172025, 186736

llvm-svn: 190328
2013-09-09 19:14:35 +00:00
Manman Ren d8c68b1852 TBAA: add isTBAAVtableAccess to MDNode so clients can call the function
instead of having its own implementation.

The implementation of isTBAAVtableAccess is in TypeBasedAliasAnalysis.cpp
since it is related to the format of TBAA metadata.

The path for struct-path tbaa will be exercised by
test/Instrumentation/ThreadSanitizer/read_from_global.ll, vptr_read.ll, and
vptr_update.ll when struct-path tbaa is on by default.

llvm-svn: 190216
2013-09-06 22:47:05 +00:00
Matt Arsenault 8227b9f69c Use type helper functions.
llvm-svn: 190113
2013-09-06 00:37:24 +00:00
Matt Arsenault 37d42ecaff Teach CodeGenPrepare about address spaces
llvm-svn: 190112
2013-09-06 00:18:43 +00:00
Matt Arsenault e6db76071c Consistently use dbgs() in debug printing
llvm-svn: 190093
2013-09-05 19:48:28 +00:00
Rafael Espindola d21ac19bda Remove unused argument.
llvm-svn: 190090
2013-09-05 19:15:21 +00:00
Nick Lewycky 2c88067a46 Declare missing dependency on AliasAnalysis. Patch by Liu Xin!
llvm-svn: 190035
2013-09-05 08:19:58 +00:00
Rafael Espindola b7c0b4a327 Rename some variables to match the style guide.
I am about to patch this code, and this makes the diff far more readable.

llvm-svn: 189982
2013-09-04 20:08:46 +00:00
Rafael Espindola b832d49822 Small simplification given that insert of an empty range is a nop.
llvm-svn: 189971
2013-09-04 18:53:21 +00:00
Rafael Espindola 49a6c153c9 Refactor duplicated logic to a helper function.
No functionality change.

llvm-svn: 189969
2013-09-04 18:37:36 +00:00
Rafael Espindola 9406516af1 Remove dead code.
llvm-svn: 189967
2013-09-04 18:16:02 +00:00
Rafael Espindola 128c5ea902 Revert "Add r159136 back now that pr13124 has been fixed."
This reverts commit r189886.

I found a corner case where this optimization is not valid:

Say we have a "linkonce_odr unnamed_addr" in two translation units:
* In TU 1 this optimization kicks in and makes it hidden.
* In TU 2 it gets const merged with a constant that is *not* unnamed_addr,
  resulting in a non unnamed_addr constant with default visibility.
* The static linker rules for combining visibility them produce a hidden
  symbol, which is incorrect from the point of view of the non unnamed_addr
  constant.

The one place we can do this is when we know that the symbol is not used from
another TU in the same shared object, i.e., during LTO. I will move it there.

llvm-svn: 189954
2013-09-04 16:09:01 +00:00
Tim Northover dc647a2603 InstCombine: allow unmasked icmps to be combined with logical ops
"(icmp op i8 A, B)" is equivalent to "(icmp op i8 (A & 0xff), B)" as a
degenerate case. Allowing this as a "masked" comparison when analysing "(icmp)
&/| (icmp)" allows us to combine them in more cases.

rdar://problem/7625728

llvm-svn: 189931
2013-09-04 11:57:17 +00:00
Tim Northover c0756c454c InstCombine: look for masked compares with subset relation
Even in cases which aren't universally optimisable like "(A & B) != 0 && (A &
C) != 0", the masks can make one of the comparisons completely redundant. In
this case, since we've gone to the effort of spotting masked comparisons we
should combine them.

rdar://problem/7625728

llvm-svn: 189930
2013-09-04 11:57:13 +00:00
Rafael Espindola 5eb7df68bf Add r159136 back now that pr13124 has been fixed.
Original message:
If a constant or a function has linkonce_odr linkage and unnamed_addr, mark
hidden. Being linkonce_odr guarantees that it is available in every dso that
needs it. Being a constant/function with unnamed_addr guarantees that the
copies don't have to be merged.

llvm-svn: 189886
2013-09-03 23:34:36 +00:00
Michael Gottesman 469a80cb30 [objc-arc] Remove dead code from previous commit.
llvm-svn: 189870
2013-09-03 22:40:56 +00:00
Michael Gottesman e29b1c1825 [objc-arc] Turn off the objc_retainBlock -> objc_retain optimization.
The reason that I am turning off this optimization is that there is an
additional case where a block can escape that has come up. Specifically, this
occurs when a block is used in a scope outside of its current scope.

This can cause a captured retainable object pointer whose life is preserved by
the objc_retainBlock to be deallocated before the block is invoked.

An example of the code needed to trigger the bug is:

----
\#import <Foundation/Foundation.h>
int main(int argc, const char * argv[]) {
  void (^somethingToDoLater)();

  {
    NSObject *obj = [NSObject new];

    somethingToDoLater = ^{
      [obj self]; // Crashes here
    };
  }

  NSLog(@"test.");

  somethingToDoLater();
  return 0;
}
----

In the next commit, I remove all the dead code that results from this.

Once I put in the fixing commit I will bring back the tests that I deleted in
this commit.

rdar://14802782.
rdar://14868830.

llvm-svn: 189869
2013-09-03 22:40:54 +00:00
Nadav Rotem 5d78dba6d9 Enable late-vectorization by default.
This patch changes the default setting for the LateVectorization flag that controls where the loop-vectorizer is ran.

Perf gains:
SingleSource/Benchmarks/Shootout/matrix -37.33%
MultiSource/Benchmarks/PAQ8p/paq8p  -22.83%
SingleSource/Benchmarks/Linpack/linpack-pc  -16.22%
SingleSource/Benchmarks/Shootout-C++/ary3 -15.16%
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt -10.34%
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -7.12%

Regressions:
SingleSource/Benchmarks/Misc/lowercase  15.10%
MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt 13.18%
SingleSource/Benchmarks/Shootout-C++/matrix 8.27%
SingleSource/Benchmarks/CoyoteBench/lpbench 7.30%

llvm-svn: 189858
2013-09-03 21:33:17 +00:00
Matt Arsenault 3dfe54e954 Teach InstCombineLoadCast about address spaces.
This is another one that doesn't matter much,
but uses the right GEP index types in the first
place.

llvm-svn: 189854
2013-09-03 21:05:48 +00:00
Matt Arsenault e38e4cdc46 Use type form of getIntPtrType in alloca visitor.
This doesn't actually matter, since alloca is always
0 address space, but this is more consistent.

llvm-svn: 189853
2013-09-03 21:05:15 +00:00
Yi Jiang aeb5b46a85 In this patch we are trying to do two things:
1) If the width of vectorization list candidate is bigger than vector reg width, we will break it down to fit the vector reg.
2) We do not vectorize the width which is not power of two.

The performance result shows it will help some spec benchmarks. mesa improved 6.97% and ammp improved 1.54%. 

llvm-svn: 189830
2013-09-03 17:26:04 +00:00
Evgeniy Stepanov e95d37c81d [msan] Fix handling of select with struct arguments.
llvm-svn: 189796
2013-09-03 13:05:29 +00:00
Evgeniy Stepanov 566f591404 [msan] Fix select instrumentation.
Select condition shadow was being ignored resulting in false negatives.
This change OR-s sign-extended condition shadow into the result shadow.

llvm-svn: 189785
2013-09-03 10:04:11 +00:00
Benjamin Kramer 2702caad08 SimplifyLibCalls: When emitting an overloaded fp function check that it's available.
The existing code missed some edge cases when e.g. we're going to emit sqrtf but
only the availability of sqrt was checked. This happens on odd platforms like
windows.

llvm-svn: 189724
2013-08-31 18:19:35 +00:00
Bill Wendling 2865be79f8 Compulsive reformatting.
llvm-svn: 189697
2013-08-30 21:07:33 +00:00
Benjamin Kramer 010f108382 InstCombine: Check for zero shift amounts before subtracting one causing integer overflow.
PR17026. Also avoid undefined shifts and shift amounts larger than 64 bits
(those are always undef because we can't represent integer types that large).

llvm-svn: 189672
2013-08-30 14:35:35 +00:00
Bill Wendling 4c0d9adecb Random cleanup: No need to use a std::vector here, since createInternalizePass uses an ArrayRef.
llvm-svn: 189632
2013-08-30 00:48:37 +00:00
Hal Finkel 8e83820a04 Revert: r189565 - Add getUnrollingPreferences to TTI
Revert unintentional commit (of an unreviewed change).

Original commit message:

Add getUnrollingPreferences to TTI

Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 189566
2013-08-29 03:33:15 +00:00
Hal Finkel 63e6c0e9fb Add getUnrollingPreferences to TTI
Allow targets to customize the default behavior of the generic loop unrolling
transformation. This will be used by the PowerPC backend when targeting the A2
core (which is in-order with a deep pipeline), and using more aggressive
defaults is important.

llvm-svn: 189565
2013-08-29 03:29:57 +00:00
Nadav Rotem 4c459bcd47 Vectorizer/PassManager: I am working on moving the vectorizer out of the SCC passes. This patch moves the SLP-vectorizer and BB-vectorizer back into SCC passes for two reasons:
1. They are a kind of cannonicalization.
2. The performance measurements show that it is better to keep them in.

There should be no functional change if you are not enabling the LateVectorization mode.

llvm-svn: 189539
2013-08-28 23:40:29 +00:00
Matt Arsenault 38874731f6 Fix typo.
llvm-svn: 189524
2013-08-28 22:17:26 +00:00
Hal Finkel 6d09904cc9 Disable unrolling in the loop vectorizer when disabled in the pass manager
When unrolling is disabled in the pass manager, the loop vectorizer should also
not unroll loops. This will allow the -fno-unroll-loops option in Clang to
behave as expected (even for vectorizable loops). The loop vectorizer's
-force-vector-unroll option will (continue to) override the pass-manager
setting (including -force-vector-unroll=0 to force use of the internal
auto-selection logic).

In order to test this, I added a flag to opt (-disable-loop-unrolling) to force
disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also,
this fixes a small bug in opt where the loop vectorizer was enabled only after
the pass manager populated the queue of passes (the global_alias.ll test needed
a slight update to the RUN line as a result of this fix).

llvm-svn: 189499
2013-08-28 18:33:10 +00:00
Alexey Samsonov 9b7e2b555c 80 cols
llvm-svn: 189473
2013-08-28 11:25:12 +00:00
Peter Collingbourne 28a10aff48 DataFlowSanitizer: Implement trampolines for function pointers passed to custom functions.
Differential Revision: http://llvm-reviews.chandlerc.com/D1503

llvm-svn: 189408
2013-08-27 22:09:06 +00:00
Nadav Rotem 6b41f7cc4c Refactor 'vectorizeLoop' no functionality change.
This patch merges LoopVectorize of InnerLoopVectorizer and InnerLoopUnroller by adding checks for VF=1. This helps in erasing the Unroller code that is almost identical to the InnerLoopVectorizer code.

llvm-svn: 189391
2013-08-27 18:52:47 +00:00
Michael Gottesman eab9a7fa7c Fixed typo.
Noticed by Stephen Checkoway <s@pahtak.org>.

llvm-svn: 189312
2013-08-27 04:43:03 +00:00
Matt Arsenault ed9f76d37b Fix inserting instructions before last in bundle.
The builder inserts from before the insert point,
not after, so this would insert before the last
instruction in the bundle instead of after it.

I'm not sure if this can actually be a problem
with any of the current insertions.

llvm-svn: 189285
2013-08-26 23:08:37 +00:00
Nadav Rotem bdc9ff4498 LoopVectorize: Implement partial loop unrolling when vectorization is not profitable.
This patch enables unrolling of loops when vectorization is legal but not profitable.
We add a new class InnerLoopUnroller, that extends InnerLoopVectorizer and replaces some of the vector-specific logic with scalars.

This patch does not introduce any runtime regressions and improves the following workloads:

SingleSource/Benchmarks/Shootout/matrix -22.64%
SingleSource/Benchmarks/Shootout-C++/matrix -13.06%
External/SPEC/CINT2006/464_h264ref/464_h264ref  -3.99%
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding -1.95%

llvm-svn: 189281
2013-08-26 22:33:26 +00:00
Yi Jiang 7107d41574 test commit. Remove blank line
llvm-svn: 189265
2013-08-26 18:57:55 +00:00
Matt Arsenault bcd8c577d7 Fix unused variable in release build
llvm-svn: 189264
2013-08-26 18:38:29 +00:00
Matt Arsenault 8f21c838c0 Constify functions
llvm-svn: 189234
2013-08-26 17:56:38 +00:00
Matt Arsenault 39274be65f Vectorize starting from insertelements building a vector
llvm-svn: 189233
2013-08-26 17:56:35 +00:00
Matt Arsenault 8405888af1 Check if in set on insertion instead of separately
llvm-svn: 189179
2013-08-24 19:55:38 +00:00
Benjamin Kramer b12cf01908 Add a function object to compare the first or second component of a std::pair.
Replace instances of this scattered around the code base.

llvm-svn: 189169
2013-08-24 12:54:27 +00:00
Peter Collingbourne a96296f3ab DataFlowSanitizer: correctly combine labels in the case where they are equal.
llvm-svn: 189133
2013-08-23 18:45:06 +00:00
Evgeniy Stepanov d42863cc1f [msan] Fix handling of va_arg overflow area on x86_64.
The code was erroneously reading overflow area shadow from the TLS slot,
bypassing the local copy. Reading shadow directly from TLS is wrong, because
it can be overwritten by a nested vararg call, if that happens before va_start.

llvm-svn: 189104
2013-08-23 12:11:00 +00:00
Richard Sandiford 37cd6cfba2 Turn MipsOptimizeMathLibCalls into a target-independent scalar transform
...so that it can be used for z too.  Most of the code is the same.
The only real change is to use TargetTransformInfo to test when a sqrt
instruction is available.

The pass is opt-in because at the moment it only handles sqrt.

llvm-svn: 189097
2013-08-23 10:27:02 +00:00
Alexey Samsonov 6dae24df16 80 cols
llvm-svn: 189091
2013-08-23 07:42:51 +00:00
Michael Gottesman 823aaffd37 Update StripDeadDebugInfo to use DebugInfoFinder so that it is no longer stale to the point of not working and more resilient to debug info changes.
The current version of StripDeadDebugInfo became stale and no longer actually
worked since it was expecting an older version of debug info.

This patch updates it to use DebugInfoFinder and the modern DebugInfo classes as
much as possible to make it more redundent to such changes. Additionally, the
only place where that was avoided (the code where we replace the old sets with
the new), I call verify on the DIContextUnit implying that if the format changes
and my live set changes no longer make sense an assert will be hit. In order to
ensure that that occurs I have included a test case.

The actual stripping of the dead debug info follows the same strategy as was
used before in this class: find the live set and replace the old set in the
given compile unit (which may contain dead global variables/functions) with the
new live one.

llvm-svn: 189078
2013-08-23 00:23:24 +00:00
Peter Collingbourne 34f0c313e2 DataFlowSanitizer: Replace non-instrumented aliases of instrumented functions, and vice versa, with wrappers.
Differential Revision: http://llvm-reviews.chandlerc.com/D1442

llvm-svn: 189054
2013-08-22 20:08:15 +00:00
Peter Collingbourne 761a4fc475 DataFlowSanitizer: Factor the wrapper builder out to buildWrapperFunction.
Differential Revision: http://llvm-reviews.chandlerc.com/D1441

llvm-svn: 189053
2013-08-22 20:08:11 +00:00
Peter Collingbourne 59b1262d01 DataFlowSanitizer: Prefix the name of each instrumented function with "dfs$".
DFSan changes the ABI of each function in the module.  This makes it possible
for a function with the native ABI to be called with the instrumented ABI,
or vice versa, thus possibly invoking undefined behavior.  A simple way
of statically detecting instances of this problem is to prepend the prefix
"dfs$" to the name of each instrumented-ABI function.

This will not catch every such problem; in particular function pointers passed
across the instrumented-native barrier cannot be used on the other side.
These problems could potentially be caught dynamically.

Differential Revision: http://llvm-reviews.chandlerc.com/D1373

llvm-svn: 189052
2013-08-22 20:08:08 +00:00
Chandler Carruth 1c34afcb61 Teach the SLP vectorizer the correct way to check for consecutive access
using GEPs. Previously, it used a number of different heuristics for
analyzing the GEPs. Several of these were conservatively correct, but
failed to fall back to SCEV even when SCEV might have given a reasonable
answer. One was simply incorrect in how it was formulated.

There was good code already to recursively evaluate the constant offsets
in GEPs, look through pointer casts, etc. I gathered this into a form
code like the SLP code can use in a previous commit, which allows all of
this code to become quite simple.

There is some performance (compile time) concern here at first glance as
we're directly attempting to walk both pointers constant GEP chains.
However, a couple of thoughts:

1) The very common cases where there is a dynamic pointer, and a second
   pointer at a constant offset (usually a stride) from it, this code
   will actually not do any unnecessary work.

2) InstCombine and other passes work very hard to collapse constant
   GEPs, so it will be rare that we iterate here for a long time.

That said, if there remain performance problems here, there are some
obvious things that can improve the situation immensely. Doing
a vectorizer-pass-wide memoizer for each individual layer of pointer
values, their base values, and the constant offset is likely to be able
to completely remove redundant work and strictly limit the scaling of
the work to scrape these GEPs. Since this optimization was not done on
the prior version (which would still benefit from it), I've not done it
here. But if folks have benchmarks that slow down it should be straight
forward for them to add.

I've added a test case, but I'm not really confident of the amount of
testing done for different access patterns, strides, and pointer
manipulation.

llvm-svn: 189007
2013-08-22 12:45:17 +00:00
Matt Arsenault f599d97449 Teach LoopVectorize about address space sizes
llvm-svn: 188980
2013-08-22 02:42:55 +00:00
Michael Gottesman 0dc00645a2 Fixed typo.
llvm-svn: 188957
2013-08-21 22:53:54 +00:00
Michael Gottesman 0900993c3c Removed trailing whitespace.
llvm-svn: 188956
2013-08-21 22:53:29 +00:00
Yunzhong Gao 05efa23294 No functionality change.
Replace "(255 & value)" with "(0xFF & value)" to improve clarity.

llvm-svn: 188941
2013-08-21 22:11:15 +00:00
Matt Arsenault 745101d666 Teach InstCombine about address spaces
llvm-svn: 188926
2013-08-21 19:53:10 +00:00
Matt Arsenault 745832dcc9 Use attribute helper function
llvm-svn: 188916
2013-08-21 18:54:50 +00:00
Matt Arsenault 3c71dabd88 Fix typo
llvm-svn: 188915
2013-08-21 18:54:47 +00:00
Bill Wendling 707f601fa5 Move registering the execution of a basic block to the beginning rather than the end.
There are situations which can affect the correctness (or at least expectation)
of the gcov output. For instance, if a call to __gcov_flush() occurs within a
block before the execution count is registered and then the program aborts in
some way, then that block will not be marked as executed. This is not normally
what the user expects.

If we move the code that's registering when a block is executed to the
beginning, we can catch these types of situations.

PR16893

llvm-svn: 188849
2013-08-20 23:52:00 +00:00
Arnold Schwaighofer e1f3ab69d1 SLPVectorizer: Fix invalid iterator errors
Update iterator when the SLP vectorizer changes the instructions in the basic
block by restarting the traversal of the basic block.

Patch by Yi Jiang!

Fixes PR 16899.

llvm-svn: 188832
2013-08-20 21:21:45 +00:00
Hal Finkel 0c5c01aa4a Add a llvm.copysign intrinsic
This adds a llvm.copysign intrinsic; We already have Libfunc recognition for
copysign (which is turned into the FCOPYSIGN SDAG node). In order to
autovectorize calls to copysign in the loop vectorizer, we need a corresponding
intrinsic as well.

In addition to the expected changes to the language reference, the loop
vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into
an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a
few lists in LegalizeVector{Ops,Types} so that vector copysigns can be
expanded.

In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN
be Expand for vector types. This seems correct for all in-tree targets, and I
think is the right thing to do because, previously, there was no way to generate
vector-values FCOPYSIGN nodes (and most targets don't specify an action for
vector-typed FCOPYSIGN).

llvm-svn: 188728
2013-08-19 23:35:46 +00:00
Jakub Staszak b4eb6adebb Use pop_back_val() instead of both back() and pop_back().
llvm-svn: 188723
2013-08-19 22:47:55 +00:00
Matt Arsenault d79f7d9ea1 Teach InstCombine visitGetElementPtr about address spaces
llvm-svn: 188721
2013-08-19 22:17:40 +00:00
Matt Arsenault 98f34e3abe Cleanup visitGetElementPtr to make address space change easier
llvm-svn: 188720
2013-08-19 22:17:34 +00:00
Matt Arsenault 94a028aa43 commonPointerCast cleanups to make address space change easier
llvm-svn: 188719
2013-08-19 22:17:18 +00:00
Matt Arsenault 5aeae18e9d Revert non-test parts of r188507
Re-add the inboundsless tests I didn't add originally

llvm-svn: 188710
2013-08-19 21:40:31 +00:00
Peter Collingbourne aac65a313d Introduce SpecialCaseList::isIn overload for GlobalAliases.
Differential Revision: http://llvm-reviews.chandlerc.com/D1437

llvm-svn: 188688
2013-08-19 19:00:35 +00:00
Michael Kuperstein 4bb3f8f2e4 Adds missing TLI check for library simplification of
* pow(x, 0.5) -> fabs(sqrt(x)) 
* pow(2.0, x) -> exp2(x)

llvm-svn: 188656
2013-08-19 06:55:47 +00:00
Peter Collingbourne 03c3324ccd Remove SpecialCaseList::findCategory.
It turned out that I didn't need this for DFSan.

llvm-svn: 188646
2013-08-19 00:24:20 +00:00
Joerg Sonnenberger 8e3050db51 PR 16899: Do not modify the basic block using the iterator, but keep the
next value. This avoids crashes due to invalidation.

Patch by Joey Gouly.

llvm-svn: 188605
2013-08-17 11:04:47 +00:00
Jim Grosbach d0de8ace8a InstCombine: Use isAllOnesValue() instead of explicit -1.
llvm-svn: 188563
2013-08-16 17:03:36 +00:00
Jim Grosbach 20e3b9ac30 InstCombine: Simplify if(x!=0 && x!=-1).
When both constants are positive or both constants are negative,
InstCombine already simplifies comparisons like this, but when
it's exactly zero and -1, the operand sorting ends up reversed
and the pattern fails to match. Handle that special case.

Follow up for rdar://14689217

llvm-svn: 188512
2013-08-16 00:15:20 +00:00
Matt Arsenault 1de76773bc Don't do FoldCmpLoadFromIndexedGlobal for non inbounds GEPs
This path wasn't tested before without a datalayout,
so add some more tests and re-run with and without one.

llvm-svn: 188507
2013-08-15 23:11:07 +00:00
Matt Arsenault 5cae894a13 Fix spelling
llvm-svn: 188506
2013-08-15 23:11:03 +00:00
Yunzhong Gao c0c2b16932 Fixing a corner-case bug in strchr and strrchr lib call optimizations where
the input character is not converted to char before comparing with zero.

The patch was discussed in this thread:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130812/184069.html

llvm-svn: 188489
2013-08-15 20:58:59 +00:00
Peter Collingbourne 444c59e270 DataFlowSanitizer: Add a debugging feature to help us track nonzero labels.
Summary:
When the -dfsan-debug-nonzero-labels parameter is supplied, the code
is instrumented such that when a call parameter, return value or load
produces a nonzero label, the function __dfsan_nonzero_label is called.
The idea is that a debugger breakpoint can be set on this function
in a nominally label-free program to help identify any bugs in the
instrumentation pass causing labels to be introduced.

Reviewers: eugenis

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1405

llvm-svn: 188472
2013-08-15 18:51:12 +00:00
Mark Lacey a2626555f1 Fix small typo: s/succ/Succ/
llvm-svn: 188415
2013-08-14 22:11:42 +00:00
Peter Collingbourne 9d31d6f329 DataFlowSanitizer: Instrumentation for memset.
Differential Revision: http://llvm-reviews.chandlerc.com/D1395

llvm-svn: 188412
2013-08-14 20:51:38 +00:00
Peter Collingbourne 68162e7512 DataFlowSanitizer: greylist is now ABI list.
This replaces the old incomplete greylist functionality with an ABI
list, which can provide more detailed information about the ABI and
semantics of specific functions.  The pass treats every function in
the "uninstrumented" category in the ABI list file as conforming to
the "native" (i.e. unsanitized) ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those
functions will produce a warning message, as the labelling behaviour
of the function is unknown.  The other supported categories are
"functional", "discard" and "custom".

- "discard" -- This function does not write to (user-accessible) memory,
  and its return value is unlabelled.
- "functional" -- This function does not write to (user-accessible)
  memory, and the label of its return value is the union of the label of
  its arguments.
- "custom" -- Instead of calling the function, a custom wrapper __dfsw_F
  is called, where F is the name of the function.  This function may wrap
  the original function or provide its own implementation.

Differential Revision: http://llvm-reviews.chandlerc.com/D1345

llvm-svn: 188402
2013-08-14 18:54:12 +00:00
Chandler Carruth 2de93afee3 Fix a really terrifying but improbable bug in mem2reg. If you have seen
extremely subtle miscompilations (such as a load getting replaced with
the value stored *below* the load within a basic block) related to
promoting an alloca to an SSA value, there is the dim possibility that
you hit this. Please let me know if you won this unfortunate lottery.

The first half of mem2reg's core logic (as it is used both in the
standalone mem2reg pass and in SROA) builds up a mapping from
'Instruction *' to the index of that instruction within its basic block.
This allows quickly establishing which store dominate a particular load
even for large basic blocks. We cache this information throughout the
run of mem2reg over a function in order to amortize the cost of
computing it.

This is not in and of itself a strange pattern in LLVM. However, it
introduces a very important constraint: absolutely no instruction can be
deleted from the program without updating the mapping. Otherwise a newly
allocated instruction might get the same pointer address, and then end
up with a wrong index. Yes, LLVM routinely suffers from a *single
threaded* variant of the ABA problem. Most places in LLVM don't find
avoiding this an imposition because they don't both delete and create
new instructions iteratively, but mem2reg *loves* to do this... All the
time. Fortunately, the mem2reg code was really careful about updating
this cache to handle this eventuallity... except when it comes to the
debug declare intrinsic. Oops. The fix is to invalidate that pointer in
the cache when we delete it, the same as we do when deleting alloca
instructions and other instructions.

I've also caused the same bug in new code while working on a fix to
PR16867, so this seems to be a really unfortunate pattern. Hopefully in
subsequent patches the deletion of dead instructions can be consolidated
sufficiently to make it less likely that we'll see future occurences of
this bug.

Sorry for not having a test case, but I have literally no idea how to
reliably trigger this kind of thing. It may be single-threaded, but it
remains an ABA problem. It would require a really amazing number of
stars to align.

llvm-svn: 188367
2013-08-14 08:56:41 +00:00
Matt Arsenault 9e3a6ca698 Fix always creating GEP with i32 indices
Use the pointer size if datalayout is available.
Use i64 if it's not, which is consistent with what other
places do when the pointer size is unknown.

The test doesn't really test this in a useful way
since it will be transformed to that later anyway,
but this now tests it for non-zero arrays and when
datalayout isn't available. The cases in
visitGetElementPtrInst should save an extra re-visit to
the newly created GEP since it won't need to cleanup after
itself.

llvm-svn: 188339
2013-08-14 00:24:38 +00:00
Matt Arsenault fc00f7eabd Use type helper functions instead of cast
llvm-svn: 188338
2013-08-14 00:24:34 +00:00
Matt Arsenault 640ff9dbcf Use array initializer, space around operator
llvm-svn: 188337
2013-08-14 00:24:05 +00:00
Hal Finkel 1a61f621da BBVectorize: Add initial stores to the write set when tracking uses
When computing the use set of a store, we need to add the store to the write
set prior to iterating over later instructions. Otherwise, if there is a later
aliasing load of that store, that load will not be tagged as a use, and bad
things will happen.

trackUsesOfI still adds later dependent stores of an instruction to that
instruction's write set, but it never sees the original instruction, and so
when tracking uses of a store, the store must be added to the write set by the
caller.

Fixes PR16834.

llvm-svn: 188329
2013-08-13 23:34:32 +00:00
Nick Lewycky c7776f737f Revert r187191, which broke opt -mem2reg on the testcases included in PR16867.
However, opt -O2 doesn't run mem2reg directly so nobody noticed until r188146
when SROA started sending more things directly down the PromoteMemToReg path.

In order to revert r187191, I also revert dependent revisions r187296, r187322
and r188146. Fixes PR16867. Does not add the testcases from that PR, but both
of them should get added for both mem2reg and sroa when this revert gets
unreverted.

llvm-svn: 188327
2013-08-13 22:51:58 +00:00
Dmitry Vyukov 96a7084620 dfsan: fix lint warnings
llvm-svn: 188293
2013-08-13 16:52:41 +00:00
Arnold Schwaighofer 124ccf3ad1 Also remove logic in LateVectorize
llvm-svn: 188285
2013-08-13 16:12:04 +00:00
Arnold Schwaighofer c14b59d1a1 Remove logic that decides whether to vectorize or not depending on O-levels
I have moved this logic into clang and opt.

llvm-svn: 188281
2013-08-13 15:51:25 +00:00
Peter Collingbourne 8d642de169 Reapply r188119 now that the bug it exposed is fixed.
llvm-svn: 188217
2013-08-12 22:38:43 +00:00
Peter Collingbourne fb3a2b4f97 DataFlowSanitizer: fix a use-after-free. Spotted by libgmalloc.
llvm-svn: 188216
2013-08-12 22:38:39 +00:00
Bill Wendling e1eaecd528 Move stack protector names to the same place.
llvm-svn: 188198
2013-08-12 20:09:37 +00:00
Nadav Rotem e23147bbd4 Fix PR16797 - Support PHINodes with multiple inputs from the same basic block.
Do not generate new vector values for the same entries because we know that the incoming values
from the same block must be identical.

llvm-svn: 188185
2013-08-12 17:46:44 +00:00
Alexey Samsonov 15dc0af78b Remove unused SpecialCaseList constructors
llvm-svn: 188171
2013-08-12 11:50:44 +00:00
Alexey Samsonov e4b5fb8851 Add SpecialCaseList::createOrDie() factory and use it in sanitizer passes
llvm-svn: 188169
2013-08-12 11:46:09 +00:00
Alexey Samsonov 9e4fdd2656 Introduce factory methods for SpecialCaseList
Summary:
Doing work in constructors is bad: this change suggests to
call SpecialCaseList::create(Path, Error) instead of
"new SpecialCaseList(Path)". Currently the latter may crash with
report_fatal_error, which is undesirable - sometimes we want to report
the error to user gracefully - for example, if he provides an incorrect
file as an argument of Clang's -fsanitize-blacklist flag.

Reviewers: pcc

Reviewed By: pcc

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D1327

llvm-svn: 188156
2013-08-12 07:49:36 +00:00
Richard Sandiford feb34713d5 Fix big-endian handling of integer-to-vector bitcasts in InstCombine
These functions used to assume that the lsb of an integer corresponds
to vector element 0, whereas for big-endian it's the other way around:
the msb is in the first element and the lsb is in the last element.

Fixes MultiSource/Benchmarks/mediabench/gsm/toast for z.

llvm-svn: 188155
2013-08-12 07:26:09 +00:00
Chandler Carruth d7cd7e367e Re-instate r187323 which fast-tracks promotable allocas as soon as the
SROA-based analysis has enough information. This should work now that
both mem2reg *and* the SSAUpdater-based AllocaPromoter have been updated
to be able to promote the types of allocas that the SROA analysis
detects.

I've included tests for the AllocaPromoter that were only possible to
write once we fast-tracked promotable allocas without rewriting them.
This includes a test both for r187347 and r188145.

Original commit log for r187323:
"""
Now that mem2reg understands how to cope with a slightly wider set of uses of
an alloca, we can pre-compute promotability while analyzing an alloca for
splitting in SROA. That lets us short-circuit the common case of a bunch of
trivially promotable allocas. This cuts 20% to 30% off the run time of SROA for
typical frontend-generated IR sequneces I'm seeing. It gets the new SROA to
within 20% of ScalarRepl for such code. My current benchmark for these numbers
is PR15412, but it fits the general pattern of IR emitted by Clang so it should
be widely applicable.
"""

llvm-svn: 188146
2013-08-11 02:17:11 +00:00
Chandler Carruth c17283b407 Finish fixing the SSAUpdater-based AllocaPromoter strategy in SROA to cope with
the more general set of patterns that are now handled by mem2reg and that we
can detect quickly while doing SROA's initial analysis. Notably, this allows it
to promote through no-op bitcast and GEP sequences. A core part of the
SSAUpdater approach is the ability to test whether a particular instruction is
part of the set being promoted. Testing this becomes significantly more complex
in the world where the operand to every load and store isn't the alloca itself.
I ended up using the approach of walking up the def-chain until we find the
alloca. I benchmarked this against keeping a set of pointer operands and
keeping a set of the loads and stores we care about, and this one seemed faster
although the difference was very small.

No test case yet because currently the rewriting always "fixes" the inputs to
not require this. The next patch which re-enables early promotion of easy cases
in SROA will include a test case that specifically exercises this aspect of the
alloca promoter.

llvm-svn: 188145
2013-08-11 01:56:15 +00:00
Chandler Carruth 45b136f4cf Reformat some bits of AllocaPromoter and simplify the name and type of
our visiting datastructures in the AllocaPromoter/SSAUpdater path of
SROA. Also shift the order if clears around to be more consistent.

No functionality changed here, this is just a cleanup.

llvm-svn: 188144
2013-08-11 01:03:18 +00:00
Arnold Schwaighofer 3dcdb89d69 Revert r188119 "Kill some duplicated code for removing unreachable BBs."
It is breaking builbots with libgmalloc enabled on Mac OS X.

$ cd llvm ; mkdir release ; cd release
$ ../configure --enable-optimized —prefix=$PWD/install
$ make
$ make check
$ Release+Asserts/bin/llvm-lit -v --param use_gmalloc=1 --param \
  gmalloc_path=/usr/lib/libgmalloc.dylib \
  ../test/Instrumentation/DataFlowSanitizer/args-unreachable-bb.ll

llvm-svn: 188142
2013-08-10 20:16:06 +00:00
Michael Gottesman d6ce6cbdac [objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occurred.
I fixed the aforementioned problems that came up on some of the linux boxes.
Major thanks to Nick Lewycky for his help debugging!

rdar://14590914

llvm-svn: 188122
2013-08-09 23:22:27 +00:00
Peter Collingbourne 32090aba06 Kill some duplicated code for removing unreachable BBs.
This moves removeUnreachableBlocksFromFn from SimplifyCFGPass.cpp
to Utils/Local.cpp and uses it to replace the implementation of
llvm::removeUnreachableBlocks, which appears to do a strict subset
of what removeUnreachableBlocksFromFn does.

Differential Revision: http://llvm-reviews.chandlerc.com/D1334

llvm-svn: 188119
2013-08-09 22:47:24 +00:00
Peter Collingbourne ae66d57bcf DataFlowSanitizer: Remove unreachable BBs so IR continues to verify
under the args ABI.

Differential Revision: http://llvm-reviews.chandlerc.com/D1316

llvm-svn: 188113
2013-08-09 21:42:53 +00:00
Jakub Staszak 23ec6a97d1 Mark obviously const methods. Also use reference for parameters when possible.
llvm-svn: 188103
2013-08-09 20:53:48 +00:00
Michael Gottesman 6663c7d5fc Revert "[objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occured."
This reverts commit r187941.

The commit was passing on my os x box, but it is failing on some non-osx
platforms. I do not have time to look into it now, so I am reverting and will
recommit after I figure this out.

llvm-svn: 187946
2013-08-08 00:41:18 +00:00
Peter Collingbourne a5689e69af Fix ARM build.
llvm-svn: 187944
2013-08-08 00:15:27 +00:00
Michael Gottesman ddc89fcccd [objc-arc] Track if we encountered an additive overflow while computing {TopDown,BottomUp}PathCounts and do nothing if it occured.
rdar://14590914

llvm-svn: 187941
2013-08-07 23:56:41 +00:00
Michael Gottesman 0fecf98955 [objc-arc] Change 4 iterator methods which return const_iterators to be const methods.
llvm-svn: 187940
2013-08-07 23:56:34 +00:00
Hal Finkel 171817ee8a Add ISD::FROUND for libm round()
All libm floating-point rounding functions, except for round(), had their own
ISD nodes. Recent PowerPC cores have an instruction for round(), and so here I'm
adding ISD::FROUND so that round() can be custom lowered as well.

For the most part, this is straightforward. I've added an intrinsic
and a matching ISD node just like those for nearbyint() and friends. The
SelectionDAG pattern I've named frnd (because ISD::FP_ROUND has already claimed
fround).

This will be used by the PowerPC backend in a follow-up commit.

llvm-svn: 187926
2013-08-07 22:49:12 +00:00
Peter Collingbourne e5d5b0c71e DataFlowSanitizer; LLVM changes.
DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Differential Revision: http://llvm-reviews.chandlerc.com/D965

llvm-svn: 187923
2013-08-07 22:47:18 +00:00
Benjamin Kramer 6a4976d3e0 JumpThreading: Turn a select instruction into branching if it allows to thread one half of the select.
This is a common pattern coming out of simplifycfg generating gross code.

a:                                       ; preds = %entry
  %sel = select i1 %cmp1, double %add, double 0.000000e+00
  br label %b

b:
  %cond5 = phi double [ %sel, %a ], [ %sub, %entry ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

becomes

a:
  br i1 %cmp1, label %b, label %if.then

b:
  %cond5 = phi double [ %sub, %entry ], [ %add, %a ]
  %cmp6 = fcmp oeq double %cond5, 0.000000e+00
  br i1 %cmp6, label %if.then, label %if.end

Skipping block b completely if possible.

llvm-svn: 187880
2013-08-07 10:29:38 +00:00
Bill Wendling 58f8cef83b Change the linkage of these global values to 'internal'.
The globals being generated here were given the 'private' linkage type. However,
this caused them to end up in different sections with the wrong prefix. E.g.,
they would be in the __TEXT,__const section with an 'L' prefix instead of an 'l'
(lowercase ell) prefix.

The problem is that the linker will eat a literal label with 'L'. If a weak
symbol is then placed into the __TEXT,__const section near that literal, then it
cannot distinguish between the literal and the weak symbol.

Part of the problems here was introduced because the address sanitizer converted
some C strings into constant initializers with trailing nuls. (Thus putting them
in the __const section with the wrong prefix.) The others were variables that
the address sanitizer created but simply had the wrong linkage type.

llvm-svn: 187827
2013-08-06 22:52:42 +00:00
Arnold Schwaighofer a7cd6bf3bb LoopVectorize: Allow vectorization of loops with lifetime markers
Patch by Marc Jessome!

llvm-svn: 187825
2013-08-06 22:37:52 +00:00
Jakub Staszak 27da123d66 Adjust file to the coding standard.
llvm-svn: 187808
2013-08-06 17:03:42 +00:00
Serge Pavlov 71044cbe16 Unbreak Debug build on Windows
llvm-svn: 187786
2013-08-06 08:44:18 +00:00
Tom Stellard aa664d9b92 Factor FlattenCFG out from SimplifyCFG
Patch by: Mei Ye

llvm-svn: 187764
2013-08-06 02:43:45 +00:00
Matt Arsenault ff7dc7248e Fix missing -*- C++ -*-s
llvm-svn: 187758
2013-08-06 00:16:21 +00:00
Peter Collingbourne bace606657 Introduce an optimisation for special case lists with large numbers of literal entries.
Our internal regex implementation does not cope with large numbers
of anchors very efficiently.  Given a ~3600-entry special case list,
regex compilation can take on the order of seconds.  This patch solves
the problem for the special case of patterns matching literal global
names (i.e. patterns with no regex metacharacters).  Rather than
forming regexes from literal global name patterns, add them to
a StringSet which is checked before matching against the regex.
This reduces regex compilation time by an order of roughly thousands
when reading the aforementioned special case list, according to a
completely unscientific study.

No test cases.  I figure that any new tests for this code should
check that regex metacharacters are properly recognised.  However,
I could not find any documentation which documents the fact that the
syntax of global names in special case lists is based on regexes.
The extent to which regex syntax is supported in special case lists
should probably be decided on/documented before writing tests.

Differential Revision: http://llvm-reviews.chandlerc.com/D1150

llvm-svn: 187732
2013-08-05 17:48:04 +00:00
Alexey Samsonov f52b717db3 80-cols
llvm-svn: 187725
2013-08-05 13:19:49 +00:00
Nadav Rotem 5defea90e6 SLPVectorizer: Fix PR16777. PHInodes may use multiple extracted values that come from different blocks.
Thanks Alexey Samsonov.

llvm-svn: 187663
2013-08-02 18:40:24 +00:00
Alexey Samsonov 9096968de5 Fix dereferencing end iterator in SimplifyCFG. Patch by Ye Mei.
llvm-svn: 187646
2013-08-02 08:06:43 +00:00
Matt Arsenault 87dc60761f Teach getOrEnforceKnownAlignment about address spaces
llvm-svn: 187629
2013-08-01 22:42:18 +00:00
Nadav Rotem e4e6e9ed47 Move the optlevel check to the frontend.
llvm-svn: 187628
2013-08-01 22:41:58 +00:00
Nadav Rotem 9153b3871d Only enable SLP-vectorization on O3 builds.
llvm-svn: 187595
2013-08-01 18:28:15 +00:00
Nadav Rotem 25f15358d2 80-col
llvm-svn: 187535
2013-07-31 22:17:45 +00:00
Owen Anderson c7be519dc0 Preserve fast-math flags when folding (fsub x, (fneg y)) to (fadd x, y).
llvm-svn: 187462
2013-07-30 23:53:17 +00:00