Commit Graph

12479 Commits

Author SHA1 Message Date
Sanjay Patel db8e6f472e fix typo; NFC
llvm-svn: 225753
2015-01-13 01:51:52 +00:00
Sanjay Patel 06d5589a84 80-cols; NFC
llvm-svn: 225700
2015-01-12 21:21:28 +00:00
Duncan P. N. Exon Smith 118632dbf6 IR: Split GenericMDNode into MDTuple and UniquableMDNode
Split `GenericMDNode` into two classes (with more descriptive names).

  - `UniquableMDNode` will be a common subclass for `MDNode`s that are
    sometimes uniqued like constants, and sometimes 'distinct'.

    This class gets the (short-lived) RAUW support and related API.

  - `MDTuple` is the basic tuple that has always been returned by
    `MDNode::get()`.  This is as opposed to more specific nodes to be
    added soon, which have additional fields, custom assembly syntax,
    and extra semantics.

    This class gets the hash-related logic, since other sublcasses of
    `UniquableMDNode` may need to hash based on other fields.

To keep this diff from getting too big, I've added casts to `MDTuple`
that won't really scale as new subclasses of `UniquableMDNode` are
added, but I'll clean those up incrementally.

(No functionality change intended.)

llvm-svn: 225682
2015-01-12 20:09:34 +00:00
Sanjay Patel 5f1d9eaad3 GVN: propagate equalities for floating point compares
Allow optimizations based on FP comparison values in the same way
as integers. 

This resolves PR17713:
http://llvm.org/bugs/show_bug.cgi?id=17713

Differential Revision: http://reviews.llvm.org/D6911

llvm-svn: 225660
2015-01-12 19:29:48 +00:00
Timur Iskhodzhanov 00ede84084 [ASan] Move the shadow on Windows 32-bit from 0x20000000 to 0x40000000
llvm-svn: 225641
2015-01-12 17:38:58 +00:00
Ahmed Bougacha e03bef7543 [SimplifyLibCalls] Factor out fortified libcall handling.
This lets us remove CGP duplicate.

Differential Revision: http://reviews.llvm.org/D6541

llvm-svn: 225640
2015-01-12 17:22:43 +00:00
Ahmed Bougacha 6722f5e5b3 [SimplifyLibCalls] Factor out str/mem libcall optimizations.
Put them in a separate function, so we can reuse them to further
simplify fortified libcalls as well.

Differential Revision: http://reviews.llvm.org/D6540

llvm-svn: 225639
2015-01-12 17:20:06 +00:00
Ahmed Bougacha b7d8afb6c5 [SimplifyLibCalls] Factor out signature checks for fortifiable libcalls.
The checks are the same for fortified counterparts to the libcalls, so
we might as well do them in a single place.

Differential Revision: http://reviews.llvm.org/D6539

llvm-svn: 225638
2015-01-12 17:18:19 +00:00
Hal Finkel 38dd590861 [LoopUnroll] Fix the partial unrolling threshold for small loop sizes
When we compute the size of a loop, we include the branch on the backedge and
the comparison feeding the conditional branch. Under normal circumstances,
these don't get replicated with the rest of the loop body when we unroll. This
led to the somewhat surprising behavior that really small loops would not get
unrolled enough -- they could be unrolled more and the resulting loop would be
below the threshold, because we were assuming they'd take
(LoopSize * UnrollingFactor) instructions after unrolling, instead of
(((LoopSize-2) * UnrollingFactor)+2) instructions. This fixes that computation.

llvm-svn: 225565
2015-01-10 00:30:55 +00:00
Michael Zolotukhin d9ade185b9 Update comment.
llvm-svn: 225553
2015-01-09 22:15:06 +00:00
Hans Wennborg dcc6e5bc03 SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210)
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.

llvm-svn: 225552
2015-01-09 22:13:31 +00:00
Michael Zolotukhin 1c38bc12de Remove duplicating code. NFC.
The removed condition is checked in the previous loop.

llvm-svn: 225542
2015-01-09 20:36:19 +00:00
Tim Northover eb16112e97 Re-reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

It's not really expected to stick around, last time it provoked a weird LTO
build failure that I can't reproduce now, and the bot logs are long gone. I'll
re-revert it if the failures recur.

Original description: Perform Scalar PRE on gep indices that feed loads before
doing Load PRE.

llvm-svn: 225536
2015-01-09 19:19:56 +00:00
Suyog Sarda 85d0473650 Assumption that "VectorizedValue" will always be an Instruction is not correct.
It can be a constant or a vector argument.

ex :

define i32 @hadd(<4 x i32> %a) #0 {
entry:
  %vecext = extractelement <4 x i32> %a, i32 0
  %vecext1 = extractelement <4 x i32> %a, i32 1
  %add = add i32 %vecext, %vecext1
  %vecext2 = extractelement <4 x i32> %a, i32 2
  %add3 = add i32 %add, %vecext2
  %vecext4 = extractelement <4 x i32> %a, i32 3
  %add5 = add i32 %add3, %vecext4
  ret i32 %add5
}

llvm-svn: 225517
2015-01-09 10:23:48 +00:00
Philip Reames 567feb98f0 [Refactor] Have getNonLocalPointerDependency take the query instruction
Previously, MemoryDependenceAnalysis::getNonLocalPointerDependency was taking a list of properties about the instruction being queried. Since I'm about to need one more property to be passed down through the infrastructure - I need to know a query instruction is non-volatile in an inner helper - fix the interface once and for all.

I also added some assertions and behaviour clarifications around volatile and ordered field accesses. At the moment, this is mostly to document expected behaviour. The only non-standard instructions which can currently reach this are atomic, but unordered, loads and stores. Neither ordered or volatile accesses can reach here.

The call in GVN is protected by an isSimple check when it first considers the load. The calls in MemDepPrinter are protected by isUnordered checks. Both utilities also check isVolatile for loads and stores.

llvm-svn: 225481
2015-01-09 00:04:22 +00:00
Duncan P. N. Exon Smith 953e1a48f0 Utils: Keep distinct MDNodes distinct in MapMetadata()
Create new copies of distinct `MDNode`s instead of following the
uniquing `MDNode` logic.

Just like self-references (or other cycles), `MapMetadata()` creates a
new node.  In practice most calls use `RF_NoModuleLevelChanges`, in
which case nothing is duplicated anyway.

Part of PR22111.

llvm-svn: 225476
2015-01-08 22:42:30 +00:00
Matt Arsenault b935d9df4c Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
Adrian Prantl 2561bb8831 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
This reverts commit r225379 while investigating an assertion failure reported
by Alexey.

llvm-svn: 225424
2015-01-08 02:02:00 +00:00
Adrian Prantl 72b8ee708f Reapply: Teach SROA how to update debug info for fragmented variables.
The two buildbot failures were addressed in LLVM r225378 and CFE r225359.

This rapplies commit 225272 without modifications.

llvm-svn: 225379
2015-01-07 20:52:22 +00:00
David Majnemer 5310c1e954 Analysis: Reformulate WillNotOverflowUnsignedAdd for reusability
WillNotOverflowUnsignedAdd's smarts will live in ValueTracking as
computeOverflowForUnsignedAdd.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225329
2015-01-07 00:39:50 +00:00
David Majnemer 3b83b3fa0b InstCombine: Just a small tidy-up
llvm-svn: 225328
2015-01-07 00:39:42 +00:00
Adrian Prantl 52f943b536 Revert "Reapply: Teach SROA how to update debug info for fragmented variables."
because of a tsan buildbot failure.
This reverts commit 225272.

Fix should be coming soon.

llvm-svn: 225288
2015-01-06 19:47:27 +00:00
Sanjoy Das 7c0ce26614 This patch teaches IndVarSimplify to add nuw and nsw to certain kinds
of operations that provably don't overflow. For example, we can prove
%civ.inc below does not sign-overflow. With this change,
IndVarSimplify changes %civ.inc to an add nsw.

  define i32 @foo(i32* %array, i32* %length_ptr, i32 %init) {
   entry:
    %length = load i32* %length_ptr, !range !0
    %len.sub.1 = sub i32 %length, 1
    %upper = icmp slt i32 %init, %len.sub.1
    br i1 %upper, label %loop, label %exit
  
   loop:
    %civ = phi i32 [ %init, %entry ], [ %civ.inc, %latch ]
    %civ.inc = add i32 %civ, 1
    %cmp = icmp slt i32 %civ.inc, %length
    br i1 %cmp, label %latch, label %break
  
   latch:
    store i32 0, i32* %array
    %check = icmp slt i32 %civ.inc, %len.sub.1
    br i1 %check, label %loop, label %break
  
   break:
    ret i32 %civ.inc
  
   exit:
    ret i32 42
  }

Differential Revision: http://reviews.llvm.org/D6748

llvm-svn: 225282
2015-01-06 19:02:56 +00:00
Adrian Prantl 8335a5724a Reapply: Teach SROA how to update debug info for fragmented variables.
This also rolls in the changes discussed in http://reviews.llvm.org/D6766.
Defers migrating the debug info for new allocas until after all partitions
are created.

Thanks to Chandler for reviewing!

llvm-svn: 225272
2015-01-06 17:14:10 +00:00
Matt Arsenault 55e7312cd8 Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.

Also fold cases that aren't integers to true / false.

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer 9b6b822814 InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
Saleem Abdulrasool 150a1dc5c2 SymbolRewriter: use iplist::splice
The swap implementation for iplist is currently unsupported.  Simply splice the
old list into place, which achieves the same purpose.  This is needed in order
to thread the -frewrite-map-file frontend option correctly.  NFC.

llvm-svn: 225186
2015-01-05 17:56:32 +00:00
Saleem Abdulrasool d37ce30888 SymbolRewriter: 80-column
Wrap a couple of lines.  NFC.

llvm-svn: 225185
2015-01-05 17:56:29 +00:00
Craig Topper d3c02f177a Replace several 'assert(false' with 'llvm_unreachable' or fold a condition into the assert.
llvm-svn: 225160
2015-01-05 10:15:49 +00:00
Jiangning Liu 40c1b35292 Fixed a bug in memory dependence checking module of loop vectorization. The following loop should not be vectorized with current algorithm.
{code}
// loop body
   ... = a[i]          (1)
    ... = a[i+1]       (2)
 .......
a[i+1] = ....          (3)
   a[i] = ...          (4)
{code}

The algorithm tries to collect memory access candidates from AliasSetTracker, and then check memory dependences one another. The memory accesses are unique in AliasSetTracker, and a single memory access in AliasSetTracker may map to multiple entries in AccessAnalysis, which could cover both 'read' and 'write'. Originally the algorithm only checked 'write' entry in Accesses if only 'write' exists. This is incorrect and the consequence is it ignored all read access, and finally some RAW and WAR dependence are missed.

For the case given above, if we ignore two reads, the dependence between (1) and (3) would not be able to be captured, and finally this loop will be incorrectly vectorized.

The fix simply inserts a new loop to find all entries in Accesses. Since it will skip most of all other memory accesses by checking the Value pointer at the very beginning of the loop, it should not increase compile-time visibly.

llvm-svn: 225159
2015-01-05 10:08:58 +00:00
Chandler Carruth 73b0164fe5 [SROA] Apply a somewhat heavy and unpleasant hammer to fix PR22093, an
assert out of the new pre-splitting in SROA.

This fix makes the code do what was originally intended -- when we have
a store of a load both dealing in the same alloca, we force them to both
be pre-split with identical offsets. This is really quite hard to do
because we can keep discovering problems as we go along. We have to
track every load over the current alloca which for any resaon becomes
invalid for pre-splitting, and go back to remove all stores of those
loads. I've included a couple of test cases derived from PR22093 that
cover the different ways this can happen. While that PR only really
triggered the first of these two, its the same fundamental issue.

The other challenge here is documented in a FIXME now. We end up being
quite a bit more aggressive for pre-splitting when loads and stores
don't refer to the same alloca. This aggressiveness comes at the cost of
introducing potentially redundant loads. It isn't clear that this is the
right balance. It might be considerably better to require that we only
do pre-splitting when we can presplit every load and store involved in
the entire operation. That would give more consistent if conservative
results. Unfortunately, it requires a non-trivial change to the actual
pre-splitting operation in order to correctly handle cases where we end
up pre-splitting stores out-of-order. And it isn't 100% clear that this
is the right direction, although I'm starting to suspect that it is.

llvm-svn: 225149
2015-01-05 04:17:53 +00:00
Chandler Carruth 66b3130cda [PM] Split the AssumptionTracker immutable pass into two separate APIs:
a cache of assumptions for a single function, and an immutable pass that
manages those caches.

The motivation for this change is two fold. Immutable analyses are
really hacks around the current pass manager design and don't exist in
the new design. This is usually OK, but it requires that the core logic
of an immutable pass be reasonably partitioned off from the pass logic.
This change does precisely that. As a consequence it also paves the way
for the *many* utility functions that deal in the assumptions to live in
both pass manager worlds by creating an separate non-pass object with
its own independent API that they all rely on. Now, the only bits of the
system that deal with the actual pass mechanics are those that actually
need to deal with the pass mechanics.

Once this separation is made, several simplifications become pretty
obvious in the assumption cache itself. Rather than using a set and
callback value handles, it can just be a vector of weak value handles.
The callers can easily skip the handles that are null, and eventually we
can wrap all of this up behind a filter iterator.

For now, this adds boiler plate to the various passes, but this kind of
boiler plate will end up making it possible to port these passes to the
new pass manager, and so it will end up factored away pretty reasonably.

llvm-svn: 225131
2015-01-04 12:03:27 +00:00
David Majnemer 087dc8b831 InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value, this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
Kostya Serebryany d421db05bb [asan] simplify the tracing code, make it use the same guard variables as coverage
llvm-svn: 225103
2015-01-03 00:54:43 +00:00
David Majnemer c8a576b5c0 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
David Majnemer 491331aca8 Analysis: Reformulate WillNotOverflowUnsignedMul for reusability
WillNotOverflowUnsignedMul's smarts will live in ValueTracking as
computeOverflowForUnsignedMul.  It now returns a tri-state result:
never overflows, always overflows and sometimes overflows.

llvm-svn: 225076
2015-01-02 07:29:43 +00:00
Chandler Carruth 24ac830d7c [SROA] Teach SROA to be more aggressive in splitting now that we have
a pre-splitting pass over loads and stores.

Historically, splitting could cause enough problems that I hamstrung the
entire process with a requirement that splittable integer loads and
stores must cover the entire alloca. All smaller loads and stores were
unsplittable to prevent chaos from ensuing. With the new pre-splitting
logic that does load/store pair splitting I introduced in r225061, we
can now very nicely handle arbitrarily splittable loads and stores. In
order to fully benefit from these smarts, we need to mark all of the
integer loads and stores as splittable.

However, we don't actually want to rewrite partitions with all integer
loads and stores marked as splittable. This will fail to extract scalar
integers from aggregates, which is kind of the point of SROA. =] In
order to resolve this, what we really want to do is only do
pre-splitting on the alloca slices with integer loads and stores fully
splittable. This allows us to uncover all non-integer uses of the alloca
that would benefit from a split in an integer load or store (and where
introducing the split is safe because it is just memory transfer from
a load to a store). Once done, we make all the non-whole-alloca integer
loads and stores unsplittable just as they have historically been,
repartition and rewrite.

The result is that when there are integer loads and stores anywhere
within an alloca (such as from a memcpy of a sub-object of a larger
object), we can split them up if there are non-integer components to the
aggregate hiding beneath. I've added the challenging test cases to
demonstrate how this is able to promote to scalars even a case where we
have even *partially* overlapping loads and stores.

This restores the single-store behavior for small arrays of i8s which is
really nice. I've restored both the little endian testing and big endian
testing for these exactly as they were prior to r225061. It also forced
me to be more aggressive in an alignment test to actually defeat SROA.
=] Without the added volatiles there, we actually split up the weird i16
loads and produce nice double allocas with better alignment.

This also uncovered a number of bugs where we failed to handle
splittable load and store slices which didn't have a begininng offset of
zero. Those fixes are included, and without them the existing test cases
explode in glorious fireworks. =]

I've kept support for leaving whole-alloca integer loads and stores as
splittable even for the purpose of rewriting, but I think that's likely
no longer needed. With the new pre-splitting, we might be able to remove
all the splitting support for loads and stores from the rewriter. Not
doing that in this patch to try to isolate any performance regressions
that causes in an easy to find and revert chunk.

llvm-svn: 225074
2015-01-02 03:55:54 +00:00
Chandler Carruth 5986b541d4 [SROA] Make the computation of adjusted pointers not leak GEP
instructions.

I noticed this when working on dialing up how aggressively we can
pre-split loads and stores. My test case wasn't passing because dead
GEPs into the allocas persisted when they were built by this routine.
This isn't terribly harmful, we still rewrote and promoted the alloca
and I can't conceive of how to cause this to happen in a case where we
will keep the exact same alloca but rewrite and promote the uses of it.
If that ever happened, we'd get an assert out of mem2reg.

So I don't have a direct test case yet, but the subsequent commit's test
case wouldn't pass without this. There are other problems fixed by this
patch that I spotted purely by inspection such as the fact that
getAdjustedPtr could have actually deleted dead base pointers. I don't
know how to get a base pointer to go into getAdjustedPtr today, so
I think this bug could never have manifested (and I certainly can't
write a test case for it) but, it wasn't the intent of the code. The
code really just wanted to GC the new instructions built. That can be
done more directly by comparing with the base pointer which is the only
non-new instruction that this code can return.

llvm-svn: 225073
2015-01-02 02:47:38 +00:00
Chandler Carruth 29c22fae46 [SROA] Fix the loop exit placement to be prior to indexing the splits
array. This prevents it from walking out of bounds on the splits array.

Bug found with the existing tests by ASan and by the MSVC debug build.

llvm-svn: 225069
2015-01-02 00:10:22 +00:00
Chandler Carruth c39eaa5041 [SROA] Fix two total think-os in r225061 that should have been caught on
a +asserts bootstrap, but my bootstrap had asserts off. Oops.

Anyways, in some places it is reasonable to cast (as a sanity check) the
pointer operand to a load or store to an instruction within SROA --
namely when the pointer operand is expected to be derived from an
alloca, and thus always an instruction. However, the pre-splitting code
also deals with loads and stores to non-alloca pointers and there we
need to just use the Value*. Nothing about the code relied on the
instruction cast, it was only there essentially as an invariant
assertion. Remove the two that don't actually hold.

This should fix the proximate issue in PR22080, but I'm also doing an
asserts bootstrap myself to see if there are other issues lurking.

I'll craft a reduced test case in a moment, but I wanted to get the tree
healthy as quickly as possible.

llvm-svn: 225068
2015-01-01 23:26:16 +00:00
Chandler Carruth 6044c0bc78 [SROA] Switch to using a more direct debug logging technique in one part
of my new load and store splitting, and fix a bug where it logged
a totally irrelevant slice rather than the actual slice in question.

The logging here previously worked because we used to place new slices
onto the back of the core sequence, but that caused other problems.
I updated the actual code to store new slices in their own vector but
didn't update the logging. There isn't a good way to reuse the logging
any more, and frankly it wasn't needed. We can directly log this bit
more easily.

llvm-svn: 225063
2015-01-01 12:56:47 +00:00
Chandler Carruth 994cde8869 [SROA] Fix formatting with clang-format which I managed to fail to do
prior to committing r225061. Sorry for that.

llvm-svn: 225062
2015-01-01 12:01:03 +00:00
Chandler Carruth 0715cba02d [SROA] Teach SROA how to much more intelligently handle split loads and
stores.

When there are accesses to an entire alloca with an integer
load or store as well as accesses to small pieces of the alloca, SROA
splits up the large integer accesses. In order to do that, it uses bit
math to merge the small accesses into large integers. While this is
effective, it produces insane IR that can cause significant problems in
the rest of the optimizer:

- It can cause load and store mismatches with GVN on the non-alloca side
  where we end up loading an i64 (or some such) rather than loading
  specific elements that are stored.
- We can't always get rid of the integer bit math, which is why we can't
  always fix the loads and stores to work well with GVN.
- This is especially bad when we have operations that mix poorly with
  integer bit math such as floating point operations.
- It will block things like the vectorizer which might be able to handle
  the scalar stores that underly the aggregate.

At the same time, we can't just directly split up these loads and stores
in all cases. If there is actual integer arithmetic involved on the
values, then using integer bit math is actually the perfect lowering
because we can often combine it heavily with the surrounding math.

The solution this patch provides is to find places where SROA is
partitioning aggregates into small elements, and look for splittable
loads and stores that it can split all the way to some other adjacent
load and store. These are uniformly the cases where failing to split the
loads and stores hurts the optimizer that I have seen, and I've looked
extensively at the code produced both from more and less aggressive
approaches to this problem.

However, it is quite tricky to actually do this in SROA. We may have
loads and stores to the same alloca, or other complex patterns that are
hard to handle. This complexity leads to the somewhat subtle algorithm
implemented here. We have to do this entire process as a separate pass
over the partitioning of the alloca, and split up all of the loads prior
to splitting the stores so that we can handle safely the cases of
overlapping, including partially overlapping, loads and stores to the
same alloca. We also have to reconstitute the post-split slice
configuration so we can avoid iterating again over all the alloca uses
(the slow part of SROA). But we also have to ensure that when we split
up loads and stores to *other* allocas, we *do* re-iterate over them in
SROA to adapt to the more refined partitioning now required.

With this, I actually think we can fix a long-standing TODO in SROA
where I avoided splitting as many loads and stores as probably should be
splittable. This limitation historically mitigated the fallout of all
the bad things mentioned above. Now that we have more intelligent
handling, I plan to remove the FIXME and more aggressively mark integer
loads and stores as splittable. I'll do that in a follow-up patch to
help with bisecting any fallout.

The net result of this change should be more fine-grained and accurate
scalars being formed out of aggregates. At the very least, Clang now
generates perfect code for this high-level test case using
std::complex<float>:

  #include <complex>

  void g1(std::complex<float> &x, float a, float b) {
    x += std::complex<float>(a, b);
  }
  void g2(std::complex<float> &x, float a, float b) {
    x -= std::complex<float>(a, b);
  }

  void foo(const std::complex<float> &x, float a, float b,
           std::complex<float> &x1, std::complex<float> &x2) {
    std::complex<float> l1 = x;
    g1(l1, a, b);
    std::complex<float> l2 = x;
    g2(l2, a, b);
    x1 = l1;
    x2 = l2;
  }

This code isn't just hypothetical either. It was reduced out of the hot
inner loops of essentially every part of the Eigen math library when
using std::complex<float>. Those loops would consistently and
pervasively hop between the floating point unit and the integer unit due
to bit math extraction and insertion of floating point values that were
"stored" in a 64-bit integer register around the loop backedge.

So far, this change has passed a bootstrap and I have done some other
testing and so far, no issues. That doesn't mean there won't be though,
so I'll be prepared to help with any fallout. If you performance swings
in particular, please let me know. I'm very curious what all the impact
of this change will be. Stay tuned for the follow-up to also split more
integer loads and stores.

llvm-svn: 225061
2015-01-01 11:54:38 +00:00
Sanjay Patel e68f71574f InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer f89dc3edc9 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we an prove
there is no signed overflow and if the comparison itself is signed.

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Kostya Serebryany aa185bfc4b [asan] change _sanitizer_cov_module_init to accept int* instead of int**
llvm-svn: 224999
2014-12-30 19:29:28 +00:00
Elena Demikhovsky 84d1997b95 Some code improvements in Masked Load/Store.
No functional changes.

llvm-svn: 224986
2014-12-30 14:28:14 +00:00
Philip Reames 9db26ffc9a Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames b35f46ce06 Refine the notion of MayThrow in LICM to include a header specific version
In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loops header block - will not be lifted by subsequent LICM runs.

define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
; CHECK-LABEL: nothrow_header
; CHECK-LABEL: entry
; CHECK: %div = udiv i64 %x, %y
; CHECK-LABEL: loop
; CHECK: call void @use(i64 %div)
entry:
  br label %loop
loop: ; preds = %entry, %for.inc
  %div = udiv i64 %x, %y
  br i1 %cond, label %loop-if, label %exit
loop-if:
  call void @use(i64 %div)
  br label %loop
exit:
  ret void
}

The current patch really only helps with non-memory instructions (i.e. divs, etc..) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load.  The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invarant.load metadata of tbaa 'is constant memory' metadata.

Differential Revision: http://reviews.llvm.org/D6725

llvm-svn: 224965
2014-12-29 23:00:57 +00:00
Philip Reames 5ad26c353c Loading from null is valid outside of addrspace 0
This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces.

We really should introduce a hook to control this property on a per target per address space basis.  We may be loosing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer b1296ec0fd InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if there are bitwidth, or more, leading
zero bits between the two operands.

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer 54c2ca2539 InstCombe: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Elena Demikhovsky fb81b93e17 Masked Load/Store - Changed the order of parameters in intrinsics.
No functional changes.
The documentation is coming.

llvm-svn: 224829
2014-12-25 07:49:20 +00:00
Chandler Carruth ffb7ce56a6 [SROA] Update the documentation and names for accessing the slices
within a partition of an alloca in SROA.

This reflects the fact that the organization of the slices isn't really
ideal for analysis, but is the naive way in which the slices are
available while we're processing them in the core partitioning
algorithm.

It is possible we could improve matters, and I've left a FIXME with
one of my ideas for how to do this, but it is a lot of work, the benefit
is somewhat minor, and it isn't clear that it would be strictly better.
=/ Not really satisfying, but I'm out of really good ideas.

This also improves one place where the debug logging failed to mark some
split partitions. Now we log in one place, slightly later, and with
accurate information about whether the slice is split by the partition
being rewritten.

llvm-svn: 224800
2014-12-24 01:48:09 +00:00
Chandler Carruth 5031bbe86a [SROA] Refactor the integer and vector promotion testing logic to
operate in terms of the new Partition class, and generally have a more
clear set of arguments. No functionality changed.

The most notable improvements here are consistently using the
terminology of 'partition' for a collection of slices that will be
rewritten together and 'slice' for a region of an alloca that is used by
a particular instruction.

This also makes it more clear that the split things are actually slices
as well, just ones that will be split by the proposed partition.

This doesn't yet address the confusing aspects of the partition's
interface where slices that will be split by the partition and start
prior to the partition are accesssed via Partition::splitSlices() while
the core range of slices exposed by a Partition includes both unsplit
slices and slices which will be split by the end, but started within the
offset range of the partition. This is particularly hard to address
because the algorithm which computes partitions quite literally doesn't
know which slices these will end up being until too late. I'm looking at
whether I can fix that or not, but I'm not optimistic. I'll update the
comments and/or names to further explain this either way. I've also
added one FIXME in this patch relating to this confusion so that I don't
forget about it.

llvm-svn: 224798
2014-12-24 01:05:14 +00:00
Kostya Serebryany 9fdeb37bd3 [asan] change the coverage collection scheme so that we can easily emit coverage for the entire process as a single bit set, and if coverage_bitset=1 actually emit that bitset
llvm-svn: 224789
2014-12-23 22:32:17 +00:00
Michael Liao 5313da3263 [SimplifyCFG] Revise common code sinking
- Fix the case where more than 1 common instructions derived from the same
  operand cannot be sunk. When a pair of value has more than 1 derived values
  in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Michael Kuperstein 0bf33ffde4 Remove a bad cast in CloneModule()
A cast that was introduced in r209007 was accidentally left in after the changes made to GlobalAlias rules in r210062. This crashes if the aliasee is a now-leggal ConstantExpr.

llvm-svn: 224756
2014-12-23 08:23:45 +00:00
Chandler Carruth c7d1e24b34 Revert r224739: Debug info: Teach SROA how to update debug info for
fragmented variables.

This caused codegen to start crashing when we built somewhat large
programs with debug info and optimizations. 'check-msan' hit in, and
I suspect a bootstrap would as well. I mailed a test case to the
review thread.

llvm-svn: 224750
2014-12-23 02:58:14 +00:00
David Blaikie ea37c1173e Remove dynamic allocation/indirection from GCOVBlocks owned by GCOVFunction
Since these are all created in the DenseMap before they are referenced,
there's no problem with pointer validity by the time it's required. This
removes another use of DeleteContainerSeconds/manual memory management
which I'm cleaning up from time to time.

llvm-svn: 224744
2014-12-22 23:12:42 +00:00
Chandler Carruth e2f66ceed9 [SROA] Lift the logic for traversing the alloca slices one partition at
a time into a partition iterator and a Partition class.

There is a lot of knock-on simplification that this enables, largely
stemming from having a Partition object to refer to in lots of helpers.
I've only done a minimal amount of that because enoguh stuff is changing
as-is in this commit.

This shouldn't change any observable behavior. I've worked hard to
preserve the *exact* traversal semantics which were originally present
even though some of them make no sense. I'll be changing some of this in
subsequent commits now that the logic is carefully factored into
a reusable place.

The primary motivation for this change is to break the rewriting into
phases in order to support more intelligent rewriting. For example, I'm
planning to change how split loads and stores are rewritten to remove
the significant overuse of integer bit packing in the resulting code and
allow more effective secondary splitting of aggregates. For any of this
to work, they have to share the exact traversal logic.

llvm-svn: 224742
2014-12-22 22:46:00 +00:00
Bruno Cardoso Lopes bad65c3b70 [LCSSA] Handle PHI insertion in disjoint loops
Take two disjoint Loops L1 and L2.

LoopSimplify fails to simplify some loops (e.g. when indirect branches
are involved). In such situations, it can happen that an exit for L1 is
the header of L2. Thus, when we create PHIs in one of such exits we are
also inserting PHIs in L2 header.

This could break LCSSA form for L2 because these inserted PHIs can also
have uses in L2 exits, which are never handled in the current
implementation. Provide a fix for this corner case and test that we
don't assert/crash on that.

Differential Revision: http://reviews.llvm.org/D6624

rdar://problem/19166231

llvm-svn: 224740
2014-12-22 22:35:46 +00:00
Adrian Prantl a47ace5901 Debug info: Teach SROA how to update debug info for fragmented variables.
This allows us to generate debug info for extremely advanced code such as

  typedef struct { long int a; int b;} S;

  int foo(S s) {
    return s.b;
  }

which at -O1 on x86_64 is codegen'd into

  define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
    ret i32 %s.coerce1, !dbg !24
  }

with this patch we emit the following debug info for this

  TAG_formal_parameter [3]
    AT_location( 0x00000000
                 0x0000000000000000 - 0x0000000000000006: rdi, piece 0x00000008, rsi, piece 0x00000004
                 0x0000000000000006 - 0x0000000000000008: rdi, piece 0x00000008, rax, piece 0x00000004 )
                 AT_name( "s" )
                 AT_decl_file( "/Volumes/Data/llvm/_build.ninja.release/test.c" )

Thanks to chandlerc, dblaikie, and echristo for their feedback on all
previous iterations of this patch!

llvm-svn: 224739
2014-12-22 22:26:00 +00:00
David Majnemer b0362e4ee6 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
Chandler Carruth 113dc64c67 [SROA] Run clang-format over the entire SROA pass as I wrote it before
much of the glory of clang-format, and now any time I touch it I risk
introducing formatting changes as part of a functional commit.

Also, clang-format is *way* better at formatting my code than I am.
Most of this is a huge improvement although I reverted a couple of
places where I hit a clang-format bug with lambdas that has been filed
but not (fully) fixed.

llvm-svn: 224666
2014-12-20 02:39:18 +00:00
Tilmann Scheller be98f3c4ec [BBVectorize] Remove two more redundant assignments.
Found by the Clang static analyzer.

llvm-svn: 224590
2014-12-19 17:21:38 +00:00
Tilmann Scheller 945ce0ae00 [BBVectorize] Remove redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 224589
2014-12-19 17:13:12 +00:00
Bruno Cardoso Lopes f6cf8ad4e5 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Tilmann Scheller b811030b47 [LoopVectorize] Remove redundant assignment.
Found by the Clang static analyzer.

llvm-svn: 224587
2014-12-19 17:02:31 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes 3be15b2fa6 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes c9005f2f2b [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Duncan P. N. Exon Smith 46d7af5729 Rename MapValue(Metadata*) to MapMetadata()
Instead of reusing the name `MapValue()` when mapping `Metadata`, use
`MapMetadata()`.  The old name doesn't make much sense after the
`Metadata`/`Value` split.

llvm-svn: 224566
2014-12-19 06:06:18 +00:00
Sanjay Patel c242dbb3b6 fix formatting; NFC
llvm-svn: 224542
2014-12-18 21:11:09 +00:00
Viktor Kutuzov b4ffb5d5e9 [Msan] Generalize instrumentation code to support FreeBSD mapping
Differential Revision: http://reviews.llvm.org/D6666

llvm-svn: 224514
2014-12-18 12:12:59 +00:00
Chandler Carruth 68ea415d04 [SROA] Cleanup - remove the use of std::mem_fun_ref nonsense and use
a lambda now that we have them.

llvm-svn: 224500
2014-12-18 05:19:47 +00:00
Kostya Serebryany fea4fb404e [sanitizer] allow -fsanitize-coverage=N w/ -fsanitize=leak, llvm part
llvm-svn: 224463
2014-12-17 21:50:04 +00:00
Suyog Sarda 43fae93da8 Revert 224119 "This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it." 

This was re-ordering floating point data types resulting in mismatch in output.

llvm-svn: 224424
2014-12-17 10:34:27 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Kostya Serebryany 7376294086 [sanitizer] prevent function call merging for sanitizer-coverage callbacks
llvm-svn: 224372
2014-12-16 21:24:15 +00:00
Elena Demikhovsky f5b72afff4 Masked Load and Store Intrinsics in loop vectorizer.
The loop vectorizer optimizes loops containing conditional memory
accesses by generating masked load and store intrinsics.
This decision is target dependent.

http://reviews.llvm.org/D6527

llvm-svn: 224334
2014-12-16 11:50:42 +00:00
Elena Demikhovsky a5599bfd72 Sink store based on alias analysis
- by Ella Bolshinsky
The alias analysis is used define whether the given instruction
is a barrier for store sinking. For 2 identical stores, following
instructions are checked in the both basic blocks, to determine
whether they are sinking barriers.

http://reviews.llvm.org/D6420

llvm-svn: 224247
2014-12-15 14:09:53 +00:00
Elena Demikhovsky 3fcafa2cdb Loop Vectorizer minor changes in the code -
some comments, function names, identation.

Reviewed here: http://reviews.llvm.org/D6527

llvm-svn: 224218
2014-12-14 09:43:50 +00:00
Steven Wu f179d12e50 More code format fix from r224133, NFC
llvm-svn: 224140
2014-12-12 18:48:37 +00:00
Steven Wu 1f7402a14e Restructure code from r224097. NFC
llvm-svn: 224133
2014-12-12 17:21:54 +00:00
Chad Rosier 78943bcc18 [Reassociate] Use dbgs() instead of errs().
llvm-svn: 224125
2014-12-12 14:44:12 +00:00
Suyog Sarda 384095e65c This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders them for bundling into vector of loads,
and vectorizes it. 
 
 Test case :
 
       float hadd(float* a) {
           return (a[0] + a[1]) + (a[2] + a[3]);
        }
 
 
 AArch64 assembly before patch :
 
        ldp	s0, s1, [x0]
 	ldp	s2, s3, [x0, #8]
 	fadd	s0, s0, s1
 	fadd	s1, s2, s3
 	fadd	s0, s0, s1
 	ret
 
 AArch64 assembly after patch :
 
        ldp	d0, d1, [x0]
 	fadd	v0.2s, v0.2s, v1.2s
 	faddp	s0, v0.2s
 	ret

Reviewed Link : http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141208/248531.html

llvm-svn: 224119
2014-12-12 12:53:44 +00:00
Steven Wu 881916dea5 Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops for the testcase added
It is because InstCombine is generating instructions that can be
optimized by itself. Fix by not optimizing frem if the optimized
type is the same as original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Alexey Samsonov 4b7f413e3e [ASan] Change fake stack and local variables handling.
This commit changes the way we get fake stack from ASan runtime
(to find use-after-return errors) and the way we represent local
variables:
  - __asan_stack_malloc function now returns pointer to newly allocated
    fake stack frame, or NULL if frame cannot be allocated. It doesn't
    take pointer to real stack as an input argument, it is calculated
    inside the runtime.
  - __asan_stack_free function doesn't take pointer to real stack as
    an input argument. Now this function is never called if fake stack
    frame wasn't allocated.
  - __asan_init version is bumped to reflect changes in the ABI.
  - new flag "-asan-stack-dynamic-alloca" allows to store all the
    function local variables in a dynamic alloca, instead of the static
    one. It reduces the stack space usage in use-after-return mode
    (dynamic alloca will not be called if the local variables are stored
    in a fake stack), and improves the debug info quality for local
    variables (they will not be described relatively to %rbp/%rsp, which
    are assumed to be clobbered by function calls). This flag is turned
    off by default for now, but I plan to turn it on after more
    testing.

llvm-svn: 224062
2014-12-11 21:53:03 +00:00
Andrea Di Biagio 72b05aa59c [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
Michael Kuperstein fffb6996c9 The inliner needs to fix up debug information for llvm.dbg.declare, not only for llvm.dbg.value.
Patch by Amjad Aboud

Differential Revision: http://reviews.llvm.org/D6525

llvm-svn: 224015
2014-12-11 12:41:10 +00:00
Erik Eckstein 096ff7dcd6 Refactor creation of overflow result tuples in InstCombineCalls.
Extract the creation of overflow result tuples in a separate function. NFC.

llvm-svn: 224006
2014-12-11 08:02:30 +00:00
Kaelyn Takata 22324f378a Rename static functiom "map" to be more descriptive and to avoid
potential confusion with the std::map type.

llvm-svn: 223853
2014-12-09 23:32:46 +00:00
Michael Zolotukhin 4def395646 Remove redundant variable.
Tested by adding assert(LoopVectorPreHeader == VecPreheader) on LLVM
test suite and SPECs.

llvm-svn: 223847
2014-12-09 22:45:07 +00:00
Chandler Carruth a7f247ea56 Revert r223764 which taught instcombine about integer-based elment extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
Frederic Riss 35f0a9aeba Remove unneeded curly braces.
llvm-svn: 223809
2014-12-09 18:57:39 +00:00
Frederic Riss ff58fd207e Reorder the code to avoid inserting at the beginning of a vector.
As per dblaikie suggestion, thanks\!

llvm-svn: 223808
2014-12-09 18:57:34 +00:00
Duncan P. N. Exon Smith 5bf8fef580 IR: Split Metadata from Value
Split `Metadata` away from the `Value` class hierarchy, as part of
PR21532.  Assembly and bitcode changes are in the wings, but this is the
bulk of the change for the IR C++ API.

I have a follow-up patch prepared for `clang`.  If this breaks other
sub-projects, I apologize in advance :(.  Help me compile it on Darwin
I'll try to fix it.  FWIW, the errors should be easy to fix, so it may
be simpler to just fix it yourself.

This breaks the build for all metadata-related code that's out-of-tree.
Rest assured the transition is mechanical and the compiler should catch
almost all of the problems.

Here's a quick guide for updating your code:

  - `Metadata` is the root of a class hierarchy with three main classes:
    `MDNode`, `MDString`, and `ValueAsMetadata`.  It is distinct from
    the `Value` class hierarchy.  It is typeless -- i.e., instances do
    *not* have a `Type`.

  - `MDNode`'s operands are all `Metadata *` (instead of `Value *`).

  - `TrackingVH<MDNode>` and `WeakVH` referring to metadata can be
    replaced with `TrackingMDNodeRef` and `TrackingMDRef`, respectively.

    If you're referring solely to resolved `MDNode`s -- post graph
    construction -- just use `MDNode*`.

  - `MDNode` (and the rest of `Metadata`) have only limited support for
    `replaceAllUsesWith()`.

    As long as an `MDNode` is pointing at a forward declaration -- the
    result of `MDNode::getTemporary()` -- it maintains a side map of its
    uses and can RAUW itself.  Once the forward declarations are fully
    resolved RAUW support is dropped on the ground.  This means that
    uniquing collisions on changing operands cause nodes to become
    "distinct".  (This already happened fairly commonly, whenever an
    operand went to null.)

    If you're constructing complex (non self-reference) `MDNode` cycles,
    you need to call `MDNode::resolveCycles()` on each node (or on a
    top-level node that somehow references all of the nodes).  Also,
    don't do that.  Metadata cycles (and the RAUW machinery needed to
    construct them) are expensive.

  - An `MDNode` can only refer to a `Constant` through a bridge called
    `ConstantAsMetadata` (one of the subclasses of `ValueAsMetadata`).

    As a side effect, accessing an operand of an `MDNode` that is known
    to be, e.g., `ConstantInt`, takes three steps: first, cast from
    `Metadata` to `ConstantAsMetadata`; second, extract the `Constant`;
    third, cast down to `ConstantInt`.

    The eventual goal is to introduce `MDInt`/`MDFloat`/etc. and have
    metadata schema owners transition away from using `Constant`s when
    the type isn't important (and they don't care about referring to
    `GlobalValue`s).

    In the meantime, I've added transitional API to the `mdconst`
    namespace that matches semantics with the old code, in order to
    avoid adding the error-prone three-step equivalent to every call
    site.  If your old code was:

        MDNode *N = foo();
        bar(isa             <ConstantInt>(N->getOperand(0)));
        baz(cast            <ConstantInt>(N->getOperand(1)));
        bak(cast_or_null    <ConstantInt>(N->getOperand(2)));
        bat(dyn_cast        <ConstantInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<ConstantInt>(N->getOperand(4)));

    you can trivially match its semantics with:

        MDNode *N = foo();
        bar(mdconst::hasa               <ConstantInt>(N->getOperand(0)));
        baz(mdconst::extract            <ConstantInt>(N->getOperand(1)));
        bak(mdconst::extract_or_null    <ConstantInt>(N->getOperand(2)));
        bat(mdconst::dyn_extract        <ConstantInt>(N->getOperand(3)));
        bay(mdconst::dyn_extract_or_null<ConstantInt>(N->getOperand(4)));

    and when you transition your metadata schema to `MDInt`:

        MDNode *N = foo();
        bar(isa             <MDInt>(N->getOperand(0)));
        baz(cast            <MDInt>(N->getOperand(1)));
        bak(cast_or_null    <MDInt>(N->getOperand(2)));
        bat(dyn_cast        <MDInt>(N->getOperand(3)));
        bay(dyn_cast_or_null<MDInt>(N->getOperand(4)));

  - A `CallInst` -- specifically, intrinsic instructions -- can refer to
    metadata through a bridge called `MetadataAsValue`.  This is a
    subclass of `Value` where `getType()->isMetadataTy()`.

    `MetadataAsValue` is the *only* class that can legally refer to a
    `LocalAsMetadata`, which is a bridged form of non-`Constant` values
    like `Argument` and `Instruction`.  It can also refer to any other
    `Metadata` subclass.

(I'll break all your testcases in a follow-up commit, when I propagate
this change to assembly.)

llvm-svn: 223802
2014-12-09 18:38:53 +00:00
Frederic Riss 7c78db5065 Correctly handle complex locations expressions in replaceDbgDeclareForAlloca()
replaceDbgDeclareForAlloca() replaces an alloca by a value storing the
address of what was the alloca. If there is a dbg.declare corresponding
to that alloca, we need to lower it to a dbg.value describing the additional
dereference operation to be performed to get to the underlying variable.
 This is done by adding a DW_OP_deref to the complex location part of the
location description. This deref was added to the end of the operation list,
which is wrong. The expression applies to what is described by the
dbg.{declare,value}, and as we are changing this, we need to apply the
DW_OP_deref as the first operation in the list.

Part of the fix for rdar://19162268.

llvm-svn: 223799
2014-12-09 17:55:48 +00:00
Juergen Ributzka 194350a936 Revert "Move function to obtain branch weights into the BranchInst class. NFC."
This reverts commit r223784 and copies the 'ExtractBranchMetadata' to CodeGenPrepare.

llvm-svn: 223795
2014-12-09 17:32:12 +00:00
Juergen Ributzka e2aa3aa38a Move function to obtain branch weights into the BranchInst class. NFC.
Make this function available to other parts of LLVM.

llvm-svn: 223784
2014-12-09 16:36:06 +00:00
Chandler Carruth 7415205113 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
Justin Bogner 61ba2e3996 InstrProf: An intrinsic and lowering for instrumentation based profiling
Introduce the ``llvm.instrprof_increment`` intrinsic and the
``-instrprof`` pass. These provide the infrastructure for writing
counters for profiling, as in clang's ``-fprofile-instr-generate``.

The implementation of the instrprof pass is ported directly out of the
CodeGenPGO classes in clang, and with the followup in clang that rips
that code out to use these new intrinsics this ends up being NFC.

Doing the instrumentation this way opens some doors in terms of
improving the counter performance. For example, this will make it
simple to experiment with alternate lowering strategies, and allows us
to try handling profiling specially in some optimizations if we want
to.

Finally, this drastically simplifies the frontend and puts all of the
lowering logic in one place.

llvm-svn: 223672
2014-12-08 18:02:35 +00:00
NAKAMURA Takumi 6980404cfe LLVMInstrumentation requires MC since r223532.
llvm-svn: 223573
2014-12-06 02:22:11 +00:00
Duncan P. N. Exon Smith b236211c4c Utils: Style cleanups, NFC
llvm-svn: 223556
2014-12-06 00:48:17 +00:00
Duncan P. N. Exon Smith b13f7d2e36 Utils: Avoid RAUW on metadata in CloneFunction()
llvm-svn: 223555
2014-12-06 00:48:13 +00:00
Kuba Brecka 1001bb533b Recommit of r223513 and r223514.
Reviewed at http://reviews.llvm.org/D6488

llvm-svn: 223532
2014-12-05 22:19:18 +00:00
Kuba Brecka 086e34bef8 Reverting r223513 and r223514.
llvm-svn: 223520
2014-12-05 21:32:46 +00:00
Peter Collingbourne 0826e60748 [DFSAN][MIPS][LLVM] Defining ShadowPtrMask variable for MIPS64
Patch by Kumar Sukhani!

corresponding compiler-rt patch: http://reviews.llvm.org/D6437
clang patch: http://reviews.llvm.org/D6147

Differential Revision: http://reviews.llvm.org/D6459

llvm-svn: 223516
2014-12-05 21:22:32 +00:00
Kuba Brecka 1e21378a37 AddressSanitizer - Don't instrument globals from cstring_literals sections. (llvm part)
Reviewed at http://reviews.llvm.org/D6488

llvm-svn: 223513
2014-12-05 21:04:43 +00:00
Evgeniy Stepanov d85ddee01d [msan] Avoid extra origin address realignment.
Do not realign origin address if the corresponding application
address is at least 4-byte-aligned.

Saves 2.5% code size in track-origins mode.

llvm-svn: 223464
2014-12-05 14:34:03 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Kostya Serebryany 543f3db572 [msan] allow -fsanitize-coverage=N together with -fsanitize=memory, llvm part
llvm-svn: 223312
2014-12-03 23:28:26 +00:00
Matthias Braun 395a82f6cc correct spelling, NFC
llvm-svn: 223274
2014-12-03 22:10:39 +00:00
Matthias Braun d34e4d2354 [SimplifyLibCalls] Improve double->float shrinking to consider constants
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);

rdar://19049359

Differential Revision: http://reviews.llvm.org/D6496

llvm-svn: 223270
2014-12-03 21:46:33 +00:00
Matthias Braun 892c923c46 [SimplifyLibCalls] Enable double to float shrinking for copysign
rdar://19049359

Differential Revision: http://reviews.llvm.org/D6495

llvm-svn: 223269
2014-12-03 21:46:29 +00:00
Evgeniy Stepanov 2e5a1f1c9c msan] Add compile-time checks for missing origins.
This change makes MemorySanitizer instrumentation a bit more strict
about instructions that have no origin id assigned to them.

This would have caught the bug that was fixed in r222918.

This is re-commit of r222997, reverted in r223211, with 3 more
missing origins added.

llvm-svn: 223236
2014-12-03 14:15:53 +00:00
Erik Eckstein d181752be0 InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Nick Lewycky a4acb44995 Revert r222997. The newly added compile-time checks are finding missing origins, testcase is being reduced and a PR will be posted shortly.
llvm-svn: 223211
2014-12-03 05:47:00 +00:00
Duncan P. N. Exon Smith a48bd07e5e LoopVectorize: Remove unnecessary RAUW
Remove an unnecessary `MDNode::replaceAllUsesWith()`.  In the preceding
line, `TheLoop->setLoopID()` visits all backedges and sets the new loop
ID.  This sufficiently updates the loop metadata.

Metadata RAUW is going away as part of PR21532.

llvm-svn: 223210
2014-12-03 05:41:20 +00:00
Tom Stellard 1f0dded057 StructurizeCFG: Use LoopInfo analysis for better loop detection
We were assuming that each back-edge in a region represented a unique
loop, which is not always the case.  We need to use LoopInfo to
correctly determine which back-edges are loops.

llvm-svn: 223199
2014-12-03 04:28:32 +00:00
Nick Lewycky 2e8a6219fc Emit the entry block first and the exit block second, then all the blocks in between afterwards. This is what gcc always does, and some out of tree tools depend on that.
llvm-svn: 223193
2014-12-03 02:45:01 +00:00
Peter Collingbourne 51d2de7b9e Prologue support
Patch by Ben Gamari!

This redefines the `prefix` attribute introduced previously and
introduces a `prologue` attribute.  There are a two primary usecases
that these attributes aim to serve,

  1. Function prologue sigils

  2. Function hot-patching: Enable the user to insert `nop` operations
     at the beginning of the function which can later be safely replaced
     with a call to some instrumentation facility

  3. Runtime metadata: Allow a compiler to insert data for use by the
     runtime during execution. GHC is one example of a compiler that
     needs this functionality for its tables-next-to-code functionality.

Previously `prefix` served cases (1) and (2) quite well by allowing the user
to introduce arbitrary data at the entrypoint but before the function
body. Case (3), however, was poorly handled by this approach as it
required that prefix data was valid executable code.

Here we redefine the notion of prefix data to instead be data which
occurs immediately before the function entrypoint (i.e. the symbol
address). Since prefix data now occurs before the function entrypoint,
there is no need for the data to be valid code.

The previous notion of prefix data now goes under the name "prologue
data" to emphasize its duality with the function epilogue.

The intention here is to handle cases (1) and (2) with prologue data and
case (3) with prefix data.

References
----------

This idea arose out of discussions[1] with Reid Kleckner in response to a
proposal to introduce the notion of symbol offsets to enable handling of
case (3).

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-May/073235.html

Test Plan: testsuite

Differential Revision: http://reviews.llvm.org/D6454

llvm-svn: 223189
2014-12-03 02:08:38 +00:00
Michael Zolotukhin ea8327b80f PR21302. Vectorize only bottom-tested loops.
rdar://problem/18886083

llvm-svn: 223171
2014-12-02 22:59:06 +00:00
Philip Reames 1a1bdb22bf [Statepoints 3/4] Statepoint infrastructure for garbage collection: SelectionDAGBuilder
This is the third patch in a small series.  It contains the CodeGen support for lowering the gc.statepoint intrinsic sequences (223078) to the STATEPOINT pseudo machine instruction (223085).  The change also includes the set of helper routines and classes for working with gc.statepoints, gc.relocates, and gc.results since the lowering code uses them.  

With this change, gc.statepoints should be functionally complete.  The documentation will follow in the fourth change, and there will likely be some cleanup changes, but interested parties can start experimenting now.

I'm not particularly happy with the amount of code or complexity involved with the lowering step, but at least it's fairly well isolated.  The statepoint lowering code is split into it's own files and anyone not working on the statepoint support itself should be able to ignore it.  

During the lowering process, we currently spill aggressively to stack. This is not entirely ideal (and we have plans to do better), but it's functional, relatively straight forward, and matches closely the implementations of the patchpoint intrinsics.  Most of the complexity comes from trying to keep relocated copies of values in the same stack slots across statepoints.  Doing so avoids the insertion of pointless load and store instructions to reshuffle the stack.  The current implementation isn't as effective as I'd like, but it is functional and 'good enough' for many common use cases.  

In the long term, I'd like to figure out how to integrate the statepoint lowering with the register allocator.  In principal, we shouldn't need to eagerly spill at all.  The register allocator should do any spilling required and the statepoint should simply record that fact.  Depending on how challenging that turns out to be, we may invest in a smarter global stack slot assignment mechanism as a stop gap measure.  

Reviewed by: atrick, ributzka

llvm-svn: 223137
2014-12-02 18:50:36 +00:00
Bruno Cardoso Lopes 15520db9ad [SwitchLowering] Handle destinations on multiple phi instructions
Follow up from r222926. Also handle multiple destinations from merged
cases on multiple and subsequent phi instructions.

rdar://problem/19106978

llvm-svn: 223135
2014-12-02 18:31:53 +00:00
Bruno Cardoso Lopes d035fbb96f [LICM] Avoind store sinking if no preheader is available
Load instructions are inserted into loop preheaders when sinking stores
and later removed if not used by the SSA updater. Avoid sinking if the
loop has no preheader and avoid crashes. This fixes one more side effect
of not handling indirectbr instructions properly on LoopSimplify.

llvm-svn: 223119
2014-12-02 14:22:34 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 269ebb612e SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
They would get optimized away later, but we might as well not emit them.

llvm-svn: 223051
2014-12-01 17:08:38 +00:00
Hans Wennborg 5a1e5c05d8 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.

For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.

On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).

llvm-svn: 223050
2014-12-01 17:08:35 +00:00
Evgeniy Stepanov a056ac8a98 [msan] Add compile-time checks for missing origins.
This change makes MemorySanitizer instrumentation a bit more strict
about instructions that have no origin id assigned to them.

This would have caught the bug that was fixed in r222918.

No functional change.

llvm-svn: 222997
2014-12-01 09:53:51 +00:00
Yury Gribov 3ae427d811 [asan] Change dynamic alloca instrumentation to only consider allocas that are dominating all exits from function.
Reviewed in http://reviews.llvm.org/D6412

llvm-svn: 222991
2014-12-01 08:47:58 +00:00
Duncan P. N. Exon Smith 910f05d181 DebugIR: Delete -debug-ir
llvm-svn: 222945
2014-11-29 03:15:47 +00:00
Duncan P. N. Exon Smith 9bc81fbe92 Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
David Majnemer 3d6f80b619 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Bruno Cardoso Lopes 46d5bf2982 [LICM] Store sink and indirectbr instructions
Loop simplify skips exit-block insertion when exits contain indirectbr
instructions. This leads to an assertion in LICM when trying to sink
stores out of non-dedicated loop exits containing indirectbr
instructions. This patch fix this issue by re-checking for dedicated
exits in LICM prior to store sink attempts.

Differential Revision: http://reviews.llvm.org/D6414

rdar://problem/18943047

llvm-svn: 222927
2014-11-28 19:47:46 +00:00
Bruno Cardoso Lopes bc7ba2c766 [SwitchLowering] Handle multiple destinations on condensed case stmts
Switch cases statements with sequential values that branch to the same
destination BB may often be handled together in a single new source BB.
In this scenario we need to remove remaining incoming values from PHI
instructions in the destination BB, as to match the number of source
branches.

Differential Revision: http://reviews.llvm.org/D6415

rdar://problem/19040894

llvm-svn: 222926
2014-11-28 19:47:33 +00:00
Evgeniy Stepanov a0b6899234 [msan] Fix origin propagation for select of floats.
MSan does not assign origin for instrumentation temps (i.e. the ones that do
not come from the application code), but "select" instrumentation erroneously
tried to use one of those.

https://code.google.com/p/memory-sanitizer/issues/detail?id=78

llvm-svn: 222918
2014-11-28 11:17:58 +00:00
Ankur Garg 876b891d51 Removed extra line from a comment to test first commit. NFC.
llvm-svn: 222916
2014-11-28 10:38:18 +00:00
Erik Eckstein 0d86c7623f reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
Fixed missing dominance check.
Original commit message:

This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
   if (idx < tablesize)
      r = table[idx]; // table does not contain default_value
   else
      r = default_value;
   if (r != default_value)
      ...
Is optimized to:
   cond = idx < tablesize;
   if (cond)
      r = table[idx];
   else
      r = default_value;
   if (cond)
      ...
Jump threading will then eliminate the second if(cond).

llvm-svn: 222891
2014-11-27 15:13:14 +00:00
Evgeniy Stepanov e402d9ef4c [msan] Remove indirect call wrapping code.
This functionality was only used in MSanDR, which is deprecated.

llvm-svn: 222889
2014-11-27 14:54:02 +00:00
Erik Eckstein 2190cd9ffa Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible."
It is breaking the clang bootstrag.

llvm-svn: 222877
2014-11-27 10:59:08 +00:00
Erik Eckstein e73e308ab9 Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
    if (idx < tablesize)
       r = table[idx]; // table does not contain default_value
    else
       r = default_value;
    if (r != default_value)
       ...
Is optimized to:
    cond = idx < tablesize;
    if (cond)
       r = table[idx];
    else
       r = default_value;
    if (cond)
       ...
\endcode
Jump threading will then eliminate the second if(cond).

llvm-svn: 222872
2014-11-27 08:33:51 +00:00
David Majnemer 40157d5c4d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer 5468e86469 Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Chandler Carruth 816d26fe5e [InstCombine] Change LLVM To canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that tehy load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Chandler Carruth 1a3c2c414c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
Matt Arsenault 238ff1ad1e Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
Kostya Serebryany 4cadd4afa0 [asan/coverage] change the way asan coverage instrumentation is done: instead of setting the guard to 1 in the generated code, pass the pointer to guard to __sanitizer_cov and set it there. No user-visible functionality change expected
llvm-svn: 222675
2014-11-24 18:49:53 +00:00
David Majnemer 8e6f6a98b5 InstCombine: Don't create an unused instruction
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
David Majnemer b2a6e7458d InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00
Elena Demikhovsky 9e5089a938 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
David Majnemer fb3805576b InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2)
llvm-svn: 222625
2014-11-22 20:00:41 +00:00
David Majnemer ec6e481bc5 InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y)
llvm-svn: 222624
2014-11-22 20:00:38 +00:00
David Majnemer fa4699e65f InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C)
llvm-svn: 222623
2014-11-22 20:00:34 +00:00
Simon Pilgrim a279410ede Tidied up target triple OS detection. NFC
Use Triple::isOS*() helper functions where possible.

llvm-svn: 222622
2014-11-22 19:12:10 +00:00
David Majnemer a3aeb15613 InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2)
llvm-svn: 222620
2014-11-22 18:16:54 +00:00
David Majnemer 546f81064c InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y
llvm-svn: 222613
2014-11-22 08:57:02 +00:00
David Majnemer 8279a7506d InstCombine: Propagate NSW for -X * -Y -> X * Y
llvm-svn: 222612
2014-11-22 07:25:19 +00:00
David Majnemer 83484fdb8b InstCombine: Silence a parenthesis warning
llvm-svn: 222609
2014-11-22 06:09:28 +00:00
David Majnemer 80c8f627db InstCombine: Preserve nsw when folding X*(2^C) -> X << C
llvm-svn: 222606
2014-11-22 04:52:55 +00:00
David Majnemer fd4a6d2b7a InstCombine: Preserve nsw/nuw for ((X << C2)*C1) -> (X * (C1 << C2))
llvm-svn: 222605
2014-11-22 04:52:52 +00:00
David Majnemer 027bc80928 InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V)
llvm-svn: 222604
2014-11-22 04:52:38 +00:00
Gerolf Hoflehner ec6217c929 [InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence)
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).

llvm-svn: 222590
2014-11-21 23:36:44 +00:00
Kostya Serebryany 60ef25bd54 [asan] remove old experimental code
llvm-svn: 222586
2014-11-21 22:34:29 +00:00
Kostya Serebryany ea2cb6f616 [asan] add statistic counter to dynamic alloca instrumentation
llvm-svn: 222573
2014-11-21 21:25:18 +00:00
Roman Divacky d2b9a1b890 Disable header duplication at -Oz in loop-rotate pass.
llvm-svn: 222562
2014-11-21 19:53:24 +00:00
Yury Gribov 55441bb601 [asan] Add new hidden compile-time flag asan-instrument-allocas to sanitize variable-sized dynamic allocas. Patch by Max Ostapenko.
Reviewed at http://reviews.llvm.org/D6055

llvm-svn: 222519
2014-11-21 10:29:50 +00:00
David Majnemer 1f44142e4e This Reassociate change unintentionally slipped in r222499
llvm-svn: 222500
2014-11-21 02:37:38 +00:00
David Majnemer c0a313b57c SROA: The alloca type isn't a candidate promotion type for vectors
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.

This manifested as an assertion failure when we compared the various
types: we had a size mismatch.

This fixes PR21480.

llvm-svn: 222499
2014-11-21 02:34:55 +00:00
Mehdi Amini ffd0100618 SimplifyCFG: Refactor GatherConstantCompares() result in a struct
Code seems cleaner and easier to understand this way

This is basically r222416, after fixes for MSVC lack of standard 
support, and a few cleaning (got rid of a warning).
Thanks Nakamura Takumi and Nico Weber for the MSVC fixes.

llvm-svn: 222472
2014-11-20 22:40:25 +00:00
Michael Zolotukhin 0dcae71449 Fix a trip-count overflow issue in LoopUnroll.
Currently LoopUnroll generates a prologue loop before the main loop
body to execute first N%UnrollFactor iterations. Also, this loop is
used if trip-count can overflow - it's determined by a runtime check.

However, we've been mistakenly optimizing this loop to a linear code for
UnrollFactor = 2, not taking into account that it also serves as a safe
version of the loop if its trip-count overflows.

llvm-svn: 222451
2014-11-20 20:19:55 +00:00
Timur Iskhodzhanov 71526a3eda Revert r222416, r222422, r222426: the former revision had problems and fixing them introduced bugs
llvm-svn: 222428
2014-11-20 12:36:43 +00:00
Timur Iskhodzhanov a0bffc0c11 Fix a typo
llvm-svn: 222426
2014-11-20 11:48:58 +00:00
NAKAMURA Takumi 5a83192570 SimplifyCFG.cpp: Tweak to let msc17 compliant.
- Use LLVM_DELETED_FUNCTION.
  - Don't use member initializers.
  - Don't use initializer list.

llvm-svn: 222422
2014-11-20 08:59:02 +00:00
Mehdi Amini 65253e76ed SimplifyCFG: Refactor GatherConstantCompares() result in a struct
Code seems cleaner and easier to understand this way

llvm-svn: 222416
2014-11-20 06:51:02 +00:00
Chad Rosier 90a2f9b110 Revert "[Reassociate] As the expression tree is rewritten make sure the operands are"
This reverts commit r222142.  This is causing/exposing an execution-time regression
in spec2006/gcc and coremark on AArch64/A57/Ofast.

Conflicts:

	test/Transforms/Reassociate/optional-flags.ll

llvm-svn: 222398
2014-11-19 23:21:20 +00:00
Nico Weber 06839a536f Try to fix MSVS build after r222384. No intended behavior change.
llvm-svn: 222386
2014-11-19 21:16:11 +00:00
Mehdi Amini 9a25cb8806 SimplifyCFG: turn recursive GatherConstantCompares into iterative
A long sequence of || or && could lead to a stack explosion.

llvm-svn: 222384
2014-11-19 20:09:11 +00:00
Suyog Sarda aba97f4aba Vectorize a reduction chain feeding into a 'return' statement.
e.x 
return (a[0]+b[0]) + (a[1]+b[1])

Differential Revision: http://reviews.llvm.org/D6227

llvm-svn: 222364
2014-11-19 16:07:38 +00:00
Arnaud A. de Grandmaison 7b9dc28060 Fix tail recursion elimination
When the BasicBlock containing the return instrution has a PHI with 2
incoming values, FoldReturnIntoUncondBranch will remove the no longer
used incoming value and remove the no longer needed phi as well. This
leaves us with a BB that no longer has a PHI, but the subsequent call
to FoldReturnIntoUncondBranch from FoldReturnAndProcessPred will not
remove the return instruction (which still uses the result of the call
instruction). This prevents EliminateRecursiveTailCall to remove
the value, as it is still being used in a basicblock which has no
predecessors.

The basicblock can not be erased on the spot, because its iterator is
still being used in runTRE.

This issue was exposed when removing the threshold on size for lifetime
marker insertion for named temporaries in clang. The testcase is a much
reduced version of peelOffOuterExpr(const Expr*, const ExplodedNode *)
from clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp.

llvm-svn: 222354
2014-11-19 13:32:51 +00:00
David Blaikie 70573dcd9f Update SetVector to rely on the underlying set's insert to return a pair<iterator, bool>
This is to be consistent with StringSet and ultimately with the standard
library's associative container insert function.

This lead to updating SmallSet::insert to return pair<iterator, bool>,
and then to update SmallPtrSet::insert to return pair<iterator, bool>,
and then to update all the existing users of those functions...

llvm-svn: 222334
2014-11-19 07:49:26 +00:00
Hao Liu 1d2a061bd8 [SeparateConstOffsetFromGEP] Allow SeparateConstOffsetFromGEP pass to lower GEPs.
If LowerGEP is enabled, it can lower a GEP with multiple indices into GEPs with a single index
or arithmetic operations. Lowering GEPs can always extract structure indices. Lowering GEPs can
also give use more optimization opportunities. It can benefit passes like CSE, LICM and CGP.

Reviewed in http://reviews.llvm.org/D5864

llvm-svn: 222328
2014-11-19 06:24:44 +00:00
Kostya Serebryany cb45b126fb [asan] add experimental basic-block tracing to asan-coverage; also fix -fsanitize-coverage=3 which was broken by r221718
llvm-svn: 222290
2014-11-19 00:22:58 +00:00
Kostya Serebryany e5ea424a77 Introduce llvm::SplitAllCriticalEdges
Summary:
move the code from BreakCriticalEdges::runOnFunction()
into a separate utility function llvm::SplitAllCriticalEdges()
so that it can be used independently.
No functionality change intended.

Test Plan: check-llvm

Reviewers: nlewycky

Reviewed By: nlewycky

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6313

llvm-svn: 222288
2014-11-19 00:17:31 +00:00
Manman Ren c67109313c Revert r222039 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green. If not, we will re-submit the commit.

llvm-svn: 222287
2014-11-19 00:13:26 +00:00
David Majnemer c6b8e20a5c InstCombine: Fix another infinite loop caused by visitFPTrunc
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.

This fixes the second part of PR21576.

llvm-svn: 222265
2014-11-18 22:06:45 +00:00
David Majnemer b32eaddf11 Revert "Revert r222040 because of bot failure."
This reverts commit r222203, reverting r222040 didn't end up turning the
bot green.

llvm-svn: 222261
2014-11-18 21:30:02 +00:00
Chad Rosier e53e8c8e58 [Reassociate] Rename local variable to not use same name as a member
variable. NFC.

llvm-svn: 222248
2014-11-18 20:21:54 +00:00
Philip Reames 018dbf18c4 Tweak EarlyCSE to recognize series of dead stores
EarlyCSE is giving up on the current instruction immediately when it recognizes that the current instruction makes a previous store trivially dead. There's no reason to do this. Once the previous store has been deleted, it's perfectly legal to remember the value of the current store (for value forwarding) and the fact the store occurred (it could be dead too!).

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D6301

llvm-svn: 222241
2014-11-18 17:46:32 +00:00
David Majnemer 6fdb6b8fd4 InstCombine: Fold away tautological masked compares
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true.

While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.

llvm-svn: 222230
2014-11-18 09:31:41 +00:00
David Majnemer 1a3327bb62 InstCombine: Clean up foldLogOpOfMaskedICmps
No functional change intended.

llvm-svn: 222229
2014-11-18 09:31:36 +00:00
Hans Wennborg a6a11a969a SimplifyCFG: Range'ify some for-loops. No functional change.
llvm-svn: 222215
2014-11-18 02:37:11 +00:00
David Majnemer 9a91e4a18a IndVarSimplify: Allow LFTR to fire more often
I added a pessimization in r217102 to prevent miscompiles when the
incremented induction variable was used in a comparison; it would be
poison.

Try to use the incremented induction variable more often when we can be
sure that the increment won't end in poison.

Differential Revision: http://reviews.llvm.org/D6222

llvm-svn: 222213
2014-11-18 02:20:58 +00:00
Manman Ren a64bd44fd8 Revert r222040 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green.

llvm-svn: 222203
2014-11-18 00:33:22 +00:00
Juergen Ributzka c9591e9bdb [SimplifyCFG] Make the value type of the hole check bitmask a power-of-2.
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.

The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.

To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.

This fixes rdar://problem/18984639.

llvm-svn: 222168
2014-11-17 19:39:56 +00:00
Chad Rosier bc0b869be9 [Reassociate] As the expression tree is rewritten make sure the operands are
emitted in canonical form.

llvm-svn: 222142
2014-11-17 16:33:50 +00:00
Chad Rosier 9a1ac6e494 [Reassociate] Canonicalize constants to RHS operand.
Fix a thinko where the RHS was already a constant.

llvm-svn: 222139
2014-11-17 15:52:51 +00:00
Erik Eckstein 105374fe5e Optimize switch lookup tables with linear mapping.
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:

int f1(int x) {
  switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
  }
  return 0;
}

generates:

define i32 @f1(i32 %x) #0 {
entry:
  %0 = icmp ult i32 %x, 4
  br i1 %0, label %switch.lookup, label %return

switch.lookup:
  %switch.offset = add i32 %x, 10
  ret i32 %switch.offset

return:
  ret i32 0
}

llvm-svn: 222121
2014-11-17 09:13:57 +00:00
Rafael Espindola a3b5b60753 Add back r222061 with a fix.
This adds back r222061, but now calls initializePAEvalPass from the correct
library to avoid link problems.

Original message:

Don't make assumptions about the name of private global variables.

Private variables are can be renamed, so it is not reliable to make
decisions on the name.

The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).

This patch changes one case where we were looking at the variable name to use
the section instead.

Test tuning by Michael Gottesman.

llvm-svn: 222117
2014-11-17 02:28:27 +00:00
Reid Kleckner 007239863e Revert "Don't make assumptions about the name of private global variables."
This reverts commit r222061.

It's causing linker errors.

llvm-svn: 222077
2014-11-15 02:03:53 +00:00
Rafael Espindola 2fc723099f Don't make assumptions about the name of private global variables.
Private variables are can be renamed, so it is not reliable to make
decisions on the name.

The name is also dropped by the assembler before getting to the
linker, so using the name causes a disconnect between how llvm makes a
decision (var name) and how the linker makes a decision (section it is
in).

This patch changes one case where we were looking at the variable name to use
the section instead.

Test tuning by Michael Gottesman.

llvm-svn: 222061
2014-11-14 23:17:47 +00:00
David Majnemer 8c3d92e7e5 InstCombine: Fix infinite loop caused by visitFPTrunc
We would attempt to replace a fptrunc of an frem with an identical
fptrunc.  This would cause the new fptrunc to be added to the worklist.
Of course, this results in an infinite loop because we will keep
visiting the newly created fptruncs.

This fixes PR21576.

llvm-svn: 222040
2014-11-14 21:21:15 +00:00
Chad Rosier 1ff4c0bf0b Reapply r221924: "[GVN] Perform Scalar PRE on gep indices that feed loads before
doing Load PRE"

This commit updates the failing test in
Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll

The failing test is sensitive to the order in which we process loads.  This
version turns on the RPO traversal instead of the while DT traversal in GVN.
The new test code is functionally same just the order of loads that are
eliminated is swapped.

This new version also fixes an issue where GVN splits a critical edge and
potentially invalidate the RPO/DT iterator.

llvm-svn: 222039
2014-11-14 21:09:13 +00:00
David Blaikie 711cd9c53c Remove redundant virtual on overriden functions.
llvm-svn: 222023
2014-11-14 19:06:36 +00:00
Chad Rosier df8f2a23cb [Reassociate] Canonicalize the operands of all binary operators.
llvm-svn: 222008
2014-11-14 17:09:19 +00:00
Chad Rosier d99df68e19 [Reassociate] Canonicalize operands of vector binary operators.
Prior to this commit fmul and fadd binary operators were being canonicalized for
both scalar and vector versions.  We now canonicalize add, mul, and, or, and xor
vector instructions.

llvm-svn: 222006
2014-11-14 17:08:15 +00:00
Chad Rosier f8b55f1bc5 [Reassociate] Canonicalize constants to RHS operand.
llvm-svn: 222005
2014-11-14 17:05:59 +00:00
Chad Rosier f59e548ba7 [Reassociate] Improve rank debug information. NFC.
llvm-svn: 221999
2014-11-14 15:01:38 +00:00
David Blaikie a92765ca32 Fix 80 cols caught by the linter...
We have a linter running in our build now?

llvm-svn: 221957
2014-11-14 00:41:42 +00:00
Duncan P. N. Exon Smith 77349e7aaf IR: Make MDString::getName() private
Hide the fact that `MDString`'s string is stored in `Value::Name` --
that's going to change soon.  Update the only in-tree client that was
using it instead of `Value::getString()`.

Part of PR21532.

llvm-svn: 221951
2014-11-13 23:59:16 +00:00
Reid Kleckner 971c3ea67b Use nullptr instead of NULL for variadic sentinels
Windows defines NULL to 0, which when used as an argument to a variadic
function, is not a null pointer constant. As a result, Clang's
-Wsentinel fires on this code. Using '0' would be wrong on most 64-bit
platforms, but both MSVC and Clang make it work on Windows. Sidestep the
issue with nullptr.

llvm-svn: 221940
2014-11-13 22:55:19 +00:00
Chad Rosier 8716b58583 Revert "[GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE."
This reverts commit r221924.  It appears the commit was a bit premature and is causing
bot failures that need further investigation.

llvm-svn: 221939
2014-11-13 22:54:59 +00:00
Chad Rosier dd526665fc [GVN] Perform Scalar PRE on gep indices that feed loads before doing Load PRE.
Phabricator Revision: http://reviews.llvm.org/D6103
Patch by "Balaram Makam" <bmakam@codeaurora.org>!

llvm-svn: 221924
2014-11-13 21:17:58 +00:00
Chad Rosier 9074b18785 [Reassociate] Update comment. NFC.
llvm-svn: 221894
2014-11-13 15:40:20 +00:00
Ahmed Bougacha 55a333d89b Add fortified (__*_chk) library functions to TLI (NFC)
One of them (__memcpy_chk) was already there, the others were checked
by comparing function names.
Note that the fortified libfuncs are now part of TLI, but are always
available, because they aren't generated, only optimized into the
non-checking versions.

Differential Revision: http://reviews.llvm.org/D6179

llvm-svn: 221817
2014-11-12 21:23:34 +00:00
Jingyue Wu 8a12cea5f1 Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
Reapply r221772. The old patch breaks the bot because the @indvar_32_bit test
was run whether NVPTX was enabled or not.

IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive. This test is run only when NVPTX is
enabled.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221799
2014-11-12 18:09:15 +00:00
Sanjay Patel 7777b50eaf remove function names from comments; NFC
llvm-svn: 221798
2014-11-12 18:07:42 +00:00
Jingyue Wu a48273390c Reverts r221772 which fails tests
llvm-svn: 221773
2014-11-12 07:19:25 +00:00
Jingyue Wu 635a9b14fa Disable indvar widening if arithmetics on the wider type are more expensive
Summary:
IndVarSimplify should not widen an indvar if arithmetics on the wider
indvar are more expensive than those on the narrower indvar. For
instance, although NVPTX64 treats i64 as a legal type, an ADD on i64 is
twice as expensive as that on i32, because the hardware needs to
simulate a 64-bit integer using two 32-bit integers.

Split from D6188, and based on D6195 which adds NVPTXTargetTransformInfo.

Fixes PR21148.

Test Plan:
Added @indvar_32_bit that verifies we do not widen an indvar if the arithmetics
on the wider type are more expensive.

Reviewers: jholewinski, eliben, meheff, atrick

Reviewed By: atrick

Subscribers: jholewinski, llvm-commits

Differential Revision: http://reviews.llvm.org/D6196

llvm-svn: 221772
2014-11-12 06:58:45 +00:00
Bill Schmidt 729547847f [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.

llvm-svn: 221767
2014-11-12 04:19:40 +00:00
Chad Rosier f53f07046b [Reassociate] Canonicalize negative constants out of expressions.
Add support for FDiv, which was regressed by the previous commit.

llvm-svn: 221738
2014-11-11 23:36:42 +00:00
Philip Reames 66c6de61ee Canonicalize an assume(load != null) into !nonnull metadata
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms.

We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951

llvm-svn: 221737
2014-11-11 23:33:19 +00:00
Kostya Serebryany 231bd088d8 [asan] adding ShadowOffset64 for mips64, patch by Kumar Sukhani
llvm-svn: 221725
2014-11-11 23:02:57 +00:00
Chad Rosier 094ac7735b [Reassociate] Canonicalize negative constants out of expressions.
This is a reapplication of r221171, but we only perform the transformation
on expressions which include a multiplication.  We do not transform rem/div
operations as this doesn't appear to be safe in all cases.

llvm-svn: 221721
2014-11-11 22:58:35 +00:00
Kostya Serebryany 29a18dcbc5 Move asan-coverage into a separate phase.
Summary:
This change moves asan-coverage instrumentation
into a separate Module pass.
The other part of the change in clang introduces a new flag
-fsanitize-coverage=N.
Another small patch will update tests in compiler-rt.

With this patch no functionality change is expected except for the flag name.
The following changes will make the coverage instrumentation work with tsan/msan

Test Plan: Run regression tests, chromium.

Reviewers: nlewycky, samsonov

Reviewed By: nlewycky, samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D6152

llvm-svn: 221718
2014-11-11 22:14:37 +00:00
Duncan P. N. Exon Smith de36e8040f Revert "IR: MDNode => Value"
Instead, we're going to separate metadata from the Value hierarchy.  See
PR21532.

This reverts commit r221375.
This reverts commit r221373.
This reverts commit r221359.
This reverts commit r221167.
This reverts commit r221027.
This reverts commit r221024.
This reverts commit r221023.
This reverts commit r220995.
This reverts commit r220994.

llvm-svn: 221711
2014-11-11 21:30:22 +00:00
Juergen Ributzka d441725d3d [SwitchLowering] Fix the "fixPhis" function.
Switch statements may have more than one incoming edge into the same BB if they
all have the same value. When the switch statement is converted these incoming
edges are now coming from multiple BBs. Updating all incoming values to be from
a single BB is incorrect and would generate invalid LLVM IR.

The fix is to only update the first occurrence of an incoming value. Switch
lowering will perform subsequent calls to this helper function for each incoming
edge with a new basic block - updating all edges in the process.

This fixes rdar://problem/18916275.

llvm-svn: 221627
2014-11-10 21:05:27 +00:00
Vasileios Kalintiris ccde2a9a1e Fix extra semicolon warning. NFC.
llvm-svn: 221613
2014-11-10 17:37:53 +00:00
Saleem Abdulrasool d2c5d7f6da Transforms: address some late comments
We already use the llvm namespace.  Remove the unnecessary prefix.  Use the
StringRef::equals method to compare with C strings rather than instantiating
std::strings.

Addresses late review comments from David Majnemer.

llvm-svn: 221564
2014-11-08 00:00:50 +00:00
Saleem Abdulrasool 92b13aac04 Transforms: sort source files in build
Sort target sources.  NFC.

llvm-svn: 221563
2014-11-08 00:00:47 +00:00
Chad Rosier b3eb452e83 [Reassociate] Better preserve NSW/NUW flags.
Part of PR12985.

Phabricator Revision: http://reviews.llvm.org/D6172

llvm-svn: 221555
2014-11-07 22:12:57 +00:00
Saleem Abdulrasool 89c5ad4cda Transforms: use typedef rather than using aliases
Visual Studio 2012 apparently does not support using alias declarations.  Use
the more traditional typedef approach.  This should let the Windows buildbots
pass.  NFC.

llvm-svn: 221554
2014-11-07 22:09:52 +00:00
Saleem Abdulrasool 5898e09057 Transform: add SymbolRewriter pass
This introduces the symbol rewriter. This is an IR->IR transformation that is
implemented as a CodeGenPrepare pass. This allows for the transparent
adjustment of the symbols during compilation.

It provides a clean, simple, elegant solution for symbol inter-positioning. This
technique is often used, such as in the various sanitizers and performance
analysis.

The control of this is via a custom YAML syntax map file that indicates source
to destination mapping, so as to avoid having the compiler to know the exact
details of the source to destination transformations.

llvm-svn: 221548
2014-11-07 21:32:08 +00:00
David Majnemer 2098b86f64 SCCP: overdefined calls cannot become constant
We would attempt to fold away a call instruction which had been marked
overdefined.  However, it's not valid to transition to constant from
overdefined.

This fixes PR21512.

llvm-svn: 221513
2014-11-07 08:54:19 +00:00
David Majnemer bf93e7c7d3 LoopVectorize: Don't assume pointees are sized
A pointer's pointee might not be sized: the pointee could be a function.

Report this as IK_NoInduction when calculating isInductionVariable.

This fixes PR21508.

llvm-svn: 221501
2014-11-07 00:31:14 +00:00
David Majnemer c1eca5ad7c InstCombine: Rely on cmpxchg's return code when it's strong
Comparing the result of a cmpxchg instruction can be replaced with an
extractvalue of the cmpxchg success indicator.

llvm-svn: 221498
2014-11-06 23:23:30 +00:00
Rafael Espindola b7a4505a3f Base check on the section name, not the variable name.
The variable is private, so the name should not be relied on. Also, the
linker uses the sections, so asan should too when trying to avoid causing
the linker problems.

llvm-svn: 221480
2014-11-06 20:01:34 +00:00
Chad Rosier ac6a2f532c [Reassociate] Don't reassociate when mixing regular and fast-math FP
instructions.  Inlining might cause such cases and it's not valid to
reassociate floating-point instructions without the unsafe algebra flag.

Patch by Mehdi Amini <mehdi_amini@apple.com>!

llvm-svn: 221462
2014-11-06 16:46:37 +00:00
Justin Bogner 58e41344f9 GCOV: Make sure that function idents in the .gcda and .gcno match
When generating gcov compatible profiling, we sometimes skip emitting
data for functions for one reason or another. However, this was
emitting different function IDs in the .gcno and .gcda files, because
the .gcno case was using the loop index before skipping functions and
the .gcda the array index after. This resulted in completely invalid
gcov data.

This fixes the problem by making the .gcno loop track the ID
separately from the loop index.

llvm-svn: 221441
2014-11-06 06:55:02 +00:00
Michael Ilseman a7202bdbed Fix heap-use-after-free bug in expandSDiv when the operands are
constants, as discovered by ASAN.

Patch by Mehdi Amini!

llvm-svn: 221401
2014-11-05 21:28:24 +00:00
Duncan P. N. Exon Smith c5754a65e6 IR: MDNode => Value: NamedMDNode::getOperator()
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`.  To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change.  This
is part of PR21433.

Note that there's a follow-up patch to clang for the API change.

llvm-svn: 221375
2014-11-05 18:16:03 +00:00
Peter Collingbourne a1099840ff [dfsan] Abort at runtime on indirect calls to uninstrumented vararg functions.
We currently have no infrastructure to support these correctly.

This is accomplished by generating a call to a runtime library function that
aborts at runtime in place of the regular wrapper for such functions. Direct
calls are rewritten in the usual way during traversal of the caller's IR.

We also remove the "split-stack" attribute from such wrappers, as the code
generator cannot currently handle split-stack vararg functions.

llvm-svn: 221360
2014-11-05 17:21:00 +00:00
Reid Kleckner 941e93e9a8 Revert "[Reassociate] Canonicalize negative constants out of expressions."
This reverts commit r221171.

It performs this invalid transformation:
-  %div.i = urem i64 -1, %add
-  %sub.i = sub i64 -2, %div.i
+  %div.i = urem i64 1, %add
+  %sub.i1 = add i64 %div.i, -2

llvm-svn: 221317
2014-11-04 23:42:45 +00:00
Mark Heffernan 2d393ea6ef Revert earlier change removing setPreservesCFG from instcombine (r221223) and
change LoopSimplifyPass to be !isCFGOnly.  The motivation for the earlier patch
(r221223) was that LoopSimplify is not preserved by instcombine though
setPreservesCFG indicates that it is.  This change fixes the issue
by making setPreservesCFG no longer imply LoopSimplifyPass, and is therefore less
invasive.

llvm-svn: 221311
2014-11-04 23:02:09 +00:00
Kostya Serebryany c5bd9810cc [asan] [mips] changed ShadowOffset32 for systems having 16kb PageSize; patch by Kumar Sukhani
llvm-svn: 221288
2014-11-04 19:46:15 +00:00
Reid Kleckner dd3f3edafa Revert "Transforms: reapply SVN r219899"
This reverts commit r220811 and r220839. It made an incorrect change to
musttail handling.

llvm-svn: 221226
2014-11-04 02:02:14 +00:00
Mark Heffernan 2e25042a93 Remove setPreservesCFG from instcombine. The pass, in particular, does not
preserve LoopSimplify because instcombine may replace branch predicates
with undef which loop simplify then replaces with always exit.  Replace
setPreservesCFG with the more constrained preservation of DomTree and
LoopInfo.

llvm-svn: 221223
2014-11-04 01:51:01 +00:00
Hal Finkel 840257a49c Use AA in LoadCombine
LoadCombine can be smarter about aborting when a writing instruction is
encountered, instead of aborting upon encountering any writing instruction, use
an AliasSetTracker, and only abort when encountering some write that might
alias with the loads that could potentially be combined.

This was originally motivated by comments made (and a test case provided) by
David Majnemer in response to PR21448. It turned out that LoadCombine was not
responsible for that PR, but LoadCombine should also be improved so that
unrelated stores (and @llvm.assume) don't interrupt load combining.

llvm-svn: 221203
2014-11-03 23:19:16 +00:00
David Majnemer 7e2b9882b1 InstCombine: Remove infinite loop caused by FoldOpIntoPhi
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into.  The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.

The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.

This fixes PR21377.

llvm-svn: 221187
2014-11-03 21:55:12 +00:00