Commit Graph

3930 Commits

Author SHA1 Message Date
Nick Lewycky 5b15037fc9 Check that TD isn't NULL before dereferencing it down this path.
llvm-svn: 187099
2013-07-25 02:55:14 +00:00
Manman Ren ed696c3dc1 Update testing cases to pass debug info verifier.
llvm-svn: 187083
2013-07-24 22:23:00 +00:00
Rafael Espindola b552086fcb add -disable-debug-info-verifier to 3 test to fix tests with pipefail.
llvm-svn: 187064
2013-07-24 18:44:10 +00:00
Manman Ren fdfc1ebfbc Debug Info: improve the Finder.
Improve the Finder to handle context of a DIVariable used by DbgValueInst.
Fix testing cases to make them pass the verifier.

llvm-svn: 187052
2013-07-24 17:10:09 +00:00
Chandler Carruth 58e25d3905 Fix a problem I introduced in r187029 where we would over-eagerly
schedule an alloca for another iteration in SROA. This only showed up
with a mixture of promotable and unpromotable selects and phis. Added
a test case for this.

llvm-svn: 187031
2013-07-24 12:12:17 +00:00
Chandler Carruth 83ea195d40 Fix PR16687 where we were incorrectly promoting an alloca that had
pending speculation for a phi node. The problem here is that we were
using growth of the specluation set as an indicator of whether
speculation would occur, and if the phi node is already in the set we
don't see it grow. This is a symptom of the fact that this signal is
a total hack.

Unfortunately, I couldn't really come up with a non-hacky way of
signaling that promotion remains valid *after* speculation occurs, such
that we only speculate when all else looks good for promotion. In the
end, I went with at least a much more explicit approach of doing the
work of queuing inside the phi and select processing and setting
a preposterously named flag to convey that we're in the special state of
requiring speculating before promotion.

Thanks to Richard Trieu and Nick Lewycky for the excellent work reducing
a testcase for this from a pretty giant, nasty assert in a big
application. =] The testcase was excellent.

llvm-svn: 187029
2013-07-24 09:47:28 +00:00
Rafael Espindola 2dfb1cfd0c Add -disable-debug-info-verifier.
Found while testing with pipefail enabled.

llvm-svn: 186937
2013-07-23 12:31:37 +00:00
Manman Ren 9974c88f76 Debug Info Finder: use processDeclare and processValue to list debug info
MDNodes used by DbgDeclareInst and DbgValueInst.

Another 16 testing cases failed and they are disabled with
-disable-debug-info-verifier.
A total of 34 cases are disabled with -disable-debug-info-verifier and will be
corrected.

llvm-svn: 186902
2013-07-23 00:22:51 +00:00
Nadav Rotem cf0dcdc71c When we vectorize across multiple basic blocks we may vectorize PHINodes that create a cycle. We already break the cycle on phi-nodes, but arithmetic operations are still uplicated. This patch adds code that checks if the operation that we are vectorizing was vectorized during the visit of the operands and uses this value if it can.
llvm-svn: 186883
2013-07-22 22:18:07 +00:00
Richard Smith 70523c7926 Treat nothrow forms of ::operator delete and ::operator delete[] as
deallocation functions.

llvm-svn: 186798
2013-07-21 23:11:42 +00:00
Rafael Espindola c2bb73fc8d Don't crash when llvm.compiler.used becomes empty.
GlobalOpt simplifies llvm.compiler.used by removing any members that are also
in the more strict llvm.used. Handle the special case where llvm.compiler.used
becomes empty.

llvm-svn: 186778
2013-07-20 23:33:15 +00:00
Stephen Lin a9b57f6bea InstCombine: call FoldOpIntoSelect for all floating binops, not just fmul
llvm-svn: 186759
2013-07-20 07:13:13 +00:00
Matt Arsenault 727aa349ad Have InlineCost check constant fcmps
llvm-svn: 186758
2013-07-20 04:09:00 +00:00
Rafael Espindola 9aadcc4c0e s/compiler_used/compiler.used/.
We were incorrectly using compiler_used instead of compiler.used. Unfortunately
the passes using the broken name had tests also using the broken name.

llvm-svn: 186705
2013-07-19 18:44:51 +00:00
Chandler Carruth 1ed848d55c Fix another assert failure very similar to PR16651's test case. This
test case came from Benjamin and found the parallel bug in the vector
promotion code.

llvm-svn: 186666
2013-07-19 10:57:32 +00:00
Chandler Carruth 5955c9e4da Fix PR16651, an assert introduced in my recent re-work of the innards of
SROA.

The crux of the issue is that now we track uses of a partition of the
alloca in two places: the iterators over the partitioning uses and the
previously collected split uses vector. We weren't accounting for the
fact that the split uses might invalidate integer widening in ways other
than due to their width (in this case due to being volatile).

Further reduced testcase added to the tests.

llvm-svn: 186655
2013-07-19 07:12:23 +00:00
Chandler Carruth f0546402af Reapply r186316 with a fix for one bug where the code could walk off the
end of a vector. This was found with ASan. I've had one other report of
a crasher, but thus far been unable to reproduce the crash. It may well
be fixed with this version, and if not I'd like to get more information
from the build bots about what is happening.

See r186316 for the full commit log for the new implementation of the
SROA algorithm.

llvm-svn: 186565
2013-07-18 07:15:00 +00:00
Stephen Lin 03f9fbbcd7 Restore r181216, which was partially reverted in r182499.
llvm-svn: 186533
2013-07-17 20:06:03 +00:00
Hal Finkel ec7cd26968 Fix comparisons of alloca alignment in inliner merging
Duncan pointed out a mistake in my fix in r186425 when only one of the allocas
being compared had the target-default alignment. This is essentially his
suggested solution. Thanks!

llvm-svn: 186510
2013-07-17 14:32:41 +00:00
Hal Finkel 9caa8f7ba7 When the inliner merges allocas, it must keep the larger alignment
For safety, the inliner cannot decrease the allignment on an alloca when
merging it with another.

I've included two variants of the test case for this: one with DataLayout
available, and one without. When DataLayout is not available, if only one of
the allocas uses the default alignment (getAlignment() == 0), then they cannot
be safely merged.

llvm-svn: 186425
2013-07-16 17:10:55 +00:00
Nadav Rotem 1c1d6c1666 PR16628: Fix a bug in the code that merges compares.
Compares return i1 but they compare different types.

llvm-svn: 186359
2013-07-15 22:52:48 +00:00
Chandler Carruth e3899f2c2c Revert r186316 while I track down an ASan failure and an assert from
a bot.

This reverts the commit which introduced a new implementation of the
fancy SROA pass designed to reduce its overhead. I'll skip the huge
commit log here, refer to r186316 if you're looking for how this all
works and why it works that way.

llvm-svn: 186332
2013-07-15 17:36:21 +00:00
Chandler Carruth e74ff4c643 Reimplement SROA yet again. Same fundamental principle, but a totally
different core implementation strategy.

Previously, SROA would build a relatively elaborate partitioning of an
alloca, associate uses with each partition, and then rewrite the uses of
each partition in an attempt to break apart the alloca into chunks that
could be promoted. This was very wasteful in terms of memory and compile
time because regardless of how complex the alloca or how much we're able
to do in breaking it up, all of the datastructure work to analyze the
partitioning was done up front.

The new implementation attempts to form partitions of the alloca lazily
and on the fly, rewriting the uses that make up that partition as it
goes. This has a few significant effects:
1) Much simpler data structures are used throughout.
2) No more double walk of the recursive use graph of the alloca, only
   walk it once.
3) No more complex algorithms for associating a particular use with
   a particular partition.
4) PHI and Select speculation is simplified and happens lazily.
5) More precise information is available about a specific use of the
   alloca, removing the need for some side datastructures.

Ultimately, I think this is a much better implementation. It removes
about 300 lines of code, but arguably removes more like 500 considering
that some code grew in the process of being factored apart and cleaned
up for this all to work.

I've re-used as much of the old implementation as possible, which
includes the lion's share of code in the form of the rewriting logic.
The interesting new logic centers around how the uses of a partition are
sorted, and split into actual partitions.

Each instruction using a pointer derived from the alloca gets
a 'Partition' entry. This name is totally wrong, but I'll do a rename in
a follow-up commit as there is already enough churn here. The entry
describes the offset range accessed and the nature of the access. Once
we have all of these entries we sort them in a very specific way:
increasing order of begin offset, followed by whether they are
splittable uses (memcpy, etc), followed by the end offset or whatever.
Sorting by splittability is important as it simplifies the collection of
uses into a partition.

Once we have these uses sorted, we walk from the beginning to the end
building up a range of uses that form a partition of the alloca.
Overlapping unsplittable uses are merged into a single partition while
splittable uses are broken apart and carried from one partition to the
next. A partition is also introduced to bridge splittable uses between
the unsplittable regions when necessary.

I've looked at the performance PRs fairly closely. PR15471 no longer
will even load (the module is invalid). Not sure what is up there.
PR15412 improves by between 5% and 10%, however it is nearly impossible
to know what is holding it up as SROA (the entire pass) takes less time
than reading the IR for that test case. The analysis takes the same time
as running mem2reg on the final allocas. I suspect (without much
evidence) that the new implementation will scale much better however,
and it is just the small nature of the test cases that makes the changes
small and noisy. Either way, it is still simpler and cleaner I think.

llvm-svn: 186316
2013-07-15 10:30:19 +00:00
Andrew Trick 8eaae28693 Teach indvars to generate nsw/nuw flags when widening an induction variable.
Fixes PR16600.

llvm-svn: 186272
2013-07-14 02:50:07 +00:00
Stephen Lin c89a0ececb Fixup to r186268 and r186269: don't append -LABEL to CHECK-NOT. No functionality change.
llvm-svn: 186271
2013-07-14 02:10:57 +00:00
Stephen Lin a76289aa1b Catch more CHECK that can be converted to CHECK-LABEL in Transforms for easier debugging. No functionality change.
This conversion was done with the following bash script:

  find test/Transforms -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)define\([^@]*\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3define\4@$FUNC(/g" $TEMP
      done
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186269
2013-07-14 01:50:49 +00:00
Stephen Lin c1c7a1309c Update Transforms tests to use CHECK-LABEL for easier debugging. No functionality change.
This update was done with the following bash script:

  find test/Transforms -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_]*\):\( *\)@$FUNC\([( ]*\)\$/;\1\2-LABEL:\3@$FUNC(/g" $TEMP
      done
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186268
2013-07-14 01:42:54 +00:00
Stephen Lin 2e105ff8b7 Modify two Transforms tests to explicitly check for full function names in some cases, rather than just a common prefix. No functionality change.
(This is to avoid confusing a scripted mass update of these tests to use CHECK-LABEL)

llvm-svn: 186267
2013-07-14 01:38:19 +00:00
Stephen Lin 6dd347b39f Add newlines at end of test files, no functionality change
llvm-svn: 186263
2013-07-13 22:00:58 +00:00
Arnold Schwaighofer a92eeebde8 LoopVectorizer: Disallow reductions whose header phi is used outside the loop
If an outside loop user of the reduction value uses the header phi node we
cannot just reduce the vectorized phi value in the vector code epilog because
we would loose VF-1 reductions.

lp:
  p = phi (0, lv)
  lv = lv + 1
  ...
  brcond , lp, outside

outside:
  usr = add 0, p

(Say the loop iterates two times, the value of p coming out of the loop is one).

We cannot just transform this to:

vlp:
  p = phi (<0,0>, lv)
  lv = lv + <1,1>
  ..
  brcond , lp, outside

outside:
  p_reduced = p[0] + [1];
  usr = add 0, p_reduced

(Because the original loop iterated two times the vectorized loop would iterate
one time, but p_reduced ends up being zero instead of one).

We would have to execute VF-1 iterations in the scalar remainder loop in such
cases. For now, just disable vectorization.

PR16522

llvm-svn: 186256
2013-07-13 19:09:29 +00:00
Andrew Trick 960dee381d Make the new vectorizer test immune to TTI
llvm-svn: 186242
2013-07-13 06:40:33 +00:00
Andrew Trick 0ae8c94f8f LoopVectorize fix: LoopInfo must be valid when invoking utils like SCEVExpander.
In general, one should always complete CFG modifications first, update
CFG-based analyses, like Dominatores and LoopInfo, then generate
instruction sequences.

LoopVectorizer was creating a new loop, calling SCEVExpander to
generate checks, then updating LoopInfo. I just changed the order.

llvm-svn: 186241
2013-07-13 06:20:06 +00:00
Nick Lewycky 7459be6dc7 Add a microoptimization for urem.
llvm-svn: 186235
2013-07-13 01:16:47 +00:00
Nick Lewycky 35aeea993b Fix logic error optimizing "icmp pred (urem X, Y), Y" where pred is signed.
Fixes PR16605.

llvm-svn: 186229
2013-07-12 23:42:57 +00:00
Joey Gouly a3250f22c2 Fix a crash in EvaluateInDifferentElementOrder where it would generate an
undef vector of the wrong type.

LGTM'd by Nick Lewycky on IRC.

llvm-svn: 186224
2013-07-12 23:08:06 +00:00
Andrew Trick a1e4118a46 LFTR improvement to avoid truncation.
This is a reimplemntation of the patch originally in r186107.

llvm-svn: 186215
2013-07-12 22:08:48 +00:00
Arnold Schwaighofer 6042a261b8 X86 cost model: Add cost for vectorized gather/scather
radar://14351991

llvm-svn: 186189
2013-07-12 19:16:07 +00:00
Arnold Schwaighofer da2b311865 ARM cost model: Add cost for gather/scather
Fixes a 35% degradation compared to unvectorized code in
MiBench/automotive-susan and an equally serious regression on a private
image processing benchmark.

radar://14351991

llvm-svn: 186188
2013-07-12 19:16:04 +00:00
Stephen Lin 764d8d3d6f Start using CHECK-LABEL in some tests.
llvm-svn: 186163
2013-07-12 14:54:12 +00:00
Chandler Carruth cf3715cadd Revert "indvars: Improve LFTR by eliminating truncation when comparing
against a constant."

This reverts commit r186107. It didn't handle wrapping arithmetic in the
loop correctly and thus caused the following C program to count from
0 to UINT64_MAX instead of from 0 to 255 as intended:

  #include <stdio.h>
  int main() {
    unsigned char first = 0, last = 255;
    do { printf("%d\n", first); } while (first++ != last);
  }

Full test case and instructions to reproduce with just the -indvars pass
sent to the original review thread rather than to r186107's commit.

llvm-svn: 186152
2013-07-12 11:18:55 +00:00
Nadav Rotem 89c41bf06a SLPVectorizer: Sink and enable CSE for ExtractElements.
llvm-svn: 186145
2013-07-12 06:09:24 +00:00
Nadav Rotem fa3c2db211 SLPVectorize: Replace the code that checks for vectorization candidates in successor blocks with code that scans PHINodes.
Before we could vectorize PHINodes scanning successors was a good way of finding candidates. Now we can vectorize the phinodes which is simpler.

llvm-svn: 186139
2013-07-12 00:04:18 +00:00
Andrew Trick 3095993d6f indvars: Improve LFTR by eliminating truncation when comparing against a constant.
Patch by Michele Scandale!

Adds a special handling of the case where, during the loop exit
condition rewriting, the exit value is a constant of bitwidth lower
than the type of the induction variable: instead of introducing a
trunc operation in order to match correctly the operand types, it
allows to convert the constant value to an equivalent constant,
depending on the initial value of the induction variable and the trip
count, in order have an equivalent comparison between the induction
variable and the new constant.

llvm-svn: 186107
2013-07-11 17:08:59 +00:00
Arnold Schwaighofer e97c71b8fd LoopVectorize: Vectorize all accesses in address space zero with unit stride
We can vectorize them because in the case where we wrap in the address space the
unvectorized code would have had to access a pointer value of zero which is
undefined behavior in address space zero according to the LLVM IR semantics.
(Thank you Duncan, for pointing this out to me).

Fixes PR16592.

llvm-svn: 186088
2013-07-11 15:21:55 +00:00
Duncan Sands e773c08021 TryToSimplifyUncondBranchFromEmptyBlock was checking that any common
predecessors of the two blocks it is attempting to merge supply the
same incoming values to any phi in the successor block.  This change
allows merging in the case where there is one or more incoming values
that are undef.  The undef values are rewritten to match the non-undef
value that flows from the other edge.  Patch by Mark Lacey.

llvm-svn: 186069
2013-07-11 08:28:20 +00:00
Nadav Rotem 108ef760ff Consolidate more lit tests.
llvm-svn: 186063
2013-07-11 05:15:11 +00:00
Nadav Rotem e0a49499fe Consolidate some of the lit tests.
llvm-svn: 186062
2013-07-11 05:11:33 +00:00
Nadav Rotem c6b5e2499e Consolidate some of the lit tests.
llvm-svn: 186060
2013-07-11 05:01:50 +00:00
Michael Gottesman b40db26eae Teach TailRecursionElimination to handle certain cases of nocapture escaping allocas.
Without the changes introduced into this patch, if TRE saw any allocas at all,
TRE would not perform TRE *or* mark callsites with the tail marker.

Because TRE runs after mem2reg, this inadequacy is not a death sentence. But
given a callsite A without escaping alloca argument, A may not be able to have
the tail marker placed on it due to a separate callsite B having a write-back
parameter passed in via an argument with the nocapture attribute.

Assume that B is the only other callsite besides A and B only has nocapture
escaping alloca arguments (*NOTE* B may have other arguments that are not passed
allocas). In this case not marking A with the tail marker is unnecessarily
conservative since:

  1. By assumption A has no escaping alloca arguments itself so it can not
     access the caller's stack via its arguments.

  2. Since all of B's escaping alloca arguments are passed as parameters with
     the nocapture attribute, we know that B does not stash said escaping
     allocas in a manner that outlives B itself and thus could be accessed
     indirectly by A.

With the changes introduced by this patch:

  1. If we see any escaping allocas passed as a capturing argument, we do
     nothing and bail early.

  2. If we do not see any escaping allocas passed as captured arguments but we
     do see escaping allocas passed as nocapture arguments:

       i. We do not perform TRE to avoid PR962 since the code generator produces
          significantly worse code for the dynamic allocas that would be created
          by the TRE algorithm.

       ii. If we do not return twice, mark call sites without escaping allocas
           with the tail marker. *NOTE* This excludes functions with escaping
           nocapture allocas.

  3. If we do not see any escaping allocas at all (whether captured or not):

       i. If we do not have usage of setjmp, mark all callsites with the tail
          marker.

       ii. If there are no dynamic/variable sized allocas in the function,
           attempt to perform TRE on all callsites in the function.

Based off of a patch by Nick Lewycky.

rdar://14324281.

llvm-svn: 186057
2013-07-11 04:40:01 +00:00
David Majnemer a80fed7e58 InstSimplify: X >> X -> 0
llvm-svn: 185973
2013-07-09 22:01:22 +00:00
Nadav Rotem d7b574e5b3 Fix PR16571, which is a bug in the code that checks that all of the types in the bundle are uniform.
llvm-svn: 185970
2013-07-09 21:38:08 +00:00
David Majnemer a92b3c914e ValueTracking: Fix bugs in isKnownToBeAPowerOfTwo
(add nsw x, (and x, y)) isn't a power of two if x is zero, it's zero
(add nsw x, (xor x, y)) isn't a power of two if y has bits set that aren't set in x

llvm-svn: 185954
2013-07-09 18:11:10 +00:00
David Majnemer 72d76275ac InstCombine: variations on 0xffffffff - x >= 4
The following transforms are valid if -C is a power of 2:
(icmp ugt (xor X, C), ~C) -> (icmp ult X, C)
(icmp ult (xor X, C), -C) -> (icmp uge X, C)

These are nice, they get rid of the xor.

llvm-svn: 185915
2013-07-09 09:20:58 +00:00
David Majnemer 414d4e58aa InstCombine: X & -C != -C -> X <= u ~C
Tests were added in r185910 somehow.

llvm-svn: 185912
2013-07-09 08:09:32 +00:00
David Majnemer bafa537eb7 Commit r185909 was a misapplied patch, fix it
llvm-svn: 185910
2013-07-09 07:58:32 +00:00
David Majnemer f2a9a513c7 InstCombine: add more transforms
C1-X <u C2 -> (X|(C2-1)) == C1
C1-X >u C2 -> (X|C2) == C1
X-C1 <u C2 -> (X & -C2) == C1
X-C1 >u C2 -> (X & ~C2) == C1

llvm-svn: 185909
2013-07-09 07:50:59 +00:00
David Majnemer fa90a0b325 InstCombine: Fold X-C1 <u 2 -> (X & -2) == C1
Back in r179493 we determined that two transforms collided with each
other.  The fix back then was to reorder the transforms so that the
preferred transform would give it a try and then we would try the
secondary transform.  However, it was noted that the best approach would
canonicalize one transform into the other, removing the collision and
allowing us to optimize IR given to us in that form.

llvm-svn: 185808
2013-07-08 11:53:08 +00:00
Michael Gottesman 8c96263ee3 [objc-arc] Committed test for r185770 as per dblaikie's suggestion.
llvm-svn: 185782
2013-07-08 02:13:47 +00:00
Nick Lewycky c0514629c9 Eliminate trivial redundant loads across nocapture+readonly calls to uncaptured
pointer arguments.

llvm-svn: 185776
2013-07-07 10:15:16 +00:00
Nadav Rotem 2041b742d4 SLPVectorizer: Implement DCE as part of vectorization.
This is a complete re-write if the bottom-up vectorization class.
Before this commit we scanned the instruction tree 3 times. First in search of merge points for the trees. Second, for estimating the cost. And finally for vectorization.
There was a lot of code duplication and adding the DCE exposed bugs. The new design is simpler and DCE was a part of the design.
In this implementation we build the tree once. After that we estimate the cost by scanning the different entries in the constructed tree (in any order). The vectorization phase also works on the built tree.

llvm-svn: 185774
2013-07-07 06:57:07 +00:00
Michael Gottesman 618df456e2 [objc-arc] Remove the alias analysis part of r185764.
Upon further reflection, the alias analysis part of r185764 is not a safe
change.

llvm-svn: 185770
2013-07-07 04:18:03 +00:00
Michael Gottesman a72630d453 [objc-arc] Teach the ARC optimizer that objc_sync_enter/objc_sync_exit do not modify the ref count of an objc object and additionally are inert for modref purposes.
llvm-svn: 185769
2013-07-07 01:52:55 +00:00
David Majnemer 69430609ff InstCombine: typo in or_icmp_eq_B_0_icmp_ult_A_B test
llvm-svn: 185737
2013-07-06 00:54:07 +00:00
Nick Lewycky c2ec0725ce Extend 'readonly' and 'readnone' to work on function arguments as well as
functions. Make the function attributes pass add it to known library functions
and when it can deduce it.

llvm-svn: 185735
2013-07-06 00:29:58 +00:00
Michael Gottesman 275b22e310 [TRE] Combined another test into basic.ll
llvm-svn: 185729
2013-07-05 22:24:06 +00:00
Michael Gottesman e283e1958a [TRE] Merged several tests into the the test basic.ll.
llvm-svn: 185723
2013-07-05 20:45:13 +00:00
David Majnemer c2a990bc00 InstCombine: (icmp eq B, 0) | (icmp ult A, B) -> (icmp ule A, B-1)
This transform allows us to turn IR that looks like:
  %1 = icmp eq i64 %b, 0
  %2 = icmp ult i64 %a, %b
  %3 = or i1 %1, %2
  ret i1 %3

into:
  %0 = add i64 %b, -1
  %1 = icmp uge i64 %0, %a
  ret i1 %1

which means we go from lowering:
        cmpq    %rsi, %rdi
        setb    %cl
        testq   %rsi, %rsi
        sete    %al
        orb     %cl, %al
        ret

to lowering:
        decq    %rsi
        cmpq    %rdi, %rsi
        setae   %al
        ret

llvm-svn: 185677
2013-07-05 00:31:17 +00:00
David Majnemer 37f8f445de InstCombine: Reimplementation of visitUDivOperand
This transform was originally added in r185257 but later removed in
r185415.  The original transform would create instructions speculatively
and then discard them if the speculation was proved incorrect.  This has
been replaced with a scheme that splits the transform into two parts:
preflight and fold.  While we preflight, we build up fold actions that
inform the folding stage on how to act.

llvm-svn: 185667
2013-07-04 21:17:49 +00:00
Benjamin Kramer 371722288c SimplifyCFG: Teach switch generation some patterns that instcombine forms.
This allows us to create switches even if instcombine has munged two of the
incombing compares into one and some bit twiddling. This was motivated by enum
compares that are common in clang.

llvm-svn: 185632
2013-07-04 14:22:02 +00:00
Michael Gottesman bed2e82501 Change the gettimeofday test to only test on a posix platform.
llvm-svn: 185503
2013-07-03 04:15:22 +00:00
Michael Gottesman 2db11161a8 Added support in FunctionAttrs for adding relevant function/argument attributes for the posix call gettimeofday.
This implies annotating it as nounwind and its arguments as nocapture. To be
conservative, we do not annotate the arguments with noalias since some platforms
do not have restrict on the declaration for gettimeofday.

llvm-svn: 185502
2013-07-03 04:00:54 +00:00
Hal Finkel fdbe161b1a Revert r185257 (InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms)
I'm reverting this commit because:

 1. As discussed during review, it needs to be rewritten (to avoid creating and
then deleting instructions).

 2. This is causing optimizer crashes. Specifically, I'm seeing things like
this:

    While deleting: i1 %
    Use still stuck around after Def is destroyed:  <badref> = select i1 <badref>, i32 0, i32 1
    opt: /src/llvm-trunk/lib/IR/Value.cpp:79: virtual llvm::Value::~Value(): Assertion `use_empty() && "Uses remain when a value is destroyed!"' failed.

   I'd guess that these will go away once we're no longer creating/deleting
instructions here, but just in case, I'm adding a regression test.

Because the code is bring rewritten, I've just XFAIL'd the original regression test. Original commit message:

	InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms

	Real world code sometimes has the denominator of a 'udiv' be a
	'select'.  LLVM can handle such cases but only when the 'select'
	operands are symmetric in structure (both select operands are a constant
	power of two or a left shift, etc.).  This falls apart if we are dealt a
	'udiv' where the code is not symetric or if the select operands lead us
	to more select instructions.

	Instead, we should treat the LHS and each select operand as a distinct
	divide operation and try to optimize them independently.  If we can
	to simplify each operation, then we can replace the 'udiv' with, say, a
	'lshr' that has a new select with a bunch of new operands for the
	select.

llvm-svn: 185415
2013-07-02 05:21:11 +00:00
Arnold Schwaighofer ef51cf202b LoopVectorize: Math functions only read rounding mode
Math functions are mark as readonly because they read the floating point
rounding mode. Because we don't vectorize loops that would contain function
calls that set the rounding mode it is safe to ignore this memory read.

llvm-svn: 185299
2013-07-01 00:54:44 +00:00
Stephen Lin 2e551adcd9 DeadArgumentElimination: keep return value on functions that have a live argument with the 'returned' attribute (rather than generate invalid IR); however, if both can be eliminated, both will be
llvm-svn: 185290
2013-06-30 20:26:21 +00:00
Benjamin Kramer cc846016bf ConstantFold: Check that truncating the other side is safe under a sext when trying to remove a sext from a compare.
Fixes PR16462.

llvm-svn: 185284
2013-06-30 13:47:43 +00:00
David Majnemer 7a69d2c06a ValueTracking: Teach isKnownToBeAPowerOfTwo about (ADD X, (XOR X, Y)) where X is a power of two
This allows us to simplify urem instructions involving the add+xor to
turn into simpler math.

llvm-svn: 185272
2013-06-29 23:44:53 +00:00
Benjamin Kramer 4093f29366 InstCombine: Also turn selects fed by an and into arithmetic when the types don't match.
Inserting a zext or trunc is sufficient. This pattern is somewhat common in
LLVM's pointer mangling code.

llvm-svn: 185270
2013-06-29 21:17:04 +00:00
David Majnemer 5953d3712a InstCombine: FoldGEPICmp shouldn't change sign of base pointer comparison
Changing the sign when comparing the base pointer would introduce all
sorts of unexpected things like:
  %gep.i = getelementptr inbounds [1 x i8]* %a, i32 0, i32 0
  %gep2.i = getelementptr inbounds [1 x i8]* %b, i32 0, i32 0
  %cmp.i = icmp ult i8* %gep.i, %gep2.i
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = icmp ne i1 %cmp.i, %cmp.i1
  ret i1 %cmp

into:
  %cmp.i = icmp slt [1 x i8]* %a, %b
  %cmp.i1 = icmp ult [1 x i8]* %a, %b
  %cmp = xor i1 %cmp.i, %cmp.i1
  ret i1 %cmp

By preserving the original sign, we now get:
  ret i1 false

This fixes PR16483.

llvm-svn: 185259
2013-06-29 10:28:04 +00:00
David Majnemer 797227eea6 InstCombine: Be more agressive optimizing 'udiv' instrs with 'select' denoms
Real world code sometimes has the denominator of a 'udiv' be a
'select'.  LLVM can handle such cases but only when the 'select'
operands are symmetric in structure (both select operands are a constant
power of two or a left shift, etc.).  This falls apart if we are dealt a
'udiv' where the code is not symetric or if the select operands lead us
to more select instructions.

Instead, we should treat the LHS and each select operand as a distinct
divide operation and try to optimize them independently.  If we can
to simplify each operation, then we can replace the 'udiv' with, say, a
'lshr' that has a new select with a bunch of new operands for the
select.

llvm-svn: 185257
2013-06-29 08:40:07 +00:00
David Majnemer b889e405eb InstCombine: Optimize (1 << X) Pred CstP2 to X Pred Log2(CstP2)
We may, after other optimizations, find ourselves with IR that looks
like:

  %shl = shl i32 1, %y
  %cmp = icmp ult i32 %shl, 32

Instead, we should just compare the shift count:

  %cmp = icmp ult i32 %y, 5

llvm-svn: 185242
2013-06-28 23:42:03 +00:00
Nadav Rotem 060be733a5 SLP Vectorizer: Add support for trees with external users.
To support this we have to insert 'extractelement' instructions to pick the right lane.
We had this functionality before but I removed it when we moved to the multi-block design because it was too complicated.

llvm-svn: 185230
2013-06-28 22:07:09 +00:00
Daniel Malea 4146b0404e Adding tests for DebugIR pass
- lit tests verify that each line of input LLVM IR gets a !dbg node and a
  corresponding entry of metadata that contains the line number 
- unit tests verify that DebugIR works as advertised in the interface
- refactored some useful IR generation functionality from the MCJIT unit tests
  so it can be reused

llvm-svn: 185212
2013-06-28 20:37:20 +00:00
Manman Ren 983a16c08a Debug Info: clean up usage of Verify.
No functionality change.
It should suffice to check the type of a debug info metadata, instead of
calling Verify. For cases where we know the type of a DI metadata, use
assert.

Also update testing cases to make them conform to the format of DI classes.

llvm-svn: 185135
2013-06-28 05:43:10 +00:00
Matt Arsenault fbfdced30f Convert tests to FileCheck
llvm-svn: 185124
2013-06-28 01:29:35 +00:00
Arnold Schwaighofer 12ecb331af LoopVectorize: Preserve debug location info
radar://14169017

llvm-svn: 185122
2013-06-28 00:38:54 +00:00
Arnold Schwaighofer 38de7cd464 LoopVectorize: Cache edge masks created during if-conversion
Otherwise, we end up with an exponential IR blowup.
Fixes PR16472.

llvm-svn: 185097
2013-06-27 20:31:06 +00:00
Arnold Schwaighofer a2dd195fb3 LoopVectorize: Use vectorized loop invariant gep index anchored in loop
Use vectorized instruction instead of original instruction anchored in the
original loop.

Fixes PR16452 and t2075.c of PR16455.

llvm-svn: 185081
2013-06-27 15:11:55 +00:00
Manman Ren 31dee5bec9 Update testing case to make DI nodes have the correct format.
llvm-svn: 185061
2013-06-27 06:40:18 +00:00
Arnold Schwaighofer 8db6347b9d Fix spelling.
llvm-svn: 185052
2013-06-27 01:01:11 +00:00
Arnold Schwaighofer ccd6c9929b LoopVectorize: Don't store a reversed value in the vectorized value map
When we store values for reversed induction stores we must not store the
reversed value in the vectorized value map. Another instruction might use this
value.

This fixes 3 test cases of PR16455.

llvm-svn: 185051
2013-06-27 00:45:41 +00:00
Michael Gottesman 41748d7c86 Added support for the Builtin attribute.
The Builtin attribute is an attribute that can be placed on function call site that signal that even though a function is declared as being a builtin,

rdar://problem/13727199

llvm-svn: 185049
2013-06-27 00:25:01 +00:00
Nadav Rotem 4c5b2d1de6 Erase all of the instructions that we RAUWed
llvm-svn: 184969
2013-06-26 17:16:09 +00:00
Nadav Rotem f4ca3994b8 Do not add cse-ed instructions into the visited map because we dont want to consider them as a candidate for replacement of instructions to be visited.
llvm-svn: 184966
2013-06-26 16:54:53 +00:00
Nadav Rotem 0794acc1da SLPVectorizer: support slp-vectorization of PHINodes between basic blocks
llvm-svn: 184888
2013-06-25 23:04:09 +00:00
Bob Wilson acfc01dedf Fix SROA to avoid unnecessary scalar conversions for 1-element vectors.
When a 1-element vector alloca is promoted, a store instruction can often be
rewritten without converting the value to a scalar and using an insertelement
instruction to stuff it into the new alloca.  This patch just adds a check
to skip that conversion when it is unnecessary.  This turns out to be really
important for some ARM Neon operations where <1 x i64> is used to get around
the fact that i64 is not a legal type.

llvm-svn: 184870
2013-06-25 19:09:50 +00:00
Arnold Schwaighofer b252c11ccc Reapply 184685 after the SetVector iteration order fix.
This should hopefully have fixed the stage2/stage3 miscompare on the dragonegg
testers.

"LoopVectorize: Use the dependence test utility class

We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598"

llvm-svn: 184724
2013-06-24 12:09:15 +00:00
Arnold Schwaighofer 58ca945f38 Revert "LoopVectorize: Use the dependence test utility class"
This reverts commit cbfa1ca993363ca5c4dbf6c913abc957c584cbac.

We are seeing a stage2 and stage3 miscompare on some dragonegg bots.

llvm-svn: 184690
2013-06-24 06:10:41 +00:00
Arnold Schwaighofer b914a7e2ef LoopVectorize: Use the dependence test utility class
We now no longer need alias analysis - the cases that alias analysis would
handle are now handled as accesses with a large dependence distance.

We can now vectorize loops with simple constant dependence distances.

  for (i = 8; i < 256; ++i) {
    a[i] = a[i+4] * a[i+8];
  }

  for (i = 8; i < 256; ++i) {
    a[i] = a[i-4] * a[i-8];
  }

We would be able to vectorize about 200 more loops (in many cases the cost model
instructs us no to) in the test suite now. Results on x86-64 are a wash.

I have seen one degradation in ammp. Interestingly, the function in which we
now vectorize a loop is never executed so we probably see some instruction
cache effects. There is a 2% improvement in h264ref. There is one or the other
TSCV loop kernel that speeds up.

radar://13681598

llvm-svn: 184685
2013-06-24 03:55:48 +00:00
Nadav Rotem 210e86d7c4 SLP Vectorizer: Add support for vectorizing parts of the tree.
Untill now we detected the vectorizable tree and evaluated the cost of the
entire tree.  With this patch we can decide to trim-out branches of the tree
that are not profitable to vectorizer.

Also, increase the max depth from 6 to 12. In the worse possible case where all
of the code is made of diamond-shaped graph this can bring the cost to 2**10,
but diamonds are not very common.

llvm-svn: 184681
2013-06-24 02:52:43 +00:00
Nadav Rotem 0323925d51 SLP Vectorizer: Fix a bug in the code that does CSE on the generated gather sequences.
Make sure that we don't replace and RAUW two sequences if one does not dominate the other.

llvm-svn: 184674
2013-06-23 21:57:27 +00:00
Nadav Rotem eb65e67eea SLP Vectorizer: Implement a simple CSE optimization for the gather sequences.
llvm-svn: 184660
2013-06-23 06:15:46 +00:00
Nadav Rotem 80de0a28f1 SLP Vectorizer: Implement multi-block slp-vectorization.
Rewrote the SLP-vectorization as a whole-function vectorization pass. It is now able to vectorize chains across multiple basic blocks.
It still does not vectorize PHIs, but this should be easy to do now that we scan the entire function.
I removed the support for extracting values from trees.
We are now able to vectorize more programs, but there are some serious regressions in many workloads (such as flops-6 and mandel-2).

llvm-svn: 184647
2013-06-22 21:34:10 +00:00
Nadav Rotem 14a89c5428 SLPVectorization: Add a basic support for cross-basic block slp vectorization.
We collect gather sequences when we vectorize basic blocks. Gather sequences are excellent
hints for vectorization of other basic blocks.

llvm-svn: 184444
2013-06-20 17:41:45 +00:00
Matt Arsenault d46fce1141 Move StructurizeCFG out of R600 to generic Transforms.
Register it with PassManager

llvm-svn: 184343
2013-06-19 20:18:24 +00:00
Quentin Colombet 145eb97d3a LSR: Fix the parameters used to compute the scaling factor cost.
Prior to this change, the considered addressing modes may be invalid since the
maximum and minimum offsets were not taking into account.
This was causing an assertion failure.

The added test case exercices that behavior.

<rdar://problem/14199725> Assertion failed: (CurScaleCost >= 0 && "Legal
addressing mode has an illegal cost!")

llvm-svn: 184341
2013-06-19 19:59:41 +00:00
Nadav Rotem 1e9668ea81 SLPVectorizer: handle scalars that are extracted from vectors (using ExtractElementInst).
llvm-svn: 184325
2013-06-19 17:33:16 +00:00
Nadav Rotem 86e848c849 SLPVectorizer: start constructing chains at stores that are not power of two.
The type <3 x i8> is a common in graphics and we want to be able to vectorize it.

This changes accelerates bullet by 12% and 471_omnetpp by 5%.

llvm-svn: 184317
2013-06-19 15:57:29 +00:00
Nadav Rotem e98da7f548 SLPVectorizer: vectorize compares and selects.
llvm-svn: 184282
2013-06-19 05:49:52 +00:00
Pekka Jaaskelainen eb90fd1c3b Fix for a regression caused by the LoopVectorizer when
vectorizing loops with memory accesses to non-zero address spaces. It
simply dropped the AS info. Fixes PR16306.

llvm-svn: 184103
2013-06-17 18:49:06 +00:00
Derek Schuff ec9dc01b33 Fix DeleteDeadVarargs not to crash on functions referenced by BlockAddresses
This pass was assuming that if hasAddressTaken() returns false for a
function, the function's only uses are call sites.  That's not true
because there can be references by BlockAddresses too.

Fix the pass to handle this case.  Fix
BlockAddress::replaceUsesOfWithOnConstant() to allow a function's type
to be changed by RAUW'ing the function with a bitcast of the recreated
function.

Patch by Mark Seaborn.

llvm-svn: 183933
2013-06-13 19:51:17 +00:00
Rafael Espindola 8d30480344 Always remove an alias when we rename the target.
Should fix the dragonegg build bots.

llvm-svn: 183845
2013-06-12 16:45:47 +00:00
Rafael Espindola fb3fc0bf34 Convert test to FileCheck.
llvm-svn: 183843
2013-06-12 16:35:53 +00:00
Rafael Espindola a82555c0f8 Change how globalopt handles aliases in llvm.used.
Instead of a custom implementation of replaceAllUsesWith, we just call
replaceAllUsesWith and recreate llvm.used and llvm.compiler-used.

This change is particularity interesting because it makes llvm see
through what clang is doing with static used functions in extern "C"
contexts. With this change, running clang -O2 in

extern "C" {
  __attribute__((used)) static void foo() {}
}

produces

@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @foo to
i8*)], section "llvm.metadata"
define internal void @foo() #0 {
entry:
  ret void
}

llvm-svn: 183756
2013-06-11 17:48:06 +00:00
Tim Northover 64280fbba1 Make DeadArgumentElimination more conservative on variadic functions
Variadic functions are particularly fragile in the face of ABI changes, so this
limits how much the pass changes them

llvm-svn: 183625
2013-06-09 02:17:27 +00:00
Shuxin Yang 140d592d84 Fix a potential bug in r183584.
r183584 tries to derive some info from the code *AFTER* a call and apply
these derived info to the code *BEFORE* the call, which is not always safe
as the call in question may never return, and in this case, the derived
info is invalid.
  
  Thank Duncan for pointing out this potential bug.

rdar://14073661 

llvm-svn: 183606
2013-06-08 04:56:05 +00:00
Shuxin Yang bd254f2601 Fix an assertion in MemCpyOpt pass.
The MemCpyOpt pass is capable of optimizing:
      callee(&S); copy N bytes from S to D.
    into:
      callee(&D);
subject to some legality constraints. 

  Assertion is triggered when the compiler tries to evalute "sizeof(typeof(D))",
while D is an opaque-typed, 'sret' formal argument of function being compiled.
i.e. the signature of the func being compiled is something like this:
  T caller(...,%opaque* noalias nocapture sret %D, ...)

  The fix is that when come across such situation, instead of calling some
utility functions to get the size of D's type (which will crash), we simply
assume D has at least N bytes as implified by the copy-instruction.

rdar://14073661 

llvm-svn: 183584
2013-06-07 22:45:21 +00:00
Michael Gottesman 9e7261c874 [objc-arc] Ensure that the cfg path count does not overflow when we multiply TopDownPathCount/BottomUpPathCount.
rdar://12480535

llvm-svn: 183489
2013-06-07 06:16:49 +00:00
Rafael Espindola 932470bcd9 Add a testcase from pr16244.
llvm-svn: 183433
2013-06-06 19:15:23 +00:00
David Majnemer 29130c5e8d IndVarSimplify: check if loop invariant expansion can trap
IndVarSimplify is willing to move divide instructions outside of their
loop bodies if they are invariant of the loop.  However, it may not be
safe to expand them if we do not know if they can trap.

Instead, check to see if it is not safe to expand the instruction and
skip the expansion.

This fixes PR16041.

Testcase by Rafael Ávila de Espíndola.

llvm-svn: 183239
2013-06-04 17:51:58 +00:00
Rafael Espindola a5e536ab0e Second part of pr16069
The problem this time seems to be a thinko. We were assuming that in the CFG

A
| \
|  B
| /
C

speculating the basic block B would cause only the phi value for the B->C edge
to be speculated. That is not true, the phi's are semantically in the edges, so
if the A->B->C path is taken, any code needed for A->C is not executed and we
have to consider it too when deciding to speculate B.

llvm-svn: 183226
2013-06-04 14:11:59 +00:00
David Majnemer c82f27af2a SimplifyCFG: Do not transform PHI to select if doing so would be unsafe
PR16069 is an interesting case where an incoming value to a PHI is a
trap value while also being a 'ConstantExpr'.

We do not consider this case when performing the 'HoistThenElseCodeToIf'
optimization.

Instead, make our modifications more conservative if we detect that we
cannot transform the PHI to a select.

llvm-svn: 183152
2013-06-03 20:43:12 +00:00
Nick Lewycky 3f715e260a When determining the new index for an insertelement, we may not assume that an
index greater than the size of the vector is invalid. The shuffle may be
shrinking the size of the vector. Fixes a crash!

Also drop the maximum recursion depth of the safety check for this
optimization to five.

llvm-svn: 183080
2013-06-01 20:51:31 +00:00
Andrew Trick ee9143acf5 Prevent loop-unroll from making assumptions about undefined behavior.
Fixes rdar:14036816, PR16130.

There is an opportunity to compute precise trip counts for 'or'
expressions and multi-exit loops.
rdar:14038809: Optimize trip count computation for multi-exit loops.

To do this we need to record the fact that ExitLimit assumes NSW. When
it does not we can safely assume that the loop trip count is the
minimum ExitLimt across all subexpressions and loop exits.

llvm-svn: 183060
2013-05-31 23:34:46 +00:00
Arnold Schwaighofer 70a9be5297 LoopVectorize: PHIs with only outside users should prevent vectorization
We check that instructions in the loop don't have outside users (except if
they are reduction values). Unfortunately, we skipped this check for
if-convertable PHIs.

Fixes PR16184.

llvm-svn: 183035
2013-05-31 19:53:50 +00:00
Quentin Colombet 8aa7abe2ae Modify how the formulae are rated in Loop Strength Reduce.
Namely, check if the target allows to fold more that one register in the
addressing mode and if yes, adjust the cost accordingly.

Prior to this commit, reg1 + scale * reg2 accesses were artificially preferred
to reg1 + reg2 accesses. Indeed, the cost model wrongly assumed that reg1 + reg2
needs a temporary register for the computation, whereas it was correctly
estimated for reg1 + scale * reg2.

<rdar://problem/13973908>

llvm-svn: 183021
2013-05-31 17:20:29 +00:00
Rafael Espindola 65281bf36e Simplify multiplications by vectors whose elements are powers of 2.
Patch by Andrea Di Biagio.

llvm-svn: 183005
2013-05-31 14:27:15 +00:00
Nick Lewycky a2b7720618 Reapply with r182909 with a fix to the calculation of the new indices for
insertelement instructions.

llvm-svn: 182976
2013-05-31 00:59:42 +00:00
Evgeniy Stepanov 2c14269883 Revert r182909.
PR/16177

llvm-svn: 182919
2013-05-30 09:40:17 +00:00
Nick Lewycky d7f27094c0 Swizzle vector inputs if it helps us eliminate shuffles.
llvm-svn: 182909
2013-05-30 04:33:38 +00:00
Paul Redmond 5fdf836ba4 Add support for llvm.vectorizer metadata
- llvm.loop.parallel metadata has been renamed to llvm.loop to be more generic
  by making the root of additional loop metadata.
  - Loop::isAnnotatedParallel now looks for llvm.loop and associated
    llvm.mem.parallel_loop_access
  - document llvm.loop and update llvm.mem.parallel_loop_access
- add support for llvm.vectorizer.width and llvm.vectorizer.unroll
  - document llvm.vectorizer.* metadata
  - add utility class LoopVectorizerHints for getting/setting loop metadata
  - use llvm.vectorizer.width=1 to indicate already vectorized instead of
    already_vectorized
- update existing tests that used llvm.loop.parallel and
  llvm.vectorizer.already_vectorized

Reviewed by: Nadav Rotem

llvm-svn: 182802
2013-05-28 20:00:34 +00:00
Andrew Trick e2431c64bc Track IR ordering of SelectionDAG nodes 3/4.
Remove the old IR ordering mechanism and switch to new one.  Fix unit
test failures.

llvm-svn: 182704
2013-05-25 03:08:10 +00:00
Michael Gottesman e67f40c514 [objc-arc] KnownSafe does not imply that it is safe to perform code motion across CFG edges since even if it is safe to remove RR pairs, we may still be able to move a retain/release into a loop.
rdar://13949644

llvm-svn: 182670
2013-05-24 20:44:05 +00:00
Michael Gottesman 5a91bbf33a [objc-arc] Make sure that multiple owners is propogated correctly through the pass via the usage of a global data structure.
rdar://13750319

llvm-svn: 182669
2013-05-24 20:44:02 +00:00
Benjamin Kramer 6ac1e62377 LoopVectorize: LoopSimplify can't canonicalize loops with an indirectbr in it, don't assert on those cases.
Fixes PR16139.

llvm-svn: 182656
2013-05-24 18:05:35 +00:00
Joey Gouly 83699284be scalarizePHI needs to insert the next ExtractElement in the same block
as the BinaryOperator, *not* in the block where the IRBuilder is currently
inserting into. Fixes a bug where scalarizePHI would create instructions
that would not dominate all uses.

llvm-svn: 182639
2013-05-24 12:29:54 +00:00
Nadav Rotem 9e00eb38a2 SLPVectorizer: Change the order in which new instructions are added to the function.
We are not working on a DAG and I ran into a number of problems when I enabled the vectorizations of 'diamond-trees' (trees that share leafs).
* Imroved the numbering API.
* Changed the placement of new instructions to the last root.
* Fixed a bug with external tree users with non-zero lane.
* Fixed a bug in the placement of in-tree users.

llvm-svn: 182508
2013-05-22 19:47:32 +00:00
Jean-Luc Duprat 0dda6f168c This is an update to a previous commit (r181216).
The earlier change list introduced the following inst combines:
B * (uitofp i1 C) —> select C, B, 0
A * (1 - uitofp i1 C) —> select C, 0, A
select C, 0, B + select C, A, 0 —> select C, A, B

Together these 3 changes would simplify :
A * (1 - uitofp i1 C) + B * uitofp i1 C 
down to :
select C, B, A

In practice we found that the first two substitutions can have a
negative effect on performance, because they reduce opportunities to
use FMA contractions; between the two options FMAs are often the
better choice.  This change list amends the previous one to enable
just these inst combines:

select C, B, 0 + select C, 0, A —> select C, B, A
A * (1 - uitofp i1 C) + B * uitofp i1 C —> select C, B, A

llvm-svn: 182499
2013-05-22 18:29:31 +00:00
Arnold Schwaighofer 12b0d1cda0 LoopVectorize: Make Value pointers that could be RAUW'ed a VH
The Value pointers we store in the induction variable list can be RAUW'ed by a
call to SCEVExpander::expandCodeFor, use a TrackingVH instead. Do the same thing
in some other places where we store pointers that could potentially be RAUW'ed.

Fixes PR16073.

llvm-svn: 182485
2013-05-22 16:54:56 +00:00
Benjamin Kramer 75f0923ab2 Move the remaining simplify-libcalls tests to instcombine, merging most of them into a single file.
llvm-svn: 182211
2013-05-19 13:28:39 +00:00
David Majnemer beab5678a3 isKnownToBeAPowerOfTwo: (X & Y) + Y is a power of 2 or zero if y is also.
This is useful if something that looks like (x & (1 << y)) ? 64 : 32 is
the divisor in a modulo operation.

llvm-svn: 182200
2013-05-18 19:30:37 +00:00
Arnold Schwaighofer 693a1ca628 LoopVectorize: Handle single edge PHIs
We might encouter single edge PHIs - handle them with an identity select.

Fixes PR15990.

llvm-svn: 182199
2013-05-18 18:38:34 +00:00
Richard Smith e04f0d34d1 Respect the 'nobuiltin' attribute when determining if a call is to a memory builtin.
llvm-svn: 181978
2013-05-16 04:12:04 +00:00
Arnold Schwaighofer 2d920477a4 LoopVectorize: Hoist conditional loads if possible
InstCombine can be uncooperative to vectorization and sink loads into
conditional blocks. This prevents vectorization.

Undo this optimization if there are unconditional memory accesses to the same
addresses in the loop.

radar://13815763

llvm-svn: 181860
2013-05-15 01:44:30 +00:00
Manman Ren b3c52fb45b GlobalOpt: fix an issue where CXAAtExitFn points to a deleted function.
CXAAtExitFn was set outside a loop and before optimizations where functions
can be deleted. This patch will set CXAAtExitFn inside the loop and after
optimizations.

Seg fault when running LTO because of accesses to a deleted function.
rdar://problem/13838828

llvm-svn: 181838
2013-05-14 21:52:44 +00:00
Arnold Schwaighofer 2e7a922a15 LoopVectorize: Handle loops with multiple forward inductions
We used to give up if we saw two integer inductions. After this patch, we base
further induction variables on the chosen one like we do in the reverse
induction and pointer induction case.

Fixes PR15720.

radar://13851975

llvm-svn: 181746
2013-05-14 00:21:18 +00:00
Michael Gottesman a76143eeee [objc-arc-opts] In the presense of an alloca unconditionally remove RR pairs if and only if we are both KnownSafeBU/KnownSafeTD rather than just either or.
In the presense of a block being initialized, the frontend will emit the
objc_retain on the original pointer and the release on the pointer loaded from
the alloca. The optimizer will through the provenance analysis realize that the
two are related (albiet different), but since we only require KnownSafe in one
direction, will match the inner retain on the original pointer with the guard
release on the original pointer. This is fixed by ensuring that in the presense
of allocas we only unconditionally remove pointers if both our retain and our
release are KnownSafe (i.e. we are KnownSafe in both directions) since we must
deal with the possibility that the frontend will emit what (to the optimizer)
appears to be unbalanced retain/releases.

An example of the miscompile is:

  %A = alloca
  retain(%x)
  retain(%x) <--- Inner Retain
  store %x, %A
  %y = load %A
  ... DO STUFF ...
  release(%y)
  call void @use(%x)
  release(%x) <--- Guarding Release

getting optimized to:

  %A = alloca
  retain(%x)
  store %x, %A
  %y = load %A
  ... DO STUFF ...
  release(%y)
  call void @use(%x)

rdar://13750319

llvm-svn: 181743
2013-05-13 23:49:42 +00:00
Nadav Rotem ce42cc6d4d SLPVectorizer: Fix a bug in the code that generates extracts for values with multiple users.
The external user does not have to be in lane #0. We have to save the lane for each scalar so that we know which vector lane to extract.

llvm-svn: 181674
2013-05-12 22:58:45 +00:00
David Majnemer 6c30f49af3 InstCombine: Flip the order of two urem transforms
There are two transforms in visitUrem that conflict with each other.

*) One, if a divisor is a power of two, subtracts one from the divisor
   and turns it into a bitwise-and.
*) The other unwraps both operands if they are surrounded by zext
   instructions.

Flipping the order allows the subtraction to go beneath the sign
extension.

llvm-svn: 181668
2013-05-12 00:07:05 +00:00
Arnold Schwaighofer f2305e4467 LoopVectorize: Use the widest induction variable type
Use the widest induction type encountered for the cannonical induction variable.

We used to turn the following loop into an empty loop because we used i8 as
induction variable type and truncated 1024 to 0 as trip count.

int a[1024];
void fail() {
  int reverse_induction = 1023;
  unsigned char forward_induction = 0;
  while ((reverse_induction) >= 0) {
    forward_induction++;
    a[reverse_induction] = forward_induction;
    --reverse_induction;
  }
}

radar://13862901

llvm-svn: 181667
2013-05-11 23:04:28 +00:00
David Majnemer 470b077bca InstCombine: Turn urem to bitwise-and more often
Use isKnownToBeAPowerOfTwo in visitUrem so that we may more aggressively
fold away urem instructions.

llvm-svn: 181661
2013-05-11 09:01:28 +00:00