Commit Graph

3842 Commits

Author SHA1 Message Date
Pekka Jaaskelainen d3c90e132a Call the potentially costly isAnnotatedParallel() only once.
Made the uniform write test's checks a bit stricter.

llvm-svn: 180119
2013-04-23 16:44:43 +00:00
Pekka Jaaskelainen 6f2f66b63f Refuse to (even try to) vectorize loops which have uniform writes,
even if erroneously annotated with the parallel loop metadata.

Fixes Bug 15794: 
"Loop Vectorizer: Crashes with the use of llvm.loop.parallel metadata"

llvm-svn: 180081
2013-04-23 08:08:51 +00:00
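
A minimal C sketch of the kind of loop the r180081 change above now refuses to vectorize: the store address is uniform (loop-invariant), so every iteration writes the same location even if the loop is annotated as parallel. Function and variable names are illustrative only.

  #include <stdio.h>

  /* Uniform write: the store address does not depend on the induction
     variable, so every iteration writes the same location. */
  static void uniform_write(int *out, const int *a, int n) {
    for (int i = 0; i < n; ++i)
      out[0] = a[i];
  }

  int main(void) {
    int a[4] = {1, 2, 3, 4}, out = 0;
    uniform_write(&out, a, 4);
    printf("out = %d\n", out);  /* the last iteration wins: 4 */
    return 0;
  }
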
Anat Shemer 10260a75e3 Changed back (relative to commit 179786) the operations executed when extract(cast) is transformed to cast(extract). It uses the Builder class as before. In addition the result node is added to the Worklist, so all the previous extract users will become the new scalar cast users.
llvm-svn: 180045
2013-04-22 20:51:10 +00:00
David Blaikie f55abeaf4c Revert "Revert "PR14606: debug info imported_module support""
This reverts commit r179840 with a fix to test/DebugInfo/two-cus-from-same-file.ll

I'm not sure why that test only failed on ARM & MIPS and not X86 Linux, even
though the debug info was clearly invalid on all of them, but this ought to fix
it.

llvm-svn: 179996
2013-04-22 06:12:31 +00:00
Benjamin Kramer 0212dc27ed SROA: Don't crash on a select with two identical operands.
This is an edge case that can happen if we modify a chain of multiple selects.
Update all operands in that case and remove the assert. PR15805.

llvm-svn: 179982
2013-04-21 17:48:39 +00:00
Arnold Schwaighofer 6eb32b31bd Revert "SimplifyCFG: If convert single conditional stores"
There is the temptation to make this transform dependent on target information as
it is not going to be beneficial on all (sub)targets. Therefore, we should
probably do this in MI Early-Ifconversion.

This reverts commit r179957. Original commit message:

"SimplifyCFG: If convert single conditional stores

This transformation will transform a conditional store with a preceding
unconditional store to the same location:

a[i] =
may-alias with a[i] load
if (cond)
    a[i] = Y

into an unconditional store.

a[i] = X
may-alias with a[i] load
tmp = cond ? Y : X;
a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM Swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducible below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up."

llvm-svn: 179980
2013-04-21 13:09:04 +00:00
Nadav Rotem c57af326a4 SLPVectorize: Add support for vectorization of casts.
llvm-svn: 179975
2013-04-21 08:05:59 +00:00
Michael Gottesman d5b701faf1 [objc-arc] Cleaned up tail-call-invariant-enforcement.ll.
Specifically:

1. Added checks that unwind is being properly added to various instructions.
2. Fixed the declaration/calling of objc_release to have a return type of void.
3. Moved all checks to precede the functions and added checks to ensure that the
checks would only match inside the specific function that we are attempting to
check.

llvm-svn: 179973
2013-04-21 02:59:44 +00:00
Michael Gottesman 77aa946321 [objc-arc] Check that objc-arc-expand properly handles all strictly forwarding calls and does not touch calls which are not strictly forwarding (i.e. objc_retainBlock).
llvm-svn: 179972
2013-04-21 01:57:46 +00:00
Michael Gottesman 524052fec1 [objc-arc] Renamed the test file clang-arc-used-intrinsic-removed-if-isolated.ll -> intrinsic-use-isolated.ll to match the other test file intrinsic-use.ll.
llvm-svn: 179971
2013-04-21 01:42:24 +00:00
Nadav Rotem 8aca44a623 Fix PR15800. Do not try to vectorize vectors and structs.
llvm-svn: 179960
2013-04-20 22:29:43 +00:00
Arnold Schwaighofer 3546ccf465 SimplifyCFG: If convert single conditional stores
This transformation will transform a conditional store with a preceding
unconditional store to the same location:

 a[i] =
 may-alias with a[i] load
 if (cond)
   a[i] = Y

into an unconditional store.

 a[i] = X
 may-alias with a[i] load
 tmp = cond ? Y : X;
 a[i] = tmp

We assume that on average the cost of a mispredicted branch is going to be
higher than the cost of a second store to the same location, and that the
secondary benefits of creating a bigger basic block for other optimizations to
work on outweigh the potential case where the branch would be correctly predicted
and the cost of executing the second store would be noticeably reflected in
performance.

hmmer's execution time improves by 30% on an imac12,2 on ref data sets. With
this change we are on par with gcc's performance (gcc also performs this
transformation). There was a 1.2% performance improvement on an ARM Swift chip.
Other tests in the test-suite+external seem to be mostly uninfluenced in my
experiments:
This optimization was triggered on 41 tests such that the executable was
different before/after the patch. Only 1 out of the 40 tests (dealII) was
reproducible below 100% (by about .4%). Given that hmmer benefits so much I
believe this to be a fair trade off.

I am going to watch performance numbers across the buildbots and will revert
this if anything unexpected comes up.

llvm-svn: 179957
2013-04-20 21:42:09 +00:00
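
A small C sketch of the before/after shapes described in the r179957 message above; X, Y, and the function names are illustrative only.

  #include <stdio.h>

  static void before(int *a, int i, int x, int y, int cond) {
    a[i] = x;
    if (cond)
      a[i] = y;              /* conditional second store */
  }

  static void after(int *a, int i, int x, int y, int cond) {
    a[i] = x;
    int tmp = cond ? y : x;  /* select */
    a[i] = tmp;              /* unconditional second store */
  }

  int main(void) {
    for (int cond = 0; cond <= 1; ++cond) {
      int a0[1] = {0}, a1[1] = {0};
      before(a0, 0, 7, 9, cond);
      after(a1, 0, 7, 9, cond);
      printf("cond=%d: before=%d after=%d\n", cond, a0[0], a1[0]);
    }
    return 0;
  }
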
Nuno Lopes 36e827602a recommit tests
llvm-svn: 179955
2013-04-20 17:39:52 +00:00
Nadav Rotem 83c7c41bc2 SLPVectorizer: Improve the cost model for loop invariant broadcast values.
llvm-svn: 179930
2013-04-20 06:13:47 +00:00
Benjamin Kramer 630e6e1422 MergeFunc: Make pointer and integer types generate the same hash.
The logic that actually compares the types considers pointers and integers the
same if they are of the same size. This created a strange mismatch between hash
and reality and made the test case for this fail on some platforms (yay,
test cases).

llvm-svn: 179905
2013-04-19 23:06:44 +00:00
Bill Wendling 24e8a0d5f0 Make variable match any name.
llvm-svn: 179903
2013-04-19 22:30:43 +00:00
Bill Wendling 81c8cf5ef9 Try explicitly setting the target triple to see if this gets it to pass on ARM.
llvm-svn: 179890
2013-04-19 21:24:51 +00:00
Chad Rosier 11ebe05643 Attempt to pacify this test for the buildbots.
llvm-svn: 179874
2013-04-19 19:27:33 +00:00
Bill Wendling b670649067 Add test to make sure that a int-to-ptr can be merged correctly.
llvm-svn: 179869
2013-04-19 18:16:06 +00:00
Benjamin Kramer ec1bb4fdaf ConstantFolding: ComputeMaskedBits wants the scalar size for vectors.
Fixes PR15791.

llvm-svn: 179859
2013-04-19 16:56:24 +00:00
Jakub Staszak 9b59d14fc4 Revert 179826. Tests were worthless.
llvm-svn: 179845
2013-04-19 09:32:30 +00:00
Eric Christopher 0e89ade8ff Revert "PR14606: debug info imported_module support"
This reverts commit r179836 as it seems to have caused test failures.

llvm-svn: 179840
2013-04-19 07:47:16 +00:00
David Blaikie 88564f3cf7 PR14606: debug info imported_module support
Adding another CU-wide list, in this case of imported_modules (since they
should be relatively rare, it seemed better to add a list where each element
had a "context" value, rather than add a (usually empty) list to every scope).
This takes care of DW_TAG_imported_module, but to fully address PR14606 we'll
need to expand this to cover DW_TAG_imported_declaration too.

llvm-svn: 179836
2013-04-19 06:57:04 +00:00
Jakub Staszak 2c1daf75b9 Don't run expensive -O2 and -O3 in tests.
llvm-svn: 179825
2013-04-19 01:10:45 +00:00
Anat Shemer 5570318f43 In the function InstCombiner::visitExtractElementInst() removed the limitation that extract is promoted over a cast only if the cast has only one use.
llvm-svn: 179786
2013-04-18 19:56:44 +00:00
Anat Shemer 0c95efad7e Added a function scalarizePHI() that scalarizes a vector phi instruction if it has only 2 uses: one to promote the vector phi in a loop and the other use is an extract operation of one element at a constant location.
llvm-svn: 179783
2013-04-18 19:35:39 +00:00
Arnold Schwaighofer 4cd6aa110c LoopVectorizer: Recognize min/max reductions
A min/max operation is represented by a select(cmp(lt/le/gt/ge, X, Y), X, Y)
sequence in LLVM. If we see such a sequence we can treat it just as any other
commutative binary instruction and reduce it.

This appears to help bzip2 by about 1.5% on an imac12,2.

radar://12960601

llvm-svn: 179773
2013-04-18 17:22:34 +00:00
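
A C sketch of a max reduction of the kind described above; at the IR level the loop body is the select(cmp(...), X, Y) pattern the vectorizer now recognizes. Names are illustrative.

  #include <stdio.h>

  static int max_reduce(const int *a, int n) {
    int m = a[0];
    for (int i = 1; i < n; ++i)
      m = (a[i] > m) ? a[i] : m;  /* select(cmp(gt, a[i], m), a[i], m) */
    return m;
  }

  int main(void) {
    int a[] = {3, 9, 1, 7, 9, 2};
    printf("max = %d\n", max_reduce(a, 6));  /* prints 9 */
    return 0;
  }
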
Benjamin Kramer 8df2cfb858 LoopVectorize: Use a set to avoid longer cycles in the reduction chain too.
Fixes PR15748.

llvm-svn: 179757
2013-04-18 14:29:13 +00:00
David Majnemer 81af06e003 Revert "Combine bit test + conditional or into simple math"
It is causing stage2 builds to fail, let's get them running again.

llvm-svn: 179750
2013-04-18 08:42:33 +00:00
David Majnemer bdf0caf6b1 Combine bit test + conditional or into simple math
Simplify:
(select (icmp eq (and X, C1), 0), Y, (or Y, C2))

Into:
(or (shl (and X, C1), C3), y)

Where:
C3 = Log(C2) - Log(C1)

If:
C1 and C2 are both powers of two

llvm-svn: 179748
2013-04-18 07:30:07 +00:00
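
A brute-force C check of the identity behind the r179748 transform above, using the hypothetical constants C1 = 4 and C2 = 32 (both powers of two, so C3 = Log(C2) - Log(C1) = 3):

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  enum { C1 = 4, C2 = 32, C3 = 3 };  /* C3 = log2(C2) - log2(C1) */

  static uint32_t before(uint32_t x, uint32_t y) {
    return ((x & C1) == 0) ? y : (y | C2);   /* (select (icmp eq (and X, C1), 0), Y, (or Y, C2)) */
  }

  static uint32_t after(uint32_t x, uint32_t y) {
    return ((x & C1) << C3) | y;             /* (or (shl (and X, C1), C3), Y) */
  }

  int main(void) {
    for (uint32_t x = 0; x < 256; ++x)
      for (uint32_t y = 0; y < 256; ++y)
        assert(before(x, y) == after(x, y));
    puts("identity holds on the sampled range");
    return 0;
  }
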
Michael Gottesman 323964ca9e [objc-arc] Do not mismatch retains inside a for loop with releases outside said for loop in the presence of differing provenance caused by escaping blocks.
This occurs due to an alloca representing a separate ownership from the
original pointer. Thus consider the following pseudo-IR:

  objc_retain(%a)
  for (...) {
    objc_retain(%a)
    %block <- %a
    F(%block)
    objc_release(%block)
  }
  objc_release(%a)

From the perspective of the optimizer, the %block is a separate
provenance from the original %a. Thus the optimizer pairs up the inner
retain for %a and the outer release from %a, resulting in segfaults.

This is fixed by noting that the signature of a mismatch of
retain/releases inside the for loop is a Use/CanRelease top down with a
None bottom up (since bottom up the Retain-CanRelease-Use-Release
sequence is completed by the inner objc_retain, but top down due to the
differing provenance from the objc_release said sequence is not
completed). In said case in CheckForCFGHazards, we now clear the state
of %a implying that no pairing will occur.

Additionally a test case is included.

rdar://12969722

llvm-svn: 179747
2013-04-18 05:39:45 +00:00
Michael Gottesman a15ab25238 Streamline arc-annotation test (removing some cases which do not add any extra coverage) and set it up to use FileCheck variables to make the test more robust.
llvm-svn: 179745
2013-04-18 04:34:06 +00:00
Peter Collingbourne 37ae72b508 Do not optimise fprintf() calls if their return value is used.
Differential Revision: http://llvm-reviews.chandlerc.com/D620

llvm-svn: 179661
2013-04-17 02:01:10 +00:00
Hans Wennborg c9e1d99279 simplifycfg: Fix integer overflow converting switch into icmp.
If a switch instruction has a case for every possible value of its type,
with the same successor, SimplifyCFG would replace it with an icmp ult,
but the computation of the bound overflows in that case, which inverts
the test.

Patch by Jed Davis!

llvm-svn: 179587
2013-04-16 08:35:36 +00:00
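
A C illustration of the overflow described above, assuming an i8 switch whose cases cover every value: computing max - min + 1 in the narrow case type wraps to 0 and inverts the replacement icmp ult test.

  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
    uint8_t min = 0, max = 255;   /* a case for every i8 value */
    uint8_t x = 42;

    /* Bound computed in the case type wraps: 255 - 0 + 1 == 0 (mod 256),
       so "x - min ult bound" is always false instead of always true. */
    uint8_t narrow_bound = (uint8_t)(max - min + 1);
    printf("narrow: bound=%u, ult=%d\n", (unsigned)narrow_bound,
           (uint8_t)(x - min) < narrow_bound);

    /* Widening the computation gives the intended result. */
    uint32_t wide_bound = (uint32_t)max - min + 1;
    printf("wide:   bound=%u, ult=%d\n", (unsigned)wide_bound,
           (uint32_t)(uint8_t)(x - min) < wide_bound);
    return 0;
  }
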
Bill Wendling 3789171972 We are not able to bitcast a pointer to an integral value.
Two return types are not equivalent if one is a pointer and the other is an
integral. This is because we cannot bitcast a pointer to an integral value.
PR15185

llvm-svn: 179569
2013-04-15 22:33:50 +00:00
Nadav Rotem b9116e6966 SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops.
llvm-svn: 179562
2013-04-15 22:00:26 +00:00
Eric Christopher 13637e900e Revert "Recommit r179497 after fixing uninitialized variable." until
I can fix the testcases here:

http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/6952

This reverts commit r179512 due to testcases specifying triples
that they didn't actually mean and causing failures on other platforms.

llvm-svn: 179513
2013-04-15 07:31:37 +00:00
Eric Christopher fc2beaa136 Recommit r179497 after fixing uninitialized variable.
llvm-svn: 179512
2013-04-15 07:07:21 +00:00
Nadav Rotem 5d393c416f SLPVectorizer: Add support for vectorizing trees that start at compare instructions.
llvm-svn: 179504
2013-04-15 04:25:27 +00:00
Eric Christopher 1f140317e3 Revert "Remove some unused triple and data layout."
This reverts commit r179497 and the accompanying commit as it broke random platforms that aren't osx.

llvm-svn: 179499
2013-04-14 23:35:36 +00:00
Eric Christopher 4eebd14ad0 Remove some unused triple and data layout.
llvm-svn: 179498
2013-04-14 23:32:44 +00:00
David Majnemer 1fae195557 Reorders two transforms that collide with each other
One performs: (X == 13 | X == 14) -> X-13 <u 2
The other: (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1

The problem is that there are certain values of C1 and C2 that
trigger both transforms but the first one blocks out the second,
this generates suboptimal code.

Reordering the transforms should be better in every case and
allows us to do interesting stuff like turn:
  %shr = lshr i32 %X, 4
  %and = and i32 %shr, 15
  %add = add i32 %and, -14
  %tobool = icmp ne i32 %add, 0

into:
  %and = and i32 %X, 240
  %tobool = icmp ne i32 %and, 224

llvm-svn: 179493
2013-04-14 21:15:43 +00:00
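
A brute-force C check that the "before" and "after" IR shown above compute the same boolean; only bits 4..7 of %X matter, so a 16-bit sweep is representative.

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
    for (uint32_t x = 0; x < (1u << 16); ++x) {
      uint32_t shr = x >> 4;
      uint32_t and1 = shr & 15;
      uint32_t add = and1 + (uint32_t)-14;   /* add i32 %and, -14 */
      int before = (add != 0);               /* icmp ne i32 %add, 0 */

      int after = ((x & 240) != 224);        /* icmp ne (and i32 %X, 240), 224 */
      assert(before == after);
    }
    puts("before/after forms agree");
    return 0;
  }
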
Nadav Rotem 6ebddae118 Make the command line triple match the module triple.
llvm-svn: 179492
2013-04-14 20:13:05 +00:00
Nadav Rotem 029208ceeb Remove unused function attributes.
llvm-svn: 179476
2013-04-14 05:47:04 +00:00
Nadav Rotem 54b413d157 SLPVectorizer: Add support for trees that don't start at binary operators, and add the cost of extracting values from the roots of the tree.
llvm-svn: 179475
2013-04-14 05:15:53 +00:00
Nadav Rotem 0b9cf8567b SLPVectorizer: add initial support for reduction variable vectorization.
llvm-svn: 179470
2013-04-14 03:22:20 +00:00
Benjamin Kramer adc1727c39 GlobalDCE: Fix an oversight in my last commit that could lead to crashes.
There is a Constant with non-constant operands: blockaddress.

llvm-svn: 179460
2013-04-13 16:11:14 +00:00
Benjamin Kramer 89ca4bc6d4 Fix a scalability issue with complex ConstantExprs.
This is basically the same fix in three different places. We use a set to avoid
walking the whole tree of a big ConstantExprs multiple times.

For example: (select cmp, (add big_expr 1), (add big_expr 2))
We don't want to visit big_expr twice here, it may consist of thousands of
nodes.

The testcase exercises this by creating an insanely large ConstantExpr out of
a loop. It's questionable if the optimizer should ever create those, but this
can be triggered with real C code. Fixes PR15714.

llvm-svn: 179458
2013-04-13 12:53:18 +00:00
Benjamin Kramer e89c705030 InstCombine: Check the operand types before merging fcmp ord & fcmp ord.
Fixes PR15737.

llvm-svn: 179417
2013-04-12 21:56:23 +00:00
Nadav Rotem 8543ba3e52 SLPVectorizer: add support for vectorization of diamond shaped trees. We now perform a preliminary traversal of the graph to collect values with multiple users and check where the users came from.
llvm-svn: 179414
2013-04-12 21:16:54 +00:00
Nadav Rotem 87a0af6e1b CostModel: increase the default cost of supported floating point operations from 1 to 2. Fixed a few tests that changed because now the cost of one insert + a vector operation on two doubles is lower than two scalar operations on doubles.
llvm-svn: 179413
2013-04-12 21:15:03 +00:00
David Majnemer 1a08accbb7 Simplify (A & ~B) in icmp if A is a power of 2
The transform will execute like so:
(A & ~B) == 0 --> (A & B) != 0
(A & ~B) != 0 --> (A & B) == 0

llvm-svn: 179386
2013-04-12 17:25:07 +00:00
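
A brute-force C check of the two rewrites above on 16-bit values, with A restricted to powers of two as the transform requires:

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
    for (int k = 0; k < 16; ++k) {
      uint16_t A = (uint16_t)(1u << k);      /* A is a power of 2 */
      for (uint32_t b = 0; b <= 0xFFFF; ++b) {
        uint16_t B = (uint16_t)b;
        assert(((A & (uint16_t)~B) == 0) == ((A & B) != 0));
        assert(((A & (uint16_t)~B) != 0) == ((A & B) == 0));
      }
    }
    puts("(A & ~B) ==/!= 0 matches (A & B) !=/== 0 for power-of-two A");
    return 0;
  }
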
Arnold Schwaighofer f9cea17f75 LoopVectorizer: integer division is not a reduction operation
Don't classify idiv/udiv as a reduction operation. Integer division is lossy.
For example : (1 / 2) * 4 != 4/2.

Example:

int a[] = { 2, 5, 2, 2}
int x = 80;

for()
  x /= a[i];

Scalar:
  x /= 2 // = 40
  x /= 5 // = 8
  x /= 2 // = 4
  x /= 2 // = 2

Vectorized:

 <80, 1> / <2,5> //= <40,0>
 <40, 0> / <2,2> //= <20,0>

 20*0 = 0

radar://13640654

llvm-svn: 179381
2013-04-12 15:15:19 +00:00
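
The example above written out as runnable C: the scalar division reduction yields 2, while splitting it into two "lanes" seeded with <80, 1> and multiplying the lane results at the end yields 0.

  #include <stdio.h>

  int main(void) {
    int a[] = {2, 5, 2, 2};

    int scalar = 80;                   /* 80/2/5/2/2 == 2 */
    for (int i = 0; i < 4; ++i)
      scalar /= a[i];

    int lane0 = 80, lane1 = 1;         /* seed vector <80, 1> */
    for (int i = 0; i < 4; i += 2) {
      lane0 /= a[i];                   /* <80,1>/<2,5> = <40,0>, then <40,0>/<2,2> = <20,0> */
      lane1 /= a[i + 1];
    }
    int vectorized = lane0 * lane1;    /* 20 * 0 == 0 */

    printf("scalar = %d, vectorized = %d\n", scalar, vectorized);
    return 0;
  }
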
David Majnemer b81cd63c4b Optimize icmp involving addition better
Allows LLVM to optimize sequences like the following:

%add = add nsw i32 %x, 1
%cmp = icmp sgt i32 %add, %y

into:

%cmp = icmp sge i32 %x, %y

as well as:

%add1 = add nsw i32 %x, 20
%add2 = add nsw i32 %y, 57
%cmp = icmp sge i32 %add1, %add2

into:

%add = add nsw i32 %y, 37
%cmp = icmp sle i32 %add, %x

llvm-svn: 179316
2013-04-11 20:05:46 +00:00
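
A brute-force C check of the two rewrites above over a range where the nsw (no signed wrap) assumption holds:

  #include <assert.h>
  #include <stdio.h>

  int main(void) {
    /* Ranges chosen so no signed overflow occurs in either form. */
    for (int x = -1000; x <= 1000; ++x)
      for (int y = -1000; y <= 1000; ++y) {
        assert((x + 1 > y)        == (x >= y));       /* first pattern  */
        assert((x + 20 >= y + 57) == (y + 37 <= x));  /* second pattern */
      }
    puts("rewrites agree when no signed overflow occurs");
    return 0;
  }
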
Benjamin Kramer a95f87494a Fix for wrong instcombine on vector insert/extract
When trying to collapse sequences of insertelement/extractelement
instructions into single shuffle instructions, there is one specific
case where the Instruction Combiner wrongly updates the resulting
Mask of shuffle indexes.

The problem is in the function CollectShuffleElements.

If we have a sequence of insert/extract element instructions
like the one below:

  %tmp1 = extractelement <4 x float> %LHS, i32 0
  %tmp2 = insertelement <4 x float> %RHS, float %tmp1, i32 1
  %tmp3 = extractelement <4 x float> %RHS, i32 2
  %tmp4 = insertelement <4 x float> %tmp2, float %tmp3, i32 3

Where:
  . %RHS will have a mask of [4,5,6,7]
  . %LHS will have a mask of [0,1,2,3]

The Mask of shuffle indexes is wrongly computed to [4,1,6,7]
instead of [4,0,6,7].
When analyzing %tmp2 in order to compute the Mask for the
resulting shuffle instruction, the algorithm forgets to update
the mask index at position 1 with the index associated to the
element extracted from %LHS by instruction %tmp1.

Patch by Andrea DiBiagio!

llvm-svn: 179291
2013-04-11 15:10:09 +00:00
Benjamin Kramer b50682e156 Add missing colons to check lines.
llvm-svn: 179277
2013-04-11 12:41:41 +00:00
Benjamin Kramer 3960c1cd56 FileCheckize a bunch of tests.
llvm-svn: 179276
2013-04-11 12:32:23 +00:00
Nadav Rotem 73dffa4184 Make the SLP store-merger less paranoid about function calls. We check for function calls when we check if it is safe to sink instructions.
llvm-svn: 179207
2013-04-10 19:41:36 +00:00
Nadav Rotem 2d9dec322e Add support for bottom-up SLP vectorization infrastructure.
This commit adds the infrastructure for performing bottom-up SLP vectorization (and other optimizations) on parallel computations.
The infrastructure has three potential users:

  1. The loop vectorizer needs to be able to vectorize AOS data structures such as (sum += A[i] + A[i+1]).

  2. The BB-vectorizer needs this infrastructure for bottom-up SLP vectorization, because bottom-up vectorization is faster to compute.

  3. A loop-roller needs to be able to analyze consecutive chains and roll them into a loop, in order to reduce code size. A loop roller does not need to create vector instructions, and this infrastructure separates the chain analysis from the vectorization.

This patch also includes a simple (100 LOC) bottom up SLP vectorizer that uses the infrastructure, and can vectorize this code:

void SAXPY(int *x, int *y, int a, int i) {
  x[i]   = a * x[i]   + y[i];
  x[i+1] = a * x[i+1] + y[i+1];
  x[i+2] = a * x[i+2] + y[i+2];
  x[i+3] = a * x[i+3] + y[i+3];
}

llvm-svn: 179117
2013-04-09 19:44:35 +00:00
Nadav Rotem abcc64fd13 Revert r176408 and r176407 to address PR15540.
llvm-svn: 179111
2013-04-09 18:16:05 +00:00
Michael Gottesman ccc93e72e1 Converted 8x tests of SimplifyCFG to use FileCheck instead of grep.
llvm-svn: 179087
2013-04-09 05:18:53 +00:00
Nadav Rotem 7b7585d153 Revert 179071 because it is not the right way to support non standard new/new[] operators.
llvm-svn: 179084
2013-04-09 04:43:46 +00:00
Nadav Rotem 9dd90ac5b4 C++ new operators are not malloc-like functions because they do not return uninitialized memory.
Users may override new operators and implement any function that they like.

llvm-svn: 179071
2013-04-08 23:40:47 +00:00
Chandler Carruth 0e8a52d18f Fix PR15674 (and PR15603): a SROA think-o.
The fix for PR14972 in r177055 introduced a real think-o in the *store*
side, likely because I was much more focused on the load side. While we
can arbitrarily widen (or narrow) a loaded value, we can't arbitrarily
widen a value to be stored, as that changes the width of memory access!
Lock down the code path in the store rewriting which would do this to
only handle the intended circumstance.

All of the existing tests continue to pass, and I've added a test from
the PR.

llvm-svn: 178974
2013-04-07 11:47:54 +00:00
Michael Gottesman 31ba23aa56 An objc_retain can serve as a use for a different pointer.
This is the counterpart to commit r160637, except it performs the action
in the bottomup portion of the data flow analysis.

llvm-svn: 178922
2013-04-05 22:54:32 +00:00
Michael Gottesman 1d8d25777d Properly model precise lifetime when given an incomplete dataflow sequence.
The normal dataflow sequence in the ARC optimizer consists of the following
states:

    Retain -> CanRelease -> Use -> Release

The optimizer before this patch stored the uses that determine the lifetime of
the retainable object pointer when it bottom up hits a retain or when top down
it hits a release. This is correct for an imprecise lifetime scenario since what
we are trying to do is remove retains/releases while making sure that no
``CanRelease'' (which is usually a call) deallocates the given pointer before we
get to the ``Use'' (since that would cause a segfault).

If we are considering the precise lifetime scenario though, this is not
correct. In such a situation, we *DO* care about the previous sequence, but
additionally, we wish to track the uses resulting from the following incomplete
sequences:

  Retain -> CanRelease -> Release   (TopDown)
  Retain <- Use <- Release          (BottomUp)

*NOTE* This patch looks large but most of it consists of updating
test cases. Additionally this fix exposed an additional bug. I removed
the test case that expressed said bug and will recommit it with the fix
in a little bit.

llvm-svn: 178921
2013-04-05 22:54:28 +00:00
Shuxin Yang 95adf5258f Disable the optimization that promotes vector-element access with a symbolic index.
This optimization is unstable at this moment; it
  1) blocks us on a very important application
  2) PR15200
  3) test6 and test7 in test/Transforms/ScalarRepl/dynamic-vector-gep.ll
     (the CHECK commands compare the output against the wrong result)

   I personally believe this optimization should not have any impact on
autovectorized code, as the auto-vectorizer is supposed to put gather/scatter
in a "right" way. Although in theory downstream optimizers might reveal
some gather/scatter optimization opportunities, the chance is quite slim.

   For hand-crafted vectorized code, in terms of redundancy elimination,
load-CSE, copy-propagation and DSE can collectively achieve the same result,
but in a much simpler way. On the other hand, these optimizers are able to
improve the code in an incremental way; in contrast, SROA is sort of an
all-or-none approach. However, SROA might slightly win in stack size, as it
tries to figure out a stretch of memory that tightly covers the area accessed
by the dynamic index.

 rdar://13174884
 PR15200

llvm-svn: 178912
2013-04-05 21:07:08 +00:00
Arnold Schwaighofer df6f67ed87 LoopVectorizer: Pass OperandValueKind information to the cost model
Pass down the fact that an operand is going to be a vector of constants.

This should bring the performance of MultiSource/Benchmarks/PAQ8p/paq8p on x86
back. It had degraded to scalar performance due to my previous shift cost change
that made all shifts expensive on x86.

radar://13576547

llvm-svn: 178809
2013-04-04 23:26:27 +00:00
Michael Gottesman b8c8836594 Remove an optimization where we were changing an objc_autorelease into an objc_autoreleaseReturnValue.
The semantics of ARC implies that a pointer passed into an objc_autorelease
must live until some point (potentially down the stack) where an
autorelease pool is popped. On the other hand, an
objc_autoreleaseReturnValue just signifies that the object must live
until the end of the given function at least.

Thus objc_autorelease is stronger than objc_autoreleaseReturnValue in
terms of the semantics of ARC* implying that performing the given
strength reduction without any knowledge of how this relates to
the autorelease pool pop that is further up the stack violates the
semantics of ARC.

*Even though objc_autoreleaseReturnValue, if you know that no RV
optimization will occur, is more computationally expensive.

llvm-svn: 178612
2013-04-03 02:57:24 +00:00
Bill Wendling 88d06c3b2d Use a worklist to avoid a sneaky iterator invalidation.
The iterator could be invalidated when it's recursively deleting a whole bunch
of constant expressions in a constant initializer.

Note: This was only reproducible if `opt' was run on a `.bc' file. If `opt' was
run on a `.ll' file, it wouldn't crash. This is why the test first pushes the
`.ll' file through `llvm-as' before feeding it to `opt'.

PR15440

llvm-svn: 178531
2013-04-02 08:16:45 +00:00
Shuxin Yang 6662fd0f15 Correct assertion condition
llvm-svn: 178484
2013-04-01 18:13:05 +00:00
Benjamin Kramer 52ceb44331 X86TTI: Add accurate costs for itofp operations, based on the actual instruction counts.
llvm-svn: 178459
2013-04-01 10:23:49 +00:00
Shuxin Yang 7b0c94e207 Implement XOR reassociation. It is based on the following rules:
  rule 1: (x | c1) ^ c2 => (x & ~c1) ^ (c1^c2),
          only useful when c1=c2
  rule 2: (x & c1) ^ (x & c2) = (x & (c1^c2))
  rule 3: (x | c1) ^ (x | c2) = (x & c3) ^ c3 where c3 = c1 ^ c2
  rule 4: (x | c1) ^ (x & c2) => (x & c3) ^ c1, where c3 = ~c1 ^ c2

 It reduces an application's size (in terms of # of instructions) by 8.9%.
 Reviewed by Pete Cooper. Thanks a lot!

 rdar://13212115  

llvm-svn: 178409
2013-03-30 02:15:01 +00:00
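
A brute-force C check of the four rules above on 8-bit values (the identities are bitwise, so one byte is representative):

  #include <assert.h>
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
    for (uint32_t x = 0; x < 256; ++x)
      for (uint32_t a = 0; a < 256; ++a)
        for (uint32_t b = 0; b < 256; ++b) {
          uint8_t X = (uint8_t)x, C1 = (uint8_t)a, C2 = (uint8_t)b, C3;

          /* rule 1 */
          assert((uint8_t)((X | C1) ^ C2) ==
                 (uint8_t)((X & (uint8_t)~C1) ^ (C1 ^ C2)));
          /* rule 2 */
          assert((uint8_t)((X & C1) ^ (X & C2)) == (uint8_t)(X & (C1 ^ C2)));
          /* rule 3 */
          C3 = C1 ^ C2;
          assert((uint8_t)((X | C1) ^ (X | C2)) == (uint8_t)((X & C3) ^ C3));
          /* rule 4 */
          C3 = (uint8_t)((uint8_t)~C1 ^ C2);
          assert((uint8_t)((X | C1) ^ (X & C2)) == (uint8_t)((X & C3) ^ C1));
        }
    puts("all four XOR reassociation rules hold on 8-bit values");
    return 0;
  }
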
Michael Gottesman 9412830090 Updated test0 of retain-not-declared.ll to reflect the fact that objc-arc-expand runs before objc-arc/objc-arc-contract.
Specifically, objc-arc-expand will make sure that the
objc_retainAutoreleasedReturnValue, objc_autoreleaseReturnValue, and ret
will all have %call as an argument.

llvm-svn: 178382
2013-03-29 22:44:59 +00:00
Michael Gottesman 3b8f877860 Add clang.arc.used to ModuleHasARC so ARC always runs if said call is present in a module.
clang.arc.used is an interesting call for ARC since ObjCARCContract
needs to run to remove said intrinsic to avoid a linker error (since the
call does not exist).

llvm-svn: 178369
2013-03-29 21:15:23 +00:00
Michael Gottesman 49f9885a2a Non optimizable objc_retainBlock calls are not forwarding.
Since we handle optimizable objc_retainBlocks through strength reduction
in OptimizeIndividualCalls, we know that all code after that point
will only see non-optimizable objc_retainBlock calls. IsForwarding is
only called by functions after that point, so it is ok to just classify
objc_retainBlock as non-forwarding.

<rdar://problem/13249661>.

llvm-svn: 178285
2013-03-28 20:11:30 +00:00
Michael Gottesman 158fdf699e [ObjCARC] Strength reduce objc_retainBlock -> objc_retain if the objc_retainBlock is optimizable.
If an objc_retainBlock has the copy_on_escape metadata attached to it
AND if the block pointer argument only escapes down the stack, we are
allowed to strength reduce the objc_retainBlock to an objc_retain and
thus optimize it.

Currently there is logic in the ARC data flow analysis to handle
this case, which is complicated and involves making distinctions
between objc_retainBlock and objc_retain in certain places and
considering them the same in others.

This patch simplifies said code by:

1. Performing the strength reduction in the initial ARC peephole
analysis (ObjCARCOpts::OptimizeIndividualCalls).

2. Changes the ARC dataflow analysis (which runs after the peephole
analysis) to consider all objc_retainBlock calls to not be optimizable
(since if the call was optimizable, we would have strength reduced it
already).

This patch leaves in the infrastructure in the ARC dataflow analysis to
handle this case, which due to 2 will just be dead code. I am doing this
on purpose to separate the removal of the old code from the testing of
the new code.

<rdar://problem/13249661>.

llvm-svn: 178284
2013-03-28 20:11:19 +00:00
Akira Hatanaka 19468cafad Remove -O3.
llvm-svn: 178278
2013-03-28 19:34:14 +00:00
David Blaikie 5692e72f30 Revert "Adding DIImportedModules to DIScopes."
This reverts commit 342d92c7a0adeabc9ab00f3f0d88d739fe7da4c7.

Turns out we're going with a different schema design to represent
DW_TAG_imported_modules so we won't need this extra field.

llvm-svn: 178215
2013-03-28 02:44:59 +00:00
Akira Hatanaka 99866dd535 Check if Type is a vector before calling function Type::getVectorNumElements.
llvm-svn: 178208
2013-03-28 01:28:02 +00:00
Michael Gottesman 46ebe53ead Added back in the test for arc-annotations.
The test was removed since I had not turned off the test during release
builds. This fails since ARC annotations support  is conditionally
compiled out during release builds. I added the proper requires header
to assuage this issue.

llvm-svn: 178101
2013-03-27 00:09:58 +00:00
David Blaikie a26d70358f Adding DIImportedModules to DIScopes.
This is just the basic groundwork for supporting DW_TAG_imported_module but I
wanted to commit this before pushing support further into Clang or LLVM so that
this rather churny change is isolated from the rest of the work. The major
churn here is obviously adding another field (within the common DIScope prefix)
to all DIScopes (files, classes, namespaces, lexical scopes, etc). This should
be the last big churny change needed for DW_TAG_imported_module/using directive
support/PR14606.

llvm-svn: 178099
2013-03-27 00:07:26 +00:00
Ulrich Weigand b1e02b2af2 Add test case for commit r178031.
llvm-svn: 178038
2013-03-26 17:30:02 +00:00
Bill Wendling 1ee6d8cef4 Remove testcase. It's failing on some platforms but not others.
llvm-svn: 177956
2013-03-26 01:10:03 +00:00
Bill Wendling d78cfa81be Hmm...not failing...odd
llvm-svn: 177955
2013-03-26 01:08:02 +00:00
Bill Wendling a5a44d4fd6 Temporarily XFAIL this test until Michael can look at it.
llvm-svn: 177953
2013-03-26 00:46:31 +00:00
Michael Gottesman cd4de0f9bb [ObjCARC Annotations] Added support for displaying the state of pointers at the bottom/top of BBs of the ARC dataflow analysis for both bottomup and topdown analyses.
This will allow for verification and analysis of the merge function of
the data flow analyses in the ARC optimizer.

The actual implementation of this feature is by introducing calls to
the functions llvm.arc.annotation.{bottomup,topdown}.{bbstart,bbend}
which are only declared. Each such call takes in a pointer to a global
with the same name as the pointer whose provenance is being tracked and
a pointer whose name is one of our Sequence states and points to a
string that contains the same name.

To ensure that the optimizer does not consider these annotations in any
way, I made it so that the annotations are considered to be of IC_None
type.

A test case is included for this commit and the previous
ObjCARCAnnotation commit.

llvm-svn: 177952
2013-03-26 00:42:09 +00:00
John McCall a237239097 Add an optimizer-side test case for ARC bug <rdar://13195034>, fixed
in the frontend with @clang.arc.use.

llvm-svn: 177928
2013-03-25 22:09:52 +00:00
Shuxin Yang 389ed4b8f7 Fix a bug in fast-math fadd/fsub simplification.
The problem is that the code mistakenly took for granted that the following constructor
is able to create an APFloat from a *SIGNED* integer:
   
  APFloat::APFloat(const fltSemantics &ourSemantics, integerPart value)

rdar://13486998

llvm-svn: 177906
2013-03-25 20:43:41 +00:00
NAKAMURA Takumi 951a9b169b Disable, for now, llvm/test/Transforms/GCOVProfiling on win32. I'll investigate them later.
llvm-svn: 177894
2013-03-25 19:47:20 +00:00
Arnaud A. de Grandmaison 3ee88e8a77 Address issues found by Duncan during post-commit review of r177856.
llvm-svn: 177863
2013-03-25 11:47:38 +00:00
Arnaud A. de Grandmaison 9c383d68cf InstCombine: simplify comparisons to zero of (shl %x, Cst) or (mul %x, Cst)
This simplification happens at 2 places :
 - using the nsw attribute when the shl / mul is used by a sign test
 - when the shl / mul is compared for (in)equality to zero

llvm-svn: 177856
2013-03-25 09:48:49 +00:00
John McCall 20182ac0c7 Kill every call to @clang.arc.use in the ARC contract phase.
llvm-svn: 177769
2013-03-22 21:38:36 +00:00
Bill Wendling d96a7a6be8 Update test. There may be multiple catches, but those will be cleaned up.
llvm-svn: 177758
2013-03-22 20:36:39 +00:00
Arnaud A. de Grandmaison f364bc63e7 InstCombine: Improve the result bitvect type when folding (cmp pred (load (gep GV, i)) C) to a bit test.
The original code used i32, and i64 if legal. This introduced unneeded
casts when they aren't legal, or when the index variable i has another
type. In order of preference: try to use i's type; use the smallest
fitting legal type (using an added DataLayout method); default to i32.
A testcase checks that this works when the index gep operand is i16.

Patch by : Ahmed Bougacha <ahmed.bougacha@gmail.com>
Reviewed by : Duncan

llvm-svn: 177712
2013-03-22 08:25:01 +00:00
David Blaikie 0d7d62e4b2 Move the DIFile in DISubprogram to the beginning to be a common prefix along with other DIScopes
llvm-svn: 177674
2013-03-21 22:29:36 +00:00
David Blaikie cc8d090163 Remove unused field in DISubprogram
llvm-svn: 177661
2013-03-21 20:28:52 +00:00
Bill Wendling 5b98172115 Update some EH tests that were violating the new EH model.
The landingpad instruction needs to be the first non-PHI instruction in the
unwind destination block.

llvm-svn: 177650
2013-03-21 18:30:10 +00:00
Meador Inge 6b6a161ccf Move library call prototype attribute inference to functionattrs
The simplify-libcalls pass implemented a doInitialization hook to infer
function prototype attributes for well-known functions.  Given that the
simplify-libcalls pass is going away *and* that the functionattrs pass
is already in place to deduce function attributes, I am moving this logic
to the functionattrs pass.  This approach was discussed during patch
review:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121126/157465.html.

llvm-svn: 177619
2013-03-21 00:55:59 +00:00
David Blaikie 43a729d165 Remove unused field in DICompileUnit
llvm-svn: 177590
2013-03-20 22:34:33 +00:00
Nick Lewycky 2b7fe9fc93 Don't assume the test directory is writable, use %T to find a writable
directory.

llvm-svn: 177488
2013-03-20 05:59:40 +00:00
Arnaud A. de Grandmaison 87c473f0d1 IndVarSimplify: do not recompute an IV value outside of the loop if:
- it is trivially known to be used inside the loop in a way that cannot be optimized away
- there is no use outside of the loop which can take advantage of hoisting the computation

llvm-svn: 177432
2013-03-19 20:00:22 +00:00
Nick Lewycky d67186337a Emit the linkage name instead of the function name, when available. This means
that we'll prefer to emit the mangled C++ name (pending a clang change).

llvm-svn: 177371
2013-03-19 01:37:55 +00:00
Manman Ren 1217112d11 Check whether a pointer is non-null (isKnownNonNull) in isKnownNonZero.
This handles the case where we have an inbounds GEP with alloca as the pointer.
This fixes the regression in PR12750 and rdar://13286434.
Note that we can also fix this by handling some GEP cases in isKnownNonNull.

llvm-svn: 177321
2013-03-18 21:23:25 +00:00
David Tweed d505b24277 Initially forgotten-to-svn-add test case for r177279.
llvm-svn: 177280
2013-03-18 12:07:24 +00:00
Michael Gottesman a8b60a4fda Reduced dont-infinite-loop-during-block-escape-analysis.ll with bugpoint and moved it to retain-block-escape-analysis.ll.
*NOTE* I verified that the original bug behind
dont-infinite-loop-during-block-escape-analysis.ll occurs when using opt on
retain-block-escape-analysis.ll.

llvm-svn: 177240
2013-03-17 21:31:12 +00:00
David Blaikie 8fb8224578 Split out filename & directory from DIFile to start generalizing over DIScopes
This is the first step to making all DIScopes have a common metadata prefix (so
that things (using directives, for example) that can appear in any scope can be
added to that common prefix). DIFile is itself a DIScope so the common prefix
of all DIScopes cannot be a DIFile - instead it's the raw filename/directory
name pair.

llvm-svn: 177239
2013-03-17 21:13:55 +00:00
David Blaikie 2e488d1f0d Generalize debug info test to be resilient to changes in metadata node numbering
llvm-svn: 177238
2013-03-17 21:08:22 +00:00
Michael Gottesman 9782183126 The promised test case for r175939.
This test makes sure that the ObjCARC escape analysis looks at the uses of
instructions which copy the block pointer value by checking all four cases where
that can occur.

llvm-svn: 177232
2013-03-17 08:42:58 +00:00
Arnold Schwaighofer 9b55e31bcb LoopVectorizer: Insert some white space to make test case more readable
Also remove some unneeded function attributes.

llvm-svn: 177114
2013-03-14 21:31:09 +00:00
Arnold Schwaighofer 4991ce9d49 Add missing asserts flag to test - it uses debug flags
llvm-svn: 177102
2013-03-14 19:01:58 +00:00
Arnold Schwaighofer c63cf3a0ae LoopVectorize: Invert case when we use a vector cmp value to query select cost
We generate a select with a vectorized condition argument when the condition is
NOT loop invariant. Not the other way around.

llvm-svn: 177098
2013-03-14 18:54:36 +00:00
Shuxin Yang 2eca602f8b Perform factorization as a last resort of unsafe fadd/fsub simplification.
Rules include:
  1) x*y +/- x*z => x*(y +/- z)
    (the order of operands doesn't matter)

  2) y/x +/- z/x => (y +/- z)/x 

 The transformation is disabled if the new add/sub expr "y +/- z" is a
denormal/NaN/infinity.

rdar://12911472

llvm-svn: 177088
2013-03-14 18:08:26 +00:00
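
A short C illustration of the two rules above, and of why the transform stays behind unsafe/fast-math: with floating point the factored and unfactored forms can round differently, as the printed values may show.

  #include <stdio.h>

  int main(void) {
    double x = 0.1, y = 3.0, z = 7.0;

    printf("x*y + x*z = %.17g\n", x * y + x * z);   /* rule 1, left-hand side  */
    printf("x*(y + z) = %.17g\n", x * (y + z));     /* rule 1, right-hand side */

    printf("y/x - z/x = %.17g\n", y / x - z / x);   /* rule 2, left-hand side  */
    printf("(y - z)/x = %.17g\n", (y - z) / x);     /* rule 2, right-hand side */
    return 0;
  }
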
Chandler Carruth a1c54bbe34 PR14972: SROA vs. GVN exposed a really bad bug in SROA.
The fundamental problem is that SROA didn't allow for overly wide loads
where the bits past the end of the alloca were masked away and the load
was sufficiently aligned to ensure there is no risk of page fault, or
other trapping behavior. With such widened loads, SROA would delete the
load entirely rather than clamping it to the size of the alloca in order
to allow mem2reg to fire. This was exposed by a test case that neatly
arranged for GVN to run first, widening certain loads, followed by an
inline step, and then SROA which miscompiles the code. However, I see no
reason why this hasn't been plaguing us in other contexts. It seems
deeply broken.

Diagnosing all of the above took all of 10 minutes of debugging. The
really annoying aspect is that fixing this completely breaks the pass.
;] There was an implicit reliance on the fact that no loads or stores
extended past the alloca once we decided to rewrite them in the final
stage of SROA. This was used to encode information about whether the
loads and stores had been split across multiple partitions of the
original alloca. That required threading explicit tracking of whether
a *use* of a partition is split across multiple partitions.

Once that was done, another problem arose: we allowed splitting of
integer loads and stores iff they were loads and stores to the entire
alloca. This is a really arbitrary limitation, and splitting at least
some integer loads and stores is crucial to maximize promotion
opportunities. My first attempt was to start removing the restriction
entirely, but currently that does Very Bad Things by causing *many*
common alloca patterns to be fully decomposed into i8 operations and
lots of or-ing together to produce larger integers on demand. The code
bloat is terrifying. That is still the right end-goal, but substantial
work must be done to either merge partitions or ensure that small i8
values are eagerly merged in some other pass. Sadly, figuring all this
out took essentially all the time and effort here.

So the end result is that we allow splitting only when the load or store
at least covers the alloca. That ensures widened loads and stores don't
hurt SROA, and that we don't rampantly decompose operations more than we
have previously.

All of this was already fairly well tested, and so I've just updated the
tests to cover the wide load behavior. I can add a test that crafts the
pass ordering magic which caused the original PR, but that seems really
brittle and to provide little benefit. The fundamental problem is that
widened loads should Just Work.

llvm-svn: 177055
2013-03-14 11:32:24 +00:00
Nick Lewycky 3d28d4dee7 Remove a change to the debug info in this test, that I made while testing
something else and forgot to remove.

llvm-svn: 177007
2013-03-14 05:28:10 +00:00
Nick Lewycky d11060d971 Try using %S to find the emitted .gcno file.
llvm-svn: 177006
2013-03-14 05:23:30 +00:00
Nick Lewycky fdfed3e9c9 Refactor GCOV's six constructor arguments into a struct with a getter that
constructs default arguments. It can now take default arguments from
cl::opt'ions. Add a new -default-gcov-version=... option, and actually test it!

Sink the reverse-order of the version into GCOVProfiling, hiding it from our
users.

llvm-svn: 177002
2013-03-14 05:13:26 +00:00
David Blaikie 0d221159a0 Remove the unused 4th operand for DIFile debug info metadata
llvm-svn: 176983
2013-03-13 22:05:21 +00:00
David Blaikie 1ca2f36289 Refactor filename/directory in DICompileUnit into a DIFile
This is the next step towards making the metadata for DIScopes have a common
prefix rather than having to delegate based on their tag type.

llvm-svn: 176913
2013-03-13 00:01:35 +00:00
David Blaikie 452c3ff649 Remove unused "isMain" field from DICompileUnit
llvm-svn: 176910
2013-03-12 22:43:04 +00:00
David Blaikie a4f770d51c Update debug info test cases with empty SplitDebugFilename field.
This could be 'null' or the empty string, DIDescriptor::getStringField
coalesces the two cases anyway so it's just a matter of legible/efficient
representation.

The change in behavior of the DICompileUnit::get* functions could be
subsumed by the full verification check - but ideally that should just be an
assertion if we could front-load the actual debug info metadata failure paths.

llvm-svn: 176907
2013-03-12 22:25:36 +00:00
Jan Wen Voung 6dc3076080 Revert the test moves from 176733. Use "REQUIRES: asserts" instead.
llvm-svn: 176873
2013-03-12 16:27:52 +00:00
David Blaikie 47922fb006 Upgrading debug info test cases to be (more) compatible with the current debug info format.
These cases were found by further work to remove support for debug info
versioning. Common cleanups (other than changing the version info in the tag
field) included adding the last parameter to compile_units (recently added for
fission support) and other cases of trailing fields in lexical blocks, compile
units, and subprograms.

llvm-svn: 176834
2013-03-11 22:37:40 +00:00
Bill Wendling 9534d8885f Don't remove a landing pad if the invoke requires a table entry.
An invoke may require a table entry. For instance, when the function it calls
is expected to throw.
<rdar://problem/13360379>

llvm-svn: 176827
2013-03-11 20:53:00 +00:00
Benjamin Kramer fc0c7bf0d7 Fix test case.
llvm-svn: 176773
2013-03-09 18:34:27 +00:00
Benjamin Kramer 01b75cc0f2 Test case hygiene.
llvm-svn: 176772
2013-03-09 18:25:40 +00:00
Arnold Schwaighofer 4090b61ac3 LoopVectorizer: Ignore dbg.value instructions
We want vectorization to happen at -g. Ignore calls to the dbg.value intrinsic
and don't transfer them to the vectorized code.

radar://13378964

llvm-svn: 176768
2013-03-09 15:56:34 +00:00
Jan Wen Voung 7857a64909 Disable statistics on Release builds and move tests that depend on -stats.
Summary:
Statistics are still available in Release+Asserts (any +Asserts builds),
and stats can also be turned on with LLVM_ENABLE_STATS.

Move some of the FastISel stats that were moved under DEBUG()
back out of DEBUG(), since stats are disabled across the board now.

Many tests depend on grepping "-stats" output.  Move those into
an orig_dir/Stats/ directory so that they can be marked as unsupported
when building without statistics.

Differential Revision: http://llvm-reviews.chandlerc.com/D486

llvm-svn: 176733
2013-03-08 22:56:31 +00:00
Benjamin Kramer 10a74ed434 Force cpu in test.
llvm-svn: 176702
2013-03-08 17:01:18 +00:00
Benjamin Kramer 37c2d65c5a Insert the reduction start value into the first bypass block to preserve domination.
Fixes PR15344.

llvm-svn: 176701
2013-03-08 16:58:37 +00:00
Andrew Trick a0a5ca06b9 SimplifyCFG fix for volatile load/store.
Fixes rdar://13349374.

Volatile loads and stores need to be preserved even if the language
standard says they are undefined. "volatile" in this context means "get
out of the way compiler, let my platform handle it".

Additionally, this is the only way I know of with llvm to write to the
first page (when hardware allows) without dropping to assembly.

llvm-svn: 176599
2013-03-07 01:03:35 +00:00
Jim Grosbach 95d2eb95c3 InstCombine: Don't shrink allocas when combining with a bitcast.
When considering folding a bitcast of an alloca into the alloca itself,
make sure we don't shrink the amount of memory being allocated, or
things rapidly go sideways.

rdar://13324424

llvm-svn: 176547
2013-03-06 05:44:53 +00:00
Nuno Lopes 589443bd93 recommit r172363 & r171325 (reverted in r172756)
This adds minimalistic support for PHI nodes to llvm.objectsize() evaluation

fingers crossed so that it doesn't break the clang bootstrap again..

llvm-svn: 176408
2013-03-02 11:36:24 +00:00
Arnold Schwaighofer 20ef54f4c1 X86 cost model: Adjust cost for custom lowered vector multiplies
This matters for example in following matrix multiply:

int **mmult(int rows, int cols, int **m1, int **m2, int **m3) {
  int i, j, k, val;
  for (i=0; i<rows; i++) {
    for (j=0; j<cols; j++) {
      val = 0;
      for (k=0; k<cols; k++) {
        val += m1[i][k] * m2[k][j];
      }
      m3[i][j] = val;
    }
  }
  return(m3);
}

Taken from the test-suite benchmark Shootout.

We estimate the cost of the multiply to be 2 while we generate 9 instructions
for it and end up being quite a bit slower than the scalar version (48% on my
machine).

Also, properly differentiate between avx1 and avx2. On avx-1 we still split the
vector into two 128-bit halves and handle the subvector muls like above with 9
instructions.
Only on avx-2 will we have a cost of 9 for v4i64.

I changed the test case in test/Transforms/LoopVectorize/X86/avx1.ll to use an
add instead of a mul because with a mul we now no longer vectorize. I did
verify that the mul would be indeed more expensive when vectorized with 3
kernels:

for (i ...)
   r += a[i] * 3;
for (i ...)
  m1[i] = m1[i] * 3; // This matches the test case in avx1.ll
and a matrix multiply.

In each case the vectorized version was considerably slower.

radar://13304919

llvm-svn: 176403
2013-03-02 04:02:52 +00:00
Nadav Rotem 739e37a0d2 PR14448 - prevent the loop vectorizer from vectorizing the same loop twice.
The LoopVectorizer often runs multiple times on the same function due to inlining.
When this happens the loop vectorizer often vectorizes the same loops multiple times, increasing code size and adding unneeded branches.
With this patch, the vectorizer puts metadata on scalar loops during vectorization, marking them as 'already vectorized' so that it knows to ignore them when it sees them a second time.

PR14448.

llvm-svn: 176399
2013-03-02 01:33:49 +00:00
Benjamin Kramer 12f98fae98 LoopVectorize: Don't hang forever if a PHI only has skipped PHI uses.
Fixes PR15384.

llvm-svn: 176366
2013-03-01 19:07:31 +00:00
Quentin Colombet e684a6d4aa Fix a bug in instcombine for fmul in fast math mode.
The instcombine recognized pattern looks like:
a = b * c
d = a +/- Cst
or
a = b * c
d = Cst +/- a

When creating the new operands for fadd or fsub instruction following the related fmul, the first operand was created with the second original operand (M0 was created with C1) and the second with the first (M1 with Opnd0).

The fix consists in creating the new operands with the appropriate original operand, i.e., M0 with Opnd0 and M1 with C1.

llvm-svn: 176300
2013-02-28 21:12:40 +00:00
Benjamin Kramer dc145816fd LoopVectorize: Vectorize math builtin calls.
This properly asks TargetLibraryInfo if a call is available and if it is, it
can be translated into the corresponding LLVM builtin. We don't vectorize sqrt()
yet because I'm not sure about the semantics for negative numbers. The other
intrinsics should be exact equivalents of the libm functions.

Differential Revision: http://llvm-reviews.chandlerc.com/D465

llvm-svn: 176188
2013-02-27 15:24:19 +00:00
Michael Ilseman a7b93c1e5f Constant fold vector bitcasts of halves similarly to how floats and doubles are folded. Test case included.
llvm-svn: 176131
2013-02-26 22:51:07 +00:00
Benjamin Kramer ee40b9a2d4 CVP: If we have a PHI with an incoming select, try to skip the select.
This is a common pattern with dyn_cast and similar constructs; when the
PHI no longer depends on the select it can often be turned into a simpler
construct or even get hoisted out of the loop.

PR15340.

llvm-svn: 175995
2013-02-24 15:34:43 +00:00
Benjamin Kramer b867fea5e6 Fix invalid IR in test, missing incoming value for PHI node.
llvm-svn: 175994
2013-02-24 15:34:29 +00:00
Renato Golin 0890ace58a Some more tests for the global structure vectorizer
llvm-svn: 175964
2013-02-23 12:48:30 +00:00
Renato Golin adc1b07002 More tests to global struct vectorizer
llvm-svn: 175898
2013-02-22 16:18:31 +00:00
Bill Wendling a032374ea0 Use references to attribute groups on the call/invoke instructions.
Listing all of the attributes for the callee of a call/invoke instruction is way
too much and makes the IR unreadable. Use references to attributes instead.

llvm-svn: 175877
2013-02-22 09:09:42 +00:00
Renato Golin cf928cb53f Allow GlobalValues to vectorize with AliasAnalysis
Storing the load/store instructions with the values
and inspecting them using Alias Analysis to make sure
they don't alias, since the GEP pointer operand doesn't
take the offset into account.

Trying hard to not add any extra cost to loads and stores
that don't overlap on global values, AA is *only* calculated
if all of the previous attempts failed.

Using biggest vector register size as the stride for the
vectorization access, as we're being conservative and
the cost model (which calculates the real vectorization
factor) is only run after the legalization phase.

We might re-think this relationship in the future, but
for now, I'd rather be safe than sorry.

llvm-svn: 175818
2013-02-21 22:39:03 +00:00
Bill Wendling 90bc19cd91 Modify the LLVM assembly output so that it uses references to represent function attributes.
This makes the LLVM assembly look better. E.g.:

     define void @foo() #0 { ret void }
     attributes #0 = { nounwind noinline ssp }

llvm-svn: 175605
2013-02-20 07:21:42 +00:00
Nadav Rotem 0186347c4c Fix a bug in mayHaveSideEffects. Functions that do not return are now considered as instructions with side effects.
rdar://13227456

llvm-svn: 175553
2013-02-19 20:02:09 +00:00
Bill Wendling c98e4fef1a Temporarily revert r175470 for more review.
llvm-svn: 175476
2013-02-19 00:52:45 +00:00
Bill Wendling 66651e4c2f Check to see if the 'no-builtin' attribute is set before simplifying a library call.
llvm-svn: 175470
2013-02-18 23:17:16 +00:00
Tim Northover 67d3c09332 AArch64: adjust tests which rely on a default JIT
Profiling tests *do* need a JIT. They'll pass if a cross-compiler targeting
AArch64 by default has been built, but fail if a native AArch64 compiler has
been built. Therefore XFAIL is inappropriate and we mark them unsupported.

ExecutionEngine tests are JIT by definition, they should also be unsupported.

Transforms/LICM only needs to check that the output is still sane after
optimisation, so it can be switched to use the interpreter.

llvm-svn: 175433
2013-02-18 11:08:37 +00:00
Hal Finkel 76e65e4542 BBVectorize: Fix an invalid reference bug
This fixes PR15289. This bug was introduced (recently) in r175215; collecting
all std::vector references for candidate pairs to delete at once is invalid
because subsequent lookups in the owning DenseMap could invalidate the
references.

bugpoint was able to reduce a useful test case. Unfortunately, because whether
or not this asserts depends on memory layout, this test case will sometimes
appear to produce valid output. Nevertheless, running under valgrind will
reveal the error.

llvm-svn: 175397
2013-02-17 15:59:26 +00:00
Bill Wendling 23242098e7 The transform is:
(or (bool?A:B),(bool?C:D)) --> (bool?(or A,C):(or B,D))

By the time the OR is visited, both the SELECTs have been visited and not
optimized and the OR itself hasn't been transformed so we do this transform in
the hopes that the new ORs will be optimized.

The transform is explicitly disabled for vector-selects until "codegen matures
to handle them better".

Patch by Muhammad Tauqir!

llvm-svn: 175380
2013-02-16 23:41:36 +00:00
Pekka Jaaskelainen 62848c9c24 Forgot to 'svn add' the LoopVectorizer tests for the new parallel loop metadata, sorry.
llvm-svn: 175311
2013-02-15 21:50:19 +00:00
Arnaud A. de Grandmaison 61c167c62b Teach InstCombine to work with smaller legal types in icmp (shl %v, C1), C2
It enables to work with a smaller constant, which is target friendly for those which can compare to immediates.
It also avoids inserting a shift in favor of a trunc, which can be free on some targets.

This used to work until LLVM-3.1, but regressed with the 3.2 release.

llvm-svn: 175270
2013-02-15 14:35:47 +00:00
Bill Wendling 26b95756c1 Simplify the 'operator<' for the attribute object.
llvm-svn: 175252
2013-02-15 05:25:26 +00:00
Anna Zaks 269d1fa991 Revert "Fix testcase for attribute ordering."
This reverts commit 58f20a3cbfca7384fe5e25e095f18572736a4792.

llvm-svn: 175249
2013-02-15 04:15:53 +00:00
Anna Zaks 61040b915d Revert "Fix testcase for attribute ordering."
This reverts commit 997c6516ca161073a1d516ebca7c0ca7722f64e2.

llvm-svn: 175248
2013-02-15 04:15:50 +00:00
Bill Wendling f7d8d767fb Fix testcase for attribute ordering.
llvm-svn: 175238
2013-02-15 01:04:46 +00:00
Bill Wendling fa1d248ccf Fix testcase for attribute ordering.
llvm-svn: 175236
2013-02-15 00:58:25 +00:00
Nick Lewycky 06417743cf Teach the DataLayout aware constant folder to be much more aggressive towards
'and' instructions. This is a pattern that shows up a lot in ubsan binaries.

llvm-svn: 175128
2013-02-14 03:23:37 +00:00
Eli Bendersky 3ffeb68dd7 s/grep/FileCheck/ in some tests
llvm-svn: 175093
2013-02-13 22:00:37 +00:00
Michael Ilseman 74a6da963b Optimization: bitcast (<1 x ...> insertelement ..., X, ...) to ... ==> bitcast X to ...
llvm-svn: 174905
2013-02-11 21:41:44 +00:00
Michael Ilseman 35f82ff833 Remove trailing whitespace
llvm-svn: 174903
2013-02-11 21:36:49 +00:00
Bill Wendling 84ba97698e FileCheck-ize the tests.
llvm-svn: 174865
2013-02-11 08:34:57 +00:00
Andrew Trick bc7059032b LSR IVChain improvement.
Handle chains in which the same offset is used for both loads and
stores to the same array.

Fixes rdar://11410078.

llvm-svn: 174789
2013-02-09 01:11:01 +00:00
Chad Rosier 22d275f7b8 [SimplifyLibCalls] Library call simplification doesn't work if the call site
isn't using the default calling convention.  However, if the transformation is
from a call to inline IR, then the calling convention doesn't matter.
rdar://13157990

llvm-svn: 174724
2013-02-08 18:00:14 +00:00
Andrew Trick 1bd53c3675 Revert "Have InstCombine call SimplifyCall when handling calls. Test case included."
This reverts commit 3854a5d90fee52af1065edbed34521fff6cdc18d.

This causes a clang unit test to hang: vtable-available-externally.cpp.

llvm-svn: 174692
2013-02-08 01:55:39 +00:00
Michael Ilseman 6092dc5455 Have InstCombine call SimplifyCall when handling calls. Test case included.
llvm-svn: 174675
2013-02-07 23:01:35 +00:00
Michael Ilseman 5485729b9a Identify and simplify idempotent intrinsics. Test case included.
llvm-svn: 174650
2013-02-07 19:26:05 +00:00
Owen Anderson bfd2ce96c7 Remove this testcase until I can figure out how to properly conditionalize it.
llvm-svn: 174591
2013-02-07 07:01:54 +00:00
Owen Anderson 589baf98b4 Another attempt at getting the XFAIL line right for this test.
llvm-svn: 174588
2013-02-07 06:26:55 +00:00
Michael Ilseman 1dd6f2a5ba Preserve fast-math flags after reassociation and commutation. Update test cases
llvm-svn: 174571
2013-02-07 01:40:15 +00:00
Michael Ilseman 10f2055812 whitespace
llvm-svn: 174569
2013-02-07 01:27:13 +00:00
Owen Anderson 389f7dc7a2 Fix CMake detection of various cmath functions, and XFAIL the test on platforms that are known to be missing them.
llvm-svn: 174564
2013-02-07 00:54:05 +00:00
Owen Anderson d4ebfd8400 Significantly generalize our ability to constant fold floating point intrinsics, including ones on half types.
llvm-svn: 174555
2013-02-06 22:43:31 +00:00
Benjamin Kramer 944e0abf04 InstCombine: Fix and simplify the inttoptr side too.
llvm-svn: 174438
2013-02-05 20:22:40 +00:00
Michael Gottesman a750006ad6 Added missing newline to end of test case.
llvm-svn: 174433
2013-02-05 19:39:44 +00:00
Benjamin Kramer e477875873 InstCombine: Harden code to work with vectors of pointers and simplify it a bit.
Found by running instcombine on a fabricated test case for the constant folder.

llvm-svn: 174430
2013-02-05 19:21:56 +00:00
Benjamin Kramer a5a9ec5755 ConstantFolding: Fix a crash when encountering a truncating inttoptr.
This was introduced in r173293.

llvm-svn: 174424
2013-02-05 19:04:36 +00:00
NAKAMURA Takumi 7ec43d9b37 Formatting.
llvm-svn: 174380
2013-02-05 15:32:16 +00:00
NAKAMURA Takumi 6635fe56d3 llvm/test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll: "-debug" requires +Asserts.
llvm-svn: 174379
2013-02-05 15:32:10 +00:00
Arnold Schwaighofer 22174f5d5a Loop Vectorizer: Handle pointer stores/loads in getWidestType()
In the loop vectorizer cost model, we used to ignore stores/loads of a pointer
type when computing the widest type within a loop. This meant that if we had
only stores/loads of pointers in a loop we would return a widest type of 8bits
(instead of 32 or 64 bit) and therefore a vector factor that was too big.

Now, if we see a consecutive store/load of pointers we use the size of a pointer
(from data layout).

This problem occurred in SingleSource/Benchmarks/Shootout-C++/hash.cpp (reduced
test case is the first test in vector_ptr_load_store.ll).

radar://13139343

llvm-svn: 174377
2013-02-05 15:08:02 +00:00
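For reference, the kind of loop the commit above describes looks like the sketch below: the only memory accesses are loads and stores of pointer-typed values. With the fix, the widest type is the pointer size from the data layout, so on a 64-bit target with 128-bit vector registers (an assumed configuration, used only for the arithmetic) the vectorization factor becomes 128 / 64 = 2 rather than the bogus 128 / 8 = 16:

    #include <cstddef>

    // Consecutive loads and stores of pointers; no other memory accesses.
    void copy_ptrs(int **dst, int **src, std::size_t n) {
      for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
    }
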
Michael Gottesman e2376cdf71 Add code to GlobalVariable.h so that global variables marked as
externally_initialized return false for hasDefiniteInitializer and
hasUniqueInitializer.

rdar://12580965.

llvm-svn: 174345
2013-02-05 06:53:26 +00:00
David Blaikie 33111dfea0 Remove the (apparently) unnecessary debug info metadata indirection.
The main lists of debug info metadata attached to the compile_unit had an extra
layer of metadata nodes they went through for no apparent reason. This patch
removes that (& still passes just as much of the GDB 7.5 test suite). If anyone
can show evidence as to why these extra metadata nodes are there I'm open to
reverting this patch & documenting why they're there.

llvm-svn: 174266
2013-02-02 05:56:24 +00:00
Dan Gohman 9ee4bc1abc Add a testcase for some past-the-end address subtleties.
llvm-svn: 174210
2013-02-01 19:37:52 +00:00
Benjamin Kramer c05aa958b1 InstSimplify: stripAndComputeConstantOffsets can be called with vectors of pointers too.
Prepare it for vectors of pointers and handle simple cases. We don't handle
complicated cases because accumulateConstantOffset bails on pointer vectors.
Fixes selfhost on i386.

llvm-svn: 174179
2013-02-01 15:21:10 +00:00
Nadav Rotem 4349f6963e Revert r174152. The shift amount may overflow and in that case this transformation is illegal.
llvm-svn: 174156
2013-02-01 07:59:33 +00:00
Nadav Rotem 1d584029ae Optimize shift lefts of a constant by a value plus constant into a single shift.
llvm-svn: 174152
2013-02-01 06:45:40 +00:00
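The revert above (r174156) is because rewriting C1 << (V + C2) as (C1 << C2) << V is only safe when V + C2 stays in range. A concrete 32-bit counterexample, with C1 = 1 and C2 = 1 chosen just for illustration:

    #include <cstdint>
    #include <cstdio>

    int main() {
      const uint32_t C1 = 1, C2 = 1;
      uint32_t V = 0xFFFFFFFFu;          // i.e. -1
      uint32_t amount = V + C2;          // the shift amount wraps around to 0
      uint32_t original = C1 << amount;  // well defined: 1 << 0 == 1
      // The rewritten form (C1 << C2) << V would have to shift by 0xFFFFFFFF,
      // which is not a valid shift amount (poison in LLVM IR, undefined
      // behaviour in C++), so no single in-range shift reproduces `original`.
      std::printf("C1 << (V + C2) = %u\n", static_cast<unsigned>(original));
      return 0;
    }
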
Dan Gohman b3e2d3a638 Rewrite instsimplify's handling of icmp on pointer values to remove the
remaining use of AliasAnalysis concepts such as isIdentifiedObject to
prove pointer inequality.

@external_compare in test/Transforms/InstSimplify/compare.ll shows a simple
case where a noalias argument can be equal to a global variable address, and
while AliasAnalysis can get away with saying that these pointers don't alias,
instsimplify cannot say that they are not equal.

llvm-svn: 174122
2013-02-01 00:11:13 +00:00
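A C++ rendering of the @external_compare situation (the names here are made up): a parameter that never aliases any access to the global can still be equal to its address, which is why instsimplify must not fold the comparison away:

    #include <cstdio>

    int g;

    // Think of `p` as a noalias/__restrict parameter at the IR level: that is
    // a promise about memory accesses made through it, not about its value.
    bool is_g(int *p) { return p == &g; }

    int main() {
      std::printf("%d\n", is_g(&g) ? 1 : 0);  // prints 1
      return 0;
    }
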
Dan Gohman 995d40e1e2 An alloca can be equal to an argument. It can't *alias* an alloca, but it could
be equal, since there's nothing preventing a caller from correctly predicting
the stack location of an alloca.

llvm-svn: 174119
2013-01-31 23:49:33 +00:00
Bill Wendling 1c7cc8ae90 Remove the AttrBuilder form of the Attribute::get creators.
The AttrBuilder is for building a collection of attributes. The Attribute object
holds only one attribute. So it's not really useful for the Attribute object to
have a creator which takes an AttrBuilder.

This has two fallouts:

1. The AttrBuilder no longer holds its internal attributes in a bit-mask form.
2. The attributes are now ordered alphabetically (hence why the tests have changed).

llvm-svn: 174110
2013-01-31 23:16:25 +00:00
Pekka Jaaskelainen 995a3e731d Made the min-trip-count-switch test X86-specific to avoid
breakage with builds without X86-support.

llvm-svn: 174052
2013-01-31 10:33:22 +00:00
Michael Gottesman 41e4ac4224 Filecheckized 2x tests in SimplifyCFG and removed their date prefix to fit with current llvm style for test names.
llvm-svn: 174011
2013-01-31 01:04:23 +00:00
Nadav Rotem 513bd8a73c InstCombine: canonicalize sext-and --> select
sext-not-and --> select.

Patch by Muhammad Tauqir Ahmad.

llvm-svn: 173901
2013-01-30 06:35:22 +00:00
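The canonicalization above rests on the fact that sign-extending an i1 yields either 0 or all ones, so the `and` behaves like a select. A quick check of both forms on 32-bit values, purely as an illustration:

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t xs[] = {0u, 1u, 0xDEADBEEFu, 0xFFFFFFFFu};
      for (bool b : {false, true}) {
        uint32_t sext = b ? 0xFFFFFFFFu : 0u;    // sext i1 -> i32
        for (uint32_t x : xs) {
          assert((sext & x) == (b ? x : 0u));    // sext-and     == select b, x, 0
          assert((~sext & x) == (b ? 0u : x));   // sext-not-and == select b, 0, x
        }
      }
      return 0;
    }
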
Renato Golin 5e9d55eca0 Adding simple cast cost to ARM
Changing ARMBaseTargetMachine to return ARMTargetLowering intead of
the generic one (similar to x86 code).

Tests showing which instructions were added to cast when necessary
or cost zero when not. Downcasts to 16 bits are not lowered in NEON,
so costs are not there yet.

llvm-svn: 173849
2013-01-29 23:31:38 +00:00
Pekka Jaaskelainen f50ab84bb1 LoopVectorize: convert TinyTripCountVectorThreshold constant
to a command line switch.

llvm-svn: 173837
2013-01-29 21:42:08 +00:00
Bill Wendling f2955aa3f2 Convert getAttributes() to return an AttributeSetNode.
The AttributeSetNode contains all of the attributes. This removes one (hopefully
last) use of the Attribute class as a container of multiple attributes.

llvm-svn: 173761
2013-01-29 03:20:31 +00:00
Chandler Carruth 329b590e6e Re-revert r173342, without losing the compile time improvements, flat
out bug fixes, or functionality preserving refactorings.

llvm-svn: 173610
2013-01-27 06:42:03 +00:00
Reid Kleckner ab083f727b FileCheck-ify some grep tests
These tests in particular try to use escaped square brackets as an
argument to grep, which is failing for me with native win32 python.  It
appears the backslash is being lost near the CreateProcess*() call.

llvm-svn: 173506
2013-01-25 22:11:46 +00:00
Chandler Carruth ceff222dea Switch this code away from Value::isUsedInBasicBlock. That code either
loops over instructions in the basic block or the use-def list of the
value, neither of which are really efficient when repeatedly querying
about values in the same basic block.

What's more, we already know that the CondBB is small, and so we can do
a much more efficient test by counting the uses in CondBB, and seeing if
those account for all of the uses.

Finally, we shouldn't blanket fail on any such instruction, instead we
should conservatively assume that those instructions are part of the
cost.

Note that this actually fixes a bug in the pass because
isUsedInBasicBlock has a really terrible bug in it. I'll fix that in my
next commit, but the fix for it would make this code suddenly take the
compile time hit I thought it already was taking, so I wanted to go
ahead and migrate this code to a faster & better pattern.

The bug in isUsedInBasicBlock was also causing other tests to test the
wrong thing entirely: for example we weren't actually disabling
speculation for floating point operations as intended (and tested), but
the test passed because we failed to speculate them due to the
isUsedInBasicBlock failure.

llvm-svn: 173417
2013-01-25 05:40:09 +00:00
Benjamin Kramer 1c4e323fdd Reapply chandlerc's r173342 now that the miscompile it was triggering is fixed.
Original commit message:
Plug TTI into the speculation logic, giving it a real cost interface
that can be specialized by targets.

The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.

Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.

If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.

llvm-svn: 173357
2013-01-24 16:44:25 +00:00
Benjamin Kramer 435eba09b7 ConstantFolding: Add a missing folding that leads to a miscompile.
We use constant folding to see if an intrinsic evaluates to the same value as a
constant that we know. If we don't take the undefinedness into account we get a
value that doesn't match the actual implementation, and miscompiled code.

This was uncovered by Chandler's simplifycfg changes.

llvm-svn: 173356
2013-01-24 16:28:28 +00:00
Chandler Carruth 321c6a7c50 Revert r173342 temporarily. It appears to cause a very late miscompile
of stage2 in a bootstrap. Still investigating....

llvm-svn: 173343
2013-01-24 13:24:24 +00:00
Chandler Carruth 5f4519309f Plug TTI into the speculation logic, giving it a real cost interface
that can be specialized by targets.

The goal here is not to be more aggressive, but to just be more accurate
with very obvious cases. There are instructions which are known to be
truly free and which were not being modeled as such in this code -- see
the regression test which is distilled from an inner loop of zlib.

Everywhere the TTI cost model is insufficiently conservative I've added
explicit checks with FIXME comments to go add proper modelling of these
cost factors.

If this causes regressions, the likely solution is to make TTI even more
conservative in its cost estimates, but test cases will help here.

llvm-svn: 173342
2013-01-24 12:39:29 +00:00
Chandler Carruth 01bffaad03 Address a large chunk of this FIXME by accumulating the cost for
unfolded constant expressions rather than checking each one
independently.

llvm-svn: 173341
2013-01-24 12:05:17 +00:00
Chandler Carruth 8a21005cca Switch the constant expression speculation cost evaluation away from
a cost function that seems both a bit ad-hoc and poorly suited to
evaluating constant expressions.

Notably, it is missing any support for trivial expressions such as
'inttoptr'. I could fix this routine, but it isn't clear to me all of
the constraints its other users are operating under.

The core protection that seems relevant here is avoiding the formation
of a select instruction with a further chain of select operations in
a constant expression operand. Just explicitly encode that constraint.

Also, update the comments and organization here to make it clear where
this needs to go -- this should be driven off of real cost measurements
which take into account the number of constant expressions and the
depth of the constant expression tree.

llvm-svn: 173340
2013-01-24 11:53:01 +00:00
Benjamin Kramer d9c3dabbba ConstantFolding: Evaluate GEP indices in the index type.
This fixes some edge cases that we would get wrong with uint64_ts.
PR14986.

llvm-svn: 173289
2013-01-23 20:41:05 +00:00
Benjamin Kramer e4c46fec73 Revert "InstCombine: Clean up weird code that talks about a modulus that's long gone."
This causes crashes during the build of compiler-rt during selfhost. Add a
testcase for coverage.

llvm-svn: 173279
2013-01-23 17:52:29 +00:00
Bill Wendling d154e283f2 Add the IR attribute 'sspstrong'.
SSPStrong applies a heuristic to insert stack protectors in these situations:

* A Protector is required for functions which contain an array, regardless of
  type or length.

* A Protector is required for functions which contain a structure/union which
  contains an array, regardless of type or length.  Note, there is no limit to
  the depth of nesting.

* A protector is required when the address of a local variable (i.e., stack
  based variable) is exposed. (E.g., such as through a local whose address is
  taken as part of the RHS of an assignment or a local whose address is taken as
  part of a function argument.)

This patch implements the SSPStrong attribute to be equivalent to
SSPRequired. This will change in a subsequent patch.

llvm-svn: 173230
2013-01-23 06:41:41 +00:00
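Two hedged examples of code that the heuristic described above would protect while the classic ssp heuristic (which keys off large character buffers) typically would not; the function and type names are invented for illustration:

    #include <cstring>

    struct Packet { int len; char tag[4]; };   // a small array nested in a struct

    int nested_array(const char *src) {
      Packet p;                                // contains an array -> protector
      std::strncpy(p.tag, src, sizeof p.tag);
      p.len = static_cast<int>(std::strlen(src));
      return p.len;
    }

    void fill(int *out);                       // defined elsewhere

    int escaped_local() {
      int x = 0;
      fill(&x);                                // the address of a local escapes
      return x;
    }
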
Nadav Rotem ab3e698ee9 Add support for reverse pointer induction variables. These are loops that contain pointers that count backwards.
For example, this is the hot loop in BZIP:

  do {
    m = *--p;
    *p = ( ... );
  } while (--n);

llvm-svn: 173219
2013-01-23 01:35:00 +00:00
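A self-contained version of that hot loop (the byte update is invented; only the loop shape matters): the pointer induction variable steps backwards, which is the pattern the new reverse pointer induction support recognizes:

    #include <cstdint>

    // `p` initially points one past the region being rewritten; n >= 1.
    void reverse_update(uint8_t *p, int n) {
      uint8_t m;
      do {
        m = *--p;                           // pointer IV counts backwards
        *p = static_cast<uint8_t>(m + 1);   // stand-in for BZIP's real update
      } while (--n);
    }
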
Dmitri Gribenko 44ee2e7a23 Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID
This is done to avoid odd test failures, like the one fixed in r171243.

llvm-svn: 173163
2013-01-22 14:39:21 +00:00
Michael Gottesman 469a465fa8 This test is only supposed to test that the objc-arc alias analysis
allows gvn to perform certain optimizations. Thus the RUN line should
only contain -objc-arc-aa, not the full -objc-arc.

llvm-svn: 173126
2013-01-22 04:41:11 +00:00
Andrew Trick 2d35fabc7d Remove target triple from an LSR test.
Manish already fixed this test to work with NoTTI.

llvm-svn: 173110
2013-01-22 00:57:16 +00:00
Paul Redmond 9d86a4a3b6 Transform (sub 0, (zext bool to A)) to (sext bool to A) and
(sub 0, (sext bool to A)) to (zext bool to A).

Patch by Muhammad Ahmad
Reviewed by Duncan Sands

llvm-svn: 173093
2013-01-21 21:57:20 +00:00
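Both rewrites above are small integer identities: 0 - zext(b) gives 0 or -1 (all ones), i.e. sext(b), and 0 - sext(b) gives 0 or 1, i.e. zext(b). A quick check on 32-bit values:

    #include <cassert>
    #include <cstdint>

    int main() {
      for (bool b : {false, true}) {
        uint32_t zext = b ? 1u : 0u;
        uint32_t sext = b ? 0xFFFFFFFFu : 0u;
        assert(0u - zext == sext);   // (sub 0, (zext b)) == (sext b)
        assert(0u - sext == zext);   // (sub 0, (sext b)) == (zext b)
      }
      return 0;
    }
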
Nadav Rotem c42f90b1f4 LoopVectorizer: Implement a new heuristics for selecting the unroll factor.
We ignore the cpu frontend and focus on pipeline utilization. We do this because we
don't have a good way to estimate the loop body size at the IR level.

llvm-svn: 172964
2013-01-20 05:24:29 +00:00
Nadav Rotem 2169dbed2c Change the cpu type in the test.
llvm-svn: 172963
2013-01-20 05:20:56 +00:00
Benjamin Kramer d455ed85d1 LoopVectorizer: Emit memory checks into their own basic block.
This separates the check for "too few elements to run the vector loop" from the
"memory overlap" check, giving much nicer code and allowing us to skip the memory
checks when we're not going to execute the vector code anyway. We still leave
the decision of whether to emit the memory checks as branches or setccs, but it
seems to be doing a good job. If ugly code pops up we may want to emit them as
separate blocks too. Small speedup on MultiSource/Benchmarks/MallocBench/espresso.

Most of this is legwork to allow multiple bypass blocks while updating PHIs,
dominators and loop info.

llvm-svn: 172902
2013-01-19 13:57:58 +00:00
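Roughly, the control flow the vectorizer emits after this change has the shape sketched below (plain C++ rather than the generated IR, with a deliberately simplified overlap test): the cheap trip-count test runs first, and the memory check only runs when the vector body could actually execute:

    #include <cstddef>
    #include <cstdint>

    static bool ranges_overlap(const int *a, const int *b, std::size_t n) {
      auto lo1 = reinterpret_cast<std::uintptr_t>(a);
      auto lo2 = reinterpret_cast<std::uintptr_t>(b);
      return lo1 < lo2 + n * sizeof(int) && lo2 < lo1 + n * sizeof(int);
    }

    void add_arrays(int *a, const int *b, std::size_t n, std::size_t VF) {
      std::size_t i = 0;
      if (n >= VF && !ranges_overlap(a, b, n)) {  // trip-count bypass, then memcheck
        for (; i + VF <= n; i += VF)              // "vector" body
          for (std::size_t j = 0; j < VF; ++j)    // stand-in for one wide operation
            a[i + j] += b[i + j];
      }
      for (; i < n; ++i)                          // scalar epilogue / fallback
        a[i] += b[i];
    }
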
Bill Wendling da29e00578 Reverting r171325 & r172363. This was causing a mis-compile on the self-hosted LTO build bots.
Okay, here's how to reproduce the problem:

1) Build a Release (or Release+Asserts) version of clang in the normal way.

2) Using the clang & clang++ binaries from (1), build a Release (or
   Release+Asserts) version of the same sources, but this time enable LTO ---
   specify the `-flto' flag on the command line.

3) Run the ARC migrator tests:

    $ arcmt-test --args -triple x86_64-apple-darwin10 -fsyntax-only -x objective-c++ ./src/tools/clang/test/ARCMT/cxx-rewrite.mm

You'll see that the output isn't correct (the whitespace is off).

The mis-compile is in the function `RewriteBuffer::RemoveText' in the
clang/lib/Rewrite/Core/Rewriter.cpp file. When that function and RewriteRope.cpp
are compiled with LTO and the `arcmt-test' executable is regenerated, you'll see
the error. When those files are not LTO'ed, then the output of the `arcmt-test'
is fine.

It is *really* hard to get a testcase out of this. I'll file a PR with what I
have currently.

--- Reverse-merging r172363 into '.':
U    include/llvm/Analysis/MemoryBuiltins.h
U    lib/Analysis/MemoryBuiltins.cpp

--- Reverse-merging r171325 into '.':
U    test/Transforms/InstCombine/objsize.ll
G    include/llvm/Analysis/MemoryBuiltins.h
G    lib/Analysis/MemoryBuiltins.cpp

llvm-svn: 172756
2013-01-17 21:28:46 +00:00
Michael Gottesman 00dfc68c2d Added test for r172599 which fixes bugzilla://14584,rdar://11744105.
llvm-svn: 172656
2013-01-16 21:07:18 +00:00
Benjamin Kramer b7050f0a7c Move test that depends on the x86 target into a target-specific directory.
Should fix the arm buildbot (which only builds the arm target).

llvm-svn: 172611
2013-01-16 13:25:56 +00:00
Benjamin Kramer 1f25d24a8f Remove triple from this test, it makes it fail when X86 TTI is missing.
Without a triple, opt falls back to NoTTI, which comes closer to LSR's pre-TTI behavior.

llvm-svn: 172609
2013-01-16 13:19:59 +00:00
Nadav Rotem 7df850924d Teach InstCombine to optimize extract of a value from a vector add operation with a constant zero.
llvm-svn: 172576
2013-01-15 23:43:14 +00:00
Shuxin Yang e822745202 1. Hoist the minus sign as high as possible in an attempt to reveal
some optimization opportunities (in the enclosing super-expressions).

   rule 1. (-0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "-0.0 - X" has only one reference.

   rule 2. (0.0 - X ) * Y => -0.0 - (X * Y)
     if expression "0.0 - X" has only one reference, and
        the instruction is marked "noSignedZero".

2. Eliminate negation (The compiler was already able to handle these
    optimizations if the 0.0s are replaced with -0.0.)

   rule 3: (0.0 - X) * (0.0 - Y) => X * Y
   rule 4: (0.0 - X) * C => X * -C
   if the expr is flagged "noSignedZero".

3. Rule 5: (X*Y) * X => (X*X) * Y
   if X != Y and the expression is flagged with "UnsafeAlgebra".

   The purpose of this transformation is two-fold:
    a) to form a power expression (of X).
    b) potentially shorten the critical path: After transformation, the
       latency of the instruction Y is amortized by the expression of X*X,
       and therefore Y is in a "less critical" position compared to what it
      was before the transformation. 

4. Remove the InstCombine code about simplifying "X * select".
   
   The reasons are following:
    a) The "select" is somewhat architecture-dependent, therefore the
       higher level optimizers are not able to precisely predict if
       the simplification really yields any performance improvement
       or not.

    b) The "select" operator is a bit complicated, and tends to obscure
       optimization opportunities. It is better to keep it as low as
       possible in the expression tree, and let CodeGen tackle the optimization.

llvm-svn: 172551
2013-01-15 21:09:32 +00:00
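Rule 5 above is gated on "UnsafeAlgebra" because reassociating (X*Y)*X into (X*X)*Y can change the result when intermediate rounding or overflow differs. A concrete double-precision example, with values picked purely for illustration and assuming standard IEEE double evaluation:

    #include <cstdio>

    int main() {
      volatile double X = 1e200, Y = 1e-200;
      double lhs = (X * Y) * X;  // ~1.0 * 1e200  -> about 1e200 (finite)
      double rhs = (X * X) * Y;  // 1e200 * 1e200 -> +inf, then inf * 1e-200 -> +inf
      std::printf("lhs = %g, rhs = %g\n", lhs, rhs);
      return 0;
    }
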
Renato Golin 51c25b0818 Pattern-matched variables in post-inc-icmpzero.ll
Test was failing for clang-native-arm-cortex-a9 build-bot configuration.
The reason for the failure was the test was using hardcoded names.
The attached patch fixes this failure by replacing the hard-coded variables
names with pattern-matched variable names.

Patch by Manish Verma, ARM

llvm-svn: 172534
2013-01-15 15:22:45 +00:00
Shuxin Yang 320f52a4b0 This change is to implement following rules under the condition C_A and/or C_R
---------------------------------------------------------------------------
 C_A: reassociation is allowed
 C_R: reciprocal of a constant C is appropriate, which means 
    - 1/C is exact, or 
    - reciprocal is allowed and 1/C is neither a special value nor a denormal.
 -----------------------------------------------------------------------------

 rule 1: (X/C1) / C2 => X / (C2*C1)      (if C_A)
                     => X * (1/(C2*C1))  (if C_A && C_R)
 rule 2: X*C1 / C2 => X * (C1/C2)  (if C_A)
 rule 3: (X/Y)/Z => X/(Y*Z)  (if C_A && at least one of Y and Z is a symbolic value)
 rule 4: Z/(X/Y) => (Z*Y)/X  (similar to rule 3)

 rule 5: C1/(X*C2) => (C1/C2) / X (if C_A)
 rule 6: C1/(X/C2) => (C1*C2) / X (if C_A)
 rule 7: C1/(C2/X) => (C1/C2) * X (if C_A)

llvm-svn: 172488
2013-01-14 22:48:41 +00:00
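The C_R condition above is what keeps the "multiply by the reciprocal" forms exact. A small check of why it matters: 1/4 is representable exactly, so dividing by 4 and multiplying by 0.25 always agree, while 1/3 is not exact and the two forms can differ (the probe loop below looks for such a value):

    #include <cassert>
    #include <cstdio>

    int main() {
      // Exact reciprocal: X/4 == X * (1/4) for every X.
      for (double x : {1.0, 3.3, 1e-7, 123456.789, 1e300})
        assert(x / 4.0 == x * 0.25);

      // Inexact reciprocal: X/3 and X * (1/3) need not agree.
      const double third = 1.0 / 3.0;
      for (double x = 1.0; x < 100.0; x += 1.0)
        if (x / 3.0 != x * third) {
          std::printf("x = %g: x/3.0 and x*(1/3.0) differ\n", x);
          break;
        }
      return 0;
    }
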
Andrew Trick d4e1b5e291 SCEVExpander fix. RAUW needs to update the InsertedExpressions cache.
Note that this bug is only exposed because LTO fails to use TTI.

Fixes self-LTO of clang. rdar://13007381.

llvm-svn: 172462
2013-01-14 21:00:37 +00:00
Michael Gottesman c99ee6b336 Added bugzilla PR number to test case.
llvm-svn: 172369
2013-01-13 22:17:22 +00:00
Michael Gottesman f15c0bb495 Fixed an infinite loop in the block escape in analysis in ObjCARC caused by 2x blocks each assigned a value via a phi-node causing each to depend on the other.
A test case is provided as well.

llvm-svn: 172368
2013-01-13 22:12:06 +00:00
Nadav Rotem 40e45eeae2 Fix PR14547. Handle induction variables with sizes smaller than i32 (i8 and i16).
llvm-svn: 172348
2013-01-13 07:56:29 +00:00
Michael Gottesman 556ff61122 Fixed bug in ObjCARC where we were changing a call from objc_autoreleaseRV => objc_autorelease but were not updating the InstructionClass to IC_Autorelease.
llvm-svn: 172288
2013-01-12 01:25:19 +00:00
Michael Gottesman c9656faf1e Fixed a bug where we were tail calling objc_autorelease causing an object to not be placed into an autorelease pool.
The reason that this occurs is that tail calling objc_autorelease eventually
tail calls -[NSObject autorelease] which supports fast autorelease. This can
cause us to violate the semantic guarantee of __autoreleasing variables that
assignment to an __autoreleasing variable always yields an object that is
placed into the innermost autorelease pool.

The fix included in this patch works by:

1. In the peephole optimization function OptimizeIndividualFunctions, always
remove tail call from objc_autorelease.
2. Whenever we convert to/from an objc_autorelease, set/unset the tail call
keyword as appropriate.

*NOTE* I also handled the case where objc_autorelease is converted in
OptimizeReturns to an autoreleaseRV which still violates the ARC semantics. I
will be removing that in a later patch and I wanted to make sure that the tree
is in a consistent state vis-a-vis ARC always.

Additionally, some test cases are provided, and all tests that marked objc_autorelease
calls with the tail keyword have been modified so that the tail keyword is removed.

*NOTE* One test fails due to a separate bug that I am going to commit soon. Thus
I marked the check line TMP: instead of CHECK: so make check does not fail.

llvm-svn: 172287
2013-01-12 01:25:15 +00:00
Nadav Rotem e55aa3c848 ARM Cost Model: Modify the target independent cost model to ask
the target if it supports the different CAST types. We didn't do this
on X86 because of the different register sizes and types, but on ARM
this makes sense.

llvm-svn: 172245
2013-01-11 19:54:13 +00:00
Nadav Rotem 853fe0acb9 ARM Cost Model: We need to detect the max bitwidth of types in the loop in order to select the max vectorization factor.
We don't have a detailed analysis on which values are vectorized and which stay scalars in the vectorized loop so we use
another method. We look at reduction variables, loads and stores, which are the only ways to get information in and out
of loop iterations. If the data types are extended and truncated then the cost model will catch the cost of the vector
zext/sext/trunc operations.

llvm-svn: 172178
2013-01-11 07:11:59 +00:00
Michael Gottesman 5284bd013f Converted test dont-tce-tail-marked-call.ll to use FileCheck.
llvm-svn: 172172
2013-01-11 04:16:35 +00:00
Michael Gottesman 93a0d49c7e This commit is a 4x squash commit consisting of 4x functions converted to use FileCheck instead of grep.
Messages:
Converted test case trivial_codegen_tailcall.ll to use FileCheck.
Converted test return_constant.ll to use FileCheck instead of grep.
Converted test reorder_load.ll to use FileCheck instead of grep.
Converted test intervening-inst.ll to use FileCheck instead of grep.

llvm-svn: 172171
2013-01-11 04:12:53 +00:00
Shuxin Yang c5c730b0e0 PR14904: Segmentation fault running pass 'Recognize loop idioms'
The root cause is mistakenly taking for granted that
    dyn_cast<Instruction>(a Value)
returns a non-NULL instruction.

llvm-svn: 172145
2013-01-10 23:32:01 +00:00
Evan Cheng 098d7b76b0 CastInst::castIsValid should return true if the dest type is the same as
the Value's current type. The cast is trivial even for aggregate types.

llvm-svn: 172143
2013-01-10 23:22:53 +00:00
Owen Anderson dbf0ca523d Teach InstCombine to hoist FABS and FNEG through FPTRUNC instructions. The application of these operations commutes with the truncation, so we should prefer to do them in the smallest size we can, to save register space, use smaller constant pool entries, etc.
llvm-svn: 172117
2013-01-10 22:06:52 +00:00
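The hoisting above is sound because fabs and fneg commute with the narrowing conversion under round-to-nearest: |(float)x| == (float)|x| and -(float)x == (float)(-x). A quick spot check, with sample values chosen arbitrarily:

    #include <cassert>
    #include <cmath>

    int main() {
      const double samples[] = {-1.2345678901234567, -0.0, 0.0, 3.5e-39, 1.5e38, -1e300};
      for (double x : samples) {
        assert(std::fabs(static_cast<float>(x)) == static_cast<float>(std::fabs(x)));
        assert(-static_cast<float>(x) == static_cast<float>(-x));
      }
      return 0;
    }
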
Nadav Rotem 6eae65cfac LoopVectorizer: Fix a bug in the vectorization of BinaryOperators. The BinaryOperator can be folded to an Undef, and we don't want to set NSW flags to undef vals.
PR14878

llvm-svn: 172079
2013-01-10 17:34:39 +00:00
Joey Gouly 58bf951dec Fix TryToShrinkGlobalToBoolean in GlobalOpt, so that it does not discard address spaces.
llvm-svn: 172051
2013-01-10 10:31:11 +00:00
Nadav Rotem b1791a75cd ARM Cost model: Use the size of vector registers and widest vectorizable instruction to determine the max vectorization factor.
llvm-svn: 172010
2013-01-09 22:29:00 +00:00
Benjamin Kramer 130fcde3e5 LICM: Hoist insertvalue/extractvalue out of loops.
Fixes PR14854.

llvm-svn: 171984
2013-01-09 18:12:03 +00:00
Nadav Rotem 4c66f87e8e ARM Cost Model: Add a basic vectorization unrolling test.
llvm-svn: 171931
2013-01-09 01:29:07 +00:00
Nadav Rotem 30a65bc39e Remove the -licm pass from the loop vectorizer test because the loop vectorizer does it now.
llvm-svn: 171930
2013-01-09 01:20:59 +00:00
Nadav Rotem b696c36fcd Cost Model: Move the 'max unroll factor' variable to the TTI and add initial Cost Model support on ARM.
llvm-svn: 171928
2013-01-09 01:15:42 +00:00
Shuxin Yang f0537ab681 Consider the expression "0.0 - X" as the negation of X if
  - this expression is explicitly marked no-signed-zero, or
  - no-signed-zero can be derived for this expression from some context.

llvm-svn: 171922
2013-01-09 00:13:41 +00:00
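The condition above exists because of signed zeros: 0.0 - X equals -X for every X except X == +0.0, where 0.0 - X is +0.0 while -X is -0.0 (the two compare equal, but 1.0/(+0.0) is +inf while 1.0/(-0.0) is -inf). A small check:

    #include <cassert>
    #include <cmath>

    int main() {
      assert(!std::signbit(0.0 - 0.0));   // 0.0 - (+0.0) is +0.0 ...
      assert(std::signbit(-0.0));         // ... but the negation of +0.0 is -0.0
      for (double x : {-2.5, -0.0, 1e-10, 42.0})
        assert(0.0 - x == -x);            // numerically equal for these samples
      return 0;
    }
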
Bill Wendling 76c6521ba1 Make sure we don't emit instructions before a landingpad instruction.
PR14782

llvm-svn: 171846
2013-01-08 10:51:32 +00:00
Nadav Rotem 5a197c06f3 LoopVectorizer: Add support for floating point reductions
llvm-svn: 171812
2013-01-07 23:13:00 +00:00
Nadav Rotem c60d7d96f5 LoopVectorizer: When we vectorize and widen loops we process many elements at once. This is a good thing, except for
small loops, where the scalar post-loop (which runs slower) can take more time to execute than the
rest of the loop. This patch disables widening of loops with a small static trip count.

llvm-svn: 171798
2013-01-07 21:54:51 +00:00
Shuxin Yang df0e61e793 This change is to implement the following rules:
  o. X/C1 * C2 => X * (C2/C1)  (if C2/C1 is neither a special FP value nor a denormal)
  o. X/C1 * C2 => X / (C1/C2)  (if C2/C1 is either a special FP value or a denormal, but C1/C2 is a normal FP value)

     Let MDC denote a multiplication or division with one & only one operand being a constant
  o. (MDC ± C1) * C2 => (MDC * C2) ± (C1 * C2)
     (so long as the constant folding doesn't yield any denormal or special value)

llvm-svn: 171793
2013-01-07 21:39:23 +00:00