Commit Graph

4601 Commits

Author SHA1 Message Date
Owen Anderson a4428aa484 Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it.  The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN.  This means that the
new expression is strictly more defined than the original one.  One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it.  Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it.  Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.

llvm-svn: 215825
2014-08-17 03:51:29 +00:00
David Majnemer f9a095d606 InstCombine: Combine mul with div.
We can combne a mul with a div if one of the operands is a multiple of
the other:

%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
  =>
%ret = mul nsw %a, (C1 / C2)

This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.

Consider this small example:

define i32 @f(i32 %a) {
  %mul = mul nuw i32 %a, 14
  %div = udiv exact i32 %mul, 7
  ret i32 %div
}

which gets CodeGen'd to:

    imull       $14, %edi, %eax
    imulq       $613566757, %rax, %rcx
    shrq        $32, %rcx
    subl        %ecx, %eax
    shrl        %eax
    addl        %ecx, %eax
    shrl        $2, %eax
    retq

We can now transform this into:
define i32 @f(i32 %a) {
  %shl = shl nuw i32 %a, 1
  ret i32 %shl
}

which gets CodeGen'd to:

    leal        (%rdi,%rdi), %eax
    retq

This fixes PR20681.

llvm-svn: 215815
2014-08-16 08:55:06 +00:00
Hal Finkel 61c386126b Copy noalias metadata from call sites to inlined instructions
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676
2014-08-14 21:09:37 +00:00
Hal Finkel d2dee16c27 Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Chad Rosier 11ab941644 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
David Majnemer 698dca0b95 InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer f1eda23514 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Jan Vesely 0cd3ec6cfa utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Chandler Carruth 0fb998110a [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Karthik Bhat a4a4db91be InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)
Correctness proof of the transform using CVC3-

$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;

$ cvc3 t.cvc
Valid.

llvm-svn: 215524
2014-08-13 05:13:14 +00:00
Matt Arsenault 4815f09bbe Allwo bitcast + struct GEP transform to work with addrspacecast
llvm-svn: 215467
2014-08-12 19:46:13 +00:00
David Majnemer ab07f00c64 InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b)
What follows bellow is a correctness proof of the transform using CVC3.

$ < t.cvc
A, B : BITVECTOR(32);

QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);

$ cvc3 < t.cvc
Valid.

llvm-svn: 215400
2014-08-11 22:32:02 +00:00
Jiangning Liu 40b04fd994 In LVI(Lazy Value Info), originally value on a BB can only be caculated once,
and the lattice will be updated to be a state other than "undefined". This
limiation could miss some opportunities of lowering "overdefined" to be an
even accurate value. So this patch ask the algorithm to try to lower the
lattice value again even if the value has been lowered to be "overdefined".

llvm-svn: 215343
2014-08-11 05:02:04 +00:00
James Molloy 65b08f5e46 [LoopVectorizer] Enable support for floating-point subtraction reductions
llvm-svn: 215200
2014-08-08 12:41:08 +00:00
David Majnemer fe8c7540b0 GlobalOpt: Optimize in the face of insertvalue/extractvalue
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst.  Optimizing these is pretty straightforward.

N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.

llvm-svn: 215184
2014-08-08 05:50:43 +00:00
Arnold Schwaighofer 4fb3c47456 SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment
We were using the pointer type which is incorrect.

llvm-svn: 215162
2014-08-07 22:47:27 +00:00
Owen Anderson 6c19ab1b5d Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.

llvm-svn: 215148
2014-08-07 21:07:35 +00:00
Rui Ueyama c487f7728e Revert "r214897 - Remove dead zero store to calloc initialized memory"
It broke msan.

llvm-svn: 214989
2014-08-06 19:30:38 +00:00
Philip Reames 00c9b6461f Remove dead zero store to calloc initialized memory
Optimize the following IR:

%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4

Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store.  If the store is to an out of bounds address, it is undefined and thus also removable.

Reviewed By: nicholas

Differential Revision: http://reviews.llvm.org/D3942

llvm-svn: 214897
2014-08-05 17:48:20 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Manman Ren 062f58d550 [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.
When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.

rdar://17887153

llvm-svn: 214642
2014-08-02 23:41:54 +00:00
Erik Eckstein 26a1bf7d84 fix bug 20513 - Crash in SLP Vectorizer
llvm-svn: 214638
2014-08-02 19:39:42 +00:00
Tyler Nowicki 064896bbc5 Add diagnostics to the vectorizer cost model.
When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision.

Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.

Reviewed by Arnold Schwaighofer

llvm-svn: 214599
2014-08-02 00:14:03 +00:00
Peter Collingbourne e52646cd80 PartiallyInlineLibCalls: Check sqrt result type before transforming it.
Some configure scripts declare this with the wrong prototype, which can lead
to an assertion failure.

llvm-svn: 214593
2014-08-01 23:21:21 +00:00
Erik Eckstein c80e1dc081 SLPVectorizer: improved scheduling algorithm.
llvm-svn: 214494
2014-08-01 09:20:42 +00:00
Suyog Sarda 56c9a87035 This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)".
Differential Revision: http://reviews.llvm.org/D4653

llvm-svn: 214479
2014-08-01 05:07:20 +00:00
Suyog Sarda 1c6c2f69f7 This patch implements transform for pattern "(A | B) & ((~A) ^ B) -> (A & B)".
Differential Revision: http://reviews.llvm.org/D4628

llvm-svn: 214478
2014-08-01 04:59:26 +00:00
Suyog Sarda 52324c82cc This patch implements transform for pattern "( A & (~B)) | (A ^ B) -> (A ^ B)"
Differential Revision: http://reviews.llvm.org/D4652

llvm-svn: 214477
2014-08-01 04:50:31 +00:00
Suyog Sarda 16d646594e This patch implements transform for pattern "(A & B) | ((~A) ^ B) -> (~A ^ B)".
Patch Credit to Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4655

llvm-svn: 214476
2014-08-01 04:41:43 +00:00
Tyler Nowicki b5a65395cc Improve the remark generated for -Rpass-missed.
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.

Reviewed by Arnold Schwaighofer`

llvm-svn: 214445
2014-07-31 21:22:22 +00:00
Tyler Nowicki 9fe497fcac Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable.
Reviewed by Arnold Schwaighofer

llvm-svn: 214440
2014-07-31 21:02:40 +00:00
David Majnemer a92687d636 InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A
We can only propagate the nsw bits if both subtraction instructions are
marked with the appropriate bit.

N.B.  We only propagate the nsw bit in InstCombine because the nuw case
is already handled in InstSimplify.

This fixes PR20189.

llvm-svn: 214385
2014-07-31 04:49:29 +00:00
David Majnemer cd4fbcd1bb InstSimplify: Simplify (X - (0 - Y)) if the second sub is NUW
If the NUW bit is set for 0 - Y, we know that all values for Y other
than 0 would produce a poison value.  This allows us to replace (0 - Y)
with 0 in the expression (X - (0 - Y)) which will ultimately leave us
with X.

This partially fixes PR20189.

llvm-svn: 214384
2014-07-31 04:49:18 +00:00
Rafael Espindola 464fe024c5 Use "weak alias" instead of "alias weak"
Before this patch we had

@a = weak global ...
but
@b = alias weak ...

The patch changes aliases to look more like global variables.

Looking at some really old code suggests that the reason was that the old
bison based parser had a reduction for alias linkages and another one for
global variable linkages. Putting the alias first avoided the reduce/reduce
conflict.

The days of the old .ll parser are long gone. The new one parses just "linkage"
and a later check is responsible for deciding if a linkage is valid in a
given context.

llvm-svn: 214355
2014-07-30 22:51:54 +00:00
David Majnemer 42af3601c2 InstCombine: Simplify (A ^ B) or/and (A ^ B ^ C)
While we can already transform A | (A ^ B) into A | B, things get bad
once we have (A ^ B) | (A ^ B ^ Cst) because reassociation will morph
this into (A ^ B) | ((A ^ Cst) ^ B).  Our existing patterns fail once
this happens.

To fix this, we add a new pattern which looks through the tree of xor
binary operators to see that, in fact, there exists a redundant xor
operation.

What follows bellow is a correctness proof of the transform using CVC3.

$ cat t.cvc
A, B, C : BITVECTOR(64);

QUERY BVXOR(A, B) | BVXOR(BVXOR(B, C), A) = BVXOR(A, B) | C;
QUERY BVXOR(BVXOR(A, C), B) | BVXOR(A, B) = BVXOR(A, B) | C;

QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C;
QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C;

$ cvc3 < t.cvc
Valid.
Valid.
Valid.
Valid.

llvm-svn: 214342
2014-07-30 21:26:37 +00:00
Chad Rosier 78f41b3ca7 SLP Vectorizer: Canonicalize tree operands of commutitive binary operands.
llvm-svn: 214338
2014-07-30 21:07:56 +00:00
Rafael Espindola d07cf400ab SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.

For now dropping this optimization avoids a miscompilation.

Patch by Björn Steinbrink.

llvm-svn: 214336
2014-07-30 21:04:00 +00:00
Tim Northover e2239ff3eb CodeGenPrep: fall back to MVT::Other if instruction's type isn't an EVT.
The test being performed is just an approximation anyway, so it really
shouldn't crash when things don't go entirely as expected.

Should fix PR20474.

llvm-svn: 214177
2014-07-29 10:20:22 +00:00
Hal Finkel f5867a79c5 Canonicalization for @llvm.assume
Adds simple logical canonicalization of assumption intrinsics to instcombine,
currently:
 - invariant(a && b) -> invariant(a); invariant(b)
 - invariant(!(a || b)) -> invariant(!a); invariant(!b)

llvm-svn: 213977
2014-07-25 21:45:17 +00:00
Hal Finkel 930469107d Add @llvm.assume, lowering, and some basic properties
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:

 - llvm.invariant(true) is dead.
 - llvm.invariant(false) is unreachable (this directly corresponds to the
   documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).

The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.

llvm-svn: 213973
2014-07-25 21:13:35 +00:00
Hal Finkel ff0bcb60c9 Convert noalias parameter attributes into noalias metadata during inlining
This functionality is currently turned off by default.

Part of the motivation for introducing scoped-noalias metadata is to enable the
preservation of noalias parameter attribute information after inlining.
Sometimes this can be inferred from the code in the caller after inlining, but
often we simply lose valuable information.

The overall process if fairly simple:
 1. Create a new unqiue scope domain.
 2. For each (used) noalias parameter, create a new alias scope.
 3. For each pointer, collect the underlying objects. Add a noalias scope for
    each noalias parameter from which we're not derived (and has not been
    captured prior to that point).
 4. Add an alias.scope for each noalias parameter from which we might be
    derived (or has been captured before that point).

Note that the capture checks apply only if one of the underlying objects is not
an identified function-local object.

llvm-svn: 213949
2014-07-25 15:50:08 +00:00
Mark Heffernan 8ec1474f7f After unrolling a loop with llvm.loop.unroll.count metadata (unroll factor
hint) the loop unroller replaces the llvm.loop.unroll.count metadata with
llvm.loop.unroll.disable metadata to prevent any subsequent unrolling
passes from unrolling more than the hint indicates.  This patch fixes
an issue where loop unrolling could be disabled for other loops as well which
share the same llvm.loop metadata.

llvm-svn: 213900
2014-07-24 22:36:40 +00:00
Manman Ren 29a2005596 Try to fix the bots again by moving test to X86 directory.
llvm-svn: 213884
2014-07-24 17:57:09 +00:00
Manman Ren a8bc9a4c36 Try to fix the bots. If this does not work, I am going to move it to X86 directory.
llvm-svn: 213880
2014-07-24 17:18:33 +00:00
Hal Finkel 9414665a3b Add scoped-noalias metadata
This commit adds scoped noalias metadata. The primary motivations for this
feature are:
  1. To preserve noalias function attribute information when inlining
  2. To provide the ability to model block-scope C99 restrict pointers

Neither of these two abilities are added here, only the necessary
infrastructure. In fact, there should be no change to existing functionality,
only the addition of new features. The logic that converts noalias function
parameters into this metadata during inlining will come in a follow-up commit.

What is added here is the ability to generally specify noalias memory-access
sets. Regarding the metadata, alias-analysis scopes are defined similar to TBAA
nodes:

!scope0 = metadata !{ metadata !"scope of foo()" }
!scope1 = metadata !{ metadata !"scope 1", metadata !scope0 }
!scope2 = metadata !{ metadata !"scope 2", metadata !scope0 }
!scope3 = metadata !{ metadata !"scope 2.1", metadata !scope2 }
!scope4 = metadata !{ metadata !"scope 2.2", metadata !scope2 }

Loads and stores can be tagged with an alias-analysis scope, and also, with a
noalias tag for a specific scope:

... = load %ptr1, !alias.scope !{ !scope1 }
... = load %ptr2, !alias.scope !{ !scope1, !scope2 }, !noalias !{ !scope1 }

When evaluating an aliasing query, if one of the instructions is associated
with an alias.scope id that is identical to the noalias scope associated with
the other instruction, or is a descendant (in the scope hierarchy) of the
noalias scope associated with the other instruction, then the two memory
accesses are assumed not to alias.

Note that is the first element of the scope metadata is a string, then it can
be combined accross functions and translation units. The string can be replaced
by a self-reference to create globally unqiue scope identifiers.

[Note: This overview is slightly stylized, since the metadata nodes really need
to just be numbers (!0 instead of !scope0), and the scope lists are also global
unnamed metadata.]

Existing noalias metadata in a callee is "cloned" for use by the inlined code.
This is necessary because the aliasing scopes are unique to each call site
(because of possible control dependencies on the aliasing properties). For
example, consider a function: foo(noalias a, noalias b) { *a = *b; } that gets
inlined into bar() { ... if (...) foo(a1, b1); ... if (...) foo(a2, b2); } --
now just because we know that a1 does not alias with b1 at the first call site,
and a2 does not alias with b2 at the second call site, we cannot let inlining
these functons have the metadata imply that a1 does not alias with b2.

llvm-svn: 213864
2014-07-24 14:25:39 +00:00
Manman Ren edc60376ed SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.

For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx

It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

rdar://17735071

llvm-svn: 213815
2014-07-23 23:13:23 +00:00
David Blaikie 8e9cfa5497 ArgPromo+DebugInfo: Handle updating debug info over multiple applications of argument promotion.
While the subprogram map cache used by Dead Argument Elimination works
there, I made a mistake when reusing it for Argument Promotion in
r212128 because ArgPromo may transform functions more than once whereas
DAE transforms each function only once, removing all the dead arguments
in one go.

To address this, ensure that the map is updated after each argument
promotion.

In retrospect it might be a little wasteful to create a map of all
subprograms when only handling a single CGSCC, but the alternative is
walking the debug info for each function in the CGSCC that gets updated.
It's not clear to me what the right tradeoff is there, but since the
current tradeoff seems to be working OK (and the code to keep things
updated is very cheap), let's stick with that for now.

llvm-svn: 213805
2014-07-23 22:09:29 +00:00
David Blaikie f997c6f90b Test debug info in arg promotion with an actual promotion case, rather than a degenerate arg promotion that's actually DAE performed by ArgPromo
Also the debug location I had here was bogus, describing the location of
the call site as in the callee - and unnecessary, so just drop it.

llvm-svn: 213803
2014-07-23 21:30:59 +00:00
Mark Heffernan 9e112443b6 Do not add unroll disable metadata after unrolling pass for loops with #pragma clang loop unroll(full).
llvm-svn: 213789
2014-07-23 20:05:44 +00:00
Mark Heffernan e6b4ba1c41 In unroll pragma syntax and loop hint metadata, change "enable" forms to a new form using the string "full".
llvm-svn: 213772
2014-07-23 17:31:37 +00:00