Commit Graph

2103 Commits

Author SHA1 Message Date
Sanjay Patel 980b280f50 [LibCallSimplifier] fold memset(malloc(x), 0, x) --> calloc(1, x)
This is a step towards solving PR25892:
https://llvm.org/bugs/show_bug.cgi?id=25892

It won't handle the reported case. As noted by the 'TODO' comments in the patch, 
we need to relax the hasOneUse() constraint and also match patterns that include
memset_chk() and the llvm.memset() intrinsic in addition to memset().
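
As an illustrative sketch (hypothetical names, not taken from this patch's
tests), the fold targets a malloc whose only use is a zeroing memset of the
same size:

declare i8* @malloc(i64)
declare i8* @memset(i8*, i32, i64)

define i8* @zeroed_buffer(i64 %size) {
  %p = call i8* @malloc(i64 %size)
  %q = call i8* @memset(i8* %p, i32 0, i64 %size)
  ret i8* %q
}

; expected to shrink to something like:
;   %calloc = call i8* @calloc(i64 1, i64 %size)
;   ret i8* %calloc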

Differential Revision: http://reviews.llvm.org/D16337

llvm-svn: 258816
2016-01-26 16:17:24 +00:00
Matt Arsenault bef34e21c7 AMDGPU: Rename intrinsics to use amdgcn prefix
The intrinsic target prefix should match the target name
as it appears in the triple.

This is not yet complete, but gets most of the important ones.
llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled
for compatibility for now.

llvm-svn: 258557
2016-01-22 21:30:34 +00:00
Sanjay Patel fcc7c1a0ba [LibCallSimplifier] don't get fooled by a fake fmin()
This is similar to the bug/fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
http://reviews.llvm.org/rL258325

The fmin() test case reveals another bug caused by sloppy
code duplication. It will crash without this patch because
fp128 is a valid floating-point type, but we would think
that we had matched a function that used doubles.
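
As a hedged illustration (not the committed test case), a "fake" fmin is a
declaration that reuses the libcall name but not the double-based prototype:

declare fp128 @fmin(fp128, fp128)

define fp128 @not_the_libcall(fp128 %a, fp128 %b) {
  ; fp128 is a legal FP type, but this is not the double (double, double)
  ; libcall signature, so the simplifier must not treat it as the real fmin()
  %r = call fp128 @fmin(fp128 %a, fp128 %b)
  ret fp128 %r
}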

The new helper function can be used to replace similar
checks that are used in several other places in this file.

llvm-svn: 258428
2016-01-21 20:19:54 +00:00
Sanjay Patel bd2dc67142 [LibCallSimplifier] don't get fooled by a fake sqrt()
The test case will crash without this patch because the subsequent call to
hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator
(ie, returns an FP type).

This part of the function signature check was omitted for the sqrt() case, 
but seems to be in place for all other transforms.

Before:
http://reviews.llvm.org/rL257400
...we would have needlessly continued execution in optimizeSqrt(), but the
bug was harmless because we'd eventually fail some other check and return
without damage.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=26211

Differential Revision: http://reviews.llvm.org/D16198

llvm-svn: 258325
2016-01-20 17:41:14 +00:00
Sanjay Patel 582857c95c add tests to show missing memset/malloc optimizations (PR25892)
llvm-svn: 258218
2016-01-19 23:07:10 +00:00
Sanjay Patel d1f4f03f5e [LibCallSimplifier] use instruction-level fast-math-flags to shrink calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

llvm-svn: 258158
2016-01-19 18:38:52 +00:00
Sanjay Patel 81a63cd11f [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, [small integer]) calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

As with D15937, the intent of the patch is to preserve the current behavior of the transform
except that we use the pow call's 'fast' attribute as a trigger rather than a function-level
attribute.

The TODO comment notes a potential follow-on patch that would propagate FMF to the new
instructions.

Differential Revision: http://reviews.llvm.org/D16122

llvm-svn: 258153
2016-01-19 18:15:12 +00:00
Artur Pilipenko f84dc06e5b Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16226

llvm-svn: 258010
2016-01-17 12:35:29 +00:00
Silviu Baranga f29dfd36bb Re-commit r257064, after it was reverted in r257340.
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instructions directly
at the location of the users.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparison
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden from the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
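
For a rough idea of the new cases (an illustrative sketch, not a test from the
patch), one side below goes through an intermediate GEP, yet both sides are
offsets from the same %base:

define i1 @cmp_through_gep_of_gep(i32* %base, i64 %i) {
  %mid = getelementptr inbounds i32, i32* %base, i64 2
  %p   = getelementptr inbounds i32, i32* %mid, i64 %i
  %q   = getelementptr inbounds i32, i32* %base, i64 6
  %cmp = icmp eq i32* %p, %q
  ret i1 %cmp
}

; once both sides are rewritten against %base, the pointer compare can become
; an index compare, roughly:  %cmp = icmp eq i64 %i, 4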

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257897
2016-01-15 15:52:05 +00:00
James Molloy f01488e2bc [InstCombine] Rewrite bswap/bitreverse handling completely.
There are several requirements that ended up with this design;
  1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
  2. Bitreversals and byteswaps are very related in their matching logic.
  3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
  4. Bswaps are best matched early in InstCombine.

The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.

We can then extend the matching logic in one place only.

llvm-svn: 257875
2016-01-15 09:20:19 +00:00
Sanjay Patel 53ba88dbb0 [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, 0.5) calls
Also, propagate the FMF to the newly created sqrt() call.

llvm-svn: 257503
2016-01-12 19:06:35 +00:00
Sanjay Patel 6002e78a06 [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(exp(x)) calls
See also:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404
http://reviews.llvm.org/rL257414

llvm-svn: 257491
2016-01-12 17:30:37 +00:00
Sanjay Patel dc19b8de76 consolidate exp/exp2 tests
The transform is identical, so keep the tests together and save some overhead.

llvm-svn: 257484
2016-01-12 17:00:38 +00:00
Sanjay Patel 71e550fefb Add/edit tests to include instruction-level FMF on calls
Preparatory patch before changing LibCallSimplifier to use the FMF.
Also, tighten the CHECK lines and give the tests more meaningful names.
Similar changes to:
http://reviews.llvm.org/rL257414

llvm-svn: 257481
2016-01-12 16:50:17 +00:00
Sanjay Patel e896ede7f1 [LibCallSimplifier] use instruction-level fast-math-flags to transform log calls
Also, add tests to verify that we're checking 'fast' on both calls of each transform pair,
tighten the CHECK lines, and give the tests more meaningful names.

This is a continuation of:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404

llvm-svn: 257414
2016-01-11 23:31:48 +00:00
Sanjay Patel 6c1ddbb7b6 [LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe
Fix the FIXME added with:
http://reviews.llvm.org/rL257400

llvm-svn: 257404
2016-01-11 22:50:36 +00:00
Sanjay Patel 683f29735f [LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.

But this raises a bug noted by the new FIXME comment.

In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)

...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. 
If any of those ops is strict, we should bail out.
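
A minimal sketch of the guarded pattern (illustrative only, not the committed
test): all three ops carry 'fast' here, so the transform may fire; if any of
them lost the flag, the code should now bail out:

declare double @sqrt(double)

define double @sqrt_of_square(double %x, double %y) {
  %xx = fmul fast double %x, %x
  %m  = fmul fast double %xx, %y
  %r  = call fast double @sqrt(double %m)
  ret double %r
}

; may become roughly:
;   %a = call double @llvm.fabs.f64(double %x)
;   %s = call fast double @sqrt(double %y)
;   %r = fmul fast double %a, %s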

Differential Revision: http://reviews.llvm.org/D15937

llvm-svn: 257400
2016-01-11 22:34:19 +00:00
Silviu Baranga 603954ef0e Revert r257164 - it has caused spec2k6 failures in LTO mode
llvm-svn: 257340
2016-01-11 16:19:38 +00:00
Silviu Baranga 9e007efad2 Re-commit r257064, this time with a fixed assert
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparison
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden from the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257164
2016-01-08 11:11:04 +00:00
Sanjay Patel d72a458d28 [InstCombine] insert a new shuffle in a safe place (PR25999)
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

llvm-svn: 257133
2016-01-08 01:39:16 +00:00
David Majnemer 867bbc775f Add test for r256912
I forgot to add this with the rest of r256912.

llvm-svn: 257088
2016-01-07 19:27:16 +00:00
Silviu Baranga dd68d46ec1 Revert r257064. It caused failures in some sanitizer tests.
llvm-svn: 257069
2016-01-07 15:46:43 +00:00
Silviu Baranga 57b1b90996 [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparison
entirely.

InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden from the optimizer and this transformation
will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257064
2016-01-07 14:56:08 +00:00
Sanjay Patel cddcd7256c [LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform
llvm-svn: 256964
2016-01-06 19:23:35 +00:00
Sanjay Patel 29095ea1b0 [LibCallSimplfier] use instruction-level fast-math-flags for fmin/fmax transforms
llvm-svn: 256871
2016-01-05 20:46:19 +00:00
Sanjay Patel a1c5347982 [InstCombine] insert a new shuffle before its uses (PR26015)
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015

And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know 
how that should be implemented.

Differential Revision: http://reviews.llvm.org/D15878

llvm-svn: 256857
2016-01-05 19:09:47 +00:00
Chen Li c6021038f6 [InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.

Reviewers: reames, majnemer

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15859

llvm-svn: 256792
2016-01-04 23:28:57 +00:00
Sanjay Patel bee05caa6b [LibCallSimplifier] propagate FMF when shrinking binary calls
llvm-svn: 256682
2015-12-31 23:40:59 +00:00
Sanjay Patel aa23114cb4 [LibCallSimplifier] propagate FMF when shrinking unary calls
llvm-svn: 256679
2015-12-31 21:52:31 +00:00
Sanjay Patel f6f32bcaa4 change function names to avoid accidentally matching the substring
llvm-svn: 256678
2015-12-31 21:25:25 +00:00
Sanjay Patel 4e8b300400 add 'fast' attribute to calls to show that the flag isn't being propagated
llvm-svn: 256677
2015-12-31 21:12:19 +00:00
Chandler Carruth 3a040e6d47 [attrs] Extract the pure inference of function attributes into
a standalone pass.

There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.

In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.

The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.

Differential Revision: http://reviews.llvm.org/D15676

llvm-svn: 256466
2015-12-27 08:41:34 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM from merging different gc.statepoint nodes into PHI nodes and causing further problems with gc relocations. The patch also changes how gc.relocate and gc.result look for their corresponding gc.statepoint on the unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Sanjay Patel ae945e7927 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
David Majnemer 02f4787e45 [OperandBundles] Have InstCombine play nice with operand bundles
Don't assume a call's use corresponds to an argument operand, it might
correspond to a bundle operand.

llvm-svn: 256327
2015-12-23 09:58:41 +00:00
Philip Reames d7a6cc859a [InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.

Key points:
 * We only remove unordered or simple stores.
 * The loads producing values consumed by dead stores don't influence whether the store is dead.
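
A minimal sketch of a store the extended peephole can now remove (hypothetical
example, unordered atomics only):

define void @overwritten_store(i32* %p) {
  ; the first store is dead: it is overwritten below with no intervening read
  store atomic i32 0, i32* %p unordered, align 4
  store atomic i32 1, i32* %p unordered, align 4
  ret void
}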

Differential Revision: http://reviews.llvm.org/D15354

llvm-svn: 255932
2015-12-17 22:19:27 +00:00
Nicolai Hahnle 78fd4f087b AMDGPU: mark ldexp LibCalls as unavailable
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.

In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.

See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewers: arsenm, tstellarAMD

Differential Revision: http://reviews.llvm.org/D14990

llvm-svn: 255658
2015-12-15 17:24:15 +00:00
Mehdi Amini 1c131b37ed InstCombine: destructure loads of structs that do not contain padding
For non-padded structs, we can just proceed and deaggregate them.
We don't want to do this when there is padding in the struct, so as not to
lose information about this padding (subsequent passes would then
try hard to preserve the padding, which is undesirable).
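
A sketch of the non-padded case (illustrative, not one of the updated tests):
a { i32, i32 } has no padding, so its aggregate load can be split up:

%pair = type { i32, i32 }

define i32 @sum_fields(%pair* %p) {
  %agg = load %pair, %pair* %p
  %a   = extractvalue %pair %agg, 0
  %b   = extractvalue %pair %agg, 1
  %sum = add i32 %a, %b
  ret i32 %sum
}

; the aggregate load may be rewritten as per-field i32 loads feeding the add;
; a struct with padding would be left alone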

Also update extractvalue.ll and cast.ll so that they use structs with padding.

Remove the FIXME in the extractvalue-of-load case, as the non-padded case is
handled when processing the load, and we don't want to do it on the padded
case.

Patch by: Amaury SECHET <deadalnix@gmail.com>

Differential Revision: http://reviews.llvm.org/D14483

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
2015-12-15 01:44:07 +00:00
Sanjay Patel fa54acedd1 add fast-math-flags to 'call' instructions (PR21290)
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add 
support to clang, and extend FMF to the DAG for calls.

Motivating example:

%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)

We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:

%z = tail call fast float @sqrtf(float %y)

because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.

The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368

and added FMF to fcmp:
http://reviews.llvm.org/rL241901

Differential Revision: http://reviews.llvm.org/D14707

llvm-svn: 255555
2015-12-14 21:59:03 +00:00
Sanjay Patel f727e387be [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
This is a fix for PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The idea is to take the existing fold of:
bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
( http://reviews.llvm.org/rL112232 )

And break it into less specific transforms so we'll catch more cases such as
the example in the bug report:
bitcast ( trunc ( lshr ( bitcast X))) -->
bitcast ( extractelement (bitcast X)) -->
extractelement (bitcast X)
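
For orientation, a hedged sketch of the chain being generalized (little-endian
layout assumed; not a test from this patch):

define float @second_element(<2 x float> %v) {
  %b  = bitcast <2 x float> %v to i64
  %hi = lshr i64 %b, 32
  %tr = trunc i64 %hi to i32
  %f  = bitcast i32 %tr to float
  ret float %f
}

; on a little-endian target this should reduce to:
;   %f = extractelement <2 x float> %v, i32 1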

Enabling patches for this change:
http://reviews.llvm.org/rL255399 (combine bitcasts)
http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))

Differential Revision: http://reviews.llvm.org/D15392

llvm-svn: 255504
2015-12-14 16:16:54 +00:00
Sanjay Patel 1d49fc9b27 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general,
specifically because of x86_mmx: it's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticeable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of its output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Sanjay Patel 93f55dd36d [InstCombine] allow any pair of bitcasts to be combined
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.

It should be safe to convert any pair of bitcasts into a single bitcast;
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm 
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases. 

Some day we'll get to remove that MMX wart from LLVM IR completely?
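
A minimal sketch of the combine itself (illustrative types):

define <4 x i32> @double_bitcast(<2 x i64> %v) {
  %a = bitcast <2 x i64> %v to <8 x i16>
  %b = bitcast <8 x i16> %a to <4 x i32>
  ret <4 x i32> %b
}

; the pair collapses to a single cast:  %b = bitcast <2 x i64> %v to <4 x i32>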

Differential Revision: http://reviews.llvm.org/D15468

llvm-svn: 255399
2015-12-12 00:33:36 +00:00
Sanjay Patel ffde9e14a2 use FileCheck for better checking
llvm-svn: 255394
2015-12-12 00:01:10 +00:00
Sanjay Patel d497ad43da Add tests for bitcast-bitcast sequences for all scalar/vector permutations
As noted in http://reviews.llvm.org/D15392 , we should be able to improve this.

llvm-svn: 255370
2015-12-11 20:26:30 +00:00
James Molloy 37b82e79b2 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
Sanjay Patel c83fd9554a [InstCombine] fold bitcasts around an extractelement (3rd try)
This is a redo of r255137 (reverted at r255227) which was a redo of 
r255124 (reverted at r255126) with a fixed check for a scalar source 
type and an added test for the failure that caused the revert.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255261
2015-12-10 17:09:28 +00:00
Akira Hatanaka a3c0e8e1ba Revert r255137.
This commit broke apple's internal bot.

llvm-svn: 255227
2015-12-10 08:00:52 +00:00
Sanjay Patel b67e6b6044 [InstCombine] fold bitcasts around an extractelement (2nd try)
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255137
2015-12-09 18:57:16 +00:00
Mehdi Amini 4e2b7c454c Revert "[InstCombine] fold bitcasts around an extractelement"
This reverts commit r255124.

Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255126
2015-12-09 16:31:39 +00:00
Sanjay Patel 07410ed234 [InstCombine] fold bitcasts around an extractelement
Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255124
2015-12-09 16:17:20 +00:00
Sanjoy Das 9fe86d90ab [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.

Fixes PR25745.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15249

llvm-svn: 254869
2015-12-05 23:44:22 +00:00
Weiming Zhao 8213072a45 [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant
Summary:
    In order to avoid calling the pow function, we generate repeated fmul when n is a
    positive or negative whole number.
    
    For each exponent we pre-compute Addition Chains in order to minimize the no.
    of fmuls.
    Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
    
    We pre-compute addition chains for exponents up to 32 (which results in a max of
    7 fmuls).

    For eg:
    4 = 2+2
    5 = 2+3
    6 = 3+3 and so on
    
    Hence,
    pow(x, 4.0) ==> y = fmul x, x
                    x = fmul y, y
                    ret x

    For negative exponents, we simply compute the reciprocal of the final result.
    
    Note: This transformation is only enabled under fast-math.
    
    Patch by Mandeep Singh Grang <mgrang@codeaurora.org>

Reviewers: weimingz, majnemer, escha, davide, scanon, joerg

Subscribers: probinson, escha, llvm-commits

Differential Revision: http://reviews.llvm.org/D13994

llvm-svn: 254776
2015-12-04 22:00:47 +00:00
David Majnemer f6665f65b7 [Analysis] Become aware of MSVC's new/delete functions
The compiler can take advantage of the allocation/deallocation
function's properties.  We knew how to do this for Itanium but had no
support for MSVC-style functions.

llvm-svn: 254656
2015-12-03 22:45:19 +00:00
David Majnemer 942003acc6 Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2.
Differential Revision: http://reviews.llvm.org/D14223

Patch by Amaury SECHET!
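
A worked instance with hypothetical constants where C1 ^ C2 is a power of 2:

define i1 @is_12_or_14(i8 %a) {
  %c1 = icmp eq i8 %a, 12
  %c2 = icmp eq i8 %a, 14
  %or = or i1 %c1, %c2
  ret i1 %or
}

; 12 ^ 14 = 2, a power of 2, so this can become:
;   %m = and i8 %a, -3        ; -3 is ~(12 ^ 14)
;   %r = icmp eq i8 %m, 12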

llvm-svn: 254518
2015-12-02 16:15:07 +00:00
Sanjay Patel 8b1fb3daba [InstCombine] add tests to show potential vector IR shuffle transforms
llvm-svn: 254342
2015-11-30 22:39:36 +00:00
Davide Italiano 9c26161b2e [SimplifyLibCalls] Remove useless bits of this test.
llvm-svn: 254318
2015-11-30 19:38:35 +00:00
Davide Italiano 1aeed6a955 [SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math.
llvm-svn: 254317
2015-11-30 19:36:35 +00:00
Davide Italiano 0b14f29285 [SimplifyLibCalls] Don't crash if the function doesn't have a name.
llvm-svn: 254265
2015-11-29 21:58:56 +00:00
Davide Italiano b8b7133c94 [SimplifyLibCalls] Transform log(pow(x, y)) -> y*log(x).
This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:

x = -1, y = 4
log(pow(-1, 4)) = 0
4*log(-1) = NaN

I checked what GCC does and apparently they do the same optimization
(which results in the dramatic difference). Future work might try to
make this (slightly) less worse.

Differential Revision:	http://reviews.llvm.org/D14400

llvm-svn: 254263
2015-11-29 20:58:04 +00:00
Benjamin Kramer fb419e71f4 [SimplifyLibCalls] Don't depend on a called function having a name, it might be an indirect call.
Fixes the crasher in PR25651 and related crashers using the same pattern.

llvm-svn: 254145
2015-11-26 09:51:17 +00:00
Sanjoy Das 7629346193 [InstCombine] Don't drop operand bundles
Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14857

llvm-svn: 254046
2015-11-25 00:42:19 +00:00
Sanjay Patel 968e91aea0 [InstCombine] fix propagation of fast-math-flags
Noticed while working on D4583:
http://reviews.llvm.org/D4583

llvm-svn: 253997
2015-11-24 17:51:20 +00:00
Rafael Espindola d1beb07d39 Have a single way for creating unique value names.
We had two code paths. One would create names like "foo.1" and the other
names like "foo1".

For globals it is important to use "foo.1" to help C++ name demangling.
For locals there is no strong reason to go one way or the other so I
kept the most common mangling (foo1).

llvm-svn: 253804
2015-11-22 00:16:24 +00:00
Sanjay Patel 42afa272ed move a single test case to where most other instcombine shuffle bug test cases exist
llvm-svn: 253784
2015-11-21 16:12:58 +00:00
Sanjay Patel c4aa50414b [InstCombine] add tests to show missing trunc optimizations
llvm-svn: 253609
2015-11-19 22:11:52 +00:00
Sanjay Patel f1c2370c48 [InstCombine] add tests to show missing bitcast optimizations
llvm-svn: 253602
2015-11-19 21:32:25 +00:00
Pete Cooper 67cf9a723b Revert "Change memcpy/memset/memmove to have dest and source alignments."
This reverts commit r253511.

This likely broke the bots in
http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/20202
http://bb.pgr.jp/builders/clang-3stage-i686-linux/builds/3787

llvm-svn: 253543
2015-11-19 05:56:52 +00:00
Davide Italiano c5cedd195a [SimplifyLibCalls] New trick: pow(x, 0.5) -> sqrt(x) under -ffast-math.
Differential Revision:	http://reviews.llvm.org/D14466

llvm-svn: 253521
2015-11-18 23:21:32 +00:00
Pete Cooper 72bc23ef02 Change memcpy/memset/memmove to have dest and source alignments.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html

These intrinsics currently have an explicit alignment argument which is
required to be a constant integer.  It represents the alignment of the
source and dest, and so must be the minimum of those.

This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments.  The alignment
argument itself is removed.

There are a few places in the code that need to be
checked by an expert as to whether using only src/dest alignment is
safe.  For those places, they currently take the minimum of src/dest
alignments which matches the current behaviour.

For example, code which used to read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 500, i32 8, i1 false)
will now read:
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 %dest, i8* align 8 %src, i32 500, i1 false)

For out of tree owners, I was able to strip alignment from calls using sed by replacing:
  (call.*llvm\.memset.*)i32\ [0-9]*\,\ i1 false\)
with:
  $1i1 false)

and similarly for memmove and memcpy.

I then added back in alignment to test cases which needed it.

A similar commit will be made to clang which actually has many differences in alignment as now
IRBuilder can generate different source/dest alignments on calls.

In IRBuilder itself, a new argument was added.  Instead of calling:
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, /* isVolatile */ false)
you now call
  CreateMemCpy(Dst, Src, getInt64(Size), DstAlign, SrcAlign, /* isVolatile */ false)

There is a temporary class (IntegerAlignment) which takes the source alignment and rejects
implicit conversion from bool.  This is to prevent isVolatile here from passing its default
parameter to the source alignment.

Note, changes in future can now be made to codegen.  I didn't change anything here, but this
change should enable better memcpy code sequences.

Reviewed by Hal Finkel.

llvm-svn: 253511
2015-11-18 22:17:24 +00:00
Andrew Kaylor de642cef2c [EH] Keep filter clauses for types that have been caught.
The instruction combiner previously removed types from filter clauses in Landing Pad instructions if the type had previously been seen in a catch clause.  This is incorrect and prevents unexpected exception handlers from rethrowing the caught type.

Differential Revision: http://reviews.llvm.org/D14669

llvm-svn: 253370
2015-11-17 20:13:04 +00:00
Elena Demikhovsky 121d49b640 Fixed GEP visitor in the InstCombine pass.
The current implementation of the GEP visitor in InstCombine fails with an assertion on a vector GEP with a mix of scalar and vector types, like this:

getelementptr double, double* %a, <8 x i32> %i
(It fails to create a "sext" from <8 x i32> to <8 x i64>)

I fixed it and added some tests.

Differential Revision: http://reviews.llvm.org/D14485

llvm-svn: 253162
2015-11-15 08:19:35 +00:00
James Molloy 2d09c00b91 [InstCombine] Add trivial folding (bitreverse (bitreverse x)) -> x
There are plenty more instcombines we could probably do with bitreverse, but this seems like a very obvious and trivial starting point and was brought up by Hal in his review.
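
The fold in IR terms (a trivial sketch):

declare i8 @llvm.bitreverse.i8(i8)

define i8 @double_reverse(i8 %x) {
  %r1 = call i8 @llvm.bitreverse.i8(i8 %x)
  %r2 = call i8 @llvm.bitreverse.i8(i8 %r1)
  ret i8 %r2
}

; folds to:  ret i8 %x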

llvm-svn: 252879
2015-11-12 12:39:41 +00:00
David Majnemer eafa28a0d9 [InstCombine] Teach FoldPHIArgZextsIntoPHI about EHPads
FoldPHIArgZextsIntoPHI cannot insert an instruction after the PHI if
there is an EHPad in the BB.  Doing so would result in an instruction
inserted after a terminator.

llvm-svn: 252377
2015-11-07 00:52:53 +00:00
David Majnemer 27f2447fb3 [InstCombine] Don't insert an instruction after a terminator
We tried to insert a cast of a phi in a block whose terminator is an
EHPad.  This is invalid.  Do not attempt the transform in these
circumstances.

llvm-svn: 252370
2015-11-06 23:59:23 +00:00
David Majnemer 7204cff0a1 [InstCombine] Don't RAUW tokens with undef
Let SimplifyCFG remove unreachable BBs which define token instructions.

llvm-svn: 252343
2015-11-06 21:26:32 +00:00
Peter Collingbourne d4bff30370 DI: Reverse direction of subprogram -> function edge.
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.

For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.

This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.

Since this is an IR change, a bitcode upgrade has been provided.

Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.

Differential Revision: http://reviews.llvm.org/D14265

llvm-svn: 252219
2015-11-05 22:03:56 +00:00
Davide Italiano 51507d2ad8 [SimplifyLibCalls] New transformation: tan(atan(x)) -> x
This is enabled only under -ffast-math.
So, instead of emitting:
  4007b0:       50                      push   %rax
  4007b1:       e8 8a fd ff ff          callq  400540 <atanf@plt>
  4007b6:       58                      pop    %rax
  4007b7:       e9 94 fd ff ff          jmpq   400550 <tanf@plt>
  4007bc:       0f 1f 40 00             nopl   0x0(%rax)

for:
float mytan(float x) {
  return tanf(atanf(x));
}
we emit a single retq.

Differential Revision:	 http://reviews.llvm.org/D14302

llvm-svn: 252098
2015-11-04 23:36:56 +00:00
Davide Italiano c8a7913f23 [SimplifyLibCalls] Add a new transformation: pow(exp(x), y) -> exp(x*y)
This one is enabled only under -ffast-math (due to rounding/overflows)
but allows us to emit shorter code.

Before (on FreeBSD x86-64):
4007f0:       50                      push   %rax
4007f1:       f2 0f 11 0c 24          movsd  %xmm1,(%rsp)
4007f6:       e8 75 fd ff ff          callq  400570 <exp2@plt>
4007fb:       f2 0f 10 0c 24          movsd  (%rsp),%xmm1
400800:       58                      pop    %rax
400801:       e9 7a fd ff ff          jmpq   400580 <pow@plt>
400806:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
40080d:       00 00 00

After:
4007b0:       f2 0f 59 c1             mulsd  %xmm1,%xmm0
4007b4:       e9 87 fd ff ff          jmpq   400540 <exp2@plt>
4007b9:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)

Differential Revision:	http://reviews.llvm.org/D14045

llvm-svn: 251976
2015-11-03 20:32:23 +00:00
Tim Northover 89a6eefe6f TvOS: add missing support for some libcalls.
llvm-svn: 251811
2015-11-02 18:00:00 +00:00
Artur Pilipenko 5c5011d503 Preserve load alignment and dereferenceable metadata during some transformations
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D13953

llvm-svn: 251809
2015-11-02 17:53:51 +00:00
Davide Italiano 7f253aa63c [SimplifyLibCalls] Add test to ensure transform is not executed if fast-math
attribute is not present.

During my refactor in r251595 I changed the behavior of optimizeSqrt(),
skipping the transformation if the function wasn't marked with unsafe-fp-math
attribute. This fixed a bug, as confirmed by Sanjay (before the optimization
was silently executed anyway), although it wasn't my primary aim.
This commit adds a test to ensure the code doesn't break again.

Reported by: Marcello Maggioni
Discussed with: Sanjay Patel

llvm-svn: 251747
2015-10-31 20:59:32 +00:00
Silviu Baranga b892e35520 [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Hal Finkel f2199b2178 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.
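
The motivating expression, written out as IR (an illustrative sketch; the
dynamic shift amount can only be 0 or 1):

define i1 @low_bit_is_set(i32 %L) {
  %z   = icmp eq i32 %L, 0
  %amt = zext i1 %z to i32
  %s1  = lshr i32 2072, %amt
  %s2  = lshr i32 %s1, 7
  %bit = and i32 %s2, 1
  %r   = icmp ne i32 %bit, 0
  ret i1 %r
}

; 2072 is 0b100000011000; shifted right by 7 or by 8, bit 0 of the result is
; always 0, so %r can now be constant-folded to false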

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the test case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Michael Liao 446c714a76 [InstCombine] Revise the test case to match the full sequence
llvm-svn: 250950
2015-10-21 21:50:58 +00:00
Michael Liao c65d386b81 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

This case was not handled previously due to the complexity of the computation of %n.
Hence, LLVM could not swap the operands of the icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Simon Pilgrim 216b1bf5ed [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Philip Reames ddcf6b35a2 Tighten known bits for ctpop based on zero input bits
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

The basic idea here is that input bits that are known zero reduce the maximum count that the intrinsic could return. We know that the number of bits required to represent a particular count is at most log2(N)+1.
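
A small sketch of the idea (hypothetical example): at most three input bits
can be set, so the count fits in two bits:

declare i32 @llvm.ctpop.i32(i32)

define i32 @high_bits_of_count(i32 %x) {
  %masked = and i32 %x, 7                          ; at most 3 bits set
  %count  = call i32 @llvm.ctpop.i32(i32 %masked)  ; so %count is in [0, 3]
  %high   = and i32 %count, -4                     ; keep only bits >= 2
  ret i32 %high
}

; the known-zero high bits of %count let %high fold to 0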

Differential Revision: http://reviews.llvm.org/D13253

llvm-svn: 250338
2015-10-14 22:42:12 +00:00
Simon Pilgrim 3c2b30f8ba [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
Simon Pilgrim aa0ec7f45c [InstCombine] Tidied up SSE4A tests.
First stage of bugfix discussed in D13348

llvm-svn: 250121
2015-10-12 23:07:06 +00:00
Simon Pilgrim 1d1c56e2df [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Sanjay Patel f61a08fbf1 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.
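
A sketch of the pattern for single-precision floats (relying on assumption 2
above; not the committed test):

define float @clear_sign(float %x) {
  %bits = bitcast float %x to i32
  %abs  = and i32 %bits, 2147483647   ; 0x7FFFFFFF masks off the sign bit
  %res  = bitcast i32 %abs to float
  ret float %res
}

; expected to canonicalize to:  %res = call float @llvm.fabs.f32(float %x)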

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Sanjay Patel 9115cf8c9d [ValueTracking] teach computeKnownBits that a fabs() clears sign bits
This was requested in D13076: if we're going to canonicalize to fabs(), ValueTracking
should know that fabs() clears sign bits.

In this patch (as in D13076), we're not handling vectors yet even though computeKnownBits'
fabs() case itself should be vector-ready via the splat in this patch. 
Fixing this will require follow-on patches to correct other logic that uses 'getScalarType'.

Differential Revision: http://reviews.llvm.org/D13222

llvm-svn: 249701
2015-10-08 16:56:55 +00:00
Artur Pilipenko d94903c9f8 Teach computeKnownBits to use new align attribute/metadata
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D13470

llvm-svn: 249557
2015-10-07 16:01:18 +00:00
Hans Wennborg f1f36517b7 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
Philip Reames 675418ebc0 Extend known bits to understand @llvm.bswap
This is a cleaned up patch from the one written by John Regehr based on the findings of the Souper superoptimizer.

When writing tests, I was surprised to find that instsimplify apparently doesn't know how to collapse bit test sequences based purely on known bits. This required me to split my tests across both instsimplify and instcombine.

Differential Revision: http://reviews.llvm.org/D13250

llvm-svn: 249453
2015-10-06 20:20:45 +00:00
Andrea Di Biagio 40f59e4466 [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Bruno Cardoso Lopes b491a2d641 [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its uses after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Arnaud A. de Grandmaison 849f3bf8c9 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Andrea Di Biagio 0594e2a1e9 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Jeroen Ketema ab99b59e8c [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
Simon Pilgrim 43f5e0848e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when the vectors alias cleanly (i.e. the number of elements in one vector is an exact multiple of the number in the other).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Sanjay Patel 9533407566 [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, 
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Simon Pilgrim 91717ee233 [InstCombine] Removed unnecessary meta attributes.
llvm-svn: 248672
2015-09-26 17:49:04 +00:00
Chen Li 7452d95656 [Bug 24848] Use range metadata to constant fold comparisons between two values
Summary:
This is the second part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

If both operands of a comparison have range metadata, they should be used to constant fold the comparison.
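
As a rough illustration (not from the original commit; the function, ranges and values are made up), the kind of IR this enables folding might look like:

```
; Both loads carry !range metadata: %a is always in [0, 10) and %b in [20, 30),
; so the comparison can be constant folded to true.
define i1 @cmp_ranges(i32* %p, i32* %q) {
  %a = load i32, i32* %p, !range !0
  %b = load i32, i32* %q, !range !1
  %c = icmp slt i32 %a, %b
  ret i1 %c
}
!0 = !{i32 0, i32 10}
!1 = !{i32 20, i32 30}
```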

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13177

llvm-svn: 248650
2015-09-26 03:26:47 +00:00
Sanjay Patel e1b09caaaf [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Charlie Turner 2720593ab4 [InstCombine] Recognize another bswap idiom.
Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
    
is a bswap. Fixes PR23863.
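
For reference, a minimal sketch (hypothetical function name) of the canonical form InstCombine can now produce for that idiom:

```
declare i32 @llvm.bswap.i32(i32)

define i32 @bswap(i32 %x) {
  ; the whole shift/mask/or dance collapses to a single intrinsic call
  %swapped = call i32 @llvm.bswap.i32(i32 %x)
  ret i32 %swapped
}
```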

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

llvm-svn: 248482
2015-09-24 10:24:58 +00:00
Akira Hatanaka f6afd11538 [InstCombine] Preserve metadata when merging loads that are phi
arguments.

Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:

MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range

rdar://problem/17617709

Differential Revision: http://reviews.llvm.org/D12710

llvm-svn: 248419
2015-09-23 18:40:57 +00:00
Chen Li 5cd6deeae3 [Bug 24848] Use range metadata to constant fold comparisons with constant values
Summary:
This is the first part of fixing bug 24848 https://llvm.org/bugs/show_bug.cgi?id=24848.

When range metadata is provided, it should be used to constant fold comparisons with constant values.

Reviewers: sanjoy, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12988

llvm-svn: 248402
2015-09-23 17:58:44 +00:00
David Majnemer 47ce0b81b0 [InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.

This fixes PR24873.

llvm-svn: 248074
2015-09-19 00:48:31 +00:00
Simon Pilgrim 61116ddc7b [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64 bits (or less) of many of their input vector operands, and all of them produce results with undefined upper 64 bits.

Differential Revision: http://reviews.llvm.org/D12680

llvm-svn: 247934
2015-09-17 20:32:45 +00:00
Sanjoy Das e5f4889ba9 [InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`).  This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.
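
A minimal sketch (hypothetical function name) of the i64 pattern and the new fold:

```
define i1 @signum_slt_one(i64 %x) {
  %sh1 = ashr i64 %x, 63        ; -1 if x is negative, 0 otherwise
  %neg = sub i64 0, %x
  %sh2 = lshr i64 %neg, 63      ; 1 if x is positive, 0 otherwise
  %sig = or i64 %sh1, %sh2      ; signum(x)
  %cmp = icmp slt i64 %sig, 1
  ret i1 %cmp
}
; With the new rule, %cmp simplifies to: icmp slt i64 %x, 1
```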

Later, we can also consider optimizing:

  icmp slt signum(x), 0 --> icmp slt x, 0
  icmp sle signum(x), 1 --> true

etc.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12703

llvm-svn: 247846
2015-09-16 20:41:29 +00:00
Arch D. Robison 8ed0854f55 Broaden optimization of fcmp ([us]itofp x, constant) by instcombine.
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant.  The "so small" case includes negative zero.

Differential Revision: http://reviews.llvm.org/D11210

llvm-svn: 247708
2015-09-15 17:51:59 +00:00
Chen Li 0d043b52eb [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not  covered under llvm/test, but fixes should be pretty obvious. 

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247587
2015-09-14 18:10:43 +00:00
Simon Pilgrim 20c607b110 [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Differential Revision: http://reviews.llvm.org/D12731

llvm-svn: 247504
2015-09-12 13:39:53 +00:00
David Blaikie 2f40830dde [opaque pointer type] Add textual IR support for explicit type parameter for global aliases
update.py:
import fileinput
import sys
import re

alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
cast  = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
gep   = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

def conv(line):
  m = re.match(cast, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(gep, line)
  if m:
    return m.group(1) + " " + m.group(3) + ", " + m.group(2)
  m = re.match(plain, line)
  if m:
    return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
  return line

for line in sys.stdin:
  sys.stdout.write(conv(line))

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

llvm-svn: 247378
2015-09-11 03:22:04 +00:00
Mehdi Amini 2bd08527ff Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite"
This reverts commit r247356.

Breaks test/Transforms/InstCombine/pr8547.ll with:

Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
LLVM ERROR: Broken function found, compilation aborted!

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247371
2015-09-11 01:33:48 +00:00
Chen Li a29c612ddd [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247356
2015-09-10 23:04:49 +00:00
Chen Li 32a51416e5 [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12772

llvm-svn: 247353
2015-09-10 22:35:41 +00:00
Jakub Kuderski 58ea4eeb9e There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes the cast by performing the lshr on a smaller type. However, there is
currently no trunc(lshr (sext A), Cst) variant.
This patch adds such an optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.
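
A rough i8/i32 instance of the fold (hypothetical example, assuming the shift amount is smaller than the source bit width):

```
define i8 @before(i8 %a) {
  %ext   = sext i8 %a to i32
  %shift = lshr i32 %ext, 3
  %trunc = trunc i32 %shift to i8
  ret i8 %trunc
}
; After the transform this is simply:
;   %res = ashr i8 %a, 3
```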

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 247271
2015-09-10 11:31:20 +00:00
David Majnemer d34dbf07bd Revert trunc(lshr (sext A), Cst) to ashr A, Cst
This reverts commit r246997, it introduced a regression (PR24763).

llvm-svn: 247180
2015-09-09 20:20:08 +00:00
Benjamin Kramer c3c183554b Merge or combine tests and convert to FileCheck.
- Move tests only exercising instsimplify to instsimplify's apint-or.ll
- Actually test the CHECK lines in instsimplify's apint-or.ll
- Merge the remaining tests in apint-or1.ll and apint-or2.ll, use FileCheck

llvm-svn: 247045
2015-09-08 18:36:56 +00:00
Sanjay Patel 21a145c341 add tests for De Morgan instcombines based on PR22723
llvm-svn: 247040
2015-09-08 18:13:03 +00:00
Sanjay Patel 35f673ef08 fix typos, remove noise; NFCI
llvm-svn: 247035
2015-09-08 17:58:22 +00:00
Jakub Kuderski 7cd4810021 There is a trunc(lshr (zext A), Cst) optimization in InstCombineCasts that
removes the cast by performing the lshr on a smaller type. However, there is
currently no trunc(lshr (sext A), Cst) variant.
This patch adds such an optimization by transforming trunc(lshr (sext A), Cst)
to ashr A, Cst.

Differential Revision: http://reviews.llvm.org/D12520

llvm-svn: 246997
2015-09-08 10:03:17 +00:00
Sanjay Patel de28573e49 add missing regression tests for De Morgan's Law transform in InstCombine
llvm-svn: 246973
2015-09-07 19:00:38 +00:00
David Majnemer 135ca40a7d [InstCombine] Don't divide by zero when evaluating a potential transform
Trivial multiplication by zero may survive the worklist.  We tried to
reassociate the multiplication with a division instruction, causing us
to divide by zero; bail out instead.

This fixes PR24726.

llvm-svn: 246939
2015-09-06 06:49:59 +00:00
David Majnemer daa24b9789 [InstCombine] Don't assume m_Mul gives back an Instruction
This fixes PR24713.

llvm-svn: 246933
2015-09-05 20:44:56 +00:00
Duncan P. N. Exon Smith 814b8e91c7 DI: Require subprogram definitions to be distinct
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'.  Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.

While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore.  Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node.  The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.

I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

Likely some variant of this would work for out-of-tree testcases.

llvm-svn: 246327
2015-08-28 20:26:49 +00:00
Sanjoy Das 6f5dca70ed [InstCombine] Fix PR24605.
PR24605 is caused due to an incorrect insert point in instcombine's IR
builder.  When simplifying

  %t = add X Y
  ...
  %m = icmp ... %t

the replacement for %t should be placed before %t, not before %m, as
there could be a use of %t between %t and %m.

llvm-svn: 246315
2015-08-28 19:09:31 +00:00
Chad Rosier dc65532fd9 Optimize memcmp(x,y,n)==0 for small n and suitably aligned x/y.
http://reviews.llvm.org/D6952
PR20673

llvm-svn: 246313
2015-08-28 18:30:18 +00:00
Pete Cooper 6b716218fa isKnownNonNull needs to consider globals in non-zero address spaces.
Globals in address spaces other than zero may have 0 as a valid address,
so we should not assume that they can be null.

Reviewed by Philip Reames.

llvm-svn: 246137
2015-08-27 03:16:29 +00:00
Sanjoy Das c86c162a58 Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
The original checkin was buggy, this change has a fix.

Original commit message:

[InstCombine] Transform A & (L - 1) u< L --> L != 0

Summary:

This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potential payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.
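
As a rough sketch (hypothetical function names), the pattern before and after the transform:

```
define i1 @before(i32 %A, i32 %L) {
  %sub = add i32 %L, -1
  %and = and i32 %A, %sub
  %cmp = icmp ult i32 %and, %L
  ret i1 %cmp
}

; After the transform the compare no longer depends on %A:
define i1 @after(i32 %A, i32 %L) {
  %cmp = icmp ne i32 %L, 0
  ret i1 %cmp
}
```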

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245753
2015-08-21 22:22:37 +00:00
Simon Pilgrim 76b91d9084 Line endings fix.
llvm-svn: 245736
2015-08-21 21:09:51 +00:00
NAKAMURA Takumi 6a6232818d Revert r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0"
It caused miscompilation in clang.

llvm-svn: 245678
2015-08-21 07:46:07 +00:00
Sanjoy Das e472d8a57a [InstCombine] Transform A & (L - 1) u< L --> L != 0
Summary:
This transform is never a pessimization at the IR level (since it
replaces an `icmp` with another), and has potential payoffs:

 1. It may make the `icmp` fold away or become loop invariant.
 2. It may make the `A & (L - 1)` computation dead.

This shows up in Java, in range checks generated by array accesses of
the form `a[i & (a.length - 1)]`.

Reviewers: reames, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12210

llvm-svn: 245635
2015-08-20 22:31:55 +00:00
Balaram Makam ccf59731e3 Optimize bitwise even/odd test (-x&1 -> x&1) to not use negation.
Summary: We know that -x & 1 is equivalent to x & 1, so avoid using negation when testing whether a negative integer is even or odd.
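
A minimal sketch (hypothetical function name) of the test before and after:

```
define i32 @is_odd(i32 %x) {
  %neg = sub i32 0, %x
  %and = and i32 %neg, 1     ; -x & 1
  ret i32 %and
}
; The fold rewrites this to use %x directly:
;   %and = and i32 %x, 1
```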

Reviewers: majnemer

Subscribers: junbuml, mssimpso, gberry, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D12156

llvm-svn: 245569
2015-08-20 15:35:00 +00:00
David Majnemer 8ed559ad22 Revert "[InstCombinePHI] Partial simplification of identity operations."
This reverts commit r244887, it caused PR24470.

llvm-svn: 245194
2015-08-17 03:11:26 +00:00
Sanjay Patel 57fd1dc5db transform fmin/fmax calls when possible (PR24314)
If we can ignore NaNs, fmin/fmax libcalls can become compare and select
(this is what we turn std::min / std::max into).

This IR should then be optimized in the backend to whatever is best for
any given target. Eg, x86 can use minss/maxss instructions.
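
Very roughly (hypothetical IR, not taken from the patch), the shape of the rewrite for fmin:

```
declare double @fmin(double, double)

; When NaNs can be ignored,
;   %r = call double @fmin(double %a, double %b)
; can become a compare and select:
define double @fmin_as_select(double %a, double %b) {
  %cmp = fcmp olt double %a, %b
  %r = select i1 %cmp, double %a, double %b
  ret double %r
}
```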

This should solve PR24314:
https://llvm.org/bugs/show_bug.cgi?id=24314

Differential Revision: http://reviews.llvm.org/D11866

llvm-svn: 245187
2015-08-16 20:18:19 +00:00
David Majnemer dfa3b09541 [InstCombine] Replace an and+icmp with a trunc+icmp
Bitwise arithmetic can obscure a simple sign-test. Replacing the
mask with a truncate is preferable if the type is legal because it
permits us to rephrase the comparison more explicitly.
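
A small sketch (hypothetical example) of the kind of rewrite:

```
; Sign test hidden behind a mask: "is bit 7 of %x set?"
define i1 @before(i32 %x) {
  %and = and i32 %x, 128
  %cmp = icmp ne i32 %and, 0
  ret i1 %cmp
}

; Expressed as a truncate plus an explicit sign test on i8:
define i1 @after(i32 %x) {
  %t = trunc i32 %x to i8
  %cmp = icmp slt i8 %t, 0
  ret i1 %cmp
}
```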

llvm-svn: 245171
2015-08-16 07:09:17 +00:00
Nick Lewycky 8075fd22b9 Fix a crash where a utility function wasn't aware of fcmp vectors and created a value with the wrong type. Fixes PR24458!
llvm-svn: 245119
2015-08-14 22:46:49 +00:00
Davide Italiano a195386ca1 [SimplifyLibCalls] Correctly set the is_zero_undef flag for llvm.cttz
If <src> is non-zero we can safely set the flag to true, and this
results in less code generated for, e.g. ffs(x) + 1 on FreeBSD.
Thanks to majnemer for suggesting the fix and reviewing.

Code generated before the patch was applied:


 0:   0f bc c7                bsf    %edi,%eax
 3:   b9 20 00 00 00          mov    $0x20,%ecx
 8:   0f 45 c8                cmovne %eax,%ecx
 b:   83 c1 02                add    $0x2,%ecx
 e:   b8 01 00 00 00          mov    $0x1,%eax
13:   85 ff                   test   %edi,%edi
15:   0f 45 c1                cmovne %ecx,%eax
18:   c3                      retq

Code generated after the patch was applied:

 0:   0f bc cf                bsf    %edi,%ecx
 3:   83 c1 02                add    $0x2,%ecx
 6:   85 ff                   test   %edi,%edi
 8:   b8 01 00 00 00          mov    $0x1,%eax
 d:   0f 45 c1                cmovne %ecx,%eax
10:   c3                      retq

It seems we can still use cmove and save another 'test' instruction, but
that can be tackled separately.

Differential Revision:  http://reviews.llvm.org/D11989	

llvm-svn: 244947
2015-08-13 20:34:26 +00:00
Charlie Turner 6153698f26 [InstCombinePHI] Partial simplification of identity operations.
Consider this code:

BB:
  %i = phi i32 [ 0, %if.then ], [ %c, %if.else ]
  %add = add nsw i32 %i, %b
  ...

In this common case the add can be moved to the %if.else basic block, because
adding zero is an identity operation. If we go through the %if.then branch it's
always a win, because the add is not executed; if not, the number of instructions
stays the same.

This pattern also applies to other instructions like sub, shl, shr, ashr | 0,
mul, sdiv, div | 1.

Patch by Jakub Kuderski!

llvm-svn: 244887
2015-08-13 12:38:58 +00:00
Simon Pilgrim becd5e8abd [InstCombine] SSE/AVX vector shifts demanded shift amount bits
Most SSE/AVX (non-constant) vector shift instructions only use the lower 64-bits of the 128-bit shift amount vector operand, this patch calls SimplifyDemandedVectorElts to optimize for this.

I had to refactor some of my recent InstCombiner work on the vector shifts to avoid quite a bit of duplicate code; this means that SimplifyX86immshift now (re)decodes the type of shift.

Differential Revision: http://reviews.llvm.org/D11938

llvm-svn: 244872
2015-08-13 07:39:03 +00:00
Simon Pilgrim 8c049d5c03 [InstCombine] Move SSE/AVX vector blend folding to instcombiner
As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).

InstCombiner already had partial support for this, I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).

I also moved all the relevant combine tests into InstCombine/blend_x86.ll

Differential Revision: http://reviews.llvm.org/D11934

llvm-svn: 244723
2015-08-12 08:08:56 +00:00
Sanjoy Das 827529e7a0 Fix PR24354.
`InstCombiner::OptimizeOverflowCheck` was asserting an
invariant (operands to binary operations are ordered by decreasing
complexity) that wasn't really an invariant.  Fix this by instead having
`InstCombiner::OptimizeOverflowCheck` establish the invariant if it does
not hold.

llvm-svn: 244676
2015-08-11 21:33:55 +00:00
Mehdi Amini b10555cc61 Fix InstCombine test: invalid CHECK line slipped in r231270
I incorrectly wrote CHECK-NEXT without following it with ':', so the check was
ignored by FileCheck.
The non-inbounds GEP is folded here because the DataLayout is no longer
optional; the fold was originally guarded with a comment that said:
    We need TD information to know the pointer size unless this is inbounds.
Now we always have "TD information" and perform the fold.

Thanks Jonathan Roelofs for noticing.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 244613
2015-08-11 15:31:17 +00:00
James Molloy 134bec2722 Add support for floating-point minnum and maxnum
The select pattern recognition in ValueTracking (as used by InstCombine
and SelectionDAGBuilder) only knew about integer patterns. This teaches
it about minimum and maximum operations.

matchSelectPattern() has been extended to return a struct containing the
existing Flavor and a new enum defining the pattern's behavior when
given one NaN operand.

C minnum() is defined to return the non-NaN operand in this case, but
the idiomatic C "a < b ? a : b" would return the NaN operand.

ARM and AArch64 at least have different instructions for these different cases.

llvm-svn: 244580
2015-08-11 09:12:57 +00:00
Simon Pilgrim a3a72b41de [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner
As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.

Differential Revision: http://reviews.llvm.org/D11886

llvm-svn: 244495
2015-08-10 20:21:15 +00:00
Jonathan Roelofs f45295c366 Fix a few more cases of 'CHECK[^:]*$'. NFCI
llvm-svn: 244491
2015-08-10 19:56:39 +00:00
Jonathan Roelofs 49e46ce8e2 Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI
I looked into adding a warning / error for this to FileCheck, but there doesn't
seem to be a good way to avoid it triggering on the instances of it in RUN lines.

llvm-svn: 244481
2015-08-10 19:01:27 +00:00
Simon Pilgrim 3815c16bf8 [InstCombine] Fix SSE2/AVX2 vector logical shift by constant
This patch fixes the sse2/avx2 vector shift by constant instcombine call to correctly deal with the fact that the shift amount is formed from the entire lower 64-bit and not just the lowest element as it currently assumes.

e.g.

%1 = tail call <4 x i32> @llvm.x86.sse2.psrl.d(<4 x i32> %v, <4 x i32> <i32 15, i32 15, i32 15, i32 15>)

In this case, (V)PSRLD doesn't perform a lshr by 15 but in fact attempts to shift by 64424509455 ((15 << 32) | 15) - giving a zero result.

In addition, this review also recognizes shift-by-zero from a ConstantAggregateZero type (PR23821).

Differential Revision: http://reviews.llvm.org/D11760

llvm-svn: 244341
2015-08-07 18:22:50 +00:00
Simon Pilgrim 42c611b9ae [InstCombine] Added more specific SSE2/AVX2 vector shift tests.
llvm-svn: 244022
2015-08-05 08:21:38 +00:00
Simon Pilgrim d19b9d8229 [InstCombine] Split off SSE2/AVX2 vector shift tests.
These aren't vector demanded bits tests. More tests to follow.

llvm-svn: 243963
2015-08-04 08:05:27 +00:00
Duncan P. N. Exon Smith 55ca964e94 DI: Disallow uniquable DICompileUnits
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode.  This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.

Almost all the testcases were updated with this script:

    git grep -e '= !DICompileUnit' -l -- test |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'

I imagine something similar should work for out-of-tree testcases.

llvm-svn: 243885
2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith ed013cd221 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place. Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
Simon Pilgrim 15c0a59463 [InstCombine][X86][SSE] Replace sign/zero extension intrinsics with native IR
Now that we are generating sane codegen for vector sext/zext nodes on SSE targets, this patch uses instcombine to replace the SSE41/AVX2 pmovsx and pmovzx intrinsics with the equivalent native IR code.

Differential Revision: http://reviews.llvm.org/D11503

llvm-svn: 243303
2015-07-27 18:52:15 +00:00
Simon Pilgrim 357b85c926 [InstCombine] Split off SSE4a tests.
These aren't vector demanded bits tests. More tests to follow.

llvm-svn: 243223
2015-07-25 17:14:01 +00:00
Karthik Bhat d818e38ff9 Constfold trunc,rint,nearbyint,ceil and floor using APFloat
A patch by Chakshu Grover!
This patch allows constant folding of the trunc, rint, nearbyint, ceil and floor intrinsics using the APFloat class.
Differential Revision: http://reviews.llvm.org/D11144

llvm-svn: 242763
2015-07-21 08:52:23 +00:00
David Majnemer 33b6f82e72 [InstCombine] Generalize sub of selects optimization to all BinaryOperators
This exposes further optimization opportunities if the selects are
correlated.

llvm-svn: 242235
2015-07-14 22:39:23 +00:00
Reid Kleckner 486fa3977a Update enforceKnownAlignment after the isWeakForLinker semantic change
Previously we would refrain from attempting to increase the linkage of
available_externally globals because they were considered weak for the
linker. Now they are treated more like a declaration instead of a weak
definition.

This was causing SSE alignment faults in Chromium, when some code
assumed it could increase the alignment of a dllimported global that it
didn't control.  http://crbug.com/509256

llvm-svn: 242091
2015-07-14 00:11:08 +00:00
Bjorn Steinbrink a6b929dfe2 [InstCombine] Actually combine AA metadata when replacing one load with another
Fixes PR24083

llvm-svn: 241955
2015-07-10 22:30:17 +00:00
Bjorn Steinbrink 8350534772 [InstCombine] Employ AliasAnalysis in FindAvailableLoadedValue
llvm-svn: 241887
2015-07-10 06:55:49 +00:00
Bjorn Steinbrink a91fd0998f [InstCombine] Properly combine metadata when replacing a load with another
Not doing this can lead to misoptimizations down the line, e.g. because
of range metadata on the replacing load excluding values that are valid
for the load that is being replaced.

llvm-svn: 241886
2015-07-10 06:55:44 +00:00
Karthik Bhat d2bc0d8423 Allow constfolding of llvm.sin.* and llvm.cos.* intrinsics
This patch const folds llvm.sin.* and llvm.cos.* intrinsics whenever feasible.

Differential Revision: http://reviews.llvm.org/D10836

llvm-svn: 241665
2015-07-08 03:55:47 +00:00
Jingyue Wu 5e34ce33f5 [InstCombine] call SimplifyICmpInst with correct context
Summary:
Fixes PR23809. Without passing the context to SimplifyICmpInst, we would
use the assume to prove that the condition feeding the assume is
trivially true (see isValidAssumeForContext in ValueTracking.cpp),
causing the removal of the assume which may be useful for later
optimizations.

Test Plan: pr23800.ll

Reviewers: hfinkel, majnemer

Reviewed By: hfinkel

Subscribers: henryhu, llvm-commits, wengxt, broune, meheff, eliben

Differential Revision: http://reviews.llvm.org/D10695

llvm-svn: 240683
2015-06-25 20:14:47 +00:00
Artur Pilipenko 0e21d54b51 Take alignment into account in isSafeToLoadUnconditionally
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D10475

llvm-svn: 240636
2015-06-25 12:18:43 +00:00
David Majnemer 726901b638 [InstCombine] Optimize subtract of selects into a select of a sub
This came up when examining some code generated by clang's IRGen for
certain member pointers.
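
Roughly (hypothetical names), the fold turns two selects on the same condition feeding a sub into a single select:

```
define i32 @before(i1 %c, i32 %a, i32 %b, i32 %d, i32 %e) {
  %s1 = select i1 %c, i32 %a, i32 %b
  %s2 = select i1 %c, i32 %d, i32 %e
  %r  = sub i32 %s1, %s2
  ret i32 %r
}

define i32 @after(i1 %c, i32 %a, i32 %b, i32 %d, i32 %e) {
  %t = sub i32 %a, %d
  %f = sub i32 %b, %e
  %r = select i1 %c, i32 %t, i32 %f
  ret i32 %r
}
```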

llvm-svn: 240369
2015-06-23 02:49:24 +00:00
David Majnemer 7fddeccb8b Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than having N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Philip Reames c25df11614 Reapply 239795 - [InstCombine] Propagate non-null facts to call parameters
The original change broke clang side tests.  I will be submitting those momentarily.  This change includes post-commit feedback on the original change from Pete Cooper.

Original Submission comments:
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239849
2015-06-16 20:24:25 +00:00
Philip Reames 1a6305f313 Revert 239795
I forgot to update some clang test cases.  I'll fix and resubmit tomorrow.

llvm-svn: 239800
2015-06-16 01:20:53 +00:00
Philip Reames dfc29fba60 [InstCombine] Propagate non-null facts to call parameters
If a parameter to a function is known non-null, use the existing parameter attributes to record that fact at the call site. This has no optimization benefit by itself - that I know of - but is an enabling change for http://reviews.llvm.org/D9129.

Differential Revision: http://reviews.llvm.org/D9132

llvm-svn: 239795
2015-06-16 00:43:54 +00:00
David Majnemer 468f670021 [InstCombine] Don't miscompile select to poison
If we have (select a, b, c), it is sometimes valid to simplify this to a
single select operand.  However, doing so is only valid if it
doesn't inject poison into the computation.

It might be helpful to consider the following example:
  (select (icmp ne %i, INT_MAX), (add nsw %i, 1), INT_MIN)

The select is equivalent to (add %i, 1) but not (add nsw %i, 1).

Self hosting on x86_64 revealed that this occurs very, very rarely so
bailing out is hopefully pretty reasonable.

llvm-svn: 239215
2015-06-06 02:30:43 +00:00
Renato Golin 3dabb23384 Revert "[InstCombine] Rephrase fix to SimplifyWithOpReplaced"
This reverts commit r239141. This commit was an attempt to reintroduce
a previous patch that broke many self-hosting bots with clang timeouts,
but it still has slowdown issues, at least  on ARM, increasing the
compilation time (stage 2, clang's) by 5x.

llvm-svn: 239175
2015-06-05 18:24:12 +00:00
Sanjoy Das 72cb5e1087 [InstCombine] Fix PR23751.
PR23751 was caused by a missing ``break;`` in r234388.

llvm-svn: 239171
2015-06-05 18:04:42 +00:00
David Majnemer 6d8081835d [InstCombine] Rephrase fix to SimplifyWithOpReplaced
I don't have the IR which is causing the build bot breakage but I can
postulate as to why they are timing out:
1. SimplifyWithOpReplaced was stripping flags from the simplified value.
2. visitSelectInstWithICmp was overriding SimplifyWithOpReplaced because
   its simplification wasn't correct.
3. InstCombine would revisit the add instruction and note that it can
   rederive the flags.
4. By modifying the value, we chose to revisit instructions which reuse
   the value.  One of the instructions is the original select, causing
   LLVM to never reach fixpoint.

Instead, strip the flags only when we are sure we are going to perform
the simplification.

llvm-svn: 239141
2015-06-05 09:57:57 +00:00
Daniel Jasper 917fa5ee66 Revert "[InstCombine] Don't miscompile safe increment idiom"
This is breaking a lot of build bots and is causing very long-running
compiles (infinite loops)?

Likely, we shouldn't return nullptr?

llvm-svn: 239139
2015-06-05 09:31:20 +00:00
David Majnemer 00f7d9ecc8 [InstCombine] Don't miscompile safe increment idiom
We cleverly handle cases where computation done in one argument of a select
instruction is suitable for the other operand, thus obviating the need
of the select and the comparison.  However, the other operand cannot
have flags.

This fixes PR23757.

llvm-svn: 239115
2015-06-04 23:11:30 +00:00
Ahmed Bougacha 0ea9d1e753 [IR] fptrunc-of-fptrunc isn't an EliminableCastPair.
Double and single rounding can produce different results.
This is the IR counterpart to r228911.

llvm-svn: 238531
2015-05-29 00:04:30 +00:00
David Majnemer dd04352558 [InstCombine] Fold IntToPtr and PtrToInt into preceding loads.
Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 238452
2015-05-28 18:39:17 +00:00
David Majnemer 4c3753c4d4 [InstCombine] Don't eagerly propagate nsw for A*B+A*C => A*(B+C)
InstCombine transforms A *nsw B +nsw A *nsw C to A *nsw (B + C).
This is incorrect -- e.g. if A = -1, B = 1, C = INT_SMAX. Then
nothing in the LHS overflows, but the multiplication in RHS overflows.

We need to first make sure that we won't multiply by INT_SMAX + 1.

Test case `add_of_mul` contributed by Sanjoy Das.

This fixes PR23635.

Differential Revision: http://reviews.llvm.org/D9629

llvm-svn: 238066
2015-05-22 23:02:11 +00:00
David Majnemer 1503258157 [InstSimplify] Handle some overflow intrinsics in InstSimplify
This change does a few things:
- Move some InstCombine transforms to InstSimplify
- Run SimplifyCall from within InstCombine::visitCallInst
- Teach InstSimplify to fold [us]mul_with_overflow(X, undef) to 0.

llvm-svn: 237995
2015-05-22 03:56:46 +00:00
David Majnemer 27e89ba24c [InstCombine] X - 0 is equal to X, not undef
A refactoring made @llvm.ssub.with.overflow.i32(i32 %X, i32 0) transform
into undef instead of %X.

This fixes PR23624.

llvm-svn: 237968
2015-05-21 23:04:21 +00:00
James Molloy 2b21a7cf36 Reapply r237539 with a fix for the Chromium build.
Make sure that, if we're truncating a constant that would then be sign extended,
the sign extension of the truncated constant is the same as the
original constant.

> Canonicalize min/max expressions correctly.
>
> This patch introduces a canonical form for min/max idioms where one operand
> is extended or truncated. This often happens when the other operand is a
> constant. For example:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = sext i32 %a to i64
> %3 = select i1 %1, i64 %2, i64 0
>
> Would now be canonicalized into:
>
> %1 = icmp slt i32 %a, i32 0
> %2 = select i1 %1, i32 %a, i32 0
> %3 = sext i32 %2 to i64
>
> This builds upon a patch posted by David Majenemer
> (https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
> passively stopped instcombine from ruining canonical patterns. This
> patch additionally actively makes instcombine canonicalize too.
>
> Canonicalization of expressions involving a change in type from int->fp
> or fp->int are not yet implemented.

llvm-svn: 237821
2015-05-20 18:41:25 +00:00
Hans Wennborg 2f21b8760e Revert r237539: "Reapply r237520 with another fix for infinite looping"
This caused PR23583.

llvm-svn: 237739
2015-05-19 23:06:30 +00:00
James Molloy 53958e187a Reapply r237520 with another fix for infinite looping
SimplifyDemandedBits was "simplifying" a constant by removing just sign bits.
This caused a canonicalization race between different parts of instcombine.

Fix and regression test added - third time lucky?

llvm-svn: 237539
2015-05-17 08:27:27 +00:00
James Molloy e8698ae3e1 Revert commits r237521 and r237520.
The AArch64 LNT bot is unhappy - I've found that the problem is in
SimplifyDemandedBits, but that's going to require another code review
so reverting in the meantime.

llvm-svn: 237528
2015-05-16 21:27:14 +00:00
James Molloy 8ae1224fcf Update to r237520 - swap order of CHECK-NEXT lines.
... I'd copied the check-next lines from a previous test so they were
slightly wrong, and had managed to test the wrong source tree. D'oh!

llvm-svn: 237521
2015-05-16 13:26:25 +00:00
James Molloy b5aa200a33 Reapply r237453 with a fix for the test timeouts.
The test timeouts were due to instcombine fighting itself. Regression test added.
Original log message:

Canonicalize min/max expressions correctly.

This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, i32 0
  %2 = sext i32 %a to i64
  %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, i32 0
  %2 = select i1 %1, i32 %a, i32 0
  %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237520
2015-05-16 13:10:45 +00:00
James Molloy 1675b4a57f Revert "Canonicalize min/max expressions correctly."
This reverts r237453 - it was causing timeouts on some bots. Reverting
while I investigate (it's probably InstCombine fighting itself...)

llvm-svn: 237458
2015-05-15 17:45:09 +00:00
James Molloy 6edf0b4cd4 Canonicalize min/max expressions correctly.
This patch introduces a canonical form for min/max idioms where one operand
is extended or truncated. This often happens when the other operand is a
constant. For example:

  %1 = icmp slt i32 %a, i32 0
  %2 = sext i32 %a to i64
  %3 = select i1 %1, i64 %2, i64 0

Would now be canonicalized into:

  %1 = icmp slt i32 %a, i32 0
  %2 = select i1 %1, i32 %a, i32 0
  %3 = sext i32 %2 to i64

This builds upon a patch posted by David Majenemer
(https://www.marc.info/?l=llvm-commits&m=143008038714141&w=2). That pass
passively stopped instcombine from ruining canonical patterns. This
patch additionally actively makes instcombine canonicalize too.

Canonicalization of expressions involving a change in type from int->fp
or fp->int are not yet implemented.

llvm-svn: 237453
2015-05-15 16:10:59 +00:00
Sanjoy Das a1d39ba940 [Statepoints] Support for "patchable" statepoints.
Summary:
This change adds two new parameters to the statepoint intrinsic, `i64 id`
and `i32 num_patch_bytes`.  `id` gets propagated to the ID field
in the generated StackMap section.  If the `num_patch_bytes` is
non-zero then the statepoint is lowered to `num_patch_bytes` bytes of
nops instead of a call (the spill and reload code remains unchanged).
A non-zero `num_patch_bytes` is useful in situations where a language
runtime requires complete control over how a call is lowered.

This change brings statepoints one step closer to patchpoints.  With
some additional work (that is not part of this patch) it should be
possible to get rid of `TargetOpcode::STATEPOINT` altogether.

PlaceSafepoints generates `statepoint` wrappers with `id` set to
`0xABCDEF00` (the old default value for the ID reported in the stackmap)
and `num_patch_bytes` set to `0`.  This can be made more sophisticated
later.

Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9546

llvm-svn: 237214
2015-05-12 23:52:24 +00:00
Sunil Srivastava d79dfcbc37 Changed renaming of local symbols by inserting a dot before the numeric suffix.
One code change and several test changes to match that;
details in http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Hal Finkel f0d68d788b [InstCombine/PowerPC] Fix single-precision QPX load/store replacement
The QPX single-precision load/store intrinsics have implied
truncation/extension from/to the declared value type of <4 x double> to the
memory type of <4 x float>. When we can prove the alignment of the pointer
argument, and thus replace the intrinsic with a regular load or store, we need
to load or store the correct data type (<4 x float>) instead of (<4 x double>).

llvm-svn: 236973
2015-05-11 06:37:03 +00:00
David Majnemer 176fd7c4af Make buildbots happy
llvm-svn: 236970
2015-05-11 05:33:27 +00:00
David Majnemer 7536460c0f [InstCombine] Canonicalize single element array store
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9591

llvm-svn: 236969
2015-05-11 05:04:27 +00:00
David Majnemer 58fb038b1b [InstCombine] Canonicalize single element array load
Use the element type instead of the aggregate type.

Differential Revision: http://reviews.llvm.org/D9596

llvm-svn: 236968
2015-05-11 05:04:22 +00:00
Pat Gavlin cc0431d1c0 Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware.
This changes the shape of the statepoint intrinsic from:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args)

to:

  @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args)

This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back.

In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation.

Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops.

Differential Revision: http://reviews.llvm.org/D9501

llvm-svn: 236888
2015-05-08 18:07:42 +00:00
Mehdi Amini 2668a487a7 Update InstCombine to transform aggregate loads into scalar loads.
Summary:
One step further toward getting aggregate loads and stores optimized
properly. This will only handle structs with one element at this point.

Test Plan: Added unit tests for the new supported cases.

Reviewers: chandlerc, joker-eph, joker.eph, majnemer

Reviewed By: majnemer

Subscribers: pete, llvm-commits

Differential Revision: http://reviews.llvm.org/D8339

Patch by Amaury Sechet.

From: Amaury Sechet <amaury@fb.com>
llvm-svn: 236695
2015-05-07 05:52:40 +00:00
Matthias Braun e48484c64f InstCombineSimplifyDemanded: Remove nsw/nuw flags when optimizing demanded bits
When optimizing demanded bits of the operands of an Add we have to
remove the nsw/nuw flags as we have no guarantee anymore that we don't
wrap.  This is legal here because the top bit is not demanded.  In fact
this operation was already performed but missed in the case of an Add
with a constant on the right side.  To fix this, this patch refactors the
code to unify the code paths in SimplifyDemandedUseBits() handling of
Add/Sub:

- The transformation of Add->Or is removed from the simplify demand
  code because the equivalent transformation exists in
  InstCombiner::visitAdd()
- KnownOnes/KnownZero are not adjusted for Add x, C anymore as
  computeKnownBits() already performs these computations.
- The simplification of the operands is unified. In this new version
  constants on the right side of a Sub are now shrunk, as I could not find
  a reason why not to do so.
- The special case for clearing nsw/nuw in ShrinkDemandedConstant() is
  not necessary anymore as the caller does that already.

Differential Revision: http://reviews.llvm.org/D9415

llvm-svn: 236269
2015-04-30 22:05:30 +00:00
Sanjoy Das 08e95b4703 [InstCombine] Add new rule for MIN(MAX(~A, ~B), ~C) et. al.
Summary:
Optimizing these well are especially interesting for IRCE since it
"clamps" values by generating this sort of pattern through SCEV
expressions.

Depends on D9352.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9353

llvm-svn: 236203
2015-04-30 04:56:04 +00:00
Sanjoy Das a8c178f280 [InstCombine] Add a new formula for SMIN.
Summary:
After this change `MatchSelectPattern` recognizes the following form
of SMIN:

  Y >s C ? ~Y : ~C == ~Y <s ~C ? ~Y : ~C = SMIN(~Y, ~C)

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D9352

llvm-svn: 236202
2015-04-30 04:56:00 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Sanjay Patel c1d20a36fb [x86] instcombine more cases of insertps into a shufflevector
This is a follow-on to D8833 (insertps optimization when the zero mask is not used).

In this patch, we check for the case where the zmask is used, but both input vectors
to the insertps intrinsic are the same operand or the zmask overrides the destination
lane. This lets us replace the 2nd shuffle input operand with the zero vector.

Differential Revision: http://reviews.llvm.org/D9257

llvm-svn: 235810
2015-04-25 20:55:25 +00:00
David Blaikie 445e3fbc54 [opaque pointer type] Add textual IR support for explicit type parameter to the invoke instruction
Same as r235145 for the call instruction - the justification, tradeoffs,
etc are all the same. The conversion script worked the same without any
false negatives (after replacing 'call' with 'invoke').

llvm-svn: 235755
2015-04-24 19:32:54 +00:00
David Majnemer 7d0e99c601 [InstCombine] Use a more targeted fix instead of r235544
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage of the fact that the sign bit is not set.  We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.

llvm-svn: 235558
2015-04-22 22:42:05 +00:00
David Majnemer fe58d13a17 [InstCombine] Clear out nsw/nuw if we modify computation in the chain
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced.  This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.

This fixes PR23309.

llvm-svn: 235544
2015-04-22 20:59:28 +00:00
NAKAMURA Takumi b8854d01a6 Remove a zero-length file of llvm/test/Transforms/InstCombine/descale-zero.ll.
llvm-svn: 235457
2015-04-21 23:14:33 +00:00
Wei Mi a0adf9fd41 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has a negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such a merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better results, and we haven't noticed them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D8911

llvm-svn: 235455
2015-04-21 23:02:15 +00:00
Wei Mi 2940bc82ac Revert r235451 since it is attached to a wrong Differential Revision. Sorry.
llvm-svn: 235453
2015-04-21 22:56:09 +00:00
Wei Mi 6e3344ed98 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has a negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such a merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better results, and we haven't noticed them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D9007

llvm-svn: 235451
2015-04-21 22:37:09 +00:00
Fiona Glaser 0d41db11a2 InstCombine: fold (sitofp (zext x)) to (uitofp x)
This is okay because the zext guarantees the high bit is zero,
and so the value is unsigned.
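
A small sketch (assumed types and names) of why this is safe:

```
; %z is zero-extended, so its sign bit is known zero and the signed and
; unsigned interpretations agree.
define float @before(i16 %x) {
  %z = zext i16 %x to i32
  %f = sitofp i32 %z to float
  ret float %f
}
; The conversion can therefore be performed unsigned instead:
;   %f = uitofp i32 %z to float
```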

llvm-svn: 235364
2015-04-21 00:05:41 +00:00
David Majnemer 45951a6626 [InstCombine] (mul nsw 1, INT_MIN) != (shl nsw 1, 31)
Multiplying INT_MIN by 1 doesn't trigger nsw.  However, shifting 1 into
the sign bit *does* trigger nsw.

llvm-svn: 235250
2015-04-18 04:41:30 +00:00
David Blaikie 23af64846f [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.

Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.

When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.

This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.

This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).

No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.

This leaves /only/ the varargs case where the explicit type is required.

Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in r230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.

About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.

import fileinput
import sys
import re

pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")

def conv(match, line):
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))

llvm-svn: 235145
2015-04-16 23:24:18 +00:00
Sanjay Patel c86867cd5f [X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. 
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.

Differential Revision: http://reviews.llvm.org/D8833

llvm-svn: 235124
2015-04-16 17:52:13 +00:00
Nick Lewycky abe2cc17da Subtraction is not commutative. Fixes PR23212!
llvm-svn: 234780
2015-04-13 19:17:37 +00:00
Sanjoy Das b6c5914308 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
David Majnemer b2e0f7a40f Fix a typo
CHECK-LABEL had the wrong function name.

llvm-svn: 234051
2015-04-03 20:56:24 +00:00
David Majnemer 98cfe2b7a5 [InstCombine] Use DataLayout to determine vector element width
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are.  This lead to assertion failures.

This fixes PR23113.

llvm-svn: 234046
2015-04-03 20:18:40 +00:00
Duncan P. N. Exon Smith 49e6a70fe3 Verifier: Call verifyModule() from llc and opt
Change `llc` and `opt` to run `verifyModule()`.  This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units).  In `opt`, also move up debug-info-stripping so that it still
runs before verification.

There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result).  I split them
off to `*-noverify.ll` tests with RUN lines like this:

    opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s

This avoids verifying the input file (so we can get the broken code into
`-instcombine`), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).

llvm-svn: 233432
2015-03-27 22:04:28 +00:00
Duncan P. N. Exon Smith 988a7f8b79 DebugInfo: Fix bad debug info for compile units and types
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types.  The problems look like they were all
caused by bitrot.  They fell into these categories:

  - Using `!{i32 0}` instead of `!{}`.
  - Using `!{null}` instead of `!{}`.
  - Using `!MDExpression()` instead of `!{}`.
  - Using `!8` instead of `!{!8}`.
  - `file:` references that pointed at `MDCompileUnit`s instead of the
    same `MDFile` as the compile unit.
  - `file:` references that were numerically off-by-one or (off-by-ten).

llvm-svn: 233415
2015-03-27 20:46:33 +00:00
Philip Reames e1bf27045d Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, it's time to actually make this mandatory.

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Benjamin Kramer 7fa8c430f7 InstCombine: fold (A << C) == (B << C) --> ((A^B) & (~0U >> C)) == 0
ANDing and comparing with zero can be done in a single instruction on
most archs so this is a bit cheaper.
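
A hand-written before/after sketch for i32 and C = 3 (~0U >> 3 is 0x1FFFFFFF);
the function names are made up for illustration:

  ; before
  define i1 @shl_cmp(i32 %a, i32 %b) {
    %sa = shl i32 %a, 3
    %sb = shl i32 %b, 3
    %cmp = icmp eq i32 %sa, %sb
    ret i1 %cmp
  }

  ; after: the shifts only discard high bits, so compare the remaining bits directly
  define i1 @shl_cmp_folded(i32 %a, i32 %b) {
    %x = xor i32 %a, %b
    %m = and i32 %x, 536870911       ; 0x1FFFFFFF == ~0u >> 3
    %cmp = icmp eq i32 %m, 0
    ret i1 %cmp
  }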

llvm-svn: 233291
2015-03-26 17:12:06 +00:00
Sanjay Patel e304bea010 optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.

This is an extension of r232852, r233100, and 233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110

llvm-svn: 233127
2015-03-24 22:39:29 +00:00
David Blaikie 1a6bb9fcf6 Revert "Remove an InstCombine that seems to have become redundant."
Assertion fires in compiler-rt. Guess it does fire..

This reverts commit r233116.

llvm-svn: 233121
2015-03-24 21:50:35 +00:00
David Blaikie e37e10dc57 Remove an InstCombine that seems to have become redundant.
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.

llvm-svn: 233116
2015-03-24 21:31:31 +00:00
Sanjay Patel 43a87fdc79 [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. 
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle. 

This is an implementation of that suggestion.

Differential Revision: http://reviews.llvm.org/D8567

llvm-svn: 233110
2015-03-24 20:36:42 +00:00
Benjamin Kramer d6aa0ec737 [SimplifyLibCalls] Fix negative shifts being produced by the memchr -> bitfield transform.
llvm-svn: 232903
2015-03-21 22:04:26 +00:00
Benjamin Kramer 7857d723f1 [SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check.
strchr("123!", C) != nullptr is a common pattern to check whether C is one
of '1', '2', '3' or '!'. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.

int foo(char C) { return strchr("123!", C) != nullptr; } now becomes

	cmpl	$64, %edi ## range check
	sbbb	%al, %al
	movabsq	$0xE000200000001, %rcx
	btq	%rdi, %rcx ## bit test
	sbbb	%cl, %cl
	andb	%al, %cl ## and the two conditions
	andb	$1, %cl
	movzbl	%cl, %eax ## returning an int
	ret

(imho the backend should expand this into a series of branches, but
that's a different story)

The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting an i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.
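
In IR terms the generated check is roughly the following hand-written sketch
(the exact instruction sequence the transform emits may differ). The magic
constant has bit 0 set for the terminating NUL and bits 0x21, 0x31, 0x32 and
0x33 set for '!', '1', '2' and '3':

  define i1 @in_set(i8 %c) {
    %idx = zext i8 %c to i64
    %inrange = icmp ult i64 %idx, 64               ; the range check
    %shifted = lshr i64 3940658263883777, %idx     ; 0xE000200000001
    %bit = trunc i64 %shifted to i1                ; the bit test
    %res = and i1 %inrange, %bit
    ret i1 %res
  }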

llvm-svn: 232902
2015-03-21 21:09:33 +00:00
Benjamin Kramer 691363e7f2 SimplifyLibCalls: Add basic optimization of memchr calls.
This is just memchr(x, y, 0) -> nullptr and constant folding.

llvm-svn: 232896
2015-03-21 15:36:21 +00:00
Sanjay Patel ccf5f24b7b [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.
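
For example, selector 0x20 just concatenates the two low 128-bit halves, which
is a plain shuffle. This is a hand-written sketch with an intrinsic declaration
written from memory, not a test from the patch:

  declare <8 x float> @llvm.x86.avx.vperm2f128.ps.256(<8 x float>, <8 x float>, i8)

  define <8 x float> @concat_low_halves(<8 x float> %a, <8 x float> %b) {
    %r = call <8 x float> @llvm.x86.avx.vperm2f128.ps.256(<8 x float> %a, <8 x float> %b, i8 32) ; imm = 0x20
    ret <8 x float> %r
  }

  ; ...is equivalent to:
  ;   %r = shufflevector <8 x float> %a, <8 x float> %b,
  ;                      <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>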

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
Daniel Jasper 5add63f21e [InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.

However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).

The attached test case is derived from:
  typedef unsigned uint32;
  typedef unsigned char uint8;
  inline uint8 *f(uint32 value, uint8 *target) {
    while (value >= 0x80) {
      value >>= 7;
      ++target;
    }
    ++target;
    return target;
  }
  uint8 *g(uint32 b, uint8 *target) {
    target = f(b, f(42, target));
    return target;
  }

What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".

In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:

With this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
    %shr = lshr i32 %newval, 7
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
    ret i8* %incptr3
  }

Without this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %shr = lshr i32 %newval, 7
    %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
    ret i8* %incptr3
  }

Review: http://reviews.llvm.org/D8245
llvm-svn: 232718
2015-03-19 11:05:08 +00:00
Duncan P. N. Exon Smith 166121ad0b Verifier: Check debug info intrinsic arguments
Verify that debug info intrinsic arguments are valid.  (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned off in `DebugInfoVerifier`.)

With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).

Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.

If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful.  The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).

llvm-svn: 232296
2015-03-15 01:21:30 +00:00
Mehdi Amini b344ac9afe Update InstCombine to transform aggregate stores into scalar stores.
Summary: This is a first step toward getting proper support for aggregate loads and stores.
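
As a hand-written sketch of the general idea (this first step may only cover
the simplest aggregate shapes), a store of a single-element struct can be
rewritten as a scalar store of the element:

  ; before
  define void @store_agg({ i32 }* %p, { i32 } %v) {
    store { i32 } %v, { i32 }* %p
    ret void
  }

  ; after: extract the element and store it as a scalar
  define void @store_scalar({ i32 }* %p, { i32 } %v) {
    %e = extractvalue { i32 } %v, 0
    %ep = getelementptr inbounds { i32 }, { i32 }* %p, i64 0, i32 0
    store i32 %e, i32* %ep
    ret void
  }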

Test Plan: Added unittests

Reviewers: reames, chandlerc

Reviewed By: chandlerc

Subscribers: majnemer, joker.eph, chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D7780

Patch by Amaury Sechet

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
2015-03-14 22:19:33 +00:00
Duncan P. N. Exon Smith be95b4afc6 instcombine: alloca: Canonicalize scalar allocation array size
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`.  Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often.  Nevertheless, it's
a cheap check and a nice cleanup.
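
A tiny hand-written example of the canonicalization:

  define i64* @scalar_alloca() {
    %buf = alloca i64, i64 1     ; non-canonical: array size written as i64 1
    ret i64* %buf
  }

  ; -instcombine rewrites the array size to the canonical i32 1, which the
  ; AsmWriter then prints simply as:
  ;   %buf = alloca i64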

llvm-svn: 232202
2015-03-13 19:42:09 +00:00
Duncan P. N. Exon Smith 720762e2c0 AsmWriter: Write alloca array size explicitly (and -instcombine fixup)
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.

The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)

The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`.  Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.

I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations.  (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)

An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.

A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.

rdar://problem/20075773

llvm-svn: 232200
2015-03-13 19:30:44 +00:00
David Blaikie f72d05bc7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
David Majnemer d61a6fd8ed InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Sanjay Patel c04b6f242c Inliner should not add callgraph edges for intrinsic calls (PR22857)
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.

This patch makes sure that the Inliner does not add such an edge to the callgraph.

Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857

Differential Revision: http://reviews.llvm.org/D8231

llvm-svn: 231927
2015-03-11 15:12:32 +00:00
Philip Reames 71c4035c18 If a conditional branch jumps to the same target, remove the condition
Given that large parts of instcombine are restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.
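
A minimal hand-written example:

  define void @same_target(i1 %cond) {
  entry:
    br i1 %cond, label %bb, label %bb   ; both edges go to %bb
  bb:
    ret void
  }

  ; the conditional branch becomes "br label %bb", dropping the use of %cond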

I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.

p.s. We should probably do the same thing for switch instructions.

Differential Revision: http://reviews.llvm.org/D8220

llvm-svn: 231881
2015-03-10 22:52:37 +00:00
Philip Reames 1c29227144 Infer known bits from dominating conditions
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.

Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems.  In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow.  Even with that restriction, it handles many interesting cases:
 * Early exits from functions
 * Early exits from loops (for context instructions in the loop and after the check)
 * Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)
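
For instance, the early-exit case corresponds to IR like this hand-written
sketch, where the edge into %body dominates the context instructions:

  define i32 @clamp_nonneg(i32 %x) {
  entry:
    %is.neg = icmp slt i32 %x, 0
    br i1 %is.neg, label %early.exit, label %body

  early.exit:
    ret i32 0

  body:
    ; on this path %x is known to be non-negative, so its sign bit is known zero
    ; and e.g. this comparison can be folded to true
    %nonneg = icmp sgt i32 %x, -1
    %r = select i1 %nonneg, i32 %x, i32 0
    ret i32 %r
  }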

Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.

This patch implements two approaches to the problem that need further benchmarking.  Approach 1 is to directly walk the dominator tree looking for interesting conditions.  Approach 2 is to inspect other uses of the value being queried for interesting comparisons.  From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.  

Differential Revision: http://reviews.llvm.org/D7708

llvm-svn: 231879
2015-03-10 22:43:20 +00:00
Owen Anderson 58364dc4da Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width.
llvm-svn: 231761
2015-03-10 06:51:39 +00:00
Owen Anderson 51b75b8c34 Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded.
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program.  Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
an instruction with no users.

llvm-svn: 231755
2015-03-10 05:13:47 +00:00
Mehdi Amini eb242a5041 InstCombine: fix fold "fcmp x, undef" to account for NaN
Summary:
See the two test cases.

; Can fold fcmp with undef on one side by choosing NaN for the undef

; Can fold fcmp with undef on both side
;   fcmp u_pred undef, undef -> true
;   fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef
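
Two minimal hand-written examples of the corrected folds:

  define i1 @fcmp_undef_unordered(double %x) {
    %c = fcmp ugt double %x, undef   ; unordered predicate: choose NaN for undef -> true
    ret i1 %c
  }

  define i1 @fcmp_undef_ordered(double %x) {
    %c = fcmp ogt double %x, undef   ; ordered predicate: choose NaN for undef -> false
    ret i1 %c
  }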

Reviewers: hfinkel, chandlerc, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D7617

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
2015-03-09 03:20:25 +00:00
Owen Anderson 7e621e9d5e Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user.
llvm-svn: 231613
2015-03-08 21:53:59 +00:00
Nadav Rotem c99a38796c Teach ComputeNumSignBits about signed remainder.
This optimization is a continuation of r231140, which reasoned about signed division.

llvm-svn: 231433
2015-03-06 00:23:58 +00:00
Michael Kuperstein bcb26d6880 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands cannot be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

llvm-svn: 231359
2015-03-05 08:38:57 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC: the string is no longer canonicalized, so you can't
rely on two "equal" DataLayouts having the same string returned by
getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
David Majnemer 1bacc0abc9 InstCombine: Ensure select condition types are identical before merging
Selection conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select has identical select condition types.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Nadav Rotem 029c5c7fdb Teach ComputeNumSignBits about signed divisions.
http://reviews.llvm.org/D8028
rdar://20023136

llvm-svn: 231140
2015-03-03 21:39:02 +00:00