Commit Graph

1848 Commits

Author SHA1 Message Date
David Majnemer 7d0e99c601 [InstCombine] Use a more targeted fix instead of r235544
Only clear out the NSW/NUW flags if we are optimizing 'add'/'sub' while
taking advantage that the sign bit is not set.  We do this optimization
to further shrink the mask but shrinking the mask isn't NSW/NUW
preserving in this case.

llvm-svn: 235558
2015-04-22 22:42:05 +00:00
David Majnemer fe58d13a17 [InstCombine] Clear out nsw/nuw if we modify computation in the chain
An nsw/nuw operation relies on the values feeding into it to not
overflow if 'poison' is not to be produced.  This means that
optimizations which make modifications to the bottom of a chain (like
SimplifyDemandedBits) must strip out nsw/nuw if they cannot ensure that
they will be preserved.

This fixes PR23309.

llvm-svn: 235544
2015-04-22 20:59:28 +00:00
NAKAMURA Takumi b8854d01a6 Remove a zero-length file of llvm/test/Transforms/InstCombine/descale-zero.ll.
llvm-svn: 235457
2015-04-21 23:14:33 +00:00
Wei Mi a0adf9fd41 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimization,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D8911

llvm-svn: 235455
2015-04-21 23:02:15 +00:00
Wei Mi 2940bc82ac Revert r235451 since it is attached to a wrong Differential Revision. Sorry.
llvm-svn: 235453
2015-04-21 22:56:09 +00:00
Wei Mi 6e3344ed98 Limiting gep merging to fix the performance problem described in
https://llvm.org/bugs/show_bug.cgi?id=23163.

Gep merging sometimes behaves like a reverse CSE/LICM optimizations,
which has negative impact on performance. In this patch we restrict
gep merging to happen only when the indexes to be merged are both consts,
which ensures such merge is always beneficial.

The patch makes gep merging only happen in very restrictive cases.
It is possible that some analysis/optimization passes rely on the merged
geps to get better result, and we havn't notice them yet. We will be ready
to further improve it once we see the cases.

Differential Revision: http://reviews.llvm.org/D9007

llvm-svn: 235451
2015-04-21 22:37:09 +00:00
Fiona Glaser 0d41db11a2 InstCombine: fold (sitofp (zext x)) to (uitofp x)
This is okay because the zext guarantees the high bit is zero,
and so the value is unsigned.

llvm-svn: 235364
2015-04-21 00:05:41 +00:00
David Majnemer 45951a6626 [InstCombine] (mul nsw 1, INT_MIN) != (shl nsw 1, 31)
Multiplying INT_MIN by 1 doesn't trigger nsw.  However, shifting 1 into
the sign bit *does* trigger nsw.

llvm-svn: 235250
2015-04-18 04:41:30 +00:00
David Blaikie 23af64846f [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.

Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.

When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.

This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.

This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).

No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.

This leaves /only/ the varargs case where the explicit type is required.

Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.

About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.

import fileinput
import sys
import re

pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")

def conv(match, line):
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))

llvm-svn: 235145
2015-04-16 23:24:18 +00:00
Sanjay Patel c86867cd5f [X86, SSE] instcombine common cases of insertps intrinsics into shuffles
This is very similar to D8486 / r232852 (vperm2). If we treat insertps intrinsics
as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. 
I don't think those can be converted into a single shuffle in all cases, but I'd
be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases
in the backend before converting them here.

Differential Revision: http://reviews.llvm.org/D8833

llvm-svn: 235124
2015-04-16 17:52:13 +00:00
Nick Lewycky abe2cc17da Subtraction is not commutative. Fixes PR23212!
llvm-svn: 234780
2015-04-13 19:17:37 +00:00
Sanjoy Das b6c5914308 [InstCombine][CodeGenPrep] Create llvm.uadd.with.overflow in CGP.
Summary:
This change moves creating calls to `llvm.uadd.with.overflow` from
InstCombine to CodeGenPrep.  Combining overflow check patterns into
calls to the said intrinsic in InstCombine inhibits optimization because
it introduces an intrinsic call that not all other transforms and
analyses understand.

Depends on D8888.

Reviewers: majnemer, atrick

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D8889

llvm-svn: 234638
2015-04-10 21:07:09 +00:00
David Majnemer b2e0f7a40f Fix a typo
CHECK-LABEL had the wrong function name.

llvm-svn: 234051
2015-04-03 20:56:24 +00:00
David Majnemer 98cfe2b7a5 [InstCombine] Use DataLayout to determine vector element width
InstCombine didn't realize that it needs to use DataLayout to determine
how wide pointers are.  This lead to assertion failures.

This fixes PR23113.

llvm-svn: 234046
2015-04-03 20:18:40 +00:00
Duncan P. N. Exon Smith 49e6a70fe3 Verifier: Call verifyModule() from llc and opt
Change `llc` and `opt` to run `verifyModule()`.  This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units).  In `opt`, also move up debug-info-stripping so that it still
runs before verification.

There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result).  I split them
off to `*-noverify.ll` tests with RUN lines like this:

    opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s

This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).

llvm-svn: 233432
2015-03-27 22:04:28 +00:00
Duncan P. N. Exon Smith 988a7f8b79 DebugInfo: Fix bad debug info for compile units and types
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types.  The problems look like they were all
caused by bitrot.  They fell into these categories:

  - Using `!{i32 0}` instead of `!{}`.
  - Using `!{null}` instead of `!{}`.
  - Using `!MDExpression()` instead of `!{}`.
  - Using `!8` instead of `!{!8}`.
  - `file:` references that pointed at `MDCompileUnit`s instead of the
    same `MDFile` as the compile unit.
  - `file:` references that were numerically off-by-one or (off-by-ten).

llvm-svn: 233415
2015-03-27 20:46:33 +00:00
Philip Reames e1bf27045d Require a GC strategy be specified for functions which use gc.statepoint
This was discussed a while back and I left it optional for migration.  Since it's been far more than the 'week or two' that was discussed, time to actually make this manditory.  

llvm-svn: 233357
2015-03-27 05:09:33 +00:00
Benjamin Kramer 7fa8c430f7 InstCombine: fold (A << C) == (B << C) --> ((A^B) & (~0U >> C)) == 0
Anding and comparing with zero can be done in a single instruction on
most archs so this is a bit cheaper.

llvm-svn: 233291
2015-03-26 17:12:06 +00:00
Sanjay Patel e304bea010 optimize the AVX2 (integer) version of vperm2 into a shuffle
...because this is what happens when an instruction
set puts its underwear on after its pants.

This is an extension of r232852, r233100, and 233110:
http://llvm.org/viewvc/llvm-project?view=revision&revision=232852
http://llvm.org/viewvc/llvm-project?view=revision&revision=233100
http://llvm.org/viewvc/llvm-project?view=revision&revision=233110

llvm-svn: 233127
2015-03-24 22:39:29 +00:00
David Blaikie 1a6bb9fcf6 Revert "Remove an InstCombine that seems to have become redundant."
Assertion fires in compiler-rt. Guess it does fire..

This reverts commit r233116.

llvm-svn: 233121
2015-03-24 21:50:35 +00:00
David Blaikie e37e10dc57 Remove an InstCombine that seems to have become redundant.
Assert that this doesn't fire - I'll remove all of this later, but just
leaving it in for a while in case this is firing & we just don't have
test coverage.

llvm-svn: 233116
2015-03-24 21:31:31 +00:00
Sanjay Patel 43a87fdc79 [X86, AVX] instcombine vperm2 intrinsics with zero inputs into shuffles
This is the IR optimizer follow-on patch for D8563: the x86 backend patch
that converts this kind of shuffle back into a vperm2.

This is also a continuation of the transform that started in D8486. 
In that patch, Andrea suggested that we could convert vperm2 intrinsics that
use zero masks into a single shuffle. 

This is an implementation of that suggestion.

Differential Revision: http://reviews.llvm.org/D8567

llvm-svn: 233110
2015-03-24 20:36:42 +00:00
Benjamin Kramer d6aa0ec737 [SimplifyLibCalls] Fix negative shifts being produced by the memchr -> bitfield transform.
llvm-svn: 232903
2015-03-21 22:04:26 +00:00
Benjamin Kramer 7857d723f1 [SimplifyLibCalls] Turn memchr(const, C, const) into a bitfield check.
strchr("123!", C) != nullptr is a common pattern to check if C is one
of 1, 2, 3 or !. If the largest element of the string is smaller than
the target's register size we can easily create a bitfield and just
do a simple test for set membership.

int foo(char C) { return strchr("123!", C) != nullptr; } now becomes

	cmpl	$64, %edi ## range check
	sbbb	%al, %al
	movabsq	$0xE000200000001, %rcx
	btq	%rdi, %rcx ## bit test
	sbbb	%cl, %cl
	andb	%al, %cl ## and the two conditions
	andb	$1, %cl
	movzbl	%cl, %eax ## returning an int
	ret

(imho the backend should expand this into a series of branches, but
that's a different story)

The code is currently limited to bit fields that fit in a register, so
usually 64 or 32 bits. Sadly, this misses anything using alpha chars
or {}. This could be fixed by just emitting a i128 bit field, but that
can generate really ugly code so we have to find a better way. To some
degree this is also recreating switch lowering logic, but we can't
simply emit a switch instruction and thus change the CFG within
instcombine.

llvm-svn: 232902
2015-03-21 21:09:33 +00:00
Benjamin Kramer 691363e7f2 SimplifyLibCalls: Add basic optimization of memchr calls.
This is just memchr(x, y, 0) -> nullptr and constant folding.

llvm-svn: 232896
2015-03-21 15:36:21 +00:00
Sanjay Patel ccf5f24b7b [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
vperm2* intrinsics are just shuffles. 
In a few special cases, they're not even shuffles.

Optimizing intrinsics in InstCombine is better than
handling this in the front-end for at least two reasons:

1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders
   really angry (and so I have regrets about some patches from last week).

2. Doing mask conversion logic in header files is hard to write and 
   subsequently read.

There are a couple of TODOs in this patch to complete this optimization.

Differential Revision: http://reviews.llvm.org/D8486

llvm-svn: 232852
2015-03-20 21:47:56 +00:00
Daniel Jasper 5add63f21e [InstCombine] Don't fold a GEP into itself through a PHI node
This can only occur (I think) through the back-edge of the loop.

However, folding a GEP into itself means that the value of the previous
iteration needs to be stored in the meantime, thus requiring an
additional register variable to be live, but not actually achieving
anything (the gep still needs to be executed once per loop iteration).

The attached test case is derived from:
  typedef unsigned uint32;
  typedef unsigned char uint8;
  inline uint8 *f(uint32 value, uint8 *target) {
    while (value >= 0x80) {
      value >>= 7;
      ++target;
    }
    ++target;
    return target;
  }
  uint8 *g(uint32 b, uint8 *target) {
    target = f(b, f(42, target));
    return target;
  }

What happens is that the GEP stored in incptr2 is folded into itself
through the loop's back-edge and the phi-node stored in loopptr,
effectively incrementing the ptr by "2" in each iteration instead of "1".

In this case, it is actually increasing the number of GEPs required as
the GEP before the loop can't be folded away anymore. For comparison:

With this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %buffer.pn = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %loopptr = getelementptr inbounds i8, i8* %buffer.pn, i64 1
    %shr = lshr i32 %newval, 7
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %0 = phi i8* [ %loopptr, %loop.exit ], [ %buffer, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %0, i64 2
    ret i8* %incptr3
  }

Without this patch:
  define i8* @test4(i32 %value, i8* %buffer) {
  entry:
    %incptr = getelementptr inbounds i8, i8* %buffer, i64 1
    %cmp = icmp ugt i32 %value, 127
    br i1 %cmp, label %loop.header, label %exit

  loop.header:                                      ; preds = %entry
    br label %loop.body

  loop.body:                                        ; preds = %loop.body, %loop.header
    %0 = phi i8* [ %buffer, %loop.header ], [ %loopptr, %loop.body ]
    %loopptr = phi i8* [ %incptr, %loop.header ], [ %incptr2, %loop.body ]
    %newval = phi i32 [ %value, %loop.header ], [ %shr, %loop.body ]
    %shr = lshr i32 %newval, 7
    %incptr2 = getelementptr inbounds i8, i8* %0, i64 2
    %cmp2 = icmp ugt i32 %newval, 16383
    br i1 %cmp2, label %loop.body, label %loop.exit

  loop.exit:                                        ; preds = %loop.body
    br label %exit

  exit:                                             ; preds = %loop.exit, %entry
    %ptr2 = phi i8* [ %incptr2, %loop.exit ], [ %incptr, %entry ]
    %incptr3 = getelementptr inbounds i8, i8* %ptr2, i64 1
    ret i8* %incptr3
  }

Review: http://reviews.llvm.org/D8245
llvm-svn: 232718
2015-03-19 11:05:08 +00:00
Duncan P. N. Exon Smith 166121ad0b Verifier: Check debug info intrinsic arguments
Verify that debug info intrinsic arguments are valid.  (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)

With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).

Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.

If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful.  The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).

llvm-svn: 232296
2015-03-15 01:21:30 +00:00
Mehdi Amini b344ac9afe Update InstCombine to transform aggregate stores into scalar stores.
Summary: This is a first step toward getting proper support for aggregate loads and stores.

Test Plan: Added unittests

Reviewers: reames, chandlerc

Reviewed By: chandlerc

Subscribers: majnemer, joker.eph, chandlerc, llvm-commits

Differential Revision: http://reviews.llvm.org/D7780

Patch by Amaury Sechet

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 232284
2015-03-14 22:19:33 +00:00
Duncan P. N. Exon Smith be95b4afc6 instcombine: alloca: Canonicalize scalar allocation array size
As a follow-up to r232200, add an `-instcombine` to canonicalize scalar
allocations to `i32 1`.  Since r232200, `iX 1` (for X != 32) are only
created by RAUWs, so this shouldn't fire too often.  Nevertheless, it's
a cheap check and a nice cleanup.

llvm-svn: 232202
2015-03-13 19:42:09 +00:00
Duncan P. N. Exon Smith 720762e2c0 AsmWriter: Write alloca array size explicitly (and -instcombine fixup)
Write the `alloca` array size explicitly when it's non-canonical.
Previously, if the array size was `iX 1` (where X is not 32), the type
would mutate to `i32` when round-tripping through assembly.

The testcase I added fails in `verify-uselistorder` (as well as
`FileCheck`), since the use-lists for `i32 1` and `i64 1` change.
(Manman Ren came across this when running `verify-uselistorder` on some
non-trivial, optimized code as part of PR5680.)

The type mutation started with r104911, which allowed array sizes to be
something other than an `i32`.  Starting with r204945, we
"canonicalized" to `i64` on 64-bit platforms -- and then on every
round-trip through assembly, mutated back to `i32`.

I bundled a fixup for `-instcombine` to avoid r204945 on scalar
allocations.  (There wasn't a clean way to sequence this into two
commits, since the assembly change on its own caused testcase churn, and
the `-instcombine` change can't be tested without the assembly changes.)

An obvious alternative fix -- change `AllocaInst::AllocaInst()`,
`AsmWriter` and `LLParser` to treat `intptr_t` as the canonical type for
scalar allocations -- was rejected out of hand, since this required
teaching them each about the data layout.

A follow-up commit will add an `-instcombine` to canonicalize the scalar
allocation array size to `i32 1` rather than leaving `iX 1` alone.

rdar://problem/20075773

llvm-svn: 232200
2015-03-13 19:30:44 +00:00
David Blaikie f72d05bc7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
David Majnemer d61a6fd8ed InstCombine: Don't fold call bitcast into args if callee is byval
This fixes a bug reported here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150309/265341.html

llvm-svn: 231948
2015-03-11 18:03:05 +00:00
Sanjay Patel c04b6f242c Inliner should not add callgraph edges for intrinsic calls (PR22857)
The CallGraphNode function "addCalledFunction()" asserts that edges are not to intrinsics.

This patch makes sure that the Inliner does not add such an edge to the callgraph.

Fix for clang crash by assertion: https://llvm.org/bugs/show_bug.cgi?id=22857

Differential Revision: http://reviews.llvm.org/D8231

llvm-svn: 231927
2015-03-11 15:12:32 +00:00
Philip Reames 71c4035c18 If a conditional branch jumps to the same target, remove the condition
Given that large parts of inst combine is restricted to instructions which have one use, getting rid of a use on the condition can help the effectiveness of the optimizer. Also, it allows the condition to potentially be deleted by instcombine rather than waiting for another pass.

I noticed this completely by accident in another test case. It's not anything that actually came from a real workload.

p.s. We should probably do the same thing for switch instructions.

Differential Revision: http://reviews.llvm.org/D8220

llvm-svn: 231881
2015-03-10 22:52:37 +00:00
Philip Reames 1c29227144 Infer known bits from dominating conditions
This patch adds limited support in ValueTracking for inferring known bits of a value from conditional expressions which must be true to reach the instruction we're trying to optimize. At this time, the feature is off by default. Once landed, I'm hoping for feedback from others on both profitability and compile time impact.

Forms of conditional value propagation have been tried in LLVM before and have failed due to compile time problems.  In an attempt to side step that, this patch only considers conditions where the edge leaving the branch dominates the context instruction. It does not attempt full dataflow.  Even with that restriction, it handles many interesting cases:
 * Early exits from functions
 * Early exits from loops (for context instructions in the loop and after the check)
 * Conditions which control entry into loops, including multi-version loops (such as those produced during vectorization, IRCE, loop unswitch, etc..)

Possible applications include optimizing using information provided by constructs such as: preconditions, assumptions, null checks, & range checks.

This patch implements two approaches to the problem that need further benchmarking.  Approach 1 is to directly walk the dominator tree looking for interesting conditions.  Approach 2 is to inspect other uses of the value being queried for interesting comparisons.  From initial benchmarking, it appears that Approach 2 is faster than Approach 1, but this needs to be further validated.  

Differential Revision: http://reviews.llvm.org/D7708

llvm-svn: 231879
2015-03-10 22:43:20 +00:00
Owen Anderson 58364dc4da Fix a crash in InstCombine where we could try to truncate a switch comparison to zero width.
llvm-svn: 231761
2015-03-10 06:51:39 +00:00
Owen Anderson 51b75b8c34 Fix an infinite loop in InstCombine when an instruction with no users and side effects can be constant folded.
ReplaceInstUsesWith needs to return nullptr when the input has no users,
because in that case it does not mutate the program.  Otherwise, we can
get stuck in an infinite loop of repeatedly attempting to constant fold
and instruction with no users.

llvm-svn: 231755
2015-03-10 05:13:47 +00:00
Mehdi Amini eb242a5041 InstCombine: fix fold "fcmp x, undef" to account for NaN
Summary:
See the two test cases.

; Can fold fcmp with undef on one side by choosing NaN for the undef

; Can fold fcmp with undef on both side
;   fcmp u_pred undef, undef -> true
;   fcmp o_pred undef, undef -> false
; because whatever you choose for the first undef
; you can choose NaN for the other undef

Reviewers: hfinkel, chandlerc, majnemer

Reviewed By: majnemer

Subscribers: majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D7617

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231626
2015-03-09 03:20:25 +00:00
Owen Anderson 7e621e9d5e Teach DataLayout to infer a plausible alignment for things even when nothing is specified by the user.
llvm-svn: 231613
2015-03-08 21:53:59 +00:00
Nadav Rotem c99a38796c Teach ComputeNumSignBits about signed reminder.
This optimization a continuation of r231140 that reasoned about signed div.

llvm-svn: 231433
2015-03-06 00:23:58 +00:00
Michael Kuperstein bcb26d6880 [InstCombine] Fix an assertion when fmul has a ConstantExpr operand
isNormalFp and isFiniteNonZeroFp should not assume vector operands can not be constant expressions.

Patch by Pawel Jurek <pawel.jurek@intel.com>
Differential Revision: http://reviews.llvm.org/D8053

llvm-svn: 231359
2015-03-05 08:38:57 +00:00
Mehdi Amini 46a43556db Make DataLayout Non-Optional in the Module
Summary:
DataLayout keeps the string used for its creation.

As a side effect it is no longer needed in the Module.
This is "almost" NFC, the string is no longer
canonicalized, you can't rely on two "equals" DataLayout
having the same string returned by getStringRepresentation().

Get rid of DataLayoutPass: the DataLayout is in the Module

The DataLayout is "per-module", let's enforce this by not
duplicating it more than necessary.
One more step toward non-optionality of the DataLayout in the
module.

Make DataLayout Non-Optional in the Module

Module->getDataLayout() will never returns nullptr anymore.

Reviewers: echristo

Subscribers: resistor, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D7992

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 231270
2015-03-04 18:43:29 +00:00
David Majnemer 1bacc0abc9 InstCombine: Ensure select condition types are identical before merging
Selection conditions may be vectors or scalars.  Make sure InstCombine
doesn't indiscriminately assume that a select which is value dependent
on another select have identical select condition types.

This fixes PR22773.

llvm-svn: 231156
2015-03-03 22:40:36 +00:00
Nadav Rotem 029c5c7fdb Teach ComputeNumSignBits about signed divisions.
http://reviews.llvm.org/D8028
rdar://20023136

llvm-svn: 231140
2015-03-03 21:39:02 +00:00
Duncan P. N. Exon Smith e274180f0e DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
David Blaikie a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie 79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Hal Finkel 221f467185 [InstCombine/PowerPC] Convert aligned QPX load/store intrinsics into loads/stores
InstCombine has long had logic to convert aligned Altivec load/store intrinsics
into regular loads and stores. This mirrors that functionality for QPX vector
load/store intrinsics.

llvm-svn: 230660
2015-02-26 18:56:03 +00:00
Hal Finkel 18ee7c14fd [InstCombine] Add a test for altivec load/store intrinsic simplification
InstCombine has logic to convert aligned Altivec load/store intrinsics into
regular loads and stores. Unfortunately, there seems to be no regression test
covering this behavior. Adding one...

llvm-svn: 230632
2015-02-26 14:22:41 +00:00
JF Bastien d52c990a90 InstCombine: extract instead of shuffle when performing vector/array type punning
Summary: SROA generates code that isn't quite as easy to optimize and contains unusual-sized shuffles, but that code is generally correct. As discussed in D7487 the right place to clean things up is InstCombine, which will pick up the type-punning pattern and transform it into a more obvious bitcast+extractelement, while leaving the other patterns SROA encounters as-is.

Test Plan: make check

Reviewers: jvoung, chandlerc

Subscribers: llvm-commits
llvm-svn: 230560
2015-02-25 22:30:51 +00:00
Charles Davis 33d1dc0008 [IC] Turn non-null MD on pointer loads to range MD on integer loads.
Summary:
This change fixes the FIXME that you recently added when you committed
(a modified version of) my patch.  When `InstCombine` combines a load and
store of an pointer to those of an equivalently-sized integer, it currently
drops any `!nonnull` metadata that might be present.  This change replaces
`!nonnull` metadata with `!range !{ 1, -1 }` metadata instead.

Reviewers: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7621

llvm-svn: 230462
2015-02-25 05:10:25 +00:00
Sanjoy Das 82ea3d45b5 New instcombine rule: max(~a,~b) -> ~min(a, b)
This case is interesting because ScalarEvolutionExpander lowers min(a,
b) as ~max(~a,~b).  I think the profitability heuristics can be made
more clever/aggressive, but this is a start.

Differential Revision: http://reviews.llvm.org/D7821

llvm-svn: 230285
2015-02-24 00:08:41 +00:00
Hal Finkel 847e05f569 [InstCombine] Remove unnecessary variable indexing into single-element arrays
This change addresses a deficiency pointed out in PR22629. To copy from the bug
report:

[from the bug report]

Consider this code:

int f(int x) {
  int a[] = {12};
  return a[x];
}

GCC knows to optimize this to

movl     $12, %eax
ret

The code generated by recent Clang at -O3 is:

movslq   %edi, %rax
movl     .L_ZZ1fiE1a(,%rax,4), %eax
retq

.L_ZZ1fiE1a:
  .long    12                      # 0xc

[end from the bug report]

This definitely seems worth fixing. I've also seen this kind of code before (as
the base case of generic vector wrapper templates with one element).

The general idea is to look at the GEP feeding a load or a store, which has
some variable as its first non-zero index, and determine if that index must be
zero (or else an out-of-bounds access would occur). We can do this for allocas
and globals with constant initializers where we know the maximum size of the
underlying object. When we find such a GEP, we create a new one for the memory
access with that first variable index replaced with a constant zero.

Even if we can't eliminate the memory access (and sometimes we can't), it is
still useful because it removes unnecessary indexing calculations.

llvm-svn: 229959
2015-02-20 03:05:53 +00:00
Akira Hatanaka 1defd5afbd [InstCombine] Do not insert a GEP instruction before a landingpad instruction.
InstCombiner::visitGetElementPtrInst was using getFirstNonPHI to compute the
insertion point, which caused the verifier to complain when a GEP was inserted
before a landingpad instruction. This commit fixes it to use getFirstInsertionPt
instead.

rdar://problem/19394964

llvm-svn: 229619
2015-02-18 03:30:11 +00:00
Mehdi Amini b9a0fa4822 InstCombine: fold more cases of (fp_to_u/sint (u/sint_to_fp val))
Fixes radar 15486701.

From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229437
2015-02-16 21:47:54 +00:00
Mehdi Amini 7aab8752ba Tests: reformat sitofp.ll and use FileCheck
From: Fiona Glaser <fglaser@apple.com>
llvm-svn: 229436
2015-02-16 21:47:50 +00:00
Ramkumar Ramachandra 8fcb498a9a InstCombine: propagate deref via new addDereferenceableAttr
The "dereferenceable" attribute cannot be added via .addAttribute(),
since it also expects a size in bytes. AttrBuilder#addAttribute or
AttributeSet#addAttribute is wrapped by classes Function, InvokeInst,
and CallInst. Add corresponding wrappers to
AttrBuilder#addDereferenceableAttr.

Having done this, propagate the dereferenceable attribute via
gc.relocate, adding a test to exercise it. Note that -datalayout is
required during execution over and above -instcombine, because
InstCombine only optionally requires DataLayoutPass.

Differential Revision: http://reviews.llvm.org/D7510

llvm-svn: 229265
2015-02-14 19:37:54 +00:00
Philip Reames 9ae15209ad [InstCombine] When canonicalizing gep indices, prefer zext when possible
If we know that the sign bit of a value being sign extended is zero, we can use a zero extension instead.  This is motivated by the fact that zero extensions are generally cheaper on x86 (and most other architectures?).  We already apply a similar transform in DAGCombine, this just extends that to the IR level.

This comes up when we eagerly canonicalize gep indices to the width of a machine register (i64 on x86_64). To do so, we insert sign extensions (sext) to promote smaller types. 

Differential Revision: http://reviews.llvm.org/D7255

llvm-svn: 229189
2015-02-14 00:05:36 +00:00
Andrea Di Biagio 30d471f6aa [InstCombine] Fix regression introduced at r227197.
This patch fixes a problem I accidentally introduced in an instruction combine
on select instructions added at r227197. That revision taught the instruction
combiner how to fold a cttz/ctlz followed by a icmp plus select into a single
cttz/ctlz with flag 'is_zero_undef' cleared.

However, the new rule added at r227197 would have produced wrong results in the
case where a cttz/ctlz with flag 'is_zero_undef' cleared was follwed by a
zero-extend or truncate. In that case, the folded instruction would have
been inserted in a wrong location thus leaving the CFG in an inconsistent
state.

This patch fixes the problem and add two reproducible test cases to
existing test 'InstCombine/select-cmp-cttz-ctlz.ll'.

llvm-svn: 229124
2015-02-13 16:33:34 +00:00
Michael Liao d266b928ae [InstCombine] Fix a bug when combining `icmp` from `ptrtoint`
- First, there's a crash when we try to combine that pointers into `icmp`
  directly by creating a `bitcast`, which is invalid if that two pointers are
  from different address spaces.

- It's not always appropriate to cast one pointer to another if they are from
  different address spaces as that is not no-op cast. Instead, we only combine
  `icmp` from `ptrtoint` if that two pointers are of the same address space.

llvm-svn: 229063
2015-02-13 04:51:26 +00:00
Chandler Carruth 87fdafc7b2 [IC] Fix a bug with the instcombine canonicalizing of loads and
propagating of metadata.

We were propagating !nonnull metadata even when the newly formed load is
no longer of a pointer type. This is clearly broken and results in LLVM
failing the verifier and aborting. This patch just restricts the
propagation of !nonnull metadata to when we actually have a pointer
type.

This bug report and the initial version of this patch was provided by
Charles Davis! Many thanks for finding this!

We still need to add logic to round-trip the metadata correctly if we
combine from pointer types to integer types and then back by using range
metadata for the integer type loads. But this is the minimal and safe
version of the patch, which is important so we can backport it into 3.6.

llvm-svn: 229029
2015-02-13 02:30:01 +00:00
Benjamin Kramer 443c7967ea InstCombine: Allow folding of xor into icmp by changing the predicate for vectors
The loop vectorizer can create this pattern.

llvm-svn: 228954
2015-02-12 20:26:46 +00:00
Chandler Carruth 2496910325 Revert r228556: InstCombine: propagate nonNull through assume
This commit isn't using the correct context, and is transfoming calls
that are operands to loads rather than calls that are operands to an
icmp feeding into an assume. I've replied on the original review thread
with a very reduced test case and some thoughts on how to rework this.

llvm-svn: 228677
2015-02-10 08:07:32 +00:00
Ramkumar Ramachandra a021ee62ca InstCombine: propagate nonNull through assume
Make assume (load (call|invoke) != null) set nonNull return attribute
for the call and invoke. Also include tests.

Differential Revision: http://reviews.llvm.org/D7107

llvm-svn: 228556
2015-02-09 01:13:13 +00:00
Matthias Braun 2e404597f4 InstCombine: Combine select sequences into a single select
Normalize
select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

This normal form may enable further combines on the And/Or and shortens
paths for the values. Many targets prefer the other but can go back
easily in CodeGen.

Differential Revision: http://reviews.llvm.org/D7399

llvm-svn: 228409
2015-02-06 17:49:36 +00:00
Reid Kleckner 4af6415237 Move EH personality type classification to Analysis/LibCallSemantics.h
Summary:
Also add enum types for __C_specific_handler and _CxxFrameHandler3 for
which we know a few things.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7214

llvm-svn: 227284
2015-01-28 01:17:38 +00:00
Ahmed Bougacha 1ac9356524 [SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.
This was introduced in a faulty refactoring (r225640, mea culpa):
the tests weren't testing the return values, so, for both
__strcpy_chk and __stpcpy_chk, we would return the end of the
buffer (matching stpcpy) instead of the beginning (for strcpy).

The root cause was the prefix "__" being ignored when comparing,
which made us always pick LibFunc::stpcpy_chk.
Pass the LibFunc::Func directly to avoid this kind of error.
Also, make the testcases as explicit as possible to prevent this.

The now-useful testcases expose another, entangled, stpcpy problem,
with the further simplification.  This was introduced in a
refactoring (r225640) to match the original behavior.

However, this leads to problems when successive simplifications
generate several similar instructions, none of which are removed
by the custom replaceAllUsesWith.

For instance, InstCombine (the main user) doesn't erase the
instruction in its custom RAUW.  When trying to simplify say
__stpcpy_chk:
- first, an stpcpy is created (fortified simplifier),
- second, a memcpy is created (normal simplifier), but the
  stpcpy call isn't removed.
- third, InstCombine later revisits the instructions,
  and simplifies the first stpcpy to a memcpy.  We now have
  two memcpys.

llvm-svn: 227250
2015-01-27 21:52:16 +00:00
Andrea Di Biagio 086cbc37ad [InstCombine] Teach how to fold a select into a cttz/ctlz with the 'is_zero_undef' flag.
This patch teaches the Instruction Combiner how to fold a cttz/ctlz followed by
a icmp plus select into a single cttz/ctlz with flag 'is_zero_undef' cleared.

Added test InstCombine/select-cmp-cttz-ctlz.ll.

llvm-svn: 227197
2015-01-27 15:58:14 +00:00
Chandler Carruth 83ba269e4b [PM] Port instcombine to the new pass manager!
This is exciting as this is a much more involved port. This is
a complex, existing transformation pass. All of the core logic is shared
between both old and new pass managers. Only the access to the analyses
is separate because the actual techniques are separate. This also uses
a bunch of different and interesting analyses and is the first time
where we need to use an analysis across an IR layer.

This also paves the way to expose instcombine utility functions. I've
got a static function that implements the core pass logic over
a function which might be mildly interesting, but more interesting is
likely exposing a routine which just uses instructions *already in* the
worklist and combines until empty.

I've switched one of my favorite instcombine tests to run with both as
well to make sure this keeps working.

llvm-svn: 226987
2015-01-24 04:19:17 +00:00
Chandler Carruth cd8522ef44 [canonicalize] Teach InstCombine to canonicalize loads which are only
ever stored to always use a legal integer type if one is available.

Regardless of whether this particular type is good or bad, it ensures we
don't get weird differences in generated code (and resulting
performance) from "equivalent" patterns that happen to end up using
a slightly different type.

After some discussion on llvmdev it seems everyone generally likes this
canonicalization. However, there may be some parts of LLVM that handle
it poorly and need to be fixed. I have at least verified that this
doesn't impede GVN and instcombine's store-to-load forwarding powers in
any obvious cases. Subtle cases are exactly what we need te flush out if
they remain.

Also note that this IR pattern should already be hitting LLVM from Clang
at least because it is exactly the IR which would be produced if you
used memcpy to copy a pointer or floating point between memory instead
of a variable.

llvm-svn: 226781
2015-01-22 05:08:12 +00:00
David Majnemer 4c0a6e918a InstCombine: Don't strip bitcasts off of callsites marked 'thunk'
The return type of a thunk is meaningless, we just want the arguments
and return value to be forwarded.

llvm-svn: 226708
2015-01-21 22:32:04 +00:00
Richard Smith e78bb1249e For PR21145: recognise a builtin call to a known deallocation function even if
it's defined in the current module. Clang generates this situation for the
C++14 sized deallocation functions, because it generates a weak definition in
case one isn't provided by the C++ runtime library.

llvm-svn: 226069
2015-01-15 01:00:33 +00:00
Duncan P. N. Exon Smith 9885469922 IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
David Majnemer a0afb55ff9 InstCombine: Don't take A-B<0 into A<B if A-B has other uses
This fixes PR22226.

llvm-svn: 226023
2015-01-14 19:26:56 +00:00
Ahmed Bougacha 71d7b18e3d [SimplifyLibCalls] Don't try to simplify indirect calls.
It turns out, all callsites of the simplifier are guarded by a check for
CallInst::getCalledFunction (i.e., to make sure the callee is direct).

This check wasn't done when trying to further optimize a simplified fortified
libcall, introduced by a refactoring in r225640.

Fix that, add a testcase, and document the requirement.

llvm-svn: 225895
2015-01-14 00:55:05 +00:00
Matt Arsenault b935d9df4c Fix fcmp + fabs instcombines when using the intrinsic
This was only handling the libcall. This is another example
of why only the intrinsic should ever be used when it exists.

llvm-svn: 225465
2015-01-08 20:09:34 +00:00
Matt Arsenault 2458393104 Fix using wrong intrinsic in test
This is a leftover from renaming the intrinsic.
It's surprising the unknown llvm. intrinsic wasn't rejected.

llvm-svn: 225304
2015-01-06 23:00:33 +00:00
Matt Arsenault 55e7312cd8 Convert fcmp with 0.0 from casted integers to icmp
This is already handled in general when it is known the
conversion can't lose bits with smaller integer types
casted into wider floating point types.

This pattern happens somewhat often in GPU programs that cast
workitem intrinsics to float, which are often compared with 0.

Specifically handle the special case of compares with zero which
should also be known to not lose information. I had a more general
version of this which allows equality compares if the casted float is
exactly representable in the integer, but I'm not 100% confident that
is always correct.

Also fold cases that aren't integers to true / false.

llvm-svn: 225265
2015-01-06 15:50:59 +00:00
David Majnemer 9b6b822814 InstCombine: Bitcast call arguments from/to pointer/integer type
Try harder to get rid of bitcast'd calls by ptrtoint/inttoptr'ing
arguments and return values when DataLayout says it is safe to do so.

llvm-svn: 225254
2015-01-06 08:41:31 +00:00
David Majnemer 087dc8b831 InstCombine: match can find ConstantExprs, don't assume we have a Value
We assumed the output of a match was a Value, this would cause us to
assert because we would fail a cast<>.  Instead, use a helper in the
Operator family to hide the distinction between Value and Constant.

This fixes PR22087.

llvm-svn: 225127
2015-01-04 07:36:02 +00:00
David Majnemer c8a576b5c0 InstCombine: Detect when llvm.umul.with.overflow always overflows
We know overflow always occurs if both ~LHSKnownZero * ~RHSKnownZero
and LHSKnownOne * RHSKnownOne overflow.

llvm-svn: 225077
2015-01-02 07:29:47 +00:00
Sanjay Patel e68f71574f InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X
Some day the backend may handle instruction-level fast math flags and make
this transform unnecessary, but it's still better practice to use the canonical
representation of fneg when possible (use a -0.0).

This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ).
See also http://reviews.llvm.org/D6723.

Differential Revision: http://reviews.llvm.org/D6731

llvm-svn: 225050
2014-12-31 22:14:05 +00:00
David Majnemer f89dc3edc9 InstCombine: try to transform A-B < 0 into A < B
We are allowed to move the 'B' to the right hand side if we an prove
there is no signed overflow and if the comparison itself is signed.

llvm-svn: 225034
2014-12-31 04:21:41 +00:00
Philip Reames 9db26ffc9a Carry facts about nullness and undef across GC relocation
This change implements four basic optimizations:

    If a relocated value isn't used, it doesn't need to be relocated.
    If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
    If the value being relocated is undef, the relocation is meaningless.
    If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

I outlined other planned work in comments.

Differential Revision: http://reviews.llvm.org/D6600

llvm-svn: 224968
2014-12-29 23:27:30 +00:00
Philip Reames 5ad26c353c Loading from null is valid outside of addrspace 0
This patches fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen.  This transform is perfectly legal in address space 0, but is not neccessarily legal in other address spaces.

We really should introduce a hook to control this property on a per target per address space basis.  We may be loosing valuable optimizations in some address spaces by being too conservative.

Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me.

llvm-svn: 224961
2014-12-29 22:46:21 +00:00
David Majnemer b1296ec0fd InstCombine: Infer nuw for multiplies
A multiply cannot unsigned wrap if there are bitwidth, or more, leading
zero bits between the two operands.

llvm-svn: 224849
2014-12-26 09:50:35 +00:00
David Majnemer 54c2ca2539 InstCombe: Infer nsw for multiplies
We already utilize this logic for reducing overflow intrinsics, it makes
sense to reuse it for normal multiplies as well.

llvm-svn: 224847
2014-12-26 09:10:14 +00:00
Michael Kuperstein be8032c875 [ValueTracking] Move GlobalAlias handling to be after the max depth check in computeKnownBits()
GlobalAlias handling used to be after GlobalValue handling, which meant it was, in practice, dead code. r220165 moved GlobalAlias handling to be before GlobalValue handling, but also moved it to be before the max depth check, causing an assert due to a recursion depth limit violation. 

This moves GlobalAlias handling forward to where it's safe, and changes the GlobalValue handling to only look at GlobalObjects.

Differential Revision: http://reviews.llvm.org/D6758

llvm-svn: 224765
2014-12-23 11:33:41 +00:00
David Majnemer 6eed0e0d20 This should have been part of r224676.
llvm-svn: 224677
2014-12-20 04:48:34 +00:00
David Majnemer b0362e4ee6 InstCombine: Squash an icmp+select into bitwise arithmetic
(X & INT_MIN) == 0 ? X ^ INT_MIN : X  into  X | INT_MIN
(X & INT_MIN) != 0 ? X ^ INT_MIN : X  into  X & INT_MAX

This fixes PR21993.

llvm-svn: 224676
2014-12-20 04:45:35 +00:00
Bruno Cardoso Lopes f6cf8ad4e5 Reapply: [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. Also, fix code to also return the modified switch when only
the truncation is performed.

This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224588
2014-12-19 17:12:35 +00:00
Sanjay Patel ea3c802887 use -0.0 when creating an fneg instruction
Backends recognize (-0.0 - X) as the canonical form for fneg
and produce better code. Eg, ppc64 with 0.0:

   lis r2, ha16(LCPI0_0)
   lfs f0, lo16(LCPI0_0)(r2)
   fsubs f1, f0, f1
   blr

vs. -0.0:

   fneg f1, f1
   blr

Differential Revision: http://reviews.llvm.org/D6723

llvm-svn: 224583
2014-12-19 16:44:08 +00:00
Bruno Cardoso Lopes 3be15b2fa6 Revert "[InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr"
Reverts commit r224574 to appease buildbots:

The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

llvm-svn: 224576
2014-12-19 14:36:24 +00:00
Bruno Cardoso Lopes c9005f2f2b [InstCombine] Fix visitSwitchInst to use right operand types for sub cstexpr
The visitSwitchInst generates SUB constant expressions to recompute the
switch condition. When truncating the condition to a smaller type, SUB
expressions should use the previous type (before trunc) for both
operands. This fixes an assertion crash.

Differential Revision: http://reviews.llvm.org/D6644

rdar://problem/19191835

llvm-svn: 224574
2014-12-19 14:23:15 +00:00
Erik Eckstein a451b9b0b5 Strength reduce intrinsics with overflow into regular arithmetic operations if possible.
Some intrinsics, like s/uadd.with.overflow and umul.with.overflow, are already strength reduced.
This change adds other arithmetic intrinsics: s/usub.with.overflow, smul.with.overflow.
It completes the work on PR20194.

llvm-svn: 224417
2014-12-17 07:29:19 +00:00
Duncan P. N. Exon Smith be7ea19b58 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
David Majnemer 9b6097586c ValueTracking: Don't recurse too deeply in computeKnownBitsFromAssume
Respect the MaxDepth recursion limit, doing otherwise will trigger an
assert in computeKnownBits.

This fixes PR21891.

llvm-svn: 224168
2014-12-12 23:59:29 +00:00
Steven Wu 881916dea5 Fix another infinite loop in InstCombine
Summary:
InstCombine infinite-loops for the testcase added
It is because InstCombine is generating instructions that can be
optimized by itself. Fix by not optimizing frem if the optimized
type is the same as original type.
rdar://problem/19150820

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D6634

llvm-svn: 224097
2014-12-12 04:34:07 +00:00
Andrea Di Biagio 72b05aa59c [InstCombine][X86] Improved folding of calls to Intrinsic::x86_sse4a_insertqi.
This patch teaches the instruction combiner how to fold a call to 'insertqi' if
the 'length field' (3rd operand) is set to zero, and if the sum between
field 'length' and 'bit index' (4th operand) is bigger than 64.

From the AMD64 Architecture Programmer's Manual:
1. If the sum of the bit index + length field is greater than 64, then the
   results are undefined;
2. A value of zero in the field length is defined as a length of 64.

This patch improves the existing combining logic for intrinsic 'insertqi'
adding extra checks to address both point 1. and point 2.

Differential Revision: http://reviews.llvm.org/D6583

llvm-svn: 224054
2014-12-11 20:44:59 +00:00
David Majnemer 5a7717e498 ConstantFold, InstSimplify: undef >>a x can be either -1 or 0, choose 0
Zero is usually a nicer constant to have than -1.

llvm-svn: 223969
2014-12-10 21:58:15 +00:00
Chandler Carruth a7f247ea56 Revert r223764 which taught instcombine about integer-based elment extraction
patterns.

This is causing Clang to miscompile itself for 32-bit x86 somehow, and likely
also on ARM and PPC. I really don't know how, but reverting now that I've
confirmed this is actually the culprit. I have a reproduction as well and so
should be able to restore this shortly.

This reverts commit r223764.

Original commit log follows:
Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

llvm-svn: 223813
2014-12-09 19:21:16 +00:00
Sonam Kumari 72ccc3c428 Removal Of Duplicate Test Cases and Addition Of Missing Check Statements
llvm-svn: 223768
2014-12-09 10:46:38 +00:00
Ankur Garg 51eeba70da [test/Transforms/InstCombine/shift.ll] Removed duplicate test cases. NFC.
Removed some duplicate test cases from the file /test/Transforms/InstCombine/shift.ll.

test54 and test57 were duplicates of each other.
test55 and test58 were duplicates of each other.

(Removed test57 and test58)

llvm-svn: 223767
2014-12-09 10:35:19 +00:00
Chandler Carruth 7415205113 Teach instcombine to canonicalize "element extraction" from a load of an
integer and "element insertion" into a store of an integer into actual
element extraction, element insertion, and vector loads and stores.

Previously various parts of LLVM (including instcombine itself) would
introduce integer loads and stores into the code as a way of opaquely
loading and storing "bits". In some cases (such as a memcpy of
std::complex<float> object) we will eventually end up using those bits
in non-integer types. In order for SROA to effectively promote the
allocas involved, it splits these "store a bag of bits" integer loads
and stores up into the constituent parts. However, for non-alloca loads
and tsores which remain, it uses integer math to recombine the values
into a large integer to load or store.

All of this would be "fine", except that it forces LLVM to go through
integer math to combine and split up values. While this makes perfect
sense for integers (and in fact is critical for bitfields to end up
lowering efficiently) it is *terrible* for non-integer types, especially
floating point types. We have a much more canonical way of representing
the act of concatenating the bits of two SSA values in LLVM: a vector
and insertelement. This patch teaching InstCombine to use this
representation.

With this patch applied, LLVM will no longer introduce integer math into
the critical path of every loop over std::complex<float> operations such
as those that make up the hot path of ... oh, most HPC code, Eigen, and
any other heavy linear algebra library.

For the record, I looked *extensively* at fixing this in other parts of
the compiler, but it just doesn't work:
- We really do want to canonicalize memcpy and other bit-motion to
  integer loads and stores. SSA values are tremendously more powerful
  than "copy" intrinsics. Not doing this regresses massive amounts of
  LLVM's scalar optimizer.
- We really do need to split up integer loads and stores of this form in
  SROA or every memcpy of a trivially copyable struct will prevent SSA
  formation of the members of that struct. It essentially turns off
  SROA.
- The closest alternative is to actually split the loads and stores when
  partitioning with SROA, but this has all of the downsides historically
  discussed of splitting up loads and stores -- the wide-store
  information is fundamentally lost. We would also see performance
  regressions for bitfield-heavy code and other places where the
  integers aren't really intended to be split without seemingly
  arbitrary logic to treat integers totally differently.
- We *can* effectively fix this in instcombine, so it isn't that hard of
  a choice to make IMO.

Differential Revision: http://reviews.llvm.org/D6548

llvm-svn: 223764
2014-12-09 08:55:32 +00:00
Sonam Kumari 90d266c0a9 Removal Of Duplicate Test Case from shift.ll file
llvm-svn: 223648
2014-12-08 09:40:43 +00:00
Philip Reames 5b3ce71b62 Add a test case for argument type coercion in an invoke of a vararg function
This would have caught the bug I fixed in 223370.  

llvm-svn: 223378
2014-12-04 19:13:45 +00:00
Simon Pilgrim be24ab367b [InstCombine] Minor optimization for bswap with binary ops
Added instcombine optimizations for BSWAP with AND/OR/XOR ops:

OP( BSWAP(x), BSWAP(y) ) -> BSWAP( OP(x, y) )
OP( BSWAP(x), CONSTANT ) -> BSWAP( OP(x, BSWAP(CONSTANT) ) )

Since its just a one liner, I've also added BSWAP to the DAGCombiner equivalent as well:

fold (OP (bswap x), (bswap y)) -> (bswap (OP x, y))

Refactored bswap-fold tests to use FileCheck instead of just checking that the bswaps had gone.

Differential Revision: http://reviews.llvm.org/D6407

llvm-svn: 223349
2014-12-04 09:44:01 +00:00
Matthias Braun d34e4d2354 [SimplifyLibCalls] Improve double->float shrinking to consider constants
This allows cases like float x; fmin(1.0, x); to be optimized to fminf(1.0f, x);

rdar://19049359

Differential Revision: http://reviews.llvm.org/D6496

llvm-svn: 223270
2014-12-03 21:46:33 +00:00
Matthias Braun 892c923c46 [SimplifyLibCalls] Enable double to float shrinking for copysign
rdar://19049359

Differential Revision: http://reviews.llvm.org/D6495

llvm-svn: 223269
2014-12-03 21:46:29 +00:00
Erik Eckstein d181752be0 InstCombine: simplify signed range checks
Try to convert two compares of a signed range check into a single unsigned compare.
Examples:
(icmp sge x, 0) & (icmp slt x, n) --> icmp ult x, n
(icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n

llvm-svn: 223224
2014-12-03 10:39:15 +00:00
Sonam Kumari f2eacabd66 [signext.ll] Removal Of Duplicate Test Cases
Removed the duplicate test case existing in signext.ll file.

llvm-svn: 223109
2014-12-02 05:29:47 +00:00
Sonam Kumari 237cfa9916 Removed extra whitespace. (Testing commit access). NFC.
llvm-svn: 222994
2014-12-01 09:27:46 +00:00
David Majnemer 3d6f80b619 InstCombine: FoldOrOfICmps harder
We may be in a situation where the icmps might not be near each other in
a tree of or instructions.  Try to dig out related compare instructions
and see if they combine.

N.B.  This won't fire on deep trees of compares because rewritting the
tree might end up creating a net increase of IR.  We may have to resort
to something more sophisticated if this is a real problem.

llvm-svn: 222928
2014-11-28 19:58:29 +00:00
Suyog Sarda f8516e1662 Use FileCheck instead of grep. Change by Ankur Garg.
Differential Revision: http://reviews.llvm.org/D6430

llvm-svn: 222879
2014-11-27 11:22:49 +00:00
Suyog Sarda c3024c75e0 Use FileCheck instead of grep. Change by Sonam.
Differential Revision: http://reviews.llvm.org/D6432

llvm-svn: 222876
2014-11-27 10:57:24 +00:00
David Majnemer 40157d5c4d InstCombine: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) == 0 ? X ^ C : X  into  X | C
(X & C) != 0 ? X ^ C : X  into  X & ~C

llvm-svn: 222871
2014-11-27 07:25:21 +00:00
David Majnemer c6a5e1dd4f InstSimplify: Restore optimizations lost in r210006
This restores our ability to optimize:
(X & C) ? X & ~C : X  into  X & ~C
(X & C) ? X : X & ~C  into  X
(X & C) ? X | C : X  into  X
(X & C) ? X : X | C  into  X | C

llvm-svn: 222868
2014-11-27 06:32:46 +00:00
David Majnemer 5468e86469 Revert "Added inst combine transforms for single bit tests from Chris's note"
This reverts commit r210006, it miscompiled libapr which is used in who
knows how many projects.

A test has been added to ensure that we don't regress again.

I'll work on a rewrite of what the optimization was trying to do later.

llvm-svn: 222856
2014-11-26 23:00:38 +00:00
Chandler Carruth 816d26fe5e [InstCombine] Change LLVM To canonicalize toward the value type being
stored rather than the pointer type.

This change is analogous to r220138 which changed the canonicalization
for loads. The rationale is the same: memory does not have a type,
operations (and thus the values they produce) have a type. We should
match that type as closely as possible rather than reading some form of
semantics into the pointer type.

With this change, loads and stores should no longer be made with
nonsensical types for the values that tehy load and store. This is
particularly important when trying to match specific loaded and stored
types in the process of doing other instcombines, which is what led me
down this twisty maze of miscanonicalization.

I've put quite some effort into looking through IR to find places where
LLVM's optimizer was being unreasonably conservative in the face of
mismatched load and store types, however it is possible (let's say,
likely!) I have missed some. If you see regressions here, or from
r220138, the likely cause is some part of LLVM failing to cope with load
and store types differing. Test cases appreciated, it is important that
we root all of these out of LLVM.

llvm-svn: 222748
2014-11-25 10:09:51 +00:00
Suyog Sarda 99c9c1f2b0 Change the test case file to use FileCheck instead of grep. NFC.
Change by Ankur Garg.

Differential Revision: http://reviews.llvm.org/D6382

llvm-svn: 222740
2014-11-25 08:44:56 +00:00
Chandler Carruth 1a3c2c414c Revert r220349 to re-instate r220277 with a fix for PR21330 -- quite
clearly only exactly equal width ptrtoint and inttoptr casts are no-op
casts, it says so right there in the langref. Make the code agree.

Original log from r220277:
Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 222739
2014-11-25 08:20:27 +00:00
Matt Arsenault 238ff1ad1e Bug 21610: Canonicalize min/max fcmp selects to use ordered comparisons
llvm-svn: 222705
2014-11-24 23:15:18 +00:00
Matt Arsenault ea515d33c9 Convert test to FileCheck and use CHECK-LABEL
llvm-svn: 222704
2014-11-24 23:03:17 +00:00
David Majnemer 8e6f6a98b5 InstCombine: Don't create an unused instruction
We would create an instruction but not inserting it.
Not inserting the unused instruction would lead us to verification
failure.

This fixes PR21653.

llvm-svn: 222659
2014-11-24 16:41:13 +00:00
David Majnemer b2a6e7458d InstCombine: Don't assume DataLayout is always available
We tried to get the result of DataLayout::getLargestLegalIntTypeSize but
we didn't have a DataLayout.  This resulted in opt crashing.

This fixes PR21651.

llvm-svn: 222645
2014-11-24 07:26:20 +00:00
David Majnemer fb3805576b InstCombine: Propagate exact for (sdiv X, Pow2) -> (udiv X, Pow2)
llvm-svn: 222625
2014-11-22 20:00:41 +00:00
David Majnemer ec6e481bc5 InstCombine: Propagate exact for (sdiv X, Y) -> (udiv X, Y)
llvm-svn: 222624
2014-11-22 20:00:38 +00:00
David Majnemer fa4699e65f InstCombine: Propagate exact for (sdiv -X, C) -> (sdiv X, -C)
llvm-svn: 222623
2014-11-22 20:00:34 +00:00
David Majnemer a3aeb15613 InstCombine: Propagate exact in (udiv (lshr X,C1),C2) -> (udiv x,C1<<C2)
llvm-svn: 222620
2014-11-22 18:16:54 +00:00
David Majnemer 546f81064c InstCombine: Propagate NSW/NUW for X*(1<<Y) -> X<<Y
llvm-svn: 222613
2014-11-22 08:57:02 +00:00
David Majnemer 8279a7506d InstCombine: Propagate NSW for -X * -Y -> X * Y
llvm-svn: 222612
2014-11-22 07:25:19 +00:00
David Majnemer 80c8f627db InstCombine: Preserve nsw when folding X*(2^C) -> X << C
llvm-svn: 222606
2014-11-22 04:52:55 +00:00
David Majnemer fd4a6d2b7a InstCombine: Preserve nsw/nuw for ((X << C2)*C1) -> (X * (C1 << C2))
llvm-svn: 222605
2014-11-22 04:52:52 +00:00
David Majnemer 027bc80928 InstCombine: Preserve nsw for (mul %V, -1) -> (sub 0, %V)
llvm-svn: 222604
2014-11-22 04:52:38 +00:00
Gerolf Hoflehner ec6217c929 [InstCombine] Re-commit of r218721 (Optimize icmp-select-icmp sequence)
Fixes the self-host fail. Note that this commit activates dominator
analysis in the combiner by default (like the original commit did).

llvm-svn: 222590
2014-11-21 23:36:44 +00:00
David Majnemer c6b8e20a5c InstCombine: Fix another infinite loop caused by visitFPTrunc
We would attempt to replace an frem's operand with the same operand.
This would cause InstCombine to think real work was done, causing
InstCombine to enter an infinite loop.

This fixes the second part of PR21576.

llvm-svn: 222265
2014-11-18 22:06:45 +00:00
David Majnemer b32eaddf11 Revert "Revert r222040 because of bot failure."
This reverts commit r222203, reverting r222040 didn't end up turning the
bot green.

llvm-svn: 222261
2014-11-18 21:30:02 +00:00
David Majnemer 6fdb6b8fd4 InstCombine: Fold away tautological masked compares
It is impossible for (x & INT_MAX) == 0 && x == INT_MAX to ever be true.

While this sort of reasoning should normally live in InstSimplify,
the machinery that derives this result is not trivial to split out.

llvm-svn: 222230
2014-11-18 09:31:41 +00:00
Manman Ren a64bd44fd8 Revert r222040 because of bot failure.
http://lab.llvm.org:8080/green/job/clang-Rlto_master/298/
Hopefully, bot will be green.

llvm-svn: 222203
2014-11-18 00:33:22 +00:00
David Majnemer 8c3d92e7e5 InstCombine: Fix infinite loop caused by visitFPTrunc
We would attempt to replace a fptrunc of an frem with an identical
fptrunc.  This would cause the new fptrunc to be added to the worklist.
Of course, this results in an infinite loop because we will keep
visiting the newly created fptruncs.

This fixes PR21576.

llvm-svn: 222040
2014-11-14 21:21:15 +00:00
Sanjay Patel 4c219fd248 CGSCC should not treat intrinsic calls like function calls (PR21403)
Make the handling of calls to intrinsics in CGSCC consistent: 
they are not treated like regular function calls because they
are never lowered to function calls.

Without this patch, we can get dangling pointer asserts from
the subsequent loop that processes callsites because it already
ignores intrinsics.

See http://llvm.org/bugs/show_bug.cgi?id=21403 for more details / discussion.

Differential Revision: http://reviews.llvm.org/D6124

llvm-svn: 221802
2014-11-12 18:25:47 +00:00
Bill Schmidt 729547847f [PowerPC] Add vec_vsx_ld and vec_vsx_st intrinsics
This patch enables the vec_vsx_ld and vec_vsx_st intrinsics for
PowerPC, which provide programmer access to the lxvd2x, lxvw4x,
stxvd2x, and stxvw4x instructions.

New LLVM intrinsics are provided to represent these four instructions
in IntrinsicsPowerPC.td.  These are patterned after the similar
intrinsics for lvx and stvx (Altivec).  In PPCInstrVSX.td, these
intrinsics are tied to the code gen patterns, with additional patterns
to allow plain vanilla loads and stores to still generate these
instructions.

At -O1 and higher the intrinsics are immediately converted to loads
and stores in InstCombineCalls.cpp.  This will open up more
optimization opportunities while still allowing the correct
instructions to be generated.  (Similar code exists for aligned
Altivec loads and stores.)

The new intrinsics are added to the code that checks for consecutive
loads and stores in PPCISelLowering.cpp, as well as to
PPCTargetLowering::getTgtMemIntrinsic().

There's a new test to verify the correct instructions are generated.
The loads and stores tend to be reordered, so the test just counts
their number.  It runs at -O2, as it's not very effective to test this
at -O0, when many unnecessary loads and stores are generated.

I ended up having to modify vsx-fma-m.ll.  It turns out this test case
is slightly unreliable, but I don't know a good way to prevent
problems with it.  The xvmaddmdp instructions read and write the same
register, which is one of the multiplicands.  Commutativity allows
either to be chosen.  If the FMAs are reordered differently than
expected by the test, the register assignment can be different as a
result.  Hopefully this doesn't change often.

There is a companion patch for Clang.

llvm-svn: 221767
2014-11-12 04:19:40 +00:00
Philip Reames 66c6de61ee Canonicalize an assume(load != null) into !nonnull metadata
We currently have two ways of informing the optimizer that the result of a load is never null: metadata and assume. This change converts the second in to the former. This avoids a need to implement optimizations using both forms.

We should probably extend this basic idea to metadata of other forms; in particular, range metadata. We view is that assumes should be considered a "last resort" for when there isn't a more canonical way to represent something.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5951

llvm-svn: 221737
2014-11-11 23:33:19 +00:00
David Majnemer c1eca5ad7c InstCombine: Rely on cmpxchg's return code when it's strong
Comparing the result of a cmpxchg instruction can be replaced with an
extractvalue of the cmpxchg success indicator.

llvm-svn: 221498
2014-11-06 23:23:30 +00:00
David Majnemer d28edfea03 Minimize test case further
No functional change intended.

llvm-svn: 221237
2014-11-04 05:17:58 +00:00
David Majnemer 7e2b9882b1 InstCombine: Remove infinite loop caused by FoldOpIntoPhi
FoldOpIntoPhi could create an infinite loop if the PHI could potentially
reach a BB it was considering inserting instructions into.  The
instructions it would insert would eventually lead to other combines
firing which would, again, lead to FoldOpIntoPhi firing.

The solution is to handicap FoldOpIntoPhi so that it doesn't attempt to
insert instructions that the PHI might reach.

This fixes PR21377.

llvm-svn: 221187
2014-11-03 21:55:12 +00:00
David Majnemer 72a643dc8f InstCombine: Combine (X | Y) - X to (~X & Y)
This implements the transformation from (X | Y) - X to (~X & Y).

Differential Revision: http://reviews.llvm.org/D5791

llvm-svn: 221129
2014-11-03 05:53:55 +00:00
David Majnemer 634ca236dc InstCombine: Don't assume that m_ZExt matches an Instruction
m_ZExt might bind against a ConstantExpr instead of an Instruction.
Assuming this, using cast<Instruction>, results in InstCombine crashing.

Instead, introduce ZExtOperator to bridge both Instruction and
ConstantExpr ZExts.

This fixes PR21445.

llvm-svn: 221069
2014-11-01 23:46:05 +00:00
David Majnemer 549f4f2510 InstCombine: Combine (X+cst) < 0 --> X < -cst
This can happen pretty often in code that looks like:
int foo = bar - 1;
if (foo < 0)
  do stuff

In this case, bar < 1 is an equivalent condition.

This transform requires that the add instruction be annotated with nsw.

llvm-svn: 221045
2014-11-01 09:09:51 +00:00
Philip Reames 4cb4d3e048 Add handling for range metadata in ValueTracking isKnownNonZero
If we load from a location with range metadata, we can use information about the ranges of the loaded value for optimization purposes.  This helps to remove redundant checks and canonicalize checks for other optimization passes.  This particular patch checks whether a value is known to be non-zero from the range metadata.

Currently, these tests are against InstCombine.  In theory, all of these should be InstSimplify since we're not inserting any new instructions.  Moving the code may follow in a separate change.

Reviewed by: Hal
Differential Revision: http://reviews.llvm.org/D5947

llvm-svn: 220925
2014-10-30 20:25:19 +00:00
David Majnemer c8bdd23acf InstCombine: Fix a combine assuming that icmp operands were integers
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.

This fixes PR21388.

llvm-svn: 220664
2014-10-27 05:47:49 +00:00
Sanjay Patel 848309da7c Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding 
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.

No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.

I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.

Differential Revision: http://reviews.llvm.org/D5919

llvm-svn: 220514
2014-10-23 21:52:45 +00:00
Sanjay Patel a92fa44740 Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics 
(via function attribute for now because there is no IR-level FMF on calls), 
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.

We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.

I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.

Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.

This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850

Differential Revision: http://reviews.llvm.org/D5893

llvm-svn: 220390
2014-10-22 15:29:23 +00:00
Hans Wennborg 0b39fc0d16 Revert "Teach the load analysis to allow finding available values which require" (r220277)
This seems to have caused PR21330.

llvm-svn: 220349
2014-10-21 23:49:52 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
David Majnemer d205602a0b InstCombine: Simplify FoldICmpCstShrCst
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify.  Remove
this extra code and move the tests over to InstSimplify.  Add asserts to
make sure our preconditions hold before we make any assumptions.

llvm-svn: 220314
2014-10-21 19:51:55 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Chandler Carruth a32038b006 Fix a miscompile introduced in r220178.
The original code had an implicit assumption that if the test for
allocas or globals was reached, the two pointers were not equal. With my
changes to make the pointer analysis more powerful here, I also had to
guard against circumstances where the results weren't useful. That in
turn violated the assumption and gave rise to a circumstance in which we
could have a store with both the queried pointer and stored pointer
rooted at *the same* alloca. Clearly, we cannot ignore such a store.
There are other things we might do in this code to better handle the
case of both pointers ending up at the same alloca or global, but it
seems best to at least make the test explicit in what it intends to
check.

I've added tests for both the alloca and global case here.

llvm-svn: 220190
2014-10-20 10:03:01 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth b5f4c32830 Add a datalayout string to this test so that it exercises the full gamut
of InstCombine rather than just the bits enabled when datalayout is
optional.

The primary fixes here are because now things are little endian.

In good news, silliness like this seems like it will be going away as
we've got pretty stong consensus on dropping optional datalayout
entirely.

llvm-svn: 220176
2014-10-20 00:11:31 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
Chandler Carruth 5b8cd2f73c Move previously dead code to handle computing the known bits of an alias
up to where it actually works as intended. The problem is that
a GlobalAlias isa GlobalValue and so the prior block handled all of the
cases.

This allows us to constant fold based on the actual constant expression
in the global alias. As an example, see the last function in the newly
added test case which explicitly aligns an unaligned pointer using
constant expression math. Without this change, we fail to see that and
fold an alignment test to zero.

llvm-svn: 220164
2014-10-19 09:06:56 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 71009cad95 Remove a test that was ported from the old llvm-gcc frontend test suite.
This test is pretty awesome. It is claiming to test devirtualization.
However, the code in question is not in fact devirtualized by LLVM. If
you take the original C++ test case and run it through Clang at -O3 we
fail to devirtualize it completely. It also isn't a sufficiently focused
test case.

The *reason* we fail to devirtualize it isn't because of any missing
instcombine though. Instead, it is because we fail to emit an available
externally vtable and thus the vtable is just an external and completely
opaque. If I cause the vtable to be emitted, we successfully
devirtualize things.

Anyways, I'm just removing it because it is providing negative value at
this point: it isn't representative of the output of Clang really, LLVM
isn't doing the transform it claims to be testing, LLVM's failure to do
the transform isn't actually an LLVM bug at all and we shouldn't be
testing for it here, and finally the test is written in such a way that
it will trivially pass even when the point of the test is failing.

llvm-svn: 220137
2014-10-18 06:36:18 +00:00
Rafael Espindola 11aaaeebe0 Delete -std-compile-opts.
These days -std-compile-opts was just a silly alias for -O3.

llvm-svn: 219951
2014-10-16 20:00:02 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Sanjay Patel 0ca42bb5a8 Optimize away fabs() calls when input is squared (known positive).
Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717
2014-10-14 20:43:11 +00:00
David Majnemer dad2103801 InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.

However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.

Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713
2014-10-14 20:28:40 +00:00
David Majnemer db0773089f InstCombine: Fix miscompile in X % -Y -> X % Y transform
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number.  This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639
2014-10-13 22:37:51 +00:00
David Majnemer a252138942 InstCombine: Don't miscompile (x lshr C1) udiv C2
We have a transform that changes:
  (x lshr C1) udiv C2
into:
  x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634
2014-10-13 21:48:30 +00:00
Benjamin Kramer 240b85eec5 InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1)
llvm-svn: 219585
2014-10-12 14:02:34 +00:00
David Majnemer fe7fccff11 InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X
Consider the case where X is 2.  (2 <<s 31)/s-2147483648 is zero but we
would fold to X.  Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568
2014-10-11 10:20:04 +00:00
David Majnemer cb9d596655 InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow
consider:
C1 = INT_MIN
C2 = -1

C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N. B.  Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567
2014-10-11 10:20:01 +00:00
David Majnemer 3cac85e071 InstCombine: mul to shl shouldn't preserve nsw
consider:
mul i32 nsw %x, -2147483648

this instruction will not result in poison if %x is 1

however, if we transform this into:
shl i32 nsw %x, 31

then we will be generating poison because we just shifted into the sign
bit.

This fixes PR21242.

llvm-svn: 219566
2014-10-11 10:19:52 +00:00
Sanjay Patel ad8b666624 Return undef on FP <-> Int conversions that overflow (PR21330).
The LLVM Lang Ref states for signed/unsigned int to float conversions:
"If the value cannot fit in the floating point value, the results are undefined."

And for FP to signed/unsigned int:
"If the value cannot fit in ty2, the results are undefined."

This matches the C definitions.

The existing behavior pins to infinity or a max int value, but that may just
lead to more confusion as seen in:
http://llvm.org/bugs/show_bug.cgi?id=21130

Returning undef will hopefully lead to a less silent failure.

Differential Revision: http://reviews.llvm.org/D5603

llvm-svn: 219542
2014-10-10 23:00:21 +00:00
Andrea Di Biagio 458a669f49 [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406
2014-10-09 12:41:49 +00:00
Justin Bogner 894eff7a9f Revert "[InstCombine] re-commit r218721 with fix for pr21199"
This seems to cause a miscompile when building clang, which causes a
bootstrapped clang to fail or crash in several of its tests.

See:
  http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184
  http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813

This reverts commit r219282.

llvm-svn: 219317
2014-10-08 16:30:22 +00:00
Gerolf Hoflehner e2ff5b9223 [InstCombine] re-commit r218721 with fix for pr21199
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.

llvm-svn: 219282
2014-10-08 06:42:19 +00:00
Hans Wennborg 1256198bbc Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization
This seems to have caused PR21199.

llvm-svn: 219264
2014-10-08 01:05:57 +00:00
Suyog Sarda 181cc9a029 Remove Extra lines. NFC.
llvm-svn: 219201
2014-10-07 11:31:31 +00:00
Gerolf Hoflehner c0b4c20e5e [InstCombine] re-commit r218721 icmp-select-icmp optimization
Takes care of the assert that caused build fails.
Rather than asserting the code checks now that the definition
and use are in the same block, and does not attempt
to optimize when that is not the case.

llvm-svn: 219175
2014-10-07 00:16:12 +00:00
Hal Finkel 04a156139e [InstCombine] Remove redundant @llvm.assume intrinsics
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

llvm-svn: 219067
2014-10-04 21:27:06 +00:00
Richard Smith 1ed4229f6f PR21145: Teach LLVM about C++14 sized deallocation functions.
C++14 adds new builtin signatures for 'operator delete'. This change allows
new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
before.

llvm-svn: 219014
2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Sanjay Patel 13a657819b Remove unused function attribute params.
llvm-svn: 218909
2014-10-02 21:12:04 +00:00
Sanjay Patel 12d1ce5408 Optimize square root squared (PR21126).
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.

This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.

Differential Revision: http://reviews.llvm.org/D5584

llvm-svn: 218906
2014-10-02 21:10:54 +00:00
Sanjay Patel 7b2cd9ad86 Make the sqrt intrinsic return undef for a negative input.
As discussed here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html

And again here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html

The sqrt of a negative number when using the llvm intrinsic is undefined. 
We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref.

This change should not affect any code that isn't using "no-nans-fp-math"; 
ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call.

Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and 
possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. 
We knowingly approve of this difference with the other compilers in an attempt to flag code 
that is invoking undefined behavior.

A front-end warning should also try to convince the user that the program will fail:
http://llvm.org/bugs/show_bug.cgi?id=21093

Differential Revision: http://reviews.llvm.org/D5527

llvm-svn: 218803
2014-10-01 20:36:33 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00