Commit Graph

12258 Commits

Author SHA1 Message Date
Kostya Serebryany 4f8f0c5aa2 [asan] experimental tracing for indirect calls, llvm part.
llvm-svn: 220699
2014-10-27 18:13:56 +00:00
David Majnemer c8bdd23acf InstCombine: Fix a combine assuming that icmp operands were integers
An icmp may have pointer arguments, it isn't limited to integers or
vectors of integers.

This fixes PR21388.

llvm-svn: 220664
2014-10-27 05:47:49 +00:00
Arnold Schwaighofer eb1a38fa73 Add an option to the LTO code generator to disable vectorization during LTO
We used to always vectorize (slp and loop vectorize) in the LTO pass pipeline.

r220345 changed it so that we used the PassManager's fields 'LoopVectorize' and
'SLPVectorize' out of the desire to be able to disable vectorization using the
cl::opt flags 'vectorize-loops'/'slp-vectorize' which the before mentioned
fields default to.
Unfortunately, this turns off vectorization because those fields
default to false.
This commit adds flags to the LTO library to disable lto vectorization which
reconciles the desire to optionally disable vectorization during LTO and
the desired behavior of defaulting to enabled vectorization.

We really want tools to set PassManager flags directly to enable/disable
vectorization and not go the route via cl::opt flags *in*
PassManagerBuilder.cpp.

llvm-svn: 220652
2014-10-26 21:50:58 +00:00
Andrew Trick dd925ad218 LSR: Minor cleanup after Daniel's patch.
Combine the Inserted an Done sets into a Visited set.

llvm-svn: 220623
2014-10-25 19:59:30 +00:00
Andrew Trick 9ccbed5a12 Fix LSR compile time.
This is a simple fix that brings the compilation time from 5min to 5s
on a specific real-world example. It's a large chain of computation in
a crypto routine (always a problem for SCEV). A unit test is not
feasible and there would be no way to check it. The fix is just basic
good practice for dealing with SCEVs, there's no risk of regression.

Patch by Daniel Reynaud!

llvm-svn: 220622
2014-10-25 19:42:07 +00:00
Jingyue Wu fe72fcebf6 [SeparateConstOffsetFromGEP] Fixed a bug related to unsigned modulo
The dividend in "signed % unsigned" is treated as unsigned instead of signed,
causing unexpected behavior such as -64 % (uint64_t)24 == 0.

Added a regression test in split-gep.ll

Patched by Hao Liu.

llvm-svn: 220618
2014-10-25 18:34:03 +00:00
Benjamin Kramer 63207bc9c3 Clean up assume intrinsic pattern matching, no need to check that the argument is a value.
Also make it const safe and remove superfluous casting. NFC.

llvm-svn: 220616
2014-10-25 18:09:01 +00:00
Jingyue Wu b723152379 [SeparateConstOffsetFromGEP] Fixed a bug in rebuilding OR expressions
The two operands of the new OR expression should be NextInChain and TheOther
instead of the two original operands.

Added a regression test in split-gep.ll.

Hao Liu reported this bug, and provded the test case and an initial patch.
Thanks! 

llvm-svn: 220615
2014-10-25 17:36:21 +00:00
David Majnemer 2abb8183b5 InstCombine: Remove overzealous asserts
These asserts can trigger if the worklist iteration order is
sufficiently unlucky.  Instead of adding special case logic to handle
these edge conditions, just bail out on trying to transform them:
InstSimplify will get them when it reaches them on the worklist.

This fixes PR21378.

N.B.  No test case is included because any test would rely on the
fragile worklist iteration order.

llvm-svn: 220612
2014-10-25 07:13:13 +00:00
Evgeniy Stepanov d337a59db5 [msan] Make -msan-check-constant-shadow a bit stronger.
Allow (under the experimental flag) non-Instructions to participate in MSan checks.

llvm-svn: 220601
2014-10-24 23:34:15 +00:00
Nick Lewycky 592d84974c If requested, apply function merging at -O0 too. It's useful there to reduce the time to compile.
llvm-svn: 220537
2014-10-23 23:49:31 +00:00
Timur Iskhodzhanov eb229ca928 Make getDISubprogram(const Function *F) available in LLVM
Reviewed at http://reviews.llvm.org/D5950

llvm-svn: 220536
2014-10-23 23:46:28 +00:00
Sanjay Patel 848309da7c Handle sqrt() shrinking in SimplifyLibCalls like any other call
This patch removes a chunk of special case logic for folding 
(float)sqrt((double)x) -> sqrtf(x)
in InstCombineCasts and handles it in the mainstream path of SimplifyLibCalls.

No functional change intended, but I loosened the restriction on the existing
sqrt testcases to allow for this optimization even without unsafe-fp-math because
that's the existing behavior.

I also added a missing test case for not shrinking the llvm.sqrt.f64 intrinsic
in case the result is used as a double.

Differential Revision: http://reviews.llvm.org/D5919

llvm-svn: 220514
2014-10-23 21:52:45 +00:00
Frederic Riss c1892e2d48 Assert that ValueHandleBase::ValueIsRAUWd doesn't change the tracked Value type.
This invariant is enforced in Value::replaceAllUsesWith, thus it seems
logical to apply it also to ValueHandles. This commit fixes InstCombine
to not trigger the assertion during the removal of constant bitcasts in
call instructions.

Differential Revision: http://reviews.llvm.org/D5828

llvm-svn: 220468
2014-10-23 04:08:42 +00:00
Evgeniy Stepanov 7db296eba5 [msan] Emit checks for constant shadow values under an experimental flag.
Does not change the default behavior.

llvm-svn: 220457
2014-10-23 01:05:46 +00:00
Benjamin Kramer 26ce8ff637 LoopVectorize: Simplify code. No functionality change.
llvm-svn: 220405
2014-10-22 19:13:54 +00:00
Diego Novillo 19e7b7e27c Shorten auto iterators for function basic blocks.
Use consistent naming for basic block instances.

No functional changes.

llvm-svn: 220404
2014-10-22 18:39:50 +00:00
Diego Novillo b368b7d558 Use auto iteration in lib/Transforms/Scalar/SampleProfile.cpp. No functional changes.
llvm-svn: 220394
2014-10-22 16:51:50 +00:00
Philip Reames d92c2a7592 Preserving 'nonnull' metadata in SimplifyCFG
When we hoist two loads above an if, we can preserve the nonnull metadata.  We could also do the same for sinking them, but we appear to not handle metadata at all in that case.

Thanks to Hal for the review.

Differential Revision: http://reviews.llvm.org/D5910

llvm-svn: 220392
2014-10-22 16:37:13 +00:00
Sanjay Patel a92fa44740 Shrinkify libcalls: use float versions of double libm functions with fast-math (bug 17850)
When a call to a double-precision libm function has fast-math semantics 
(via function attribute for now because there is no IR-level FMF on calls), 
we can avoid fpext/fptrunc operations and use the float version of the call
if the input and output are both float.

We already do this optimization using a command-line option; this patch just
adds the ability for fast-math to use the existing functionality.

I moved the cl::opt from InstructionCombining into SimplifyLibCalls because
it's only ever used internally to that class.

Modified the existing test cases to use the unsafe-fp-math attribute rather
than repeating all tests.

This patch should solve: http://llvm.org/bugs/show_bug.cgi?id=17850

Differential Revision: http://reviews.llvm.org/D5893

llvm-svn: 220390
2014-10-22 15:29:23 +00:00
Diego Novillo a67c0b43e1 Change error to warning when a profile cannot be found.
When the profile for a function cannot be applied, we use to emit an
error. This seems extreme. The compiler can continue, it's just that the
optimization opportunities won't include profile information.

llvm-svn: 220386
2014-10-22 13:36:35 +00:00
Diego Novillo 8027b80b41 Support using sample profiles with partial debug info.
Summary:
When using a profile, we used to require the use -gmlt so that we could
get access to the line locations. This is used to match line numbers in
the input profile to the line numbers in the function's IR.

But this is actually not necessary. The driver can provide source
location tracking without the emission of debug information. In these
cases, the annotation 'llvm.dbg.cu' is missing from the IR, but the
actual line location annotations are still present.

This patch adds a new way of looking for the start of the current
function. Instead of looking through the compile units in llvm.dbg.cu,
we can walk up the scope for the first instruction in the function with
a debug loc. If that describes the function, we use it. Otherwise, we
keep looking until we find one.

If no such instruction is found, we then give up and produce an error.

Reviewers: echristo, dblaikie

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5887

llvm-svn: 220382
2014-10-22 12:59:00 +00:00
Evgeniy Stepanov 35eb265421 [msan] Handle param-tls overflow.
ParamTLS (shadow for function arguments) is of limited size. This change
makes all arguments that do not fit unpoisoned, and avoids writing
past the end of a TLS buffer.

llvm-svn: 220351
2014-10-22 00:12:40 +00:00
Hans Wennborg 0b39fc0d16 Revert "Teach the load analysis to allow finding available values which require" (r220277)
This seems to have caused PR21330.

llvm-svn: 220349
2014-10-21 23:49:52 +00:00
JF Bastien f42a6ea5ac LTO: respect command-line options that disable vectorization.
Summary: Patches 202051 and 208013 added calls to LTO's PassManager which unconditionally add LoopVectorizePass and SLPVectorizerPass instead of following the logic in PassManagerBuilder::populateModulePassManager and honoring the -vectorize-loops -run-slp-after-loop-vectorization flags.

Reviewers: nadav, aschwaighofer, yijiang

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5884

llvm-svn: 220345
2014-10-21 23:18:21 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
Philip Reames d7c21364a9 Teach combineMetadata how to merge 'nonnull' metadata.
combineMetadata is used when merging two instructions into one.  This change teaches it how to merge 'nonnull' - i.e. only preserve it on the new instruction if it's set on both sources.  This isn't actually used yet since I haven't adjusted any of the call sites to pass in nonnull as a 'known metadata'.  

llvm-svn: 220325
2014-10-21 21:02:19 +00:00
Philip Reames b2d3f035e2 Preserve 'nonnull' when changing type of the load.
When changing the type of a load in Chandler's recent InstCombine changes, we can preserve the new 'nonnull' metadata.  

I considered adding an assert since 'nonnull' is only valid on pointer types, but casting a pointer to a non-pointer would involve more than a bitcast anyways.  If someone extends this transform to handle more than bitcasts, the verifier will report the malformed IR, so a separate assertion isn't needed.  Also, the fpmath flags would have the same problem.

llvm-svn: 220324
2014-10-21 21:00:03 +00:00
David Majnemer d205602a0b InstCombine: Simplify FoldICmpCstShrCst
This function was complicated by the fact that it tried to perform
canonicalizations that were already preformed by InstSimplify.  Remove
this extra code and move the tests over to InstSimplify.  Add asserts to
make sure our preconditions hold before we make any assumptions.

llvm-svn: 220314
2014-10-21 19:51:55 +00:00
Chandler Carruth aa72a6dd3b Teach the load analysis to allow finding available values which require
inttoptr or ptrtoint cast provided there is datalayout available.
Eventually, the datalayout can just be required but in practice it will
always be there today.

To go with the ability to expose available values requiring a ptrtoint
or inttoptr cast, helpers are added to perform one of these three casts.

These smarts are necessary to finish canonicalizing loads and stores to
the operational type requirements without regressing fundamental
combines.

I've added some test cases. These should actually improve as the load
combining and store combining improves, but they may fundamentally be
highlighting some missing combines for select in addition to exercising
the specific added logic to load analysis.

llvm-svn: 220277
2014-10-21 09:00:40 +00:00
Paul Robinson f60e0a160f Do not attribute static allocas to the call site's DebugLoc.
When functions are inlined, instructions without debug information are
attributed to the call site's DebugLoc. After inlining, inlined static
allocas are moved to the caller's entry block, adjacent to the caller's
original static alloca instructions. By retaining the call site's
DebugLoc, these instructions could cause instructions that were
subsequently inserted at the entry block to pick up the same DebugLoc.

Patch by Wolfgang Pieb!

llvm-svn: 220255
2014-10-21 01:00:55 +00:00
Philip Reames 5a3f5f751b Introduce enum values for previously defined metadata types. (NFC)
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well.  Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison.  This change adds enum value for three existing metadata types:
+    MD_nontemporal = 9, // "nontemporal"
+    MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+    MD_nonnull = 11 // "nonnull"

I went through an updated various uses as well.  I made no attempt to get all uses; I focused on the ones which were easily grepable and easily to translate.  For example, there were several items in LoopInfo.cpp I chose not to update.

llvm-svn: 220248
2014-10-21 00:13:20 +00:00
David Majnemer f3cadce84c IR: Replace DataLayout::RoundUpAlignment with RoundUpToAlignment
No functional change intended, just cleaning up some code.

llvm-svn: 220187
2014-10-20 06:13:33 +00:00
Chandler Carruth 6665d62117 Fix a somewhat subtle pair of issues with JumpThreading I introduced in
r220178. First, the creation routine doesn't insert prior to the
terminator of the basic block provided, but really at the end of the
basic block. Instead, get the terminator and insert before that. The
next issue was that we need to ensure multiple PHI node entries for
a single predecessor re-use the same cast instruction rather than
creating new ones.

All of the logic here was without tests previously. I've reduced and
added a test case from the test suite that crashed without both of these
fixes.

llvm-svn: 220186
2014-10-20 05:34:36 +00:00
Chandler Carruth eeec35ae1c Teach the load analysis driving core instcombine logic and other bits of
logic to look through pointer casts, making them trivially stronger in
the face of loads and stores with intervening pointer casts.

I've included a few test cases that demonstrate the kind of folding
instcombine can do without pointer casts and then variations which
obfuscate the logic through bitcasts. Without this patch, the variations
all fail to optimize fully.

This is more important now than it has been in the past as I've started
moving the load canonicialization to more closely follow the value type
requirements rather than the pointer type requirements and thus this
needs to be prepared for more pointer casts. When I made the same change
to stores several test cases regressed without logic along these lines
so I wanted to systematically improve matters first.

llvm-svn: 220178
2014-10-20 00:24:14 +00:00
Chandler Carruth bc6378defb Do a better and more complete job of preserving metadata when combining
loads.

This handles many more cases than just the AA metadata, some of them
suggested by Hal in his review of the AA metadata handling patch. I've
tried to test this behavior where tractable to do so.

I'll point out that I have specifically *not* included a test for
debuginfo because it was going to require 2 or 3 times as much work to
craft some input which would survive the "helpful" stripping of debug
info metadata that doesn't match the desired schema. This is another
good example of why the current state of write-ability for our debug
info metadata is unacceptable. I spent over 30 minutes trying to conjure
some test case that would survive, even copying from other debug info
tests, but it always failed to survive with no explanation of why or how
I might fix it. =[

llvm-svn: 220165
2014-10-19 10:46:46 +00:00
David Majnemer 312c3e5f39 InstCombine: (sub (or A B) (xor A B)) --> (and A B)
The following implements the transformation:
(sub (or A B) (xor A B)) --> (and A B).

Patch by Ankur Garg!

Differential Revision: http://reviews.llvm.org/D5719

llvm-svn: 220163
2014-10-19 08:32:32 +00:00
David Majnemer 59939acd26 InstCombine: Optimize icmp eq/ne (shl Const2, A), Const1
The following implements the optimization for sequences of the form:
icmp eq/ne (shl Const2, A), Const1

Such sequences can be transformed to:
icmp eq/ne A, (TrailingZeros(Const1) - TrailingZeros(Const2))

This handles only the equality operators for now. Other operators need
to be handled.

Patch by Ankur Garg!

llvm-svn: 220162
2014-10-19 08:23:08 +00:00
Chandler Carruth a801dd5799 Fix a long-standing miscompile in the load analysis that was uncovered
by my refactoring of this code.

The method isSafeToLoadUnconditionally assumes that the load will
proceed with the preferred type alignment. Given that, it has to ensure
that the alloca or global is at least that aligned. It has always done
this historically when a datalayout is present, but has never checked it
when the datalayout is absent. When I refactored the code in r220156,
I exposed this path when datalayout was present and that turned the
latent bug into a patent bug.

This fixes the issue by just removing the special case which allows
folding things without datalayout. This isn't worth the complexity of
trying to tease apart when it is or isn't safe without actually knowing
the preferred alignment.

llvm-svn: 220161
2014-10-19 08:17:50 +00:00
Chandler Carruth be9dccd64d Preserve AA metadata when combining (cast (load (...))) -> (load (cast
(...))).

llvm-svn: 220141
2014-10-18 11:00:12 +00:00
Chandler Carruth 2f75fcfef3 [InstCombine] Do an about-face on how LLVM canonicalizes (cast (load
...)) and (load (cast ...)): canonicalize toward the former.

Historically, we've tried to load using the type of the *pointer*, and
tried to match that type as closely as possible removing as many pointer
casts as we could and trading them for bitcasts of the loaded value.
This is deeply and fundamentally wrong.

Repeat after me: memory does not have a type! This was a hard lesson for
me to learn working on SROA.

There is only one thing that should actually drive the type used for
a pointer, and that is the type which we need to use to load from that
pointer. Matching up pointer types to the loaded value types is very
useful because it minimizes the physical size of the IR required for
no-op casts. Similarly, the only thing that should drive the type used
for a loaded value is *how that value is used*! Again, this minimizes
casts. And in fact, the *only* thing motivating types in any part of
LLVM's IR are the types used by the operations in the IR. We should
match them as closely as possible.

I've ended up removing some tests here as they were testing bugs or
behavior that is no longer present. Mostly though, this is just cleanup
to let the tests continue to function as intended.

The only fallout I've found so far from this change was SROA and I have
fixed it to not be impeded by the different type of load. If you find
more places where this change causes optimizations not to fire, those
too are likely bugs where we are assuming that the type of pointers is
"significant" for optimization purposes.

llvm-svn: 220138
2014-10-18 06:36:22 +00:00
Chandler Carruth 2dc9682e59 [SROA] Change how SROA does vector-based promotion of allocas to handle
cases where the alloca type, the load types, and the store types used
all disagree.

Previously, the only way that vector-based promotion occured was if the
alloca type was a vector type. This was one of the *very* few remaining
uses of the alloca's type to guide SROA/mem2reg left in LLVM. It turns
out it was a bad idea.

The alloca type can change very easily based on the mixture of types
loaded and stored to that alloca. We shouldn't be relying on it as
a signal for very much. Instead, the source of truth should be loads and
stores. We should canonicalize the loads and stores as much as possible
and then rely on them exclusively in SROA.

When looking and loads and stores, we may find many different candidate
vector types. This change will let SROA try all of them to find a vector
type which is a viable way to promote the entire alloca to a vector
register.

With this change, it becomes possible to do better canonicalization and
optimization of loads and stores without breaking SROA in random ways,
and that should allow fixing a core source of performance loss in hot
numerical loops such as those in Eigen.

llvm-svn: 220116
2014-10-18 00:44:02 +00:00
Evgeniy Stepanov e08633e900 [msan] Fix handling of byval arguments with large alignment.
MSan param-tls slots are 8-byte aligned. This change clips
alignment of memcpy into param-tls to 8.

llvm-svn: 220101
2014-10-17 23:29:44 +00:00
Rafael Espindola 7da1ea83a9 Revert "TRE: make TRE a bit more aggressive"
This reverts commit r219899.

This also updates byval-tail-call.ll to make it clear what was breaking.
Adding r219899 again will cause the load/store to disappear.

llvm-svn: 220093
2014-10-17 21:25:48 +00:00
Hal Finkel dd38c0b876 [DSE] Remove no-data-layout-only type-based overlap checking
DSE's overlap checking contained special logic, used only when no DataLayout
was available, which inferred a complete overwrite when the pointee types were
equal. This logic seems fine for regular loads/stores, but does not work for
memcpy and friends. Instead of fixing this, I'm just removing it.
Philosophically, transformations should not contain enhanced behavior used only
when data layout is lacking (data layout should be strictly additive), and
maintaining these rarely-tested code paths seems not worthwhile at this stage.

Credit to Aliaksei Zasenka for the bug report and the diagnosis. The test case
(slightly reduced from that provided by Aliaksei) replaces the original
contents of test/Transforms/DeadStoreElimination/no-targetdata.ll -- a few
other tests have been updated to have a data layout.

llvm-svn: 220035
2014-10-17 11:56:00 +00:00
Chandler Carruth 8393406f05 [SROA] Switch the common variable name for the 'AllocaSlices' class to
'AS'.

Using 'S' as this was a terrible idea. Arguably, 'AS' is not much
better, but it at least follows the idea of using initialisms and
removes active confusion about the AllocaSlices variable and a Slice
variable.

llvm-svn: 219963
2014-10-16 21:11:55 +00:00
Chandler Carruth 61747042c1 [SROA] More range-based cleanups to SROA, these brought to you by
clang-modernize.

I did have to clean up the variable types and whitespace a bit because
the use of auto made the code much less readable here.

llvm-svn: 219962
2014-10-16 21:05:14 +00:00
Chandler Carruth 57d4cae202 [SROA] Switch a couple of overly complex iterator accessors to just be
ArrayRef accessors.

I think this even came up in review that this was over-engineered, and
indeed it was. Time to un-build it.

llvm-svn: 219958
2014-10-16 20:42:08 +00:00
Chandler Carruth c659df9389 [SROA] Start more deeply moving SROA to use ranges rather than just
iterators.

There are a ton of places where it essentially wants ranges
rather than just iterators. This is just the first step that adds the
core slice range typedefs and uses them in a couple of places. I still
have to explicitly construct them because they've not been punched
throughout the entire set of code. More range-based cleanups incoming.

llvm-svn: 219955
2014-10-16 20:24:07 +00:00
Bjorn Steinbrink d20816fde9 Allow call-slop optzn for destinations with a suitable dereferenceable attribute
Summary:
Currently, call slot optimization requires that if the destination is an
argument, the argument has the sret attribute. This is to ensure that
the memory access won't trap. In addition to sret, we can also allow the
optimization to happen for arguments that have the new dereferenceable
attribute, which gives the same guarantee.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5832

llvm-svn: 219950
2014-10-16 19:43:08 +00:00
Sanjay Patel c699a6117b fold: sqrt(x * x * y) -> fabs(x) * sqrt(y)
If a square root call has an FP multiplication argument that can be reassociated,
then we can hoist a repeated factor out of the square root call and into a fabs().

In the simplest case, this:

   y = sqrt(x * x);

becomes this:

   y = fabs(x);

This patch relies on an earlier optimization in instcombine or reassociate to put the
multiplication tree into a canonical form, so we don't have to search over
every permutation of the multiplication tree.

Because there are no IR-level FastMathFlags for intrinsics (PR21290), we have to
use function-level attributes to do this optimization. This needs to be fixed
for both the intrinsics and in the backend.

Differential Revision: http://reviews.llvm.org/D5787

llvm-svn: 219944
2014-10-16 18:48:17 +00:00
Akira Hatanaka 5c221ef98f Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
The code committed in r219832 asserted when it attempted to shrink a switch
statement whose type was larger than 64-bit.

llvm-svn: 219902
2014-10-16 06:00:46 +00:00
Saleem Abdulrasool 7f52921976 TRE: make TRE a bit more aggressive
Make tail recursion elimination a bit more aggressive.  This allows us to get
tail recursion on functions that are just branches to a different function.  The
fact that the function takes a byval argument does not restrict it from being
optimised into just a tail call.

llvm-svn: 219899
2014-10-16 03:27:30 +00:00
Akira Hatanaka 40c2cf4afc Revert r219832.
llvm-svn: 219884
2014-10-16 01:17:02 +00:00
Hal Finkel 68dc3c7ab2 Preserve non-byval pointer alignment attributes using @llvm.assume when inlining
For pointer-typed function arguments, enhanced alignment can be asserted using
the 'align' attribute. When inlining, if this enhanced alignment information is
not otherwise available, preserve it using @llvm.assume-based alignment
assumptions.

llvm-svn: 219876
2014-10-15 23:44:41 +00:00
Chris Bieneman 5c4e9551c9 Fixing the build failure due to compiler warnings and unnecessary disambiguation.
llvm-svn: 219861
2014-10-15 23:11:35 +00:00
Chris Bieneman 732e0aa9fb Defining a new API for debug options that doesn't rely on static global cl::opts.
Summary:
This is based on the discussions from the LLVMDev thread:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075886.html

Reviewers: chandlerc

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5389

llvm-svn: 219854
2014-10-15 21:54:35 +00:00
Akira Hatanaka 5bb9346a45 InstCombine: Narrow switch instructions using known bits.
Truncate the operands of a switch instruction to a narrower type if the upper
bits are known to be all ones or zeros.

rdar://problem/17720004

llvm-svn: 219832
2014-10-15 19:05:50 +00:00
Hal Finkel 3b7fc86677 [SLPVectorize] Basic ephemeral-value awareness
The SLP vectorizer should not vectorize ephemeral values. These are used to
express information to the optimizer, and vectorizing them does not lead to
faster code (because the ephemeral values are dropped prior to code generation,
vectorized or not), and obscures the information the instructions are
attempting to communicate (the logic that interprets the arguments to
@llvm.assume generically does not understand vectorized conditions).

Also, uses by ephemeral values are free (because they, and the necessary
extractelement instructions, will be dropped prior to code generation).

llvm-svn: 219816
2014-10-15 17:35:01 +00:00
Eric Christopher 611f0488ff No need to cache this unused variable.
Patch by Ehsan Akhgari.

llvm-svn: 219749
2014-10-14 23:58:51 +00:00
Hal Finkel 1a600faba0 [LoopVectorize] Ignore @llvm.assume for cost estimates and legality
A few minor changes to prevent @llvm.assume from interfering with loop
vectorization. First, treat @llvm.assume like the lifetime intrinsics, which
are scalarized (but don't otherwise interfere with the legality checking).
Second, ignore the cost of ephemeral instructions in the loop (these will go
away anyway during CodeGen).

Alignment assumptions and other uses of @llvm.assume can often end up inside of
loops that should be vectorized (this is not uncommon for assumptions generated
by __attribute__((align_value(n))), for example).

llvm-svn: 219741
2014-10-14 22:59:49 +00:00
Sanjay Patel 0ca42bb5a8 Optimize away fabs() calls when input is squared (known positive).
Eliminate library calls and intrinsic calls to fabs when the input 
is a squared value.

Note that no unsafe-math / fast-math assumptions are needed for
this optimization.

Differential Revision: http://reviews.llvm.org/D5777

llvm-svn: 219717
2014-10-14 20:43:11 +00:00
David Majnemer dad2103801 InstCombine: Don't miscompile X % ((Pow2 << A) >>u B)
We assumed that A must be greater than B because the right hand side of
a remainder operator must be nonzero.

However, it is possible for A to be less than B if Pow2 is a power of
two greater than 1.

Take for example:
i32 %A = 0
i32 %B = 31
i32 Pow2 = 2147483648

((Pow2 << 0) >>u 31) is non-zero but A is less than B.

This fixes PR21274.

llvm-svn: 219713
2014-10-14 20:28:40 +00:00
Marcello Maggioni 5bbe3df63f Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
Sanjay Patel 17045f7fac fix formatting; NFC
llvm-svn: 219645
2014-10-14 00:33:23 +00:00
Chandler Carruth 7b8297a61e Add some optional passes around the vectorizer to both better prepare
the IR going into it and to clean up the IR produced by the vectorizers.

Note that these are *off by default* right now while folks collect data
on whether the performance tradeoff is reasonable.

In a build of the 'opt' binary, I see about 2% compile time regression
due to this change on average. This is in my mind essentially the worst
expected case: very little of the opt binary is going to *benefit* from
these extra passes.

I've seen several benchmarks improve in performance my small amounts due
to running these passes, and there are certain (rare) cases where these
passes make a huge difference by either enabling the vectorizer at all
or by hoisting runtime checks out of the outer loop. My primary
motivation is to prevent people from seeing runtime check overhead in
benchmarks where the existing passes and optimizers would be able to
eliminate that.

I've chosen the sequence of passes based on the kinds of things that
seem likely to be relevant for the code at each stage: rotaing loops for
the vectorizer, finding correlated values, loop invariants, and
unswitching opportunities from any runtime checks, and cleaning up
commonalities exposed by the SLP vectorizer.

I'll be pinging existing threads where some of these issues have come up
and will start new threads to get folks to benchmark and collect data on
whether this is the right tradeoff or we should do something else.

llvm-svn: 219644
2014-10-14 00:31:29 +00:00
David Majnemer db0773089f InstCombine: Fix miscompile in X % -Y -> X % Y transform
We assumed that negation operations of the form (0 - %Z) resulted in a
negative number.  This isn't true if %Z was originally negative.
Substituting the negative number into the remainder operation may result
in undefined behavior because the dividend might be INT_MIN.

This fixes PR21256.

llvm-svn: 219639
2014-10-13 22:37:51 +00:00
David Majnemer a252138942 InstCombine: Don't miscompile (x lshr C1) udiv C2
We have a transform that changes:
  (x lshr C1) udiv C2
into:
  x udiv (C2 << C1)

However, it is unsafe to do so if C2 << C1 discards any of C2's bits.

This fixes PR21255.

llvm-svn: 219634
2014-10-13 21:48:30 +00:00
Joerg Sonnenberger 5ca10d0edb Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Benjamin Kramer 240b85eec5 InstCombine: Turn (x != 0 & x <u C) into the canonical range check form (x-1 <u C-1)
llvm-svn: 219585
2014-10-12 14:02:34 +00:00
David Majnemer 27adb1240f InstCombine: Simplify commonIDivTransforms
A helper routine, MultiplyOverflows, was a less efficient
reimplementation of APInt's smul_ov and umul_ov.  While we are here,
clean up the code so it's more uniform.

No functionality change intended.

llvm-svn: 219583
2014-10-12 08:34:24 +00:00
David Majnemer fe7fccff11 InstCombine: Don't fold (X <<s log(INT_MIN)) /s INT_MIN to X
Consider the case where X is 2.  (2 <<s 31)/s-2147483648 is zero but we
would fold to X.  Note that this is valid when we are in the unsigned
domain because we require NUW: 2 <<u 31 results in poison.

This fixes PR21245.

llvm-svn: 219568
2014-10-11 10:20:04 +00:00
David Majnemer cb9d596655 InstCombine, InstSimplify: (%X /s C1) /s C2 isn't always 0 when C1 * C2 overflow
consider:
C1 = INT_MIN
C2 = -1

C1 * C2 overflows without a doubt but consider the following:
%x = i32 INT_MIN

This means that (%X /s C1) is 1 and (%X /s C1) /s C2 is -1.

N. B.  Move the unsigned version of this transform to InstSimplify, it
doesn't create any new instructions.

This fixes PR21243.

llvm-svn: 219567
2014-10-11 10:20:01 +00:00
David Majnemer 3cac85e071 InstCombine: mul to shl shouldn't preserve nsw
consider:
mul i32 nsw %x, -2147483648

this instruction will not result in poison if %x is 1

however, if we transform this into:
shl i32 nsw %x, 31

then we will be generating poison because we just shifted into the sign
bit.

This fixes PR21242.

llvm-svn: 219566
2014-10-11 10:19:52 +00:00
Chandler Carruth bff0ae772c [SCEV] Fix one more caller blindly passing the latch to SCEV's
getSmallConstantTripCount even when it isn't the exiting block.

I missed this in my first audit, very sorry. This was found in LNT and
elsewhere. I don't have a test case, but it was completely obvious from
inspection that this was the problem. I'll see if I can reduce a test
case, but I'm not really hopeful, and the value seems quite low.

llvm-svn: 219562
2014-10-11 05:28:30 +00:00
Chandler Carruth 6666c27e99 [SCEV] Add some asserts to the recently improved trip count computation
routines and fix all of the bugs they expose.

I hit a test case that crashed even without these asserts due to passing
a non-exiting latch to the ExitingBlock parameter of the trip count
computation machinery. However, when I add the nice asserts, it turns
out we have plenty of coverage of these bugs, they just didn't manifest
in crashers.

The core problem seems to stem from an assumption that the latch *is*
the exiting block. While this is often true, and somewhat the "normal"
way to think about loops, it isn't necessarily true. The correct way to
call the trip count routines in a *generic* fashion (that is, without
a particular exit in mind) is to just use the loop's single exiting
block if it has one. The trip count can't be computed generically unless
it does. This works great for the loop vectorizer. The loop unroller
actually *wants* to select the latch when it has to chose between
multiple exits because for unrolling it is the latch trips that matter.
But if this is the desire, it needs to explicitly guard for non-exiting
latches and check for the generic trip count in that case.

I've added the asserts, and added convenience APIs for querying the trip
count generically that check for a single exit block. I've kept the APIs
consistent between computing trip count and trip multiples.

Thansk to Mark for the help debugging and tracking down the *right* fix
here!

llvm-svn: 219550
2014-10-11 00:12:11 +00:00
Arnold Schwaighofer d7d010eb2a SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
Chad Rosier bd64d46188 [Reassociate] Don't canonicalize X - undef to X + (-undef).
Phabricator Revision: http://reviews.llvm.org/D5674
PR21205

llvm-svn: 219434
2014-10-09 20:06:29 +00:00
Andrea Di Biagio 458a669f49 [InstCombine] Fix wrong folding of constant comparisons involving ashr and negative values.
This patch fixes a bug in method InstCombiner::FoldCmpCstShrCst where we
wrongly computed the distance between the highest bits set of two negative
values.

This fixes PR21222.

Differential Revision: http://reviews.llvm.org/D5700

llvm-svn: 219406
2014-10-09 12:41:49 +00:00
Bob Wilson 9868d71ffe Use triple's isiOS() and isOSDarwin() methods.
These methods are already used in lots of places. This makes things more
consistent. NFC.

llvm-svn: 219386
2014-10-09 05:43:30 +00:00
David Majnemer ac07703842 Inliner: Non-local functions in COMDATs shouldn't be dropped
A function with discardable linkage cannot be discarded if its a member
of a COMDAT group without considering all the other COMDAT members as
well.  This sort of thing is already handled by GlobalOpt/GlobalDCE.

This fixes PR21206.

llvm-svn: 219335
2014-10-08 19:32:32 +00:00
Justin Bogner 894eff7a9f Revert "[InstCombine] re-commit r218721 with fix for pr21199"
This seems to cause a miscompile when building clang, which causes a
bootstrapped clang to fail or crash in several of its tests.

See:
  http://lab.llvm.org:8013/builders/clang-x86_64-darwin11-RA/builds/1184
  http://bb.pgr.jp/builders/clang-3stage-x86_64-linux/builds/7813

This reverts commit r219282.

llvm-svn: 219317
2014-10-08 16:30:22 +00:00
Suyog Sarda cba4b1d64d Format spacing and remove extra lines to comply with standards. NFC.
Differential Revision: http://reviews.llvm.org/D5649
 

llvm-svn: 219286
2014-10-08 08:37:49 +00:00
David Majnemer 1b3b70e371 GlobalOpt: Don't drop unused memberes of a Comdat
A linkonce_odr member of a COMDAT shouldn't be dropped if we need to
keep the entire COMDAT group.

This fixes PR21191.

llvm-svn: 219283
2014-10-08 07:23:31 +00:00
Gerolf Hoflehner e2ff5b9223 [InstCombine] re-commit r218721 with fix for pr21199
The icmp-select-icmp optimization targets select-icmp.eq
only. This is now ensured by testing the branch predicate
explictly. This commit also includes the test case for pr21199.

llvm-svn: 219282
2014-10-08 06:42:19 +00:00
Hans Wennborg 1256198bbc Revert r219175 - [InstCombine] re-commit r218721 icmp-select-icmp optimization
This seems to have caused PR21199.

llvm-svn: 219264
2014-10-08 01:05:57 +00:00
David Blaikie c6c6c7b177 DebugInfo+DFSan: Ensure that debug info references to llvm::Functions remain pointing to the underlying function when wrappers are created
This is somewhat the inverse of how similar bugs in DAE and ArgPromo
manifested and were addressed. In those passes, individual call sites
were visited explicitly, and then the old function was deleted. This
left the debug info with a null llvm::Function* that needed to be
updated to point to the new function.

In the case of DFSan, it RAUWs the old function with the wrapper, which
includes debug info. So now the debug info refers to the wrapper, which
doesn't actually have any instructions with debug info in it, so it is
ignored entirely - resulting in a DW_TAG_subprogram with no high/low pc,
etc. Instead, fix up the debug info to refer to the original function
after the RAUW messed it up.

Reviewed/discussed with Peter Collingbourne on the llvm-dev mailing
list.

llvm-svn: 219249
2014-10-07 22:59:46 +00:00
Duncan P. N. Exon Smith c46cfcbbc6 LoopUnroll: Create sub-loops in LoopInfo
`LoopUnrollPass` says that it preserves `LoopInfo` -- make it so.  In
particular, tell `LoopInfo` about copies of inner loops when unrolling
the outer loop.

Conservatively, also tell `ScalarEvolution` to forget about the original
versions of these loops, since their inputs may have changed.

Fixes PR20987.

llvm-svn: 219241
2014-10-07 21:19:00 +00:00
Duncan P. N. Exon Smith 9b4d37e8f5 LoopUnroll: Only check for ScalarEvolution analysis once, NFC
A follow-up commit will add use to a tight loop.  We might as well just
find it once anyway.

llvm-svn: 219239
2014-10-07 21:12:44 +00:00
Marcello Maggioni 963bc87dbd Two case switch to select optimization
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
2014-10-07 18:16:44 +00:00
David Blaikie 17364d4e05 DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function.
After some stellar (& inspired) help from Reid Kleckner providing a test
case for some rather unstable undefined behavior showing up as
assertions produced by r214761, I was able to fix this issue in DAE
involving the application of both varargs removal, followed by normal
argument removal.

Indeed I introduced this same bug into ArgumentPromotion (r212128) by
copying the code from DAE, and when I fixed the bug in ArgPromo
(r213805) and commented in that patch that I didn't need to address the
same issue in DAE because it was a single pass. Turns out it's two pass,
one for the varargs and one for the normal arguments, so the same fix is
needed (at least during varargs removal). So here it is.

(the observable/net effect of this bug, even when it didn't result in
assertion failure, is that debug info would describe the DAE'd function
in the abstract, but wouldn't provide high/low_pc, variable locations,
line table, etc (it would appear as though the function had been
entirely optimized away), see the original PR14016 for details of the
general problem)

I'm not recommitting the assertion just yet, as there's been another
regression of it since I last tried. It might just be a few test cases
weren't adequately updated after Adrian or Duncan's recent schema
changes.

llvm-svn: 219210
2014-10-07 15:10:23 +00:00
Suyog Sarda 65f5ae997c Reformat if statement to comply with LLVM standards. NFC.
Differential Revision: http://reviews.llvm.org/D5644

llvm-svn: 219203
2014-10-07 12:04:07 +00:00
Suyog Sarda ea205517a9 Reformat to comply with LLVM coding standards using clang-format.
NFC.

Differential Revision: http://reviews.llvm.org/D5645

llvm-svn: 219202
2014-10-07 11:56:06 +00:00
Tilmann Scheller 2bc5cb687b [InstCombine] Reformat if statements to comply with LLVM Coding Standards.
Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D5643

llvm-svn: 219198
2014-10-07 10:19:34 +00:00
David Majnemer e025321d36 GlobalDCE: Don't drop any COMDAT members
If we require a single member of a comdat, require all of the other
members as well.

This fixes PR20981.

llvm-svn: 219191
2014-10-07 07:07:19 +00:00
Gerolf Hoflehner c0b4c20e5e [InstCombine] re-commit r218721 icmp-select-icmp optimization
Takes care of the assert that caused build fails.
Rather than asserting the code checks now that the definition
and use are in the same block, and does not attempt
to optimize when that is not the case.

llvm-svn: 219175
2014-10-07 00:16:12 +00:00
David Blaikie e44ee92a3f range-for some loops in DAE
llvm-svn: 219167
2014-10-06 22:59:29 +00:00
Duncan P. N. Exon Smith e5d7d9797b LoopUnroll: Change code order of changes to new basic blocks
Add new basic blocks to `LoopInfo` earlier.  No functionality change
intended (simplifies upcoming bugfix patch).

llvm-svn: 219150
2014-10-06 22:05:02 +00:00
Duncan P. N. Exon Smith 0bbf5418c6 Sink comment, NFC
llvm-svn: 219149
2014-10-06 22:04:59 +00:00
Owen Anderson 8373d338f6 Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions.
Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical.

Patch by Dmitri Shtilman.

llvm-svn: 219092
2014-10-05 23:41:26 +00:00
Hal Finkel 4564688806 [InstCombine] Simplify the logic from r219067 using ValueTracking
Joerg suggested on IRC that I look at generalizing the logic from r219067 to
handle more general redundancies (like removing an assume(x > 3) dominated by
an assume(x > 5)). The way to do this would be to ask ValueTracking to
determine the value of the i1 argument. It turns out that ValueTracking is not
very good at this right now (although it does get the trivial redundancy case)
because it does not understand ICmps. Nevertheless, the resulting code in
InstCombine is simpler than r219067, so we might as well do it now.

llvm-svn: 219070
2014-10-05 00:53:02 +00:00
Hal Finkel 04a156139e [InstCombine] Remove redundant @llvm.assume intrinsics
For any @llvm.assume intrinsic, if there is another which dominates it and uses
the same condition, then it is redundant and can be removed. While this does
not alter the semantics of the @llvm.assume intrinsics, it makes subsequent
handling more efficient (and the resulting IR easier to read).

llvm-svn: 219067
2014-10-04 21:27:06 +00:00
Benjamin Kramer c6cc58e703 Remove unnecessary copying or replace it with moves in a bunch of places.
NFC.

llvm-svn: 219061
2014-10-04 16:55:56 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Benjamin Kramer e12a6bac32 Eliminate some deep std::vector copies. NFC.
llvm-svn: 218999
2014-10-03 18:33:16 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Sanjay Patel 12d1ce5408 Optimize square root squared (PR21126).
When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X.

This can happen in the real world when calculating x ** 3/2. This occurs
in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c.

Differential Revision: http://reviews.llvm.org/D5584

llvm-svn: 218906
2014-10-02 21:10:54 +00:00
Sanjay Patel b41d46118a Use the local variable that other clauses around here are already using.
llvm-svn: 218876
2014-10-02 15:20:45 +00:00
Zinovy Nis ccc3e3733b [BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions
My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]'
thus inverting index expression. This patch fixes it. 
Thanks to Jörg Sonnenberger for pointing.

Differential Revision: http://reviews.llvm.org/D5576

llvm-svn: 218867
2014-10-02 13:01:15 +00:00
Duncan P. N. Exon Smith 611afb229c DIBuilder: Encapsulate DIExpression's element type
`DIExpression`'s elements are 64-bit integers that are stored as
`ConstantInt`.  The accessors already encapsulate the storage.  This
commit updates the `DIBuilder` API to also encapsulate that.

llvm-svn: 218797
2014-10-01 20:26:08 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Tom Stellard 0a4e9a3b25 C API: Add LLVMCloneModule()
llvm-svn: 218775
2014-10-01 17:14:57 +00:00
Evgeniy Stepanov 815f2869ad Revert r218721, r218735.
Failing bootstrap on Linux (arm, x86).

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470
http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518

llvm-svn: 218752
2014-10-01 10:07:28 +00:00
Gerolf Hoflehner 19fc3dafc8 [InstCombine] Fix for assert build failures caused by r218721
The icmp-select-icmp optimization made the implicit assumption
that the select-icmp instructions are in the same block and asserted on it.
The fix explicitly checks for that condition and conservatively suppresses
the optimization when it is violated.

llvm-svn: 218735
2014-10-01 03:24:39 +00:00
Gerolf Hoflehner 08cc4b950c [InstCombine] Optimize icmp-select-icmp
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
are patterns like
    %x=icmp.eq
    %y=select %x,%r, null
    %z=icmp.eq|neq %y, null
    br %z,true, false
==> %x=icmp.ne
    %y=icmp.eq %r,null
    %z=or %x,%y
    br %z,true,false
The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used and dominance is now a required analysis pass in the combiner.
The optimization itself is iterative. The critical step is to replace the
select result with the non-constant select operand. So the select becomes
local and the combiner iteratively works out simpler code pattern and
eventually eliminates the select.

rdar://17853760

llvm-svn: 218721
2014-10-01 00:13:22 +00:00
Jingyue Wu fc0296704c [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Lorenzo Martignoni 40d3deeb7d Introduce support for custom wrappers for vararg functions.
Differential Revision: http://reviews.llvm.org/D5412

llvm-svn: 218671
2014-09-30 12:33:16 +00:00
Chad Rosier aab5d7bd33 [IndVarSimplify] Widen loop unsigned compares.
This patch extends r217953 to handle unsigned comparison.
Phabricator revision: http://reviews.llvm.org/D5526

llvm-svn: 218659
2014-09-30 03:17:42 +00:00
Kevin Qin fc02e3c363 Use a loop to simplify the runtime unrolling prologue.
Runtime unrolling will create a prologue to execute the extra
iterations which is can't divided by the unroll factor. It
generates an if-then-else sequence to jump into a factor -1
times unrolled loop body, like

    extraiters = tripcount % loopfactor
    if (extraiters == 0) jump Loop:
    if (extraiters == loopfactor) jump L1
    if (extraiters == loopfactor-1) jump L2
    ...
    L1:  LoopBody;
    L2:  LoopBody;
    ...
    if tripcount < loopfactor jump End
    Loop:
    ...
    End:

It means if the unroll factor is 4, the loop body will be 7
times unrolled, 3 are in loop prologue, and 4 are in the loop.
This commit is to use a loop to execute the extra iterations
in prologue, like

        extraiters = tripcount % loopfactor
        if (extraiters == 0) jump Loop:
        else jump Prol
 Prol:  LoopBody;
        extraiters -= 1                 // Omitted if unroll factor is 2.
        if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
        if (tripcount < loopfactor) jump End
 Loop:
 ...
 End:

Then when unroll factor is 4, the loop body will be copied by
only 5 times, 1 in the prologue loop, 4 in the original loop.
And if the unroll factor is 2, new loop won't be created, just
as the original solution.

llvm-svn: 218604
2014-09-29 11:15:00 +00:00
Chad Rosier 7b974b73ae [IndVar] Don't widen loop compare unless IV user is sign extended.
PR21030

llvm-svn: 218539
2014-09-26 20:05:35 +00:00
Kostya Serebryany 34ddf8725c [asan] don't instrument module CTORs that may be run before asan.module_ctor. This fixes asan running together -coverage
llvm-svn: 218421
2014-09-24 22:41:55 +00:00
David Peixotto 0d4d5e64ec Fix assertion in LICM doFinalization()
The doFinalization method checks that the LoopToAliasSetMap is
empty. LICM populates that map as it runs through the loop nest,
deleting the entries for child loops as it goes. However, if a child
loop is deleted by another pass (e.g. unrolling) then the loop will
never be deleted from the map because LICM walks the loop nest to
find entries it can delete.

The fix is to delete the loop from the map and free the alias set
when the loop is deleted from the loop nest.

Differential Revision: http://reviews.llvm.org/D5305

llvm-svn: 218387
2014-09-24 16:48:31 +00:00
Michael Liao d120916ca7 Allow BB duplication threshold to be adjusted through JumpThreading's ctor
- BB duplication may not be desired on targets where there is no or small
  branch penalty and code duplication needs restrict control.

llvm-svn: 218375
2014-09-24 04:59:06 +00:00
Reid Kleckner 78927e884b GlobalOpt: Preserve comdats of unoptimized initializers
Rather than slurping in and splatting out the whole ctor list, preserve
the existing array entries without trying to understand them.  Only
remove the entries that we know we can optimize away.  This way we don't
need to wire through priority and comdats or anything else we might add.

Fixes a linker issue where the .init_array or .ctors entry would point
to discarded initialization code if the comdat group from the TU with
the faulty global_ctors entry was dropped.

llvm-svn: 218337
2014-09-23 22:33:01 +00:00
Lenny Maiorani 9eefc81219 Using a deque to manage the stack of nodes is faster here.
Vector is slow due to many reallocations as the size regularly changes in
  unpredictable ways. See the investigation provided on the mailing list for
  more information:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120116/135228.html

llvm-svn: 218182
2014-09-20 13:29:20 +00:00
Eric Christopher d85ffb1fc0 Add a new pass FunctionTargetTransformInfo. This pass serves as a
shim between the TargetTransformInfo immutable pass and the Subtarget
via the TargetMachine and Function. Migrate a single call from
BasicTargetTransformInfo as an example and provide shims where TargetMachine
begins taking a Function to determine the subtarget.

No functional change.

llvm-svn: 218004
2014-09-18 00:34:14 +00:00
David Blaikie dba94ec3c7 Reapply fix in r217988 (reverted in r217989) and remove the alternative fix committed in r217987.
This type isn't owned polymorphically (as demonstrated by making the
dtor protected and everything still compiling) so just address the
warning by protecting the base dtor and making the derived class final.

llvm-svn: 217990
2014-09-17 22:27:36 +00:00
David Blaikie d8978ec085 Revert "Fix -Wnon-virtual-dtor warning introduced in r217982."
An alternative fix was already committed.

This reverts commit r217988.

llvm-svn: 217989
2014-09-17 22:17:59 +00:00
David Blaikie 20dd05ccfd Fix -Wnon-virtual-dtor warning introduced in r217982.
llvm-svn: 217988
2014-09-17 22:15:40 +00:00
Chris Bieneman cf93cbb7a4 Fixing a build error.
llvm-svn: 217983
2014-09-17 21:06:59 +00:00
Chris Bieneman ad070d0588 Refactoring SimplifyLibCalls to remove static initializers and generally cleaning up the code.
Summary: This eliminates ~200 lines of code mostly file scoped struct definitions that were unnecessary.

Reviewers: chandlerc, resistor

Reviewed By: resistor

Subscribers: morisset, resistor, llvm-commits

Differential Revision: http://reviews.llvm.org/D5364

llvm-svn: 217982
2014-09-17 20:55:46 +00:00
Chad Rosier 307b50b0f6 [IndVarSimplify] Partially revert r217953 to see if this fixes the bots.
Specifically, disable widening of unsigned compare instructions.

llvm-svn: 217962
2014-09-17 16:35:09 +00:00
Chad Rosier bb99f40530 [IndVarSimplify] Widen loop compare instructions.
This improves other optimizations such as LSR.  A sext may be added to the
compare's other operand, but this can often be hoisted outside of the loop.

llvm-svn: 217953
2014-09-17 14:10:33 +00:00
Andrea Di Biagio 5b92b4971a [InstCombine] Fix wrong folding of constant comparison involving ahsr and negative quantities (PR20945).
Example:
define i1 @foo(i32 %a) {
  %shr = ashr i32 -9, %a
  %cmp = icmp ne i32 %shr, -5
  ret i1 %cmp
}

Before this fix, the instruction combiner wrongly thought that %shr
could have never been equal to -5. Therefore, %cmp was always folded to 'true'.
However, when %a is equal to 1, then %cmp evaluates to 'false'. Therefore,
in this example, it is not valid to fold %cmp to 'true'.
The problem was only affecting the case where the comparison was between
negative quantities where one of the quantities was obtained from arithmetic
shift of a negative constant.

This patch fixes the problem with the wrong folding (fixes PR20945).
With this patch, the 'icmp' from the example is now simplified to a
comparison between %a and 1. This still allows us to get rid of the arithmetic
shift (%shr).

llvm-svn: 217950
2014-09-17 11:32:31 +00:00
Jingyue Wu b67140b812 Remove dead code in SimplifyCFG
Summary: UsedByBranch is always true according to how BonusInst is defined.

Test Plan:
Passes check-all, and also verified 

if (BonusInst && !UsedByBranch) {
  ...
}

is never entered during check-all.

Reviewers: resistor, nadav, jingyue

Reviewed By: jingyue

Subscribers: llvm-commits, eliben, meheff

Differential Revision: http://reviews.llvm.org/D5324

llvm-svn: 217824
2014-09-15 20:48:13 +00:00
Nick Lewycky 9e6d184803 Add control of function merging to the PMBuilder.
llvm-svn: 217731
2014-09-13 21:46:00 +00:00
Benjamin Kramer 0bd147da17 Simplify code. No functionality change.
llvm-svn: 217726
2014-09-13 12:38:49 +00:00
Juergen Ributzka 14ae60407d [C API] Make the 'lower switch' pass available via the C API.
llvm-svn: 217630
2014-09-11 21:32:32 +00:00
Hal Finkel f83e1f7f66 [AlignmentFromAssumptions] Don't crash just because the target is 32-bit
We used to crash processing any relevant @llvm.assume on a 32-bit target
(because we'd ask SE to subtract expressions of differing types). I've copied
our 'simple.ll' test, but with the data layout from arm-linux-gnueabihf to get
some meaningful test coverage here.

llvm-svn: 217574
2014-09-11 08:40:17 +00:00
Rafael Espindola c435adcde0 Add doInitialization/doFinalization to DataLayoutPass.
With this a DataLayoutPass can be reused for multiple modules.

Once we have doInitialization/doFinalization, it doesn't seem necessary to pass
a Module to the constructor.

Overall this change seems in line with the idea of making DataLayout a required
part of Module. With it the only way of having a DataLayout used is to add it
to the Module.

llvm-svn: 217548
2014-09-10 21:27:43 +00:00
Hal Finkel 71b7084112 [AlignmentFromAssumptions] Don't divide by zero for unknown starting alignment
The routine that determines an alignment given some SCEV returns zero if the
answer is unknown. In a case where we could determine the increment of an
AddRec but not the starting alignment, we would compute the integer modulus by
zero (which is illegal and traps). Prevent this by returning early if either
the start or increment alignment is unknown (zero).

llvm-svn: 217544
2014-09-10 21:05:52 +00:00
Gerolf Hoflehner 008e5cdcba [PassManager] Adding Hidden attribute to EnableMLSM option
llvm-svn: 217539
2014-09-10 20:24:03 +00:00
Gerolf Hoflehner 24815d9b8f [MergedLoadStoreMotion] Move pass enabling option to PassManagerBuilder
llvm-svn: 217538
2014-09-10 19:55:29 +00:00
Sanjay Patel b653de1ada Rename getMaximumUnrollFactor -> getMaxInterleaveFactor; also rename option names controlling this variable.
"Unroll" is not the appropriate name for this variable. Clang already uses 
the term "interleave" in pragmas and metadata for this.

Differential Revision: http://reviews.llvm.org/D5066

llvm-svn: 217528
2014-09-10 17:58:16 +00:00
Gerolf Hoflehner e4f6684d1b Removed misleading comment.
llvm-svn: 217527
2014-09-10 17:54:50 +00:00
Stepan Dyatkovskiy fe134cdfa7 MergeFunctions: FunctionPtr has been renamed to FunctionNode.
It's supposed to store additional pass information for current function here.
That was the reason for name change.

llvm-svn: 217483
2014-09-10 10:08:25 +00:00
NAKAMURA Takumi 1ab0cf0e28 SampleProfile.cpp: Prune a stray \param added in r217437. [-Wdocumentation]
llvm-svn: 217465
2014-09-09 22:44:30 +00:00
NAKAMURA Takumi bb4fac9050 ScalarOpts/LLVMBuild.txt: Prune unused dependency to IPA.
llvm-svn: 217448
2014-09-09 15:00:38 +00:00
NAKAMURA Takumi 37ffecf06b ScalarOpts/LLVMBuild.txt: Reorder.
llvm-svn: 217447
2014-09-09 15:00:26 +00:00
Diego Novillo de1ab26f52 Re-factor sample profile reader into lib/ProfileData.
Summary:
This patch moves the profile reading logic out of the Sample Profile
transformation into a generic profile reader facility in
lib/ProfileData.

The intent is to use this new reader to implement a sample profile
reader/writer that can be used to convert sample profiles from external
sources into LLVM.

This first patch introduces no functional changes. It moves the profile
reading code from lib/Transforms/SampleProfile.cpp into
lib/ProfileData/SampleProfReader.cpp.

In subsequent patches I will:

- Add a bitcode format for sample profiles to allow for more efficient
  encoding of the profile.
- Add a writer for both text and bitcode format profiles.
- Add a 'convert' command to llvm-profdata to be able to convert between
  the two (and serve as entry point for other sample profile formats).

Reviewers: bogner, echristo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D5250

llvm-svn: 217437
2014-09-09 12:40:50 +00:00
Andrew Trick 8fc3c6c093 Add a comment to getNewAlignmentDiff.
llvm-svn: 217350
2014-09-07 23:16:24 +00:00
Hal Finkel 93873cc10e Check for all known bits on ret in InstCombine
From a combination of @llvm.assume calls (and perhaps through other means, such
as range metadata), it is possible that all bits of a return value might be
known. Previously, InstCombine did not check for this (which is understandable
given assumptions of constant propagation), but means that we'd miss simple
cases where assumptions are involved.

llvm-svn: 217346
2014-09-07 21:28:34 +00:00
Hal Finkel 7e1844940e Make use of @llvm.assume from LazyValueInfo
This change teaches LazyValueInfo to use the @llvm.assume intrinsic. Like with
the known-bits change (r217342), this requires feeding a "context" instruction
pointer through many functions. Aside from a little refactoring to reuse the
logic that turns predicates into constant ranges in LVI, the only new code is
that which can 'merge' the range from an assumption into that otherwise
computed. There is also a small addition to JumpThreading so that it can have
LVI use assumptions in the same block as the comparison feeding a conditional
branch.

With this patch, we can now simplify this as expected:
int foo(int a) {
  __builtin_assume(a > 5);
  if (a > 3) {
    bar();
    return 1;
  }
  return 0;
}

llvm-svn: 217345
2014-09-07 20:29:59 +00:00
Hal Finkel d67e463901 Add an AlignmentFromAssumptions Pass
This adds a ScalarEvolution-powered transformation that updates load, store and
memory intrinsic pointer alignments based on invariant((a+q) & b == 0)
expressions. Many of the simple cases we can get with ValueTracking, but we
still need something like this for the more complicated cases (such as those
with an offset) that require some algebra. Note that gcc's
__builtin_assume_aligned's optional third argument provides exactly for this
kind of 'misalignment' offset for which this kind of logic is necessary.

The primary motivation is to fixup alignments for vector loads/stores after
vectorization (and unrolling). This pass is added to the optimization pipeline
just after the SLP vectorizer runs (which, admittedly, does not preserve SE,
although I imagine it could).  Regardless, I actually don't think that the
preservation matters too much in this case: SE computes lazily, and this pass
won't issue any SE queries unless there are any assume intrinsics, so there
should be no real additional cost in the common case (SLP does preserve DT and
LoopInfo).

llvm-svn: 217344
2014-09-07 20:05:11 +00:00
Hal Finkel 15aeaaf24a Add additional patterns for @llvm.assume in ValueTracking
This builds on r217342, which added the infrastructure to compute known bits
using assumptions (@llvm.assume calls). That original commit added only a few
patterns (to catch common cases related to determining pointer alignment); this
change adds several other patterns for simple cases.

r217342 contained that, for assume(v & b = a), bits in the mask
that are known to be one, we can propagate known bits from the a to v. It also
had a known-bits transfer for assume(a = b). This patch adds:

assume(~(v & b) = a) : For those bits in the mask that are known to be one, we
                       can propagate inverted known bits from the a to v.

assume(v | b = a) :    For those bits in b that are known to be zero, we can
                       propagate known bits from the a to v.

assume(~(v | b) = a):  For those bits in b that are known to be zero, we can
                       propagate inverted known bits from the a to v.

assume(v ^ b = a) :    For those bits in b that are known to be zero, we can
		       propagate known bits from the a to v. For those bits in
		       b that are known to be one, we can propagate inverted
                       known bits from the a to v.

assume(~(v ^ b) = a) : For those bits in b that are known to be zero, we can
		       propagate inverted known bits from the a to v. For those
		       bits in b that are known to be one, we can propagate
                       known bits from the a to v.

assume(v << c = a) :   For those bits in a that are known, we can propagate them
                       to known bits in v shifted to the right by c.

assume(~(v << c) = a) : For those bits in a that are known, we can propagate
                        them inverted to known bits in v shifted to the right by c.

assume(v >> c = a) :   For those bits in a that are known, we can propagate them
                       to known bits in v shifted to the right by c.

assume(~(v >> c) = a) : For those bits in a that are known, we can propagate
                        them inverted to known bits in v shifted to the right by c.

assume(v >=_s c) where c is non-negative: The sign bit of v is zero

assume(v >_s c) where c is at least -1: The sign bit of v is zero

assume(v <=_s c) where c is negative: The sign bit of v is one

assume(v <_s c) where c is non-positive: The sign bit of v is one

assume(v <=_u c): Transfer the known high zero bits

assume(v <_u c): Transfer the known high zero bits (if c is know to be a power
                 of 2, transfer one more)

A small addition to InstCombine was necessary for some of the test cases. The
problem is that when InstCombine was simplifying and, or, etc. it would fail to
check the 'do I know all of the bits' condition before checking less specific
conditions and would not fully constant-fold the result. I'm not sure how to
trigger this aside from using assumptions, so I've just included the change
here.

llvm-svn: 217343
2014-09-07 19:21:07 +00:00
Hal Finkel 60db05896a Make use of @llvm.assume in ValueTracking (computeKnownBits, etc.)
This change, which allows @llvm.assume to be used from within computeKnownBits
(and other associated functions in ValueTracking), adds some (optional)
parameters to computeKnownBits and friends. These functions now (optionally)
take a "context" instruction pointer, an AssumptionTracker pointer, and also a
DomTree pointer, and most of the changes are just to pass this new information
when it is easily available from InstSimplify, InstCombine, etc.

As explained below, the significant conceptual change is that known properties
of a value might depend on the control-flow location of the use (because we
care that the @llvm.assume dominates the use because assumptions have
control-flow dependencies). This means that, when we ask if bits are known in a
value, we might get different answers for different uses.

The significant changes are all in ValueTracking. Two main changes: First, as
with the rest of the code, new parameters need to be passed around. To make
this easier, I grouped them into a structure, and I made internal static
versions of the relevant functions that take this structure as a parameter. The
new code does as you might expect, it looks for @llvm.assume calls that make
use of the value we're trying to learn something about (often indirectly),
attempts to pattern match that expression, and uses the result if successful.
By making use of the AssumptionTracker, the process of finding @llvm.assume
calls is not expensive.

Part of the structure being passed around inside ValueTracking is a set of
already-considered @llvm.assume calls. This is to prevent a query using, for
example, the assume(a == b), to recurse on itself. The context and DT params
are used to find applicable assumptions. An assumption needs to dominate the
context instruction, or come after it deterministically. In this latter case we
only handle the specific case where both the assumption and the context
instruction are in the same block, and we need to exclude assumptions from
being used to simplify their own ephemeral values (those which contribute only
to the assumption) because otherwise the assumption would prove its feeding
comparison trivial and would be removed.

This commit adds the plumbing and the logic for a simple masked-bit propagation
(just enough to write a regression test). Future commits add more patterns
(and, correspondingly, more regression tests).

llvm-svn: 217342
2014-09-07 18:57:58 +00:00
Hal Finkel 57f03dda49 Add functions for finding ephemeral values
This adds a set of utility functions for collecting 'ephemeral' values. These
are LLVM IR values that are used only by @llvm.assume intrinsics (directly or
indirectly), and thus will be removed prior to code generation, implying that
they should be considered free for certain purposes (like inlining). The
inliner's cost analysis, and a few other passes, have been updated to account
for ephemeral values using the provided functionality.

This functionality is important for the usability of @llvm.assume, because it
limits the "non-local" side-effects of adding llvm.assume on inlining, loop
unrolling, etc. (these are hints, and do not generate code, so they should not
directly contribute to estimates of execution cost).

llvm-svn: 217335
2014-09-07 13:49:57 +00:00
Hal Finkel 74c2f355d2 Add an Assumption-Tracking Pass
This adds an immutable pass, AssumptionTracker, which keeps a cache of
@llvm.assume call instructions within a module. It uses callback value handles
to keep stale functions and intrinsics out of the map, and it relies on any
code that creates new @llvm.assume calls to notify it of the new instructions.
The benefit is that code needing to find @llvm.assume intrinsics can do so
directly, without scanning the function, thus allowing the cost of @llvm.assume
handling to be negligible when none are present.

The current design is intended to be lightweight. We don't keep track of
anything until we need a list of assumptions in some function. The first time
this happens, we scan the function. After that, we add/remove @llvm.assume
calls from the cache in response to registration calls and ValueHandle
callbacks.

There are no new direct test cases for this pass, but because it calls it
validation function upon module finalization, we'll pick up detectable
inconsistencies from the other tests that touch @llvm.assume calls.

This pass will be used by follow-up commits that make use of @llvm.assume.

llvm-svn: 217334
2014-09-07 12:44:26 +00:00
David Majnemer 6fe6ea740c InstCombine: Remove a special case pattern
The special case did not work when run under -reassociate and can easily
be expressed by a further generalization of an existing pattern.

llvm-svn: 217227
2014-09-05 06:09:24 +00:00
James Molloy 6b95d8ed36 Enable noalias metadata by default and swap the order of the SLP and Loop vectorizers by default.
After some time maturing, hopefully the flags themselves will be removed.

llvm-svn: 217144
2014-09-04 13:23:08 +00:00
Tilmann Scheller faabbb5fb6 [GVN] Format variable name.
Local variables need to start with an upper case letter.

llvm-svn: 217133
2014-09-04 06:38:00 +00:00
David Majnemer 13046deef3 IndVarSimplify: Address review comments for r217102
No functional change intended, just some cleanups and comments added.

llvm-svn: 217115
2014-09-04 00:23:13 +00:00
Kostya Serebryany 3175521844 [asan] fix debug info produced for asan-coverage=2
llvm-svn: 217106
2014-09-03 23:24:18 +00:00
David Majnemer c6ab01ecca IndVarSimplify: Don't let LFTR compare against a poison value
LinearFunctionTestReplace tries to use the *next* indvar to compare
against when possible.  However, it may be the case that the calculation
for the next indvar has NUW/NSW flags and that it may only be safely
used inside the loop.  Using it in a comparison to calculate the exit
condition could result in observing poison.

This fixes PR20680.

Differential Revision: http://reviews.llvm.org/D5174

llvm-svn: 217102
2014-09-03 23:03:18 +00:00
Kostya Serebryany 351b078b6d [asan] add -asan-coverage=3: instrument all blocks and critical edges.
llvm-svn: 217098
2014-09-03 22:37:37 +00:00
Benjamin Kramer 89854ebe8e Make some helpers static or move into the llvm namespace.
llvm-svn: 217077
2014-09-03 21:04:12 +00:00
Sanjay Patel 9433a28845 Preserve IR flags (nsw, nuw, exact, fast-math) in SLP vectorizer (PR20802).
The SLP vectorizer should propagate IR-level optimization hints/flags (nsw, nuw, exact, fast-math)
when converting scalar instructions into vectors. But this isn't a simple copy - we need to take
the intersection (the logical 'and') of the sets of flags on the scalars.

The solution is further complicated because we can have non-uniform (non-SIMD) vector ops after:
http://reviews.llvm.org/D4015
http://llvm.org/viewvc/llvm-project?view=revision&revision=211339

The vast majority of changed files are existing tests that were not propagating IR flags, but I've
also added a new test file for focused testing of IR flag possibilities.

Differential Revision: http://reviews.llvm.org/D5172

llvm-svn: 217051
2014-09-03 17:40:30 +00:00
Sanjay Patel a982d992f0 Change name of copyFlags() to copyIRFlags(). Add convenience method for logical 'and' of all flags. NFC.
Adding 'IR' to the names in an attempt to be less ambiguous about the flags we're dealing with here.

The 'and' method is needed by the SLPVectorizer (PR20802) and possibly other passes.

llvm-svn: 217004
2014-09-03 01:06:50 +00:00
Hal Finkel 445dda5c4a Add pass-manager flags to use CFL AA
Add -use-cfl-aa (and -use-cfl-aa-in-codegen) to add CFL AA in the default pass
managers (for easy testing).

llvm-svn: 216978
2014-09-02 22:12:54 +00:00
Kostya Serebryany ad23852ac3 [asan] Assign a low branch weight to ASan's slow path, patch by Jonas Wagner. This speeds up asan (at least on SPEC) by 1%-5% or more. Also fix lint in dfsan.
llvm-svn: 216972
2014-09-02 21:46:51 +00:00
Yi Jiang 77a609b556 Generate extract for in-tree uses if the use is scalar operand in vectorized instruction. radar://18144665
llvm-svn: 216946
2014-09-02 21:00:39 +00:00
David Blaikie 15913f46b2 unique_ptrify the result of SpecialCaseList::create
llvm-svn: 216925
2014-09-02 18:13:54 +00:00
David Majnemer 49428105aa LICM: Don't crash when an instruction is used by an unreachable BB
Summary:
BBs might contain non-LCSSA'd values after the LCSSA pass is run if they
are unreachable from the entry block.

Normally, the users of the instruction would be PHIs but the unreachable
BBs have normal users; rewrite their uses to be undef values.

An alternative fix could involve fixing this at LCSSA but that would
require this invariant to hold after subsequent transforms.  If a BB
created an unreachable block, they would be in violation of this.

This fixes PR19798.

Differential Revision: http://reviews.llvm.org/D5146

llvm-svn: 216911
2014-09-02 16:22:00 +00:00
David Majnemer d4cffcf073 SROA: Don't insert instructions before a PHI
SROA may decide that it needs to insert a bitcast and would set it's
insertion point before a PHI.  This will create an invalid module
right quick.

Instead, choose the first insertion point in the basic block that holds
our PHI.

This fixes PR20822.

Differential Revision: http://reviews.llvm.org/D5141

llvm-svn: 216891
2014-09-01 21:20:14 +00:00
David Majnemer d2df50196f Revert "Revert two GEP-related InstCombine commits"
This reverts commit r216698 which reverted r216523 and r216598.

We would attempt to perform the transformation even if the match()
failed because, as a side effect, it would set V.  This would trick us
into believing that we correctly found a place to correctly apply the
transform.

An additional test case was added to getelementptr.ll so that we might
not regress in the future.

llvm-svn: 216890
2014-09-01 21:10:02 +00:00
Sanjay Patel 5ad239e15a Add a convenience method to copy wrapping, exact, and fast-math flags (NFC).
The loop vectorizer preserves wrapping, exact, and fast-math properties of scalar instructions.
This patch adds a convenience method to make that operation easier because we need to do this
in the loop vectorizer, SLP vectorizer, and possibly other places.

Although this is a 'no functional change' patch, I've added a testcase to verify that the exact
flag is preserved by the loop vectorizer. The wrapping and fast-math flags are already checked
in existing testcases.

Differential Revision: http://reviews.llvm.org/D5138

llvm-svn: 216886
2014-09-01 18:44:57 +00:00
Chandler Carruth 18cee1defc Fix a really bad miscompile introduced in r216865 - the else-if logic
chain became completely broken here as *all* intrinsic users ended up
being skipped, and the ones that seemed to be singled out were actually
the exact wrong set.

This is a great example of why long else-if chains can be easily
confusing. Switch the entire code to use early exits and early continues
to have simpler (and more importantly, correct) logic here, as well as
fixing the reversed logic for detecting and continuing on lifetime
intrinsics.

I've also significantly cleaned up the test case and added another test
case demonstrating an example where the optimization is not (trivially)
safe to perform.

llvm-svn: 216871
2014-09-01 10:09:18 +00:00
Renato Golin 86a6c3f269 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

Re-applying again due to MSVC 2013 being minimum requirement, and this patch
having C++11 that MSVC 2012 didn't support.

Fixes PR20655.

llvm-svn: 216870
2014-09-01 10:00:17 +00:00
Hal Finkel 0c083024f0 Feed AA to the inliner and use AA->getModRefBehavior in AddAliasScopeMetadata
This feeds AA through the IFI structure into the inliner so that
AddAliasScopeMetadata can use AA->getModRefBehavior to figure out which
functions only access their arguments (instead of just hard-coding some
knowledge of memory intrinsics). Most of the information is only available from
BasicAA; this is important for preserving alias scoping information for
target-specific intrinsics when doing the noalias parameter attribute to
metadata conversion.

llvm-svn: 216866
2014-09-01 09:01:39 +00:00
Nick Lewycky fc243d54d2 Ignore lifetime intrinsics in use list for MemCpyOptimizer. Patch by Luqman Aden, review by Hal Finkel.
llvm-svn: 216865
2014-09-01 06:03:11 +00:00
Hal Finkel cbb85f249e Fix AddAliasScopeMetadata again - alias.scope must be a complete description
I thought that I had fixed this problem in r216818, but I did not do a very
good job. The underlying issue is that when we add alias.scope metadata we are
asserting that this metadata completely describes the aliasing relationships
within the current aliasing scope domain, and so in the context of translating
noalias argument attributes, the pointers must all be based on noalias
arguments (as underlying objects) and have no other kind of underlying object.
In r216818 excluding appropriate accesses from getting alias.scope metadata is
done by looking for underlying objects that are not identified function-local
objects -- but that's wrong because allocas, etc. are also function-local
objects and we need to explicitly check that all underlying objects are the
noalias arguments for which we're adding metadata aliasing scopes.

This fixes the underlying-object check for adding alias.scope metadata, and
does some refactoring of the related capture-checking eligibility logic (and
adds more comments; hopefully making everything a bit clearer).

Fixes self-hosting on x86_64 with -mllvm -enable-noalias-to-md-conversion (the
feature is still disabled by default).

llvm-svn: 216863
2014-09-01 04:26:40 +00:00
Craig Topper 6dc4a8bc2c Fix some cases where StringRef was being passed by const reference. Remove const from some other StringRefs since its implicitly const already.
llvm-svn: 216820
2014-08-30 16:48:02 +00:00
Hal Finkel a3708df41a Fix AddAliasScopeMetadata to not add scopes when deriving from unknown pointers
The previous implementation of AddAliasScopeMetadata, which adds noalias
metadata to preserve noalias parameter attribute information when inlining had
a flaw: it would add alias.scope metadata to accesses which might have been
derived from pointers other than noalias function parameters. This was
incorrect because even some access known not to alias with all noalias function
parameters could easily alias with an access derived from some other pointer.
Instead, when deriving from some unknown pointer, we cannot add alias.scope
metadata at all. This fixes a miscompile of the test-suite's tramp3d-v4.
Furthermore, we cannot add alias.scope to functions unless we know they
access only argument-derived pointers (currently, we know this only for
memory intrinsics).

Also, we fix a theoretical problem with using the NoCapture attribute to skip
the capture check. This is incorrect (as explained in the comment added), but
would not matter in any code generated by Clang because we get only inferred
nocapture attributes in Clang-generated IR.

This functionality is not yet enabled by default.

llvm-svn: 216818
2014-08-30 12:48:33 +00:00
David Majnemer 492e612e01 InstCombine: Respect recursion depth in visitUDivOperand
llvm-svn: 216817
2014-08-30 09:19:05 +00:00
David Majnemer 5e96f1b4c8 InstCombine: Try harder to combine icmp instructions
consider: (and (icmp X, Y), (and Z, (icmp A, B)))
It may be possible to combine (icmp X, Y) with (icmp A, B).
If we successfully combine, create an 'and' instruction with Z.

This fixes PR20814.

N.B. There is room for improvement after this change but I'm not
convinced it's worth chasing yet.

llvm-svn: 216814
2014-08-30 06:18:20 +00:00
Hal Finkel 2d3d6da44b Fix a typo in AddAliasScopeMetadata
llvm-svn: 216741
2014-08-29 16:33:41 +00:00
David Majnemer 400e725bde Revert two GEP-related InstCombine commits
This reverts commit r216523 and r216598; people have reported
regressions.

llvm-svn: 216698
2014-08-29 00:06:43 +00:00
Reid Kleckner febb279c9c Don't promote byval pointer arguments when padding matters
Don't promote byval pointer arguments when when their size in bits is
not equal to their alloc size in bits. This can happen for x86_fp80,
where the size in bits is 80 but the alloca size in bits in 128.
Promoting these types can break passing unions of x86_fp80s and other
types.

Patch by Thomas Jablin!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D5057

llvm-svn: 216693
2014-08-28 22:42:00 +00:00
David Majnemer 074052b623 InstCombine: Remove redundant combines
InstSimplify already handles icmp (X+Y), X (and things like it)
appropriately.  The first thing that InstCombine does is run
InstSimplify on the instruction.

llvm-svn: 216659
2014-08-28 10:08:37 +00:00
Erik Eckstein 8354cfaf95 Fix: SLPVectorizer tried to move an instruction which was replaced by a vector instruction.
For a detailed description of the problem see the comment in the test file.
The problematic moveBefore() calls are not required anymore because the new
scheduling algorithm ensures a correct ordering anyway.

llvm-svn: 216656
2014-08-28 07:04:02 +00:00
David Majnemer 76d06bc613 InstSimplify: Move a transform from InstCombine to InstSimplify
Several combines involving icmp (shl C2, %X) C1 can be simplified
without introducing any new instructions.  Move them to InstSimplify;
while we are at it, make them more powerful.

llvm-svn: 216642
2014-08-28 03:34:28 +00:00
David Majnemer 22ccfc4484 InstCombine: Combine gep X, (Y-X) to Y
We try to perform this transform in InstSimplify but we aren't always
able to.  Sometimes, we need to insert a bitcast if X and Y don't have
the same time.

llvm-svn: 216598
2014-08-27 20:08:37 +00:00
Michael Zolotukhin 5dc466b863 [SLP] Re-enable vectorization of GEP expressions (re-apply r210342 with a fix).
llvm-svn: 216549
2014-08-27 15:01:18 +00:00
Craig Topper e1d1294853 Simplify creation of a bunch of ArrayRefs by using None, makeArrayRef or just letting them be implicitly created.
llvm-svn: 216525
2014-08-27 05:25:25 +00:00
Craig Topper 3af9722529 Fix some cases were ArrayRefs were being passed by reference. Also remove 'const' from some other ArrayRef uses since its implicitly const already.
llvm-svn: 216524
2014-08-27 05:25:00 +00:00
David Majnemer 54e97d5dc0 InstCombine: Optimize GEP's involving ptrtoint better
We supported transforming:
(gep i8* X, -(ptrtoint Y))

to:
(inttoptr (sub (ptrtoint X), (ptrtoint Y)))

However, this only fired if 'X' had type i8*.  Generalize this to
support various types of different sizes.  This results in much better
CodeGen, especially for pointers to packed structs.

llvm-svn: 216523
2014-08-27 05:16:04 +00:00
Joerg Sonnenberger cb5674b9c2 Revert r210342 and r210343, add test case for the crasher.
PR 20642.

llvm-svn: 216475
2014-08-26 19:06:41 +00:00
Dinesh Dwivedi 4919bbe29d This patch enables SimplifyUsingDistributiveLaws() to handle following pattens.
(X >> Z) & (Y >> Z)  -> (X&Y) >> Z  for all shifts.
(X >> Z) | (Y >> Z)  -> (X|Y) >> Z  for all shifts.
(X >> Z) ^ (Y >> Z)  -> (X^Y) >> Z  for all shifts.

These patterns were previously handled separately in visitAnd()/visitOr()/visitXor().

Differential Revision: http://reviews.llvm.org/D4951

llvm-svn: 216443
2014-08-26 08:53:32 +00:00
Reid Kleckner 3715461b48 musttail: Don't eliminate varargs packs if there is a forwarding call
Also clean up and beef up this grep test for the feature.

llvm-svn: 216425
2014-08-26 00:59:51 +00:00
Sanjay Patel 4e31cdabd1 fix typos in comments
llvm-svn: 216424
2014-08-26 00:59:15 +00:00
Reid Kleckner e6e88f99b3 ArgPromotion: Don't touch variadic functions
Adding, removing, or changing non-pack parameters can change the ABI
classification of pack parameters. Clang and other frontends encode the
classification in the IR of the call site, but the callee side
determines it dynamically based on the number of registers consumed so
far. Changing the prototype affects the number of registers consumed
would break such code.

Dead argument elimination performs a similar task and already has a
similar check to avoid this problem.

Patch by Thomas Jablin!

llvm-svn: 216421
2014-08-25 23:58:48 +00:00
Rafael Espindola 3fd1e9933f Modernize raw_fd_ostream's constructor a bit.
Take a StringRef instead of a "const char *".
Take a "std::error_code &" instead of a "std::string &" for error.

A create static method would be even better, but this patch is already a bit too
big.

llvm-svn: 216393
2014-08-25 18:16:47 +00:00
Bruno Cardoso Lopes e2a1fa35df Remove dangling initializers in GlobalDCE
GlobalDCE deletes global vars and updates their initializers to nullptr
while leaving underlying constants to be cleaned up later by its uses.
The clean up may never happen, fix this by forcing it every time it's
safe to destroy constants.

Final patch by Rafael Espindola
http://reviews.llvm.org/D4931

<rdar://problem/17523868>

llvm-svn: 216390
2014-08-25 17:51:14 +00:00
Stepan Dyatkovskiy c90308bf83 MergeFunctions, tiny refactoring:
cmpAPFloat has been renamed to cmpAPFloats (multiple form).

llvm-svn: 216376
2014-08-25 08:22:46 +00:00
Stepan Dyatkovskiy 7f895c1184 MergeFunctions, tiny refactoring:
cmpAPInt has been renamed to cmpAPInts (multiple form).

llvm-svn: 216375
2014-08-25 08:19:50 +00:00
Stepan Dyatkovskiy 0b765dee6e MergeFunctions, tiny refactoring:
cmpType has been renamed to cmpTypes (multiple form).

llvm-svn: 216374
2014-08-25 08:16:39 +00:00
Stepan Dyatkovskiy 016daddc52 MergeFunctions, tiny refactoring:
cmpGEP has been renamed to cmpGEPs (multiple form).

llvm-svn: 216373
2014-08-25 08:12:45 +00:00
Karthik Bhat 7f33ff7dea Allow vectorization of division by uniform power of 2.
This patch adds support to recognize division by uniform power of 2 and modifies the cost table to vectorize division by uniform power of 2 whenever possible.
Updates Cost model for Loop and SLP Vectorizer.The cost table is currently only updated for X86 backend.
Thanks to Hal, Andrea, Sanjay for the review. (http://reviews.llvm.org/D4971)

llvm-svn: 216371
2014-08-25 04:56:54 +00:00
Craig Topper 4627679cec Use range based for loops to avoid needing to re-mention SmallPtrSet size.
llvm-svn: 216351
2014-08-24 23:23:06 +00:00
David Majnemer 0ffccf7fb5 InstCombine: Properly optimize or'ing bittests together
CFE, with -03, would turn:
bool f(unsigned x) {
  bool a = x & 1;
  bool b = x & 2;
  return a | b;
}

into:
  %1 = lshr i32 %x, 1
  %2 = or i32 %1, %x
  %3 = and i32 %2, 1
  %4 = icmp ne i32 %3, 0

This sort of thing exposes a nasty pathology in GCC, ICC and LLVM.

Instead, we would rather want:
  %1 = and i32 %x, 3
  %2 = icmp ne i32 %1, 0

Things get a bit more interesting in the following case:
  %1 = lshr i32 %x, %y
  %2 = or i32 %1, %x
  %3 = and i32 %2, 1
  %4 = icmp ne i32 %3, 0

Replacing it with the following sequence is better:
  %1 = shl nuw i32 1, %y
  %2 = or i32 %1, 1
  %3 = and i32 %2, %x
  %4 = icmp ne i32 %3, 0

This sequence is preferable because %1 doesn't involve %x and could
potentially be hoisted out of loops if it is invariant; only perform
this transform in the non-constant case if we know we won't increase
register pressure.

llvm-svn: 216343
2014-08-24 09:10:57 +00:00
Jingyue Wu ec33fa9aca [SROA] Fold a PHI node if all its incoming values are the same
Summary:
Fixes PR20425.

During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote.

Test Plan: Added three more tests in phi-and-select.ll

Reviewers: nlewycky, eliben, meheff, chandlerc

Reviewed By: chandlerc

Subscribers: zinovy.nis, hfinkel, baldrick, llvm-commits

Differential Revision: http://reviews.llvm.org/D4659

llvm-svn: 216299
2014-08-22 22:45:57 +00:00
David Majnemer 49775e0173 InstCombine: Don't unconditionally preserve 'nuw' when shrinking constants
Consider:
  %add = add nuw i32 %a, -16777216
  %and = and i32 %add, 255

Regardless of whether or not we demand the sign bit of %add, we cannot
replace -16777216 with 2130706432 without also removing 'nuw' from the
instruction.

llvm-svn: 216273
2014-08-22 17:11:04 +00:00
David Majnemer 0e6c986696 InstCombine: sub nsw %x, C -> add nsw %x, -C if C isn't INT_MIN
We can preserve nsw during this transform if -C won't overflow.

llvm-svn: 216269
2014-08-22 16:41:23 +00:00
David Majnemer 42b83a5e36 InstCombine: Don't unconditionally preserve 'nsw' when shrinking constants
Consider:
  %add = add nsw i32 %a, -16777216
  %and = and i32 %add, 255

Regardless of whether or not we demand the sign bit of %add, we cannot
replace -16777216 with 2130706432 without also removing 'nsw' from the
instruction.

This fixes PR20377.

llvm-svn: 216261
2014-08-22 07:56:32 +00:00
Erik Eckstein b49d7abb7b fix: SLPVectorizer crashes for unreachable blocks containing not schedulable instructions.
In unreachable blocks it's legal to have instructions like "%x = op %x".
Such instuctions are not schedulable. Therefore the SLPVectorizer has to check for
unreachable blocks and ignore them.

Fixes bug 20646.

llvm-svn: 216256
2014-08-22 01:18:39 +00:00
Peter Collingbourne fab565a56b [dfsan] Fix non-determinism bug in non-zero label check annotator.
We now use a std::vector instead of a DenseSet to store the list of
label checks so that we can iterate over it deterministically.

llvm-svn: 216255
2014-08-22 01:18:18 +00:00
Reid Kleckner c36f48f08a SROA: Handle a case of store size being smaller than allocation size
In this case, we are creating an x86_fp80 slice for a union from C where
the padding bytes may contain real data. An x86_fp80 alloca is 16 bytes,
and that's just fine. We can't, however, use regular loads and stores to
access the slice, because the store size is only 10 bytes / 80 bits.
Instead, use memcpy and memset.

Fixes PR18726.

Reviewed By: chandlerc

Differential Revision: http://reviews.llvm.org/D5012

llvm-svn: 216248
2014-08-22 00:09:56 +00:00
David Blaikie 2f3f76fdb1 Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks.
Somewhat unnoticed in the original implementation of discriminators, but
it could cause instructions to end up in new, small,
DW_TAG_lexical_blocks due to the use of DILexicalBlock to track
discriminator changes.

Instead, use DILexicalBlockFile which we already use to track file
changes without introducing new scopes, so it works well to track
discriminator changes in the same way.

llvm-svn: 216239
2014-08-21 22:45:21 +00:00
Rafael Espindola 7cebf36a95 Move some logic to populateLTOPassManager.
This will avoid code duplication in the next commit which calls it directly
from the gold plugin.

llvm-svn: 216211
2014-08-21 20:03:44 +00:00
Rafael Espindola 216e0c0617 Respect LibraryInfo in populateLTOPassManager and use it. NFC.
llvm-svn: 216203
2014-08-21 18:49:52 +00:00
Rafael Espindola e07caad9e7 Handle inlining in populateLTOPassManager like in populateModulePassManager.
No functionality change.

llvm-svn: 216178
2014-08-21 13:35:30 +00:00
Zinovy Nis 33406da5f4 [CLNUP] Remove return after llvm_unreachable. Thanks to Hal Finkel for pointing.
llvm-svn: 216176
2014-08-21 13:30:05 +00:00
Rafael Espindola 208bc533cd Move DisableGVNLoadPRE from populateLTOPassManager to PassManagerBuilder.
llvm-svn: 216174
2014-08-21 13:13:17 +00:00
Erik Verbruggen 2b98bd2a80 Reassociate x + -0.1234 * y into x - 0.1234 * y
This does not require -ffast-math, and it gives CSE/GVN more options to
eliminate duplicate expressions in, e.g.:

  return ((x + 0.1234 * y) * (x - 0.1234 * y));

Differential Revision: http://reviews.llvm.org/D4904

llvm-svn: 216169
2014-08-21 10:45:30 +00:00
Zinovy Nis 0a36cba29d [INDVARS] Extend using of widening of induction variables for the cases of "sub nsw" and "mul nsw" instructions.
Currently only "add nsw" are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:

int N = 100;
float **A;

void foo(int x0, int x1)
{
        float * A_cur = &A[0][0];
        float * A_next = &A[1][0];
        for(int x = x0; x < x1; ++x).
        {
          // Currently only [x+N] case is widened. Others 2 cases lead to sext.
          // This patch fixes it, so all 3 cases do not need sext.
          const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
          A_next[x] = div;
        }
}
...
> clang++ test.cpp -march=core-avx2 -Ofast  -fno-unroll-loops -fno-tree-vectorize -S -o -

Differential Revision: http://reviews.llvm.org/D4695

llvm-svn: 216160
2014-08-21 08:25:45 +00:00
Craig Topper 71b7b68b74 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 216158
2014-08-21 05:55:13 +00:00
David Majnemer 5d1aeba2ea InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1
Adapted from a patch by Richard Smith, test-case written by me.

llvm-svn: 216157
2014-08-21 05:14:48 +00:00
James Molloy 82c995d450 [LoopVectorizer] Limit unroll factor in the presence of nested reductions.
If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.

llvm-svn: 216140
2014-08-20 23:53:52 +00:00
Yi Jiang 1a4e73d7bf New InstCombine pattern: (icmp ult/ule (A + C1), C3) | (icmp ult/ule (A + C2), C3) to (icmp ult/ule ((A & ~(C1 ^ C2)) + max(C1, C2)), C3) under certain condition
llvm-svn: 216135
2014-08-20 22:55:40 +00:00
David Majnemer 42158f3eea InstCombine: Annotate sub with nuw when we prove it's safe
We can prove that a 'sub' can be a 'sub nuw' if the left-hand side is
negative and the right-hand side is non-negative.

llvm-svn: 216045
2014-08-20 07:17:31 +00:00
Peter Collingbourne f39430bd4a [dfsan] Treat vararg custom functions like unimplemented functions.
Because declarations of these functions can appear in places like autoconf
checks, they have to be handled somehow, even though we do not support
vararg custom functions. We do so by printing a warning and calling the
uninstrumented function, as we do for unimplemented functions.

llvm-svn: 216042
2014-08-20 01:40:23 +00:00
David Majnemer 57d5bc8849 InstCombine: Annotate sub with nsw when we prove it's safe
We can prove that a 'sub' can be a 'sub nsw' under certain conditions:
- The sign bits of the operands is the same.
- Both operands have more than 1 sign bit.

The subtraction cannot be a signed overflow in either case.

llvm-svn: 216037
2014-08-19 23:36:30 +00:00
Renato Golin 06d601fb3e Revert "Small refactor on VectorizerHint for deduplication"
This reverts commit r215994 because MSVC 2012 can't cope with its C++11 goodness.

llvm-svn: 215999
2014-08-19 18:08:50 +00:00
Renato Golin dd6394d833 Small refactor on VectorizerHint for deduplication
Previously, the hint mechanism relied on clean up passes to remove redundant
metadata, which still showed up if running opt at low levels of optimization.
That also has shown that multiple nodes of the same type, but with different
values could still coexist, even if temporary, and cause confusion if the
next pass got the wrong value.

This patch makes sure that, if metadata already exists in a loop, the hint
mechanism will never append a new node, but always replace the existing one.
It also enhances the algorithm to cope with more metadata types in the future
by just adding a new type, not a lot of code.

llvm-svn: 215994
2014-08-19 17:30:43 +00:00
Mayur Pandey 960507beb4 InstCombine: ((A & ~B) ^ (~A & B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A & ~B),(~A & B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Differential Revision: http://reviews.llvm.org/D4898

llvm-svn: 215974
2014-08-19 08:19:19 +00:00
Craig Topper 97ebe53032 Const-correct and prevent a copy of a SmallPtrSet.
llvm-svn: 215973
2014-08-19 07:44:27 +00:00
Mayur Pandey 75b76c6a92 test commit (spelling correction)
llvm-svn: 215970
2014-08-19 06:41:55 +00:00
Craig Topper 6230691c91 Revert "Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size."
Getting a weird buildbot failure that I need to investigate.

llvm-svn: 215870
2014-08-18 00:24:38 +00:00
Craig Topper 5229cfd163 Repace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size.
llvm-svn: 215868
2014-08-17 23:47:00 +00:00
Owen Anderson a4428aa484 Remove an InstCombine that transformed patterns like (x * uitofp i1 y) to (select y, x, 0.0) when the multiply has fast math flags set.
While this might seem like an obvious canonicalization, there is one subtle problem with it.  The result of the original expression
is undef when x is NaN (remember, fast math flags), but the result of the select is always defined when x is NaN.  This means that the
new expression is strictly more defined than the original one.  One unfortunate consequence of this is that the transform is not reversible!
It's always legal to make increase the defined-ness of an expression, but it's not legal to reduce it.  Thus, targets that prefer the original
form of the expression cannot reverse the transform to recover it.  Another way to think of it is that the transform has lost source-level
information (the fast math flags), which is undesirable.

llvm-svn: 215825
2014-08-17 03:51:29 +00:00
David Majnemer 1a0bbc8a5c InstCombine: Fix a potential bug in 0 - (X sdiv C) -> (X sdiv -C)
While *most* (X sdiv 1) operations will get caught by InstSimplify, it
is still possible for a sdiv to appear in the worklist which hasn't been
simplified yet.

This means that it is possible for 0 - (X sdiv 1) to get transformed
into (X sdiv -1); dividing by -1 can make the transform produce undef
values instead of the proper result.

Sorry for the lack of testcase, it's a bit problematic because it relies
on the exact order of operations in the worklist.

llvm-svn: 215818
2014-08-16 09:23:42 +00:00
David Majnemer f9a095d606 InstCombine: Combine mul with div.
We can combne a mul with a div if one of the operands is a multiple of
the other:

%mul = mul nsw nuw %a, C1
%ret = udiv %mul, C2
  =>
%ret = mul nsw %a, (C1 / C2)

This can expose further optimization opportunities if we end up
multiplying or dividing by a power of 2.

Consider this small example:

define i32 @f(i32 %a) {
  %mul = mul nuw i32 %a, 14
  %div = udiv exact i32 %mul, 7
  ret i32 %div
}

which gets CodeGen'd to:

    imull       $14, %edi, %eax
    imulq       $613566757, %rax, %rcx
    shrq        $32, %rcx
    subl        %ecx, %eax
    shrl        %eax
    addl        %ecx, %eax
    shrl        $2, %eax
    retq

We can now transform this into:
define i32 @f(i32 %a) {
  %shl = shl nuw i32 %a, 1
  ret i32 %shl
}

which gets CodeGen'd to:

    leal        (%rdi,%rdi), %eax
    retq

This fixes PR20681.

llvm-svn: 215815
2014-08-16 08:55:06 +00:00
Rafael Espindola ea46c32f81 Introduce a helper to combine instruction metadata.
Replace the old code in GVN and BBVectorize with it. Update SimplifyCFG to use
it.

Patch by Björn Steinbrink!

llvm-svn: 215723
2014-08-15 15:46:38 +00:00
Hal Finkel 61c386126b Copy noalias metadata from call sites to inlined instructions
When a call site with noalias metadata is inlined, that metadata can be
propagated directly to the inlined instructions (only those that might access
memory because it is not useful on the others). Prior to inlining, the noalias
metadata could express that a call would not alias with some other memory
access, which implies that no instruction within that called function would
alias. By propagating the metadata to the inlined instructions, we preserve
that knowledge.

This should complete the enhancements requested in PR20500.

llvm-svn: 215676
2014-08-14 21:09:37 +00:00
Hal Finkel d2dee16c27 Add noalias metadata for general calls (not just memory intrinsics) during inlining
When preserving noalias function parameter attributes by adding noalias
metadata in the inliner, we should do this for general function calls (not just
memory intrinsics). The logic is very similar to what already existed (except
that we want to add this metadata even for functions taking no relevant
parameters). This metadata can be used by ModRef queries in the caller after
inlining.

This addresses the first part of PR20500. Adding noalias metadata during
inlining is still turned off by default.

llvm-svn: 215657
2014-08-14 16:44:03 +00:00
Chad Rosier 11ab941644 [Reassociation] Add support for reassociation with unsafe algebra.
Vector instructions are (still) not supported for either integer or floating
point.  Hopefully, that work will be landed shortly.

llvm-svn: 215647
2014-08-14 15:23:01 +00:00
David Majnemer 698dca0b95 InstCombine: ((A | ~B) ^ (~A | B)) to A ^ B
Proof using CVC3 follows:
$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR((A | ~B),(~A |B)) = BVXOR(A,B);
$ cvc3 t.cvc
Valid.

Patch by Mayur Pandey!

Differential Revision: http://reviews.llvm.org/D4883

llvm-svn: 215621
2014-08-14 06:46:25 +00:00
David Majnemer f1eda23514 Added InstCombine Transform for ((B | C) & A) | B -> B | (A & C)
Transform ((B | C) & A) | B --> B | (A & C)

Z3 Link: http://rise4fun.com/Z3/hP6p

Patch by Sonam Kumari!

Differential Revision: http://reviews.llvm.org/D4865

llvm-svn: 215619
2014-08-14 06:41:38 +00:00
Jan Vesely 0cd3ec6cfa utils: Fix segfault in flattencfg
v2: continue iterating through the rest of the bb
    use for loop

v3: initialize FlattenCFG pass in ScalarOps
    add test

v4: split off initializing flattencfg to a separate patch
    add comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215574
2014-08-13 20:31:53 +00:00
Jan Vesely 5a956d49f7 Initialize FlattenCFG pass
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 215573
2014-08-13 20:31:52 +00:00
Benjamin Kramer a7c40ef022 Canonicalize header guards into a common format.
Add header guards to files that were missing guards. Remove #endif comments
as they don't seem common in LLVM (we can easily add them back if we decide
they're useful)

Changes made by clang-tidy with minor tweaks.

llvm-svn: 215558
2014-08-13 16:26:38 +00:00
Chandler Carruth 0fb998110a [optnone] Make the optnone attribute effective at suppressing function
attribute and function argument attribute synthesizing and propagating.

As with the other uses of this attribute, the goal remains a best-effort
(no guarantees) attempt to not optimize the function or assume things
about the function when optimizing. This is particularly useful for
compiler testing, bisecting miscompiles, triaging things, etc. I was
hitting specific issues using optnone to isolate test code from a test
driver for my fuzz testing, and this is one step of fixing that.

llvm-svn: 215538
2014-08-13 10:49:33 +00:00
Chandler Carruth 3f92ecc2a0 Revert r215415 which causse MSan to crash on a great deal of C++ code.
I've followed up on the original commit as well.

llvm-svn: 215532
2014-08-13 09:19:39 +00:00
Karthik Bhat a4a4db91be InstCombine: Combine (xor (or %a, %b) (xor %a, %b)) to (add %a, %b)
Correctness proof of the transform using CVC3-

$ cat t.cvc
A, B : BITVECTOR(32);
QUERY BVXOR(A | B, BVXOR(A,B) ) = A & B;

$ cvc3 t.cvc
Valid.

llvm-svn: 215524
2014-08-13 05:13:14 +00:00
Matt Arsenault 4815f09bbe Allwo bitcast + struct GEP transform to work with addrspacecast
llvm-svn: 215467
2014-08-12 19:46:13 +00:00
Reid Kleckner 3ae6e1528a msan: Handle musttail calls
First, avoid calling setTailCall(false) on musttail calls.  The funciton
prototypes should be "congruent", so the shadow layout should be exactly
the same.

Second, avoid inserting instrumentation after a musttail call to
propagate the return value shadow.  We don't need to propagate the
result of a tail call, it should already be in the right place.

Reviewed By: eugenis

Differential Revision: http://reviews.llvm.org/D4331

llvm-svn: 215415
2014-08-12 00:12:43 +00:00
Reid Kleckner e31acf239a Move helper for getting a terminating musttail call to BasicBlock
No functional change.  To be used in future commits that need to look
for such instructions.

Reviewed By: rafael

Differential Revision: http://reviews.llvm.org/D4504

llvm-svn: 215413
2014-08-12 00:05:15 +00:00
David Majnemer ab07f00c64 InstCombine: Combine (add (and %a, %b) (or %a, %b)) to (add %a, %b)
What follows bellow is a correctness proof of the transform using CVC3.

$ < t.cvc
A, B : BITVECTOR(32);

QUERY BVPLUS(32, A & B, A | B) = BVPLUS(32, A, B);

$ cvc3 < t.cvc
Valid.

llvm-svn: 215400
2014-08-11 22:32:02 +00:00
James Molloy 65b08f5e46 [LoopVectorizer] Enable support for floating-point subtraction reductions
llvm-svn: 215200
2014-08-08 12:41:08 +00:00
David Majnemer fe8c7540b0 GlobalOpt: Optimize in the face of insertvalue/extractvalue
GlobalOpt didn't know how to simulate InsertValueInst or
ExtractValueInst.  Optimizing these is pretty straightforward.

N.B. This came up when looking at clang's IRGen for MS ABI member
pointers; they are represented as aggregates.

llvm-svn: 215184
2014-08-08 05:50:43 +00:00
Gerolf Hoflehner ea96a3d336 Fix for multi-line comment warning
llvm-svn: 215169
2014-08-07 23:19:55 +00:00
Arnold Schwaighofer 4fb3c47456 SLPVectorizer: Use the type of the value loaded/stored to get the ABI alignment
We were using the pointer type which is incorrect.

llvm-svn: 215162
2014-08-07 22:47:27 +00:00
Owen Anderson 6c19ab1b5d Fix a case in SROA where lifetime intrinsics could inhibit alloca promotion. In
this case, the code path dealing with vector promotion was missing the explicit
checks for lifetime intrinsics that were present on the corresponding integer
promotion path.

llvm-svn: 215148
2014-08-07 21:07:35 +00:00
Rui Ueyama c487f7728e Revert "r214897 - Remove dead zero store to calloc initialized memory"
It broke msan.

llvm-svn: 214989
2014-08-06 19:30:38 +00:00
James Molloy 568da0990e Add a new option -run-slp-after-loop-vectorization.
This swaps the order of the loop vectorizer and the SLP/BB vectorizers. It is disabled by default so we can do performance testing - ideally we want to change to having the loop vectorizer running first, and the SLP vectorizer using its leftovers instead of the other way around.

llvm-svn: 214963
2014-08-06 12:56:19 +00:00
Peter Collingbourne df240b252a [dfsan] Try not to create too many additional basic blocks in functions which
already have a large number of blocks. Works around a performance issue with
the greedy register allocator.

llvm-svn: 214944
2014-08-06 00:33:40 +00:00
JF Bastien ac8b66b32c Fix typos in comments and doc
Committing http://reviews.llvm.org/D4798 for Robin Morisset (morisset@google.com)

llvm-svn: 214934
2014-08-05 23:27:34 +00:00
Rafael Espindola f9e52cf015 Don't internalize all but main by default.
This is mostly a cleanup, but it changes a fairly old behavior.

Every "real" LTO user was already disabling the silly internalize pass
and creating the internalize pass itself. The difference with this
patch is for "opt -std-link-opts" and the C api.

Now to get a usable behavior out of opt one doesn't need the funny
looking command line:

opt -internalize -disable-internalize -internalize-public-api-list=foo,bar -std-link-opts

llvm-svn: 214919
2014-08-05 20:10:38 +00:00
Philip Reames 00c9b6461f Remove dead zero store to calloc initialized memory
Optimize the following IR:

%1 = tail call noalias i8* @calloc(i64 1, i64 4)
%2 = bitcast i8* %1 to i32*
; This store is dead and should be removed
store i32 0, i32* %2, align 4

Memory returned by calloc is guaranteed to be zero initialized. If the value being stored is the constant zero (and the store is not otherwise observable across threads), we can delete the store.  If the store is to an out of bounds address, it is undefined and thus also removable.

Reviewed By: nicholas

Differential Revision: http://reviews.llvm.org/D3942

llvm-svn: 214897
2014-08-05 17:48:20 +00:00
James Molloy 2b8933c354 Teach the SLP Vectorizer that keeping some values live over a callsite can have a cost.
Some types, such as 128-bit vector types on AArch64, don't have any callee-saved registers. So if a value needs to stay live over a callsite, it must be spilled and refilled. This cost is now taken into account.

llvm-svn: 214859
2014-08-05 12:30:34 +00:00
Manman Ren 062f58d550 [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.
When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.

rdar://17887153

llvm-svn: 214642
2014-08-02 23:41:54 +00:00
Erik Eckstein 26a1bf7d84 fix bug 20513 - Crash in SLP Vectorizer
llvm-svn: 214638
2014-08-02 19:39:42 +00:00
Alexey Samsonov d9ad5cec0c [ASan] Use metadata to pass source-level information from Clang to ASan.
Instead of creating global variables for source locations and global names,
just create metadata nodes and strings. They will be transformed into actual
globals in the instrumentation pass (if necessary). This approach is more
flexible:
1) we don't have to ensure that our custom globals survive all the optimizations
2) if globals are discarded for some reason, we will simply ignore metadata for them
   and won't have to erase corresponding globals
3) metadata for source locations can be reused for other purposes: e.g. we may
   attach source location metadata to alloca instructions and provide better descriptions
   for stack variables in ASan error reports.

No functionality change.

llvm-svn: 214604
2014-08-02 00:35:50 +00:00
Tyler Nowicki 064896bbc5 Add diagnostics to the vectorizer cost model.
When the cost model determines vectorization is not possible/profitable these remarks print an analysis of that decision.

Note that in selectVectorizationFactor() we can assume that OptForSize and ForceVectorization are mutually exclusive.

Reviewed by Arnold Schwaighofer

llvm-svn: 214599
2014-08-02 00:14:03 +00:00
Peter Collingbourne e52646cd80 PartiallyInlineLibCalls: Check sqrt result type before transforming it.
Some configure scripts declare this with the wrong prototype, which can lead
to an assertion failure.

llvm-svn: 214593
2014-08-01 23:21:21 +00:00
Peter Collingbourne 142fdff0d5 [dfsan] Correctly handle loads and stores of zero size.
llvm-svn: 214561
2014-08-01 21:18:18 +00:00
Rafael Espindola 3f6481d0d3 Remove some calls to std::move.
Instead of moving out the data in a ErrorOr<std::unique_ptr<Foo>>, get
a reference to it.

Thanks to David Blaikie for the suggestion.

llvm-svn: 214516
2014-08-01 14:31:55 +00:00
Erik Eckstein 690dd037d9 SLPVectorizer: fix build problem in Release configuration
llvm-svn: 214496
2014-08-01 09:47:38 +00:00
Erik Eckstein c80e1dc081 SLPVectorizer: improved scheduling algorithm.
llvm-svn: 214494
2014-08-01 09:20:42 +00:00
Erik Eckstein f16a808292 SLP Vectorizer: added statistics counter
llvm-svn: 214487
2014-08-01 08:14:28 +00:00
Erik Eckstein 4944b2ff94 SLP Vectorizer: improve canonicalize tree operands of commutitive binary operands.
This reverts r214338 (except the test file) and replaces it with a more general algorithm.

llvm-svn: 214485
2014-08-01 08:05:55 +00:00
Suyog Sarda 56c9a87035 This patch implements transform for pattern "(A & ~B) ^ (~A) -> ~(A & B)".
Differential Revision: http://reviews.llvm.org/D4653

llvm-svn: 214479
2014-08-01 05:07:20 +00:00
Suyog Sarda 1c6c2f69f7 This patch implements transform for pattern "(A | B) & ((~A) ^ B) -> (A & B)".
Differential Revision: http://reviews.llvm.org/D4628

llvm-svn: 214478
2014-08-01 04:59:26 +00:00
Suyog Sarda 52324c82cc This patch implements transform for pattern "( A & (~B)) | (A ^ B) -> (A ^ B)"
Differential Revision: http://reviews.llvm.org/D4652

llvm-svn: 214477
2014-08-01 04:50:31 +00:00
Suyog Sarda 16d646594e This patch implements transform for pattern "(A & B) | ((~A) ^ B) -> (~A ^ B)".
Patch Credit to Ankit Jain !

Differential Revision: http://reviews.llvm.org/D4655

llvm-svn: 214476
2014-08-01 04:41:43 +00:00
Tyler Nowicki b5a65395cc Improve the remark generated for -Rpass-missed.
The current remark is ambiguous and makes it sounds like explicitly specifying vectorization will allow the loop to be vectorized. This is not the case. The improved remark directs the user to -Rpass-analysis=loop-vectorize to determine the cause of the pass-miss.

Reviewed by Arnold Schwaighofer`

llvm-svn: 214445
2014-07-31 21:22:22 +00:00
Tyler Nowicki 9fe497fcac Improve the remark generated when a variable that is used outside the loop is not a reduction or induction variable.
Reviewed by Arnold Schwaighofer

llvm-svn: 214440
2014-07-31 21:02:40 +00:00
Evgeniy Stepanov 5997feb7dc [msan] Fix handling of array types.
Switch array type shadow from a single integer to
an array of integers (i.e. make it per-element).
This simplifies instrumentation of extractvalue and fixes PR20493.

llvm-svn: 214398
2014-07-31 11:02:27 +00:00
Stepan Dyatkovskiy 87c046189d MergeFunctions, tiny refactoring:
cmpOperation has been renamed to cmpOperations (multiple form).

llvm-svn: 214392
2014-07-31 07:16:59 +00:00
David Majnemer a92687d636 InstCombine: Correctly propagate NSW/NUW for x-(-A) -> x+A
We can only propagate the nsw bits if both subtraction instructions are
marked with the appropriate bit.

N.B.  We only propagate the nsw bit in InstCombine because the nuw case
is already handled in InstSimplify.

This fixes PR20189.

llvm-svn: 214385
2014-07-31 04:49:29 +00:00
David Majnemer 42af3601c2 InstCombine: Simplify (A ^ B) or/and (A ^ B ^ C)
While we can already transform A | (A ^ B) into A | B, things get bad
once we have (A ^ B) | (A ^ B ^ Cst) because reassociation will morph
this into (A ^ B) | ((A ^ Cst) ^ B).  Our existing patterns fail once
this happens.

To fix this, we add a new pattern which looks through the tree of xor
binary operators to see that, in fact, there exists a redundant xor
operation.

What follows bellow is a correctness proof of the transform using CVC3.

$ cat t.cvc
A, B, C : BITVECTOR(64);

QUERY BVXOR(A, B) | BVXOR(BVXOR(B, C), A) = BVXOR(A, B) | C;
QUERY BVXOR(BVXOR(A, C), B) | BVXOR(A, B) = BVXOR(A, B) | C;

QUERY BVXOR(A, B) & BVXOR(BVXOR(B, C), A) = BVXOR(A, B) & ~C;
QUERY BVXOR(BVXOR(A, C), B) & BVXOR(A, B) = BVXOR(A, B) & ~C;

$ cvc3 < t.cvc
Valid.
Valid.
Valid.
Valid.

llvm-svn: 214342
2014-07-30 21:26:37 +00:00
Chad Rosier 78f41b3ca7 SLP Vectorizer: Canonicalize tree operands of commutitive binary operands.
llvm-svn: 214338
2014-07-30 21:07:56 +00:00
Rafael Espindola d07cf400ab SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.

For now dropping this optimization avoids a miscompilation.

Patch by Björn Steinbrink.

llvm-svn: 214336
2014-07-30 21:04:00 +00:00
Aaron Ballman 573f3b5313 Fixing a few -Woverloaded-virtual warnings by exposing the hidden virtual function as well. No functional changes intended.
llvm-svn: 214325
2014-07-30 19:23:59 +00:00
Rafael Espindola 3cf4af11d5 Add the missing hasLinkOnceODRLinkage predicate.
llvm-svn: 214312
2014-07-30 15:57:51 +00:00
Manman Ren f8a1967c8c [Debug Info] add DISubroutineType and its creation takes DITypeArray.
DITypeArray is an array of DITypeRef, at its creation, we will create
DITypeRef (i.e use the identifier if the type node has an identifier).

This is the last patch to unique the type array of a subroutine type.

rdar://17628609

llvm-svn: 214132
2014-07-28 22:24:06 +00:00
Manman Ren ab8ffbaaee [Debug Info] rename getTypeArray to getElements, setTypeArray to setArrays.
This is the second of a series of patches to handle type uniqueing of the
type array for a subroutine type.

For vector and array types, getElements returns the array of subranges, so it
is a better name than getTypeArray. Even for class, struct and enum types,
getElements returns the members, which can be subprograms.

setArrays can set up to two arrays, the second is the templates.

This commit should have no functionality change.

llvm-svn: 214112
2014-07-28 19:14:13 +00:00