Commit Graph

6259 Commits

Author SHA1 Message Date
Sanjay Patel fcc7c1a0ba [LibCallSimplifier] don't get fooled by a fake fmin()
This is similar to the bug/fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
http://reviews.llvm.org/rL258325

The fmin() test case reveals another bug caused by sloppy
code duplication. It will crash without this patch because
fp128 is a valid floating-point type, but we would think
that we had matched a function that used doubles.

The new helper function can be used to replace similar
checks that are used in several other places in this file.

llvm-svn: 258428
2016-01-21 20:19:54 +00:00
David Majnemer 3af5bf30e3 [InstCombine] Simplify (x >> y) <= x
This commit extends the patterns recognised by InstSimplify to also handle (x >> y) <= x in the same way as (x /u y) <= x.

The missing optimisation was found investigating why LLVM did not optimise away bound checks in a binary search: https://github.com/rust-lang/rust/pull/30917

Patch by Andrea Canciani!

Differential Revision: http://reviews.llvm.org/D16402

llvm-svn: 258422
2016-01-21 18:55:54 +00:00
Rong Xu ed9fec7365 [PGO] IR level instrumentation of indirect call value profiling
This patch adds the instrumentation for indirect call value profiling. It finds all the indirect call-sites and generates instrprof_value_profile intrinsic calls. A new opt level option -disable-vp is introduced to disable this instrumentation.

Reviewers: davidxl, betulb, vsk

Differential Revision: http://reviews.llvm.org/D16016

llvm-svn: 258417
2016-01-21 18:11:44 +00:00
Matthew Simpson 486bace5cc Revert "[SLP] Truncate expressions to minimum required bit width"
This reverts commit r258404.

llvm-svn: 258408
2016-01-21 17:17:20 +00:00
Vedant Kumar 61035fa3cb [GCOV] Avoid emitting profile arcs for module and skeleton CUs
Do not emit profile arc files and note files for module and skeleton
CU's.

Our users report seeing unexpected *.gcda and *.gcno files in their
projects when using gcov-style profiling with modules or frameworks.
The unwanted files come from these modules. This is not very helpful
for end-users. Further, we've seen reports of instrumented programs
crashing while writing these files out (due to I/O failures).

rdar://problem/22838296

Reviewed-by: aprantl

Differential Revision: http://reviews.llvm.org/D15997

llvm-svn: 258406
2016-01-21 17:04:42 +00:00
Matthew Simpson cb17d72170 [SLP] Truncate expressions to minimum required bit width
This change attempts to produce vectorized integer expressions in bit widths
that are narrower than their scalar counterparts. The need for demotion arises
especially on architectures in which the small integer types (e.g., i8 and i16)
are not legal for scalar operations but can still be used in vectors. Like
similar work done within the loop vectorizer, we rely on InstCombine to perform
the actual type-shrinking. We use the DemandedBits analysis and
ComputeNumSignBits from ValueTracking to determine the minimum required bit
width of an expression.

Differential revision: http://reviews.llvm.org/D15815

llvm-svn: 258404
2016-01-21 16:31:55 +00:00
Sanjoy Das a34ce95b60 Add a "gc-transition" operand bundle
Summary:
This adds a new kind of operand bundle to LLVM denoted by the
`"gc-transition"` tag.  Inputs to `"gc-transition"` operand bundle are
lowered into the "transition args" section of `gc.statepoint` by
`RewriteStatepointsForGC`.

This removes the last bit of functionality that was unsupported in the
deopt bundle based code path in `RewriteStatepointsForGC`.

Reviewers: pgavlin, JosephTremoulet, reames

Subscribers: sanjoy, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D16342

llvm-svn: 258338
2016-01-20 19:50:25 +00:00
Sanjay Patel bd2dc67142 [LibCallSimplifier] don't get fooled by a fake sqrt()
The test case will crash without this patch because the subsequent call to
hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator
(ie, returns an FP type).

This part of the function signature check was omitted for the sqrt() case, 
but seems to be in place for all other transforms.

Before:
http://reviews.llvm.org/rL257400
...we would have needlessly continued execution in optimizeSqrt(), but the
bug was harmless because we'd eventually fail some other check and return
without damage.

This should fix:
https://llvm.org/bugs/show_bug.cgi?id=26211

Differential Revision: http://reviews.llvm.org/D16198

llvm-svn: 258325
2016-01-20 17:41:14 +00:00
Joseph Tremoulet b41632bf0f [Inliner/WinEH] Honor implicit nounwinds
Summary:
Funclet EH tables require that a given funclet have only one unwind
destination for exceptional exits.  The verifier will therefore reject
e.g. two cleanuprets with different unwind dests for the same cleanup, or
two invokes exiting the same funclet but to different unwind dests.
Because catchswitch has no 'nounwind' variant, and because IR producers
are not *required* to annotate calls which will not unwind as 'nounwind',
it is legal to nest a call or an "unwind to caller" catchswitch within a
funclet pad that has an unwind destination other than caller; it is
undefined behavior for such a call or catchswitch to unwind.

Normally when inlining an invoke, calls in the inlined sequence are
rewritten to invokes that unwind to the callsite invoke's unwind
destination, and "unwind to caller" catchswitches in the inlined sequence
are rewritten to unwind to the callsite invoke's unwind destination.
However, if such a call or "unwind to caller" catchswitch is located in a
callee funclet that has another exceptional exit with an unwind
destination within the callee, applying the normal transformation would
give that callee funclet multiple unwind destinations for its exceptional
exits.  There would be no way for EH table generation to determine which
is the "true" exit, and the verifier would reject the function
accordingly.

Add logic to the inliner to detect these cases and leave such calls and
"unwind to caller" catchswitches as calls and "unwind to caller"
catchswitches in the inlined sequence.

This fixes PR26147.


Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: alexcrichton, llvm-commits

Differential Revision: http://reviews.llvm.org/D16319

llvm-svn: 258273
2016-01-20 02:15:15 +00:00
Sanjay Patel 582857c95c add tests to show missing memset/malloc optimizations (PR25892)
llvm-svn: 258218
2016-01-19 23:07:10 +00:00
Sanjoy Das 29a4b5dc0d [SCEV] Fix PR26207
In some cases, the max backedge taken count can be more conservative
than the exact backedge taken count (for instance, because
ScalarEvolution::getRange is not control-flow sensitive whereas
computeExitLimitFromICmp can be).  In these cases,
computeExitLimitFromCond (specifically the bit that deals with `and` and
`or` instructions) can create an ExitLimit instance with a
`SCEVCouldNotCompute` max backedge count expression, but a computable
exact backedge count expression.  This violates an implicit SCEV
assumption: a computable exact BE count should imply a computable max BE
count.

This change

 - Makes the above implicit invariant explicit by adding an assert to
   ExitLimit's constructor

 - Changes `computeExitLimitFromCond` to be more robust around
   conservative max backedge counts

llvm-svn: 258184
2016-01-19 20:53:51 +00:00
Sanjay Patel d1f4f03f5e [LibCallSimplifier] use instruction-level fast-math-flags to shrink calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

llvm-svn: 258158
2016-01-19 18:38:52 +00:00
Sanjay Patel 81a63cd11f [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, [small integer]) calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

As with D15937, the intent of the patch is to preserve the current behavior of the transform
except that we use the pow call's 'fast' attribute as a trigger rather than a function-level
attribute.

The TODO comment notes a potential follow-on patch that would propagate FMF to the new
instructions.

Differential Revision: http://reviews.llvm.org/D16122

llvm-svn: 258153
2016-01-19 18:15:12 +00:00
Sanjoy Das de47590589 [IndVars] Fix PR25576
`LCSSASafePhiForRAUW` as computed was incorrect -- in cases like
these (this exact example does not actually trigger the bug):

define i32 @f(i32 %n, i1* %c) {
entry:
  br label %outer.loop

outer.loop:
  br label %inner.loop

inner.loop:
  %iv = phi i32 [ 0, %outer.loop ], [ %iv.inc, %inner.loop ]
  %iv.inc = add nuw nsw i32 %iv, 1
  %tc = udiv i32 %n, 13
  %be.cond = icmp ult i32 %iv, %tc
  br i1 %be.cond, label %inner.loop, label %inner.exit

inner.exit:
  %iv.lcssa = phi i32 [ %iv, %inner.loop ]
  %outer.be.cond = load volatile i1, i1* %c
  br i1 %outer.be.cond, label %outer.loop, label %leave

leave:
  %iv.lcssa.lcssa = phi i32 [ %iv.lcssa, %inner.exit ]
  ret i32 %iv.lcssa.lcssa
}

`LCSSASafePhiForRAUW` is true for `%iv.lcssa` when re-rewriting the exit
value of `%iv` for `%inner.loop` to `%tc` (this can happen due to
`SCEVExpander::findExistingExpansion`), but the RAUW breaks LCSSA.

To fix this, instead of computing `SafePhi` with special logic, decide
the safety of RAUW directly via `replacementPreservesLCSSAForm`.

llvm-svn: 258016
2016-01-17 18:12:52 +00:00
Artur Pilipenko f84dc06e5b Push isDereferenceableAndAlignedPointer down into isSafeToLoadUnconditionally
Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D16226

llvm-svn: 258010
2016-01-17 12:35:29 +00:00
Igor Laevsky 28eeb3f66c [BasicAliasAnalysis] Take into account operand bundles in the getModRefInfo function
Differential Revision: http://reviews.llvm.org/D16225

llvm-svn: 257991
2016-01-16 12:15:53 +00:00
Matthew Simpson 57fe1b10db Reapply r257800 with fix
The fix uniques the bundle of getelementptr indices we are about to vectorize
since it's possible for the same index to be used by multiple instructions.
The original commit message is below.

[SLP] Vectorize the index computations of getelementptr instructions.

This patch seeds the SLP vectorizer with getelementptr indices. The primary
motivation in doing so is to vectorize gather-like idioms beginning with
consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these
cases could be vectorized with a top-down phase, seeding the existing bottom-up
phase with the index computations avoids the complexity, compile-time, and
phase ordering issues associated with a full top-down pass. Only bundles of
single-index getelementptrs with non-constant differences are considered for
vectorization.

llvm-svn: 257918
2016-01-15 18:51:51 +00:00
Silviu Baranga f29dfd36bb Re-commit r257064, after it was reverted in r257340.
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instruction directly
at the location of the users.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257897
2016-01-15 15:52:05 +00:00
Matthew Simpson 9258e013a2 Revert "[SLP] Vectorize the index computations of getelementptr instructions."
This reverts commit r257800.

llvm-svn: 257888
2016-01-15 13:10:46 +00:00
James Molloy f01488e2bc [InstCombine] Rewrite bswap/bitreverse handling completely.
There are several requirements that ended up with this design;
  1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
  2. Bitreversals and byteswaps are very related in their matching logic.
  3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
  4. Bswaps are best matched early in InstCombine.

The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.

We can then extend the matching logic in one place only.

llvm-svn: 257875
2016-01-15 09:20:19 +00:00
Keno Fischer 81e2e9ef86 Reapply r257105 "[Verifier] Check that debug values have proper size"
I originally reapplied this in 257550, but had to revert again due to bot
breakage. The only change in this version is to allow either the TypeSize
or the TypeAllocSize of the variable to be the one represented in debug info
(hopefully in the future we can figure out how to encode the difference).
Additionally, several bot failures following r257550, were due to
optimizer bugs now fixed in r257787 and r257795.

r257550 commit message was:

```
The follow extra changes were made to test cases:

Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:

    LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
    LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
    LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
    LLVM :: DebugInfo/Generic/varargs.ll

Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):

    LLVM :: DebugInfo/Generic/restrict.ll
    LLVM :: DebugInfo/Generic/tu-composite.ll
    LLVM :: Linker/type-unique-type-array-a.ll
    LLVM :: Linker/type-unique-simple2.ll

Fixing an incorrect DW_OP_deref

    LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll

Fixing a missing DW_OP_deref

    LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll

Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.

The original commit message was:
``
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.

One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.

Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
  it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
  variable as that would depend on the target, even though this test is
  supposed to be generic
- I had to manually declared size/align for reference type. See also the
  discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
``

```

llvm-svn: 257850
2016-01-15 00:46:17 +00:00
Matthew Simpson 791fd160c3 [SLP] Vectorize the index computations of getelementptr instructions.
This patch seeds the SLP vectorizer with getelementptr indices. The primary
motivation in doing so is to vectorize gather-like idioms beginning with
consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these
cases could be vectorized with a top-down phase, seeding the existing bottom-up
phase with the index computations avoids the complexity, compile-time, and
phase ordering issues associated with a full top-down pass. Only bundles of
single-index getelementptrs with non-constant differences are considered for
vectorization.

Differential Revision: http://reviews.llvm.org/D14829

llvm-svn: 257800
2016-01-14 20:46:27 +00:00
Keno Fischer d5354fdddb [SROA] Also insert a bit piece expression if only one piece is needed
Summary: If SROA creates only one piece (e.g. because the other is not needed),
it still needs to create a bit_piece expression if that bit piece is smaller
than the original size of the alloca.

Reviewers: aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16187

llvm-svn: 257795
2016-01-14 20:06:34 +00:00
Keno Fischer 1dd319f3b6 [Utils] Fix incorrect dbg.declare store conversion
Summary: The dbg.declare -> dbg.value conversion did not check which operand of
the store instruction the alloca was passed to. As a result code that stored the
address of an alloca, rather than storing to the alloca, would still trigger
the conversion routine, leading to the insertion of an incorrect dbg.value
intrinsic.

Reviewers: aprantl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16169

llvm-svn: 257787
2016-01-14 19:12:27 +00:00
Akira Hatanaka b45b1ea86f [Inliner] Merge the attributes of the caller and callee functions
This patch turns off the fast-math optimization attribute on the caller
if the callee's fast-math attribute is not turned on.

For example,

- before inlining
 caller: "less-precise-fpmad"="true"
 callee: "less-precise-fpmad"="false"

- after inlining
 caller: "less-precise-fpmad"="false"

Alternatively, it's possible to block inlining if the caller's and
callee's attributes don't match. If this approach is preferable to the
one in this patch, we can discuss post-commit.

rdar://problem/19836465

Differential Revision: http://reviews.llvm.org/D7802

llvm-svn: 257575
2016-01-13 06:02:45 +00:00
Keno Fischer 78e5c9e6e2 Re-Revert r257105 (Verifier debug info changes)
While I investigate some new buildbot failures. This was originally reapplied
as r257550 and r257558.

llvm-svn: 257563
2016-01-13 02:31:14 +00:00
Keno Fischer 25916079ff Reapply r257105 "[Verifier] Check that debug values have proper size"
The follow extra changes were made to test cases:

Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:

    LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
    LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
    LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
    LLVM :: DebugInfo/Generic/varargs.ll

Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):

    LLVM :: DebugInfo/Generic/restrict.ll
    LLVM :: DebugInfo/Generic/tu-composite.ll
    LLVM :: Linker/type-unique-type-array-a.ll
    LLVM :: Linker/type-unique-simple2.ll

Fixing an incorrect DW_OP_deref

    LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll

Fixing a missing DW_OP_deref

    LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll

Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.

The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.

One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.

Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
  it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
  variable as that would depend on the target, even though this test is
  supposed to be generic
- I had to manually declared size/align for reference type. See also the
  discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```

llvm-svn: 257550
2016-01-13 00:31:44 +00:00
Fiona Glaser db7824f0c1 CannotBeOrderedLessThanZero: add some missing cases
llvm-svn: 257542
2016-01-12 23:37:30 +00:00
Keno Fischer 9aae445e09 [Utils] Insert DW_OP_bit_piece when only describing part of the variable
Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext
to find a value to describe the variable (in the expectation that those
zext/sext instruction will go away later). However, those values do not
cover the entire variable and thus need a DW_OP_bit_piece.

Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16061

llvm-svn: 257534
2016-01-12 22:46:09 +00:00
Sanjay Patel 53ba88dbb0 [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(x, 0.5) calls
Also, propagate the FMF to the newly created sqrt() call.

llvm-svn: 257503
2016-01-12 19:06:35 +00:00
Teresa Johnson 1e26e39579 Fix bot failure from r257493: remove extraneous temp file read
This was left from an earlier version of the test.

llvm-svn: 257494
2016-01-12 17:53:59 +00:00
Teresa Johnson 388497e8be [ThinLTO] Handle an external call from an import to an alias in dest
The findExternalCalls routine ignores calls to functions already
defined in the dest module. This was not handling the case where
the definition in the current module is actually an alias to a
function call.

llvm-svn: 257493
2016-01-12 17:48:44 +00:00
Sanjay Patel 6002e78a06 [LibCallSimplifier] use instruction-level fast-math-flags to transform pow(exp(x)) calls
See also:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404
http://reviews.llvm.org/rL257414

llvm-svn: 257491
2016-01-12 17:30:37 +00:00
Sanjay Patel dc19b8de76 consolidate exp/exp2 tests
The transform is identical, so keep the tests together and save some overhead.

llvm-svn: 257484
2016-01-12 17:00:38 +00:00
Sanjay Patel 71e550fefb Add/edit tests to include instruction-level FMF on calls
Prepatory patch before changing LibCallSimplifier to use the FMF.
Also, tighten the CHECK lines and give the tests more meaningful names.
Similar changes to:
http://reviews.llvm.org/rL257414

llvm-svn: 257481
2016-01-12 16:50:17 +00:00
Teresa Johnson 5fe40050bd [IRMover] Don't copy personality, etc unless creating def
Function::copyAttributesFrom will copy the personality function, prefix
data and prolog data from the source function to the new function, and
is invoked when the IRMover copies the function prototype. This puts a
reference to a constant in the source module on a function in the dest
module, which causes an error when deleting the source module after
importing, since the personality function in the source module still has
uses (this would presumably also be an issue for the prologue and prefix
data). Remove the copies added to the dest copy when creating the new
prototype, as they are mapped properly when/if we link the function body.

llvm-svn: 257420
2016-01-12 00:24:24 +00:00
Sanjay Patel e896ede7f1 [LibCallSimplifier] use instruction-level fast-math-flags to transform log calls
Also, add tests to verify that we're checking 'fast' on both calls of each transform pair,
tighten the CHECK lines, and give the tests more meaningful names.

This is a continuation of:
http://reviews.llvm.org/rL255555
http://reviews.llvm.org/rL256871
http://reviews.llvm.org/rL256964
http://reviews.llvm.org/rL257400
http://reviews.llvm.org/rL257404

llvm-svn: 257414
2016-01-11 23:31:48 +00:00
Sanjay Patel 6c1ddbb7b6 [LibCallSimplifier] don't allow sqrt transform unless all ops are unsafe
Fix the FIXME added with:
http://reviews.llvm.org/rL257400

llvm-svn: 257404
2016-01-11 22:50:36 +00:00
Justin Bogner 0fb7ed5726 LoopUnroll: Use the optsize threshold for minsize as well
Currently we're unrolling loops more in minsize than in optsize, which
means -Oz will have a larger code size than -Os. That doesn't make any
sense.

This resolves the FIXME about this in LoopUnrollPass and extends the
optsize test to make sure we use the smaller threshold for minsize as
well.

llvm-svn: 257402
2016-01-11 22:39:43 +00:00
Sanjay Patel 683f29735f [LibCallSimplifier] use instruction-level fast-math-flags to transform sqrt calls
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555

The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.

But this raises a bug noted by the new FIXME comment.

In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)

...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'. 
If any of those ops is strict, we should bail out.

Differential Revision: http://reviews.llvm.org/D15937

llvm-svn: 257400
2016-01-11 22:34:19 +00:00
Silviu Baranga 603954ef0e Revert r257164 - it has caused spec2k6 failures in LTO mode
llvm-svn: 257340
2016-01-11 16:19:38 +00:00
David Majnemer 7a99dd4d3f Add test for r257279.
llvm-svn: 257280
2016-01-10 07:13:33 +00:00
Chen Li 1689c2f54b [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary:
This is a fix of D13718. D13718 was committed but then reverted because of the following bug:
https://llvm.org/bugs/show_bug.cgi?id=25299

This patch fixes the issue shown in the bug.

Reviewers: majnemer, reames

Subscribers: jevinskie, llvm-commits

Differential Revision: http://reviews.llvm.org/D14308

llvm-svn: 257277
2016-01-10 05:48:01 +00:00
Manuel Jacob 734e73342d [RS4GC] Update and simplify handling of Constants in findBaseDefiningValueOfVector().
Summary:
This is analogous to r256079, which removed an overly strong assertion, and
r256812, which simplified the code by replacing three conditionals by one.

Reviewers: reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D16019

llvm-svn: 257250
2016-01-09 04:02:16 +00:00
Philip Reames 5715f576ea [rs4gc] Optionally directly relocated vector of pointers
This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.

The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.

Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.

At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.

Differential Revision: http://reviews.llvm.org/D15982

llvm-svn: 257244
2016-01-09 01:31:13 +00:00
Haicheng Wu a6a3279bd3 [JumpThreading] Split select that has constant conditions coming from the PHI node
Look for PHI/Select in the same BB of the form

bb:
  %p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
  %s = select p, trueval, falseval

And expand the select into a branch structure. This later enables
jump-threading over bb in this pass.

Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
select if the associated PHI has at least one constant.  If the unfolded
select is not jump-threaded, it will be folded again in the later
optimizations.

llvm-svn: 257198
2016-01-08 19:39:39 +00:00
Justin Bogner e9fb228d59 LoopInfo: Simplify ownership of Loop objects
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.

llvm-svn: 257191
2016-01-08 19:08:53 +00:00
Teresa Johnson a1080ee6f0 [ThinLTO] Delay metadata materializtion in function importer
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.

Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.

llvm-svn: 257171
2016-01-08 14:17:41 +00:00
Silviu Baranga 9e007efad2 Re-commit r257064, this time with a fixed assert
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.

Original commit message:

[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs

Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257164
2016-01-08 11:11:04 +00:00
Chandler Carruth 1926b70e37 [attrs] Split the late-revisit pattern for deducing norecurse in
a top-down manner into a true top-down or RPO pass over the call graph.

There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.

Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.

This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.

In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.

Differential Revision: http://reviews.llvm.org/D15785

llvm-svn: 257163
2016-01-08 10:55:52 +00:00
Sanjay Patel d72a458d28 [InstCombine] insert a new shuffle in a safe place (PR25999)
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

llvm-svn: 257133
2016-01-08 01:39:16 +00:00
Aditya Nandakumar f94c149f7f Instructions to be redone only if from the same BB
While adding instructions(possible roots) to be redone, make sure they
are from the same basic block.

llvm-svn: 257112
2016-01-07 23:22:55 +00:00
Keno Fischer ea33a25816 Temporarily revert r257105 "[Verifier] Check that debug values have proper size"
Looks like there's a case where clang generates debug info that triggers
the new verifier check. Reverting while investigating.

llvm-svn: 257107
2016-01-07 22:39:11 +00:00
Keno Fischer b3326be6ad [Verifier] Check that debug values have proper size
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.

One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.

Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
  it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
  variable as that would depend on the target, even though this test is
  supposed to be generic
- I had to manually declared size/align for reference type. See also the
  discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref

Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D14276

llvm-svn: 257105
2016-01-07 22:18:37 +00:00
David Majnemer f1a9c9e148 [SCCP] Don't violate the lattice invariants
We marked values which are 'undef' as constant instead of undefined
which violates SCCP's invariants.  If we can figure out that a
computation results in 'undef', leave it in the undefined state.

This fixes PR16052.

llvm-svn: 257102
2016-01-07 21:36:16 +00:00
David Majnemer 867bbc775f Add test for r256912
I forgot to add this with the rest of r256912.

llvm-svn: 257088
2016-01-07 19:27:16 +00:00
David Majnemer bae945735a [SCCP] Can't go from overdefined to constant
The fix for PR23999 made us mark loads of null as producing the constant
undef which upsets the lattice.  Instead, keep the load as "undefined".
This fixes PR26044.

llvm-svn: 257087
2016-01-07 19:25:39 +00:00
Silviu Baranga dd68d46ec1 Revert r257064. It caused failures in some sanitizer tests.
llvm-svn: 257069
2016-01-07 15:46:43 +00:00
Silviu Baranga 57b1b90996 [InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.

InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.

This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.

Reviewers: majnemer, jmolloy

Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits

Differential Revision: http://reviews.llvm.org/D15146

llvm-svn: 257064
2016-01-07 14:56:08 +00:00
Mehdi Amini 0535003bef Fix PR26051: Memcpy optimization should introduce a call to memcpy before the store destination position
This is a conservative fix, I expect Amaury to relax this.
Follow-up for r256923

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 256999
2016-01-06 23:50:22 +00:00
Chen Li 78bde83003 [SplitLandingPadPredecessors] Create a PHINode for the original landingpad only if it has some uses
Summary: This patch adds a check in SplitLandingPadPredecessors to see if the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block since it's gonna be a dead code anyway. The motivation for this patch is that we found a bug that SplitLandingPadPredecessors created a PHINode of token type landingpad, which failed the verifier since PHINode can not be token type. However, the created PHINode will never be used in our code pattern. This patch will workaround this bug, and we might add supports in SplitLandingPadPredecessors to handle token type landingpad with uses in the future.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15835

llvm-svn: 256972
2016-01-06 20:32:05 +00:00
Amaury Sechet 3235c08253 Promote aggregate store to memset when possible
Summary: As per title. This will allow the optimizer to pick up on it.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15923

llvm-svn: 256969
2016-01-06 19:47:24 +00:00
Sanjay Patel cddcd7256c [LibCallSimplifier] use instruction-level fast-math-flags for tan/atan transform
llvm-svn: 256964
2016-01-06 19:23:35 +00:00
Amaury Sechet d3b2c0fd94 Improve load/store to memcpy for aggregate
Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that alias the load, as long as no operation alias the store.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15903

llvm-svn: 256923
2016-01-06 09:30:39 +00:00
Philip Reames ae050a5703 [BasicAA] Remove special casing of memset_pattern16 in favor of generic attribute inference
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs.  The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.  

Differential Revision: http://reviews.llvm.org/D15879

llvm-svn: 256911
2016-01-06 04:53:16 +00:00
Manuel Jacob 3eedd11329 [Statepoints] Check for the "gc-leaf-function" attribute on call sites as well.
Reviewers: sanjoy, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15900

llvm-svn: 256875
2016-01-05 23:59:08 +00:00
Sanjay Patel 29095ea1b0 [LibCallSimplfier] use instruction-level fast-math-flags for fmin/fmax transforms
llvm-svn: 256871
2016-01-05 20:46:19 +00:00
Amaury Sechet a0c242cdfd Implement load to store => memcpy in MemCpyOpt for aggregates
Summary:
Most of the tool chain is able to optimize scalar and memcpy like operation effisciently while it isn't that good with aggregates. In order to improve the support of aggregate, we try to change aggregate manipulation into either scalar or memcpy like ones whenever possible without loosing informations.

This is one such opportunity.

Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15894

llvm-svn: 256868
2016-01-05 20:17:48 +00:00
Manuel Jacob 68b753a4fb Correct my last commit (revision 256860).
I forgot to save a small wording improvement before committing.

llvm-svn: 256862
2016-01-05 19:45:54 +00:00
Manuel Jacob b8060cd88a [PlaceSafepoints] Add a test.
Calls of functions with the "gc-leaf-function" attribute shouldn't be turned
into a safepoint.

llvm-svn: 256860
2016-01-05 19:40:58 +00:00
Sanjay Patel a1c5347982 [InstCombine] insert a new shuffle before its uses (PR26015)
Although this solves the test case in PR26015:
https://llvm.org/bugs/show_bug.cgi?id=26015

And may solve PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999

...I suspect this is not the best solution. I think we want to insert the new shuffle
just ahead of the earliest ExtractElementInst that we're replacing, but I don't know 
how that should be implemented.

Differential Revision: http://reviews.llvm.org/D15878

llvm-svn: 256857
2016-01-05 19:09:47 +00:00
David Majnemer 59eb733af1 [SimplifyCFG] Further improve our ability to remove redundant catchpads
In r256814, we managed to remove catchpads which were trivially redudant
because they were the same SSA value.  We can do better using the same
algorithm but with a smarter datastructure by hashing the SSA values
within the catchpad and comparing them structurally.

llvm-svn: 256815
2016-01-05 07:42:17 +00:00
David Majnemer 2fa8651a8f [SimplifyCFG] Remove redundant catchpads
Remove duplicate catchpad handlers from a catchswitch.

llvm-svn: 256814
2016-01-05 06:27:50 +00:00
Joseph Tremoulet 0d808888c1 [WinEH] Simplify unreachable catchpads
Summary:
At least for CoreCLR, a catchpad which immediately executes an
`unreachable` instruction indicates that the exception can never have a
matching type, and so such catchpads can be removed, and so can their
catchswitches if the catchswitch becomes empty.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15846

llvm-svn: 256809
2016-01-05 02:37:41 +00:00
Chen Li c6021038f6 [InstructionCombining] prepareICWorklistFromFunction halts in infinite loop with instructions of token type
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.

Reviewers: reames, majnemer

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15859

llvm-svn: 256792
2016-01-04 23:28:57 +00:00
David Majnemer b33f3a239a [LICM] Fix a small oversight introduced in r256763
r256763 had promoteLoopAccessesToScalars check for the existence of a
catchswitch when the exit blocks were populated but
promoteLoopAccessesToScalars may be called with a prepopulated set of
exit blocks which would also need to be checked.

This fixes PR26019.

llvm-svn: 256788
2016-01-04 23:16:22 +00:00
Philip Reames 2466719e44 [MemoryBuiltins] Remove isOperatorNewLike by consolidating non-null inference handling
This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have.

Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site.

Differential Revision: http://reviews.llvm.org/D15820

llvm-svn: 256787
2016-01-04 22:49:23 +00:00
Aditya Nandakumar 12d060481a Remove dead instructions before Redoing
Before reevaluating instructions, iterate over all instructions
to be reevaluated and remove trivially dead instructions and if
any of it's operands become trivially dead, mark it for deletion
until all trivially dead instructions have been removed

llvm-svn: 256773
2016-01-04 19:48:14 +00:00
David Majnemer 219055f9df [LICM] Don't insert instructions after a catchswitch when performing loop promotion
Inserting after a catchswitch results in verifier errors, bail out on
promotion if a catchswitch is a loop exit.

llvm-svn: 256763
2016-01-04 17:42:19 +00:00
David Majnemer 42a0730c42 [LICM] Make instruction sinking funclet-aware
We had two bugs here:
- We might try to sink into a catchswitch, causing verifier failures.
- We will succeed in sinking into a cleanuppad but we didn't update the
  funclet operand bundle.

This fixes PR26000.

llvm-svn: 256728
2016-01-04 03:37:39 +00:00
Dimitry Andric 227b928abc Fix several accidental DOS line endings in source files
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings.  Convert these to native line endings.

There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.

Reviewers: joerg, aaron.ballman

Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15848

llvm-svn: 256707
2016-01-03 17:22:03 +00:00
Sanjay Patel bee05caa6b [LibCallSimplifier] propagate FMF when shrinking binary calls
llvm-svn: 256682
2015-12-31 23:40:59 +00:00
Sanjay Patel aa23114cb4 [LibCallSimplifier] propagate FMF when shrinking unary calls
llvm-svn: 256679
2015-12-31 21:52:31 +00:00
Sanjay Patel f6f32bcaa4 change function names to avoid accidentally matching the substring
llvm-svn: 256678
2015-12-31 21:25:25 +00:00
Sanjay Patel 4e8b300400 add 'fast' attribute to calls to show that the flag isn't being propagated
llvm-svn: 256677
2015-12-31 21:12:19 +00:00
Geoff Berry 43dc285915 [JumpThreading] Fix opcode bonus in getJumpThreadDuplicationCost()
The code that was meant to adjust the duplication cost based on the
terminator opcode was not being executed in cases where the initial
threshold was hit inside the loop.

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D15536

llvm-svn: 256568
2015-12-29 18:10:16 +00:00
Manuel Jacob 9db5b93ffc [RS4GC] Fix rematerialization of bitcast of bitcast.
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint.  See the
test case for an example.

Reviewers: igor-laevsky, reames

Subscribers: reames, alex, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15789

llvm-svn: 256520
2015-12-28 20:14:05 +00:00
Chandler Carruth 3a040e6d47 [attrs] Extract the pure inference of function attributes into
a standalone pass.

There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.

In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.

The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.

Differential Revision: http://reviews.llvm.org/D15676

llvm-svn: 256466
2015-12-27 08:41:34 +00:00
Chandler Carruth f49f1a87ef [attrs] Split off the forced attributes utility into its own pass that
is (by default) run much earlier than FuncitonAttrs proper.

This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.

I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.

Differential Revision: http://reviews.llvm.org/D15668

llvm-svn: 256465
2015-12-27 08:13:45 +00:00
Benjamin Kramer 13dfb7df32 Fix safepoint intrinsic signatures in test.
Should bring back the bots after r256443.

llvm-svn: 256450
2015-12-26 11:40:48 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
Sanjay Patel ae945e7927 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases. 
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
David Majnemer 63ad9e0543 [OperandBundles] Have TailCallElim play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

This fixes PR25928.

llvm-svn: 256328
2015-12-23 09:58:43 +00:00
David Majnemer 02f4787e45 [OperandBundles] Have InstCombine play nice with operand bundles
Don't assume a call's use corresponds to an argument operand, it might
correspond to a bundle operand.

llvm-svn: 256327
2015-12-23 09:58:41 +00:00
David Majnemer 464be3724a [OperandBundles] Have DeadArgElim play nice with operand bundles
A call site's use of a Value might not correspond to an argument
operand but to a bundle operand.

llvm-svn: 256326
2015-12-23 09:58:36 +00:00
Manuel Jacob a4efd8ac2e [RS4GC] Fix base pair printing for constants.
Previously, "%" + name of the value was printed for each derived and base
pointer.  This is correct for instructions, but wrong for e.g. globals.

llvm-svn: 256305
2015-12-23 00:19:45 +00:00
Rafael Espindola 10d9a033db Also add unnamed_addr to functions.
llvm-svn: 256281
2015-12-22 20:43:30 +00:00
Rafael Espindola 5349d87a69 Delete dead GlobalAliases.
llvm-svn: 256276
2015-12-22 19:50:22 +00:00
Cong Hou e93b8e1539 [BPI] Replace weights by probabilities in BPI.
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.


Differential revision: http://reviews.llvm.org/D15519 

llvm-svn: 256263
2015-12-22 18:56:14 +00:00
Manuel Jacob 4e4f60ded0 Remove deprecated llvm.experimental.gc.result.{int,float,ptr} intrinsics.
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.

Reviewers: sanjoy, reames

Subscribers: sanjoy, chenli, llvm-commits

Differential Revision: http://reviews.llvm.org/D15719

llvm-svn: 256262
2015-12-22 18:44:45 +00:00
Manuel Jacob 990dfa6fe5 [RS4GC] Fix crash in the case that a live variable has a constant base.
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.

This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base.  This would be unnecessary
anyway because anything with a constant base won't move.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D15556

llvm-svn: 256252
2015-12-22 16:50:44 +00:00
Easwaran Raman bdb6f1dcc3 Determine callee's hotness and adjust threshold based on that. NFC.
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.

Differential Revision: http://reviews.llvm.org/D15245

llvm-svn: 256222
2015-12-22 00:32:35 +00:00
Evgeniy Stepanov 8827f2db85 [safestack] Add option for non-TLS unsafe stack pointer.
This patch adds an option, -safe-stack-no-tls, for using normal
storage instead of thread-local storage for the unsafe stack pointer.
This can be useful when SafeStack is applied to an operating system
kernel.

http://reviews.llvm.org/D15673

Patch by Michael LeMay.

llvm-svn: 256221
2015-12-22 00:13:11 +00:00
Evgeniy Stepanov fda72c52a2 [cfi] Fix LowerBitSets on 32-bit targets.
This code attempts to truncate IntPtrTy to i32, which may be the same
type.

llvm-svn: 256205
2015-12-21 22:14:04 +00:00
Sanjoy Das ab0626e35f Nonnull elements in OperandBundleCallSites are not all Instructions
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses).  This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard for this situation in eventuality
in `llvm::InlineFunction`.

llvm-svn: 256110
2015-12-19 22:40:28 +00:00
Sanjoy Das b496834f8e [Deopt bundles] Fix a test case
The `CHECK-NOT` line was incorrect, and would not have caught a
breakage.

llvm-svn: 256109
2015-12-19 22:40:22 +00:00
Manuel Jacob f4b3577d66 Remove double blanks. NFC.
llvm-svn: 256100
2015-12-19 18:26:53 +00:00
Philip Reames 5d54689bca [RS4GC] Remove an overly strong assertion
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation.  The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.

llvm-svn: 256079
2015-12-19 02:38:22 +00:00
Keno Fischer 00cbf9a69a Clean up the processing of dbg.value in various places
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.

Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.

Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.

Reviewers: aprantl

Subscribers: dsanders

Differential Revision: http://reviews.llvm.org/D14186

llvm-svn: 256077
2015-12-19 02:02:44 +00:00
Matt Arsenault 2aed6ca1d3 AMDGPU: Switch barrier intrinsics to using convergent
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.

llvm-svn: 256075
2015-12-19 01:46:41 +00:00
Jingyue Wu ba3ca76ed2 [NaryReassociate] allow candidate to have a different type
Summary:
If Candiadte may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.

Added a test in nary-gep.ll

Reviewers: tra, meheff

Subscribers: mcrosier, llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D15618

llvm-svn: 256035
2015-12-18 21:36:30 +00:00
Andrew Kaylor 123048d26a [WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop
Differential Revision: http://reviews.llvm.org/D15630

llvm-svn: 256005
2015-12-18 18:12:35 +00:00
Philip Reames d7a6cc859a [InstCombine] Extend peephole DSE to handle unordered atomics
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.

Key points:
 * We only remove unordered or simple stores.
 * The loads producing values consumed by dead stores don't influence whether the store is dead.

Differential Revision: http://reviews.llvm.org/D15354

llvm-svn: 255932
2015-12-17 22:19:27 +00:00
Philip Reames 15145fb7b1 [EarlyCSE] DSE of atomic unordered stores
The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible.

For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work.

Differential Revision: http://reviews.llvm.org/D15352

llvm-svn: 255914
2015-12-17 18:50:50 +00:00
Teresa Johnson e5a6191732 [ThinLTO] Metadata linking for imported functions
Summary:
Second patch split out from http://reviews.llvm.org/D14752.

Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.

This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.

Depends on D14825.

Reviewers: dexonsmith, joker.eph

Subscribers: davidxl, llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D14838

llvm-svn: 255909
2015-12-17 17:14:09 +00:00
Charlie Turner b69b92855d [NFC] Update horizontal reduction test cases.
These testcases no longer need to specify -slp-vectorize-hor, since it was
enabled by default in r252733.

llvm-svn: 255783
2015-12-16 17:22:24 +00:00
James Molloy 3d21dcf3ed [SimplifyCFG] Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

Now with a fix (and fixed tests) for the conformance issue seen in Chromium.

llvm-svn: 255767
2015-12-16 14:12:44 +00:00
Philip Reames ae1f265bf1 [EarlyCSE] DSE of stores which write back loaded values
Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.

I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)

Differential Revision: http://reviews.llvm.org/D15397

llvm-svn: 255739
2015-12-16 01:01:30 +00:00
Philip Reames 61a24ab6cc [IR] Add support for floating pointer atomic loads and stores
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.

Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.

This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.

Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.

As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.

Differential Revision: http://reviews.llvm.org/D15471

llvm-svn: 255737
2015-12-16 00:49:36 +00:00
Evgeniy Stepanov 67849d56c3 Cross-DSO control flow integrity (LLVM part).
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.

llvm-svn: 255693
2015-12-15 23:00:08 +00:00
Cong Hou a73ffa2206 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255691
2015-12-15 22:45:09 +00:00
Sanjay Patel 38a022623a [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.

The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826

but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818

Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)

As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. 
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.

A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.

Differential Revision: http://reviews.llvm.org/D15213

llvm-svn: 255660
2015-12-15 17:38:29 +00:00
Nicolai Hahnle 78fd4f087b AMDGPU: mark ldexp LibCalls as unavailable
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.

In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.

See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709

Reviewers: arsenm, tstellarAMD

Differential Revision: http://reviews.llvm.org/D14990

llvm-svn: 255658
2015-12-15 17:24:15 +00:00
Mehdi Amini 1c131b37ed Instcombine: destructor loads of structs that do not contains padding
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).

Also update extractvalue.ll and cast.ll so that they use structs with padding.

Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.

Patch by: Amaury SECHET <deadalnix@gmail.com>

Differential Revision: http://reviews.llvm.org/D14483

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
2015-12-15 01:44:07 +00:00
Xinliang David Li c7018a25c6 [PGO] make profile prefix even shorter and more readable
llvm-svn: 255586
2015-12-15 00:32:56 +00:00
Xinliang David Li 0812747979 [PGO] Shorten profile symbol prefixes
Profile symbols have long prefixes which waste space and creating pressure for linker.
This patch shortens the prefixes to minimal length without losing verbosity.

Differential Revision: http://reviews.llvm.org/D15503

llvm-svn: 255575
2015-12-14 23:26:27 +00:00
Reid Kleckner db9a91e324 Revert "Don't create unnecessary PHIs"
This reverts commit r255489.

It causes test failures in Chromium and does not appear to respect the
AlternativeV parameter.

llvm-svn: 255562
2015-12-14 22:36:57 +00:00
Sanjay Patel fa54acedd1 add fast-math-flags to 'call' instructions (PR21290)
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add 
support to clang, and extend FMF to the DAG for calls.

Motivating example:

%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)

We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:

%z = tail call fast float @sqrtf(float %y)

because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.

The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368

and added FMF to fcmp:
http://reviews.llvm.org/rL241901

Differential Revision: http://reviews.llvm.org/D14707

llvm-svn: 255555
2015-12-14 21:59:03 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
Sanjay Patel f727e387be [InstCombine] fold trunc ([lshr] (bitcast vector) ) --> extractelement (PR25543)
This is a fix for PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The idea is to take the existing fold of:
bitcast ( trunc ( lshr ( bitcast X))) --> extractelement (bitcast X)
( http://reviews.llvm.org/rL112232 )

And break it into less specific transforms so we'll catch more cases such as
the example in the bug report:
bitcast ( trunc ( lshr ( bitcast X))) -->
bitcast ( extractelement (bitcast X)) -->
extractelement (bitcast X)

Enabling patches for this change:
http://reviews.llvm.org/rL255399 (combine bitcasts)
http://reviews.llvm.org/rL255433 (canonicalize extractelement(bitcast X))

Differential Revision: http://reviews.llvm.org/D15392

llvm-svn: 255504
2015-12-14 16:16:54 +00:00
James Molloy 2b1e101e99 Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

llvm-svn: 255489
2015-12-14 10:57:01 +00:00
Cong Hou ccec6e4d84 Revert r255460, which still causes test failures on some platforms.
Further investigation on the failures is ongoing.

llvm-svn: 255463
2015-12-13 17:15:38 +00:00
Cong Hou e6a210f50b [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)

LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255460
2015-12-13 16:55:46 +00:00
Cong Hou 7c369156eb Revert r255454 as it leads to several test failers on buildbots.
llvm-svn: 255456
2015-12-13 09:28:57 +00:00
Cong Hou 7f8b43d424 [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.


Differential revision: http://reviews.llvm.org/D15177

llvm-svn: 255454
2015-12-13 08:44:08 +00:00
Xinliang David Li d1bab96045 [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This is the second try. The fix in this patch is very localized.
Only profile symbol names of profile symbols with internal 
linkage are fixed up while initializer of name syms are not 
changes. This means there is no format change nor version bump.

llvm-svn: 255434
2015-12-12 17:28:03 +00:00
Sanjay Patel 1d49fc9b27 [InstCombine] canonicalize (bitcast (extractelement X)) --> (extractelement(bitcast X))
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261

...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232

while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.

Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before 
proceeding with the transform.

llvm-svn: 255433
2015-12-12 16:44:48 +00:00
David Majnemer 550654aaf1 Move catchpad-phi-cast.ll to the X86 specific subdirectory
It is X86 specific and will not be properly exercised unless LLVM is
built with the X86 target.

llvm-svn: 255426
2015-12-12 06:21:08 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Sanjay Patel 93f55dd36d [InstCombine] allow any pair of bitcasts to be combined
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.

It should be safe to convert any pair of bitcasts into a single bitcast, 
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm 
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases. 

Some day we'll get to remove that MMX wart from LLVM IR completely?

Differential Revision: http://reviews.llvm.org/D15468

llvm-svn: 255399
2015-12-12 00:33:36 +00:00
Sanjay Patel ffde9e14a2 use FileCheck for better checking
llvm-svn: 255394
2015-12-12 00:01:10 +00:00
Sanjay Patel d497ad43da Add tests for bitcast-bitcast sequences for all scalar/vector permutations
As noted in http://reviews.llvm.org/D15392 , we should be able to improve this.

llvm-svn: 255370
2015-12-11 20:26:30 +00:00
Xinliang David Li a86545b0b5 [PGO] Revert r255365: solution incomplete, not handling lambda yet
llvm-svn: 255369
2015-12-11 20:23:22 +00:00
Xinliang David Li c79283ef29 [PGO] Stop using invalid char in instr variable names.
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).

This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is 
preserved with this change.

Differential Revision: http://reviews.llvm.org/D15243

llvm-svn: 255365
2015-12-11 19:53:19 +00:00
Chad Rosier d7634fc91d Revert r255247, r255265, and r255286 due to serious compile-time regressions.
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."

llvm-svn: 255354
2015-12-11 18:39:41 +00:00
James Molloy 1bb6ea5e2d [Mem2Reg] Respect optnone
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.

llvm-svn: 255336
2015-12-11 13:36:59 +00:00
James Molloy 37b82e79b2 [InstCombine] Make MatchBSwap also match bit reversals
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.

llvm-svn: 255334
2015-12-11 10:04:51 +00:00
JF Bastien 82bf85ffed EarlyCSE: add tests
Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15371

llvm-svn: 255295
2015-12-10 20:24:34 +00:00
Chad Rosier 843c7b4309 [DSE] Disable non-local DSE to see if the bots go green.
I see a few bots timing out, so I'm speculatively disabling r255247.

llvm-svn: 255286
2015-12-10 19:23:02 +00:00
Rong Xu 2611ff8a27 [PGO] Use %t as the temporary profdata filename in the test cases.
Using %t rather %T/<specific_name> as the temporary profdata filename.

llvm-svn: 255271
2015-12-10 18:24:44 +00:00
Sanjay Patel c83fd9554a [InstCombine] fold bitcasts around an extractelement (3rd try)
This is a redo of r255137 (reverted at r255227) which was a redo of 
r255124 (reverted at r255126) with a fixed check for a scalar source 
type and an added test for the failure that caused the revert.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255261
2015-12-10 17:09:28 +00:00
Chad Rosier 533bc3fcac [DeadStoreElimination] Add support for non-local DSE.
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction.  That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.

http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!

llvm-svn: 255247
2015-12-10 13:51:43 +00:00
Akira Hatanaka a3c0e8e1ba Revert r255137.
This commit broke apple's internal bot.

llvm-svn: 255227
2015-12-10 08:00:52 +00:00
Rong Xu 7dd9b1ea75 [PGO] Rename the profdata filename to avoid the conflict b/w tests.
Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generates the same
profdata filename which can conflict in current test runs. This patch
renames them to have different names. 

llvm-svn: 255158
2015-12-09 21:27:59 +00:00
Justin Bogner b7389d6714 IR: Make ConstantDataArray::getFP actually return a ConstantDataArray
The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>)
overload has had a typo in it since it was written, where it will
create a Vector instead of an Array. This obviously doesn't work at
all, but it turns out that until r254991 there weren't actually any
callers of this overload. Fix the typo and add some test coverage.

llvm-svn: 255157
2015-12-09 21:21:07 +00:00
Reid Kleckner 54ade23504 [Float2Int] Don't operate on vector instructions
This fixes a crash bug. It's also not clear if we'd want to do this
transform for vectors.

llvm-svn: 255155
2015-12-09 21:08:18 +00:00
Sanjoy Das 9abfb0b429 Use WeakVH to keep track of calls with operand bundles in CloneCodeInfo
`CloneAndPruneIntoFromInst` can DCE instructions after cloning them into
the new function, and so an AssertingVH is too strong.  This change
switches CloneCodeInfo to use a std::vector<WeakVH>.

llvm-svn: 255148
2015-12-09 20:33:52 +00:00
Sanjay Patel b67e6b6044 [InstCombine] fold bitcasts around an extractelement (2nd try)
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.

Original commit message:

Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255137
2015-12-09 18:57:16 +00:00
Rong Xu f430ae40cf [PGO] Resubmit "MST based PGO instrumentation infrastructure" (r254021)
This new patch fixes a few bugs that exposed in last submit. It also improves
the test cases.
--Original Commit Message--
This patch implements a minimum spanning tree (MST) based instrumentation for
PGO. The use of MST guarantees minimum number of CFG edges getting
instrumented. An addition optimization is to instrument the less executed
edges to further reduce the instrumentation overhead. The patch contains both the
instrumentation and the use of the profile to set the branch weights.

Differential Revision: http://reviews.llvm.org/D12781

llvm-svn: 255132
2015-12-09 18:08:16 +00:00
Mehdi Amini 4e2b7c454c Revert "[InstCombine] fold bitcasts around an extractelement"
This reverts commit r255124.

Broke http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/4193/steps/test/logs/stdio

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255126
2015-12-09 16:31:39 +00:00
Sanjay Patel 07410ed234 [InstCombine] fold bitcasts around an extractelement
Example:
  bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
    --->
  extractelement <2 x float> %X, i32 1

This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543

The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)

Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232

with 2 less specific transforms to catch the case in the bug report.

Differential Revision: http://reviews.llvm.org/D14879

llvm-svn: 255124
2015-12-09 16:17:20 +00:00
Mehdi Amini b000bbdec2 Change hasUniqueInitializer() to call isStrongDefinitionForLinker() instead of !isWeakForLinker()
Summary:
Available_externally global variable with initializer were considered "hasInitializer()",
while obviously it can't match the description:

    Whether the global variable has an initializer, and any changes made to the
    initializer will turn up in the final executable.

since modifying the initializer of an externally available variable does not make sense.

Reviewers: pcc, rafael

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15351

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255123
2015-12-09 16:17:07 +00:00
Sanjoy Das b8dced5dfa Don't drop attributes when inlining through "deopt" operand bundles
Test case attached (test case also checks that we don't drop the calling
convention, but that functionality was correct before this patch).

llvm-svn: 255088
2015-12-09 01:01:28 +00:00
Sanjoy Das 48945cdc15 [OperandBundles] Have PruneEH work correct with operand bundles.
For an invoke with operand bundles, the [op_begin(), op_end()-3] range
can contain things other than invoke arguments.  This change teaches
PruneEH to use arg_begin() and arg_end() explicitly.

llvm-svn: 255073
2015-12-08 23:16:52 +00:00
Reid Kleckner 8de1fe23ed [CGP] Reimplement r255055 a different way
llvm-svn: 255070
2015-12-08 23:00:03 +00:00
Reid Kleckner e18f92bfe9 Revert "[CGP] Check that we have an insert point before moving llvm.dbg.value around"
This reverts commit r255055.

Breakage has been reported.

llvm-svn: 255063
2015-12-08 22:33:23 +00:00
Sanjoy Das 8a954a0553 [OperandBundles] Fix a transform in simplifycfg
Reviewers: pcc, majnemer, reames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D15345

llvm-svn: 255062
2015-12-08 22:26:08 +00:00
Reid Kleckner 7c005324d5 [CGP] Check that we have an insert point before moving llvm.dbg.value around
llvm-svn: 255055
2015-12-08 21:50:52 +00:00
Philip Reames 8fc2cbf933 [EarlyCSE] Value forwarding for unordered atomics
This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future.

The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic.

No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch.

Differential Revision: http://reviews.llvm.org/D15337

llvm-svn: 255054
2015-12-08 21:45:41 +00:00
Mehdi Amini bddfbeaf59 Revert "Add Available Externally linkage type to isWeakForLinker()"
This reverts r255043, as per post-review concern were raised on the correctness.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255045
2015-12-08 19:13:31 +00:00
Mehdi Amini 69e3ae8d4b Cleanup test: remove useless alignment
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255044
2015-12-08 19:02:55 +00:00
Mehdi Amini 37c25fa1d1 Add Available Externally linkage type to isWeakForLinker()
Per LangRef: "Globals with available_externally linkage are
allowed to be discarded at will, and are otherwise the same
as linkonce_odr", since linkonce_odr is in this list it makes
sense to have available_externally there as well.

Reviewers: rafael

Differential Revision: http://reviews.llvm.org/D15323

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255043
2015-12-08 19:01:29 +00:00
Sanjoy Das 683bf070ef [IndVars] Have getInsertPointForUses preserve LCSSA
Summary:
Also add a stricter post-condition for IndVarSimplify.

Fixes PR25578.  Test case by Michael Zolotukhin.

Reviewers: hfinkel, atrick, mzolotukhin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15059

llvm-svn: 254977
2015-12-08 00:13:21 +00:00
Sanjoy Das b771eb6d69 [SCEVExpander] Have hoistIVInc preserve LCSSA
Summary:
(Note: the problematic invocation of hoistIVInc that caused PR24804 came
from IndVarSimplify, not from SCEVExpander itself)

Fixes PR24804.  Test case by David Majnemer.

Reviewers: hfinkel, majnemer, atrick, mzolotukhin

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15058

llvm-svn: 254976
2015-12-08 00:13:17 +00:00
Sanjoy Das 9fe86d90ab [InstCombine] Call getCmpPredicateForMinMax only with a valid SPF
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.

Fixes PR25745.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15249

llvm-svn: 254869
2015-12-05 23:44:22 +00:00
Weiming Zhao 8213072a45 [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant
Summary:
    In order to avoid calling pow function we generate repeated fmul when n is a
    positive or negative whole number.
    
    For each exponent we pre-compute Addition Chains in order to minimize the no.
    of fmuls.
    Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
    
    We pre-compute addition chains for exponents upto 32 (which results in a max of
    7 fmuls).

    For eg:
    4 = 2+2
    5 = 2+3
    6 = 3+3 and so on
    
    Hence,
    pow(x, 4.0) ==> y = fmul x, x
                    x = fmul y, y
                    ret x

    For negative exponents, we simply compute the reciprocal of the final result.
    
    Note: This transformation is only enabled under fast-math.
    
    Patch by Mandeep Singh Grang <mgrang@codeaurora.org>

Reviewers: weimingz, majnemer, escha, davide, scanon, joerg

Subscribers: probinson, escha, llvm-commits

Differential Revision: http://reviews.llvm.org/D13994

llvm-svn: 254776
2015-12-04 22:00:47 +00:00
David Majnemer f6665f65b7 [Analysis] Become aware of MSVC's new/delete functions
The compiler can take advantage of the allocation/deallocation
function's properties.  We knew how to do this for Itanium but had no
support for MSVC-style functions.

llvm-svn: 254656
2015-12-03 22:45:19 +00:00
Andrew Kaylor 92b3b16ba3 Move branch folding test to a better location.
llvm-svn: 254640
2015-12-03 19:41:25 +00:00
Andrew Kaylor 412eabdeb2 Fix buildbot failures
llvm-svn: 254636
2015-12-03 19:30:38 +00:00
Andrew Kaylor 9efb2332e2 [WinEH] Avoid infinite loop in BranchFolding for multiple single block funclets
Differential Revision: http://reviews.llvm.org/D14996

llvm-svn: 254629
2015-12-03 18:55:28 +00:00
Rafael Espindola 8c04472edf Delete what is now duplicated code.
Having to import an alias as declaration is not thinlto specific.

The test difference are because when we already have a decl and we are
not importing it, we just leave the decl alone.

llvm-svn: 254556
2015-12-02 22:22:24 +00:00
David Majnemer 942003acc6 Do (A == C1 || A == C2) -> (A & ~(C1 ^ C2)) == C1 rather than (A == C1 || A == C2) -> (A | (C1 ^ C2)) == C2 when C1 ^ C2 is a power of 2.
Differential Revision: http://reviews.llvm.org/D14223

Patch by Amaury SECHET!

llvm-svn: 254518
2015-12-02 16:15:07 +00:00
Evgeniy Stepanov 42f3b12274 [safestack] Protect byval function arguments.
Detect unsafe byval function arguments and move them to the unsafe
stack.

llvm-svn: 254353
2015-12-01 00:40:05 +00:00
Evgeniy Stepanov a4ac3f4bdf [safestack] Fix handling of array allocas.
The current code does not take alloca array size into account and,
as a result, considers any access past the first array element to be
unsafe.

llvm-svn: 254350
2015-12-01 00:06:13 +00:00
Sanjay Patel 8b1fb3daba [InstCombine] add tests to show potential vector IR shuffle transforms
llvm-svn: 254342
2015-11-30 22:39:36 +00:00
Davide Italiano 9c26161b2e [SimplifyLibCalls] Remove useless bits of this tests.
llvm-svn: 254318
2015-11-30 19:38:35 +00:00
Davide Italiano 1aeed6a955 [SimplifyLibCalls] Transform log(exp2(y)) to y*log(2) under fast-math.
llvm-svn: 254317
2015-11-30 19:36:35 +00:00
Davide Italiano 0b14f29285 [SimplifyLibCalls] Don't crash if the function doesn't have a name.
llvm-svn: 254265
2015-11-29 21:58:56 +00:00
Davide Italiano b8b7133c94 [SimplifyLibCalls] Tranform log(pow(x, y)) -> y*log(x).
This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:

x = -1, y = -4
log(pow(-1), 4) = 0
4*log(-1) = NaN

I checked what GCC does and apparently they do the same optimization
(which result in the dramatic difference). Future work might try to
make this (slightly) less worse.

Differential Revision:	http://reviews.llvm.org/D14400

llvm-svn: 254263
2015-11-29 20:58:04 +00:00
Diego Novillo 84f06cc835 SamplePGO - Add initial support for inliner annotations.
This adds two thresholds to the sample profiler to affect inlining
decisions: the concept of global hotness and coldness.

Functions that have accumulated more than a certain fraction of samples at
runtime, are annotated with the InlineHint attribute. Conversely,
functions that accumulate less than a certain fraction of samples, are
annotated with the Cold attribute.

This is very similar to the hints emitted by Clang when using
instrumentation profiles.

Notice that this is a very blunt instrument. A function may have
globally collected a significant fraction of samples, but that does not
necessarily mean that every callsite for that function is hot.

Ideally, we would annotate each callsite with the samples collected at
that callsite. This way, the inliner can incorporate all these weights
into its cost model.

Once the inliner offers this functionality, we can change the hints
emitted here to a more precise per-callsite annotation. For now, this is
providing some measure of speedups with our internal benchmarks. I've
observed speedups of up to 23% (though the geo mean is about 3%). I expect
these numbers to improve as the inliner gets better annotations.

llvm-svn: 254212
2015-11-27 23:14:51 +00:00
Charlie Turner 54336a5a4e [LoopVectorize] Use MapVector rather than DenseMap for MinBWs.
The order in which instructions are truncated in truncateToMinimalBitwidths
effects code generation. Switch to a map with a determinisic order, since the
iteration order over a DenseMap is not defined.

This code is not hot, so the difference in container performance isn't
interesting.

Many thanks to David Blaikie for making me aware of MapVector!

Fixes PR25490.

Differential Revision: http://reviews.llvm.org/D14981

llvm-svn: 254179
2015-11-26 20:39:51 +00:00
Rafael Espindola 8934577171 Disallow aliases to available_externally.
They are as much trouble as aliases to declarations. They are requiring
the code generator to define a symbol with the same value as another
symbol, but the second symbol is undefined.

If representing this is important for some optimization, we could add
support for available_externally aliases. They would be *required* to
point to a declaration (or available_externally definition).

llvm-svn: 254170
2015-11-26 19:22:59 +00:00
Benjamin Kramer fb419e71f4 [SimplifyLibCalls] Don't depend on a called function having a name, it might be an indirect call.
Fixes the crasher in PR25651 and related crashers using the same pattern.

llvm-svn: 254145
2015-11-26 09:51:17 +00:00
Evgeniy Stepanov 9842d61ca4 [safestack] Fix alignment of dynamic allocas.
Fixes PR25588.

llvm-svn: 254109
2015-11-25 22:52:30 +00:00
Sanjoy Das 7629346193 [InstCombine] Don't drop operand bundles
Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14857

llvm-svn: 254046
2015-11-25 00:42:19 +00:00
Rong Xu 55fa418a90 Revert r254021
llvm-svn: 254042
2015-11-24 23:57:51 +00:00
Rong Xu 25c106b347 [PGO] Revert revision r254021,r254028,r254035
Revert the above revision due to multiple issues.

llvm-svn: 254040
2015-11-24 23:49:08 +00:00
Teresa Johnson 3930361969 [ThinLTO] Add option to limit importing based on instruction count
Add a simple initial heuristic to control importing based on the number
of instructions recorded in the function's summary. Add option to
control the limit, and test using option.

llvm-svn: 254036
2015-11-24 22:55:46 +00:00
Rong Xu 88cb57aba9 [PGO] Relax test cases in PGO instrumentation
Fix buildbot failure for clang-x86_64-linux-selfhost-modules.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/8866
The failing test cases are newly added from r254021. It seems the IR has a
different order in this platform. In this patch, I temporarily relax the test
case to make the build green. I'll have a complete fix (more robust way to test)
soon.

llvm-svn: 254035
2015-11-24 22:50:34 +00:00
Diego Novillo 0b6985a3c6 SamplePGO - Add test for hot/cold inlined functions.
When the original binary is executed and sampled, the resulting profile
contains information on the original inline stack. We currently follow
the original inline plan if we notice that the inlined callsite has more
than 0 samples to it.

A better way is to determine whether the callsite is actually worth
inlining. If the callsite accumulates a small fraction of the samples
spent in the parent function, then we don't want to bother inlining it
(as it means that the callsite is actually cold).

This patch introduces a threshold expressed in percentage of samples
in relation to the parent function.  If the callsite uses less than N%
of the total samples used by its parent, the original inline decision is
not re-applied.

I've set the threshold to the very arbitrary value of 5%. I'm yet to do
any actual experiments to see what's a good value. I wanted to separate
the basic mechanism from the tuning.

llvm-svn: 254034
2015-11-24 22:38:37 +00:00