This is similar to the bug/fix:
https://llvm.org/bugs/show_bug.cgi?id=26211http://reviews.llvm.org/rL258325
The fmin() test case reveals another bug caused by sloppy
code duplication. It will crash without this patch because
fp128 is a valid floating-point type, but we would think
that we had matched a function that used doubles.
The new helper function can be used to replace similar
checks that are used in several other places in this file.
llvm-svn: 258428
This commit extends the patterns recognised by InstSimplify to also handle (x >> y) <= x in the same way as (x /u y) <= x.
The missing optimisation was found investigating why LLVM did not optimise away bound checks in a binary search: https://github.com/rust-lang/rust/pull/30917
Patch by Andrea Canciani!
Differential Revision: http://reviews.llvm.org/D16402
llvm-svn: 258422
This patch adds the instrumentation for indirect call value profiling. It finds all the indirect call-sites and generates instrprof_value_profile intrinsic calls. A new opt level option -disable-vp is introduced to disable this instrumentation.
Reviewers: davidxl, betulb, vsk
Differential Revision: http://reviews.llvm.org/D16016
llvm-svn: 258417
Do not emit profile arc files and note files for module and skeleton
CU's.
Our users report seeing unexpected *.gcda and *.gcno files in their
projects when using gcov-style profiling with modules or frameworks.
The unwanted files come from these modules. This is not very helpful
for end-users. Further, we've seen reports of instrumented programs
crashing while writing these files out (due to I/O failures).
rdar://problem/22838296
Reviewed-by: aprantl
Differential Revision: http://reviews.llvm.org/D15997
llvm-svn: 258406
This change attempts to produce vectorized integer expressions in bit widths
that are narrower than their scalar counterparts. The need for demotion arises
especially on architectures in which the small integer types (e.g., i8 and i16)
are not legal for scalar operations but can still be used in vectors. Like
similar work done within the loop vectorizer, we rely on InstCombine to perform
the actual type-shrinking. We use the DemandedBits analysis and
ComputeNumSignBits from ValueTracking to determine the minimum required bit
width of an expression.
Differential revision: http://reviews.llvm.org/D15815
llvm-svn: 258404
Summary:
This adds a new kind of operand bundle to LLVM denoted by the
`"gc-transition"` tag. Inputs to `"gc-transition"` operand bundle are
lowered into the "transition args" section of `gc.statepoint` by
`RewriteStatepointsForGC`.
This removes the last bit of functionality that was unsupported in the
deopt bundle based code path in `RewriteStatepointsForGC`.
Reviewers: pgavlin, JosephTremoulet, reames
Subscribers: sanjoy, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16342
llvm-svn: 258338
The test case will crash without this patch because the subsequent call to
hasUnsafeAlgebra() assumes that the call instruction is an FPMathOperator
(ie, returns an FP type).
This part of the function signature check was omitted for the sqrt() case,
but seems to be in place for all other transforms.
Before:
http://reviews.llvm.org/rL257400
...we would have needlessly continued execution in optimizeSqrt(), but the
bug was harmless because we'd eventually fail some other check and return
without damage.
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=26211
Differential Revision: http://reviews.llvm.org/D16198
llvm-svn: 258325
Summary:
Funclet EH tables require that a given funclet have only one unwind
destination for exceptional exits. The verifier will therefore reject
e.g. two cleanuprets with different unwind dests for the same cleanup, or
two invokes exiting the same funclet but to different unwind dests.
Because catchswitch has no 'nounwind' variant, and because IR producers
are not *required* to annotate calls which will not unwind as 'nounwind',
it is legal to nest a call or an "unwind to caller" catchswitch within a
funclet pad that has an unwind destination other than caller; it is
undefined behavior for such a call or catchswitch to unwind.
Normally when inlining an invoke, calls in the inlined sequence are
rewritten to invokes that unwind to the callsite invoke's unwind
destination, and "unwind to caller" catchswitches in the inlined sequence
are rewritten to unwind to the callsite invoke's unwind destination.
However, if such a call or "unwind to caller" catchswitch is located in a
callee funclet that has another exceptional exit with an unwind
destination within the callee, applying the normal transformation would
give that callee funclet multiple unwind destinations for its exceptional
exits. There would be no way for EH table generation to determine which
is the "true" exit, and the verifier would reject the function
accordingly.
Add logic to the inliner to detect these cases and leave such calls and
"unwind to caller" catchswitches as calls and "unwind to caller"
catchswitches in the inlined sequence.
This fixes PR26147.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: alexcrichton, llvm-commits
Differential Revision: http://reviews.llvm.org/D16319
llvm-svn: 258273
In some cases, the max backedge taken count can be more conservative
than the exact backedge taken count (for instance, because
ScalarEvolution::getRange is not control-flow sensitive whereas
computeExitLimitFromICmp can be). In these cases,
computeExitLimitFromCond (specifically the bit that deals with `and` and
`or` instructions) can create an ExitLimit instance with a
`SCEVCouldNotCompute` max backedge count expression, but a computable
exact backedge count expression. This violates an implicit SCEV
assumption: a computable exact BE count should imply a computable max BE
count.
This change
- Makes the above implicit invariant explicit by adding an assert to
ExitLimit's constructor
- Changes `computeExitLimitFromCond` to be more robust around
conservative max backedge counts
llvm-svn: 258184
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555
As with D15937, the intent of the patch is to preserve the current behavior of the transform
except that we use the pow call's 'fast' attribute as a trigger rather than a function-level
attribute.
The TODO comment notes a potential follow-on patch that would propagate FMF to the new
instructions.
Differential Revision: http://reviews.llvm.org/D16122
llvm-svn: 258153
The fix uniques the bundle of getelementptr indices we are about to vectorize
since it's possible for the same index to be used by multiple instructions.
The original commit message is below.
[SLP] Vectorize the index computations of getelementptr instructions.
This patch seeds the SLP vectorizer with getelementptr indices. The primary
motivation in doing so is to vectorize gather-like idioms beginning with
consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these
cases could be vectorized with a top-down phase, seeding the existing bottom-up
phase with the index computations avoids the complexity, compile-time, and
phase ordering issues associated with a full top-down pass. Only bundles of
single-index getelementptrs with non-constant differences are considered for
vectorization.
llvm-svn: 257918
This contains a fix for the issue that caused the revert:
we no longer assume that we can insert instructions after the
instruction that produces the base pointer. We previously
assumed that this would be ok, because the instruction produces
a value and therefore is not a terminator. This is false for invoke
instructions. We will now insert these new instruction directly
at the location of the users.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257897
There are several requirements that ended up with this design;
1. Matching bitreversals is too heavyweight for InstCombine and doesn't really need to be done so early.
2. Bitreversals and byteswaps are very related in their matching logic.
3. We want to implement support for matching more advanced bswap/bitreverse patterns like partial bswaps/bitreverses.
4. Bswaps are best matched early in InstCombine.
The result of these is that a new utility function is created in Transforms/Utils/Local.h that can be configured to search for bswaps, bitreverses or both. InstCombine uses it to find only bswaps, CGP uses it to find only bitreversals.
We can then extend the matching logic in one place only.
llvm-svn: 257875
I originally reapplied this in 257550, but had to revert again due to bot
breakage. The only change in this version is to allow either the TypeSize
or the TypeAllocSize of the variable to be the one represented in debug info
(hopefully in the future we can figure out how to encode the difference).
Additionally, several bot failures following r257550, were due to
optimizer bugs now fixed in r257787 and r257795.
r257550 commit message was:
```
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
``
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
``
```
llvm-svn: 257850
This patch seeds the SLP vectorizer with getelementptr indices. The primary
motivation in doing so is to vectorize gather-like idioms beginning with
consecutive loads (e.g., g[a[0] - b[0]] + g[a[1] - b[1]] + ...). While these
cases could be vectorized with a top-down phase, seeding the existing bottom-up
phase with the index computations avoids the complexity, compile-time, and
phase ordering issues associated with a full top-down pass. Only bundles of
single-index getelementptrs with non-constant differences are considered for
vectorization.
Differential Revision: http://reviews.llvm.org/D14829
llvm-svn: 257800
Summary: If SROA creates only one piece (e.g. because the other is not needed),
it still needs to create a bit_piece expression if that bit piece is smaller
than the original size of the alloca.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16187
llvm-svn: 257795
Summary: The dbg.declare -> dbg.value conversion did not check which operand of
the store instruction the alloca was passed to. As a result code that stored the
address of an alloca, rather than storing to the alloca, would still trigger
the conversion routine, leading to the insertion of an incorrect dbg.value
intrinsic.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16169
llvm-svn: 257787
This patch turns off the fast-math optimization attribute on the caller
if the callee's fast-math attribute is not turned on.
For example,
- before inlining
caller: "less-precise-fpmad"="true"
callee: "less-precise-fpmad"="false"
- after inlining
caller: "less-precise-fpmad"="false"
Alternatively, it's possible to block inlining if the caller's and
callee's attributes don't match. If this approach is preferable to the
one in this patch, we can discuss post-commit.
rdar://problem/19836465
Differential Revision: http://reviews.llvm.org/D7802
llvm-svn: 257575
The follow extra changes were made to test cases:
Manually making the variable be the actual type instead of a pointer
to avoid pointer-size differences in generic code:
LLVM :: DebugInfo/Generic/2010-03-24-MemberFn.ll
LLVM :: DebugInfo/Generic/2010-04-06-NestedFnDbgInfo.ll
LLVM :: DebugInfo/Generic/2010-05-03-DisableFramePtr.ll
LLVM :: DebugInfo/Generic/varargs.ll
Delete sizing information from debug info for the same reason
(but the presence of the pointer was important to the test case):
LLVM :: DebugInfo/Generic/restrict.ll
LLVM :: DebugInfo/Generic/tu-composite.ll
LLVM :: Linker/type-unique-type-array-a.ll
LLVM :: Linker/type-unique-simple2.ll
Fixing an incorrect DW_OP_deref
LLVM :: DebugInfo/Generic/2010-05-03-OriginDIE.ll
Fixing a missing DW_OP_deref
LLVM :: DebugInfo/Generic/incorrect-variable-debugloc.ll
Additionally, clang should no longer complain during bootstrap should no
longer happen after r257534.
The original commit message was:
```
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
```
llvm-svn: 257550
Summary: The dbg.declare -> dbg.value conversion looks through any zext/sext
to find a value to describe the variable (in the expectation that those
zext/sext instruction will go away later). However, those values do not
cover the entire variable and thus need a DW_OP_bit_piece.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D16061
llvm-svn: 257534
The findExternalCalls routine ignores calls to functions already
defined in the dest module. This was not handling the case where
the definition in the current module is actually an alias to a
function call.
llvm-svn: 257493
Prepatory patch before changing LibCallSimplifier to use the FMF.
Also, tighten the CHECK lines and give the tests more meaningful names.
Similar changes to:
http://reviews.llvm.org/rL257414
llvm-svn: 257481
Function::copyAttributesFrom will copy the personality function, prefix
data and prolog data from the source function to the new function, and
is invoked when the IRMover copies the function prototype. This puts a
reference to a constant in the source module on a function in the dest
module, which causes an error when deleting the source module after
importing, since the personality function in the source module still has
uses (this would presumably also be an issue for the prologue and prefix
data). Remove the copies added to the dest copy when creating the new
prototype, as they are mapped properly when/if we link the function body.
llvm-svn: 257420
Currently we're unrolling loops more in minsize than in optsize, which
means -Oz will have a larger code size than -Os. That doesn't make any
sense.
This resolves the FIXME about this in LoopUnrollPass and extends the
optsize test to make sure we use the smaller threshold for minsize as
well.
llvm-svn: 257402
This is a continuation of adding FMF to call instructions:
http://reviews.llvm.org/rL255555
The intent of the patch is to preserve the current behavior of the transform except
that we use the sqrt instruction's 'fast' attribute as a trigger rather than the
function-level attribute.
But this raises a bug noted by the new FIXME comment.
In order to do this transform:
sqrt((x * x) * y) ---> fabs(x) * sqrt(y)
...we need all of the sqrt, the first fmul, and the second fmul to be 'fast'.
If any of those ops is strict, we should bail out.
Differential Revision: http://reviews.llvm.org/D15937
llvm-svn: 257400
Summary:
This is a fix of D13718. D13718 was committed but then reverted because of the following bug:
https://llvm.org/bugs/show_bug.cgi?id=25299
This patch fixes the issue shown in the bug.
Reviewers: majnemer, reames
Subscribers: jevinskie, llvm-commits
Differential Revision: http://reviews.llvm.org/D14308
llvm-svn: 257277
Summary:
This is analogous to r256079, which removed an overly strong assertion, and
r256812, which simplified the code by replacing three conditionals by one.
Reviewers: reames
Subscribers: sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D16019
llvm-svn: 257250
This patch teaches rewrite-statepoints-for-gc to relocate vector-of-pointers directly rather than trying to split them. This builds on the recent lowering/IR changes to allow vector typed gc.relocates.
The motivation for this is that we recently found a bug in the vector splitting code where depending on visit order, a vector might not be relocated at some safepoint. Specifically, the bug is that the splitting code wasn't updating the side tables (live vector) of other safepoints. As a result, a vector which was live at two safepoints might not be updated at one of them. However, if you happened to visit safepoints in post order over the dominator tree, everything worked correctly. Weirdly, it turns out that post order is actually an incredibly common order to visit instructions in in practice. Frustratingly, I have not managed to write a test case which actually hits this. I can only reproduce it in large IR files produced by actual applications.
Rather than continue to make this code more complicated, we can remove all of the complexity by just representing the relocation of the entire vector natively in the IR.
At the moment, the new functionality is hidden behind a flag. To use this code, you need to pass "-rs4gc-split-vector-values=0". Once I have a chance to stress test with this option and get feedback from other users, my plan is to flip the default and remove the original splitting code. I would just remove it now, but given the rareness of the bug, I figured it was better to leave it in place until the new approach has been stress tested.
Differential Revision: http://reviews.llvm.org/D15982
llvm-svn: 257244
Look for PHI/Select in the same BB of the form
bb:
%p = phi [false, %bb1], [true, %bb2], [false, %bb3], [true, %bb4], ...
%s = select p, trueval, falseval
And expand the select into a branch structure. This later enables
jump-threading over bb in this pass.
Using the similar approach of SimplifyCFG::FoldCondBranchOnPHI(), unfold
select if the associated PHI has at least one constant. If the unfolded
select is not jump-threaded, it will be folded again in the later
optimizations.
llvm-svn: 257198
It's strange that LoopInfo mostly owns the Loop objects, but that it
defers deleting them to the loop pass manager. Instead, change the
oddly named "updateUnloop" to "markAsRemoved" and have it queue the
Loop object for deletion. We can't delete the Loop immediately when we
remove it, since we need its pointer identity still, so we'll mark the
object as "invalid" so that clients can see what's going on.
llvm-svn: 257191
The function importer was still materializing metadata when modules were
loaded for function importing. We only want to materialize it when we
are going to invoke the metadata linking postpass. Materializing it
before function importing is not only unnecessary, but also causes
metadata referenced by imported functions to be mapped in early, and
then not connected to the rest of the module level metadata when it is
ultimately linked in.
Augmented the test case to specifically check for the metadata being
properly connected, which it wasn't before this fix.
llvm-svn: 257171
In setInsertionPoint if the value is not a PHI, Instruction or
Argument it should be a Constant, not a ConstantExpr.
Original commit message:
[InstCombine] Look through PHIs, GEPs, IntToPtrs and PtrToInts to expose more constants when comparing GEPs
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the base
pointers were the same. However, in the case where we have complex
pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs, conversions to
or from integers, etc) the value of the original base pointer will be
hidden to the optimizer and this transformation will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the relevant
uses of GEPs to GEPs with a common base pointer. The GEP comparison
will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257164
a top-down manner into a true top-down or RPO pass over the call graph.
There are specific patterns of function attributes, notably the
norecurse attribute, which are most effectively propagated top-down
because all they us caller information.
Walk in RPO over the call graph SCCs takes the form of a module pass run
immediately after the CGSCC pass managers postorder walk of the SCCs,
trying again to deduce norerucrse for each singular SCC in the call
graph.
This removes a very legacy pass manager specific trick of using a lazy
revisit list traversed during finalization of the CGSCC pass. There is
no analogous finalization step in the new pass manager, and a lazy
revisit list is just trying to produce an RPO iteration of the call
graph. We can do that more directly if more expensively. It seems
unlikely that this will be the expensive part of any compilation though
as we never examine the function bodies here. Even in an LTO run over
a very large module, this should be a reasonable fast set of operations
over a reasonably small working set -- the function call graph itself.
In the future, if this really is a compile time performance issue, we
can look at building support for both post order and RPO traversals
directly into a pass manager that builds and maintains the PO list of
SCCs.
Differential Revision: http://reviews.llvm.org/D15785
llvm-svn: 257163
Limit this transform to a basic block and guard against PHIs.
Hopefully, this fixes the remaining failures in PR25999:
https://llvm.org/bugs/show_bug.cgi?id=25999
llvm-svn: 257133
Summary:
Teach the Verifier to make sure that the storage size given to llvm.dbg.declare
or the value size given to llvm.dbg.value agree with what is declared in
DebugInfo. This is implicitly assumed in a number of passes (e.g. in SROA).
Additionally this catches a number of common mistakes, such as passing a
pointer when a value was intended or vice versa.
One complication comes from stack coloring which modifies the original IR when
it merges allocas in order to make sure that if AA falls back to the IR it gets
the correct result. However, given this new invariant, indiscriminately
replacing one alloca by a different (differently sized one) is no longer valid.
Fix this by just undefing out any use of the alloca in a dbg.declare in this
case.
Additionally, I had to fix a number of test cases. Of particular note:
- I regenerated dbg-changes-codegen-branch-folding.ll from the given source as
it was affected by the bug fixed in r256077
- two-cus-from-same-file.ll was changed to avoid having a variable-typed debug
variable as that would depend on the target, even though this test is
supposed to be generic
- I had to manually declared size/align for reference type. See also the
discussion for D14275/r253186.
- fpstack-debuginstr-kill.ll required changing `double` to `long double`
- most others were just a question of adding OP_deref
Reviewers: aprantl
Differential Revision: http://reviews.llvm.org/D14276
llvm-svn: 257105
We marked values which are 'undef' as constant instead of undefined
which violates SCCP's invariants. If we can figure out that a
computation results in 'undef', leave it in the undefined state.
This fixes PR16052.
llvm-svn: 257102
The fix for PR23999 made us mark loads of null as producing the constant
undef which upsets the lattice. Instead, keep the load as "undefined".
This fixes PR26044.
llvm-svn: 257087
Summary:
When comparing two GEP instructions which have the same base pointer
and one of them has a constant index, it is possible to only compare
indices, transforming it to a compare with a constant. This removes
one use for the GEP instruction with the constant index, can reduce
register pressure and can sometimes lead to removing the comparisson
entirely.
InstCombine was already doing this when comparing two GEPs if the
base pointers were the same. However, in the case where we have
complex pointer arithmetic (GEPs applied to GEPs, PHIs of GEPs,
conversions to or from integers, etc) the value of the original
base pointer will be hidden to the optimizer and this transformation
will be disabled.
This change detects when the two sides of the comparison can be
expressed as GEPs with the same base pointer, even if they don't
appear as such in the IR. The transformation will convert all the
pointer arithmetic to arithmetic done on indices and all the
relevant uses of GEPs to GEPs with a common base pointer. The
GEP comparison will be converted to a comparison done on indices.
Reviewers: majnemer, jmolloy
Subscribers: hfinkel, jevinskie, jmolloy, aadg, llvm-commits
Differential Revision: http://reviews.llvm.org/D15146
llvm-svn: 257064
Summary: This patch adds a check in SplitLandingPadPredecessors to see if the original landingpad instruction has any uses. If not, we don't need to create a PHINode for it in the joint block since it's gonna be a dead code anyway. The motivation for this patch is that we found a bug that SplitLandingPadPredecessors created a PHINode of token type landingpad, which failed the verifier since PHINode can not be token type. However, the created PHINode will never be used in our code pattern. This patch will workaround this bug, and we might add supports in SplitLandingPadPredecessors to handle token type landingpad with uses in the future.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15835
llvm-svn: 256972
Summary: As per title. This will allow the optimizer to pick up on it.
Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15923
llvm-svn: 256969
Summary: It turns out that if we don't try to do it at the store location, we can do it before any operation that alias the load, as long as no operation alias the store.
Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc, joker.eph
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15903
llvm-svn: 256923
Most of the properties of memset_pattern16 can be now covered by the generic attributes and inferred by InferFunctionAttrs. The only exceptions are:
- We don't yet have a writeonly attribute for the first argument.
- We don't have an attribute for modeling the access size facts encoded in MemoryLocation.cpp.
Differential Revision: http://reviews.llvm.org/D15879
llvm-svn: 256911
Summary:
Most of the tool chain is able to optimize scalar and memcpy like operation effisciently while it isn't that good with aggregates. In order to improve the support of aggregate, we try to change aggregate manipulation into either scalar or memcpy like ones whenever possible without loosing informations.
This is one such opportunity.
Reviewers: craig.topper, spatel, dexonsmith, Prazek, chandlerc
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15894
llvm-svn: 256868
In r256814, we managed to remove catchpads which were trivially redudant
because they were the same SSA value. We can do better using the same
algorithm but with a smarter datastructure by hashing the SSA values
within the catchpad and comparing them structurally.
llvm-svn: 256815
Summary:
At least for CoreCLR, a catchpad which immediately executes an
`unreachable` instruction indicates that the exception can never have a
matching type, and so such catchpads can be removed, and so can their
catchswitches if the catchswitch becomes empty.
Reviewers: rnk, andrew.w.kaylor, majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15846
llvm-svn: 256809
Summary: This patch fixes a bug in prepareICWorklistFromFunction, where the loop becomes infinite with instructions of token type. The patch checks if the instruction is token type, and if so it updates EndInst with the current instruction.
Reviewers: reames, majnemer
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15859
llvm-svn: 256792
r256763 had promoteLoopAccessesToScalars check for the existence of a
catchswitch when the exit blocks were populated but
promoteLoopAccessesToScalars may be called with a prepopulated set of
exit blocks which would also need to be checked.
This fixes PR26019.
llvm-svn: 256788
This patch removes the isOperatorNewLike predicate since it was only being used to establish a non-null return value and we have attributes specifically for that purpose with generic handling. To keep approximate the same behaviour for existing frontends, I added the various operator new like (i.e. instances of operator new) to InferFunctionAttrs. It's not really clear to me why this isn't handled in Clang, but I didn't want to break existing code and any subtle assumptions it might have.
Once this patch is in, I'm going to start separating the isAllocLike family of predicates. These appear to be being used for a mixture of things which should be more clearly separated and documented. Today, they're being used to indicate (at least) aliasing facts, CSE-ability, and default values from an allocation site.
Differential Revision: http://reviews.llvm.org/D15820
llvm-svn: 256787
Before reevaluating instructions, iterate over all instructions
to be reevaluated and remove trivially dead instructions and if
any of it's operands become trivially dead, mark it for deletion
until all trivially dead instructions have been removed
llvm-svn: 256773
We had two bugs here:
- We might try to sink into a catchswitch, causing verifier failures.
- We will succeed in sinking into a cleanuppad but we didn't update the
funclet operand bundle.
This fixes PR26000.
llvm-svn: 256728
Summary:
There are a number of files in the tree which have been accidentally checked in with DOS line endings. Convert these to native line endings.
There are also a few files which have DOS line endings on purpose, and I have set the svn:eol-style property to 'CRLF' on those.
Reviewers: joerg, aaron.ballman
Subscribers: aaron.ballman, sanjoy, dsanders, llvm-commits
Differential Revision: http://reviews.llvm.org/D15848
llvm-svn: 256707
The code that was meant to adjust the duplication cost based on the
terminator opcode was not being executed in cases where the initial
threshold was hit inside the loop.
Subscribers: mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D15536
llvm-svn: 256568
Summary:
Previously, only the outer (last) bitcast was rematerialized, resulting in a
use of the unrelocated inner (first) bitcast after the statepoint. See the
test case for an example.
Reviewers: igor-laevsky, reames
Subscribers: reames, alex, llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15789
llvm-svn: 256520
a standalone pass.
There is no call graph or even interesting analysis for this part of
function attributes -- it is literally inferring attributes based on the
target library identification. As such, we can do it using a much
simpler module pass that just walks the declarations. This can also
happen much earlier in the pass pipeline which has benefits for any
number of other passes.
In the process, I've cleaned up one particular aspect of the logic which
was necessary in order to separate the two passes cleanly. It now counts
inferred attributes independently rather than just counting all the
inferred attributes as one, and the counts are more clearly explained.
The two test cases we had for this code path are both ... woefully
inadequate and copies of each other. I've kept the superset test and
updated it. We need more testing here, but I had to pick somewhere to
stop fixing everything broken I saw here.
Differential Revision: http://reviews.llvm.org/D15676
llvm-svn: 256466
is (by default) run much earlier than FuncitonAttrs proper.
This allows forcing optnone or other widely impactful attributes. It is
also a bit simpler as the force attribute behavior needs no specific
iteration order.
I've added the pass into the default module pass pipeline and LTO pass
pipeline which mirrors where function attrs itself was being run.
Differential Revision: http://reviews.llvm.org/D15668
llvm-svn: 256465
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint.
Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob
Subscribers: reames, mjacob, sanjoy, llvm-commits
Differential Revision: http://reviews.llvm.org/D15662
llvm-svn: 256443
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229
The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109
For that example, the IR becomes:
%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
And x86 SSE output improves from:
movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
movdqa %xmm1, %xmm2
shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
retq
To the almost optimal:
movhpd (%rdi), %xmm0
Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that in other places in InstCombine
because we're scared that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.
Differential Revision: http://reviews.llvm.org/D15096
llvm-svn: 256394
Previously, "%" + name of the value was printed for each derived and base
pointer. This is correct for instructions, but wrong for e.g. globals.
llvm-svn: 256305
This patch removes all weight-related interfaces from BPI and replace
them by probability versions. With this patch, we won't use edge weight
anymore in either IR or MC passes. Edge probabilitiy is a better
representation in terms of CFG update and validation.
Differential revision: http://reviews.llvm.org/D15519
llvm-svn: 256263
Summary:
These were deprecated 11 months ago when a generic
llvm.experimental.gc.result intrinsic, which works for all types, was added.
Reviewers: sanjoy, reames
Subscribers: sanjoy, chenli, llvm-commits
Differential Revision: http://reviews.llvm.org/D15719
llvm-svn: 256262
Summary:
Previously, RS4GC crashed in CreateGCRelocates() because it assumed
that every base is also in the array of live variables, which isn't true if a
live variable has a constant base.
This change fixes the crash by making sure CreateGCRelocates() won't try to
relocate a live variable with a constant base. This would be unnecessary
anyway because anything with a constant base won't move.
Reviewers: reames
Subscribers: llvm-commits, sanjoy
Differential Revision: http://reviews.llvm.org/D15556
llvm-svn: 256252
This uses the same criteria used in CFE's CodeGenPGO to identify hot and cold
callees and uses values of inlinehint-threshold and inlinecold-threshold
respectively as the thresholds for such callees.
Differential Revision: http://reviews.llvm.org/D15245
llvm-svn: 256222
This patch adds an option, -safe-stack-no-tls, for using normal
storage instead of thread-local storage for the unsafe stack pointer.
This can be useful when SafeStack is applied to an operating system
kernel.
http://reviews.llvm.org/D15673
Patch by Michael LeMay.
llvm-svn: 256221
`CloneAndPruneIntoFromInst` sometimes RAUW's dead instructions with
`undef` before erasing them (to avoid deleting instructions that still
have uses). This changes the `WeakVH` in `OperandBundleCallSites` to
hold an `undef`, and we need to guard for this situation in eventuality
in `llvm::InlineFunction`.
llvm-svn: 256110
As shown by the included test case, it's reasonable to end up with constant references during base pointer calculation. The code actually handled this case just fine, we only had the assert to help isolate problems under the belief that constant references shouldn't be present in IR generated by managed frontends. This turned out to be wrong on two fronts: 1) Manual Jacobs is working on a language with constant references, and b) we found a case where the optimizer does create them in practice.
llvm-svn: 256079
Summary:
First up is instcombine, where in the dbg.declare -> dbg.value conversion,
the llvm.dbg.value needs to be called on the actual loaded value, rather
than the address (since the whole point of this transformation is to be
able to get rid of the alloca). Further, now that that's cleaned up, we
can remove a hack in the backend, that would add an implicit OP_deref if
the argument to dbg.value was an alloca. This stems from before the
existence of DIExpression and is no longer necessary since the deref can
be expressed explicitly.
Now, in order to make sure that the tests pass with this change, we need to
correct the printing of DEBUG_VALUE comments to take into account the
expression, which wasn't taken into account before.
Unfortunately, for both these changes, there were a number of incorrect
test cases (mostly the wrong number of DW_OP_derefs, but also a couple
where the test itself was broken more badly). aprantl and I have gone
through and adjusted these test case in order to make them pass with
these fixes and in some cases to make sure they're actually testing
what they are meant to test.
Reviewers: aprantl
Subscribers: dsanders
Differential Revision: http://reviews.llvm.org/D14186
llvm-svn: 256077
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
llvm-svn: 256075
Summary:
If Candiadte may have a different type from GEP, we should bitcast or
pointer cast it to GEP's type so that the later RAUW doesn't complain.
Added a test in nary-gep.ll
Reviewers: tra, meheff
Subscribers: mcrosier, llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D15618
llvm-svn: 256035
This extends the same line of reasoning used in EarlyCSE w/http://reviews.llvm.org/D15352 to the DSE implementation in InstCombine.
Key points:
* We only remove unordered or simple stores.
* The loads producing values consumed by dead stores don't influence whether the store is dead.
Differential Revision: http://reviews.llvm.org/D15354
llvm-svn: 255932
The rules for removing trivially dead stores are a lot less complicated than loads. Since we know the later store post dominates the former and the former dominates the later, unless the former has side effects other than the actual store, we can remove it. One slightly surprising thing is that we can freely remove atomic stores, even if the later one isn't atomic. There's no guarantee the atomic one was every visible.
For the moment, we don't handle DSE of ordered atomic stores. We could extend the same chain of reasoning to them, but the catch is we'd then have to model the ordering effect without a store instruction. Since our fences are a stronger than our operation orderings, simple using a fence isn't an obvious win. This arguable calls for a refinement in our fence specification, but that's (much) later work.
Differential Revision: http://reviews.llvm.org/D15352
llvm-svn: 255914
Summary:
Second patch split out from http://reviews.llvm.org/D14752.
Maps metadata as a post-pass from each module when importing complete,
suturing up final metadata to the temporary metadata left on the
imported instructions.
This entails saving the mapping from bitcode value id to temporary
metadata in the importing pass, and from bitcode value id to final
metadata during the metadata linking postpass.
Depends on D14825.
Reviewers: dexonsmith, joker.eph
Subscribers: davidxl, llvm-commits, joker.eph
Differential Revision: http://reviews.llvm.org/D14838
llvm-svn: 255909
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
Now with a fix (and fixed tests) for the conformance issue seen in Chromium.
llvm-svn: 255767
Extend EarlyCSE with an additional style of dead store elimination. If we write back a value just read from that memory location, we can eliminate the store under the assumption that the value hasn't changed.
I'm implementing this mostly because I noticed the omission when looking at the code. It seemed strange to have InstCombine have a peephole which was more powerful than EarlyCSE. :)
Differential Revision: http://reviews.llvm.org/D15397
llvm-svn: 255739
This patch allows atomic loads and stores of floating point to be specified in the IR and adds an adapter to allow them to be lowered via existing backend support for bitcast-to-equivalent-integer idiom.
Previously, the only way to specify a atomic float operation was to bitcast the pointer to a i32, load the value as an i32, then bitcast to a float. At it's most basic, this patch simply moves this expansion step to the point we start lowering to the backend.
This patch does not add canonicalization rules to convert the bitcast idioms to the appropriate atomic loads. I plan to do that in the future, but for now, let's simply add the support. I'd like to get instruction selection working through at least one backend (x86-64) without the bitcast conversion before canonicalizing into this form.
Similarly, I haven't yet added the target hooks to opt out of the lowering step I added to AtomicExpand. I figured it would more sense to add those once at least one backend (x86) was ready to actually opt out.
As you can see from the included tests, the generated code quality is not great. I plan on submitting some patches to fix this, but help from others along that line would be very welcome. I'm not super familiar with the backend and my ramp up time may be material.
Differential Revision: http://reviews.llvm.org/D15471
llvm-svn: 255737
An LTO pass that generates a __cfi_check() function that validates a
call based on a hash of the call-site-known type and the target
pointer.
llvm-svn: 255693
(This is the third attempt to check in this patch, and the first two are r255454
and r255460. The once failed test file reg-usage.ll is now moved to
test/Transform/LoopVectorize/X86 directory with target datalayout and target
triple indicated.)
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255691
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.
The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826
but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818
Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)
As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally.
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.
A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.
Differential Revision: http://reviews.llvm.org/D15213
llvm-svn: 255660
Summary:
The LibCallSimplifier will turn llvm.exp2.* intrinsics into ldexp* libcalls
which do not make sense with the AMDGPU backend.
In the long run, we'll want an llvm.ldexp.* intrinsic to properly make use of
this optimization, but this works around the problem for now.
See also: http://reviews.llvm.org/D14327 (suggested llvm.ldexp.* implementation)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92709
Reviewers: arsenm, tstellarAMD
Differential Revision: http://reviews.llvm.org/D14990
llvm-svn: 255658
For non padded structs, we can just proceed and deaggregate them.
We don't want ot do this when there is padding in the struct as to not
lose information about this padding (the subsequents passes would then
try hard to preserve the padding, which is undesirable).
Also update extractvalue.ll and cast.ll so that they use structs with padding.
Remove the FIXME in the extractvalue of laod case as the non padded case is
handled when processing the load, and we don't want to do it on the padded
case.
Patch by: Amaury SECHET <deadalnix@gmail.com>
Differential Revision: http://reviews.llvm.org/D14483
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255600
Profile symbols have long prefixes which waste space and creating pressure for linker.
This patch shortens the prefixes to minimal length without losing verbosity.
Differential Revision: http://reviews.llvm.org/D15503
llvm-svn: 255575
This patch adds optional fast-math-flags (the same that apply to fmul/fadd/fsub/fdiv/frem/fcmp)
to call instructions in IR. Follow-up patches would use these flags in LibCallSimplifier, add
support to clang, and extend FMF to the DAG for calls.
Motivating example:
%y = fmul fast float %x, %x
%z = tail call float @sqrtf(float %y)
We'd like to be able to optimize sqrt(x*x) into fabs(x). We do this today using a function-wide
attribute for unsafe-math, but we really want to trigger on the instructions themselves:
%z = tail call fast float @sqrtf(float %y)
because in an LTO build it's possible that calls with fast semantics have been inlined into a
function with non-fast semantics.
The code changes and tests are based on the recent commits that added "notail":
http://reviews.llvm.org/rL252368
and added FMF to fcmp:
http://reviews.llvm.org/rL241901
Differential Revision: http://reviews.llvm.org/D14707
llvm-svn: 255555
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function. This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.
Depends on D15478.
Differential Revision: http://reviews.llvm.org/D15479
llvm-svn: 255522
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).
This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.
llvm-svn: 255489
(This is the second attempt to check in this patch: REQUIRES: asserts is added
to reg-usage.ll now.)
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255460
LoopVectorizationCostModel::calculateRegisterUsage() is used to estimate the
register usage for specific VFs. However, it takes into account many
instructions that won't be vectorized, such as induction variables,
GetElementPtr instruction, etc.. This makes the loop vectorizer too conservative
when choosing VF. In this patch, the induction variables that won't be
vectorized plus GetElementPtr instruction will be added to ValuesToIgnore set
so that their register usage won't be considered any more.
Differential revision: http://reviews.llvm.org/D15177
llvm-svn: 255454
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).
This is the second try. The fix in this patch is very localized.
Only profile symbol names of profile symbols with internal
linkage are fixed up while initializer of name syms are not
changes. This means there is no format change nor version bump.
llvm-svn: 255434
This change was discussed in D15392. It allows us to remove the fold that was added
in:
http://reviews.llvm.org/r255261
...and it will allow us to generalize this fold:
http://reviews.llvm.org/rL112232
while preserving the order of bitcast + extract that it produces and testing shows
is better handled by the backend.
Note that the existing check for "isVectorTy()" wasn't strong enough in general
and specifically because: x86_mmx. It's not a vector, but it's not vectorizable
either. So here we check VectorType::isValidElementType() directly before
proceeding with the transform.
llvm-svn: 255433
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
but they are difficult to explain to others, even to seasoned LLVM
experts.
- catchendpad and cleanupendpad are optimization barriers. They cannot
be split and force all potentially throwing call-sites to be invokes.
This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
It is unsplittable, starts a funclet, and has control flow to other
funclets.
- The nesting relationship between funclets is currently a property of
control flow edges. Because of this, we are forced to carefully
analyze the flow graph to see if there might potentially exist illegal
nesting among funclets. While we have logic to clone funclets when
they are illegally nested, it would be nicer if we had a
representation which forbade them upfront.
Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
flow, just a bunch of simple operands; catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad. Their presence can be inferred
implicitly using coloring information.
N.B. The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for. An expert should take a
look to make sure the results are reasonable.
Reviewers: rnk, JosephTremoulet, andrew.w.kaylor
Differential Revision: http://reviews.llvm.org/D15139
llvm-svn: 255422
This change is discussed in D15392 and should allow us to effectively
revert:
http://llvm.org/viewvc/llvm-project?view=revision&revision=255261
if we canonicalize bitcasts ahead of extracts.
It should be safe to convert any pair of bitcasts into a single bitcast,
however, it was mentioned here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html
that we're not allowed to bitcast from an x86_mmx to some other types, but I'm
not seeing any failures from that, and we have regression tests in CodeGen/X86
that appear to cover all of those cases.
Some day we'll get to remove that MMX wart from LLVM IR completely?
Differential Revision: http://reviews.llvm.org/D15468
llvm-svn: 255399
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).
This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is
preserved with this change.
Differential Revision: http://reviews.llvm.org/D15243
llvm-svn: 255365
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."
llvm-svn: 255354
Mem2Reg shouldn't be optimizing a function that is marked
optnone. There is a test checking this that fails when mem2reg is
explicitly added to the standard pass pipeline.
llvm-svn: 255336
MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse.
llvm-svn: 255334
Summary: As a follow-up to rL255054 I wasn't able to convince myself that the code did what I thought, so I wrote more tests.
Reviewers: reames
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15371
llvm-svn: 255295
This is a redo of r255137 (reverted at r255227) which was a redo of
r255124 (reverted at r255126) with a fixed check for a scalar source
type and an added test for the failure that caused the revert.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255261
We extend the search for redundant stores to predecessor blocks that
unconditionally lead to the block BB with the current store instruction. That
also includes single-block loops that unconditionally lead to BB, and
if-then-else blocks where then- and else-blocks unconditionally lead to BB.
http://reviews.llvm.org/D13363
Patch by Ivan Baev <ibaev@codeaurora.org>!
llvm-svn: 255247
Two tests diag_mismatch.ll and diag_no_funcprofdata.ll generates the same
profdata filename which can conflict in current test runs. This patch
renames them to have different names.
llvm-svn: 255158
The ConstantDataArray::getFP(LLVMContext &, ArrayRef<uint16_t>)
overload has had a typo in it since it was written, where it will
create a Vector instead of an Array. This obviously doesn't work at
all, but it turns out that until r254991 there weren't actually any
callers of this overload. Fix the typo and add some test coverage.
llvm-svn: 255157
`CloneAndPruneIntoFromInst` can DCE instructions after cloning them into
the new function, and so an AssertingVH is too strong. This change
switches CloneCodeInfo to use a std::vector<WeakVH>.
llvm-svn: 255148
This is a redo of r255124 (reverted at r255126) with an added check for a
scalar destination type and an added test for the failure seen in Clang's
test/CodeGen/vector.c. The extra test shows a different missing optimization.
Original commit message:
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255137
This new patch fixes a few bugs that exposed in last submit. It also improves
the test cases.
--Original Commit Message--
This patch implements a minimum spanning tree (MST) based instrumentation for
PGO. The use of MST guarantees minimum number of CFG edges getting
instrumented. An addition optimization is to instrument the less executed
edges to further reduce the instrumentation overhead. The patch contains both the
instrumentation and the use of the profile to set the branch weights.
Differential Revision: http://reviews.llvm.org/D12781
llvm-svn: 255132
Example:
bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float
--->
extractelement <2 x float> %X, i32 1
This is part of fixing PR25543:
https://llvm.org/bugs/show_bug.cgi?id=25543
The next step will be to generalize this fold:
trunc ( lshr ( bitcast X) ) -> extractelement (X)
Ie, I'm hoping to replace the existing transform of:
bitcast ( trunc ( lshr ( bitcast X)))
added by:
http://reviews.llvm.org/rL112232
with 2 less specific transforms to catch the case in the bug report.
Differential Revision: http://reviews.llvm.org/D14879
llvm-svn: 255124
Summary:
Available_externally global variable with initializer were considered "hasInitializer()",
while obviously it can't match the description:
Whether the global variable has an initializer, and any changes made to the
initializer will turn up in the final executable.
since modifying the initializer of an externally available variable does not make sense.
Reviewers: pcc, rafael
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15351
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255123
Test case attached (test case also checks that we don't drop the calling
convention, but that functionality was correct before this patch).
llvm-svn: 255088
For an invoke with operand bundles, the [op_begin(), op_end()-3] range
can contain things other than invoke arguments. This change teaches
PruneEH to use arg_begin() and arg_end() explicitly.
llvm-svn: 255073
This patch teaches the fully redundant load part of EarlyCSE how to forward from atomic and volatile loads and stores, and how to eliminate unordered atomics (only). This patch does not include dead store elimination support for unordered atomics, that will follow in the near future.
The basic idea is that we allow all loads and stores to be tracked by the AvailableLoad table. We store a bit in the table which tracks whether load/store was atomic, and then only replace atomic loads with ones which were also atomic.
No attempt is made to refine our handling of ordered loads or stores. Those are still treated as full fences. We could pretty easily extend the release fence handling to release stores, but that should be a separate patch.
Differential Revision: http://reviews.llvm.org/D15337
llvm-svn: 255054
Per LangRef: "Globals with available_externally linkage are
allowed to be discarded at will, and are otherwise the same
as linkonce_odr", since linkonce_odr is in this list it makes
sense to have available_externally there as well.
Reviewers: rafael
Differential Revision: http://reviews.llvm.org/D15323
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 255043
Summary:
Also add a stricter post-condition for IndVarSimplify.
Fixes PR25578. Test case by Michael Zolotukhin.
Reviewers: hfinkel, atrick, mzolotukhin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15059
llvm-svn: 254977
Summary:
(Note: the problematic invocation of hoistIVInc that caused PR24804 came
from IndVarSimplify, not from SCEVExpander itself)
Fixes PR24804. Test case by David Majnemer.
Reviewers: hfinkel, majnemer, atrick, mzolotukhin
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15058
llvm-svn: 254976
Summary:
There are `SelectPatternFlavor`s that don't represent min or max idioms,
and we should not be passing those to `getCmpPredicateForMinMax`.
Fixes PR25745.
Reviewers: majnemer
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D15249
llvm-svn: 254869
Summary:
In order to avoid calling pow function we generate repeated fmul when n is a
positive or negative whole number.
For each exponent we pre-compute Addition Chains in order to minimize the no.
of fmuls.
Refer: http://wwwhomes.uni-bielefeld.de/achim/addition_chain.html
We pre-compute addition chains for exponents upto 32 (which results in a max of
7 fmuls).
For eg:
4 = 2+2
5 = 2+3
6 = 3+3 and so on
Hence,
pow(x, 4.0) ==> y = fmul x, x
x = fmul y, y
ret x
For negative exponents, we simply compute the reciprocal of the final result.
Note: This transformation is only enabled under fast-math.
Patch by Mandeep Singh Grang <mgrang@codeaurora.org>
Reviewers: weimingz, majnemer, escha, davide, scanon, joerg
Subscribers: probinson, escha, llvm-commits
Differential Revision: http://reviews.llvm.org/D13994
llvm-svn: 254776
The compiler can take advantage of the allocation/deallocation
function's properties. We knew how to do this for Itanium but had no
support for MSVC-style functions.
llvm-svn: 254656
Having to import an alias as declaration is not thinlto specific.
The test difference are because when we already have a decl and we are
not importing it, we just leave the decl alone.
llvm-svn: 254556
The current code does not take alloca array size into account and,
as a result, considers any access past the first array element to be
unsafe.
llvm-svn: 254350
This one is enabled only under -ffast-math. There are cases where the
difference between the value computed and the correct value is huge
even for ffast-math, e.g. as Steven pointed out:
x = -1, y = -4
log(pow(-1), 4) = 0
4*log(-1) = NaN
I checked what GCC does and apparently they do the same optimization
(which result in the dramatic difference). Future work might try to
make this (slightly) less worse.
Differential Revision: http://reviews.llvm.org/D14400
llvm-svn: 254263
This adds two thresholds to the sample profiler to affect inlining
decisions: the concept of global hotness and coldness.
Functions that have accumulated more than a certain fraction of samples at
runtime, are annotated with the InlineHint attribute. Conversely,
functions that accumulate less than a certain fraction of samples, are
annotated with the Cold attribute.
This is very similar to the hints emitted by Clang when using
instrumentation profiles.
Notice that this is a very blunt instrument. A function may have
globally collected a significant fraction of samples, but that does not
necessarily mean that every callsite for that function is hot.
Ideally, we would annotate each callsite with the samples collected at
that callsite. This way, the inliner can incorporate all these weights
into its cost model.
Once the inliner offers this functionality, we can change the hints
emitted here to a more precise per-callsite annotation. For now, this is
providing some measure of speedups with our internal benchmarks. I've
observed speedups of up to 23% (though the geo mean is about 3%). I expect
these numbers to improve as the inliner gets better annotations.
llvm-svn: 254212
The order in which instructions are truncated in truncateToMinimalBitwidths
effects code generation. Switch to a map with a determinisic order, since the
iteration order over a DenseMap is not defined.
This code is not hot, so the difference in container performance isn't
interesting.
Many thanks to David Blaikie for making me aware of MapVector!
Fixes PR25490.
Differential Revision: http://reviews.llvm.org/D14981
llvm-svn: 254179
They are as much trouble as aliases to declarations. They are requiring
the code generator to define a symbol with the same value as another
symbol, but the second symbol is undefined.
If representing this is important for some optimization, we could add
support for available_externally aliases. They would be *required* to
point to a declaration (or available_externally definition).
llvm-svn: 254170
Add a simple initial heuristic to control importing based on the number
of instructions recorded in the function's summary. Add option to
control the limit, and test using option.
llvm-svn: 254036
Fix buildbot failure for clang-x86_64-linux-selfhost-modules.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules/builds/8866
The failing test cases are newly added from r254021. It seems the IR has a
different order in this platform. In this patch, I temporarily relax the test
case to make the build green. I'll have a complete fix (more robust way to test)
soon.
llvm-svn: 254035
When the original binary is executed and sampled, the resulting profile
contains information on the original inline stack. We currently follow
the original inline plan if we notice that the inlined callsite has more
than 0 samples to it.
A better way is to determine whether the callsite is actually worth
inlining. If the callsite accumulates a small fraction of the samples
spent in the parent function, then we don't want to bother inlining it
(as it means that the callsite is actually cold).
This patch introduces a threshold expressed in percentage of samples
in relation to the parent function. If the callsite uses less than N%
of the total samples used by its parent, the original inline decision is
not re-applied.
I've set the threshold to the very arbitrary value of 5%. I'm yet to do
any actual experiments to see what's a good value. I wanted to separate
the basic mechanism from the tuning.
llvm-svn: 254034