Some Intrinsics are overloaded to the extent that return type equality (all
that's been checked up to now) does not guarantee that the arguments are the
same. In these cases SLP vectorizer should not recurse into the operands, which
can be achieved by comparing them as "Function *" rather than simply the ID.
llvm-svn: 205424
For the purpose of calculating the cost of the loop at various vectorization
factors, we need to count dependencies of consecutive pointers as uniforms
(which means that the VF = 1 cost is used for all overall VF values).
For example, the TSVC benchmark function s173 has:
...
%3 = add nsw i64 %indvars.iv, 16000
%arrayidx8 = getelementptr inbounds %struct.GlobalData* @global_data, i64 0, i32 0, i64 %3
...
and we must realize that the add will be a scalar in order to correctly deduce
it to be profitable to vectorize this on PowerPC with VSX enabled. In fact, all
dependencies of a consecutive pointer must be a scalar (uniform), and so we
simply need to add all consecutive pointers to the worklist that currently
detects collects uniforms.
Fixes PR19296.
llvm-svn: 205387
In preparation for an upcoming commit implementing unrolling preferences for
x86, this adds additional fields to the UnrollingPreferences structure:
- PartialThreshold and PartialOptSizeThreshold - Like Threshold and
OptSizeThreshold, but used when not fully unrolling. These are necessary
because we need different thresholds for full unrolling from those used when
partially unrolling (the full unrolling thresholds are generally going to be
larger).
- MaxCount - A cap on the unrolling factor when partially unrolling. This can
be used by a target to prevent the unrolled loop from exceeding some
resource limit independent of the loop size (such as number of branches).
There should be no functionality change for any in-tree targets.
llvm-svn: 205347
The generic (concatenation) loop unroller is currently placed early in the
standard optimization pipeline. This is a good place to perform full unrolling,
but not the right place to perform partial/runtime unrolling. However, most
targets don't enable partial/runtime unrolling, so this never mattered.
However, even some x86 cores benefit from partial/runtime unrolling of very
small loops, and follow-up commits will enable this. First, we need to move
partial/runtime unrolling late in the optimization pipeline (importantly, this
is after SLP and loop vectorization, as vectorization can drastically change
the size of a loop), while keeping the full unrolling where it is now. This
change does just that.
llvm-svn: 205264
This reverts commit r205018.
Conflicts:
lib/Transforms/Vectorize/SLPVectorizer.cpp
test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
This is breaking libclc build.
llvm-svn: 205260
Patch by Tobias Güntner.
I tried to write a test, but the only difference is the Changed value that
gets returned. It can be tested with "opt -debug-pass=Executions -functionattrs,
but that doesn't seem worth it.
llvm-svn: 205121
This adds a second implementation of the AArch64 architecture to LLVM,
accessible in parallel via the "arm64" triple. The plan over the
coming weeks & months is to merge the two into a single backend,
during which time thorough code review should naturally occur.
Everything will be easier with the target in-tree though, hence this
commit.
llvm-svn: 205090
This reverts commit r204912, and follow-up commit r204948.
This introduced a performance regression, and the fix is not completely
clear yet.
llvm-svn: 205010
This reverts commit r203553, and follow-up commits r203558 and r203574.
I will follow this up on the mailinglist to do it in a way that won't
cause subtle PRE bugs.
llvm-svn: 205009
Fixes a miscompile introduced in r204912. It would miscompile code like
(unsigned)(a + -49) <= 5U. The transform would turn this into
(unsigned)a < 55U, which would return true for values in [0, 49], when
it should not.
llvm-svn: 204948
This adds back r204781.
Original message:
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204934
Summary:
Tested with a unit test because we don't appear to have any transforms
that use this other than ASan, I think.
Fixes PR17935.
Reviewers: nicholas
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3194
llvm-svn: 204866
This reverts commit r204781.
I will follow up to with msan folks to see what is what they
were trying to do with aliases to weak aliases.
llvm-svn: 204784
Aliases are just another name for a position in a file. As such, the
regular symbol resolutions are not applied. For example, given
define void @my_func() {
ret void
}
@my_alias = alias weak void ()* @my_func
@my_alias2 = alias void ()* @my_alias
We produce without this patch:
.weak my_alias
my_alias = my_func
.globl my_alias2
my_alias2 = my_alias
That is, in the resulting ELF file my_alias, my_func and my_alias are
just 3 names pointing to offset 0 of .text. That is *not* the
semantics of IR linking. For example, linking in a
@my_alias = alias void ()* @other_func
would require the strong my_alias to override the weak one and
my_alias2 would end up pointing to other_func.
There is no way to represent that with aliases being just another
name, so the best solution seems to be to just disallow it, converting
a miscompile into an error.
llvm-svn: 204781
Summary:
Previously the code didn't check if the before and after types for the
store were pointers to different address spaces. This resulted in
instcombine using a bitcast to convert between pointers to different
address spaces, causing an assertion due to the invalid cast.
It is not be appropriate to use addrspacecast this case because it is
not guaranteed to be a no-op cast. Instead bail out and do not do the
transformation.
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3117
llvm-svn: 204733
Extracts coming from phis were being hoisted, while all others were
sunk to their uses. This was inconsistent and didn't seem to serve a
purpose. Changing all extracts to be sunk to uses is a prerequisite
for adding block frequency to the SLP vectorizer's cost model.
I benchmarked the change in isolation (without block frequency). I
only saw noise on x86 and some potentially significant improvements on
ARM. No major regressions is good enough for me.
llvm-svn: 204699
The cleanup code that removes dead cast instructions only removed them from the
basic block, but didn't delete them. This fix erases them now too.
llvm-svn: 204538
A PHI node usually has only one value/basic block pair per incoming basic block.
In the case of a switch statement it is possible that a following PHI node may
have more than one such pair per incoming basic block. E.g.:
%0 = phi i64 [ 123456, %case2 ], [ 654321, %Entry ], [ 654321, %Entry ]
This is valid and the verfier doesn't complain, because both values are the
same.
Constant hoisting materializes the constant for each operand separately and the
value is still the same, but the variable names have changed. As a result the
verfier can't recognize anymore that they are the same value and complains.
This fix adds special update code for PHI node in constant hoisting to prevent
this corner case.
This fixes <rdar://problem/16394449>
llvm-svn: 204537
Extend the target hook to take also the operand index into account when
calculating the cost of the constant materialization.
Related to <rdar://problem/16381500>
llvm-svn: 204435
Originally the algorithm would search for expensive constants and track their
users, which could be instructions and constant expressions. This change only
tracks the constants for instructions, but constant expressions are indirectly
covered too. If an operand is an constant expression, then we look through the
expression to find anny expensive constants.
The algorithm keep now track of the instruction and the operand index where the
constant is used. This allows more precise hoisting of constant materialization
code for PHI instructions, because we only hoist to the basic block of the
incoming operand. Before we had to find the idom of all PHI operands and hoist
the materialization code there.
This also makes updating of instructions easier. Before we had to keep track of
the original constant, find it in the instructions, and then replace it. Now we
can just simply update the operand.
Related to <rdar://problem/16381500>
llvm-svn: 204433
This simplifies working with the constant candidates and removes the tight
coupling between the map and the vector.
Related to <rdar://problem/16381500>
llvm-svn: 204431
This commit extends the coverage of the constant hoisting pass, adds additonal
debug output and updates the function names according to the style guide.
Related to <rdar://problem/16381500>
llvm-svn: 204389
This option caused LowerInvoke to generate code using SJLJ-based
exception handling, but there is no code left that interprets the
jmp_buf stack that the resulting code maintained (llvm.sjljeh.jblist).
This option has been obsolete for a while, and replaced by
SjLjEHPrepare.
This leaves the default behaviour of LowerInvoke, which is to convert
invokes to calls.
Differential Revision: http://llvm-reviews.chandlerc.com/D3136
llvm-svn: 204388
The use_iterator redesign in r203364 introduced an increment past the
end of a range in -objc-arc-contract. Added an explicit check for the
end of the range.
<rdar://problem/16333235>
llvm-svn: 204195
noise.
Original commit log:
Replace some dead code with an assert. When I first ported this pass
from a loop pass to a function pass I did so in the naive, recursive
way. It doesn't actually work, we need a worklist instead. When
I switched to the worklist I didn't delete the naive recursion. That
recursion was also buggy because it was dead and never really exercised.
llvm-svn: 204187
pass from a loop pass to a function pass I did so in the naive,
recursive way. It doesn't actually work, we need a worklist instead.
When I switched to the worklist I didn't delete the naive recursion.
That recursion was also buggy because it was dead and never really
exercised.
llvm-svn: 204184
LLVM part of MSan implementation of advanced origin tracking,
when we record not only creation point, but all locations where
an uninitialized value was stored to memory, too.
llvm-svn: 204151
Summary:
The compiler does not always generate linkage names. If a function
has been inlined and its body elided, its linkage name may not be
generated.
When the binary executes, the profiler will use its unmangled name
when attributing samples. This results in unmangled names in the
input profile.
We are currently failing hard when this happens. However, in this case
all that happens is that we fail to attribute samples to the inlined
function. While this means fewer optimization opportunities, it should
not cause a compilation failure.
This patch accepts all valid function names, regardless of whether
they were mangled or not.
Reviewers: chandlerc
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3087
llvm-svn: 204142
When GlobalOpt has determined that a GlobalVariable only ever has two values,
it would convert the GlobalVariable to a boolean, and introduce SelectInsts
at every load, to choose between the two possible values. These SelectInsts
introduce overhead and other unpleasantness.
This patch makes GlobalOpt just add range metadata to loads from such
GlobalVariables instead. This enables the same main optimization (as seen in
test/Transforms/GlobalOpt/integer-bool.ll), without introducing selects.
The main downside is that it doesn't get the memory savings of shrinking such
GlobalVariables, but this is expected to be negligible.
llvm-svn: 204076
The "noduplicate" attribute of call instructions is sometimes queried directly
and sometimes through the cannotDuplicate() predicate. This patch streamlines
all queries to use the cannotDuplicate() predicate. It also adds this predicate
to InvokeInst, to mirror what CallInst has.
llvm-svn: 204049
Summary:
The sample profiler pass emits several error messages. Instead of
just aborting the compiler with report_fatal_error, we can emit
better messages using DiagnosticInfo.
This adds a new sub-class of DiagnosticInfo to handle the sample
profiler.
Reviewers: chandlerc, qcolombet
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D3086
llvm-svn: 203976
O(N*log(N)). The idea is to introduce total ordering among functions set.
That allows to build binary tree and perform function look-up procedure in O(log(N)) time.
This patch description:
Introduced total ordering among Type instances. Actually it is improvement for existing
isEquivalentType.
0. Coerce pointer of 0 address space to integer.
1. If left and right types are equal (the same Type* value), return 0 (means equal).
2. If types are of different kind (different type IDs). Return result of type IDs
comparison, treating them as numbers.
3. If types are vectors or integers, return result of its
pointers comparison (casted to numbers).
4. Check whether type ID belongs to the next group:
* Void
* Float
* Double
* X86_FP80
* FP128
* PPC_FP128
* Label
* Metadata
If so, return 0.
5. If left and right are pointers, return result of address space
comparison (numbers comparison).
6. If types are complex.
Then both LEFT and RIGHT will be expanded and their element types will be checked with
the same way. If we get Res != 0 on some stage, return it. Otherwise return 0.
7. For all other cases put llvm_unreachable.
llvm-svn: 203788
This allows us to generate table lookups for code such as:
unsigned test(unsigned x) {
switch (x) {
case 100: return 0;
case 101: return 1;
case 103: return 2;
case 105: return 3;
case 107: return 4;
case 109: return 5;
case 110: return 6;
default: return f(x);
}
}
Since cases 102, 104, etc. are not constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.
Patch by Jasper Neumann!
llvm-svn: 203694
There's a bit of duplicated "magic" code in opt.cpp and Clang's CodeGen that
computes the inliner threshold from opt level and size opt level.
This patch moves the code to a function that lives alongside the inliner itself,
providing a convenient overload to the inliner creation.
A separate patch can be committed to Clang to use this once it's committed to
LLVM. Standalone tools that use the inlining pass can also avoid duplicating
this code and fearing it will go out of sync.
Note: this patch also restructures the conditinal logic of the computation to
be cleaner.
llvm-svn: 203669
After r203553 overflow intrinsics and their non-intrinsic (normal)
instruction get hashed to the same value. This patch prevents PRE from
moving an instruction into a predecessor block, and trying to add a phi
node that gets two different types (the intrinsic result and the
non-intrinsic result), resulting in a failing assert.
llvm-svn: 203574
The syntax for "cmpxchg" should now look something like:
cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic
where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).
rdar://problem/15996804
llvm-svn: 203559
When an overflow intrinsic is followed by a non-overflow instruction,
replace the latter with an extract. For example:
%sadd = tail call { i32, i1 } @llvm.sadd.with.overflow.i32(i32 %a, i32 %b)
%sadd3 = add i32 %a, %b
Here the add statement will be replaced by an extract.
When an overflow intrinsic follows a non-overflow instruction, a clone
of the intrinsic is inserted before the normal instruction, which makes
it the same as the previous case. Subsequent runs of GVN can then clean
up the duplicate instructions and insert the extract.
This fixes PR8817.
llvm-svn: 203553
Summary:
When the sample profiles include discriminator information,
use the discriminator values to distinguish instruction weights
in different basic blocks.
This modifies the BodySamples mapping to map <line, discriminator> pairs
to weights. Instructions on the same line but different blocks, will
use different discriminator values. This, in turn, means that the blocks
may have different weights.
Other changes in this patch:
- Add tests for positive values of line offset, discriminator and samples.
- Change data types from uint32_t to unsigned and int and do additional
validation.
Reviewers: chandlerc
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2857
llvm-svn: 203508
optimize a call to a llvm intrinsic to something that invovles a call to a C
library call, make sure it sets the right calling convention on the call.
e.g.
extern double pow(double, double);
double t(double x) {
return pow(10, x);
}
Compiles to something like this for AAPCS-VFP:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%0 = call double @llvm.pow.f64(double 1.000000e+01, double %x)
ret double %0
}
declare double @llvm.pow.f64(double, double) #1
Simplify libcall (part of instcombine) will turn the above into:
define arm_aapcs_vfpcc double @t(double %x) #0 {
entry:
%__exp10 = call double @__exp10(double %x) #1
ret double %__exp10
}
declare double @__exp10(double)
The pre-instcombine code works because calls to LLVM builtins are special.
Instruction selection will chose the right calling convention for the call.
However, the code after instcombine is wrong. The call to __exp10 will use
the C calling convention.
I can think of 3 options to fix this.
1. Make "C" calling convention just work since the target should know what CC
is being used.
This doesn't work because each function can use different CC with the "pcs"
attribute.
2. Have Clang add the right CC keyword on the calls to LLVM builtin.
This will work but it doesn't match the LLVM IR specification which states
these are "Standard C Library Intrinsics".
3. Fix simplify libcall so the resulting calls to the C routines will have the
proper CC keyword. e.g.
%__exp10 = call arm_aapcs_vfpcc double @__exp10(double %x) #1
This works and is the solution I implemented here.
Both solutions #2 and #3 would work. After carefully considering the pros and
cons, I decided to implement #3 for the following reasons.
1. It doesn't change the "spec" of the intrinsics.
2. It's a self-contained fix.
There are a couple of potential downsides.
1. There could be other places in the optimizer that is broken in the same way
that's not addressed by this.
2. There could be other calling conventions that need to be propagated by
simplify-libcall that's not handled.
But for now, this is the fix that I'm most comfortable with.
llvm-svn: 203488
This requires a number of steps.
1) Move value_use_iterator into the Value class as an implementation
detail
2) Change it to actually be a *Use* iterator rather than a *User*
iterator.
3) Add an adaptor which is a User iterator that always looks through the
Use to the User.
4) Wrap these in Value::use_iterator and Value::user_iterator typedefs.
5) Add the range adaptors as Value::uses() and Value::users().
6) Update *all* of the callers to correctly distinguish between whether
they wanted a use_iterator (and to explicitly dig out the User when
needed), or a user_iterator which makes the Use itself totally
opaque.
Because #6 requires churning essentially everything that walked the
Use-Def chains, I went ahead and added all of the range adaptors and
switched them to range-based loops where appropriate. Also because the
renaming requires at least churning every line of code, it didn't make
any sense to split these up into multiple commits -- all of which would
touch all of the same lies of code.
The result is still not quite optimal. The Value::use_iterator is a nice
regular iterator, but Value::user_iterator is an iterator over User*s
rather than over the User objects themselves. As a consequence, it fits
a bit awkwardly into the range-based world and it has the weird
extra-dereferencing 'operator->' that so many of our iterators have.
I think this could be fixed by providing something which transforms
a range of T&s into a range of T*s, but that *can* be separated into
another patch, and it isn't yet 100% clear whether this is the right
move.
However, this change gets us most of the benefit and cleans up
a substantial amount of code around Use and User. =]
llvm-svn: 203364
Sequences of insertelement/extractelements are sometimes used to build
vectorsr; this code tries to put them back together into shuffles, but
could only produce a completely uniform shuffle types (<N x T> from two
<N x T> sources).
This should allow shuffles with different numbers of elements on the
input and output sides as well.
llvm-svn: 203229
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.
llvm-svn: 203083
already lived there and it is where it belongs -- this is the in-memory
debug location representation.
This is just cleanup -- Modules can actually cope with this, but that
doesn't make it right. After chatting with folks that have out-of-tree
stuff, going ahead and moving the rest of the headers seems preferable.
llvm-svn: 202960
to ensure we don't mess up any of the overrides. Necessary for cleaning
up the Value use iterators and enabling range-based traversing of use
lists.
llvm-svn: 202958
a bit surprising, as the class is almost entirely abstracted away from
any particular IR, however it encodes the comparsion predicates which
mutate ranges as ICmp predicate codes. This is reasonable as they're
used for both instructions and constants. Thus, it belongs in the IR
library with instructions and constants.
llvm-svn: 202838
this would have been required because of the use of DataLayout, but that
has moved into the IR proper. It is still required because this folder
uses the constant folding in the analysis library (which uses the
datalayout) as the more aggressive basis of its folder.
llvm-svn: 202832
directly care about the Value class (it is templated so that the key can
be any arbitrary Value subclass), it is in fact concretely tied to the
Value class through the ValueHandle's CallbackVH interface which relies
on the key type being some Value subclass to establish the value handle
chain.
Ironically, the unittest is already in the right library.
llvm-svn: 202824
Move the test for this class into the IR unittests as well.
This uncovers that ValueMap too is in the IR library. Ironically, the
unittest for ValueMap is useless in the Support library (honestly, so
was the ValueHandle test) and so it already lives in the IR unittests.
Mmmm, tasty layering.
llvm-svn: 202821
name might indicate, it is an iterator over the types in an instruction
in the IR.... You see where this is going.
Another step of modularizing the support library.
llvm-svn: 202815
business.
This header includes Function and BasicBlock and directly uses the
interfaces of both classes. It has to do with the IR, it even has that
in the name. =] Put it in the library it belongs to.
This is one step toward making LLVM's Support library survive a C++
modules bootstrap.
llvm-svn: 202814
DWARF discriminators are used to distinguish multiple control flow paths
on the same source location. When this happens, instructions across
basic block boundaries will share the same debug location.
This pass detects this situation and creates a new lexical scope to one
of the two instructions. This lexical scope is a child scope of the
original and contains a new discriminator value. This discriminator is
then picked up from MCObjectStreamer::EmitDwarfLocDirective to be
written on the object file.
This fixes http://llvm.org/bugs/show_bug.cgi?id=18270.
llvm-svn: 202752
remove_if that its predicate is adaptable. We don't actually need this,
we can write a generic adapter for any predicate.
This lets us remove some very wrong std::function usages. We should
never be using std::function for predicates to algorithms. This incurs
an *indirect* call overhead for every evaluation of the predicate, and
makes it very hard to inline through.
llvm-svn: 202742
operand_values. The first provides a range view over operand Use
objects, and the second provides a range view over the Value*s being
used by those operands.
The naming is "STL-style" rather than "LLVM-style" because we have
historically named iterator methods STL-style, and range methods seem to
have far more in common with their iterator counterparts than with
"normal" APIs. Feel free to bikeshed on this one if you want, I'm happy
to change these around if people feel strongly.
I've switched code in SROA and LCG to exercise these mostly to ensure
they work correctly -- we don't really have an easy way to unittest this
and they're trivial.
llvm-svn: 202687
We should apply fastcc whenever profitable. We can expand this list,
but there are lots of conventions with performance implications that we
don't want to change.
Differential Revision: http://llvm-reviews.chandlerc.com/D2705
llvm-svn: 202293
the default.
Based on the patch by Matt Arsenault, D1764!
I switched one place to use the more direct pointer type to compute the
desired address space, and I reworked the memcpy rewriting section to
reflect significant refactorings that this patch helped inspire.
Thanks to several of the folks who helped review and improve the patch
as well.
llvm-svn: 202247
to work independently for the slice side and the other side.
This allows us to only compute the minimum of the two when we actually
rewrite to a memcpy that needs to take the minimum, and preserve higher
alignment for one side or the other when rewriting to loads and stores.
This fix was inspired by seeing the result of some refactoring that
makes addrspace handling better.
llvm-svn: 202242
D1764, which in turn set off the other refactorings to make
'getSliceAlign()' a sensible thing.
There are two possible inputs to the required alignment of a memory
transfer intrinsic: the alignment constraints of the source and the
destination. If we are *only* introducing a (potentially new) offset
onto one side of the transfer, we don't need to consider the alignment
constraints of the other side. Use this to simplify the logic feeding
into alignment computation for unsplit transfers.
Also, hoist the clamp of the magical zero alignment for these intrinsics
to the more customary one alignment early. This lets several other
conditions melt away.
No functionality changed. There is a further improvement this exposes
which *will* change functionality, but that's arriving in a separate
patch.
llvm-svn: 202232
rewriting logic: don't pass custom offsets for the adjusted pointer to
the new alloca.
We always passed NewBeginOffset here. Sometimes we spelled it
BeginOffset, but only when they were in fact equal. Whats worse, the API
is set up so that you can't reasonably call it with anything else -- it
assumes that you're passing it an offset relative to the *original*
alloca that happens to fall within the new one. That's the whole point
of NewBeginOffset, it's the clamped beginning offset.
No functionality changed.
llvm-svn: 202231
alignment of the slice being rewritten, not any arbitrary offset.
Every caller is really just trying to compute the alignment for the
whole slice, never for some arbitrary alignment. They are also just
passing a type when they have one to see if we can skip an explicit
alignment in the IR by using the type's alignment. This makes for a much
simpler interface.
Another refactoring inspired by the addrspace patch for SROA, although
only loosely related.
llvm-svn: 202230
consistency with memcpy rewriting, and fix a latent bug in the alignment
management for memset.
The alignment issue is that getAdjustedAllocaPtr is computing the
*relative* offset into the new alloca, but the alignment isn't being set
to the relative offset, it was using the the absolute offset which is
into the old alloca.
I don't think its possible to write a test case that actually reaches
this code where the resulting alignment would be observably different,
but the intent was clearly to use the relative offset within the new
alloca.
llvm-svn: 202229
rather than passing them as arguments.
While I generally prefer actual arguments, in this case the readability
loss is substantial. By using members we avoid repeatedly calculating
the offsets, and once we're using members it is useful to ensure that
those names *always* refer to the original-alloca-relative new offset
for a rewritten slice.
No functionality changed. Follow-up refactoring, all toward getting the
address space patch merged.
llvm-svn: 202228
slice being rewritten.
We had the same code scattered across most of the visits. Instead,
compute the new offsets and the slice size once when we start to visit
a particular slice, and use the member variables from then on. This
reduces quite a bit of code duplication.
No functionality changed. Refactoring inspired to make it easier to
apply the address space patch to SROA.
llvm-svn: 202227
checking in SROA.
The primary change is to just rely on uge for checking that the offset
is within the allocation size. This removes the explicit checks against
isNegative which were terribly error prone (including the reversed logic
that led to PR18615) and prevented us from supporting stack allocations
larger than half the address space.... Ok, so maybe the latter isn't
*common* but it's a silly restriction to have.
Also, we used to try to support a PHI node which loaded from before the
start of the allocation if any of the loaded bytes were within the
allocation. This doesn't make any sense, we have never really supported
loading or storing *before* the allocation starts. The simplified logic
just doesn't care.
We continue to allow loading past the end of the allocation in part to
support cases where there is a PHI and some loads are larger than others
and the larger ones reach past the end of the allocation. We could solve
this a different and more conservative way, but I'm still somewhat
paranoid about this.
llvm-svn: 202224
their inputs come from std::stable_sort and they are not total orders.
I'm not a huge fan of this, but the really bad std::stable_sort is right
at the beginning of Reassociate. After we commit to stable-sort based
consistent respect of source order, the downstream sorts shouldn't undo
that unless they have a total order or they are used in an
order-insensitive way. Neither appears to be true for these cases.
I don't have particularly good test cases, but this jumped out by
inspection when looking for output instability in this pass due to
changes in the ordering of std::sort.
llvm-svn: 202196
implemented this way a long time ago and due to the overwhelming bugs
that surfaced, moved to a much more relaxed variant. Richard Smith would
like to understand the magnitude of this problem and it seems fairly
harmless to keep some flag-controlled logic to get the extremely strict
behavior here. I'll remove it if it doesn't prove useful.
llvm-svn: 202193
just "load". This helps avoid pointless de-duping with order-sensitive
numbers as we already have unique names from the original load. It also
makes the resulting IR quite a bit easier to read.
llvm-svn: 202140
the pointer adjustment code. This is the primary code path that creates
totally new instructions in SROA and being able to lump them based on
the pointer value's name for which they were created causes
*significantly* fewer name collisions and general noise in the debug
output. This is particularly significant because it is making it much
harder to track down instability in the output of SROA, as name
de-duplication is a totally harmless form of instability that gets in
the way of seeing real problems.
The new fancy naming scheme tries to dig out the root "pre-SROA" name
for pointer values and associate that all the way through the pointer
formation instructions. Digging out the root is important to prevent the
multiple iterative rounds of SROA from just layering too much cruft on
top of cruft here. We already track the layers of SROAs iteration in the
alloca name prefix. We don't need to duplicate it here.
Should have no functionality change, and shouldn't have any really
measurable impact on NDEBUG builds, as most of the complex logic is
debug-only.
llvm-svn: 202139
using OldPtr more heavily. Lots of this code was written before the
rewriter had an OldPtr member setup ahead of time. There are already
asserts in place that should ensure this doesn't change any
functionality.
llvm-svn: 202135
the break statement, not just think it to yourself....
No idea how this worked at all, much less survived most bots, my
bootstrap, and some bot bootstraps!
The Polly one didn't survive, and this was filed as PR18959. I don't
have a reduced test case and honestly I'm not seeing the need. What we
probably need here are better asserts / debug-build behavior in
SmallPtrSet so that this madness doesn't make it so far.
llvm-svn: 202129
sorting it. This helps uncover latent reliance on the original ordering
which aren't guaranteed to be preserved by std::sort (but often are),
and which are based on the use-def chain orderings which also aren't
(technically) guaranteed.
Only available in C++11 debug builds, and behind a flag to prevent noise
at the moment, but this is generally useful so figured I'd put it in the
tree rather than keeping it out-of-tree.
llvm-svn: 202106
the destination operand or source operand of a memmove.
It so happens that it was impossible for SROA to try to rewrite
self-memmove where the operands are *identical*, because either such
a think is volatile (and we don't rewrite) or it is non-volatile, and we
don't even register it as a use of the alloca.
However, making the 'IsDest' test *rely* on this subtle fact is... Very
confusing for the reader. We should use the direct and readily available
test of the Use* which gives us concrete information about which operand
is being rewritten.
No functionality changed, I hope! ;]
llvm-svn: 202103
ordering.
The fundamental problem that we're hitting here is that the use-def
chain ordering is *itself* not a stable thing to be relying on in the
rewriting for SROA. Further, we use a non-stable sort over the slices to
arrange them based on the section of the alloca they're operating on.
With a debugging STL implementation (or different implementations in
stage2 and stage3) this can cause stage2 != stage3.
The specific aspect of this problem fixed in this commit deals with the
rewriting and load-speculation around PHIs and Selects. This, like many
other aspects of the use-rewriting in SROA, is really part of the
"strong SSA-formation" that is doen by SROA where it works very hard to
canonicalize loads and stores in *just* the right way to satisfy the
needs of mem2reg[1]. When we have a select (or a PHI) with 2 uses of the
same alloca, we test that loads downstream of the select are
speculatable around it twice. If only one of the operands to the select
needs to be rewritten, then if we get lucky we rewrite that one first
and the select is immediately speculatable. This can cause the order of
operand visitation, and thus the order of slices to be rewritten, to
change an alloca from promotable to non-promotable and vice versa.
The fix is to defer all of the speculation until *after* the rewrite
phase is done. Once we've rewritten everything, we can accurately test
for whether speculation will work (once, instead of twice!) and the
order ceases to matter.
This also happens to simplify the other subtlety of speculation -- we
need to *not* speculate anything unless the result of speculating will
make the alloca fully promotable by mem2reg. I had a previous attempt at
simplifying this, but it was still pretty horrible.
There is actually already a *really* nice test case for this in
basictest.ll, but on multiple STL implementations and inputs, we just
got "lucky". Fortunately, the test case is very small and we can
essentially build it in exactly the opposite way to get reasonable
coverage in both directions even from normal STL implementations.
llvm-svn: 202092
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)
llvm-svn: 202052
During the LTO phase LICM will move loop invariant global variables out of loops
(informed by GlobalModRef). This makes more loops countable presenting
opportunity for the loop vectorizer.
Adding the loop vectorizer improves some TSVC benchmarks and twolf/ref dataset
(5%) on x86-64.
radar://15970632
llvm-svn: 202051
CodeGenPrepare uses extensively TargetLowering which is part of libLLVMCodeGen.
This is a layer violation which would introduce eventually a dependence on
CodeGen in ScalarOpts.
Move CodeGenPrepare into libLLVMCodeGen to avoid that.
Follow-up of <rdar://problem/15519855>
llvm-svn: 201912
I am really sorry for the noise, but the current state where some parts of the
code use TD (from the old name: TargetData) and other parts use DL makes it
hard to write a patch that changes where those variables come from and how
they are passed along.
llvm-svn: 201827
r201608 made llvm corretly handle private globals with MachO. r201622 fixed
a bug in it and r201624 and r201625 were changes for using private linkage,
assuming that llvm would do the right thing.
They all got reverted because r201608 introduced a crash in LTO. This patch
includes a fix for that. The issue was that TargetLoweringObjectFile now has
to be initialized before we can mangle names of private globals. This is
trivially true during the normal codegen pipeline (the asm printer does it),
but LTO has to do it manually.
llvm-svn: 201700
On x86, shifting a vector by a scalar is significantly cheaper than shifting a
vector by another fully general vector. Unfortunately, because SelectionDAG
operates on just one basic block at a time, the shufflevector instruction that
reveals whether the right-hand side of a shift *is* really a scalar is often
not visible to CodeGen when it's needed.
This adds another handler to CodeGenPrepare, to sink any useful shufflevector
instructions down to the basic block where they're used, predicated on a target
hook (since on other architectures, doing so will often just introduce extra
real work).
rdar://problem/16063505
llvm-svn: 201655
As defined in LangRef, aliases do not have sections. However, LLVM's
GlobalAlias class inherits from GlobalValue, which means we can read and
set its section. We should probably ban that as a separate change,
since it doesn't make much sense for an alias to have a section that
differs from its aliasee.
Fixes PR18757, where the section was being lost on the global in code
from Clang like:
extern "C" {
__attribute__((used, section("CUSTOM"))) static int in_custom_section;
}
Reviewers: rafael.espindola
Differential Revision: http://llvm-reviews.chandlerc.com/D2758
llvm-svn: 201286
logical operations on the i1's driving them. This is a bad idea for every
target I can think of (confirmed with micro tests on all of: x86-64, ARM,
AArch64, Mips, and PowerPC) because it forces the i1 to be materialized into
a general purpose register, whereas consuming it directly into a select generally
allows it to exist only transiently in a predicate or flags register.
Chandler ran a set of performance tests with this change, and reported no
measurable change on x86-64.
llvm-svn: 201275
'OK_NonUniformConstValue' to identify operands which are constants but
not constant splats.
The cost model now allows returning 'OK_NonUniformConstValue'
for non splat operands that are instances of ConstantVector or
ConstantDataVector.
With this change, targets are now able to compute different costs
for instructions with non-uniform constant operands.
For example, On X86 the cost of a vector shift may vary depending on whether
the second operand is a uniform or non-uniform constant.
This patch applies the following changes:
- The cost model computation now takes into account non-uniform constants;
- The cost of vector shift instructions has been improved in
X86TargetTransformInfo analysis pass;
- BBVectorize, SLPVectorizer and LoopVectorize now know how to distinguish
between non-uniform and uniform constant operands.
Added a new test to verify that the output of opt
'-cost-model -analyze' is valid in the following configurations: SSE2,
SSE4.1, AVX, AVX2.
llvm-svn: 201272
Fixes PR18753 and PR18782.
This is necessary for LICM to preserve LCSSA correctly and efficiently.
There is still some active discussion about whether we should be using
LCSSA, but we can't just immediately stop using it and we *need* LICM to
preserve it while we are using it. We can restore the old SSAUpdater
driven code if and when there is a serious effort to remove the reliance
on LCSSA from all of the loop passes.
However, this also serves as a great example of why LCSSA is very nice
to have. This change significantly simplifies the process of sinking
instructions for LICM, and makes it quite a bit less expensive.
It wouldn't even be as complex as it is except that I had to start the
process of removing the big recursive LCSSA formation hammer in order to
switch even this much of the re-forming code to asserting that LCSSA was
preserved. I'll fully remove that next just to tidy things up until the
LCSSA debate settles one way or the other.
llvm-svn: 201148
The addressing mode matcher checks at some point the profitability of folding an
instruction into the addressing mode. When the instruction to be folded has
several uses, it checks that the instruction can be folded in each use.
To do so, it creates a new matcher for each use and check if the instruction is
in the list of the matched instructions of this new matcher.
The new matchers may promote some instructions and this has to be undone to keep
the state of the original matcher consistent.
A test case will follow.
<rdar://problem/16020230>
llvm-svn: 201121
The crux of the issue is that LCSSA doesn't preserve stateful alias
analyses. Before r200067, LICM didn't cause LCSSA to run in the LTO pass
manager, where LICM runs essentially without any of the other loop
passes. As a consequence the globalmodref-aa pass run before that loop
pass manager was able to survive the loop pass manager and be used by
DSE to eliminate stores in the function called from the loop body in
Adobe-C++/loop_unroll (and similar patterns in other benchmarks).
When LICM was taught to preserve LCSSA it had to require it as well.
This caused it to be run in the loop pass manager and because it did not
preserve AA, the stateful AA was lost. Most of LLVM's AA isn't stateful
and so this didn't manifest in most cases. Also, in most cases LCSSA was
already running, and so there was no interesting change.
The real kicker is that LCSSA by its definition (injecting PHI nodes
only) trivially preserves AA! All we need to do is mark it, and then
everything goes back to working as intended. It probably was blocking
some other weird cases of stateful AA but the only one I have is
a 1000-line IR test case from loop_unroll, so I don't really have a good
test case here.
Hopefully this fixes the regressions on performance that have been seen
since that revision.
llvm-svn: 201104
Before conditional store vectorization/unrolling we had only one
vectorized/unrolled basic block. After adding support for conditional store
vectorization this will not only be one block but multiple basic blocks. The
last block would have the back-edge. I updated the code to use a vector of basic
blocks instead of a single basic block and fixed the users to use the last entry
in this vector. But, I forgot to add the basic blocks to this vector!
Fixes PR18724.
llvm-svn: 201028
The bitcast instruction during constant materialization was not placed correcly
in the presence of phi nodes. This commit fixes the insertion point to be in the
idom instead.
This fixes PR18768
llvm-svn: 201009
This fix first traverses the whole use list of the constant expression and
keeps track of the instructions that need to be updated. Then perform the
fixup afterwards.
llvm-svn: 201008
mode.
Basically the idea is to transform code like this:
%idx = add nsw i32 %a, 1
%sextidx = sext i32 %idx to i64
%gep = gep i8* %myArray, i64 %sextidx
load i8* %gep
Into:
%sexta = sext i32 %a to i64
%idx = add nsw i64 %sexta, 1
%gep = gep i8* %myArray, i64 %idx
load i8* %gep
That way the computation can be folded into the addressing mode.
This transformation is done as part of the addressing mode matcher.
If the matching fails (not profitable, addressing mode not legal, etc.), the
matcher will revert the related promotions.
<rdar://problem/15519855>
llvm-svn: 200947
225 is the default value of inline-threshold. This change will make sure
we have the same inlining behavior as prior to r200886.
As Chandler points out, even though we don't have code in our testing
suite that uses cold attribute, there are larger applications that do
use cold attribute.
r200886 + this commit intend to keep the same behavior as prior to r200886.
We can later on tune the inlinecold-threshold.
The main purpose of r200886 is to help performance of instrumentation based
PGO before we actually hook up inliner with analysis passes such as BPI and BFI.
For instrumentation based PGO, we try to increase inlining of hot functions and
reduce inlining of cold functions by setting inlinecold-threshold.
Another option suggested by Chandler is to use a boolean flag that controls
if we should use OptSizeThreshold for cold functions. The default value
of the boolean flag should not change the current behavior. But it gives us
less freedom in controlling inlining of cold functions.
llvm-svn: 200898
Ideally only those transform passes that run at -O0 remain enabled,
in reality we get as close as we reasonably can.
Passes are responsible for disabling themselves, it's not the job of
the pass manager to do it for them.
llvm-svn: 200892
Added command line option inlinecold-threshold to set threshold for inlining
functions with cold attribute. Listen to the cold attribute when it would
decrease the inline threshold.
llvm-svn: 200886
No functional change. Updated loops from:
for (I = scc_begin(), E = scc_end(); I != E; ++I)
to:
for (I = scc_begin(); !I.isAtEnd(); ++I)
for teh win.
llvm-svn: 200789