The new analysis is not yet ready for prime time. It has a *critical*
flawed assumption, and some troubling shortages of testing. Until it's
been hammered into better shape, let's stick with the working code. This
should be easy to revert itself when the analysis is ready.
Fixes PR14241, a miscompile of any memcpy-able loop which uses a pointer
as the induction mechanism. If you have been seeing miscompiles in this
revision range, you really want to test with this backed out. The
results of this miscompile are a bit subtle as they can lead to
downstream passes concluding things are impossible which are in fact
possible.
Thanks to David Blaikie for the majority of the reduction of this
miscompile. I'll be checking in the test case in a non-revert commit.
Revesions reverted here:
r167045: LoopIdiom: Fix a serious missed optimization: we only turned
top-level loops into memmove.
r166877: LoopIdiom: Add checks to avoid turning memmove into an infinite
loop.
r166875: LoopIdiom: Recognize memmove loops.
r166874: LoopIdiom: Replace custom dependence analysis with
DependenceAnalysis.
llvm-svn: 167286
r165941: Resubmit the changes to llvm core to update the functions to
support different pointer sizes on a per address space basis.
Despite this commit log, this change primarily changed stuff outside of
VMCore, and those changes do not carry any tests for correctness (or
even plausibility), and we have consistently found questionable or flat
out incorrect cases in these changes. Most of them are probably correct,
but we need to devise a system that makes it more clear when we have
handled the address space concerns correctly, and ideally each pass that
gets updated would receive an accompanying test case that exercises that
pass specificaly w.r.t. alternate address spaces.
However, from this commit, I have retained the new C API entry points.
Those were an orthogonal change that probably should have been split
apart, but they seem entirely good.
In several places the changes were very obvious cleanups with no actual
multiple address space code added; these I have not reverted when
I spotted them.
In a few other places there were merge conflicts due to a cleaner
solution being implemented later, often not using address spaces at all.
In those cases, I've preserved the new code which isn't address space
dependent.
This is part of my ongoing effort to clean out the partial address space
code which carries high risk and low test coverage, and not likely to be
finished before the 3.2 release looms closer. Duncan and I would both
like to see the above issues addressed before we return to these
changes.
llvm-svn: 167222
getIntPtrType support for multiple address spaces via a pointer type,
and also introduced a crasher bug in the constant folder reported in
PR14233.
These commits also contained several problems that should really be
addressed before they are re-committed. I have avoided reverting various
cleanups to the DataLayout APIs that are reasonable to have moving
forward in order to reduce the amount of churn, and minimize the number
of commits that were reverted. I've also manually updated merge
conflicts and manually arranged for the getIntPtrType function to stay
in DataLayout and to be defined in a plausible way after this revert.
Thanks to Duncan for working through this exact strategy with me, and
Nick Lewycky for tracking down the really annoying crasher this
triggered. (Test case to follow in its own commit.)
After discussing with Duncan extensively, and based on a note from
Micah, I'm going to continue to back out some more of the more
problematic patches in this series in order to ensure we go into the
LLVM 3.2 branch with a reasonable story here. I'll send a note to
llvmdev explaining what's going on and why.
Summary of reverted revisions:
r166634: Fix a compiler warning with an unused variable.
r166607: Add some cleanup to the DataLayout changes requested by
Chandler.
r166596: Revert "Back out r166591, not sure why this made it through
since I cancelled the command. Bleh, sorry about this!
r166591: Delete a directory that wasn't supposed to be checked in yet.
r166578: Add in support for getIntPtrType to get the pointer type based
on the address space.
llvm-svn: 167221
This patch migrates the stpcpy optimizations from the simplify-libcalls
pass into the instcombine library call simplifier. Note that the
__stpcpy_chk simplifications were migrated in a previous commit.
llvm-svn: 167083
integers in that the code to handle split alloca-wide integer loads or
stores doesn't come first. It should, for the same reasons as with
integers, and the PR attests to that. Also had to fix a busted assert in
that this test case also covers.
llvm-svn: 167051
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.
This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.
llvm-svn: 167011
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.
Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.
llvm-svn: 166958
wrapper returns a vector of integers when passed a vector of pointers) by having
getIntPtrType itself return a vector of integers in this case. Outside of this
wrapper, I didn't find anywhere in the codebase that was relying on the old
behaviour for vectors of pointers, so give this a whirl through the buildbots.
llvm-svn: 166939
This turns loops like
for (unsigned i = 0; i != n; ++i)
p[i] = p[i+1];
into memmove, which has a highly optimized implementation in most libcs.
This was really easy with the new DependenceAnalysis :)
llvm-svn: 166875
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481.
Compile time performance seems to be slightly worse, but this is mostly due
to an extra LCSSA run scheduled by the PassManager and should be fixed there.
llvm-svn: 166874
smaller integer loads and stores.
The high-level motivation is that the frontend sometimes generates
a single whole-alloca integer load or store during ABI lowering of
splittable allocas. We need to be able to break this apart in order to
see the underlying elements and properly promote them to SSA values. The
hope is that this fixes some performance regressions on x86-32 with the
new SROA pass.
Unfortunately, this causes quite a bit of churn in the test cases, and
bloats some IR that comes out. When we see an alloca that consists soley
of bits and bytes being extracted and re-inserted, we now do some
splitting first, before building widened integer "bucket of bits"
representations. These are always well folded by instcombine however, so
this shouldn't actually result in missed opportunities.
If this splitting of all-integer allocas does cause problems (perhaps
due to smaller SSA values going into the RA), we could potentially go to
some extreme measures to only do this integer splitting trick when there
are non-integer component accesses of an alloca, but discovering this is
quite expensive: it adds yet another complete walk of the recursive use
tree of the alloca.
Either way, I will be watching build bots and LNT bots to see what
fallout there is here. If anyone gets x86-32 numbers before & after this
change, I would be very interested.
llvm-svn: 166662
very small but very important bugfix:
bool shouldExplore(Use *U) {
Value *V = U->get();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
[...]
should have read:
bool shouldExplore(Use *U) {
Value *V = U->getUser();
if (isa<CallInst>(V) || isa<InvokeInst>(V))
Fixes PR14143!
llvm-svn: 166407
It passes all tests, produces better results than the old code but uses the
wrong pass, LoopDependenceAnalysis, which is old and unmaintained. "Why is it
still in tree?", you might ask. The answer is obviously: "To confuse developers."
Just swapping in the new dependency pass sends the pass manager into an infinte
loop, I'll try to figure out why tomorrow.
llvm-svn: 166399
Requires a lot less code and complexity on loop-idiom's side and the more
precise analysis can catch more cases, like the one I included as a test case.
This also fixes the edge-case miscompilation from PR9481. I'm not entirely
sure that all cases are handled that the old checks handled but LDA will
certainly become smarter in the future.
llvm-svn: 166390
This patch migrates the strcpy optimizations from the simplify-libcalls pass
into the instcombine library call simplifier. Note also that StrCpyChkOpt
has been updated with a few simplifications that were being done in the
simplify-libcalls version of StrCpyOpt, but not in the migrated implementation
of StrCpyOpt. There is no reason to overload StrCpyOpt with fortified and
regular simplifications in the new model since there is already a dedicated
simplifier for __strcpy_chk.
llvm-svn: 166198
operate purely on values. Sink the alloca loading and storing logic into
the rewrite routines that are specific to alloca-integer-rewrite
driving. This is just a refactoring here, but the subsequent step will
be to reuse the insertion and extraction logic when rewriting integer
loads and stores that have been split and decomposed into narrower loads
and stores.
No functionality changed other than different names for instructions.
llvm-svn: 166176
The TargetTransform changes are breaking LTO bootstraps of clang. I am
working with Nadav to figure out the problem, but I am reverting it for now
to get our buildbots working.
This reverts svn commits: 165665 165669 165670 165786 165787 165997
and I have also reverted clang svn 165741
llvm-svn: 166168
a pointer. A very bad idea. Let's not do that. Fixes PR14105.
Note that this wasn't *that* glaring of an oversight. Originally, these
routines were only called on offsets within an alloca, which are
intrinsically positive. But over the evolution of the pass, they ended
up being called for arbitrary offsets, and things went downhill...
llvm-svn: 166095
revision makes no sense. We cannot use the address space of the *post
indexed* type to conclude anything about a *pre indexed* pointer type's
size. More importantly, this index can never be over a pointer. We are
indexing over arrays and vectors here.
Of course, I have no test case here. Neither did the original patch. =/
llvm-svn: 166091
includes extracting ints for copying elsewhere and inserting ints when
copying into the alloca. This should fix the CanSROA assertion coming
out of Clang's regression test suite.
llvm-svn: 165931
cases where we have partial integer loads and stores to an otherwise
promotable alloca to widen[1] those loads and stores to cover the entire
alloca and bitcast them into the appropriate type such that promotion
can proceed.
These partial loads and stores stem from an annoying confluence of ARM's
calling convention and ABI lowering and the FCA pre-splitting which
takes place in SROA. Clang lowers a { double, double } in-register
function argument as a [4 x i32] function argument to ensure it is
placed into integer 32-bit registers (a really unnerving implicit
contract between Clang and the ARM backend I would add). This results in
a FCA load of [4 x i32]* from the { double, double } alloca, and SROA
decomposes this into a sequence of i32 loads and stores. Inlining
proceeds, code gets folded, but at the end of the day, we still have i32
stores to the low and high halves of a double alloca. Widening these to
be i64 operations, and bitcasting them to double prior to loading or
storing allows promotion to proceed for these allocas.
I looked quite a bit changing the IR which Clang produces for this case
to be more friendly, but small changes seem unlikely to help. I think
the best representation we could use currently would be to pass 4 i32
arguments thereby avoiding any FCAs, but that would still require this
fix. It seems like it might eventually be nice to somehow encode the ABI
register selection choices outside of the parameter type system so that
the parameter can be a { double, double }, but the CC register
annotations indicate that this should be passed via 4 integer registers.
This patch does not address the second problem in PR14059, which is the
reverse: when a struct alloca is loaded as a *larger* single integer.
This patch also does not address some of the code quality issues with
the FCA-splitting. Those don't actually impede any optimizations really,
but they're on my list to clean up.
[1]: Pedantic footnote: for those concerned about memory model issues
here, this is safe. For the alloca to be promotable, it cannot escape or
have any use of its address that could allow these loads or stores to be
racing. Thus, widening is always safe.
llvm-svn: 165928
Convert the internal representation of the Attributes class into a pointer to an
opaque object that's uniqued by and stored in the LLVMContext object. The
Attributes class then becomes a thin wrapper around this opaque
object. Eventually, the internal representation will be expanded to include
attributes that represent code generation options, etc.
llvm-svn: 165917
This patch migrates the strcmp and strncmp optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165915
Erasing from the beginning or middle of the vector is expensive, remove_if can
do it in linear time even though it's a bit ugly without lambdas.
No functionality change.
llvm-svn: 165903
This patch migrates the strchr and strrchr optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165875
This patch migrates the strcat and strncat optimizations from the
simplify-libcalls pass into the instcombine library call simplifier.
llvm-svn: 165874
type coercion code, especially when targetting ARM. Things like [1
x i32] instead of i32 are very common there.
The goal of this logic is to ensure that when we are picking an alloca
type, we look through such wrapper aggregates and across any zero-length
aggregate elements to find the simplest type possible to form a type
partition.
This logic should (generally speaking) rarely fire. It only ends up
kicking in when an alloca is accessed using two different types (for
instance, i32 and float), and the underlying alloca type has wrapper
aggregates around it. I noticed a significant amount of this occurring
looking at stepanov_abstraction generated code for arm, and suspect it
happens elsewhere as well.
Note that this doesn't yet address truly heinous IR productions such as
PR14059 is concerning. Those result in mismatched *sizes* of types in
addition to mismatched access and alloca types.
llvm-svn: 165870
help the dragonegg builders, and no test case at this point, but this
was one dimly plausible case I spotted by inspection. Hopefully will get
a testcase from those bots soon-ish, and will tidy this up with proper
testing.
llvm-svn: 165869
are single value types, the load and store should be directly based upon
the alloca and then bitcasting can fix the type as needed afterward.
This might in theory improve some of the IR coming out of SROA, but
I don't expect big changes yet and don't have any test cases on hand.
This is really just a cleanup/refactoring patch. The next patch will
cause this code path to be hit a lot more, actually get SROA to promote
more allocas and include several more test cases.
llvm-svn: 165864
We use the enums to query whether an Attributes object has that attribute. The
opaque layer is responsible for knowing where that specific attribute is stored.
llvm-svn: 165488
This class is used by LSR and a number of places in the codegen.
This is the first step in de-coupling LSR from TLI, and creating
a new interface in between them.
llvm-svn: 165455
have an alloca or a parameter, since then the alloca test should make sense
to readers, while before it probably appears too specific. No functionality
change.
llvm-svn: 165306
are in fact identity operations. We detect these and kill their
partitions so that even splitting is unaffected by them. This is
particularly important because Clang relies on emitting identity memcpy
operations for struct copies, and these fold away to constants very
often after inlining.
Fixes the last big performance FIXME I have on my plate.
llvm-svn: 165285
the rewrite visitor to make the fact that the speculation is completely
independent a bit more clear.
I promise that this is just a cut/paste of the one visitor and adding
the annonymous namespace wrappings. The diff may look completely
preposterous, it does in git for some reason.
llvm-svn: 165284
cpyDest can be mutated in some cases, which would then cause a crash later if
indeed the memory was underaligned. This brought down several buildbots, so
I guess the underaligned case is much more common than I thought!
llvm-svn: 165228
Currently, we re-visit allocas when something changes about the way they
might be *split* to allow better scalarization to take place. However,
we weren't handling the case when the *promotion* is what would change
the behavior of SROA. When an address derived from an alloca is stored
into another alloca, we consider the first to have escaped. If the
second is ever promoted to an SSA value, we will suddenly be able to run
the SROA pass on the first alloca.
This patch adds explicit support for this form if iteration. When we
detect a store of a pointer derived from an alloca, we flag the
underlying alloca for reprocessing after promotion. The logic works hard
to only do this when there is definitely going to be promotion and it
might remove impediments to the analysis of the alloca.
Thanks to Nick for the great test case and Benjamin for some sanity
check review.
llvm-svn: 165223
was less aligned than the old. In the testcase this results in an overaligned
memset: the memset alignment was correct for the original memory but is too much
for the new memory. Fix this by either increasing the alignment of the new
memory or bailing out if that isn't possible. Should fix the gcc-4.7 self-host
buildbot failure.
llvm-svn: 165220
Sorry for this being broken so long. =/
As part of this, switch all of the existing tests to be Little Endian,
which is the behavior I was asserting in them anyways! Add in a new
big-endian test that checks the interesting behavior there.
Another part of this is to tighten the rules abotu when we perform the
full-integer promotion. This logic now rejects cases where there fully
promoted integer is a non-multiple-of-8 bitwidth or cases where the
loads or stores touch bits which are in the allocated space of the
alloca but are not loaded or stored when accessing the integer. Sadly,
these aren't really observable today as the rest of the pass will
already ensure the invariants hold. However, the latter situation is
likely to become a potential concern in the future.
Thanks to Benjamin and Duncan for early review of this patch. I'm still
looking into whether there are further endianness issues, please let me
know if anyone sees BE failures persisting past this.
llvm-svn: 165219
a memcpy to reflect that '0' has a different meaning when applied to
a load or store. Now we correctly use underaligned loads and stores for
the test case added.
llvm-svn: 165101
necessary during rewriting. As part of this, fix a real think-o here
where we might have left off an alignment specification when the address
is in fact underaligned. I haven't come up with any way to trigger this,
as there is always some other factor that reduces the alignment, but it
certainly might have been an observable bug in some way I can't think
of. This also slightly changes the strategy for placing explicit
alignments on loads and stores to only do so when the alignment does not
match that required by the ABI. This causes a few redundant alignments
to go away from test cases.
I've also added a couple of tests that really push on the alignment that
we end up with on loads and stores. More to come here as I try to fix an
underlying bug I have conjectured and produced test cases for, although
it's not clear if this bug is the one currently hitting dragonegg's
gcc47 bootstrap.
llvm-svn: 165100
preserves the values of the relocated entries, unlikely remove_if. This
allows walking them and erasing them.
Also flesh out the predicate we are using for this to support the
various constraints actually imposed on a UnaryPredicate -- without this
we can't compose it with std::not1.
Thanks to Sean Silva for the review here and noticing the issue with
std::remove_if.
llvm-svn: 165073
scheduled for processing on the worklist eventually gets deleted while
we are processing another alloca, fixing the original test case in
PR13990.
To facilitate this, add a remove_if helper to the SetVector abstraction.
It's not easy to use the standard abstractions for this because of the
specifics of SetVectors types and implementation.
Finally, a nice small test case is included. Thanks to Benjamin for the
fantastic reduced test case here! All I had to do was delete some empty
basic blocks!
llvm-svn: 165065
We require that the indices into the use lists are stable in order to
build fast lookup tables to locate a particular partition use from an
operand of a PHI or select. This is (obviously in hind sight)
incompatible with erasing elements from the array. Really, we don't want
to erase anyways. It is expensive, and a rare operation. Instead, simply
weaken the contract of the PartitionUse structure to allow null Use
pointers to represent dead uses. Now we can clear out the pointer to
mark things as dead, and all it requires is adding some 'continue'
checks to the various loops.
I'm still reducing a test case for this, as the test case I have is
huge. I think this one I can get a nice test case for though, as it was
much more deterministic.
llvm-svn: 165032
being separate was that it can grow the use list. As a consequence, we
can't use the iterator-pair interface, we need an index based interface.
Expose such an interface from the AllocaPartitioning, and use it in the
speculator.
This should at least fix a use-after-free bug found by Duncan, and may
fix some of the other crashers.
I don't have a nice deterministic test case yet, but if I get a good
one, I'll add it.
llvm-svn: 165027
alignment requirements of the new alloca. As one consequence which was
reported as a bug by Duncan, we overaligned memcpy calls to ranges of
allocas after they were rewritten to types with lower alignment
requirements. Other consquences are possible, but I don't have any test
cases for them.
llvm-svn: 164937