forked from OSchip/llvm-project
24ac830d7c
a pre-splitting pass over loads and stores. Historically, splitting could cause enough problems that I hamstrung the entire process with a requirement that splittable integer loads and stores must cover the entire alloca. All smaller loads and stores were unsplittable to prevent chaos from ensuing. With the new pre-splitting logic that does load/store pair splitting I introduced in r225061, we can now very nicely handle arbitrarily splittable loads and stores. In order to fully benefit from these smarts, we need to mark all of the integer loads and stores as splittable. However, we don't actually want to rewrite partitions with all integer loads and stores marked as splittable. This would fail to extract scalar integers from aggregates, which is kind of the point of SROA. =] To resolve this, what we really want is to do pre-splitting only on the alloca slices, with integer loads and stores fully splittable. This allows us to uncover all non-integer uses of the alloca that would benefit from a split in an integer load or store (and where introducing the split is safe because it is just a memory transfer from a load to a store). Once done, we make all the non-whole-alloca integer loads and stores unsplittable just as they have historically been, then repartition and rewrite. The result is that when there are integer loads and stores anywhere within an alloca (such as from a memcpy of a sub-object of a larger object), we can split them up if there are non-integer components to the aggregate hiding beneath. I've added challenging test cases to demonstrate how this is able to promote to scalars even in a case where we have *partially* overlapping loads and stores. This restores the single-store behavior for small arrays of i8s, which is really nice. I've restored both the little-endian and big-endian testing for these exactly as they were prior to r225061. It also forced me to be more aggressive in an alignment test to actually defeat SROA. =] Without the added volatiles there, we actually split up the weird i16 loads and produce nice double allocas with better alignment. This also uncovered a number of bugs where we failed to handle splittable load and store slices which didn't have a beginning offset of zero. Those fixes are included, and without them the existing test cases explode in glorious fireworks. =] I've kept support for leaving whole-alloca integer loads and stores as splittable even for the purpose of rewriting, but I think that's likely no longer needed. With the new pre-splitting, we might be able to remove all the splitting support for loads and stores from the rewriter. I'm not doing that in this patch, in order to isolate any performance regressions it causes in an easy-to-find-and-revert chunk. llvm-svn: 225074 |
---|---|---|
.. | ||
ADCE.cpp | ||
AlignmentFromAssumptions.cpp | ||
CMakeLists.txt | ||
ConstantHoisting.cpp | ||
ConstantProp.cpp | ||
CorrelatedValuePropagation.cpp | ||
DCE.cpp | ||
DeadStoreElimination.cpp | ||
EarlyCSE.cpp | ||
FlattenCFGPass.cpp | ||
GVN.cpp | ||
IndVarSimplify.cpp | ||
JumpThreading.cpp | ||
LICM.cpp | ||
LLVMBuild.txt | ||
LoadCombine.cpp | ||
LoopDeletion.cpp | ||
LoopIdiomRecognize.cpp | ||
LoopInstSimplify.cpp | ||
LoopRerollPass.cpp | ||
LoopRotation.cpp | ||
LoopStrengthReduce.cpp | ||
LoopUnrollPass.cpp | ||
LoopUnswitch.cpp | ||
LowerAtomic.cpp | ||
Makefile | ||
MemCpyOptimizer.cpp | ||
MergedLoadStoreMotion.cpp | ||
PartiallyInlineLibCalls.cpp | ||
Reassociate.cpp | ||
Reg2Mem.cpp | ||
SCCP.cpp | ||
SROA.cpp | ||
SampleProfile.cpp | ||
Scalar.cpp | ||
ScalarReplAggregates.cpp | ||
Scalarizer.cpp | ||
SeparateConstOffsetFromGEP.cpp | ||
SimplifyCFGPass.cpp | ||
Sink.cpp | ||
StructurizeCFG.cpp | ||
TailRecursionElimination.cpp |
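
The two-phase scheme described in the commit message above (pre-split with every integer load and store marked splittable, then restore the historical rule before repartitioning and rewriting) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Slice`, `markIntegerSlicesSplittable`, `splitPointsFor`, and `restoreHistoricalSplittability` are hypothetical names invented for illustration and are not the types or functions used in SROA.cpp.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified model of one use of an alloca (not LLVM's class).
struct Slice {
  uint64_t Begin;            // first byte offset covered by the use
  uint64_t End;              // one past the last byte covered
  bool IsIntegerLoadOrStore; // true for plain integer loads and stores
  bool Splittable;           // whether the use may be cut into pieces
};

// Phase 1 setup: treat every integer load/store as splittable so that
// pre-splitting can see through it.
void markIntegerSlicesSplittable(std::vector<Slice> &Slices) {
  for (Slice &S : Slices)
    if (S.IsIntegerLoadOrStore)
      S.Splittable = true;
}

// Offsets strictly inside S at which some unsplittable slice begins or ends.
// In this model, these are the places where a split would expose the
// non-integer uses hiding beneath an integer load or store.
std::vector<uint64_t> splitPointsFor(const Slice &S,
                                     const std::vector<Slice> &Slices) {
  std::set<uint64_t> Points;
  if (!S.Splittable)
    return {};
  auto AddIfInside = [&](uint64_t Off) {
    if (Off > S.Begin && Off < S.End)
      Points.insert(Off);
  };
  for (const Slice &Other : Slices) {
    if (Other.Splittable)
      continue;
    AddIfInside(Other.Begin);
    AddIfInside(Other.End);
  }
  return std::vector<uint64_t>(Points.begin(), Points.end());
}

// Phase 2 setup: go back to the historical rule, where an integer load/store
// is splittable only if it covers the whole alloca.
void restoreHistoricalSplittability(std::vector<Slice> &Slices,
                                    uint64_t AllocaSize) {
  for (Slice &S : Slices)
    if (S.IsIntegerLoadOrStore)
      S.Splittable = (S.Begin == 0 && S.End == AllocaSize);
}
```

In this model, an 8-byte integer load covering bytes 4 to 12 of a 16-byte alloca whose other uses are 4-byte non-integer accesses gets a split point at offset 8 during the first phase, and is marked unsplittable again in the second phase because it does not cover the whole alloca.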