llvm-project/llvm/test/Transforms/DeadStoreElimination
Hal Finkel a1271036c5 Allow DeadStoreElimination to track combinations of partial later wrties
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.

It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. In the map forms an interval covering the the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.

I discovered this problem when investigating a performance issue with code like
this on PowerPC:

  #include <complex>
  using namespace std;

  complex<float> bar(complex<float> C);
  complex<float> foo(complex<float> C) {
    return bar(C)*C;
  }

which produces this:

  define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
  entry:
    %ref.tmp = alloca i64, align 8
    %tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
    %c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
    %c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
    %0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
    %c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
    %1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
    call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
    %2 = bitcast %"struct.std::complex"* %agg.result to i64*
    %3 = load i64, i64* %ref.tmp, align 8
    store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
    %_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
    %4 = lshr i64 %3, 32
    %5 = trunc i64 %4 to i32
    %6 = bitcast i32 %5 to float
    %_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
    %7 = trunc i64 %3 to i32
    %8 = bitcast i32 %7 to float
    %mul_ad.i.i = fmul fast float %6, %1
    %mul_bc.i.i = fmul fast float %8, %0
    %mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
    %mul_ac.i.i = fmul fast float %6, %0
    %mul_bd.i.i = fmul fast float %8, %1
    %mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
    store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
    store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
    ret void
  }

the problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value in
the backend.

In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.

Differential Revision: http://reviews.llvm.org/D18586

llvm-svn: 273559
2016-06-23 13:46:39 +00:00
..
2011-03-25-DSEMiscompile.ll
2011-09-06-EndOfFunction.ll
2011-09-06-MemCpy.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
OverwriteStoreBegin.ll [DeadStoreElimination] Shorten beginning of memset overwritten by later stores 2016-04-22 19:51:29 +00:00
OverwriteStoreEnd.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
PartialStore.ll
atomic.ll MemoryDependenceAnalysis: Don't miscompile atomics 2015-03-21 06:19:17 +00:00
calloc-store.ll [DeadStoreElimination] Remove dead zero store to calloc initialized memory 2015-09-23 11:38:44 +00:00
combined-partial-overwrites.ll Allow DeadStoreElimination to track combinations of partial later wrties 2016-06-23 13:46:39 +00:00
const-pointers.ll
crash.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
cs-cs-aliasing.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
dominate.ll
dse_with_dbg_value.ll Fix for PR27940 2016-06-20 09:10:10 +00:00
fence.ll Allow value forwarding past release fences in GVN 2016-03-25 22:40:35 +00:00
free.ll
inst-limits.ll [PR27284] Reverse the ownership between DICompileUnit and DISubprogram. 2016-04-15 15:57:41 +00:00
libcalls.ll
lifetime.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
memintrinsics.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
no-targetdata.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
operand-bundles.ll Move previously added test case to the right location 2016-06-13 20:12:07 +00:00
pr11390.ll Revert "Change memcpy/memset/memmove to have dest and source alignments." 2015-11-19 05:56:52 +00:00
simple.ll [PM] Port DSE to the new pass manager 2016-05-17 21:38:13 +00:00