2007-08-03 02:11:11 +08:00
//===- DeadStoreElimination.cpp - Fast Dead Store Elimination -------------===//
2007-07-11 08:46:18 +08:00
//
2019-01-19 16:50:56 +08:00
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
2007-07-11 08:46:18 +08:00
//
//===----------------------------------------------------------------------===//
//
2015-12-12 02:39:41 +08:00
// This file implements a trivial dead store elimination that only considers
// basic-block local redundant stores.
//
// FIXME: This should eventually be extended to be a post-dominator tree
// traversal. Doing so would be pretty trivial.
2007-07-11 08:46:18 +08:00
//
//===----------------------------------------------------------------------===//

2016-05-18 05:38:13 +08:00
#include "llvm/Transforms/Scalar/DeadStoreElimination.h"
2017-10-14 05:17:07 +08:00
#include "llvm/ADT/APInt.h"
Allow DeadStoreElimination to track combinations of partial later writes
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but cannot remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. If the map forms an interval covering the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
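As a rough illustration of the end-offset-to-start-offset encoding described above, the following standalone C++ sketch merges each later write into a `std::map` and then checks whether one merged interval spans the earlier store. The names `IntervalMap`, `addInterval`, and `isCovered` are hypothetical and are not taken from the pass; the use of half-open byte ranges is also an assumption made here for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch type: key = end offset (exclusive), value = start offset.
using IntervalMap = std::map<std::int64_t, std::int64_t>;

// Record that bytes [Start, End) were overwritten by a later store,
// merging with any recorded interval that overlaps or abuts the new one.
void addInterval(IntervalMap &IM, std::int64_t Start, std::int64_t End) {
  assert(Start < End && "expected a non-empty interval");
  // Keying on the end offset lets lower_bound find the first recorded
  // interval that ends at or after Start; earlier ones cannot touch us.
  auto It = IM.lower_bound(Start);
  while (It != IM.end() && It->second <= End) {
    Start = std::min(Start, It->second);
    End = std::max(End, It->first);
    It = IM.erase(It);
  }
  IM[End] = Start;
}

// True if a single recorded interval covers all of [Start, End).
bool isCovered(const IntervalMap &IM, std::int64_t Start, std::int64_t End) {
  // Recorded intervals are disjoint, so only the first one that ends at or
  // after End can possibly begin at or before Start.
  auto It = IM.lower_bound(End);
  return It != IM.end() && It->second <= Start;
}
```

With an 8-byte earlier store at offsets [0, 8), calling `addInterval(IM, 0, 4)` and then `addInterval(IM, 4, 8)` leaves a single entry mapping 8 to 0, so `isCovered(IM, 0, 8)` reports that the earlier store is fully overwritten.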
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
The problem here is not just that the i64 store is unnecessary, but also that
it blocks further optimization of the other uses of that i64 value in the
backend.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Differential Revision: http://reviews.llvm.org/D18586
llvm-svn: 273559
2016-06-23 21:46:39 +08:00
#include "llvm/ADT/DenseMap.h"
2012-12-04 00:50:05 +08:00
#include "llvm/ADT/SetVector.h"
2017-10-14 05:17:07 +08:00
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
2012-12-04 00:50:05 +08:00
#include "llvm/ADT/Statistic.h"
2017-10-14 05:17:07 +08:00
#include "llvm/ADT/StringRef.h"
2007-07-12 07:19:17 +08:00
#include "llvm/Analysis/AliasAnalysis.h"
2011-10-23 05:59:35 +08:00
#include "llvm/Analysis/CaptureTracking.h"
[PM/AA] Rebuild LLVM's alias analysis infrastructure in a way compatible
with the new pass manager, and no longer relying on analysis groups.
This builds essentially a ground-up new AA infrastructure stack for
LLVM. The core ideas are the same that are used throughout the new pass
manager: type erased polymorphism and direct composition. The design is
as follows:
- FunctionAAResults is a type-erasing alias analysis results aggregation
interface to walk a single query across a range of results from
different alias analyses. Currently this is function-specific as we
always assume that aliasing queries are *within* a function.
- AAResultBase is a CRTP utility providing stub implementations of
various parts of the alias analysis result concept, notably in several
cases in terms of other more general parts of the interface. This can
be used to implement only a narrow part of the interface rather than
the entire interface. This isn't really ideal; this logic should be
hoisted into FunctionAAResults as currently it will cause
a significant amount of redundant work, but it faithfully models the
behavior of the prior infrastructure.
- All the alias analysis passes are ported to be wrapper passes for the
legacy PM and new-style analysis passes for the new PM with a shared
result object. In some cases (most notably CFL), this is an extremely
naive approach that we should revisit when we can specialize for the
new pass manager.
- BasicAA has been restructured to reflect that it is much more
fundamentally a function analysis because it uses dominator trees and
loop info that need to be constructed for each function.
All of the references to getting alias analysis results have been
updated to use the new aggregation interface. All the preservation and
other pass management code has been updated accordingly.
The way the FunctionAAResultsWrapperPass works is to detect the
available alias analyses when run, and add them to the results object.
This means that we should be able to continue to respect when various
passes are added to the pipeline, for example adding CFL or adding TBAA
passes should just cause their results to be available and to get folded
into this. The exception to this rule is BasicAA which really needs to
be a function pass due to using dominator trees and loop info. As
a consequence, the FunctionAAResultsWrapperPass directly depends on
BasicAA and always includes it in the aggregation.
This has significant implications for preserving analyses. Generally,
most passes shouldn't bother preserving FunctionAAResultsWrapperPass
because rebuilding the results just updates the set of known AA passes.
The exceptions to this rule are LoopPass instances, which need to preserve
all the function analyses that the loop pass manager will end up
needing. This means preserving both BasicAAWrapperPass and the
aggregating FunctionAAResultsWrapperPass.
Now, when preserving an alias analysis, you do so by directly preserving
that analysis. This is only necessary for non-immutable-pass-provided
alias analyses though, and there are only three of interest: BasicAA,
GlobalsAA (formerly GlobalsModRef), and SCEVAA. Usually BasicAA is
preserved when needed because it (like DominatorTree and LoopInfo) is
marked as a CFG-only pass. I've expanded GlobalsAA into the preserved
set everywhere we previously were preserving all of AliasAnalysis, and
I've added SCEVAA in the intersection of that with where we preserve
SCEV itself.
One significant challenge to all of this is that the CGSCC passes were
actually using the alias analysis implementations by taking advantage of
a pretty amazing set of loopholes in the old pass manager's analysis
management code which allowed analysis groups to slide through in many
cases. Moving away from analysis groups makes this problem much more
obvious. To fix it, I've leveraged the flexibility the design of the new
PM components provides to just directly construct the relevant alias
analyses for the relevant functions in the IPO passes that need them.
This is a bit hacky, but should go away with the new pass manager, and
is already in many ways cleaner than the prior state.
Another significant challenge is that various facilities of the old
alias analysis infrastructure just don't fit any more. The most
significant of these is the alias analysis 'counter' pass. That pass
relied on the ability to snoop on AA queries at different points in the
analysis group chain. Instead, I'm planning to build printing
functionality directly into the aggregation layer. I've not included
that in this patch merely to keep it smaller.
Note that all of this needs a nearly complete rewrite of the AA
documentation. I'm planning to do that, but I'd like to make sure the
new design settles, and to flesh out a bit more of what it looks like in
the new pass manager first.
Differential Revision: http://reviews.llvm.org/D12080
llvm-svn: 247167
2015-09-10 01:55:00 +08:00
#include "llvm/Analysis/GlobalsModRef.h"
2009-10-28 04:05:49 +08:00
#include "llvm/Analysis/MemoryBuiltins.h"
2007-07-11 08:46:18 +08:00
#include "llvm/Analysis/MemoryDependenceAnalysis.h"
2017-10-14 05:17:07 +08:00
#include "llvm/Analysis/MemoryLocation.h"
2019-03-29 22:10:24 +08:00
#include "llvm/Analysis/OrderedBasicBlock.h"
2015-03-24 03:32:43 +08:00
#include "llvm/Analysis/TargetLibraryInfo.h"
2010-12-01 07:05:20 +08:00
#include "llvm/Analysis/ValueTracking.h"
2017-10-14 05:17:07 +08:00
#include "llvm/IR/Argument.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constant.h"
2013-01-02 19:36:10 +08:00
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
2014-01-13 17:26:24 +08:00
#include "llvm/IR/Dominators.h"
2013-01-02 19:36:10 +08:00
#include "llvm/IR/Function.h"
2017-10-14 05:17:07 +08:00
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
2013-01-02 19:36:10 +08:00
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
2017-10-14 05:17:07 +08:00
#include "llvm/IR/Intrinsics.h"
2017-09-26 21:54:28 +08:00
#include "llvm/IR/LLVMContext.h"
2017-10-14 05:17:07 +08:00
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include "llvm/IR/Value.h"
2012-12-04 00:50:05 +08:00
#include "llvm/Pass.h"
2017-10-14 05:17:07 +08:00
#include "llvm/Support/Casting.h"
2016-06-23 21:46:39 +08:00
#include "llvm/Support/CommandLine.h"
2012-12-04 00:50:05 +08:00
#include "llvm/Support/Debug.h"
2017-10-14 05:17:07 +08:00
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"
2015-03-24 03:32:43 +08:00
#include "llvm/Support/raw_ostream.h"
2016-05-18 05:38:13 +08:00
#include "llvm/Transforms/Scalar.h"
2019-03-29 22:10:24 +08:00
#include "llvm/Transforms/Utils/Local.h"
2017-10-14 05:17:07 +08:00
#include <algorithm>
#include <cassert>
#include <cstddef>
2018-03-22 06:34:23 +08:00
#include <cstdint>
2017-10-14 05:17:07 +08:00
#include <iterator>
2016-06-23 21:46:39 +08:00
#include <map>
2017-10-14 05:17:07 +08:00
#include <utility>

2007-07-11 08:46:18 +08:00
using namespace llvm;

2014-04-22 10:55:47 +08:00
#define DEBUG_TYPE "dse"

2015-08-13 23:36:11 +08:00
STATISTIC(NumRedundantStores, "Number of redundant stores deleted");
2007-07-11 08:46:18 +08:00
STATISTIC(NumFastStores, "Number of stores deleted");
2018-08-18 02:40:41 +08:00
STATISTIC(NumFastOther, "Number of other instrs removed");
2016-06-23 21:46:39 +08:00
STATISTIC(NumCompletePartials, "Number of stores dead by later partials");
2017-09-26 21:54:28 +08:00
STATISTIC(NumModifiedStores, "Number of stores modified");
2016-06-23 21:46:39 +08:00

static cl::opt<bool>
EnablePartialOverwriteTracking("enable-dse-partial-overwrite-tracking",
  cl::init(true), cl::Hidden,
  cl::desc("Enable partial-overwrite tracking in DSE"));
2007-07-11 08:46:18 +08:00

2017-09-26 21:54:28 +08:00
static cl::opt<bool>
EnablePartialStoreMerging("enable-dse-partial-store-merging",
  cl::init(true), cl::Hidden,
  cl::desc("Enable partial store merging in DSE"));

2010-12-01 05:58:14 +08:00
//===----------------------------------------------------------------------===//
// Helper functions
//===----------------------------------------------------------------------===//
2017-10-14 05:17:07 +08:00
using OverlapIntervalsTy = std::map<int64_t, int64_t>;
using InstOverlapIntervalsTy = DenseMap<Instruction *, OverlapIntervalsTy>;
2010-12-01 05:58:14 +08:00

2016-06-11 01:58:01 +08:00
/// Delete this instruction. Before we do, go through and zero out all the
2016-05-18 05:38:13 +08:00
/// operands of this instruction. If any of them become dead, delete them and
/// the computation tree that feeds them.
2015-08-19 10:15:13 +08:00
/// If ValueSet is non-null, remove any deleted instructions from it as well.
2016-05-18 05:38:13 +08:00
static void
2016-07-07 03:48:52 +08:00
deleteDeadInstruction(Instruction *I, BasicBlock::iterator *BBI,
                      MemoryDependenceResults &MD, const TargetLibraryInfo &TLI,
2019-03-29 22:10:24 +08:00
                      InstOverlapIntervalsTy &IOL, OrderedBasicBlock &OBB,
Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
2019-04-24 14:55:50 +08:00
                      SmallSetVector<const Value *, 16> *ValueSet = nullptr) {
2015-08-19 10:15:13 +08:00
  SmallVector<Instruction*, 32> NowDeadInsts;

  NowDeadInsts.push_back(I);
  --NumFastOther;

2016-07-07 03:48:52 +08:00
  // Keeping the iterator straight is a pain, so we let this routine tell the
  // caller what the next instruction is after we're done mucking about.
  BasicBlock::iterator NewIter = *BBI;

2015-08-19 10:15:13 +08:00
  // Before we touch this instruction, remove it from memdep!
  do {
    Instruction *DeadInst = NowDeadInsts.pop_back_val();
    ++NumFastOther;

2018-02-14 02:15:26 +08:00
    // Try to preserve debug information attached to the dead instruction.
    salvageDebugInfo(*DeadInst);

2015-08-19 10:15:13 +08:00
    // This instruction is dead, zap it, in stages. Start by removing it from
    // MemDep, which needs to know the operands and needs it to be in the
    // function.
    MD.removeInstruction(DeadInst);

    for (unsigned op = 0, e = DeadInst->getNumOperands(); op != e; ++op) {
      Value *Op = DeadInst->getOperand(op);
      DeadInst->setOperand(op, nullptr);

      // If this operand just became dead, add it to the NowDeadInsts list.
      if (!Op->use_empty()) continue;

      if (Instruction *OpI = dyn_cast<Instruction>(Op))
        if (isInstructionTriviallyDead(OpI, &TLI))
          NowDeadInsts.push_back(OpI);
    }

2016-08-12 09:09:53 +08:00
    if (ValueSet) ValueSet->remove(DeadInst);
    IOL.erase(DeadInst);
2019-03-29 22:10:24 +08:00
    OBB.eraseInstruction(DeadInst);
2016-07-07 03:48:52 +08:00

    if (NewIter == DeadInst->getIterator())
      NewIter = DeadInst->eraseFromParent();
    else
      DeadInst->eraseFromParent();
2015-08-19 10:15:13 +08:00
  } while (!NowDeadInsts.empty());
2016-07-07 03:48:52 +08:00
  *BBI = NewIter;
2015-08-19 10:15:13 +08:00
}

2016-05-18 05:38:13 +08:00
/// Does this instruction write some memory? This only returns true for things
/// that we can analyze with other helpers below.
2018-01-21 09:44:33 +08:00
static bool hasAnalyzableMemoryWrite(Instruction *I,
                                     const TargetLibraryInfo &TLI) {
2009-11-10 14:46:40 +08:00
  if (isa<StoreInst>(I))
    return true;
  if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
    switch (II->getIntrinsicID()) {
2009-12-02 14:35:55 +08:00
    default:
      return false;
    case Intrinsic::memset:
    case Intrinsic::memmove:
    case Intrinsic::memcpy:
2018-04-24 03:06:49 +08:00
    case Intrinsic::memcpy_element_unordered_atomic:
    case Intrinsic::memmove_element_unordered_atomic:
    case Intrinsic::memset_element_unordered_atomic:
2009-12-02 14:35:55 +08:00
    case Intrinsic::init_trampoline:
    case Intrinsic::lifetime_end:
      return true;
2009-11-10 14:46:40 +08:00
    }
  }
2015-04-10 22:50:08 +08:00
  if (auto CS = CallSite(I)) {
2012-09-25 06:09:10 +08:00
    if (Function *F = CS.getCalledFunction()) {
2016-06-17 01:06:04 +08:00
      StringRef FnName = F->getName();
[Analysis] Add LibFunc_ prefix to enums in TargetLibraryInfo. (NFC)
Summary:
The LibFunc::Func enum holds enumerators named for libc functions.
Unfortunately, there are real situations, including libc implementations, where
function names are actually macros (musl uses "#define fopen64 fopen", for
example; any other transitively visible macro would have similar effects).
Strictly speaking, a conforming C++ Standard Library should provide any such
macros as functions instead (via <cstdio>). However, there are some "library"
functions which are not part of the standard, and thus not subject to this
rule (fopen64, for example). So, in order to be both portable and consistent,
the enum should not use the bare function names.
The old enum naming used a namespace LibFunc and an enum Func, with bare
enumerators. This patch changes LibFunc to be an enum with enumerators prefixed
with "LibFunc_". (Unfortunately, a scoped enum is not sufficient to override
macros.)
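The macro hazard described above can be reproduced in a few lines. This is a minimal sketch, not LLVM's actual enum: `LibFuncSketch`, its enumerators, and `sketchName` are hypothetical names, and the `#define` merely imitates the musl macro quoted in the message.

```cpp
#include <cassert>

// Stand-in for a libc header that implements one function as a macro.
#define fopen64 fopen

// With bare enumerator names, the preprocessor would rewrite
//   enum Func { fopen, fopen64 };
// into `enum Func { fopen, fopen };`, which fails to compile because the
// enumerator is declared twice. A scoped enum (`enum class`) has the same
// problem, since macro expansion happens before scoping is considered.

// A prefixed name is a single, different token, so the macro does not apply.
enum LibFuncSketch {
  LibFuncSketch_fopen,
  LibFuncSketch_fopen64,
  NumLibFuncSketch
};

// Map an enumerator back to the library function name it stands for.
// String literals are not subject to macro expansion.
inline const char *sketchName(LibFuncSketch F) {
  assert(F < NumLibFuncSketch && "not a real enumerator");
  static const char *const Names[] = {"fopen", "fopen64"};
  return Names[F];
}

// Remove the stand-in macro so it does not leak into other code.
#undef fopen64
```

Both enumerators stay distinct even while the macro is visible, which is the property the prefix is meant to guarantee.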
There are additional changes required in clang.
Reviewers: rsmith
Subscribers: mehdi_amini, mzolotukhin, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D28476
llvm-svn: 292848
2017-01-24 07:16:46 +08:00
      if (TLI.has(LibFunc_strcpy) && FnName == TLI.getName(LibFunc_strcpy))
2012-09-25 06:09:10 +08:00
        return true;
2017-01-24 07:16:46 +08:00
|
|
|
if (TLI.has(LibFunc_strncpy) && FnName == TLI.getName(LibFunc_strncpy))
|
2012-09-25 06:09:10 +08:00
|
|
|
return true;
|
2017-01-24 07:16:46 +08:00
|
|
|
if (TLI.has(LibFunc_strcat) && FnName == TLI.getName(LibFunc_strcat))
|
2012-09-25 06:09:10 +08:00
|
|
|
return true;
|
2017-01-24 07:16:46 +08:00
|
|
|
if (TLI.has(LibFunc_strncat) && FnName == TLI.getName(LibFunc_strncat))
|
2012-09-25 06:09:10 +08:00
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
2009-11-10 14:46:40 +08:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// Return a Location stored to by the specified instruction. If isRemovable
|
|
|
|
/// returns true, this function and getLocForRead completely describe the memory
|
|
|
|
/// operations for this instruction.
|
2018-01-21 10:10:54 +08:00
|
|
|
static MemoryLocation getLocForWrite(Instruction *Inst) {
|
2018-07-31 03:41:25 +08:00
|
|
|
|
2010-11-30 15:23:21 +08:00
|
|
|
if (StoreInst *SI = dyn_cast<StoreInst>(Inst))
|
2015-06-04 10:03:15 +08:00
|
|
|
return MemoryLocation::get(SI);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2018-04-24 03:06:49 +08:00
|
|
|
if (auto *MI = dyn_cast<AnyMemIntrinsic>(Inst)) {
|
2010-11-30 15:23:21 +08:00
|
|
|
// memcpy/memmove/memset.
|
2015-06-17 15:18:54 +08:00
|
|
|
MemoryLocation Loc = MemoryLocation::getForDest(MI);
|
2010-11-30 15:23:21 +08:00
|
|
|
return Loc;
|
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2018-01-21 10:10:54 +08:00
|
|
|
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst)) {
|
|
|
|
switch (II->getIntrinsicID()) {
|
|
|
|
default:
|
|
|
|
return MemoryLocation(); // Unhandled intrinsic.
|
|
|
|
case Intrinsic::init_trampoline:
|
|
|
|
return MemoryLocation(II->getArgOperand(0));
|
|
|
|
case Intrinsic::lifetime_end: {
|
|
|
|
uint64_t Len = cast<ConstantInt>(II->getArgOperand(0))->getZExtValue();
|
|
|
|
return MemoryLocation(II->getArgOperand(1), Len);
|
|
|
|
}
|
|
|
|
}
|
2010-11-30 15:23:21 +08:00
|
|
|
}
|
2018-01-21 10:10:54 +08:00
|
|
|
if (auto CS = CallSite(Inst))
|
|
|
|
// All the supported TLI functions so far happen to have dest as their
|
|
|
|
// first argument.
|
|
|
|
return MemoryLocation(CS.getArgument(0));
|
|
|
|
return MemoryLocation();
|
2010-11-30 15:23:21 +08:00
|
|
|
}
|
|
|
|
|
2018-01-21 09:44:33 +08:00
|
|
|
/// Return the location read by the specified "hasAnalyzableMemoryWrite"
|
|
|
|
/// instruction if any.
|
2015-08-13 02:01:44 +08:00
|
|
|
static MemoryLocation getLocForRead(Instruction *Inst,
|
|
|
|
const TargetLibraryInfo &TLI) {
|
2018-01-21 09:44:33 +08:00
|
|
|
assert(hasAnalyzableMemoryWrite(Inst, TLI) && "Unknown instruction case");
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-06 09:48:06 +08:00
|
|
|
// The only instructions that both read and write are the mem transfer
|
|
|
|
// instructions (memcpy/memmove).
|
2018-04-24 03:06:49 +08:00
|
|
|
if (auto *MTI = dyn_cast<AnyMemTransferInst>(Inst))
|
2015-06-04 10:03:15 +08:00
|
|
|
return MemoryLocation::getForSource(MTI);
|
2015-06-17 15:18:54 +08:00
|
|
|
return MemoryLocation();
|
2010-12-06 09:48:06 +08:00
|
|
|
}
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// If the value of this instruction and the memory it writes to is unused, may
|
|
|
|
/// we delete this instruction?
|
2010-11-30 13:30:45 +08:00
|
|
|
static bool isRemovable(Instruction *I) {
|
2011-08-18 06:22:24 +08:00
|
|
|
// Don't remove volatile/atomic stores.
|
2009-11-10 14:46:40 +08:00
|
|
|
if (StoreInst *SI = dyn_cast<StoreInst>(I))
|
2011-08-18 06:22:24 +08:00
|
|
|
return SI->isUnordered();
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2012-09-25 06:09:10 +08:00
|
|
|
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
|
|
|
|
switch (II->getIntrinsicID()) {
|
2018-01-21 09:44:33 +08:00
|
|
|
default: llvm_unreachable("doesn't pass 'hasAnalyzableMemoryWrite' predicate");
|
2012-09-25 06:09:10 +08:00
|
|
|
case Intrinsic::lifetime_end:
|
|
|
|
// Never remove dead lifetime_end's, e.g. because it is followed by a
|
|
|
|
// free.
|
|
|
|
return false;
|
|
|
|
case Intrinsic::init_trampoline:
|
|
|
|
// Always safe to remove init_trampoline.
|
|
|
|
return true;
|
|
|
|
case Intrinsic::memset:
|
|
|
|
case Intrinsic::memmove:
|
|
|
|
case Intrinsic::memcpy:
|
|
|
|
// Don't remove volatile memory intrinsics.
|
|
|
|
return !cast<MemIntrinsic>(II)->isVolatile();
|
2018-04-24 03:06:49 +08:00
|
|
|
case Intrinsic::memcpy_element_unordered_atomic:
|
|
|
|
case Intrinsic::memmove_element_unordered_atomic:
|
|
|
|
case Intrinsic::memset_element_unordered_atomic:
|
|
|
|
return true;
|
2012-09-25 06:09:10 +08:00
|
|
|
}
|
2010-12-01 03:12:10 +08:00
|
|
|
}
|
2012-09-25 06:09:10 +08:00
|
|
|
|
2018-01-21 09:44:33 +08:00
|
|
|
// Note: we only get here for calls with analyzable writes, i.e. libcalls.
|
2015-04-10 22:50:08 +08:00
|
|
|
if (auto CS = CallSite(I))
|
2012-09-25 09:55:59 +08:00
|
|
|
return CS.getInstruction()->use_empty();
|
2012-09-25 06:09:10 +08:00
|
|
|
|
|
|
|
return false;
|
2009-11-10 14:46:40 +08:00
|
|
|
}
|
|
|
|
|
2016-04-23 03:51:29 +08:00
|
|
|
/// Returns true if the end of this instruction can be safely shortened in
|
2011-11-10 07:07:35 +08:00
|
|
|
/// length.
|
2016-04-23 03:51:29 +08:00
|
|
|
static bool isShortenableAtTheEnd(Instruction *I) {
|
2011-11-10 07:07:35 +08:00
|
|
|
// Don't shorten stores for now
|
|
|
|
if (isa<StoreInst>(I))
|
|
|
|
return false;
|
2012-07-24 18:51:42 +08:00
|
|
|
|
2012-09-25 06:09:10 +08:00
|
|
|
if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
|
|
|
|
switch (II->getIntrinsicID()) {
|
|
|
|
default: return false;
|
|
|
|
case Intrinsic::memset:
|
|
|
|
case Intrinsic::memcpy:
|
2018-05-10 23:12:49 +08:00
|
|
|
case Intrinsic::memcpy_element_unordered_atomic:
|
|
|
|
case Intrinsic::memset_element_unordered_atomic:
|
2012-09-25 06:09:10 +08:00
|
|
|
// Do shorten memory intrinsics.
|
2016-04-23 03:51:29 +08:00
|
|
|
// FIXME: Add memmove if it's also safe to transform.
|
2012-09-25 06:09:10 +08:00
|
|
|
return true;
|
|
|
|
}
|
2011-11-10 07:07:35 +08:00
|
|
|
}
|
2012-09-25 06:09:10 +08:00
|
|
|
|
|
|
|
// Don't shorten libcalls for now.
|
|
|
|
|
|
|
|
return false;
|
2011-11-10 07:07:35 +08:00
|
|
|
}
|
|
|
|
|
2016-04-23 03:51:29 +08:00
|
|
|
/// Returns true if the beginning of this instruction can be safely shortened
|
|
|
|
/// in length.
|
|
|
|
static bool isShortenableAtTheBeginning(Instruction *I) {
|
|
|
|
// FIXME: Handle only memset for now. Supporting memcpy/memmove should be
|
|
|
|
// easily done by offsetting the source address.
|
2018-05-10 23:12:49 +08:00
|
|
|
return isa<AnyMemSetInst>(I);
|
2016-04-23 03:51:29 +08:00
|
|
|
}
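The FIXME in isShortenableAtTheBeginning notes that a memset can be shortened at the front, while a memcpy/memmove would also need its source address offset. A minimal sketch of that difference follows, on plain data rather than LLVM instructions. `FillOp`, `CopyOp` and `trimFront` are hypothetical names introduced only for illustration; they are not part of the pass.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-data models of a memset and a memcpy; not LLVM types.
struct FillOp {
  int64_t DestOff; // first byte written
  uint64_t Len;    // number of bytes written
};

struct CopyOp {
  int64_t DestOff; // first byte written
  int64_t SrcOff;  // first byte read
  uint64_t Len;    // number of bytes copied
};

// Drop the first N bytes of a fill. Only the destination and the length
// change, because every byte of a fill carries the same value.
FillOp trimFront(FillOp Op, uint64_t N) {
  assert(N <= Op.Len && "cannot trim more than the operation writes");
  return {Op.DestOff + int64_t(N), Op.Len - N};
}

// Drop the first N bytes of a copy. The source has to advance together with
// the destination, otherwise the remaining bytes would be copied from the
// wrong place.
CopyOp trimFront(CopyOp Op, uint64_t N) {
  assert(N <= Op.Len && "cannot trim more than the operation writes");
  return {Op.DestOff + int64_t(N), Op.SrcOff + int64_t(N), Op.Len - N};
}
```

Shortening at the end, by contrast, only reduces the length for both kinds of operation.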
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// Return the pointer that is being written to.
|
2010-12-01 05:58:14 +08:00
|
|
|
static Value *getStoredPointerOperand(Instruction *I) {
|
2018-01-21 09:44:33 +08:00
|
|
|
// TODO: factor this to reuse getLocForWrite
|
2018-01-21 10:10:54 +08:00
|
|
|
MemoryLocation Loc = getLocForWrite(I);
|
|
|
|
assert(Loc.Ptr &&
|
2018-06-14 13:41:49 +08:00
|
|
|
"unable to find pointer written for analyzable instruction?");
|
2018-01-21 10:10:54 +08:00
|
|
|
// TODO: most APIs don't expect const Value *
|
|
|
|
return const_cast<Value*>(Loc.Ptr);
|
2009-11-10 14:46:40 +08:00
|
|
|
}
|
|
|
|
|
2015-03-10 10:37:25 +08:00
|
|
|
static uint64_t getPointerSize(const Value *V, const DataLayout &DL,
|
llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
2018-07-10 06:27:23 +08:00
|
|
|
const TargetLibraryInfo &TLI,
|
|
|
|
const Function *F) {
|
2012-06-21 23:45:28 +08:00
|
|
|
uint64_t Size;
|
2018-07-10 06:27:23 +08:00
|
|
|
ObjectSizeOpts Opts;
|
|
|
|
Opts.NullIsUnknownSize = NullPointerIsDefined(F);
|
|
|
|
|
|
|
|
if (getObjectSize(V, Size, DL, &TLI, Opts))
|
2012-06-21 23:45:28 +08:00
|
|
|
return Size;
|
2015-06-17 15:21:38 +08:00
|
|
|
return MemoryLocation::UnknownSize;
|
2010-12-01 07:43:23 +08:00
|
|
|
}
|
2010-12-01 03:34:42 +08:00
|
|
|
|
2011-11-10 07:07:35 +08:00
|
|
|
namespace {
|
2017-10-14 05:17:07 +08:00
|
|
|
|
2017-09-26 21:54:28 +08:00
|
|
|
enum OverwriteResult {
|
|
|
|
OW_Begin,
|
|
|
|
OW_Complete,
|
|
|
|
OW_End,
|
|
|
|
OW_PartialEarlierWithFullLater,
|
|
|
|
OW_Unknown
|
|
|
|
};
|
2017-10-14 05:17:07 +08:00
|
|
|
|
|
|
|
} // end anonymous namespace
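The overwrite kinds enumerated above can be illustrated on plain byte ranges that share a base pointer. The following is a simplified sketch, not the pass's logic: `Overlap` and `classify` are hypothetical names, the order of the checks is an assumption, and the option flags and alias queries used by the real function are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OverwriteResult; not the LLVM enum.
enum class Overlap {
  Begin,
  Complete,
  End,
  PartialEarlierWithFullLater,
  Unknown
};

// Classify how the later range [LaterOff, LaterOff + LaterSize) covers the
// earlier range [EarlierOff, EarlierOff + EarlierSize). Offsets are signed
// and sizes unsigned, so the ends are computed once as signed values.
Overlap classify(int64_t EarlierOff, uint64_t EarlierSize, int64_t LaterOff,
                 uint64_t LaterSize) {
  assert(EarlierSize > 0 && LaterSize > 0 && "empty ranges are not handled");
  const int64_t EarlierEnd = EarlierOff + int64_t(EarlierSize);
  const int64_t LaterEnd = LaterOff + int64_t(LaterSize);

  // The later range covers every byte of the earlier one.
  if (LaterOff <= EarlierOff && LaterEnd >= EarlierEnd)
    return Overlap::Complete;
  // The later range lies entirely inside the (bigger) earlier one.
  if (LaterOff >= EarlierOff && LaterEnd <= EarlierEnd)
    return Overlap::PartialEarlierWithFullLater;
  // The later range starts inside the earlier one and runs past its end.
  if (LaterOff > EarlierOff && LaterOff < EarlierEnd && LaterEnd >= EarlierEnd)
    return Overlap::End;
  // The later range starts at or before the earlier one and ends inside it.
  if (LaterOff <= EarlierOff && LaterEnd > EarlierOff)
    return Overlap::Begin;
  // The ranges do not overlap.
  return Overlap::Unknown;
}
```

For example, an 8-byte earlier store at offset 0 followed by an 8-byte later store at offset 4 classifies as `Overlap::End` in this sketch, since only the tail of the earlier store is overwritten.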
|
2011-11-10 07:07:35 +08:00
|
|
|
|
2017-03-29 22:42:27 +08:00
|
|
|
/// Return 'OW_Complete' if a store to the 'Later' location completely
|
|
|
|
/// overwrites a store to the 'Earlier' location, 'OW_End' if the end of the
|
|
|
|
/// 'Earlier' location is completely overwritten by 'Later', 'OW_Begin' if the
|
2017-09-26 21:54:28 +08:00
|
|
|
/// beginning of the 'Earlier' location is overwritten by 'Later'.
|
|
|
|
/// 'OW_PartialEarlierWithFullLater' means that an earlier (big) store was
|
|
|
|
/// overwritten by a later (smaller) store which doesn't write outside the big
|
|
|
|
/// store's memory locations. Returns 'OW_Unknown' if nothing can be determined.
|
2015-06-17 15:18:54 +08:00
|
|
|
static OverwriteResult isOverwrite(const MemoryLocation &Later,
|
|
|
|
const MemoryLocation &Earlier,
|
2015-03-10 10:37:25 +08:00
|
|
|
const DataLayout &DL,
|
2015-08-13 02:01:44 +08:00
|
|
|
const TargetLibraryInfo &TLI,
|
Allow DeadStoreElimination to track combinations of partial later writes
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. If the map forms an interval covering the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
the problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value in
the backend.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Differential Revision: http://reviews.llvm.org/D18586
llvm-svn: 273559
2016-06-23 21:46:39 +08:00
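The commit message above describes a map used as an interval map, keyed by the ending offset of each interval and storing its beginning offset. A minimal sketch of that idea, using `std::map`, is given below. `IntervalMap`, `addInterval` and `covers` are hypothetical names for illustration, and the merging strategy is an assumption rather than the code of the pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical interval map: key is the end offset, value is the begin
// offset of a half-open interval [begin, end).
using IntervalMap = std::map<int64_t, int64_t>;

// Record [Begin, End) and merge it with every stored interval that overlaps
// or touches it, so the map always holds disjoint, non-adjacent intervals.
void addInterval(IntervalMap &IM, int64_t Begin, int64_t End) {
  assert(Begin < End && "interval must be non-empty");
  // First stored interval whose end is at or after Begin.
  auto It = IM.lower_bound(Begin);
  // Keep merging while the stored interval starts at or before our end.
  while (It != IM.end() && It->second <= End) {
    Begin = std::min(Begin, It->second);
    End = std::max(End, It->first);
    It = IM.erase(It);
  }
  IM[End] = Begin;
}

// True if a single merged interval covers all of [Begin, End).
bool covers(const IntervalMap &IM, int64_t Begin, int64_t End) {
  auto It = IM.lower_bound(End); // first interval ending at or after End
  return It != IM.end() && It->second <= Begin;
}
```

With an earlier 8-byte store at offset 0, recording the later writes [4, 8) and then [0, 4) leaves one merged interval [0, 8), at which point `covers(IM, 0, 8)` reports that the earlier store is fully overwritten.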
|
|
|
int64_t &EarlierOff, int64_t &LaterOff,
|
|
|
|
Instruction *DepWrite,
|
2018-05-03 19:03:53 +08:00
|
|
|
InstOverlapIntervalsTy &IOL,
|
2018-07-10 06:27:23 +08:00
|
|
|
AliasAnalysis &AA,
|
|
|
|
const Function *F) {
|
2018-10-10 14:39:40 +08:00
|
|
|
// FIXME: Vet that this works for size upper-bounds. Seems unlikely that we'll
|
|
|
|
// get imprecise values here, though (except for unknown sizes).
|
|
|
|
if (!Later.Size.isPrecise() || !Earlier.Size.isPrecise())
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Unknown;
|
2016-06-16 06:17:38 +08:00
|
|
|
|
2018-10-09 11:18:56 +08:00
|
|
|
const uint64_t LaterSize = Later.Size.getValue();
|
|
|
|
const uint64_t EarlierSize = Earlier.Size.getValue();
|
2018-10-09 10:14:33 +08:00
|
|
|
|
2010-12-01 07:05:20 +08:00
|
|
|
const Value *P1 = Earlier.Ptr->stripPointerCasts();
|
|
|
|
const Value *P2 = Later.Ptr->stripPointerCasts();
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:05:20 +08:00
|
|
|
// If the start pointers are the same, we just have to compare sizes to see if
|
|
|
|
// the later store was larger than the earlier store.
|
2018-05-03 19:03:53 +08:00
|
|
|
if (P1 == P2 || AA.isMustAlias(P1, P2)) {
|
2010-12-01 07:05:20 +08:00
|
|
|
// Make sure that the Later size is >= the Earlier size.
|
2018-10-09 10:14:33 +08:00
|
|
|
if (LaterSize >= EarlierSize)
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Complete;
|
2010-12-01 07:05:20 +08:00
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:43:23 +08:00
|
|
|
// Check to see if the later store is to the entire object (either a global,
|
2014-01-28 10:38:36 +08:00
|
|
|
// an alloca, or a byval/inalloca argument). If so, then it clearly
|
|
|
|
// overwrites any other store to the same object.
|
2014-02-22 02:34:28 +08:00
|
|
|
const Value *UO1 = GetUnderlyingObject(P1, DL),
|
|
|
|
*UO2 = GetUnderlyingObject(P2, DL);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:43:23 +08:00
|
|
|
// If we can't resolve both pointers to the same object, then we can't
|
|
|
|
// analyze them at all.
|
|
|
|
if (UO1 != UO2)
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Unknown;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:43:23 +08:00
|
|
|
// If the "Later" store is to a recognizable object, get its size.
|
2018-07-10 06:27:23 +08:00
|
|
|
uint64_t ObjectSize = getPointerSize(UO2, DL, TLI, F);
|
2015-06-17 15:21:38 +08:00
|
|
|
if (ObjectSize != MemoryLocation::UnknownSize)
|
2018-10-09 10:14:33 +08:00
|
|
|
if (ObjectSize == LaterSize && ObjectSize >= EarlierSize)
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Complete;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:05:20 +08:00
|
|
|
// Okay, we have stores to two completely different pointers. Try to
|
|
|
|
// decompose the pointer into a "base + constant_offset" form. If the base
|
|
|
|
// pointers are equal, then we can reason about the two stores.
|
2011-11-10 07:07:35 +08:00
|
|
|
EarlierOff = 0;
|
|
|
|
LaterOff = 0;
|
2014-02-22 02:34:28 +08:00
|
|
|
const Value *BP1 = GetPointerBaseWithConstantOffset(P1, EarlierOff, DL);
|
|
|
|
const Value *BP2 = GetPointerBaseWithConstantOffset(P2, LaterOff, DL);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 07:05:20 +08:00
|
|
|
// If the base pointers still differ, we have two completely different stores.
|
|
|
|
if (BP1 != BP2)
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Unknown;
|
2011-03-26 09:20:37 +08:00
|
|
|
|
2011-03-26 16:02:59 +08:00
|
|
|
// The later store completely overlaps the earlier store if:
|
2011-09-07 02:14:09 +08:00
|
|
|
//
|
2011-03-26 16:02:59 +08:00
|
|
|
// 1. Both start at the same offset and the later one's size is greater than
|
|
|
|
// or equal to the earlier one's, or
|
|
|
|
//
|
|
|
|
// |--earlier--|
|
|
|
|
// |-- later --|
|
2011-09-07 02:14:09 +08:00
|
|
|
//
|
2011-03-26 16:02:59 +08:00
|
|
|
// 2. The earlier store has an offset greater than the later offset, but which
|
|
|
|
// still lies completely within the later store.
|
|
|
|
//
|
|
|
|
// |--earlier--|
|
|
|
|
// |----- later ------|
|
2011-03-31 05:37:19 +08:00
|
|
|
//
|
|
|
|
// We have to be careful here as *Off is signed while *.Size is unsigned.
|
2011-03-26 17:32:07 +08:00
|
|
|
if (EarlierOff >= LaterOff &&
|
2018-10-09 10:14:33 +08:00
|
|
|
LaterSize >= EarlierSize &&
|
|
|
|
uint64_t(EarlierOff - LaterOff) + EarlierSize <= LaterSize)
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Complete;
|
2012-07-24 18:51:42 +08:00
|
|
|
|
2016-06-23 21:46:39 +08:00
|
|
|
// We may now overlap, although the overlap is not complete. There might also
|
|
|
|
// be other incomplete overlaps, and together, they might cover the complete
|
|
|
|
// earlier write.
|
|
|
|
// Note: The correctness of this logic depends on the fact that this function
|
|
|
|
// is never called with DepWrite when there are any intervening reads.
|
|
|
|
if (EnablePartialOverwriteTracking &&
|
2018-10-09 10:14:33 +08:00
|
|
|
LaterOff < int64_t(EarlierOff + EarlierSize) &&
|
|
|
|
int64_t(LaterOff + LaterSize) >= EarlierOff) {
|
2016-06-23 21:46:39 +08:00
|
|
|
|
|
|
|
// Insert our part of the overlap into the map.
|
|
|
|
auto &IM = IOL[DepWrite];
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Partial overwrite: Earlier [" << EarlierOff
|
2018-10-09 10:14:33 +08:00
|
|
|
<< ", " << int64_t(EarlierOff + EarlierSize)
|
2018-05-14 20:53:11 +08:00
|
|
|
<< ") Later [" << LaterOff << ", "
|
2018-10-09 10:14:33 +08:00
|
|
|
<< int64_t(LaterOff + LaterSize) << ")\n");
|
Allow DeadStoreElimination to track combinations of partial later writes
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. In the map forms an interval covering the the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
the problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value in
the backend.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Differential Revision: http://reviews.llvm.org/D18586
llvm-svn: 273559
2016-06-23 21:46:39 +08:00
|
|
|
|
|
|
|
// Make sure that we only insert non-overlapping intervals and combine
|
|
|
|
// adjacent intervals. The intervals are stored in the map with the ending
|
|
|
|
// offset as the key (in the half-open sense) and the starting offset as
|
|
|
|
// the value.
|
2018-10-09 10:14:33 +08:00
|
|
|
int64_t LaterIntStart = LaterOff, LaterIntEnd = LaterOff + LaterSize;
|
2016-06-23 21:46:39 +08:00
|
|
|
|
|
|
|
// Find any intervals ending at, or after, LaterIntStart which start
|
|
|
|
// before LaterIntEnd.
|
|
|
|
auto ILI = IM.lower_bound(LaterIntStart);
|
2016-06-30 23:32:20 +08:00
|
|
|
if (ILI != IM.end() && ILI->second <= LaterIntEnd) {
|
|
|
|
// This existing interval is overlapped with the current store somewhere
|
|
|
|
// in [LaterIntStart, LaterIntEnd]. Merge them by erasing the existing
|
|
|
|
// intervals and adjusting our start and end.
|
2016-06-23 21:46:39 +08:00
|
|
|
LaterIntStart = std::min(LaterIntStart, ILI->second);
|
|
|
|
LaterIntEnd = std::max(LaterIntEnd, ILI->first);
|
|
|
|
ILI = IM.erase(ILI);
|
|
|
|
|
2016-06-30 23:32:20 +08:00
|
|
|
// Continue erasing and adjusting our end in case other previous
|
|
|
|
// intervals are also overlapped with the current store.
|
|
|
|
//
|
|
|
|
// |--- earlier 1 ---| |--- earlier 2 ---|
|
|
|
|
// |------- later---------|
|
|
|
|
//
|
|
|
|
while (ILI != IM.end() && ILI->second <= LaterIntEnd) {
|
|
|
|
assert(ILI->second > LaterIntStart && "Unexpected interval");
|
2016-06-23 21:46:39 +08:00
|
|
|
LaterIntEnd = std::max(LaterIntEnd, ILI->first);
|
2016-06-30 23:32:20 +08:00
|
|
|
ILI = IM.erase(ILI);
|
|
|
|
}
|
2016-06-23 21:46:39 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
IM[LaterIntEnd] = LaterIntStart;
|
|
|
|
|
|
|
|
ILI = IM.begin();
|
|
|
|
if (ILI->second <= EarlierOff &&
|
2018-10-09 10:14:33 +08:00
|
|
|
ILI->first >= int64_t(EarlierOff + EarlierSize)) {
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Full overwrite from partials: Earlier ["
|
|
|
|
<< EarlierOff << ", "
|
2018-10-09 10:14:33 +08:00
|
|
|
<< int64_t(EarlierOff + EarlierSize)
|
2018-05-14 20:53:11 +08:00
|
|
|
<< ") Composite Later [" << ILI->second << ", "
|
|
|
|
<< ILI->first << ")\n");
|
2016-06-23 21:46:39 +08:00
|
|
|
++NumCompletePartials;
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Complete;
|
2016-06-23 21:46:39 +08:00
|
|
|
}
|
|
|
|
}
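The interval bookkeeping above can be sketched in isolation. This is a minimal illustration, not the pass's code: `IntervalMap` and `addAndCheckCovered` are hypothetical names, and a plain `std::map` keyed by the half-open end offset stands in for the per-store entry of `IOL`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one entry of the overlap map:
// key = end offset (half-open), value = start offset.
using IntervalMap = std::map<int64_t, int64_t>;

// Record that a later store covers [Start, End), merging it with any stored
// interval it overlaps or touches, then report whether the earlier store
// [EarlierOff, EarlierOff + EarlierSize) is completely covered.
bool addAndCheckCovered(IntervalMap &IM, int64_t Start, int64_t End,
                        int64_t EarlierOff, int64_t EarlierSize) {
  assert(Start < End && "empty interval");
  // First stored interval whose end is at or after our start.
  auto It = IM.lower_bound(Start);
  // Absorb every interval that begins at or before our (growing) end.
  while (It != IM.end() && It->second <= End) {
    Start = std::min(Start, It->second);
    End = std::max(End, It->first);
    It = IM.erase(It);
  }
  IM[End] = Start;
  // As in the listing, only the lowest-ending interval is inspected.
  auto First = IM.begin();
  return First->second <= EarlierOff &&
         First->first >= EarlierOff + EarlierSize;
}
```

Keying by the end offset means a single `lower_bound(Start)` lands on the first interval that can possibly overlap or touch the new one, so no separate search for a predecessor is needed.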
|
|
|
|
|
2017-09-26 21:54:28 +08:00
|
|
|
// Check for an earlier store which writes to all the memory locations that
|
|
|
|
// the later store writes to.
|
|
|
|
if (EnablePartialStoreMerging && LaterOff >= EarlierOff &&
|
2018-10-09 10:14:33 +08:00
|
|
|
int64_t(EarlierOff + EarlierSize) > LaterOff &&
|
|
|
|
uint64_t(LaterOff - EarlierOff) + LaterSize <= EarlierSize) {
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Partial overwrite an earlier load ["
|
|
|
|
<< EarlierOff << ", "
|
2018-10-09 10:14:33 +08:00
|
|
|
<< int64_t(EarlierOff + EarlierSize)
|
2018-05-14 20:53:11 +08:00
|
|
|
<< ") by a later store [" << LaterOff << ", "
|
2018-10-09 10:14:33 +08:00
|
|
|
<< int64_t(LaterOff + LaterSize) << ")\n");
|
2017-09-26 21:54:28 +08:00
|
|
|
// TODO: Maybe come up with a better name?
|
|
|
|
return OW_PartialEarlierWithFullLater;
|
|
|
|
}
|
|
|
|
|
2016-04-23 03:51:29 +08:00
|
|
|
// Another interesting case is if the later store overwrites the end of the
|
|
|
|
// earlier store.
|
2011-11-10 07:07:35 +08:00
|
|
|
//
|
|
|
|
// |--earlier--|
|
|
|
|
// |-- later --|
|
|
|
|
//
|
|
|
|
// In this case we may want to trim the size of earlier to avoid generating
|
|
|
|
// writes to addresses which will definitely be overwritten later.
|
2016-07-23 02:27:24 +08:00
|
|
|
if (!EnablePartialOverwriteTracking &&
|
2018-10-09 10:14:33 +08:00
|
|
|
(LaterOff > EarlierOff && LaterOff < int64_t(EarlierOff + EarlierSize) &&
|
|
|
|
int64_t(LaterOff + LaterSize) >= int64_t(EarlierOff + EarlierSize)))
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_End;
|
2011-03-26 16:02:59 +08:00
|
|
|
|
2016-04-23 03:51:29 +08:00
|
|
|
// Finally, we also need to check if the later store overwrites the beginning
|
|
|
|
// of the earlier store.
|
|
|
|
//
|
|
|
|
// |--earlier--|
|
|
|
|
// |-- later --|
|
|
|
|
//
|
|
|
|
// In this case we may want to move the destination address and trim the size
|
|
|
|
// of earlier to avoid generating writes to addresses which will definitely
|
|
|
|
// be overwritten later.
|
2016-07-23 02:27:24 +08:00
|
|
|
if (!EnablePartialOverwriteTracking &&
|
2018-10-09 10:14:33 +08:00
|
|
|
(LaterOff <= EarlierOff && int64_t(LaterOff + LaterSize) > EarlierOff)) {
|
|
|
|
assert(int64_t(LaterOff + LaterSize) < int64_t(EarlierOff + EarlierSize) &&
|
2017-03-29 22:42:27 +08:00
|
|
|
"Expect to be handled as OW_Complete");
|
|
|
|
return OW_Begin;
|
2016-04-23 03:51:29 +08:00
|
|
|
}
|
2011-03-26 16:02:59 +08:00
|
|
|
// Otherwise, they don't completely overlap.
|
2017-03-29 22:42:27 +08:00
|
|
|
return OW_Unknown;
|
2009-11-05 07:20:12 +08:00
|
|
|
}
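The end and beginning cases reduce to a few offset comparisons. The sketch below is a simplified illustration under stated assumptions: `Overlap` and `classify` are hypothetical names, the option flags (`EnablePartialOverwriteTracking`, `EnablePartialStoreMerging`) are ignored, and the complete-overwrite test is assumed to run first, as the "Expect to be handled as OW_Complete" assertion implies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced version of the overwrite result kinds.
enum class Overlap { Complete, End, Begin, Unknown };

// Classify how a later store [LaterOff, LaterOff + LaterSize) overlaps an
// earlier store [EarlierOff, EarlierOff + EarlierSize). Offsets are byte
// offsets from a common base pointer.
Overlap classify(int64_t EarlierOff, int64_t EarlierSize, int64_t LaterOff,
                 int64_t LaterSize) {
  assert(EarlierSize > 0 && LaterSize > 0 && "stores must be non-empty");
  const int64_t EarlierEnd = EarlierOff + EarlierSize;
  const int64_t LaterEnd = LaterOff + LaterSize;

  // Later covers all of earlier (assumed to be checked before the others).
  if (LaterOff <= EarlierOff && LaterEnd >= EarlierEnd)
    return Overlap::Complete;
  // Later starts inside earlier and runs to or past its end.
  if (LaterOff > EarlierOff && LaterOff < EarlierEnd && LaterEnd >= EarlierEnd)
    return Overlap::End;
  // Later starts at or before earlier and stops inside it.
  if (LaterOff <= EarlierOff && LaterEnd > EarlierOff)
    return Overlap::Begin;
  return Overlap::Unknown;
}
```

For example, an 8-byte earlier store at offset 0 followed by an 8-byte later store at offset 4 classifies as `Overlap::End`, so only the first 4 bytes of the earlier store still matter.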
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// If 'Inst' might be a self read (i.e. a noop copy of a
|
2010-12-06 09:48:06 +08:00
|
|
|
/// memory region into an identical pointer) then it doesn't actually make its
|
2011-09-07 02:14:09 +08:00
|
|
|
/// input dead in the traditional sense. Consider this case:
|
2010-12-06 09:48:06 +08:00
|
|
|
///
|
2018-02-21 07:19:34 +08:00
|
|
|
/// memmove(A <- B)
|
|
|
|
/// memmove(A <- A)
|
2010-12-06 09:48:06 +08:00
|
|
|
///
|
|
|
|
/// In this case, the second store to A does not make the first store to A dead.
|
|
|
|
/// The usual situation isn't an explicit A<-A store like this (which can be
|
|
|
|
/// trivially removed) but a case where two pointers may alias.
|
|
|
|
///
|
|
|
|
/// This function detects when it is unsafe to remove a dependent instruction
|
|
|
|
/// because the DSE inducing instruction may be a self-read.
|
|
|
|
static bool isPossibleSelfRead(Instruction *Inst,
|
2015-06-17 15:18:54 +08:00
|
|
|
const MemoryLocation &InstStoreLoc,
|
2015-08-13 02:01:44 +08:00
|
|
|
Instruction *DepWrite,
|
|
|
|
const TargetLibraryInfo &TLI,
|
|
|
|
AliasAnalysis &AA) {
|
2010-12-06 09:48:06 +08:00
|
|
|
// Self reads can only happen for instructions that read memory. Get the
|
|
|
|
// location read.
|
2015-08-13 02:01:44 +08:00
|
|
|
MemoryLocation InstReadLoc = getLocForRead(Inst, TLI);
|
2018-02-21 07:19:34 +08:00
|
|
|
if (!InstReadLoc.Ptr)
|
|
|
|
return false; // Not a reading instruction.
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-06 09:48:06 +08:00
|
|
|
// If the read and written loc obviously don't alias, it isn't a self read.
|
2018-02-21 07:19:34 +08:00
|
|
|
if (AA.isNoAlias(InstReadLoc, InstStoreLoc))
|
2010-12-06 09:48:06 +08:00
|
|
|
return false;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2018-04-24 03:06:49 +08:00
|
|
|
if (isa<AnyMemCpyInst>(Inst)) {
|
2018-02-21 07:19:34 +08:00
|
|
|
// LLVM's memcpy overlap semantics are not fully fleshed out (see PR11763)
|
|
|
|
// but in practice memcpy(A <- B) either means that A and B are disjoint or
|
|
|
|
// are equal (i.e. there are no partial overlaps). Given that, if we have:
|
|
|
|
//
|
|
|
|
// memcpy/memmove(A <- B) // DepWrite
|
|
|
|
// memcpy(A <- B) // Inst
|
|
|
|
//
|
|
|
|
// with Inst reading/writing at least as many bytes as DepWrite, we can reason as
|
|
|
|
// follows:
|
|
|
|
//
|
|
|
|
// - If A == B then both the copies are no-ops, so the DepWrite can be
|
|
|
|
// removed.
|
|
|
|
// - If A != B then A and B are disjoint locations in Inst. Since
|
|
|
|
// Inst.size >= DepWrite.size A and B are disjoint in DepWrite too.
|
|
|
|
// Therefore DepWrite can be removed.
|
|
|
|
MemoryLocation DepReadLoc = getLocForRead(DepWrite, TLI);
|
|
|
|
|
|
|
|
if (DepReadLoc.Ptr && AA.isMustAlias(InstReadLoc.Ptr, DepReadLoc.Ptr))
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2010-12-06 09:48:06 +08:00
|
|
|
// If DepWrite doesn't read memory or if we can't prove it is a must alias,
|
|
|
|
// then it can't be considered dead.
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
|
2015-09-23 19:38:44 +08:00
|
|
|
/// Returns true if the memory which is accessed by the second instruction is not
|
|
|
|
/// modified between the first and the second instruction.
|
|
|
|
/// Precondition: Second instruction must be dominated by the first
|
2015-08-13 23:36:11 +08:00
|
|
|
/// instruction.
|
2016-05-18 05:38:13 +08:00
|
|
|
static bool memoryIsNotModifiedBetween(Instruction *FirstI,
|
|
|
|
Instruction *SecondI,
|
|
|
|
AliasAnalysis *AA) {
|
2015-08-13 23:36:11 +08:00
|
|
|
SmallVector<BasicBlock *, 16> WorkList;
|
|
|
|
SmallPtrSet<BasicBlock *, 8> Visited;
|
2015-09-23 19:38:44 +08:00
|
|
|
BasicBlock::iterator FirstBBI(FirstI);
|
|
|
|
++FirstBBI;
|
|
|
|
BasicBlock::iterator SecondBBI(SecondI);
|
|
|
|
BasicBlock *FirstBB = FirstI->getParent();
|
|
|
|
BasicBlock *SecondBB = SecondI->getParent();
|
|
|
|
MemoryLocation MemLoc = MemoryLocation::get(SecondI);
|
2015-08-13 23:36:11 +08:00
|
|
|
|
|
|
|
// Start checking at SecondBB (the block containing SecondI).
|
2015-09-23 19:38:44 +08:00
|
|
|
WorkList.push_back(SecondBB);
|
2015-08-13 23:36:11 +08:00
|
|
|
bool isFirstBlock = true;
|
|
|
|
|
|
|
|
// Check all blocks going backward until we reach FirstBB.
|
|
|
|
while (!WorkList.empty()) {
|
|
|
|
BasicBlock *B = WorkList.pop_back_val();
|
|
|
|
|
2015-09-23 19:38:44 +08:00
|
|
|
// Ignore instructions before FirstI if this is the FirstBB.
|
|
|
|
BasicBlock::iterator BI = (B == FirstBB ? FirstBBI : B->begin());
|
2015-08-13 23:36:11 +08:00
|
|
|
|
|
|
|
BasicBlock::iterator EI;
|
|
|
|
if (isFirstBlock) {
|
2015-09-23 19:38:44 +08:00
|
|
|
// Ignore instructions after SecondI if this is the first visit of SecondBB.
|
|
|
|
assert(B == SecondBB && "first block is not the store block");
|
|
|
|
EI = SecondBBI;
|
2015-08-13 23:36:11 +08:00
|
|
|
isFirstBlock = false;
|
|
|
|
} else {
|
2015-09-23 19:38:44 +08:00
|
|
|
// It's not SecondBB or (in case of a loop) the second visit of SecondBB.
|
2015-08-13 23:36:11 +08:00
|
|
|
// In this case we also have to look at instructions after SecondI.
|
|
|
|
EI = B->end();
|
|
|
|
}
|
|
|
|
for (; BI != EI; ++BI) {
|
2015-10-14 02:26:00 +08:00
|
|
|
Instruction *I = &*BI;
|
2017-12-06 04:12:23 +08:00
|
|
|
if (I->mayWriteToMemory() && I != SecondI)
|
|
|
|
if (isModSet(AA->getModRefInfo(I, MemLoc)))
|
2015-08-13 23:36:11 +08:00
|
|
|
return false;
|
|
|
|
}
|
2015-09-23 19:38:44 +08:00
|
|
|
if (B != FirstBB) {
|
|
|
|
assert(B != &FirstBB->getParent()->getEntryBlock() &&
|
2015-08-13 23:36:11 +08:00
|
|
|
"Should not hit the entry block because SI must be dominated by LI");
|
2015-12-12 02:39:41 +08:00
|
|
|
for (auto PredI = pred_begin(B), PE = pred_end(B); PredI != PE; ++PredI) {
|
|
|
|
if (!Visited.insert(*PredI).second)
|
2015-08-13 23:36:11 +08:00
|
|
|
continue;
|
2015-12-12 02:39:41 +08:00
|
|
|
WorkList.push_back(*PredI);
|
2015-08-13 23:36:11 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return true;
|
|
|
|
}
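The backward worklist walk can be illustrated on a toy control-flow graph. This is a block-granularity sketch under stated assumptions, not the function above: `ToyBlock` and `unmodifiedBetween` are hypothetical, each block carries one `Clobbers` flag instead of individual instructions queried through alias analysis, and the partial scans of the first and last block are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy block: predecessor indices plus a flag saying whether the
// block may write to the memory location of interest.
struct ToyBlock {
  std::vector<std::size_t> Preds;
  bool Clobbers = false;
};

// Returns true if no block on any backward path from Second to First may
// clobber the location. First itself is never scanned; Second is scanned
// only if it is reached again through a loop.
bool unmodifiedBetween(const std::vector<ToyBlock> &CFG, std::size_t First,
                       std::size_t Second) {
  assert(First < CFG.size() && Second < CFG.size() && "block out of range");
  if (First == Second)
    return true; // Nothing lies in between at block granularity.

  std::vector<std::size_t> WorkList(CFG[Second].Preds.begin(),
                                    CFG[Second].Preds.end());
  std::set<std::size_t> Visited(WorkList.begin(), WorkList.end());

  while (!WorkList.empty()) {
    std::size_t B = WorkList.back();
    WorkList.pop_back();
    if (B == First)
      continue; // Reached the starting block: stop walking this path.
    if (CFG[B].Clobbers)
      return false;
    // Queue each predecessor exactly once.
    for (std::size_t P : CFG[B].Preds)
      if (Visited.insert(P).second)
        WorkList.push_back(P);
  }
  return true;
}
```

The visited set is what keeps the walk finite in the presence of loops, mirroring the role of `Visited` in the listing.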
|
|
|
|
|
2011-11-05 18:48:42 +08:00
|
|
|
/// Find all blocks that will unconditionally lead to the block BB and append
|
|
|
|
/// them to Blocks.
|
2016-05-18 05:38:13 +08:00
|
|
|
static void findUnconditionalPreds(SmallVectorImpl<BasicBlock *> &Blocks,
|
2011-11-05 18:48:42 +08:00
|
|
|
BasicBlock *BB, DominatorTree *DT) {
|
2014-07-22 01:06:51 +08:00
|
|
|
for (pred_iterator I = pred_begin(BB), E = pred_end(BB); I != E; ++I) {
|
|
|
|
BasicBlock *Pred = *I;
|
2011-12-09 06:36:35 +08:00
|
|
|
if (Pred == BB) continue;
|
2018-10-15 18:04:59 +08:00
|
|
|
Instruction *PredTI = Pred->getTerminator();
|
2011-11-05 18:48:42 +08:00
|
|
|
if (PredTI->getNumSuccessors() != 1)
|
|
|
|
continue;
|
|
|
|
|
|
|
|
if (DT->isReachableFromEntry(Pred))
|
|
|
|
Blocks.push_back(Pred);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// Handle frees of entire structures whose dependency is a store
|
2010-11-30 09:28:33 +08:00
|
|
|
/// to a field of that structure.
|
2016-05-18 05:38:13 +08:00
|
|
|
static bool handleFree(CallInst *F, AliasAnalysis *AA,
|
|
|
|
MemoryDependenceResults *MD, DominatorTree *DT,
|
2016-07-23 02:27:24 +08:00
|
|
|
const TargetLibraryInfo *TLI,
|
2019-03-29 22:10:24 +08:00
|
|
|
InstOverlapIntervalsTy &IOL, OrderedBasicBlock &OBB) {
|
2011-06-15 08:47:34 +08:00
|
|
|
bool MadeChange = false;
|
|
|
|
|
2015-06-17 15:18:54 +08:00
|
|
|
MemoryLocation Loc = MemoryLocation(F->getOperand(0));
|
2011-11-05 18:48:42 +08:00
|
|
|
SmallVector<BasicBlock *, 16> Blocks;
|
|
|
|
Blocks.push_back(F->getParent());
|
2015-03-10 10:37:25 +08:00
|
|
|
const DataLayout &DL = F->getModule()->getDataLayout();
|
2011-06-15 08:47:34 +08:00
|
|
|
|
2011-11-05 18:48:42 +08:00
|
|
|
while (!Blocks.empty()) {
|
|
|
|
BasicBlock *BB = Blocks.pop_back_val();
|
|
|
|
Instruction *InstPt = BB->getTerminator();
|
|
|
|
if (BB == F->getParent()) InstPt = F;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2015-10-14 02:26:00 +08:00
|
|
|
MemDepResult Dep =
|
|
|
|
MD->getPointerDependencyFrom(Loc, false, InstPt->getIterator(), BB);
|
2011-11-05 18:48:42 +08:00
|
|
|
while (Dep.isDef() || Dep.isClobber()) {
|
|
|
|
Instruction *Dependency = Dep.getInst();
|
2018-01-21 09:44:33 +08:00
|
|
|
if (!hasAnalyzableMemoryWrite(Dependency, *TLI) ||
|
|
|
|
!isRemovable(Dependency))
|
2011-11-05 18:48:42 +08:00
|
|
|
break;
|
2008-01-20 18:49:23 +08:00
|
|
|
|
2011-11-05 18:48:42 +08:00
|
|
|
Value *DepPointer =
|
2015-03-10 10:37:25 +08:00
|
|
|
GetUnderlyingObject(getStoredPointerOperand(Dependency), DL);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2011-11-05 18:48:42 +08:00
|
|
|
// Check for aliasing.
|
|
|
|
if (!AA->isMustAlias(F->getArgOperand(0), DepPointer))
|
|
|
|
break;
|
2010-11-12 10:19:17 +08:00
|
|
|
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(
|
|
|
|
dbgs() << "DSE: Dead Store to soon to be freed memory:\n DEAD: "
|
|
|
|
<< *Dependency << '\n');
|
2016-07-20 00:50:57 +08:00
|
|
|
|
2016-06-11 01:59:22 +08:00
|
|
|
// DCE instructions only used to calculate that store.
|
2016-07-07 03:48:52 +08:00
|
|
|
BasicBlock::iterator BBI(Dependency);
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(Dependency, &BBI, *MD, *TLI, IOL, OBB);
|
2011-11-05 18:48:42 +08:00
|
|
|
++NumFastStores;
|
|
|
|
MadeChange = true;
|
|
|
|
|
|
|
|
// Inst's old Dependency is now deleted. Compute the next dependency,
|
|
|
|
// which may also be dead, as in
|
|
|
|
// s[0] = 0;
|
|
|
|
// s[1] = 0; // This has just been deleted.
|
|
|
|
// free(s);
|
2016-07-07 03:48:52 +08:00
|
|
|
Dep = MD->getPointerDependencyFrom(Loc, false, BBI, BB);
|
2011-11-05 18:48:42 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
if (Dep.isNonLocal())
|
2016-05-18 05:38:13 +08:00
|
|
|
findUnconditionalPreds(Blocks, BB, DT);
|
2011-11-05 18:48:42 +08:00
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2011-06-15 08:47:34 +08:00
|
|
|
return MadeChange;
|
2007-07-12 07:19:17 +08:00
|
|
|
}
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// Check to see if the specified location may alias any of the stack objects in
|
|
|
|
/// the DeadStackObjects set. If so, they become live because the location is
|
|
|
|
/// being loaded.
|
|
|
|
static void removeAccessedObjects(const MemoryLocation &LoadedLoc,
|
Add "const" in GetUnderlyingObjects. NFC
Summary:
Both the input Value pointer and the returned Value
pointers in GetUnderlyingObjects are now declared as
const.
It turned out that all current (in-tree) uses of
GetUnderlyingObjects were trivial to update, being
satisfied with have those Value pointers declared
as const. Actually, in the past several of the users
had to use const_cast, just because of ValueTracking
not providing a version of GetUnderlyingObjects with
"const" Value pointers. With this patch we get rid
of those const casts.
Reviewers: hfinkel, materi, jkorous
Reviewed By: jkorous
Subscribers: dexonsmith, jkorous, jholewinski, sdardis, eraman, hiraditya, jrtc27, atanasyan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D61038
llvm-svn: 359072
2019-04-24 14:55:50 +08:00
|
|
|
SmallSetVector<const Value *, 16> &DeadStackObjects,
|
2016-05-18 05:38:13 +08:00
|
|
|
const DataLayout &DL, AliasAnalysis *AA,
|
llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
2018-07-10 06:27:23 +08:00
|
|
|
const TargetLibraryInfo *TLI,
|
|
|
|
const Function *F) {
|
2016-05-18 05:38:13 +08:00
|
|
|
const Value *UnderlyingPointer = GetUnderlyingObject(LoadedLoc.Ptr, DL);
|
|
|
|
|
|
|
|
// A constant can't be in the dead pointer set.
|
|
|
|
if (isa<Constant>(UnderlyingPointer))
|
|
|
|
return;
|
|
|
|
|
|
|
|
// If the kill pointer can be easily reduced to an alloca, don't bother doing
|
|
|
|
// extraneous AA queries.
|
|
|
|
if (isa<AllocaInst>(UnderlyingPointer) || isa<Argument>(UnderlyingPointer)) {
|
2019-04-24 14:55:50 +08:00
|
|
|
DeadStackObjects.remove(UnderlyingPointer);
|
2016-05-18 05:38:13 +08:00
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Remove objects that could alias LoadedLoc.
|
2019-04-24 14:55:50 +08:00
|
|
|
DeadStackObjects.remove_if([&](const Value *I) {
|
2016-05-18 05:38:13 +08:00
|
|
|
// See if the loaded location could alias the stack location.
|
2018-07-10 06:27:23 +08:00
|
|
|
MemoryLocation StackLoc(I, getPointerSize(I, DL, *TLI, F));
|
2016-05-18 05:38:13 +08:00
|
|
|
return !AA->isNoAlias(StackLoc, LoadedLoc);
|
|
|
|
});
|
|
|
|
}
|
|
|
|
|
|
|
|
/// Remove dead stores to stack-allocated locations in the function end block.
|
|
|
|
/// Ex:
|
2007-08-09 01:50:09 +08:00
|
|
|
/// %A = alloca i32
|
|
|
|
/// ...
|
|
|
|
/// store i32 1, i32* %A
|
|
|
|
/// ret void
|
2016-05-18 05:38:13 +08:00
|
|
|
static bool handleEndBlock(BasicBlock &BB, AliasAnalysis *AA,
|
2019-03-29 22:10:24 +08:00
|
|
|
MemoryDependenceResults *MD,
|
|
|
|
const TargetLibraryInfo *TLI,
|
|
|
|
InstOverlapIntervalsTy &IOL,
|
|
|
|
OrderedBasicBlock &OBB) {
|
2007-07-13 05:41:30 +08:00
|
|
|
bool MadeChange = false;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 05:32:12 +08:00
|
|
|
// Keep track of all of the stack objects that are dead at the end of the
|
|
|
|
// function.
|
2019-04-24 14:55:50 +08:00
|
|
|
SmallSetVector<const Value*, 16> DeadStackObjects;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2008-11-28 08:27:14 +08:00
|
|
|
// Find all of the alloca'd pointers in the entry block.
|
2015-10-14 02:26:00 +08:00
|
|
|
BasicBlock &Entry = BB.getParent()->front();
|
|
|
|
for (Instruction &I : Entry) {
|
|
|
|
if (isa<AllocaInst>(&I))
|
|
|
|
DeadStackObjects.insert(&I);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2011-10-23 05:59:35 +08:00
|
|
|
// Okay, so these are dead heap objects, but if the pointer never escapes
|
|
|
|
// then it's leaked by this function anyways.
|
2015-10-14 02:26:00 +08:00
|
|
|
else if (isAllocLikeFn(&I, TLI) && !PointerMayBeCaptured(&I, true, true))
|
|
|
|
DeadStackObjects.insert(&I);
|
2011-10-23 05:59:35 +08:00
|
|
|
}
|
|
|
|
|
2014-01-28 10:38:36 +08:00
|
|
|
// Treat byval or inalloca arguments the same way; stores to them are dead at the
|
|
|
|
// end of the function.
|
2015-10-14 02:26:00 +08:00
|
|
|
for (Argument &AI : BB.getParent()->args())
|
|
|
|
if (AI.hasByValOrInAllocaAttr())
|
|
|
|
DeadStackObjects.insert(&AI);
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2015-03-10 10:37:25 +08:00
|
|
|
const DataLayout &DL = BB.getModule()->getDataLayout();
|
|
|
|
|
2007-07-13 05:41:30 +08:00
|
|
|
// Scan the basic block backwards
|
|
|
|
for (BasicBlock::iterator BBI = BB.end(); BBI != BB.begin(); ){
|
|
|
|
--BBI;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2015-08-20 16:58:47 +08:00
|
|
|
// If we find a store, check to see if it points into a dead stack value.
|
2018-01-21 09:44:33 +08:00
|
|
|
if (hasAnalyzableMemoryWrite(&*BBI, *TLI) && isRemovable(&*BBI)) {
|
2010-12-01 03:48:15 +08:00
|
|
|
// See through pointer-to-pointer bitcasts
|
2019-04-24 14:55:50 +08:00
|
|
|
SmallVector<const Value *, 4> Pointers;
|
2015-10-14 02:26:00 +08:00
|
|
|
GetUnderlyingObjects(getStoredPointerOperand(&*BBI), Pointers, DL);
|
2010-12-01 03:48:15 +08:00
|
|
|
|
2010-12-01 05:58:14 +08:00
|
|
|
// Stores to stack values are valid candidates for removal.
|
2012-05-11 02:57:38 +08:00
|
|
|
bool AllDead = true;
|
2019-04-24 14:55:50 +08:00
|
|
|
for (const Value *Pointer : Pointers)
|
2016-06-26 20:28:59 +08:00
|
|
|
if (!DeadStackObjects.count(Pointer)) {
|
2012-05-11 02:57:38 +08:00
|
|
|
AllDead = false;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (AllDead) {
|
2016-07-07 03:48:52 +08:00
|
|
|
Instruction *Dead = &*BBI;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Dead Store at End of Block:\n DEAD: "
|
|
|
|
<< *Dead << "\n Objects: ";
|
2019-04-24 14:55:50 +08:00
|
|
|
for (SmallVectorImpl<const Value *>::iterator I =
|
|
|
|
Pointers.begin(),
|
2018-05-14 20:53:11 +08:00
|
|
|
E = Pointers.end();
|
|
|
|
I != E; ++I) {
|
|
|
|
dbgs() << **I;
|
|
|
|
if (std::next(I) != E)
|
|
|
|
dbgs() << ", ";
|
|
|
|
} dbgs()
|
|
|
|
<< '\n');
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-07 05:13:51 +08:00
|
|
|
// DCE instructions only used to calculate that store.
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(Dead, &BBI, *MD, *TLI, IOL, OBB,
|
|
|
|
&DeadStackObjects);
|
2010-12-01 03:48:15 +08:00
|
|
|
++NumFastStores;
|
|
|
|
MadeChange = true;
|
2011-08-31 05:11:06 +08:00
|
|
|
continue;
|
2010-12-01 03:48:15 +08:00
|
|
|
}
|
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 03:48:15 +08:00
|
|
|
// Remove any dead non-memory-mutating instructions.
|
2015-10-14 02:26:00 +08:00
|
|
|
if (isInstructionTriviallyDead(&*BBI, TLI)) {
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Removing trivially dead instruction:\n DEAD: "
|
|
|
|
<< *&*BBI << '\n');
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(&*BBI, &BBI, *MD, *TLI, IOL, OBB,
|
|
|
|
&DeadStackObjects);
|
2010-12-01 03:48:15 +08:00
|
|
|
++NumFastOther;
|
|
|
|
MadeChange = true;
|
|
|
|
continue;
|
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2012-08-08 10:17:32 +08:00
|
|
|
if (isa<AllocaInst>(BBI)) {
|
|
|
|
// Remove allocas from the list of dead stack objects; there can't be
|
|
|
|
// any references before the definition.
|
2015-10-14 02:26:00 +08:00
|
|
|
DeadStackObjects.remove(&*BBI);
|
2012-05-11 01:14:00 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2019-01-07 13:42:51 +08:00
|
|
|
if (auto *Call = dyn_cast<CallBase>(&*BBI)) {
|
2016-05-18 05:38:13 +08:00
|
|
|
// Remove allocation function calls from the list of dead stack objects;
|
2012-08-08 10:17:32 +08:00
|
|
|
// there can't be any references before the definition.
|
2015-10-14 02:26:00 +08:00
|
|
|
if (isAllocLikeFn(&*BBI, TLI))
|
|
|
|
DeadStackObjects.remove(&*BBI);
|
2012-08-08 10:17:32 +08:00
|
|
|
|
2010-12-01 03:48:15 +08:00
|
|
|
// If this call does not access memory, it can't be loading any of our
|
|
|
|
// pointers.
|
2019-01-07 13:42:51 +08:00
|
|
|
if (AA->doesNotAccessMemory(Call))
|
2007-08-09 01:58:56 +08:00
|
|
|
continue;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 05:18:46 +08:00
|
|
|
// If the call might load from any of our allocas, then any store above
|
|
|
|
// the call is live.
|
2019-04-24 14:55:50 +08:00
|
|
|
DeadStackObjects.remove_if([&](const Value *I) {
|
2014-03-01 19:47:00 +08:00
|
|
|
// See if the call site touches the value.
|
2019-01-07 13:42:51 +08:00
|
|
|
return isRefSet(AA->getModRefInfo(
|
|
|
|
Call, I, getPointerSize(I, DL, *TLI, BB.getParent())));
|
2014-03-04 03:28:52 +08:00
|
|
|
});
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 05:18:46 +08:00
|
|
|
// If all of the allocas were clobbered by the call then we're not going
|
|
|
|
// to find anything else to process.
|
2012-10-14 18:21:31 +08:00
|
|
|
if (DeadStackObjects.empty())
|
2012-08-08 10:17:32 +08:00
|
|
|
break;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2007-07-13 05:41:30 +08:00
|
|
|
continue;
|
2010-12-01 05:18:46 +08:00
|
|
|
}
|
2011-07-27 09:08:30 +08:00
|
|
|
|
2016-07-08 04:51:42 +08:00
|
|
|
// We can remove the dead stores, irrespective of the fence and its ordering
|
|
|
|
// (release/acquire/seq_cst). Fences only constrain the ordering of
// already visible stores; they do not make a store visible to other
|
|
|
|
// threads. So, skipping over a fence does not change a store from being
|
|
|
|
// dead.
|
|
|
|
if (isa<FenceInst>(*BBI))
|
|
|
|
continue;
|
|
|
|
|
2015-06-17 15:18:54 +08:00
|
|
|
MemoryLocation LoadedLoc;
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2010-12-01 05:18:46 +08:00
|
|
|
// If we encounter a use of the pointer, it is no longer considered dead
|
|
|
|
if (LoadInst *L = dyn_cast<LoadInst>(BBI)) {
|
2011-08-18 06:22:24 +08:00
|
|
|
if (!L->isUnordered()) // Be conservative with atomic/volatile load
|
|
|
|
break;
|
2015-06-04 10:03:15 +08:00
|
|
|
LoadedLoc = MemoryLocation::get(L);
|
2010-12-01 05:18:46 +08:00
|
|
|
} else if (VAArgInst *V = dyn_cast<VAArgInst>(BBI)) {
|
2015-06-04 10:03:15 +08:00
|
|
|
LoadedLoc = MemoryLocation::get(V);
|
2011-09-07 02:14:09 +08:00
|
|
|
} else if (!BBI->mayReadFromMemory()) {
|
|
|
|
// Instruction doesn't read memory. Note that stores that weren't removed
|
|
|
|
// above will hit this case.
|
2008-11-28 08:27:14 +08:00
|
|
|
continue;
|
2011-07-27 09:08:30 +08:00
|
|
|
} else {
|
|
|
|
// Unknown inst; assume it clobbers everything.
|
|
|
|
break;
|
2007-07-13 05:41:30 +08:00
|
|
|
}
|
2008-10-01 23:25:41 +08:00
|
|
|
|
2010-12-01 05:32:12 +08:00
|
|
|
// Remove any allocas from the DeadStackObjects set that are loaded, as this
|
|
|
|
// makes any stores above the access live.
|
2018-07-10 06:27:23 +08:00
|
|
|
removeAccessedObjects(LoadedLoc, DeadStackObjects, DL, AA, TLI, BB.getParent());
|
2008-10-01 23:25:41 +08:00
|
|
|
|
2010-12-01 05:32:12 +08:00
|
|
|
// If all of the allocas were clobbered by the access then we're not going
|
|
|
|
// to find anything else to process.
|
|
|
|
if (DeadStackObjects.empty())
|
|
|
|
break;
|
2007-07-13 05:41:30 +08:00
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2007-07-13 05:41:30 +08:00
|
|
|
return MadeChange;
|
|
|
|
}
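// Illustrative sketch only, not part of the pass: the backward scan of
// handleEndBlock on a toy instruction stream. Op, ToyInst, deadStoresAtEnd
// and sampleBlock are hypothetical names. Objects are plain integers, and
// aliasing, calls, fences and atomics are assumed away.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

enum class Op { Store, Load };

// Hypothetical toy instruction: a store to or a load from one object id.
struct ToyInst {
  Op Kind;
  int Object;
};

// Scans the block backward and returns the indices of stores whose object
// is still in DeadObjects, i.e. is not read again before the function ends.
std::vector<std::size_t> deadStoresAtEnd(const std::vector<ToyInst> &Block,
                                         std::set<int> DeadObjects) {
  std::vector<std::size_t> Dead;
  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &Inst = Block[I];
    if (Inst.Kind == Op::Store) {
      if (DeadObjects.count(Inst.Object))
        Dead.push_back(I);
      continue;
    }
    // A load makes every earlier store to the same object live.
    DeadObjects.erase(Inst.Object);
    // Nothing left to track, so no earlier store can be proven dead.
    if (DeadObjects.empty())
      break;
  }
  return Dead;
}

// Hypothetical block over two stack objects, 0 and 1.
std::vector<ToyInst> sampleBlock() {
  return {
      {Op::Store, 0}, // index 0: live, read by the load at index 1
      {Op::Load, 0},  // index 1
      {Op::Store, 0}, // index 2: dead, never read again
      {Op::Store, 1}, // index 3: dead, never read
  };
}
```

// Usage: deadStoresAtEnd(sampleBlock(), {0, 1}) yields indices 3 and 2, in
// the order the backward scan encounters them.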
|
|
|
|
|
2016-07-23 02:27:24 +08:00
|
|
|
static bool tryToShorten(Instruction *EarlierWrite, int64_t &EarlierOffset,
|
|
|
|
int64_t &EarlierSize, int64_t LaterOffset,
|
|
|
|
int64_t LaterSize, bool IsOverwriteEnd) {
|
|
|
|
// TODO: base this on the target vector size so that if the earlier
// store was too small to get vector writes anyway then it's likely
// a good idea to shorten it.
// Power-of-2 vector writes are probably always a bad idea to optimize,
// as any store/memset/memcpy is likely using vector instructions, so
// shortening it to a non-vector size is likely to be slower.
|
2018-05-10 23:12:49 +08:00
|
|
|
auto *EarlierIntrinsic = cast<AnyMemIntrinsic>(EarlierWrite);
|
[DSE] Upgrade uses of MemoryIntrinic::getAlignment() to new API. (NFC)
Summary:
This change is part of step five in the series of changes to remove alignment argument from
memcpy/memmove/memset in favour of alignment attributes. In particular, this changes the
DeadStoreElimination pass to cease using the old getAlignment() API of MemoryIntrinsic
in favour of getting dest specific alignments through the new API.
Steps:
Step 1) Remove alignment parameter and create alignment parameter attributes for
memcpy/memmove/memset. ( rL322965, rC322964, rL322963 )
Step 2) Expand the IRBuilder API to allow creation of memcpy/memmove with differing
source and dest alignments. ( rL323597 )
Step 3) Update Clang to use the new IRBuilder API. ( rC323617 )
Step 4) Update Polly to use the new IRBuilder API. ( rL323618 )
Step 5) Update LLVM passes that create memcpy/memmove calls to use the new IRBuilder API,
and those that use use MemIntrinsicInst::[get|set]Alignment() to use [get|set]DestAlignment()
and [get|set]SourceAlignment() instead. ( rL323886, rL323891, rL324148, rL324273, rL324278,
rL324384, rL324395 )
Step 6) Remove the single-alignment IRBuilder API for memcpy/memmove, and the
MemIntrinsicInst::[get|set]Alignment() methods.
Reference
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089384.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
llvm-svn: 324402
2018-02-07 05:18:33 +08:00
|
|
|
unsigned EarlierWriteAlign = EarlierIntrinsic->getDestAlignment();
|
2016-07-23 02:27:24 +08:00
|
|
|
if (!IsOverwriteEnd)
|
|
|
|
LaterOffset = int64_t(LaterOffset + LaterSize);
|
|
|
|
|
2017-10-14 05:17:07 +08:00
|
|
|
if (!(isPowerOf2_64(LaterOffset) && EarlierWriteAlign <= LaterOffset) &&
|
2016-07-23 02:27:24 +08:00
|
|
|
!((EarlierWriteAlign != 0) && LaterOffset % EarlierWriteAlign == 0))
|
|
|
|
return false;
|
|
|
|
|
2018-05-10 23:12:49 +08:00
|
|
|
int64_t NewLength = IsOverwriteEnd
|
|
|
|
? LaterOffset - EarlierOffset
|
|
|
|
: EarlierSize - (LaterOffset - EarlierOffset);
|
|
|
|
|
|
|
|
if (auto *AMI = dyn_cast<AtomicMemIntrinsic>(EarlierWrite)) {
|
|
|
|
// When shortening an atomic memory intrinsic, the newly shortened
|
|
|
|
// length must remain an integer multiple of the element size.
|
|
|
|
const uint32_t ElementSize = AMI->getElementSizeInBytes();
|
|
|
|
if (0 != NewLength % ElementSize)
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n OW "
|
|
|
|
<< (IsOverwriteEnd ? "END" : "BEGIN") << ": "
|
|
|
|
<< *EarlierWrite << "\n KILLER (offset " << LaterOffset
|
|
|
|
<< ", " << EarlierSize << ")\n");
|
2016-07-23 02:27:24 +08:00
|
|
|
|
|
|
|
Value *EarlierWriteLength = EarlierIntrinsic->getLength();
|
|
|
|
Value *TrimmedLength =
|
|
|
|
ConstantInt::get(EarlierWriteLength->getType(), NewLength);
|
|
|
|
EarlierIntrinsic->setLength(TrimmedLength);
|
|
|
|
|
|
|
|
EarlierSize = NewLength;
|
|
|
|
if (!IsOverwriteEnd) {
|
|
|
|
int64_t OffsetMoved = (LaterOffset - EarlierOffset);
|
|
|
|
Value *Indices[1] = {
|
|
|
|
ConstantInt::get(EarlierWriteLength->getType(), OffsetMoved)};
|
|
|
|
GetElementPtrInst *NewDestGEP = GetElementPtrInst::CreateInBounds(
|
2019-02-02 04:44:47 +08:00
|
|
|
EarlierIntrinsic->getRawDest()->getType()->getPointerElementType(),
|
2016-07-23 02:27:24 +08:00
|
|
|
EarlierIntrinsic->getRawDest(), Indices, "", EarlierWrite);
|
2019-04-12 17:47:35 +08:00
|
|
|
NewDestGEP->setDebugLoc(EarlierIntrinsic->getDebugLoc());
|
2016-07-23 02:27:24 +08:00
|
|
|
EarlierIntrinsic->setDest(NewDestGEP);
|
|
|
|
EarlierOffset = EarlierOffset + OffsetMoved;
|
|
|
|
}
|
|
|
|
return true;
|
|
|
|
}
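// Illustrative sketch only, not part of the pass: the offset and length
// arithmetic of tryToShorten for a non-atomic write, without any IR.
// ShortenResult, isPowerOf2 and shortenBounds are hypothetical names. Align
// stands in for the destination alignment, with 0 meaning unknown, and
// negative offsets are simply rejected by the power-of-two test here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the bounds of the earlier write after trimming.
struct ShortenResult {
  bool Shortened;
  int64_t NewOffset;
  int64_t NewSize;
};

bool isPowerOf2(int64_t V) { return V > 0 && (V & (V - 1)) == 0; }

ShortenResult shortenBounds(int64_t EarlierOffset, int64_t EarlierSize,
                            int64_t LaterOffset, int64_t LaterSize,
                            int64_t Align, bool IsOverwriteEnd) {
  assert(EarlierSize >= 0 && LaterSize >= 0);
  // When trimming the front, the cut point is the end of the later write.
  if (!IsOverwriteEnd)
    LaterOffset += LaterSize;

  // Accept the cut point if it is a power of two no smaller than the
  // alignment, or a multiple of a known alignment.
  bool Pow2Ok = isPowerOf2(LaterOffset) && Align <= LaterOffset;
  bool AlignOk = Align != 0 && LaterOffset % Align == 0;
  if (!Pow2Ok && !AlignOk)
    return {false, EarlierOffset, EarlierSize};

  int64_t NewLength = IsOverwriteEnd
                          ? LaterOffset - EarlierOffset
                          : EarlierSize - (LaterOffset - EarlierOffset);
  // Trimming the front moves the start of the earlier write to the cut point.
  int64_t NewOffset = IsOverwriteEnd ? EarlierOffset : LaterOffset;
  return {true, NewOffset, NewLength};
}
```

// Usage: a 32-byte write at offset 0 whose last 16 bytes are overwritten
// shrinks to 16 bytes: shortenBounds(0, 32, 16, 16, 8, true). If its first
// 8 bytes are overwritten instead, it becomes 24 bytes starting at offset 8:
// shortenBounds(0, 32, 0, 8, 4, false).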
|
|
|
|
|
|
|
|
static bool tryToShortenEnd(Instruction *EarlierWrite,
|
|
|
|
OverlapIntervalsTy &IntervalMap,
|
|
|
|
int64_t &EarlierStart, int64_t &EarlierSize) {
|
|
|
|
if (IntervalMap.empty() || !isShortenableAtTheEnd(EarlierWrite))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
OverlapIntervalsTy::iterator OII = --IntervalMap.end();
|
|
|
|
int64_t LaterStart = OII->second;
|
|
|
|
int64_t LaterSize = OII->first - LaterStart;
|
|
|
|
|
|
|
|
if (LaterStart > EarlierStart && LaterStart < EarlierStart + EarlierSize &&
|
|
|
|
LaterStart + LaterSize >= EarlierStart + EarlierSize) {
|
|
|
|
if (tryToShorten(EarlierWrite, EarlierStart, EarlierSize, LaterStart,
|
|
|
|
LaterSize, true)) {
|
|
|
|
IntervalMap.erase(OII);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
static bool tryToShortenBegin(Instruction *EarlierWrite,
|
|
|
|
OverlapIntervalsTy &IntervalMap,
|
|
|
|
int64_t &EarlierStart, int64_t &EarlierSize) {
|
|
|
|
if (IntervalMap.empty() || !isShortenableAtTheBeginning(EarlierWrite))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
OverlapIntervalsTy::iterator OII = IntervalMap.begin();
|
|
|
|
int64_t LaterStart = OII->second;
|
|
|
|
int64_t LaterSize = OII->first - LaterStart;
|
|
|
|
|
|
|
|
if (LaterStart <= EarlierStart && LaterStart + LaterSize > EarlierStart) {
|
|
|
|
assert(LaterStart + LaterSize < EarlierStart + EarlierSize &&
|
2017-03-29 22:42:27 +08:00
|
|
|
"Should have been handled as OW_Complete");
|
2016-07-23 02:27:24 +08:00
|
|
|
if (tryToShorten(EarlierWrite, EarlierStart, EarlierSize, LaterStart,
|
|
|
|
LaterSize, false)) {
|
|
|
|
IntervalMap.erase(OII);
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
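// Illustrative sketch only, not part of the pass: how tryToShortenEnd and
// tryToShortenBegin pick an interval and test it against the earlier write.
// It assumes OverlapIntervalsTy behaves like std::map<int64_t, int64_t>
// mapping each interval's end offset to its start offset; IntervalMap,
// lastIntervalCoversTail and firstIntervalCoversHead are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Assumed layout: key = end offset of a later write, value = start offset.
using IntervalMap = std::map<int64_t, int64_t>;

// True if the interval with the largest end offset starts strictly inside
// the earlier write and reaches at least its end.
bool lastIntervalCoversTail(const IntervalMap &M, int64_t EarlierStart,
                            int64_t EarlierSize) {
  assert(EarlierSize > 0);
  if (M.empty())
    return false;
  auto It = std::prev(M.end());
  int64_t LaterStart = It->second;
  int64_t LaterEnd = It->first;
  int64_t EarlierEnd = EarlierStart + EarlierSize;
  return LaterStart > EarlierStart && LaterStart < EarlierEnd &&
         LaterEnd >= EarlierEnd;
}

// True if the interval with the smallest end offset starts at or before the
// earlier write and ends strictly after its start.
bool firstIntervalCoversHead(const IntervalMap &M, int64_t EarlierStart) {
  if (M.empty())
    return false;
  auto It = M.begin();
  int64_t LaterStart = It->second;
  int64_t LaterEnd = It->first;
  return LaterStart <= EarlierStart && LaterEnd > EarlierStart;
}
```

// Usage: for an earlier write covering [0, 32), the map {{8, -4}, {40, 24}}
// holds one later write over the head, [-4, 8), and one over the tail,
// [24, 40), so both predicates return true.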
|
|
|
|
|
|
|
|
static bool removePartiallyOverlappedStores(AliasAnalysis *AA,
|
|
|
|
const DataLayout &DL,
|
|
|
|
InstOverlapIntervalsTy &IOL) {
|
|
|
|
bool Changed = false;
|
|
|
|
for (auto OI : IOL) {
|
|
|
|
Instruction *EarlierWrite = OI.first;
|
2018-01-21 10:10:54 +08:00
|
|
|
MemoryLocation Loc = getLocForWrite(EarlierWrite);
|
2016-07-23 02:27:24 +08:00
|
|
|
assert(isRemovable(EarlierWrite) && "Expect only removable instruction");
|
|
|
|
|
|
|
|
const Value *Ptr = Loc.Ptr->stripPointerCasts();
|
|
|
|
int64_t EarlierStart = 0;
|
2018-10-09 11:18:56 +08:00
|
|
|
int64_t EarlierSize = int64_t(Loc.Size.getValue());
|
2016-07-23 02:27:24 +08:00
|
|
|
GetPointerBaseWithConstantOffset(Ptr, EarlierStart, DL);
|
|
|
|
OverlapIntervalsTy &IntervalMap = OI.second;
|
2016-07-28 01:25:20 +08:00
|
|
|
Changed |=
|
2016-07-23 02:27:24 +08:00
|
|
|
tryToShortenEnd(EarlierWrite, IntervalMap, EarlierStart, EarlierSize);
|
|
|
|
if (IntervalMap.empty())
|
|
|
|
continue;
|
|
|
|
Changed |=
|
|
|
|
tryToShortenBegin(EarlierWrite, IntervalMap, EarlierStart, EarlierSize);
|
|
|
|
}
|
|
|
|
return Changed;
|
|
|
|
}
|
|
|
|
|
2016-07-09 00:48:40 +08:00
|
|
|
static bool eliminateNoopStore(Instruction *Inst, BasicBlock::iterator &BBI,
|
|
|
|
AliasAnalysis *AA, MemoryDependenceResults *MD,
|
|
|
|
const DataLayout &DL,
|
2016-07-23 02:27:24 +08:00
|
|
|
const TargetLibraryInfo *TLI,
|
2016-08-12 09:09:53 +08:00
|
|
|
InstOverlapIntervalsTy &IOL,
|
2019-03-29 22:10:24 +08:00
|
|
|
OrderedBasicBlock &OBB) {
|
2016-07-09 00:48:40 +08:00
|
|
|
// Must be a store instruction.
|
|
|
|
StoreInst *SI = dyn_cast<StoreInst>(Inst);
|
|
|
|
if (!SI)
|
|
|
|
return false;
|
|
|
|
|
|
|
|
// If we're storing the same value back to a pointer that we just loaded from,
|
|
|
|
// then the store can be removed.
|
|
|
|
if (LoadInst *DepLoad = dyn_cast<LoadInst>(SI->getValueOperand())) {
|
|
|
|
if (SI->getPointerOperand() == DepLoad->getPointerOperand() &&
|
|
|
|
isRemovable(SI) && memoryIsNotModifiedBetween(DepLoad, SI, AA)) {
|
|
|
|
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(
|
|
|
|
dbgs() << "DSE: Remove Store Of Load from same pointer:\n LOAD: "
|
|
|
|
<< *DepLoad << "\n STORE: " << *SI << '\n');
|
2016-07-09 00:48:40 +08:00
|
|
|
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(SI, &BBI, *MD, *TLI, IOL, OBB);
|
2016-07-09 00:48:40 +08:00
|
|
|
++NumRedundantStores;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Remove null stores into the calloc'ed objects
|
|
|
|
Constant *StoredConstant = dyn_cast<Constant>(SI->getValueOperand());
|
|
|
|
if (StoredConstant && StoredConstant->isNullValue() && isRemovable(SI)) {
|
|
|
|
Instruction *UnderlyingPointer =
|
|
|
|
dyn_cast<Instruction>(GetUnderlyingObject(SI->getPointerOperand(), DL));
|
|
|
|
|
|
|
|
if (UnderlyingPointer && isCallocLikeFn(UnderlyingPointer, TLI) &&
|
|
|
|
memoryIsNotModifiedBetween(UnderlyingPointer, SI, AA)) {
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(
|
2016-07-09 00:48:40 +08:00
|
|
|
dbgs() << "DSE: Remove null store to the calloc'ed object:\n DEAD: "
|
|
|
|
<< *Inst << "\n OBJECT: " << *UnderlyingPointer << '\n');
|
|
|
|
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(SI, &BBI, *MD, *TLI, IOL, OBB);
|
2016-07-09 00:48:40 +08:00
|
|
|
++NumRedundantStores;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
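The first case handled by `eliminateNoopStore` (storing back a value that was just loaded from the same pointer) can be illustrated on a toy instruction list. This is a sketch under strong assumptions: `Op`, `OpKind` and `isNoopStore` are hypothetical, addresses are exact integers so there is no aliasing, and each load defines a unique value id. The real pass relies on alias analysis via `memoryIsNotModifiedBetween` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class OpKind { Load, Store };

// Hypothetical toy instruction. A Load defines value id `Value` by reading
// `Addr`; a Store writes value id `Value` to `Addr`.
struct Op {
  OpKind Kind;
  int Addr;
  int Value;
};

// Returns true if Ops[StoreIdx] writes back a value that was loaded from the
// same address, with no store to that address between the load and the store.
bool isNoopStore(const std::vector<Op> &Ops, std::size_t StoreIdx) {
  assert(StoreIdx < Ops.size() && Ops[StoreIdx].Kind == OpKind::Store);
  const Op &S = Ops[StoreIdx];
  // Walk backwards from the store towards the start of the block.
  for (std::size_t I = StoreIdx; I-- > 0;) {
    const Op &Prev = Ops[I];
    // An intervening store to the same address changes the memory.
    if (Prev.Kind == OpKind::Store && Prev.Addr == S.Addr)
      return false;
    // Found the load that defines the stored value.
    if (Prev.Kind == OpKind::Load && Prev.Value == S.Value)
      return Prev.Addr == S.Addr;
  }
  return false;
}
```

A store to an unrelated address between the load and the store does not block the removal in this model, whereas a store to the same address does.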
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
static bool eliminateDeadStores(BasicBlock &BB, AliasAnalysis *AA,
|
|
|
|
MemoryDependenceResults *MD, DominatorTree *DT,
|
|
|
|
const TargetLibraryInfo *TLI) {
|
|
|
|
const DataLayout &DL = BB.getModule()->getDataLayout();
|
|
|
|
bool MadeChange = false;
|
2010-12-01 05:32:12 +08:00
|
|
|
|
2019-03-29 22:10:24 +08:00
|
|
|
OrderedBasicBlock OBB(&BB);
|
|
|
|
Instruction *LastThrowing = nullptr;
|
2016-08-12 09:09:53 +08:00
|
|
|
|
Allow DeadStoreElimination to track combinations of partial later writes
DeadStoreElimination can currently remove a small store rendered unnecessary by
a later larger one, but could not remove a larger store rendered unnecessary by
a series of later smaller ones. This adds that capability.
It works by keeping a map, which is used as an effective interval map, for each
store later overwritten only partially, and filling in that interval map as
more such stores are discovered. No additional walking or aliasing queries are
used. If the map forms an interval covering the entire earlier store, then
it is dead and can be removed. The map is used as an interval map by storing a
mapping between the ending offset and the beginning offset of each interval.
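The end-to-start mapping described above can be sketched with a plain `std::map`. This is an illustration only, not the pass's actual code: `IntervalMap`, `addLaterWrite` and `earlierStoreIsDead` are hypothetical names, and offsets are treated as half-open byte ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the per-store interval map:
// key = end offset of a later write, value = its start offset.
using IntervalMap = std::map<int64_t, int64_t>;

// Record a later write covering [Start, End), merging it with any recorded
// intervals that it overlaps or touches.
void addLaterWrite(IntervalMap &IM, int64_t Start, int64_t End) {
  assert(Start < End && "empty interval");
  // The first interval whose end is >= Start is the first merge candidate.
  auto It = IM.lower_bound(Start);
  while (It != IM.end() && It->second <= End) {
    Start = std::min(Start, It->second);
    End = std::max(End, It->first);
    It = IM.erase(It);
  }
  IM[End] = Start;
}

// True when the recorded later writes fully cover the earlier store that
// occupies [EarlierStart, EarlierStart + EarlierSize).
bool earlierStoreIsDead(const IntervalMap &IM, int64_t EarlierStart,
                        int64_t EarlierSize) {
  int64_t EarlierEnd = EarlierStart + EarlierSize;
  // Merged intervals are disjoint, so a single one must span the store.
  auto It = IM.lower_bound(EarlierEnd);
  return It != IM.end() && It->second <= EarlierStart;
}
```

With an 8-byte earlier store at offset 0, recording later writes over [0, 4) and [4, 8) merges them into one interval, at which point the earlier store would be reported dead.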
I discovered this problem when investigating a performance issue with code like
this on PowerPC:
#include <complex>
using namespace std;
complex<float> bar(complex<float> C);
complex<float> foo(complex<float> C) {
return bar(C)*C;
}
which produces this:
define void @_Z4testSt7complexIfE(%"struct.std::complex"* noalias nocapture sret %agg.result, i64 %c.coerce) {
entry:
%ref.tmp = alloca i64, align 8
%tmpcast = bitcast i64* %ref.tmp to %"struct.std::complex"*
%c.sroa.0.0.extract.shift = lshr i64 %c.coerce, 32
%c.sroa.0.0.extract.trunc = trunc i64 %c.sroa.0.0.extract.shift to i32
%0 = bitcast i32 %c.sroa.0.0.extract.trunc to float
%c.sroa.2.0.extract.trunc = trunc i64 %c.coerce to i32
%1 = bitcast i32 %c.sroa.2.0.extract.trunc to float
call void @_Z3barSt7complexIfE(%"struct.std::complex"* nonnull sret %tmpcast, i64 %c.coerce)
%2 = bitcast %"struct.std::complex"* %agg.result to i64*
%3 = load i64, i64* %ref.tmp, align 8
store i64 %3, i64* %2, align 4 ; <--- ***** THIS SHOULD NOT BE HERE ****
%_M_value.realp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 0
%4 = lshr i64 %3, 32
%5 = trunc i64 %4 to i32
%6 = bitcast i32 %5 to float
%_M_value.imagp.i.i = getelementptr inbounds %"struct.std::complex", %"struct.std::complex"* %agg.result, i64 0, i32 0, i32 1
%7 = trunc i64 %3 to i32
%8 = bitcast i32 %7 to float
%mul_ad.i.i = fmul fast float %6, %1
%mul_bc.i.i = fmul fast float %8, %0
%mul_i.i.i = fadd fast float %mul_ad.i.i, %mul_bc.i.i
%mul_ac.i.i = fmul fast float %6, %0
%mul_bd.i.i = fmul fast float %8, %1
%mul_r.i.i = fsub fast float %mul_ac.i.i, %mul_bd.i.i
store float %mul_r.i.i, float* %_M_value.realp.i.i, align 4
store float %mul_i.i.i, float* %_M_value.imagp.i.i, align 4
ret void
}
The problem here is not just that the i64 store is unnecessary, but also that
it blocks further backend optimizations of the other uses of that i64 value.
In the future, we might want to add a special case for handling smaller
accesses (e.g. using a bit vector) if the map mechanism turns out to be
noticeably inefficient. A sorted vector is also a possible replacement for the
map for small numbers of tracked intervals.
Differential Revision: http://reviews.llvm.org/D18586
llvm-svn: 273559
2016-06-23 21:46:39 +08:00
|
|
|
// A map of interval maps representing partially-overwritten value parts.
|
|
|
|
InstOverlapIntervalsTy IOL;
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
// Do a top-down walk on the BB.
|
|
|
|
for (BasicBlock::iterator BBI = BB.begin(), BBE = BB.end(); BBI != BBE; ) {
|
|
|
|
// Handle 'free' calls specially.
|
2016-07-07 03:48:52 +08:00
|
|
|
if (CallInst *F = isFreeCall(&*BBI, TLI)) {
|
2019-03-29 22:10:24 +08:00
|
|
|
MadeChange |= handleFree(F, AA, MD, DT, TLI, IOL, OBB);
|
2016-07-07 03:48:52 +08:00
|
|
|
// Increment BBI after handleFree has potentially deleted instructions.
|
|
|
|
// This ensures we maintain a valid iterator.
|
|
|
|
++BBI;
|
2016-05-18 05:38:13 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2016-07-07 03:48:52 +08:00
|
|
|
Instruction *Inst = &*BBI++;
|
|
|
|
|
2016-08-12 09:09:53 +08:00
|
|
|
if (Inst->mayThrow()) {
|
2019-03-29 22:10:24 +08:00
|
|
|
LastThrowing = Inst;
|
2016-08-12 09:09:53 +08:00
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2016-07-09 00:48:40 +08:00
|
|
|
// Check to see if Inst writes to memory. If not, continue.
|
2018-01-21 09:44:33 +08:00
|
|
|
if (!hasAnalyzableMemoryWrite(Inst, *TLI))
|
2016-05-18 05:38:13 +08:00
|
|
|
continue;
|
|
|
|
|
2016-07-09 00:48:40 +08:00
|
|
|
// eliminateNoopStore will update the iterator, if necessary.
|
2019-03-29 22:10:24 +08:00
|
|
|
if (eliminateNoopStore(Inst, BBI, AA, MD, DL, TLI, IOL, OBB)) {
|
2016-07-09 00:48:40 +08:00
|
|
|
MadeChange = true;
|
|
|
|
continue;
|
2016-05-18 05:38:13 +08:00
|
|
|
}
|
|
|
|
|
2016-07-09 00:48:40 +08:00
|
|
|
// If we find something that writes memory, get its memory dependence.
|
2019-03-29 22:10:24 +08:00
|
|
|
MemDepResult InstDep = MD->getDependency(Inst, &OBB);
|
2016-05-18 05:38:13 +08:00
|
|
|
|
|
|
|
// Ignore any store where we can't find a local dependence.
|
|
|
|
// FIXME: cross-block DSE would be fun. :)
|
|
|
|
if (!InstDep.isDef() && !InstDep.isClobber())
|
|
|
|
continue;
|
|
|
|
|
|
|
|
// Figure out what location is being stored to.
|
2018-01-21 10:10:54 +08:00
|
|
|
MemoryLocation Loc = getLocForWrite(Inst);
|
2016-05-18 05:38:13 +08:00
|
|
|
|
|
|
|
// If we didn't get a useful location, fail.
|
|
|
|
if (!Loc.Ptr)
|
|
|
|
continue;
|
|
|
|
|
limit the number of instructions per block examined by dead store elimination
Summary: Dead store elimination gets very expensive when large numbers of instructions need to be analyzed. This patch limits the number of instructions analyzed per store to the value of the memdep-block-scan-limit parameter (which defaults to 100). This resulted in no observed difference in performance of the generated code, and no change in the statistics for the dead store elimination pass, but improved compilation time on some files by more than an order of magnitude.
Reviewers: dexonsmith, bruno, george.burgess.iv, dberlin, reames, davidxl
Subscribers: davide, chandlerc, dberlin, davidxl, eraman, tejohnson, mbodart, llvm-commits
Differential Revision: https://reviews.llvm.org/D15537
llvm-svn: 279833
2016-08-27 00:34:27 +08:00
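The effect of the scan limit described in this commit can be sketched as a backward search with a budget. This is a simplified model with hypothetical names: `findEarlierWrite` and the flat list of written addresses are assumptions, and the real pass hands its limit to the memory dependence analysis rather than walking the block itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scan backwards from index From (exclusive) for the nearest earlier write
// to Addr, examining at most Limit entries. Returns the index of that write,
// or -1 if none is found within the budget.
long findEarlierWrite(const std::vector<int> &WriteAddrs, std::size_t From,
                      int Addr, unsigned Limit) {
  assert(From <= WriteAddrs.size() && "start index out of range");
  for (std::size_t I = From; I-- > 0 && Limit > 0; --Limit)
    if (WriteAddrs[I] == Addr)
      return static_cast<long>(I);
  return -1;
}
```

A small budget gives up before reaching a distant candidate, trading a possible missed elimination for bounded work per store.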
|
|
|
// Loop until we find a store we can eliminate or a load that
|
|
|
|
// invalidates the analysis. Without an upper bound on the number of
|
|
|
|
// instructions examined, this analysis can become very time-consuming.
|
|
|
|
// However, the potential gain diminishes as we process more instructions
|
|
|
|
// without eliminating any of them. Therefore, we limit the number of
|
|
|
|
// instructions we look at.
|
|
|
|
auto Limit = MD->getDefaultBlockScanLimit();
|
2016-05-18 05:38:13 +08:00
|
|
|
while (InstDep.isDef() || InstDep.isClobber()) {
|
|
|
|
// Get the memory clobbered by the instruction we depend on. MemDep will
|
|
|
|
// skip any instructions that 'Loc' clearly doesn't interact with. If we
|
|
|
|
// end up depending on a may- or must-aliased load, then we can't optimize
|
2016-06-16 05:41:22 +08:00
|
|
|
// away the store and we bail out. However, if we depend on something
|
2016-05-18 05:38:13 +08:00
|
|
|
// that overwrites the memory location we *can* potentially optimize it.
|
|
|
|
//
|
|
|
|
// Find out what memory location the dependent instruction stores.
|
|
|
|
Instruction *DepWrite = InstDep.getInst();
|
2018-01-21 10:10:54 +08:00
|
|
|
if (!hasAnalyzableMemoryWrite(DepWrite, *TLI))
|
|
|
|
break;
|
|
|
|
MemoryLocation DepLoc = getLocForWrite(DepWrite);
|
2016-05-18 05:38:13 +08:00
|
|
|
// If we didn't get a useful location, bail out.
|
|
|
|
if (!DepLoc.Ptr)
|
|
|
|
break;
|
|
|
|
|
2016-08-12 09:09:53 +08:00
|
|
|
// Make sure we don't look past a call which might throw. This is an
|
|
|
|
// issue because MemoryDependenceAnalysis works in the wrong direction:
|
|
|
|
// it finds instructions which dominate the current instruction, rather than
|
|
|
|
// instructions which are post-dominated by the current instruction.
|
|
|
|
//
|
|
|
|
// If the underlying object is a non-escaping memory allocation, any store
|
|
|
|
// to it is dead along the unwind edge. Otherwise, we need to preserve
|
|
|
|
// the store.
|
2019-03-29 22:10:24 +08:00
|
|
|
if (LastThrowing && OBB.dominates(DepWrite, LastThrowing)) {
|
2016-08-12 09:09:53 +08:00
|
|
|
const Value* Underlying = GetUnderlyingObject(DepLoc.Ptr, DL);
|
|
|
|
bool IsStoreDeadOnUnwind = isa<AllocaInst>(Underlying);
|
|
|
|
if (!IsStoreDeadOnUnwind) {
|
|
|
|
// We're looking for a call to an allocation function
|
|
|
|
// where the allocation doesn't escape before the last
|
|
|
|
// throwing instruction; PointerMayBeCaptured
|
|
|
|
// is a reasonably fast approximation.
|
|
|
|
IsStoreDeadOnUnwind = isAllocLikeFn(Underlying, TLI) &&
|
|
|
|
!PointerMayBeCaptured(Underlying, false, true);
|
|
|
|
}
|
|
|
|
if (!IsStoreDeadOnUnwind)
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
// If we find a write that is a) removable (i.e., non-volatile), b) is
|
|
|
|
// completely obliterated by the store to 'Loc', and c) which we know that
|
|
|
|
// 'Inst' doesn't load from, then we can remove it.
|
2017-09-26 21:54:28 +08:00
|
|
|
// Also try to merge two stores if a later one only touches memory written
|
|
|
|
// to by the earlier one.
|
2016-05-18 05:38:13 +08:00
|
|
|
if (isRemovable(DepWrite) &&
|
|
|
|
!isPossibleSelfRead(Inst, Loc, DepWrite, *TLI, *AA)) {
|
|
|
|
int64_t InstWriteOffset, DepWriteOffset;
|
2018-05-03 19:03:53 +08:00
|
|
|
OverwriteResult OR = isOverwrite(Loc, DepLoc, DL, *TLI, DepWriteOffset,
|
llvm: Add support for "-fno-delete-null-pointer-checks"
Summary:
Support for this option is needed for building the Linux kernel.
This is a very frequently requested feature by kernel developers.
More details : https://lkml.org/lkml/2018/4/4/601
GCC option description for -fdelete-null-pointer-checks:
Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.
-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.
This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.
Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv
Reviewed By: efriedma, george.burgess.iv
Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits
Differential Revision: https://reviews.llvm.org/D47895
llvm-svn: 336613
2018-07-10 06:27:23 +08:00
|
|
|
InstWriteOffset, DepWrite, IOL, *AA,
|
|
|
|
BB.getParent());
|
2017-03-29 22:42:27 +08:00
|
|
|
if (OR == OW_Complete) {
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Remove Dead Store:\n DEAD: " << *DepWrite
|
|
|
|
<< "\n KILLER: " << *Inst << '\n');
|
2016-07-18 23:51:31 +08:00
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
// Delete the store and now-dead instructions that feed it.
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(DepWrite, &BBI, *MD, *TLI, IOL, OBB);
|
2016-05-18 05:38:13 +08:00
|
|
|
++NumFastStores;
|
|
|
|
MadeChange = true;
|
|
|
|
|
2016-07-07 03:48:52 +08:00
|
|
|
// We erased DepWrite; start over.
|
2019-03-29 22:10:24 +08:00
|
|
|
InstDep = MD->getDependency(Inst, &OBB);
|
2016-07-07 03:48:52 +08:00
|
|
|
continue;
|
2017-03-29 22:42:27 +08:00
|
|
|
} else if ((OR == OW_End && isShortenableAtTheEnd(DepWrite)) ||
|
|
|
|
((OR == OW_Begin &&
|
2016-05-18 05:38:13 +08:00
|
|
|
isShortenableAtTheBeginning(DepWrite)))) {
|
2016-07-23 02:27:24 +08:00
|
|
|
assert(!EnablePartialOverwriteTracking && "Do not expect to perform shortening "
|
|
|
|
"when partial-overwrite "
|
|
|
|
"tracking is enabled");
|
2018-10-09 11:18:56 +08:00
|
|
|
// The overwrite result is known, so these must be known, too.
|
|
|
|
int64_t EarlierSize = DepLoc.Size.getValue();
|
|
|
|
int64_t LaterSize = Loc.Size.getValue();
|
2017-03-29 22:42:27 +08:00
|
|
|
bool IsOverwriteEnd = (OR == OW_End);
|
2016-07-28 01:25:20 +08:00
|
|
|
MadeChange |= tryToShorten(DepWrite, DepWriteOffset, EarlierSize,
|
2016-07-23 02:27:24 +08:00
|
|
|
InstWriteOffset, LaterSize, IsOverwriteEnd);
|
2017-09-26 21:54:28 +08:00
|
|
|
} else if (EnablePartialStoreMerging &&
|
|
|
|
OR == OW_PartialEarlierWithFullLater) {
|
|
|
|
auto *Earlier = dyn_cast<StoreInst>(DepWrite);
|
|
|
|
auto *Later = dyn_cast<StoreInst>(Inst);
|
|
|
|
if (Earlier && isa<ConstantInt>(Earlier->getValueOperand()) &&
|
2019-05-24 16:32:02 +08:00
|
|
|
DL.typeSizeEqualsStoreSize(
|
|
|
|
Earlier->getValueOperand()->getType()) &&
|
2018-01-30 21:53:59 +08:00
|
|
|
Later && isa<ConstantInt>(Later->getValueOperand()) &&
|
2019-05-24 16:32:02 +08:00
|
|
|
DL.typeSizeEqualsStoreSize(
|
|
|
|
Later->getValueOperand()->getType()) &&
|
2018-01-30 21:53:59 +08:00
|
|
|
memoryIsNotModifiedBetween(Earlier, Later, AA)) {
|
2017-09-26 21:54:28 +08:00
|
|
|
// If the store we find is:
|
|
|
|
// a) partially overwritten by the store to 'Loc'
|
|
|
|
// b) the later store is fully contained in the earlier one and
|
|
|
|
// c) they both have a constant value
|
2019-05-24 16:32:02 +08:00
|
|
|
// d) neither of the two stores needs padding
|
2017-09-26 21:54:28 +08:00
|
|
|
// Merge the two stores, replacing the earlier store's value with a
|
|
|
|
// merge of both values.
|
|
|
|
// TODO: Deal with other constant types (vectors, etc), and probably
|
|
|
|
// some mem intrinsics (if needed)
|
|
|
|
|
|
|
|
APInt EarlierValue =
|
|
|
|
cast<ConstantInt>(Earlier->getValueOperand())->getValue();
|
|
|
|
APInt LaterValue =
|
|
|
|
cast<ConstantInt>(Later->getValueOperand())->getValue();
|
|
|
|
unsigned LaterBits = LaterValue.getBitWidth();
|
|
|
|
assert(EarlierValue.getBitWidth() > LaterValue.getBitWidth());
|
|
|
|
LaterValue = LaterValue.zext(EarlierValue.getBitWidth());
|
|
|
|
|
|
|
|
// Offset of the smaller store inside the larger store
|
|
|
|
unsigned BitOffsetDiff = (InstWriteOffset - DepWriteOffset) * 8;
|
|
|
|
unsigned LShiftAmount =
|
|
|
|
DL.isBigEndian()
|
|
|
|
? EarlierValue.getBitWidth() - BitOffsetDiff - LaterBits
|
|
|
|
: BitOffsetDiff;
|
|
|
|
APInt Mask =
|
|
|
|
APInt::getBitsSet(EarlierValue.getBitWidth(), LShiftAmount,
|
|
|
|
LShiftAmount + LaterBits);
|
|
|
|
// Clear the bits we'll be replacing, then OR with the smaller
|
|
|
|
// store, shifted appropriately.
|
|
|
|
APInt Merged =
|
|
|
|
(EarlierValue & ~Mask) | (LaterValue << LShiftAmount);
|
2018-05-14 20:53:11 +08:00
|
|
|
LLVM_DEBUG(dbgs() << "DSE: Merge Stores:\n Earlier: " << *DepWrite
|
|
|
|
<< "\n Later: " << *Inst
|
|
|
|
<< "\n Merged Value: " << Merged << '\n');
|
2017-09-26 21:54:28 +08:00
|
|
|
|
|
|
|
auto *SI = new StoreInst(
|
|
|
|
ConstantInt::get(Earlier->getValueOperand()->getType(), Merged),
|
|
|
|
Earlier->getPointerOperand(), false, Earlier->getAlignment(),
|
|
|
|
Earlier->getOrdering(), Earlier->getSyncScopeID(), DepWrite);
|
|
|
|
|
|
|
|
unsigned MDToKeep[] = {LLVMContext::MD_dbg, LLVMContext::MD_tbaa,
|
|
|
|
LLVMContext::MD_alias_scope,
|
|
|
|
LLVMContext::MD_noalias,
|
|
|
|
LLVMContext::MD_nontemporal};
|
|
|
|
SI->copyMetadata(*DepWrite, MDToKeep);
|
|
|
|
++NumModifiedStores;
|
|
|
|
|
|
|
|
// Remove earlier, wider, store
|
2019-03-29 22:10:24 +08:00
|
|
|
OBB.replaceInstruction(DepWrite, SI);
|
2017-09-26 21:54:28 +08:00
|
|
|
|
|
|
|
// Delete the old stores and now-dead instructions that feed them.
|
2019-03-29 22:10:24 +08:00
|
|
|
deleteDeadInstruction(Inst, &BBI, *MD, *TLI, IOL, OBB);
|
|
|
|
deleteDeadInstruction(DepWrite, &BBI, *MD, *TLI, IOL, OBB);
|
2017-09-26 21:54:28 +08:00
|
|
|
MadeChange = true;
|
|
|
|
|
|
|
|
// We erased DepWrite and Inst (Loc); start over.
|
|
|
|
break;
|
|
|
|
}
|
2016-05-18 05:38:13 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// If this is a may-aliased store that is clobbering the store value, we
|
|
|
|
// can keep searching past it for another must-aliased pointer that stores
|
|
|
|
// to the same location. For example, in:
|
|
|
|
// store -> P
|
|
|
|
// store -> Q
|
|
|
|
// store -> P
|
|
|
|
// we can remove the first store to P even though we don't know if P and Q
|
|
|
|
// alias.
|
|
|
|
if (DepWrite == &BB.front()) break;
|
|
|
|
|
|
|
|
// Can't look past this instruction if it might read 'Loc'.
|
2017-12-06 04:12:23 +08:00
|
|
|
if (isRefSet(AA->getModRefInfo(DepWrite, Loc)))
|
2016-05-18 05:38:13 +08:00
|
|
|
break;
|
|
|
|
|
2016-08-27 00:34:27 +08:00
|
|
|
InstDep = MD->getPointerDependencyFrom(Loc, /*isLoad=*/ false,
|
|
|
|
DepWrite->getIterator(), &BB,
|
|
|
|
/*QueryInst=*/ nullptr, &Limit);
|
2016-05-18 05:38:13 +08:00
|
|
|
}
|
2010-12-01 05:32:12 +08:00
|
|
|
}
|
2011-09-07 02:14:09 +08:00
|
|
|
|
2016-07-23 02:27:24 +08:00
|
|
|
if (EnablePartialOverwriteTracking)
|
|
|
|
MadeChange |= removePartiallyOverlappedStores(AA, DL, IOL);
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
// If this block ends in a return, unwind, or unreachable, all allocas are
|
|
|
|
// dead at its end, which means stores to them are also dead.
|
|
|
|
if (BB.getTerminator()->getNumSuccessors() == 0)
|
2019-03-29 22:10:24 +08:00
|
|
|
MadeChange |= handleEndBlock(BB, AA, MD, TLI, IOL, OBB);
|
2016-05-18 05:38:13 +08:00
|
|
|
|
|
|
|
return MadeChange;
|
|
|
|
}
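The constant-merging step in the `OW_PartialEarlierWithFullLater` branch above (mask out the overwritten bits, then OR in the shifted narrower value) can be reproduced with 64-bit integers. This is a sketch assuming both constants fit in 64 bits; `mergeStoredConstants` and its parameters are hypothetical, whereas the pass itself works on `APInt` values of arbitrary width.

```cpp
#include <cassert>
#include <cstdint>

// Merge a narrower constant store into the wider constant it partially
// overwrites. ByteOffset is the distance of the narrower store from the
// start of the wider one, in bytes.
uint64_t mergeStoredConstants(uint64_t EarlierValue, unsigned EarlierBits,
                              uint64_t LaterValue, unsigned LaterBits,
                              unsigned ByteOffset, bool BigEndian) {
  assert(EarlierBits <= 64 && LaterBits < EarlierBits);
  unsigned BitOffset = ByteOffset * 8;
  assert(BitOffset + LaterBits <= EarlierBits && "later store not contained");
  // On big-endian targets byte 0 holds the most significant bits, so the
  // shift is measured from the top of the wider value.
  unsigned Shift = BigEndian ? EarlierBits - BitOffset - LaterBits : BitOffset;
  uint64_t Mask = ((uint64_t{1} << LaterBits) - 1) << Shift;
  // Clear the bits being replaced, then OR in the shifted narrower value.
  return (EarlierValue & ~Mask) | ((LaterValue << Shift) & Mask);
}
```

Overwriting byte 1 of the 32-bit constant `0x11223344` with `0xAA` gives `0x1122AA44` on a little-endian layout and `0x11AA3344` on a big-endian one, matching the byte order in memory for each case.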
|
|
|
|
|
|
|
|
static bool eliminateDeadStores(Function &F, AliasAnalysis *AA,
|
|
|
|
MemoryDependenceResults *MD, DominatorTree *DT,
|
|
|
|
const TargetLibraryInfo *TLI) {
|
|
|
|
bool MadeChange = false;
|
|
|
|
for (BasicBlock &BB : F)
|
|
|
|
// Only check non-dead blocks. Dead blocks may have strange pointer
|
|
|
|
// cycles that will confuse alias analysis.
|
|
|
|
if (DT->isReachableFromEntry(&BB))
|
|
|
|
MadeChange |= eliminateDeadStores(BB, AA, MD, DT, TLI);
|
2016-08-12 09:09:53 +08:00
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
return MadeChange;
|
|
|
|
}
|
|
|
|
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
// DSE Pass
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
PreservedAnalyses DSEPass::run(Function &F, FunctionAnalysisManager &AM) {
|
|
|
|
AliasAnalysis *AA = &AM.getResult<AAManager>(F);
|
|
|
|
DominatorTree *DT = &AM.getResult<DominatorTreeAnalysis>(F);
|
|
|
|
MemoryDependenceResults *MD = &AM.getResult<MemoryDependenceAnalysis>(F);
|
|
|
|
const TargetLibraryInfo *TLI = &AM.getResult<TargetLibraryAnalysis>(F);
|
|
|
|
|
|
|
|
if (!eliminateDeadStores(F, AA, MD, DT, TLI))
|
|
|
|
return PreservedAnalyses::all();
|
2017-01-15 14:32:49 +08:00
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
PreservedAnalyses PA;
|
2017-01-15 14:32:49 +08:00
|
|
|
PA.preserveSet<CFGAnalyses>();
|
2016-05-18 05:38:13 +08:00
|
|
|
PA.preserve<GlobalsAA>();
|
|
|
|
PA.preserve<MemoryDependenceAnalysis>();
|
|
|
|
return PA;
|
|
|
|
}
|
|
|
|
|
2016-07-10 19:28:51 +08:00
|
|
|
namespace {
|
2017-10-14 05:17:07 +08:00
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
/// A legacy pass for the legacy pass manager that wraps \c DSEPass.
|
|
|
|
class DSELegacyPass : public FunctionPass {
|
|
|
|
public:
|
2017-10-14 05:17:07 +08:00
|
|
|
static char ID; // Pass identification, replacement for typeid
|
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
DSELegacyPass() : FunctionPass(ID) {
|
|
|
|
initializeDSELegacyPassPass(*PassRegistry::getPassRegistry());
|
|
|
|
}
|
|
|
|
|
|
|
|
bool runOnFunction(Function &F) override {
|
|
|
|
if (skipFunction(F))
|
|
|
|
return false;
|
|
|
|
|
|
|
|
DominatorTree *DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
|
|
|
|
AliasAnalysis *AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
|
|
|
|
MemoryDependenceResults *MD =
|
|
|
|
&getAnalysis<MemoryDependenceWrapperPass>().getMemDep();
|
|
|
|
const TargetLibraryInfo *TLI =
|
Change TargetLibraryInfo analysis passes to always require Function
Summary:
This is the first change to enable the TLI to be built per-function so
that -fno-builtin* handling can be migrated to use function attributes.
See discussion on D61634 for background. This is an enabler for fixing
handling of these options for LTO, for example.
This change should not affect behavior, as the provided function is not
yet used to build a specifically per-function TLI, but rather enables
that migration.
Most of the changes were very mechanical, e.g. passing a Function to the
legacy analysis pass's getTLI interface, or in Module level cases,
adding a callback. This is similar to the way the per-function TTI
analysis works.
There was one place where we were looking for builtins but not in the
context of a specific function. See FindCXAAtExit in
lib/Transforms/IPO/GlobalOpt.cpp. I'm somewhat concerned my workaround
could provide the wrong behavior in some corner cases. Suggestions
welcome.
Reviewers: chandlerc, hfinkel
Subscribers: arsenm, dschuff, jvesely, nhaehnle, mehdi_amini, javed.absar, sbc100, jgravelle-google, eraman, aheejin, steven_wu, george.burgess.iv, dexonsmith, jfb, asbirlea, gchatelet, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66428
llvm-svn: 371284
2019-09-07 11:09:36 +08:00
|
|
|
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI(F);
|
2016-05-18 05:38:13 +08:00
|
|
|
|
|
|
|
return eliminateDeadStores(F, AA, MD, DT, TLI);
|
|
|
|
}
|
|
|
|
|
|
|
|
void getAnalysisUsage(AnalysisUsage &AU) const override {
|
|
|
|
AU.setPreservesCFG();
|
|
|
|
AU.addRequired<DominatorTreeWrapperPass>();
|
|
|
|
AU.addRequired<AAResultsWrapperPass>();
|
|
|
|
AU.addRequired<MemoryDependenceWrapperPass>();
|
|
|
|
AU.addRequired<TargetLibraryInfoWrapperPass>();
|
|
|
|
AU.addPreserved<DominatorTreeWrapperPass>();
|
|
|
|
AU.addPreserved<GlobalsAAWrapperPass>();
|
|
|
|
AU.addPreserved<MemoryDependenceWrapperPass>();
|
|
|
|
}
|
|
|
|
};
|
2017-10-14 05:17:07 +08:00
|
|
|
|
2016-07-10 19:28:51 +08:00
|
|
|
} // end anonymous namespace
|
2016-05-18 05:38:13 +08:00
|
|
|
|
|
|
|
char DSELegacyPass::ID = 0;
|
2017-10-14 05:17:07 +08:00
|
|
|
|
2016-05-18 05:38:13 +08:00
|
|
|
INITIALIZE_PASS_BEGIN(DSELegacyPass, "dse", "Dead Store Elimination", false,
|
|
|
|
false)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(GlobalsAAWrapperPass)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(MemoryDependenceWrapperPass)
|
|
|
|
INITIALIZE_PASS_DEPENDENCY(TargetLibraryInfoWrapperPass)
|
|
|
|
INITIALIZE_PASS_END(DSELegacyPass, "dse", "Dead Store Elimination", false,
|
|
|
|
false)
|
|
|
|
|
|
|
|
FunctionPass *llvm::createDeadStoreEliminationPass() {
|
|
|
|
return new DSELegacyPass();
|
2007-07-13 05:41:30 +08:00
|
|
|
}
|