The only real user of the flag was removeCopyByCommutingDef(), and it
has been switched to LiveIntervals::hasPHIKill().
All the code changed by this patch was only concerned with computing and
propagating the flag.
llvm-svn: 161255
The VNInfo::HAS_PHI_KILL is only half supported. We precompute it in
LiveIntervalAnalysis, but it isn't properly updated by live range
splitting and functions like shrinkToUses().
It is only used in one place: RegisterCoalescer::removeCopyByCommutingDef().
This patch changes that function to use a new LiveIntervals::hasPHIKill()
function that computes the flag for a given value number.
llvm-svn: 161254
There is no reason why we should not track the memory which was not
allocated in the current function, but was freed there. This would
allow to catch more use-after-free and double free with no/limited IPA.
Also fix a realloc issue which surfaced as the result of this patch.
llvm-svn: 161248
rotate is a critical algorithm because it is often used by other algorithms,
both std and non-std. The main thrust of this optimization is a specialized
algorithm when the 'distance' to be shifted is 1 (either left or right). To my
surprise, this 'optimization' was not effective for types like std::string.
std::string favors rotate algorithms which only use swap. But for types like
scalars, and especially when the sequence is random access, these new
specializations are a big win. If it is a vector<size_t> for example, the
rotate is done via a memmove and can be several times faster than the gcd
algorithm.
I'm using is_trivially_move_assignable to distinguish between types like int and
types like string. This is obviously an ad-hoc approximation, but I haven't
found a case where it doesn't give good results.
I've used a 'static if' (with is_trivially_move_assignable) in three places.
Testing with both -Os and -O3 showed that clang eliminated all code not be
executed by the 'static if' (including the 'static if' itself).
llvm-svn: 161247
Translate the selected parallel loop body into a ptx string and run it with the
cuda driver API. We limit this preliminary implementation to target the
following special test cases:
- Support only 2-dimensional parallel loops with or without only one innermost
non-parallel loop.
- Support write memory access to only one array in a SCoP.
The patch was committed with smaller changes to the build system:
There is now a flag to enable gpu code generation explictly. This was required
as we need the llvm.codegen() patch applied on the llvm sources, to compile this
feature correctly. Also, enabling gpu code generation does not require cuda.
This requirement was removed to allow 'make polly-test' runs, even without an
installed cuda runtime.
Contributed by: Yabin Hu <yabin.hwu@gmail.com>
llvm-svn: 161239
By C++ standard, the vtable should be generated if the first non-inline
virtual function is defined in the TU. Current version of clang doesn't
generate vtable if the first virtual function is defaulted, because the
key function is regarded as the defaulted function.
Patch by Li Kan!
llvm-svn: 161236
This fixes a conflict between polly::createIndVarSimplifyPass() and
llvm::createIndVarSimplifyPass(), which causes problems on windows.
Reported by: Michael Kruse <MichaelKruse@meinersbur.de
llvm-svn: 161235
The Apple linker fails by default, if some function calls can not be resolved at
link time. However, all functions that are part of LLVM itself will not be
linked into Polly, but will be provided by the compiler that Polly is loaded
into. Hence, during linking we need to ignore failures due to unresolved
function calls.
llvm-svn: 161234
to store additional flag options since too many things can
and do override CPPFLAGS. Also, this is exported, unlike CPPFLAGS
so it can be actually used elsewhere. This should enable us
to remove the AC_SUBSTs in the intel checks, but I have no way
of testing it.
llvm-svn: 161233
Fast isel doesn't currently have support for translating builtin function
calls to target instructions. For embedded environments where the library
functions are not available, this is a matter of correctness and not
just optimization. Most of this patch is just arranging to make the
TargetLibraryInfo available in fast isel. <rdar://problem/12008746>
llvm-svn: 161232
This just provides a way to look up a LibFunc::Func enum value for a
function name. Alphabetize the enums and function names so we can use a
binary search.
llvm-svn: 161231
The "findUsedStructTypes" method is very expensive to run. It needs to be
optimized so that LTO can run faster. Splitting this method out of the Module
class will help this occur. For instance, it can keep a list of seen objects so
that it doesn't process them over and over again.
llvm-svn: 161228
engine.
The code that was supposed to split the tie in a deterministic way is
not deterministic. Most likely one of the profile methods uses a
pointer. After this change we do finally get the consistent diagnostic
output. Testing this requires running the analyzer on large code bases
and diffing the results.
llvm-svn: 161224
Now that TableGen supports references to NAME w/o it being explicitly
referenced in the definition's own name, use that to simplify
assembly InstAlias definitions in multiclasses.
llvm-svn: 161218
There's still more work to be done here; this doesn't catch reference
parameters or return values. But it's a step in the right direction.
Part of <rdar://problem/11212286>.
llvm-svn: 161214
Add more comments and use early returns to reduce nesting in isLoadFoldable.
Also disable folding for V_SET0 to avoid introducing a const pool entry and
a const pool load.
rdar://10554090 and rdar://11873276
llvm-svn: 161207
yaml2obj takes a textual description of an object file in YAML format
and outputs the binary equivalent. This greatly simplifies writing
tests that take binary object files as input.
llvm-svn: 161205
Previously, def NAME values were only populated, and references to NAME
resolved, when NAME was referenced in the 'def' entry of the multiclass
sub-entry. e.g.,
multiclass foo<...> {
def prefix_#NAME : ...
}
It's useful, however, to be able to reference NAME even when the default
def name is used. For example, when a multiclass has 'def : Pat<...>'
or 'def : InstAlias<...>' entries which refer to earlier instruction
definitions in the same multiclass. e.g.,
multiclass myMulti<RegisterClass rc> {
def _r : myI<(outs rc:$d), (ins rc:$r), "r $d, $r", []>;
def : InstAlias<\"wilma $r\", (!cast<Instruction>(NAME#\"_r\") rc:$r, rc:$r)>;
}
llvm-svn: 161198
Whenever both instruction depths and instruction heights are known in a
block, it is possible to compute the length of the critical path as
max(depth+height) over the instructions in the block.
The stored live-in lists make it possible to accurately compute the
length of a critical path that bypasses the current (small) block.
llvm-svn: 161197
__time_get_storage<char> to match the initialization behavior in
__time_get_storage<wchar>. Without the initialization, valgrind
reports errors in the subsequent calls to strftime_l.
llvm-svn: 161196