On ARM, peephole optimization for ABS creates a trivial cfg triangle which tempts machine sink to sink instructions in code which is really straight line code. Sometimes this sinking may alter register allocator input such that use and def of a reg is divided by a branch in between, which may result in extra spills. Now mahine sink avoids sinking if final sink destination is post dominator.
Radar 10266272.
llvm-svn: 146604
freebsd bots happy. In the longer term, we should have a mechanism for moving
constexpr recursion off the call stack, to support the default limit of 512
suggested by the standard.
llvm-svn: 146596
LLDBSwigPythonCallCommand crashes when a command script returns an object
Add more robustness to LLDBSwigPythonCallCommand. It should check whether the returned Python object
is a string, and only assign it as the error msg when the check holds.
Also add a regression test.
llvm-svn: 146584
r0 = mov #0
r0 = moveq #1
Then the second instruction has an implicit data dependency on the first
instruction. Sadly I have yet to come up with a small test case that
demonstrate the post-ra scheduler taking advantage of this.
llvm-svn: 146583
Work in progress. Parsing for non-writeback, single spaced register lists
works now. The rest have the representations better factored, but still
need more to be able to parse properly.
llvm-svn: 146579
I cannot reproduce the failures neither on my machine nor on the same buildbot machine (with the clang binary built on it). Let's see if it fails again..
llvm-svn: 146574
When 'cmp rn #imm' doesn't match due to the immediate not being representable,
but 'cmn rn, #-imm' does match, use the latter in place of the former, as
it's equivalent.
rdar://10552389
llvm-svn: 146567
To extract a preoptimized LLVM-IR file from a C-file run:
clang -Xclang -load -Xclang LLVMPolly.so -O0 -mllvm -polly file.c -S -emit-llvm
On the generated file you can directly run passes such as:
'opt -view-scops file.s'
llvm-svn: 146560
Previously the scheduler was splitting bands at the level at which it detected
that the splitting of the band is necessary. This may introduce an additional
level of bands, that can be avoided by backtracking and splitting on a higher
level. Additional splits reduce the number of loops that can be tiled, such that
avoiding splits and maximizing the band depth seems preferable.
As a first data point we looked at 2mm and 3mm from the polybench test suite.
For both maximizing the tilable bands results in a significant (5-10x)
performance improvement.
This patch enables the isl scheduler option to maximize the band depth.
llvm-svn: 146557
If larger coefficients appear as part of the input dependences, the schedule
calculation can take a very long time. We observed that the main overhead in
this calculation is due to optimizing the constant coefficients. They are
misused to increase locality by merging several unrelated dimensions into a
single dimension. This unwanted optimization increases the complexity of the
generated code and furthermore slows it down.
We use a new isl scheduler option to bound the values in the constant dimension
by a user defined value (20 in our case). If the right value is choosen, costly
overoptimization is prevented.
This solution works, but requires a specific (here almost randomly choosen)
value by which the constants are bound. For the moment, this is our best
solution, but we hope to to find a more generic one later on.
After these patch the extremly long compile time for simple kernels like 2mm or
3mm is reduced to a reasonable amount of time (Not more than a couple of seconds
even in debug mode).
llvm-svn: 146556
dispatch functions that are implemented in hand-written assembly.
There is also hand-written eh_frame instructions for unwinding
from these functions.
Normally we don't use eh_frame instructions for the currently
executing function, prefering the assembly instruction profiling
method. But in these hand-written dispatch functions, the
profiling is doomed and we should use the eh_frame instructions.
Unfortunately there's no easy way to flag/extend the eh_frame/debug_frame
sections to annotate if the unwind instructions are accurate at
all addresses ("asynchronous") or if they are only accurate at locations
that can throw an exception ("synchronous" and the normal case for
gcc/clang generated eh_frame/debug_frame CFI).
<rdar://problem/10508134>
llvm-svn: 146551