Summary:
I searched Transforms/ and Analysis/ for 'ByVal' and updated those call
sites to check for inalloca if appropriate.
I added tests for any change that would allow an optimization to fire on
inalloca.
Reviewers: nlewycky
Differential Revision: http://llvm-reviews.chandlerc.com/D2449
llvm-svn: 200281
Because the object files are now readable to humans, I don't think we need the
source assembly file any more, so I removed them too in this commit.
llvm-svn: 200276
LCSSA from it caused a crasher with the LoopUnroll pass.
This crasher is really nasty. We destroy LCSSA form in a suprising way.
When unrolling a loop into an outer loop, we not only need to restore
LCSSA form for the outer loop, but for all children of the outer loop.
This is somewhat obvious in retrospect, but hey!
While this seems pretty heavy-handed, it's not that bad. Fundamentally,
we only do this when we unroll a loop, which is already a heavyweight
operation. We're unrolling all of these hypothetical inner loops as
well, so their size and complexity is already on the critical path. This
is just adding another pass over them to re-canonicalize.
I have a test case from PR18616 that is great for reproducing this, but
pretty useless to check in as it relies on many 10s of nested empty
loops that get unrolled and deleted in just the right order. =/ What's
worse is that investigating this has exposed another source of failure
that is likely to be even harder to test. I'll try to come up with test
cases for these fixes, but I want to get the fixes into the tree first
as they're causing crashes in the wild.
llvm-svn: 200273
Before this patch we used getIntImmCost from TargetTransformInfo to determine if
a load of a constant should be converted to just a constant, but the threshold
for this was set to an arbitrary value. This value works well for the two
targets (X86 and ARM) that implement this target-hook, but it isn't
target-independent at all.
Now targets have the possibility to decide directly if this optimization should
be performed. The default value is set to false to preserve the current
behavior. The target hook has been moved to TargetLowering, which removed the
last use and need of TargetTransformInfo in SelectionDAG.
llvm-svn: 200271
The vectorizer takes a loop like this and widens all instructions except for the
store. The stores are scalarized/unrolled and hidden behind an "if" block.
for (i = 0; i < 128; ++i) {
if (a[i] < 10)
a[i] += val;
}
for (i = 0; i < 128; i+=2) {
v = a[i:i+1];
v0 = (extract v, 0) + 10;
v1 = (extract v, 1) + 10;
if (v0 < 10)
a[i] = v0;
if (v1 < 10)
a[i] = v1;
}
The vectorizer relies on subsequent optimizations to sink instructions into the
conditional block where they are anticipated.
The flag "vectorize-num-stores-pred" controls whether and how many stores to
handle this way. Vectorization of conditional stores is disabled per default for
now.
This patch also adds a change to the heuristic when the flag
"enable-loadstore-runtime-unroll" is enabled (off by default). It unrolls small
loops until load/store ports are saturated. This heuristic uses TTI's
getMaxUnrollFactor as a measure for load/store ports.
I also added a second flag -enable-cond-stores-vec. It will enable vectorization
of conditional stores. But there is no cost model for vectorization of
conditional stores in place yet so this will not do good at the moment.
rdar://15892953
Results for x86-64 -O3 -mavx +/- -mllvm -enable-loadstore-runtime-unroll
-vectorize-num-stores-pred=1 (before the BFI change):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.35% (maze3() is identical but 10% slower)
Applications/siod/siod 2.18%
Performance improvements:
mesa -4.42%
libquantum -4.15%
With a patch that slightly changes the register heuristics (by subtracting the
induction variable on both sides of the register pressure equation, as the
induction variable is probably not really unrolled):
Performance Regressions:
Benchmarks/Ptrdist/yacr2/yacr2 7.73%
Applications/siod/siod 1.97%
Performance Improvements:
libquantum -13.05% (we now also unroll quantum_toffoli)
mesa -4.27%
llvm-svn: 200270
code to see if we're emitting a function into a non-default
text section. This is still a less-than-ideal solution, but more
contained than r199871 to determine whether or not we're emitting
code into an array of comdat sections.
llvm-svn: 200269
case when correcting for too many arguments (r191450 had only fixed the
problem for when there were too few arguments). Also fix the underlining
for both cases.
llvm-svn: 200268
GDBProcessCommunicationServer now optionally takes a PlatformSP that
defaults to the default platform for the host.
GDBProcessCommunicationServer::LaunchProcess () now uses the platform
to launch the process.
lldb-gdbserver now takes an optional --platform={platform_plugin_name}
or -p {platform_plugin_name} command line option. If no platform is
specified, the default platform for the host is used; otherwise, if
the platform_plugin_name matches a registered platform plugin or
matches the default platform's name (which is not necessarily
registered by name in the case of 'host'), that platform is used. If
the platform name cannot be resolved, lldb-gdbserver exits after
printing all the available platform plugin names and the default
platform plugin name.
llvm-svn: 200266
The many many benefits include:
1 - Input/Output/Error streams are now handled as real streams not a push style input
2 - auto completion in python embedded interpreter
3 - multi-line input for "script" and "expression" commands now allow you to edit previous/next lines using up and down arrow keys and this makes multi-line input actually a viable thing to use
4 - it is now possible to use curses to drive LLDB (please try the "gui" command)
We will need to deal with and fix any buildbot failures and tests and arise now that input/output and error are correctly hooked up in all cases.
llvm-svn: 200263
uint32.
When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.
llvm-svn: 200262
PR18322. This test will be reenabled when the SDK gets fixed. In the meantime,
it is pretty disruptive to have this test keep failing.
llvm-svn: 200256
This brings MC into line with GNU 'as' on ARM, and it brings the ARM
target into line with most other LLVM targets, which declare the
initial CFI state with addInitialFrameState().
Without this, functions generated with .cfi_startproc/endproc on ARM
will tend to cause GDB to abort with:
gdb/dwarf2-frame.c:1132: internal-error: Unknown CFA rule.
I've also tested this by comparing the output of "readelf -w" on the
object files produced by llvm-mc and gas when given the .s file added
here.
This change is part of addressing PR18636.
Differential Revision: http://llvm-reviews.chandlerc.com/D2597
llvm-svn: 200255
if the remote stub provided enough information to identify it in the
qProcessInfo packet response. (e.g. for an Apple device where we know
it is Mach-O, the cpu type & cpu sub type).
<rdar://problem/15847901>
llvm-svn: 200253
Also update the comment, since it actually produces a
select (setcc) instead of select_cc.
It was checking and using the setcc result type for the
type of the sext, instead of the type of the compared items.
In my problem case, the sext was to i32 and was used as the setcc type,
but the expected type was i64.
No test since I haven't been able to hit the problem with
this on any in-tree targets.
llvm-svn: 200249
Summary:
This commit gives an address mode to the PLD instruction. We
were getting an assertion failure in the frame lowering code
because we had code that was doing a pld of a stack allocated
address. The frame lowering was checking the address mode and
then asserting because pld had none defined.
This commit fixes pld for arm mode. There was a previous fix for
thumb mode in a separate commit. The commit for thumb mode
added a test in a separate file because it would otherwise fail
for arm. This commit moves the thumb test back into the prefetch.ll
file and adds the corresponding arm test.
Differential Revision: http://llvm-reviews.chandlerc.com/D2622
llvm-svn: 200248
ValueObjectPrinter could enter an infinite loop while trying to display an aptly formed ValueObject: a reference, with a child of some pointer type, such that the pointees chain ended up pointing back to some part of itself - a pointer to itself being the simplest such case
Fixed here by only setting a pointer depth when needed, and ensuring that we won't overflow and wrap the pointer depth when it's zero.
llvm-svn: 200247
This change modifies the 'A' command handler's launch code to launch
with LaunchProcess (). The net effect is that the default process
monitoring that LaunchProcess () adds will kick in, allowing the
GDBRemoteCommunicationServer to be able to reap processes started with
this facility correctly. Later, in the case of lldb-gdbserver, we'll
also have the proper process monitoring going on to really debug the
inferior process.
llvm-svn: 200246
This failed the ms-intrin.cpp test.
This reverts commit r200237.
This also comments out the _setjmpex declaration for now so that
intrin.h will work on x64 targets.
llvm-svn: 200243
This reverts commit r200233.
The test required a registered ARM target, it was testing LLVM's
generated assembly, and it should have been an IRGen test.
llvm-svn: 200242
parimary class and in mrr mode, assume property's default
memory attribute (assign) and to prevent a bogus warning.
// rdar://15859862
llvm-svn: 200238
This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when
the operand in input is a build vector of constants (or UNDEFs).
The inability to fold a sext/zext of a constant build_vector was the root
cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support.
Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a
ConstantSDNode.
llvm-svn: 200234