This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen
for Sandy Bridge and Ivy Bridge. There is no functionality change intended for
those chips. Previously, the absence of AVX2 was being used as a proxy to detect
this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2
that do not have the 32-byte unaligned access slowdown.
Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ).
Differential Revision: http://reviews.llvm.org/D6355
llvm-svn: 222544
shuffle lowering to allow much better blend matching.
Specifically, with the new structure the code seems clearer to me and we
correctly can hit the cases where merging two 128-bit lanes is a clear
win and can be shuffled cheaply afterward.
llvm-svn: 222539
offsets for code models other than small/medium. For JIT application,
memory layout is less controlled and can result in truncations
otherwise.
Patch from Akos Kiss.
Differential Revision: http://reviews.llvm.org/D6079
llvm-svn: 222538
a bunch more improvements.
Non-lane-crossing is fine, the key is that lane merging only makes sense
for single-input shuffles. Not sure why I got so turned around here. The
code all works, I was just using the wrong model for it.
This only updates v4 and v8 lowering. The v16 and v32 lowering requires
restructuring the entire check sequence.
llvm-svn: 222537
Before this patch, the DAGCombiner only tried to convert build_vector dag nodes
into shuffles if all operands were either extract_vector_elt or undef.
This patch improves that logic and teaches the DAGCombiner how to deal with
build_vector dag nodes where one or more operands are zero. A build_vector
dag node with some zero operands is turned into a shuffle only if the resulting
shuffle mask is legal for the target.
llvm-svn: 222536
Let the users of SymbolizationLoop define a function that produces the list of .dSYM hints (possible path to the .dSYM bundle) for the given binary.
Because the hints can't be added to an existing llvm-symbolizer process, we spawn a new symbolizer process ones each time a new hint appears.
Those can only appear for binaries that we haven't seen before.
llvm-svn: 222535
lanes.
By special casing these we can often either reduce the total number of
shuffles significantly or reduce the number of (high latency on Haswell)
AVX2 shuffles that potentially cross 128-bit lanes. Even when these
don't actually cross lanes, they have much higher latency to support
that. Doing two of them and a blend is worse than doing a single insert
across the 128-bit lanes to blend and then doing a single interleaved
shuffle.
While this seems like a narrow case, it kept cropping up on me and the
difference is *huge* as you can see in many of the test cases. I first
hit this trying to perfectly fix the interleaving shuffle patterns used
by Halide for AVX2.
llvm-svn: 222533
Previously this was only used for JavaScript.
Before:
functionCall({
int i;
int j;
},
aaaa, bbbb, cccc);
After:
functionCall({
int i;
int j;
}, aaaa, bbbb, cccc);
llvm-svn: 222531
With AllowShortCaseLabelsOnASingleLine set to true:
This gets now left unchanged:
case 1:
// comment
return;
Whereas before it was changed into:
case 1: // comment return;
This fixes llvm.org/PR21630.
llvm-svn: 222529
Before:
public final<X> Foo foo() {
}
public abstract<X> Foo foo();
After:
public final <X> Foo foo() {
}
public abstract <X> Foo foo();
Patch by Harry Terkelsen. Thank you.
llvm-svn: 222527
Revision 220571 removes the requirement to use -pie for tsan binaries. So remove -pie from driver.
Also s/hasZeroBaseShadow/requiresPIE/ because that is what it is used for. Msan does not have zero-based shadow, but requires pie. And in general the relation between zero-based shadow and pie is unclear.
http://reviews.llvm.org/D6318
llvm-svn: 222526
merging 128-bit subvectors and also shuffling all the elements of those
subvectors. Currently we generate pretty bad code for many of these, but
I'm testing a patch that should dramatically improve this in addition to
making the shuffle lowering robust to other changes.
llvm-svn: 222525
This patch simplifies the logic that combines a pair of shuffle nodes into
a single shuffle if there is a legal mask. Also added comments to better
describe the algorithm. No functional change intended.
llvm-svn: 222522
Clang r181627 moved a check for block-scope variables into this code for
handling thread storage class specifiers, but in the process, it broke the
logic for checking if the target supports TLS. Fix this with some simple
restructuring of the code. rdar://problem/18796883
llvm-svn: 222512
Issue D5632 fixed an issue where linux would dump spurious output to tty on startup (due to a broadcast stop event). After the checkin, it was noticed on FreeBSD a unit test was now failing. On closer investigation I found the test was using the C++ API to launch an inferior while using an SBListener to monitor the public state changes. As on OSx, it was expecting to see:
eStateRunning
eStateStopped
On Linux/FreeBSD, there is an extra state change
eStateLaunching
eStateRunning
eStateStopped
I reworked the test to work for both cases and re-enabled the test of FreeBSD.
Differential Revision: http://reviews.llvm.org/D5837
llvm-svn: 222511
E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip)
A hook is added to allow the target to control whether it needs to do such combine.
Reviewed in http://reviews.llvm.org/D6334
llvm-svn: 222510
to be newer than we were expecting. That happens if .pcm's get moved between
file systems during a distributed build. (It's still not OK for them to actually
be different, though, so we still check the size and signature matches.)
llvm-svn: 222507
special member function.
No test yet: the only testcases we have for this issue are extremely complex.
Testcase will be added once I get a reasonable reduction.
llvm-svn: 222506
This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in
the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on
SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM,
there is a store moved out of the inner loop) and a potential speedup on
MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it
makes some code look cleaner, and synchronizing the backends in this regard
seems like a generally good thing.
llvm-svn: 222504
The default value for opt.thread_count was multiprocessing.cpu_count(),
which meant the LLDB_TEST_THREADS environment variable was never used.
It's not easy to pass the -t option to the test run when invoking it
from e.g. 'ninja check-lldb', so having the environment variable as an
option is useful.
Change the logic so that the thread count is set by the first one of:
1. The -t option to test/dosep.py
2. The LLDB_TEST_THREADS environment variable
3. The machine's CPU count from multiprocessing.cpu_count()
llvm-svn: 222501
The alloca's type is irrelevant, only those types which are used in a
load or store of the exact size of the slice should be considered.
This manifested as an assertion failure when we compared the various
types: we had a size mismatch.
This fixes PR21480.
llvm-svn: 222499
is treated as a string instead of a FileSpec.
OptionValueFileSpec::SetValueFromCString() passes the c string to
FileSpec::SetFile(str, true /* resolve */) - and with Zachary's
changes to FileSpec we're using llvm::sys::fs::make_absolute() to
do that "resolve" action now, where we used to use realpath().
One important difference between llvm::sys::fs::make_absolute and
realpath is that when they're handed a filename (no directory),
realpath prepends the current working directory *and if the file exists*,
returns that full path. If that file doesn't exist, the caller
uses the basename only.
llvm::sys::fs::make_absolute prepends the current working directory
regardless of whether it exists or not.
I considered having FileSpec::SetFile save the initial pathname,
call FileSpec::Resolve, and then check to see if the Resolve return
path exists - and if not, go back to the original one.
But instead I just went with changing 'target modules load' to treat its
filename argument as a string instead of a FileSpec. This brings it
in line with how 'target modules list' works.
<rdar://problem/18955416>
llvm-svn: 222498
The previous description of the noalias attribute did not accurately specify
the implemented semantics, and the terminology used differed unnecessarily
from that used by the C specification to define the semantics of restrict. For
the argument attribute, the semantics can be precisely specified in terms of
objects accessed through pointers based on the arguments, and this is now what
is done.
Saying that the semantics are 'slightly weaker' than that provided by C99
restrict is not really useful without further elaboration, so that has been
removed from the sentence.
noalias on a return value is really used to mean that the function is
malloc-like (and, in fact, we use this attribute to represent
__attribute__((malloc)) in Clang), and this is a stronger guarantee than that
provided by restrict (because it is a property of the pointed-to memory region,
not just a guarantee on object access). Clarifying this is relevant to fixing
(and was motivated by the discussion on) PR21556.
llvm-svn: 222497