r218568 added an explicit #include of the Linux ProcessMonitor.h to
POSIXThread.cpp, rather than including just "ProcessMonitor.h" and
relying on the build infrastructure for the appropriate paths.
For now add #ifdefs in the source to use the FreeBSD or Linux header
as appropriate; a cleaner fix (and perhaps some refactoring of the
POSIX classes) should still be done later.
llvm-svn: 218762
This is needed so we can produce -i686- named libraries for
x86 Android (which is i686-linux-android).
An alternative solution would be keeping the "i386" name internally and
tweaking the OUTPUT_NAME of compiler-rt libraries.
llvm-svn: 218761
We use a parametric abstraction of the domain to split alias groups
if accesses cannot be executed under the same parameter evaluation.
The two test cases check that we can remove alias groups if the
pointers which might alias are never accessed under the same parameter
evaluation and that the minimal/maximal accesses are not global but
with regards to the parameter evaluation.
Differential Revision: http://reviews.llvm.org/D5436
llvm-svn: 218758
If there are multiple read only base addresses in an alias group
we can split it into multiple alias groups each with only one
read only access. This way we might reduce the number of
comparisons significantly as it grows linear in the number of
alias groups but exponential in their size.
Differential Revision: http://reviews.llvm.org/D5435
llvm-svn: 218757
that keep cropping up in the regression test suite.
This also addresses one of the issues raised on the mailing list with
failing to form 'movsd' in as many cases as we realistically should.
There will be corresponding patches forthcoming for v4f32 at least. This
was a lot of fuss for a relatively small gain, but all the fuss was on
my end trying different ways of holding the pieces of the x86 fragment
patterns *just right*. Now that it works, the code is reasonably simple.
In the new test cases I'm adding here, v2i64 sticks out as just plain
horrible. I've not come up with any great ideas here other than that it
would be nice to recognize when we're *going* to take a domain crossing
hit and cross earlier to get the decent instructions. At least with AVX
it is slightly less silly....
llvm-svn: 218756
Nothing was relying on this and there are potentially some edge cases
that it would not be correct under. Removing it seems better than trying
to "fix" it as nothing was relying on it.
llvm-svn: 218755
The A64 instruction set includes a generic register syntax for accessing
implementation-defined system registers. The syntax for these registers is:
S<op0>_<op1>_<CRn>_<CRm>_<op2>
The encoding space permitted for implementation-defined system registers
is:
op0 op1 CRn CRm op2
11 xxx 1x11 xxxx xxx
The full encoding space can now be accessed:
op0 op1 CRn CRm op2
xx xxx xxxx xxxx xxx
This is useful to anyone needing to write assembly code supporting new
system registers before the assembler has learned the official names for
them.
llvm-svn: 218753
Summary: The natual vector cast node (similar to bitcast) AArch64ISD::NVCAST
was introduced in r217159 and r217138. This patch adds a missing cast from
v2f32 to v1i64 which is causing some compilation failures. Also added test
cases to cover various modimm types and BUILD_VECTORs with i64 elements.
llvm-svn: 218751
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modeled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.
llvm-svn: 218748
The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and
FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be
modelled using the same target feature, and all double-precision
operations are already disabled by the fp-only-sp target features.
llvm-svn: 218747
doesn't generate lazy binding stub for a function whose address is taken in
the program.
Differential Revision: http://reviews.llvm.org/D5067
llvm-svn: 218744
This patch implements collapsing of the loops (in particular, in
presense of clause 'collapse'). It calculates number of iterations N
and expressions nesessary to calculate the nested loops counters
values based on new iteration variable (that goes from 0 to N-1)
in Sema. It also adds Codegen for 'omp simd', which uses
(and tests) this feature.
Differential Revision: http://reviews.llvm.org/D5184
llvm-svn: 218743
member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess).
The functionality this provides is very specific to RTDyldMemoryManager, so it
makes sense to keep it in that class to avoid accidental re-use.
No functional change.
llvm-svn: 218741
When generating coverage regions, we were doing a linear search
through the existing regions in order to try to merge related ones.
Most of the time this would find what it was looking for in a small
number of steps and it wasn't a big deal, but in cases with many
regions and few mergeable ones this leads to an absurd compile time
regression.
This changes the coverage mapping logic to do a single sort and then
merge as we go, which is a bit simpler and about 100 times faster.
I've also added FIXMEs on a couple of behaviours that seem a little
suspect, while keeping them behaving as they were - I'll look into
these soon.
The test changes here are mostly tedious reorganization, because the
ordering of regions we output has become slightly (but not completely)
more consistent from the almost completely arbitrary ordering we got
before.
llvm-svn: 218738
This struct has some members that are accessed directly and others
that need accessors, but it's all just public. This is confusing, so
I've changed it to a class and made more members private.
llvm-svn: 218737
The icmp-select-icmp optimization made the implicit assumption
that the select-icmp instructions are in the same block and asserted on it.
The fix explicitly checks for that condition and conservatively suppresses
the optimization when it is violated.
llvm-svn: 218735
in exposing the scalar value to the broadcast DAG fragment so that we
can catch even reloads and fold them into the broadcast.
This is somewhat magical I'm afraid but seems to work. It is also what
the old lowering did, and I've switched an old test to run both
lowerings demonstrating that we get the same result.
Unlike the old code, I'm not lowering f32 or f64 scalars through this
path when we only have AVX1. The target patterns include pretty heinous
code to re-cast those as shuffles when the scalar happens to not be
spilled because AVX1 provides no broadcast mechanism from registers
what-so-ever. This is terribly brittle. I'd much rather go through our
generic lowering code to get this. If needed, we can add a peephole to
get even more opportunities to broadcast-from-spill-slots that are
exposed post-RA, but my suspicion is this just doesn't matter that much.
llvm-svn: 218734
the same speed as pshufd but we can fold loads into the pmovzx
instructions.
This fixes some regressions that came up in the regression test suite
for the new vector shuffle lowering.
llvm-svn: 218733
This can be used for in-place initialization of non-moveable types.
For compilers that don't support variadic templates, only up to four
arguments are supported. We can always add more, of course, but this
should be good enough until we move to a later MSVC that has full
support for variadic templates.
Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS.
Reviewed by David Blaikie.
llvm-svn: 218732
This allows proper disambiguation of unbounded arrays and arrays of zero
bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC
instead produces an upper bound of -1 in the latter situation, but count
seems tidier. This way lower_bound is provided if it's not the language
default and count is provided if the count is known, otherwise it's
omitted. Simple.
If someone wants to look at rdar://problem/12566646 and see if this
change is acceptable to that bug/fix, that might be helpful (see the
empty-and-one-elem-array.ll test case which cites that radar).
llvm-svn: 218726
VPBROADCAST.
This has the somewhat expected pervasive impact. I don't know why
I forgot about this. Everything seems good with lots of significant
improvements in the tests.
llvm-svn: 218724
In special cases select instructions can be eliminated by
replacing them with a cheaper bitwise operation even when the
select result is used outside its home block. The instances implemented
are patterns like
%x=icmp.eq
%y=select %x,%r, null
%z=icmp.eq|neq %y, null
br %z,true, false
==> %x=icmp.ne
%y=icmp.eq %r,null
%z=or %x,%y
br %z,true,false
The optimization is integrated into the instruction
combiner and performed only when all uses of the select result can
be replaced by the select operand proper. For this dominator information
is used and dominance is now a required analysis pass in the combiner.
The optimization itself is iterative. The critical step is to replace the
select result with the non-constant select operand. So the select becomes
local and the combiner iteratively works out simpler code pattern and
eventually eliminates the select.
rdar://17853760
llvm-svn: 218721
The darwin linker has the -demangle option which directs it to demangle C++
(and soon Swift) mangled symbol names. Long term we need some Diagnostics object
for formatting errors and warnings. But for now we have the Core linker just
writing messages to llvm::errs(). So, to enable demangling, I changed the
Resolver to call a LinkingContext method on the symbol name.
To make this more interesting, the demangling code is done via __cxa_demangle()
which is part of the C++ ABI, which is only supported on some platforms, so I
had to conditionalize the code with the config generated HAVE_CXXABI_H.
llvm-svn: 218718
being on by default. -fno-cxx-modules can still be used to enable C modules but
not C++ modules, but C++ modules is not significantly less stable than C
modules any more.
Also remove some of the scare words from the modules documentation. We're
certainly not going to remove modules support (though we might change the
interface), and it works well enough to bootstrap and build lots of
non-trivial code.
Note that this does not represent a commitment to the current interface nor
implementation, and we still intend to follow whatever direction the C and C++
committees take regarding modules support.
llvm-svn: 218717
The optimization for -gmlt/-gline-tables-only introduced in r218129 happened to break on Darwin and produce no line number information due to
an incompatibility with dsymutil. ASan tests have been failing because of that and we disabled the use of -gmlt for the tests in r218545. This patch re-enables the use of -gmlt, because we have conditionally disabled the incompatible optimization in LLVM, so -gmlt now works on Darwin. Once Darwin's dsymutil is modified to allow this optimization, we can re-enable the optimization in LLVM.
llvm-svn: 218716
Two related things:
1. Fixes a bug when calculating the offset in GetLinearExpression. The code
previously used zext to extend the offset, so negative offsets were converted
to large positive ones.
2. Enhance aliasGEP to deduce that, if the difference between two GEP
allocations is positive and all the variables that govern the offset are also
positive (i.e. the offset is strictly after the higher base pointer), then
locations that fit in the gap between the two base pointers are NoAlias.
Patch by Nick White!
llvm-svn: 218714
No functional change. Pre-emptive refactoring before I start pushing
some of this subprogram creation down into DWARFCompileUnit so I can
build different subprograms in the skeleton unit from the dwo unit for
adding -gmlt-like data to the skeleton.
llvm-svn: 218713
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.
The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:
__global__ void foo(int a, int b, int c, int d, int e, int n,
const int *input, int *output) {
int sum = 0;
for (int i = 0; i < n; ++i)
sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
*output = sum;
}
The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:
sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];
Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.
We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!
Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
llvm-svn: 218711
cases.
While clearly we don't need the AVX vector width, these ISA extensions
often cause us to select different instructions and we should cover them
even with the narrow vector width.
Also, while here, nuke the stress_test2 contents. There is no reason to
try to FileCheck this entire body when it is mostly a test for
successfully surviving the code generator.
llvm-svn: 218710
shuffle tests to match that used in the script I posted and now used
consistently in 128-bit tests.
Nothing interesting changing here, just using the label name as the
FileCheck label and a slightly more general comment marker consumption
strategy.
llvm-svn: 218709