Relational comparisons should not involve multiple potentially
aliasing pointers. Similarly this should hold for switch conditions
and the two conditions involved in equality comparisons (separately!).
This is a heuristic based on the C semantics that does only allow such
operations when the base pointers do point into the same object.
Since this makes aliasing likely we will bail out early instead of
producing a probably failing runtime check.
llvm-svn: 288516
It did happen that after the inliner finished we end up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.
llvm-svn: 288514
Summary:
This is an improvement on rL288448 where address sanitization was listed
as supported for the CudaToolChain. Since the intent is for the
CudaToolChain not to reject any flags supported by the host compiler,
this patch switches to forwarding the CudaToolChain sanitizer support to
the host toolchain rather than explicitly whitelisting address
sanitization.
Thanks to hfinkel for this suggestion.
Reviewers: jlebar
Subscribers: hfinkel, cfe-commits
Differential Revision: https://reviews.llvm.org/D27351
llvm-svn: 288512
The assertion asserted that colorable sections can never have
a reference to non-colorable sections, but that was simply wrong.
They can have references to non-colorable sections. If that's the
case, referenced sections must be the same in terms of pointer
comparison.
llvm-svn: 288511
This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's
builtins (twice: once in r288412 and once in r288497). We should investigate
this offline.
llvm-svn: 288508
Summary:
When X = 0 and Y = inf, the original code produces inf, but the transformed
code produces nan. So this transform (and its relatives) should only be
used when the no-infs-fp-math flag is explicitly enabled.
Also disable the transform using fmad (intermediate rounding) when unsafe-math
is not enabled, since it can reduce the precision of the result; consider this
example with binary floating point numbers with two bits of mantissa:
x = 1.01
y = 111
x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step)
x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero)
The example relies on rounding towards zero at least in the second step.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578
Reviewers: RKSimon, tstellarAMD, spatel, arsenm
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D26602
llvm-svn: 288506
Summary:
Unfortunately, there is no way to emit an llvm masked load/store in
clang without optimizations, and AVX enabled. Unsure how we should go
about making sure this test only runs if it's possible to execute AVX
code.
Reviewers: kcc, RKSimon, pgousseau
Subscribers: kubabrecka, dberris, llvm-commits
Differential Revision: https://reviews.llvm.org/D26506
llvm-svn: 288504
Summary: They are currently being parsed as %f14, %f16, and %f18.
Reviewers: venkatra, jyknight
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27342
llvm-svn: 288503
Summary: Virtual method overrides of dependent types cannot be recognized unless
they are marked as override or final.
Exclude methods marked as final from check and add test.
Reviewers: sbenza, hokein, alexfh
Subscribers: malcolm.parsons, JDevlieghere, cfe-commits
Differential Revision: https://reviews.llvm.org/D27248
llvm-svn: 288502
When trying to vectorize trees that start at insertelement instructions
function tryToVectorizeList() uses vectorization factor calculated as
MinVecRegSize/ScalarTypeSize. But sometimes it does not work as tree
cost for this fixed vectorization factor is too high.
Patch tries to improve the situation. It tries different vectorization
factors from max(PowerOf2Floor(NumberOfVectorizedValues),
MinVecRegSize/ScalarTypeSize) to MinVecRegSize/ScalarTypeSize and tries
to choose the best one.
Differential Revision: https://reviews.llvm.org/D27215
llvm-svn: 288497
getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well.
Converted Constant bit extraction and Bitset splitting to helper lambda functions.
llvm-svn: 288496
Summary:
This replaces all the uses of the __ANDROID_NDK__ define with __ANDROID__. This
is a preparatory step to remove our custom android toolchain file and rely on
the standard android NDK one instead, which does not provide this define.
Instead I rely, on __ANDROID__, which is set by the compiler.
I haven't yet removed the cmake variable with the same name, as we will need to
do something completely different there -- NDK toolchain defines
CMAKE_SYSTEM_NAME to Android, while our current one pretends it's linux.
Reviewers: tberghammer, zturner
Subscribers: danalbert, srhines, mgorny, lldb-commits
Differential Revision: https://reviews.llvm.org/D27305
llvm-svn: 288494
removed as a duplicate header search path
The commit r126167 started passing the First index into RemoveDuplicates, but
forgot to update 0 to First in the loop that looks for the duplicate. This
resulted in a bug where an -iquoted search path was incorrectly removed if you
passed in the same path into -iquote and more than one time into -isystem.
rdar://23991350
Differential Revision: https://reviews.llvm.org/D27298
llvm-svn: 288491
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.
struct com {
double Real;
double Img;
};
void foo(long n, struct com A[][n]) {
for (long i = 0; i < 100; i++)
for (long j = 0; j < 1000; j++)
A[i][j].Real += A[i][j].Img;
}
int main() {
struct com A[100][1000];
foo(1000, A);
llvm-svn: 288489
LLD used to take 11.73 seconds to link Clang. Now it is 6.94 seconds.
MSVC link takes 83.02 seconds. Note that ICF is enabled by default on
Windows, so a low latency ICF is more important than in ELF.
llvm-svn: 288487
It looks like the way dtrace works is
* The user creates .o files that reference magical symbol names.
* dtrace reads those files, collecs the info it needs and changes the
relocation to R_X86_64_NONE expecting the linker to ignore them.
llvm-svn: 288485
Associative sections are sections that need to be linked if their associated
sections are linked. Associative sections are used to append auxiliary data
such as debug info.
Previously, we compared all associative sections when comparing two comdat
sections. Because usually assocative sections are not mergeable sections,
we missed a lot of mergeable sections. MSVC linker doesn't seem to check
the identity of associative sections.
This patch makes LLD to ignore associative sections when doing ICF.
llvm-svn: 288483
r288228 seems to have regressed ICF performance in some cases in which
a lot of sections are actually mergeable. In r288228, I made a change
to create a Range object for each new color group. So every time we
split a group, we allocated and added a new group to a list of groups.
This patch essentially reverted r288228 with an improvement to
parallelize the original algorithm.
Now the ICF main loop is entirely allocation-free and lock-free.
Just like pre-r288228, we search for group boundaries by linear scan
instead of managing the information using Range class. r288228 was
neutral in performance-wise, and so is this patch.
I confirmed that this produces the exact same result as before
using chromium and clang as tests.
llvm-svn: 288480
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations, we take the assumptions that the
array shape we predict is such that no out-of-bounds memory accesses arise.
Before this change, the construction of the memory access, the access folding
that improves the represenation for certain parametric subscripts, and taking
the assumption was all done right after a memory access was created. In this
change we split this now into three separate iterations over all memory
accesses. This means only after all memory accesses have been built, we start
to canonicalize accesses, and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.
llvm-svn: 288479
Now that PointerType is no longer a SequentialType, all SequentialTypes
have an associated number of elements, so we can move that information to
the base class, allowing for a number of simplifications.
Differential Revision: https://reviews.llvm.org/D27122
llvm-svn: 288464
As proposed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html
This is for a couple of reasons:
- Values of type PointerType are unlike the other SequentialTypes (arrays
and vectors) in that they do not hold values of the element type. By moving
PointerType we can unify certain aspects of how the other SequentialTypes
are handled.
- PointerType will have no place in the SequentialType hierarchy once
pointee types are removed, so this is a necessary step towards removing
pointee types.
Differential Revision: https://reviews.llvm.org/D26595
llvm-svn: 288462
This is a fairly reasonable bfd extension since there is one obvious value.
dtrace depends on this feature as it creates multiple absolute
symbols with the same value.
llvm-svn: 288461
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.
Differential Revision: https://reviews.llvm.org/D26594
llvm-svn: 288458