Summary:
This patch prepares more for tail call support in XRay. Until the logging part supports tail calls, this is just staging, so it seems LLVM part is mostly ready with this patch.
Related: https://reviews.llvm.org/D28948 (compiler-rt)
Reviewers: dberris, rengolin
Reviewed By: dberris
Subscribers: llvm-commits, iid_iunknown, aemerson
Differential Revision: https://reviews.llvm.org/D28947
llvm-svn: 293080
Set LLVM_LIBRARY_OUTPUT_INTDIR as expected by llvm_setup_rpath() macro
when doing stand-alone builds. This is required to pass correct
-rpath-link when linking shared libraries, and therefore ensure that
the linker can find dependency libraries correctly during the build.
Differential Revision: https://reviews.llvm.org/D29099
llvm-svn: 293078
Change getReservedRegs() to not mark a register as reserved and then
revert that decision in some cases. Motivated by the discussion in
https://reviews.llvm.org/D29056
llvm-svn: 293073
This patch adds support for the proc_bind clause on the Spmd construct
'target parallel' on the NVPTX device. Since the parallel region is created
upon kernel launch, this clause can be safely ignored on the NVPTX device at
codegen time for level 0 parallelism.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29128
llvm-svn: 293069
Summary:
Adding ARM64 as a supported architecture for Scudo.
The random shuffle is not yet supported for SizeClassAllocator32, which is used
by the AArch64 allocator, so disable the associated test for now.
Reviewers: kcc, alekseyshl, rengolin
Reviewed By: rengolin
Subscribers: aemerson, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D28960
llvm-svn: 293068
This patch introduces guard based loop predication optimization. The new LoopPredication pass tries to convert loop variant range checks to loop invariant by widening checks across loop iterations. For example, it will convert
for (i = 0; i < n; i++) {
guard(i < len);
...
}
to
for (i = 0; i < n; i++) {
guard(n - 1 < len);
...
}
After this transformation the condition of the guard is loop invariant, so loop-unswitch can later unswitch the loop by this condition which basically predicates the loop by the widened condition:
if (n - 1 < len)
for (i = 0; i < n; i++) {
...
}
else
deoptimize
This patch relies on an NFC change to make ScalarEvolution::isMonotonicPredicate public (revision 293062).
Reviewed By: sanjoy
Differential Revision: https://reviews.llvm.org/D29034
llvm-svn: 293064
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29075
Patch by Maxim Kazantsev.
llvm-svn: 293061
Summary:
Lifetime extension wasn't triggered on the result of BuildMI because the
reference was non-const. However, instead of adding a const, I've
removed the reference entirely as RVO should kick in anyway.
Reviewers: rovka, bkramer
Reviewed By: bkramer
Subscribers: aemerson, rengolin, dberris, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D29124
llvm-svn: 293059
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: apilipenko
Differential Revision: https://reviews.llvm.org/D29074
Patch by Maxim Kazantsev.
llvm-svn: 293058
This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine
Reviewed By: majnemer, apilipenko
Differential Revision: https://reviews.llvm.org/D29071
Patch by Maxim Kazantsev.
llvm-svn: 293056
Summary:
This presents a version of the comment reflowing with less mutable state inside
the comment breakable token subclasses. The state has been pushed into the
driving breakProtrudingToken method. For this, the API of BreakableToken is enriched
by the methods getSplitBefore and getLineLengthAfterSplitBefore.
Reviewers: klimek
Reviewed By: klimek
Subscribers: djasper, klimek, mgorny, cfe-commits, ioeric
Differential Revision: https://reviews.llvm.org/D28764
llvm-svn: 293055
Use the new llvm_canonicalize_cmake_booleans() function to canonicalize
booleans for lit tests. Replace the duplicate ENABLE_CLANG* variables
used to hold canonicalized values with in-place canonicalization. Use
implicit logic in Python code to avoid overrelying on exact 0/1 values.
Differential Revision: https://reviews.llvm.org/D28529
llvm-svn: 293052
Summary: This enables the test to run on systems where output cannot be written.
Reviewers: compnerd
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D29123
llvm-svn: 293051
Prior to OpenCL 2.0, image3d_t can only be used with the write_only
access qualifier when the cl_khr_3d_image_writes extension is enabled,
see e.g. OpenCL 1.1 s6.8b.
Require the extension for write_only image3d_t types and guard uses of
write_only image3d_t in the OpenCL header.
Patch by Sven van Haastregt!
Review: https://reviews.llvm.org/D28860
llvm-svn: 293050
The thread_limit-clause on the combined directive applies to the
'teams' region of this construct. We modify the ThreadLimitClause
class to capture the clause expression within the 'target' region.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29087
llvm-svn: 293049
The num_teams-clause on the combined directive applies to the
'teams' region of this construct. We modify the NumTeamsClause
class to capture the clause expression within the 'target' region.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29085
llvm-svn: 293048
The main motivation for me doing this is being able to build an arm
android lldb-server against api level 9 headers, but it seems like a
good cleanup nonetheless.
The entirety of the cpu_set_t dance now resides in SingleStepCheck.cpp,
which is only built on arm64.
llvm-svn: 293046
Mapping symbols allow a mapping symbol aware disassembler to
correctly disassemble the PLT when the code immediately prior to the
PLT is Thumb.
To implement this we add a function to add symbols with local
binding to be defined in SyntheticSymbols.
Differential Revision: https://reviews.llvm.org/D28956
llvm-svn: 293044
This is an attempt to avoid new false positives caused by the reverted r292800,
however the scope of the fix is significantly reduced - some variables are still
in incorrect memory spaces.
Relevant test cases added.
rdar://problem/30105546
rdar://problem/30156693
Differential revision: https://reviews.llvm.org/D28946
llvm-svn: 293043
instructions.
If number of instructions in horizontal reduction list is not power of 2
then only PowerOf2Floor(NumberOfInstructions) last elements are actually
vectorized, other instructions remain scalar. Patch tries to vectorize
the remaining elements either.
Differential Revision: https://reviews.llvm.org/D28959
llvm-svn: 293042
Floating point intrinsics in LLVM are generally not speculatively
executed, since most of them are defined to behave the same as libm
functions, which set errno.
However, the @llvm.powi.* intrinsics do not correspond to any libm
function, and lacks any defined error handling semantics in LangRef.
It most certainly does not alter errno.
llvm-svn: 293041
The previous patch (https://reviews.llvm.org/rL289538) got reverted because of a bug. Chandler also requested some changes to the algorithm.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161212/413479.html
This is an updated patch. The key difference is that collectBitProviders (renamed to calculateByteProvider) now collects the origin of one byte, not the whole value. It simplifies the implementation and allows to stop the traversal earlier if we know that the result won't be used.
From the original commit:
Match a pattern where a wide type scalar value is loaded by several narrow loads and combined by shifts and ors. Fold it into a single load or a load and a bswap if the targets supports it.
Assuming little endian target:
i8 *a = ...
i32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
=>
i32 val = *((i32)a)
i8 *a = ...
i32 val = (a[0] << 24) | (a[1] << 16) | (a[2] << 8) | a[3]
=>
i32 val = BSWAP(*((i32)a))
This optimization was discussed on llvm-dev some time ago in "Load combine pass" thread. We came to the conclusion that we want to do this transformation late in the pipeline because in presence of atomic loads load widening is irreversible transformation and it might hinder other optimizations.
Eventually we'd like to support folding patterns like this where the offset has a variable and a constant part:
i32 val = a[i] | (a[i + 1] << 8) | (a[i + 2] << 16) | (a[i + 3] << 24)
Matching the pattern above is easier at SelectionDAG level since address reassociation has already happened and the fact that the loads are adjacent is clear. Understanding that these loads are adjacent at IR level would have involved looking through geps/zexts/adds while looking at the addresses.
The general scheme is to match OR expressions by recursively calculating the origin of individual bytes which constitute the resulting OR value. If all the OR bytes come from memory verify that they are adjacent and match with little or big endian encoding of a wider value. If so and the load of the wider type (and bswap if needed) is allowed by the target generate a load and a bswap if needed.
Reviewed By: RKSimon, filcab, chandlerc
Differential Revision: https://reviews.llvm.org/D27861
llvm-svn: 293036
Add support for:
* i1 add
* i1 function arguments, if passed through registers
* i1 returns, with ABI signext/zeroext
Differential Revision: https://reviews.llvm.org/D27706
llvm-svn: 293035
At the moment, this means supporting the signext/zeroext attribute on the return
type of the function. For function arguments, signext/zeroext should be handled
by the caller, so there's nothing for us to do until we start lowering calls.
Note that this does not include support for other extensions (i8 to i16), those
will be added later.
Differential Revision: https://reviews.llvm.org/D27705
llvm-svn: 293034
If dominator tree has no roots, the pass that calculates it is
likely to be skipped. It occures, for instance, in the case of
entities with linkage available_externally. Do not run tree
verification in such case.
Differential Revision: https://reviews.llvm.org/D28767
llvm-svn: 293033
This reverts commit r293004 because it broke the buildbots with "unknown CPU"
errors. I tried to fix it in r293026, but that broke on Green Dragon with this
kind of error:
error: expected string not found in input
// CHECK: l{{ +}}df{{ +}}*ABS*{{ +}}{{0+}}{{.+}}preprocessed-input.c{{$}}
^
<stdin>:2:1: note: scanning from here
/Users/buildslave/jenkins/sharedspace/incremental@2/clang-build/tools/clang/test/Frontend/Output/preprocessed-input.c.tmp.o: file format Mach-O 64-bit x86-64
^
<stdin>:2:67: note: possible intended match here
/Users/buildslave/jenkins/sharedspace/incremental@2/clang-build/tools/clang/test/Frontend/Output/preprocessed-input.c.tmp.o: file format Mach-O 64-bit x86-64
I suppose this means that llvm-objdump doesn't support Mach-O, so the test
should indeed check for linux (but not for x86). I'll leave it to someone that
knows better.
llvm-svn: 293032
Summary:
A patch to enable the llvm-xray graph subcommand to color edges and
vertices based on statistics and to annotate vertices with statistics.
Depends on D27243
Reviewers: dblaikie, dberris
Reviewed By: dberris
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D28225
llvm-svn: 293031
Enable the next form (intel style):
"mov <reg64>, <largeImm>"
which is should be available,
where <largeImm> stands for immediates which exceed the range of a singed 32bit integer
Differential Revision: https://reviews.llvm.org/D28988
llvm-svn: 293030
This test broke on a lot of non-x86 buildbots with "unknowm CPU" errors. I don't
see anything platform-specific about this test, and it seems to work fine on ARM
if we just remove the -triple i686 flags from the run line.
llvm-svn: 293026
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.
For example, SimplifyCFG used to do the following transformation:
(before)
if.then:
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
br label %if.end
if.else:
%1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
br label %if.end
(after)
%.sink = select i1 %tobool, i32 6, i32 8
%0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)
This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".
rdar://problem/30110806
Differential Revision: https://reviews.llvm.org/D29111
llvm-svn: 293025