This will enable removing hacks throughout the codebase
in clang and compiler-rt that feed multiple inputs to a
testing utility by globbing, all of which are either disabled
on Windows currently or using xargs / find hacks.
Differential Revision: https://reviews.llvm.org/D30380
llvm-svn: 296904
Summary:
If a loop contains a Phi node which has an invariant input from back
edge, it is profitable to peel such loops (rather than unroll them) to
use the advantage that this Phi is always invariant starting from 2nd
iteration. After the 1st iteration is peeled, other optimizations can
potentially simplify calculations with this invariant.
Patch by Max Kazantsev!
Reviewers: sanjoy, apilipenko, igor-laevsky, anna, mkuper, reames
Reviewed By: mkuper
Subscribers: mkuper, mzolotukhin, llvm-commits
Differential Revision: https://reviews.llvm.org/D30161
llvm-svn: 296898
Summary:
In current implementation the loop peeling happens after trip-count based partial unrolling and may
sometimes not happen at all due to it (for example, if trip count is known, but UP.Partial = false). This
is generally bad, the more than there are some situations where peeling is profitable even if the partial
unrolling is disabled.
This patch is a NFC which reorders peeling and partial unrolling application and prepares the code for
implementation of the said optimizations.
Patch by Max Kazantsev!
Reviewers: sanjoy, anna, reames, apilipenko, igor-laevsky, mkuper
Reviewed By: mkuper
Subscribers: mkuper, llvm-commits, mzolotukhin
Differential Revision: https://reviews.llvm.org/D30243
llvm-svn: 296897
pthread_self() returns a pthread_t, but we were setting it to
an int. It seems the cast to int when calling sysctl is still
the correct thing to do, though.
llvm-svn: 296892
Doing so defines the type llvm::thread. On FreeBSD, we need
to call a macro which references its own ::thread type, which
causes an ambiguity due to ADL when inside of the llvm namespace.
Since we don't even need this unless LLVM_ENABLE_THREADS == 1,
we don't even need this type anyway, as it is always equal to
std::thread, so we can just use that directly.
llvm-svn: 296891
Applications often need the current thread id when making
system calls, and some operating systems provide the notion
of a thread name, which can be useful in enabling better
diagnostics when debugging or logging.
This patch adds an accessor for the thread id, and "best effort"
getters and setters for the thread name. Since this is
non critical functionality, no error is returned to indicate
that a platform doesn't support thread names.
Differential Revision: https://reviews.llvm.org/D30526
llvm-svn: 296887
Added code to check constant bus restrictions for VOP formats (only one SGPR value or literal-constant may be used by the instruction).
Note that the same checks are performed by SIInstrInfo::verifyInstruction (used by lowering code).
Added LIT tests.
llvm-svn: 296873
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.
A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as a IntroNoMem which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been write to this register, e.g.
void foo(int *p) {
p[0] = __builtin_arm_get_fpscr();
__builtin_arm_set_fpscr(1);
p[1] = __builtin_arm_get_fpscr();
}
in the above example the second read is currently CSE'd into the first read,
this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
effects the same register that __builtin_arm_get_fpscr reads from, to fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.
Differential Revision: https://reviews.llvm.org/D30542
llvm-svn: 296865
for VectorizeTree() API.This API uses it for proper mask computation to be used in shufflevector IR.
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree
instead of actual vectorization of vectorizable tree.It also needs to recompute the proper Lane for
external use of vectorizable scalars based on shuffle mask.
Reviewers: mkuper
Differential Revision: https://reviews.llvm.org/D30159
Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f487e2f60ce5d
llvm-svn: 296863
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.
llvm-svn: 296862
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.
Differential Revision: https://reviews.llvm.org/D29874
llvm-svn: 296859
This is a cleanup/rewrite of the parseSysAlias function. It was not using the
tablegen instruction descriptions, but was “manually” matching the mnemonics
and recreating the operands whereas all this information is already in
tablegen; all this code has been replaced with calls to lookupXYZByName
tablegen calls.
Differential Revision: https://reviews.llvm.org/D30491
llvm-svn: 296857
A call should never modify the stack pointer, but some backends are
not so sure about this and never list SP in the regmask. For the
purposes of LiveDebugValues we assume a call never clobbers SP. We
already have a similar workaround in DbgValueHistoryCalculator (which
we hopefully can retire soon).
This fixes the availabilty of local ASANified variables on AArch64.
rdar://problem/27757381
llvm-svn: 296847
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:
1) The post-dominators marked 50% are actually taken 56% (This shrinks with
longer chains)
2) The chains are statically correlated. Branch probabilities have a very
U-shaped distribution.
[http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
If the branches in a chain are likely to be from the same side of the
distribution as their predecessor, but are independent at runtime, this
transformation is profitable. (Because the cost of being wrong is a small
fixed cost, unlike the standard triangle layout where the cost of being
wrong scales with the # of triangles.)
3) The chains are dynamically correlated. If the probability that a previous
branch was taken positively influences whether the next branch will be
taken
We believe that 2 and 3 are common enough to justify the small margin in 1.
The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.
llvm-svn: 296845
Such edges may otherwise result in infinite recursion if a pointer to a vtable
is reachable from the vtable itself. This can happen in practice if a TU
defines the ABI types used to implement RTTI, and is itself compiled with RTTI.
Fixes PR32121.
llvm-svn: 296839
ValueTracking is used for more thorough analysis of operands. Based on the
analysis, either run-time checks can be simplified (e.g. check only one operand
instead of two) or the transformation can be avoided. For example, it is quite
often the case that a divisor is promoted from a shorter type and run-time
checks for it are redundant.
With additional compile-time analysis of values, two special cases naturally
arise and are addressed by the patch:
1) Both operands are known to be short enough. Then, the long division can be
simply replaced with a short one without CFG modification.
2) If a division is unsigned and the dividend is known to be short then the
long division is not needed at all. Because if the divisor is too big for
short division then the quotient is obviously zero (and the remainder is
equal to the dividend). Actually, the division is not needed when
(divisor > dividend).
Differential Revision: https://reviews.llvm.org/D29897
llvm-svn: 296832
Summary:
Fix a few problems in VersionFromVCS.cmake to make it more reliable:
- Stop using git svn info to retrieve the svn revision. I am unable to
determine what the svn revision returned by this command means.
During my testing this command returned a revision from a month
ago which was not the HEAD of any of my local branches.
Also, this revision was never actually added to the version string due
to a typo in the script. All it was used for was to reject the
revision number returned by git svn find-rev HEAD when the revision
numbers didn't match.
- Populate GIT_COMMIT even when we detect a git repo without any
svn information.
Reviewers: mehdi_amini, beanz
Reviewed By: beanz
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D30092
llvm-svn: 296829
The most important goal of the patch is to break large insertFastDiv function
into separate pieces, so that later a different fast insertion logic can be
implemented using some of these pieces.
Differential Revision: https://reviews.llvm.org/D29896
llvm-svn: 296828
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below
```
extern int bar();
extern int baz();
int foo(int x, int y) {
if (x != y)
return bar();
else
return baz();
}
```
, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:
```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
%cmp = icmp ne i32 %x, %y, !dbg !14
br i1 %cmp, label %if.then, label %if.else, !dbg !16
if.then: ; preds = %entry
%call = tail call i32 (...) @bar() #3, !dbg !17
br label %return, !dbg !18
if.else: ; preds = %entry
%call1 = tail call i32 (...) @baz() #3, !dbg !19
br label %return, !dbg !20
return: ; preds = %if.else, %if.then
%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
ret i32 %retval.0, !dbg !21
}
!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)
```
As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.
Reviewers: atrick, bogner, andreadb, craig.topper, aprantl
Reviewed By: andreadb
Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits
Differential Revision: https://reviews.llvm.org/D29813
llvm-svn: 296825
This commit also relied on r296812, which I just reverted. We should probably
apply it again, after the r296812 has been discussed and been reapplied in some
variant.
llvm-svn: 296820