While the incoming stack for a kernel is 256-byte aligned,
this refers to the base address of the entire wave. This isn't
useful information for most of codegen. Fixes unnecessarily
aligning stack objects in callees.
llvm-svn: 300481
The splitIndirectCriticalEdges function generates and invalid CFG when the
'Target' basic block is a loop to itself. When this occurs, the code that
updates the predecessor terminator needs to update the terminator in the split
basic block.
This occurs when there is an edge from block D back to D. Since D is split in
to D0 and D1, the code needs to update the terminator in D1. But D1 is not in
the OtherPreds vector, so it was not getting updated.
Differential Revision: https://reviews.llvm.org/D32126
llvm-svn: 300480
This was added to work around a bug in MSVC 2013's implementation of stable_sort. That bug has been fixed as of MSVC 2015 so we shouldn't need this anymore.
Technically the current implementation has undefined behavior because we only protect the deleting of the pVal array with the self move check. There is still a memcpy of that.VAL to VAL that isn't protected. In the case of self move those are the same local and memcpy is undefined for src and dst overlapping.
This reduces the size of the opt binary on my local x86-64 build by about 4k.
Differential Revision: https://reviews.llvm.org/D32116
llvm-svn: 300477
Currently we use getTypeSizeInBits which contains a switch statement to dispatch based on what the Type is. We know we always have a pointer type here, but the compiler isn't able to figure out that out to remove the switch.
This patch changes it to just call handle the pointer type directly by calling getPointerSizeInBits without going through a switch.
getPointerTypeSizeInBits is called pretty often, particularly by getOrEnforceKnownAlignment which is used by InstCombine. This should speed that up a little bit.
Differential Revision: https://reviews.llvm.org/D31841
llvm-svn: 300475
It's basically a terrible idea anyway but objc_msgSend gets emitted like that.
We can decide on a better way to deal with it in the unlikely event that anyone
actually uses it.
llvm-svn: 300474
We seem to assume that OS-provided thread IDs are either uptr or int, neither of which is true on Darwin. This introduces a tid_t type, which holds a OS-provided thread ID (gettid on Linux, pthread_threadid_np on Darwin, pthread_self on FreeBSD).
Differential Revision: https://reviews.llvm.org/D31774
llvm-svn: 300473
The documentation for the waymarking algorithm says that we use the lower 2 bits of Use::Prev to store the way marking bits. But because we use a PointerIntPair with the default PointerLikeTypeTraits, we're using bits 2:1 on 64-bit targets.
There's also a trick employed for distinguishing Users that have Uses stored with them and Users that have Uses stored in a separate array. The documentation says we use the LSB of the first byte of the real User object or the User* that occurs at the end of the Use array. But again due to the PointerLikeTypeTraits we're really using bit 2(64-bit) or bit 1(32-bit) and not the LSB. This is a little worrying because the first byte of the User object is the vtable ptr so we're assuming the vtable has 8 byte or 4 byte alignment where what is documented would only require 2 byte alignment.
This patch provides a custom traits override for these two cases to put the bits where the documentation says they are. It also has the side effect of removing some shifts from the waymarking traversal implementation.
Differential Revision: https://reviews.llvm.org/D31733
llvm-svn: 300471
Add a top-level STRTAB block containing a string table blob, and start storing
strings for module codes FUNCTION, GLOBALVAR, ALIAS, IFUNC and COMDAT in
the string table.
This change allows us to share names between globals and comdats as well
as between modules, and improves the efficiency of loading bitcode files by
no longer using a bit encoding for symbol names. Once we start writing the
irsymtab to the bitcode file we will also be able to share strings between
it and the module.
On my machine, link time for Chromium for Linux with ThinLTO decreases by
about 7% for no-op incremental builds or about 1% for full builds. Total
bitcode file size decreases by about 3%.
As discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2017-April/111732.html
Differential Revision: https://reviews.llvm.org/D31838
llvm-svn: 300464
It's almost certainly not a good idea to actually use it in most cases (there's
a pretty large code size overhead on AArch64), but we can't do those
experiments until it's supported.
llvm-svn: 300462
Summary: This specifically addresses the Mach-O zero page, which we cannot read from.
Reviewers: kubamracek, samsonov, alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32044
llvm-svn: 300456
Causes some VGPR usage improvements in shaderdb, but
introduces some SGPR spilling regressions due to random
scheduling changes later.
llvm-svn: 300453
Summary:
This seems like an uncontroversial first step toward providing access to the metadata hierarchy that now exists in LLVM. This should allow for good debug info support from C.
Future plans are to deprecate API that take mixed bags of values and metadata (mainly the LLVMMDNode family of functions) and migrate the rest toward the use of LLVMMetadataRef.
Once this is in place, mapping of DIBuilder will be able to start.
Reviewers: mehdi_amini, echristo, whitequark, jketema, Wallbraker
Reviewed By: Wallbraker
Subscribers: Eugene.Zelenko, axw, mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D19448
llvm-svn: 300447
This patch is a generalization of the improvement introduced in rL296898.
Previously, we were able to peel one iteration of a loop to get rid of a Phi that becomes
an invariant on the 2nd iteration. In more general case, if a Phi becomes invariant after
N iterations, we can peel N times and turn it into invariant.
In order to do this, we for every Phi in loop's header we define the Invariant Depth value
which is calculated as follows:
Given %x = phi <Inputs from above the loop>, ..., [%y, %back.edge].
If %y is a loop invariant, then Depth(%x) = 1.
If %y is a Phi from the loop header, Depth(%x) = Depth(%y) + 1.
Otherwise, Depth(%x) is infinite.
Notice that if we peel a loop, all Phis with Depth = 1 become invariants,
and all other Phis with finite depth decrease the depth by 1.
Thus, peeling N first iterations allows us to turn all Phis with Depth <= N
into invariants.
Reviewers: reames, apilipenko, mkuper, skatkov, anna, sanjoy
Reviewed By: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31613
llvm-svn: 300446
Patch implements --compress-debug-sections=zlib.
In compare with D20211 (a year old patch, abandoned), it implementation
uses streaming and fully reimplemented, does not support zlib-gnu for
simplification.
This is PR32308.
Differential revision: https://reviews.llvm.org/D31941
llvm-svn: 300444
The code implements Richard Smith suggestion in comment 3 of the PR.
reviewer: Vassil Vassilev
Differential Revision: https://reviews.llvm.org/D31540
llvm-svn: 300443
This is non-functional change to re-order if statements to bail out earlier
from unreachable and ColdCall heuristics.
Reviewers: sanjoy, reames, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31704
llvm-svn: 300442
When peeling loops basing on phis becoming invariants, we make a wrong loop size check.
UP.Threshold should be compared against the total numbers of instructions after the transformation,
which is equal to 2 * LoopSize in case of peeling one iteration.
We should also check that the maximum allowed number of peeled iterations is not zero.
Reviewers: sanjoy, anna, reames, mkuper
Reviewed By: mkuper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D31753
llvm-svn: 300441
Metadata potentially is more precise than any heuristics we use, so
it makes sense to use first metadata info if it is available. However it makes
sense to examine it against other strong heuristics like unreachable one.
If edge coming to unreachable block has higher probability then it is expected
by unreachable heuristic then we use heuristic and remaining probability is
distributed among other reachable blocks equally.
An example where metadata might be more strong then unreachable heuristic is
as follows: it is possible that there are two branches and for the branch A
metadata says that its probability is (0, 2^25). For the branch B
the probability is (1, 2^25).
So the expectation is that first edge of B is hotter than first edge of A
because first edge of A did not executed at least once.
If first edge of A points to the unreachable block then using the unreachable
heuristics we'll set the probability for A to (1, 2^20) and now edge of A
becomes hotter than edge of B.
This is unexpected behavior.
This fixed the biggest part of https://bugs.llvm.org/show_bug.cgi?id=32214
Reviewers: sanjoy, junbuml, vsk, chandlerc
Reviewed By: chandlerc
Subscribers: llvm-commits, reames, davidxl
Differential Revision: https://reviews.llvm.org/D30631
llvm-svn: 300440
If we already called computeKnownBits for the RHS being a constant power of 2, we've already computed everything we can and should just stop. I think previously we would still recurse if we had determined the result was negative or had not determined the sign bit at all.
llvm-svn: 300432