Summary:
This lets you build with one CUDA installation but use ptxas from
another install.
This is useful e.g. if you want to avoid bugs in an old ptxas without
actually upgrading wholesale to a newer CUDA version.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D27788
llvm-svn: 289847
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.
Differential Revision: https://reviews.llvm.org/D27677
llvm-svn: 289845
This is split out from D27696, since it turned out to be a bug fix and
not part of the NFC efficiency change.
Keep the same adjusted (possibly decayed) threshold in both the worklist
and the ImportList. Otherwise if we encountered it first along a cold
path, the callee would be added to the worklist with a lower decayed
threshold than when it is later encountered along a hot path. But the
logic uses the threshold recorded in the ImportList entry to check if
we should re-add it, and without this patch the threshold recorded there
is the same along both paths so we don't re-add it. Using the
same possibly decayed threshold in the ImportList ensures we re-add it
later with the higher non-decayed hot path threshold.
llvm-svn: 289843
When building the LLDB Framework we need to ensure that the Python files get put into the Framework before the Framework's install target can be invoked.
All files inside the Framework's Resources bundle will get copied over during the install action.
llvm-svn: 289842
CMake's framework target generation was unable to generate POST_BUILD steps (see: https://gitlab.kitware.com/cmake/cmake/issues/16363).
It turns out working around this is really not reasonable. The more reasonable solution to me is just to not support LLDB.framework unless you are on CMake 3.7 or newer.
Since CMake 3.7.1 is released that's how I'm going to handle this.
llvm-svn: 289841
If OUTPUT_DIR is not specified we can assume the symlink is linking to a file in the same directory, so we can use $<TARGET_FILE_NAME:${target}> to create a relative symlink.
In the case of LLDB, when we build a framework, we are creating symlinks in a different directory than the file we're pointing to, and we don't install those links. To make this work in the build directory we can use $<TARGET_FILE:${target}> instead, which uses the full path to the target.
llvm-svn: 289840
Summary:
With the recent changes to the Secondary, we use less bits for UnusedBytes,
which allows us in return to increase the bits used for Offset. That means
that we can use a Primary SizeClassMap allowing for a larger maximum size.
Reviewers: kcc, alekseyshl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27816
llvm-svn: 289838
This is a tiny patch with a big pile of test changes.
This partially fixes PR27885:
https://llvm.org/bugs/show_bug.cgi?id=27885
My motivating case looks like this:
- vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,2]
- vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
- vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
+ vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
And this happens several times in the diffs. For chips with domain-crossing penalties,
the instruction count and size reduction should usually overcome any potential
domain-crossing penalty due to using an FP op in a sequence of int ops. For chips such
as recent Intel big cores and Atom, there is no domain-crossing penalty for shufps, so
using shufps is a pure win.
So the test case diffs all appear to be improvements except one test in
vector-shuffle-combining.ll where we miss an opportunity to use a shift to generate
zero elements and one test in combine-sra.ll where multiple uses prevent the expected
shuffle combining.
Differential Revision: https://reviews.llvm.org/D27692
llvm-svn: 289837
Move the check for the code model into isGlobalInSmallSectionImpl and return false (not in small section) for variables placed in sections prefixed with .ldata (workaround for a tool limitation).
llvm-svn: 289832
We already have an interceptor for __shared_weak_count::__release_shared, this patch handles __shared_count::__release_shared in the same way. This should get rid of TSan false positives when using std::future.
Differential Revision: https://reviews.llvm.org/D27797
llvm-svn: 289831
The UBSAN runtime is built static on Windows. This requires that we give local
storage always. This impacts Windows where the linker would otherwise have to
generate a thunk to access the symbol via the IAT. This should repair the
windows clang build bots.
llvm-svn: 289829
Simplify CFG will try to sink the last instruction in a series of basic blocks,
creating a "common" instruction in the successor block (sinkLastInstruction).
When it does this, the debug location of the single instruction should be the
merged debug locations of the commoned instructions.
Differential Revision: https://reviews.llvm.org/D27590
llvm-svn: 289828
It os used in work/emulators/qemu-user-static port.
Which tries to use -Ttext-segment and then:
# In case ld does not support -Ttext-segment, edit the default linker
# script via sed to set the .text start addr. This is needed on FreeBSD
# at least.
<here it calls -verbose to extract and edit default bfd linker script.>
Actually now we are do not fully support -Ttext properly (see D27613),
but we also seems never will provide anything close to default script, like bfd do,
so at least this patch introduces proper alias handling.
llvm-svn: 289827
Add the missing domain equivalences for movss, movsd, movd and movq zero extending loading instructions.
Differential Revision: https://reviews.llvm.org/D27684
llvm-svn: 289825
Summary: I was building lldb using cross mingw-w64 toolchain on Linux and observed some issues. This is first patch in the series to fix that build. It mostly corrects the case of include files and adjusts some #ifdefs from _MSC_VER to _WIN32 and vice versa. I built lldb on windows with VS after applying this patch to make sure it does not break the build there.
Reviewers: zturner, labath, abidh
Subscribers: ki.stfu, mgorny, lldb-commits
Differential Revision: https://reviews.llvm.org/D27759
llvm-svn: 289821
Specifically avoid implicit conversions from/to integral types to
avoid potential errors when changing the underlying type. For example,
a typical initialization of a "full" mask was "LaneMask = ~0u", which
would result in a value of 0x00000000FFFFFFFF if the type was extended
to uint64_t.
Differential Revision: https://reviews.llvm.org/D27454
llvm-svn: 289820
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.
Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D25655
llvm-svn: 289815
A number of new patterns for simplifying and/xor of icmp:
(icmp ne %x, 0) ^ (icmp ne %y, 0) => icmp ne %x, %y if the following is true:
1- (%x = and %a, %mask) and (%y = and %b, %mask)
2- %mask is a power of 2.
(icmp eq %x, 0) & (icmp ne %y, 0) => icmp ult %x, %y if the following is true:
1- (%x = and %a, %mask1) and (%y = and %b, %mask2)
2- Let %t be the smallest power of 2 where %mask1 & %t != 0. Then for any
%s that is a power of 2 and %s & %mask2 != 0, we must have %s <= %t.
For example if %mask1 = 24 and %mask2 = 16, setting %s = 16 and %t = 8
violates condition (2) above. So this optimization cannot be applied.
llvm-svn: 289813
Patch continues work started in D24706 and D25821.
in this patch symbol table and constant pool areas were
added to .gdb_index section output.
This one finishes the implementation of --gdb-index functionality in LLD.
Differential revision: https://reviews.llvm.org/D26283
llvm-svn: 289810
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.
In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .
Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
Reviewed-by: Tobias Grosser <tobias@grosser.es>
Differential Revision: https://reviews.llvm.org/D25653
llvm-svn: 289806
In some situations, the BUILD_VECTOR node that builds a v18i8 vector by
a splat of an i8 constant will end up with signed 8-bit values and other
situations, it'll end up with unsigned ones. Handle both situations.
Fixes PR31340.
llvm-svn: 289804
This code is currently unused.
Removing it should make porting of the linux plugin to NetBSD easier, and we can
always add it later if needed.
llvm-svn: 289801
Summary: This fixes templated type aliases and templated type aliases in classes.
Reviewers: hokein
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D27801
llvm-svn: 289799
Summary:
Use auto when declaring variables that are initialized by calling a templated
function that returns its explicit first argument.
Fixes PR26763.
Reviewers: aaron.ballman, alexfh, staronj, Prazek
Subscribers: Eugene.Zelenko, JDevlieghere, cfe-commits
Differential Revision: https://reviews.llvm.org/D27166
llvm-svn: 289797