Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.
llvm-svn: 342091
This is available on most platforms (Linux/Mac/Win/BSD) with no extra syscalls.
On other platforms (e.g. Solaris) we stat() if this information is requested.
This will allow switching clang's VFS to efficiently expose (path, type) when
traversing a directory. Currently it exposes an entire Status, but does so by
calling fs::status() on all platforms.
Almost all callers only need the path, and all callers only need (path, type).
Patch by sammccall (Sam McCall)
Differential Revision: https://reviews.llvm.org/D51918
llvm-svn: 342089
template methods in JITDylib out-of-line.
This also splits JITDylib::define into a pair of template methods, one taking an
lvalue reference and the other an rvalue reference. This simplifies the
templates at the cost of a small amount of code duplication.
llvm-svn: 342087
construction, a new convenience lookup method, and add-to layer methods.
ExecutionSession now creates a special 'main' JITDylib upon construction. All
subsequently created JITDylibs are added to the main JITDylib's search order by
default (controlled by the AddToMainDylibSearchOrder parameter to
ExecutionSession::createDylib). The main JITDylib's search order will be used in
the future to properly handle cross-JITDylib weak symbols, with the first
definition in this search order selected.
This commit also adds a new ExecutionSession::lookup convenience method that
performs a blocking lookup using the main JITDylib's search order, as this will
be a very common operation for clients.
Finally, new convenience overloads of IRLayer and ObjectLayer's add methods are
introduced that add the given program representations to the main dylib, which
is likely to be the common case.
llvm-svn: 342086
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51991
llvm-svn: 342084
r342003 added support for emitting FPO data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the PDB
file. However, that is not the end of the story. FPO can end
up in two different destinations in a PDB, each corresponding to
a different FPO data source.
The case handled by r342003 involves copying data from the
DEBUG_S_FRAMEDATA subsection of the .debug$S section to the
"New FPO" stream in the PDB, which is then referred to by the
DBI stream. The case handled by this patch involves copying
records from the .debug$F section of an object file to the "FPO"
stream (or perhaps more aptly, the "Old FPO" stream) in the PDB
file, which is also referred to by the DBI stream.
The formats are largely similar, and the difference is mostly
only visible in masm generated object files, such as some of the
low-level CRT object files like memcpy. MASM doesn't appear to
support writing the DEBUG_S_FRAMEDATA subsection, and instead
just writes these records to the .debug$F section.
Although clang-cl does not emit a .debug$F section ever, lld still
needs to support it so we have good debugging for CRT functions.
Differential Revision: https://reviews.llvm.org/D51958
llvm-svn: 342080
Scalarization of a shuffle will break up the source vectors into individual
elements, and use them to assemble the resulting vector. An element type of
a legal vector type may not necessarily be a legal scalar type, so make
sure that the extracted values are extended to a legal scalar type.
llvm-svn: 342079
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only `n` bits wide (where `n` is a variable.)
There are **many** ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations i haven't thought of..)
Let's handle the second variant first, since it is much simpler.
https://rise4fun.com/Alive/LYjYhttps://bugs.llvm.org/show_bug.cgi?id=38708
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51985
llvm-svn: 342067
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51765
llvm-svn: 342064
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.
I don't think gcc will generate this instruction either.
Fixes PR38852.
Differential Revision: https://reviews.llvm.org/D51754
llvm-svn: 342059
Fix for https://bugs.llvm.org/show_bug.cgi?id=38912.
In GVNHoist::computeInsertionPoints() we iterate over the Value
Numbers and calculate the Iterated Dominance Frontiers without
clearing the IDFBlocks vector first. IDFBlocks ends up accumulating
an insane number of basic blocks, which bloats the compilation time
of SemaChecking.cpp with ubsan enabled.
Differential Revision: https://reviews.llvm.org/D51980
llvm-svn: 342055
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51479
llvm-svn: 342049
Eliminating some duplication of rangelist dumping code at the expense of
some version-dependent code in dump and extract routines.
Reviewer: dblaikie, JDevlieghere, vleschuk
Differential revision: https://reviews.llvm.org/D51081
llvm-svn: 342048
The output of splitLargeGEPOffsets does not appear to be deterministic because
of the way that we iterate over a DenseMap. I've changed it to a MapVector for
consistent output.
The test here isn't particularly great, only showing a consmetic difference in
output. The original reproducer is much larger but show a diffierence in
instruction ordering, leading to different codegen.
Differential Revision: https://reviews.llvm.org/D51851
llvm-svn: 342043
Previously the alignment on the newly created switch table data was not set,
meaning that DataLayout::getPreferredAlignment was free to overalign it to 16
bytes. This causes unnecessary code bloat.
Differential Revision: https://reviews.llvm.org/D51800
llvm-svn: 342039
This patch refactors several parts of AArch64FrameLowering
so that it can be easily extended to support saving/restoring
of FPR128 (Q) registers.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51478
llvm-svn: 342038
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
llvm-svn: 342033
This patch adds parsing support for the 'aarch64_vector_pcs'
calling convention attribute to calls and function declarations.
More information describing the vector ABI and procedure call standard
can be found here:
https://developer.arm.com/products/software-development-tools/\
hpc/arm-compiler-for-hpc/vector-function-abi
Reviewers: t.p.northover, rnk, rengolin, javed.absar, thegameg, SjoerdMeijer
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D51477
llvm-svn: 342030
Move the 2 classes out of LoopVectorize.cpp to make it easier to re-use
them for VPlan outside LoopVectorize.cpp
Reviewers: Ayal, mssimpso, rengolin, dcaballe, mkuper, hsaito, hfinkel, xbolva00
Reviewed By: rengolin, xbolva00
Differential Revision: https://reviews.llvm.org/D49488
llvm-svn: 342027
This fixes a layering violation:
Analysis/IVDescrtors.cpp can't include Transforms/Utils/BasicBlockUtils.h,
since TransformUtils depends on Analysis.
llvm-svn: 342024
I suspect this became unecessary when the CSE of mgather was fixed in r338080. It may still be possible to hit this if we widen the element type of a gather outside of type legalization and the promote the mask of a separate gather node so they become the same. But that seems pretty unlikely.
llvm-svn: 342022
Summary:
The InductionDescriptor and RecurrenceDescriptor classes basically analyze the IR to identify the respective IVs. So, it is better to have them in the "Analysis" directory instead of the "Transforms" directory.
The rationale for this is to make the Induction and Recurrence descriptor classes available for analysis passes. Currently including them in an analysis pass produces link error (http://lists.llvm.org/pipermail/llvm-dev/2018-July/124456.html).
Induction and Recurrence descriptors are moved from Transforms/Utils/LoopUtils.h|cpp to Analysis/IVDescriptors.h|cpp.
Reviewers: dmgreen, llvm-commits, hfinkel
Reviewed By: dmgreen
Subscribers: mgorny
Differential Revision: https://reviews.llvm.org/D51153
llvm-svn: 342016
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
Since the outliner is a module pass, it doesn't get codegen size remarks like
the other codegen passes do. This adds size remarks *to* the outliner.
This is kind of a workaround, so it's peppered with FIXMEs; size remarks
really ought to not ever be handled by the pass itself. However, since the
outliner is the only "MachineModulePass", this works for now. Since the
entire purpose of the MachineOutliner is to produce code size savings, it
really ought to be included in codgen size remarks.
If we ever go ahead and make a MachineModulePass (say, something similar to
MachineFunctionPass), then all of this ought to be moved there.
llvm-svn: 342009
Name: op_ugt_sum
%a = add i8 %x, %y
%r = icmp ugt i8 %x, %a
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
Name: sum_ult_op
%a = add i8 %x, %y
%r = icmp ult i8 %a, %x
=>
%notx = xor i8 %x, -1
%r = icmp ugt i8 %y, %notx
https://rise4fun.com/Alive/ZRxI
AFAICT, this doesn't interfere with any add-saturation patterns
because those have >1 use for the 'add'. But this should be
better for IR analysis and codegen in the basic cases.
This is another fold inspired by PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 342004
Summary:
Previously, check-llvm on my Windows 10 workstation took about 300s to
run, and it would lock up my mouse. Now, it takes just under 60s.
Previously running the tests only used about 15% of the available CPU
time, now it uses up to 60%.
Shell32.dll and ole32.dll have direct dependencies on user32.dll and
gdi32.dll. These DLLs are mostly used to for Windows GUI functionality,
which most LLVM tools don't need. It seems that these DLLs acquire and
release some system resources on startup and exit, and that requires
acquiring the same highly contended lock discussed in this post:
https://randomascii.wordpress.com/2017/07/09/24-core-cpu-and-i-cant-move-my-mouse/
Most LLVM tools still have a transitive dependency on
SHGetKnownFolderPathW, which is used to look up the user home directory
and local AppData directory. We also use SHFileOperationW to recursively
delete a directory, but only LLDB and clang-doc use this today. At some
point, we might consider removing these last shell32.dll deps, but for
now, I would like to take this free performance.
Many thanks to Bruce Dawson for suggesting this fix. He looked at an ETW
trace of an LLVM test run, noticed these DLLs, and suggested avoiding
them.
Reviewers: zturner, pcc, thakis
Subscribers: mgorny, llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D51952
llvm-svn: 342002
This reverts rL341954.
The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been
failing with timeouts at stage2 clang/ubsan:
[3065/3073] Linking CXX executable bin/lld
command timed out: 1200 seconds without output running python
../sanitizer_buildbot/sanitizers/buildbot_selector.py,
attempting to kill
llvm-svn: 342001
Summary:
There are two registers encoded in the S_FRAMEPROC flags: one for locals
and one for parameters. The encoding is described by the
ExpandEncodedBasePointerReg function in cvinfo.h. Two bits are used to
indicate one of four possible values:
0: no register - Used when there are no variables.
1: SP / standard - Variables are stored relative to the standard SP
for the ISA.
2: FP - Variables are addressed relative to the ISA frame
pointer, i.e. EBP on x86. If realignment is required, parameters
use this. If a dynamic alloca is used, locals will be EBP relative.
3: Alternative - Variables are stored relative to some alternative
third callee-saved register. This is required to address highly
aligned locals when there are dynamic stack adjustments. In this
case, both the incoming SP saved in the standard FP and the current
SP are at some dynamic offset from the locals. LLVM uses ESI in
this case, MSVC uses EBX.
Most of the changes in this patch are to pass around the CPU so that we
can decode these into real, named architectural registers.
Subscribers: hiraditya
Differential Revision: https://reviews.llvm.org/D51894
llvm-svn: 341999
In this diff we adjust the code of getSymbols to avoid creating LLVMContext when it's not necessary.
Without this patch when the function getSymbols was called on a MachO object with a __bitcode section
it was parsing the embedded bitcode and then ignoring the result.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D51759
llvm-svn: 341998
These are the folds in Alive;
Name: xor_ult
Pre: isPowerOf2(-C1)
%xor = xor i8 %x, C1
%r = icmp ult i8 %xor, C1
=>
%r = icmp ugt i8 %x, ~C1
Name: xor_ugt
Pre: isPowerOf2(C1+1)
%xor = xor i8 %x, C1
%r = icmp ugt i8 %xor, C1
=>
%r = icmp ugt i8 %x, C1
https://rise4fun.com/Alive/Vty
The ugt case in its simplest form was already handled by DemandedBits,
but that's not ideal as shown in the multi-use test.
I'm not sure if these are all of the symmetrical folds, but I adjusted
the existing code for one of the folds to try to show the similarities.
There's no obvious connection, but this is another preliminary step
for PR14613...
https://bugs.llvm.org/show_bug.cgi?id=14613
llvm-svn: 341997
Summary: Initial support for nsw, nuw and exact flags in MI
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nlopes
Differential Revision: https://reviews.llvm.org/D51738
llvm-svn: 341996
Fixes at_file.c test failure caused by r341988. We may want to change
how we treat \n in our tokenizer, but this is probably a good fix
regardless, since we can invoke all kinds of programs with different
interpretations of the command line quoting rules.
llvm-svn: 341992
Summary:
Shell32.dll depends on gdi32.dll and user32.dll, which are mostly DLLs
for Windows GUI functionality. LLVM's utilities don't typically need GUI
functionality, and loading these DLLs seems to be slowing down startup.
Also, we already have an implementation of Windows command line
tokenization in cl::TokenizeWindowsCommandLine, so we can just use it.
The goal is to get the original argv in UTF-8, so that it can pass
through most LLVM string APIs. A Windows process starts life with a
UTF-16 string for its command line, and it can be retreived with
GetCommandLineW from kernel32.dll.
Previously, we would:
1. Get the wide command line
2. Call CommandLineToArgvW to handle quoting rules and separate it into
arguments.
3. For each wide argument, expand wildcards (* and ?) using
FindFirstFileW.
4. Convert each argument to UTF-8
Now we:
1. Get the wide command line, convert the whole thing to UTF-8
2. Tokenize the UTF-8 command line with cl::TokenizeWindowsCommandLine
3. For each argument, expand wildcards if present
- This requires converting back to UTF-16 to call FindFirstFileW
- Results of FindFirstFileW must be converted back to UTF-8
Reviewers: zturner
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51941
llvm-svn: 341988
Summary:
Place global arrays in comdat sections with their associated functions.
This makes sure they are stripped along with the functions they
reference, even on the BFD linker.
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: eraman, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D51902
llvm-svn: 341987
Summary:
Update MemorySSA in old LoopUnswitch pass.
Actual dependency and update is disabled by default.
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D45301
llvm-svn: 341984