We should avoid emitting MachineSDNodes from lowering.
We can use the the implicit def handling in InstrEmitter to avoid
manually copying from each xmm result register. We only need to
manually emit the copies for the implicit uses.
New projects (particularly out of tree) have a tendency to hijack the existing
llvm configuration options and build targets (add_llvm_library,
add_llvm_tool). This can lead to some confusion.
1) When querying a configuration variable, do we care about how LLVM was
configured, or how these options were configured for the out of tree project?
2) LLVM has lots of defaults, which are easy to miss
(e.g. LLVM_BUILD_TOOLS=ON). These options all need to be duplicated in the
CMakeLists.txt for the project.
In addition, with LLVM Incubators coming online, we need better ways for these
incubators to do things the "LLVM way" without alot of futzing. Ideally, this
would happen in a way that eases importing into the LLVM monorepo when
projects mature.
This patch creates some generic infrastructure in llvm/cmake/modules and
refactors MLIR to use this infrastructure. This should expand to include
add_xxx_library, which is by far the most complicated bit of building a
project correctly, since it has to deal with lots of shared library
configuration bits. (MLIR currently hijacks the LLVM infrastructure for
building libMLIR.so, so this needs to get refactored anyway.)
Differential Revision: https://reviews.llvm.org/D85140
Instead of emitting MachineSDNodes during lowering, emit X86ISD
opcodes. These opcodes will either be selected by tablegen
patterns or custom selection code.
Emitting MachineSDNodes during lowering is uncommon so this makes
things more consistent. It also allows selectAddr to be called to
perform address matching during instruction selection.
I had trouble getting tablegen to accept XMM0-XMM7 as results in
an isel pattern for the WIDE instructions so I had to use custom
instruction selection.
Class simplifies keeping track of the indentation while emitting. For every new line the current indentation is simply prefixed (if not at start of line, then it just emits as normal). Add a simple Region helper that makes it easy to have the C++ scope match the emitted scope.
Use this in op doc generator and rewrite generator.
This reverts revert commit be185b6a73 addresses shared lib failure by fixing up cmake files.
Differential Revision: https://reviews.llvm.org/D84107
This diff refactors writeUniversalBinary and adds writeUniversalBinaryToBuffer.
This is a preparation for adding support for universal binaries to llvm-objcopy.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D88372
The signature of the ctor expects a MCRegister, but currently any
unsigned value can be converted to a MCRegister.
This patch checks that indeed the provided value is a physical register
only. We want to eventually stop implicitly converting unsigned or
Register to MCRegister (which is incorrect). The next step after this
patch is changing uses of MCRegUnitIterator to explicitly cast Register
or unsigned values to MCRegister. To that end, this patch also
introduces 2 APIs that make that conversion checked and explicit.
Differential Revision: https://reviews.llvm.org/D88705
When updating operands of a VPUser, we also have to adjust the list of
users for the new and old VPValues. This is required once we start
transitioning recipes to become VPValues.
We could either try to make SROA more picky to the new type
and/or prevent InstCombine from creating the original problem (converting load-stores to operate on ints),
and/or make InstCombine recover the situation by cleaning up all that cruft.
This makes the prologue match the windows canonical layout, for
cases without a frame pointer.
This can potentially be a slower (a longer dependency chain of the
sp register, and potentially one arithmetic operation more on some
cores), but gives notable size improvements.
The previous two commits shrinks a 166 KB xdata section by 49 KB,
and if the change from this commit is enabled, it shrinks the xdata
section by another 25 KB.
In total, since the start of the recent arm64 unwind info cleanups
and optimizations (since before commit 37ef743cbf), the xdata+pdata
sections of the same test DLL has shrunk from 407 KB in total
originally, to 163 KB now.
Differential Revision: https://reviews.llvm.org/D88701
This saves one instruction per prologue/epilogue for any function with
an odd number of callee-saved GPRs, but more importantly, allows such
functions to match the packed unwind format.
Differential Revision: https://reviews.llvm.org/D88699
On windows, the callee saved registers in a canonical prologue are
ordered starting from a lower register number at a lower stack
address (with the possible gap for aligning the stack at the top);
this is the opposite order that llvm normally produces.
To achieve this, reverse the order of the registers in the
assignCalleeSavedSpillSlots callback, to get the stack objects
laid out by PrologEpilogInserter in the right order, and adjust
computeCalleeSaveRegisterPairs to lay them out from the bottom up.
This allows generated prologs more often to match the format that
allows the unwind info to be written as packed info.
Differential Revision: https://reviews.llvm.org/D88677
Add a partial read/write tests for x87 FPU registers. This includes
reading and writing ST registers, control registers and floating-point
exception data registers (fop, fip, fdp).
The tests assume the current (roughly incorrect) behavior of reporting
the 'abridged' 8-bit ftag state as 16-bit ftag. They also assume Linux
plugin behavior of reporting fip/fdp split into halves as (fiseg, fioff)
and (foseg, fooff).
Differential Revision: https://reviews.llvm.org/D88583
Multiple fixes related to bugs discovered while debugging a crash
when reading all registers on i386.
The underlying problem was that GetSetForNativeRegNum() did not account
for MPX registers on i386, and since it only compared against upper
bounds of each known register set, the MPX registers were classified
into the wrong set and therefore considered supported. However, they
were not expected in RegNumX86ToX86_64() and caused the assertion
to fail.
This includes:
- adding (unused) i386 → x86_64 translations for MPX registers
- fixing GetSetForNativeRegNum() to check both lower and upper bound
for register sets, to avoid wrongly classifying unhandled register
sets
- adding missing range check for MPX registers on i386
- renaming k_last_mpxr to k_last_mpxr_i386 for consistency
- replacing return-assertions with llvm_unreachable() and adding more
checks for unexpected parameters
Differential Revision: https://reviews.llvm.org/D88682
Fix reading FIP/FDP registers to correctly return segment and offset
parts. On amd64, this roughly matches the Linux behavior of splitting
the 64-bit FIP/FDP into two halves, and putting the higher 32 bits
into f*seg and lower into f*off. Well, actually we use only 16 bits
of higher half but the CPUs do not seem to handle more than that anyway.
Differential Revision: https://reviews.llvm.org/D88681
Do not instrument user-defined ELF sections (whose names resemble valid
C identifiers). They may have special use semantics and modifying them
may break programs. This is e.g. the case with NetBSD __link_set API
that expects these sections to store consecutive array elements.
Differential Revision: https://reviews.llvm.org/D76665
We can't use Use.Calls after its std::move()'d to TmpCalls as it will be in an undefined state. Instead, swap with the known empty map in TmpCalls so we can then safely emplace_back into the now empty Use.Calls.
Fixes clang static analyzer warning.
Class simplifies keeping track of the indentation while emitting. For every new line the current indentation is simply prefixed (if not at start of line, then it just emits as normal). Add a simple Region helper that makes it easy to have the C++ scope match the emitted scope.
Use this in op doc generator and rewrite generator.
Differential Revision: https://reviews.llvm.org/D84107
We were not accounting for the pointer offset when splitting a store from
a VMOVDRR node, which could lead to incorrect aliasing info. In this
case it is the fneg via integer arithmetic that gives us a store->load
pair that we started getting wrong.
Differential Revision: https://reviews.llvm.org/D88653
Add basic vector handling to recognizeBSwapOrBitReverseIdiom/collectBitParts - this works at the element level, all vector element operations must match (splat constants etc.) and there is no cross-element support (insert/extract/shuffle etc.).
This patch fixes one worning. Since Flang sets `-Werror`, that's
sufficient for a build to fail. As per flang/README.md, Clang-10 is one
of the officially supported compilers.
Differential Revision: https://reviews.llvm.org/D88723
If we're bswap'ing some bytes and zero'ing the remainder we can perform this as a bswap+mask which helps us match 'partial' bswaps as a first step towards folding into a more complex bswap pattern.
Reapplied with early-out if recognizeBSwapOrBitReverseIdiom collects a source wider than the result type.
Differential Revision: https://reviews.llvm.org/D88578
The function `TryListConversion` didn't properly validate the following
part of the standard:
Otherwise, if the parameter type is a character array [... ]
and the initializer list has a single element that is an
appropriately-typed string literal (8.5.2 [dcl.init.string]), the
implicit conversion sequence is the identity conversion.
This caused the following call to `f()` to be ambiguous.
void f(int(&&)[1]);
void f(unsigned(&&)[1]);
void g(unsigned i) {
f({i});
}
This issue only occurs when the initializer list had one element.
Differential Revision: https://reviews.llvm.org/D87561
This helper method is useful even outside of Gnu toolchains, so move
it to ToolChain so it can be reused in other toolchains such as Fuchsia.
Differential Revision: https://reviews.llvm.org/D88452
The aesdec/enc instructions produce a flag output and one or eight
xmm regsiter outputs. The test were not capturing the xmm outputs.
Also add nounwind to tests to remove .cfi directives
This moves the platform-specific parameter logic from asan into
lsan_common.h to lsan can share it.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D87795