Adds support for MachO static initializers/deinitializers and eh-frame
registration via the ORC runtime.
This commit introduces cooperative support code into the ORC runtime and ORC
LLVM libraries (especially the MachOPlatform class) to support macho runtime
features for JIT'd code. This commit introduces support for static
initializers, static destructors (via cxa_atexit interposition), and eh-frame
registration. Near-future commits will add support for MachO native
thread-local variables, and language runtime registration (e.g. for Objective-C
and Swift).
The llvm-jitlink tool is updated to use the ORC runtime where available, and
regression tests for the new MachOPlatform support are added to compiler-rt.
Notable changes on the ORC runtime side:
1. The new macho_platform.h / macho_platform.cpp files contain the bulk of the
runtime-side support. This includes eh-frame registration; jit versions of
dlopen, dlsym, and dlclose; a cxa_atexit interpose to record static destructors,
and an '__orc_rt_macho_run_program' function that defines running a JIT'd MachO
program in terms of the jit- dlopen/dlsym/dlclose functions.
2. Replaces JITTargetAddress (and casting operations) with ExecutorAddress
(copied from LLVM) to improve type-safety of address management.
3. Adds serialization support for ExecutorAddress and unordered_map types to
the runtime-side Simple Packed Serialization code.
4. Adds orc-runtime regression tests to ensure that static initializers and
cxa-atexit interposes work as expected.
Notable changes on the LLVM side:
1. The MachOPlatform class is updated to:
1.1. Load the ORC runtime into the ExecutionSession.
1.2. Set up standard aliases for macho-specific runtime functions. E.g.
___cxa_atexit -> ___orc_rt_macho_cxa_atexit.
1.3. Install the MachOPlatformPlugin to scrape LinkGraphs for information
needed to support MachO features (e.g. eh-frames, mod-inits), and
communicate this information to the runtime.
1.4. Provide entry-points that the runtime can call to request initializers,
perform symbol lookup, and request deinitialiers (the latter is
implemented as an empty placeholder as macho object deinits are rarely
used).
1.5. Create a MachO header object for each JITDylib (defining the __mh_header
and __dso_handle symbols).
2. The llvm-jitlink tool (and llvm-jitlink-executor) are updated to use the
runtime when available.
3. A `lookupInitSymbolsAsync` method is added to the Platform base class. This
can be used to issue an async lookup for initializer symbols. The existing
`lookupInitSymbols` method is retained (the GenericIRPlatform code is still
using it), but is deprecated and will be removed soon.
4. JIT-dispatch support code is added to ExecutorProcessControl.
The JIT-dispatch system allows handlers in the JIT process to be associated with
'tag' symbols in the executor, and allows the executor to make remote procedure
calls back to the JIT process (via __orc_rt_jit_dispatch) using those tags.
The primary use case is ORC runtime code that needs to call bakc to handlers in
orc::Platform subclasses. E.g. __orc_rt_macho_jit_dlopen calling back to
MachOPlatform::rt_getInitializers using __orc_rt_macho_get_initializers_tag.
(The system is generic however, and could be used by non-runtime code).
The new ExecutorProcessControl::JITDispatchInfo struct provides the address
(in the executor) of the jit-dispatch function and a jit-dispatch context
object, and implementations of the dispatch function are added to
SelfExecutorProcessControl and OrcRPCExecutorProcessControl.
5. OrcRPCTPCServer is updated to support JIT-dispatch calls over ORC-RPC.
6. Serialization support for StringMap is added to the LLVM-side Simple Packed
Serialization code.
7. A JITLink::allocateBuffer operation is introduced to allocate writable memory
attached to the graph. This is used by the MachO header synthesis code, and will
be generically useful for other clients who want to create new graph content
from scratch.
Based off the codegen reports on PR51075 - hopefully we can handle some of this in SLP or VectorCombine, but we usually have to leave load combining until the backend so at least some of these patterns will still appear even then.
This is part of an effort to reduce the differences between the custom C++ bindings used right now by polly in `lib/External/isl/include/isl/isl-noxceptions.h` and the official isl C++ interface.
Changes made:
- Stop generating `isl::union_set` and isl::union_map` from `isl::space` and instead generate them from `isl::ctx`
- Disable clang-format on `isl-noexceptions.h`
- Removed `isl::union_{set,map}` generator from `isl::space` from `isl-noexceptions.h`
- `isl-noexceptions.h` has been generated by this 87c3413b6f
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D106059
Insert ops replacing pad_tensor in front of the associated tansfer_write / insert_slice op. Otherwise we may end up with invalid ir if one of the remaining tansfer_write / insert_slice operands is defined after the pad_tensor op.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D106162
Finds function calls where the call arguments might be provided in an
incorrect order, based on the comparison (via string metrics) of the
parameter names and the argument names against each other.
A diagnostic is emitted if an argument name is similar to a *different*
parameter than the one currently passed to, and it is sufficiently
dissimilar to the one it **is** passed to currently.
False-positive warnings from this check are useful to indicate bad
naming convention issues, even if a swap isn't necessary.
This check does not generate FixIts.
Originally implemented by @varjujan as his Master's Thesis work.
The check was subsequently taken over by @barancsuk who added type
conformity checks to silence false positive matches.
The work by @whisperity involved driving the check's review and fixing
some more bugs in the process.
Reviewed By: aaron.ballman, alexfh
Differential Revision: http://reviews.llvm.org/D20689
Co-authored-by: János Varjú <varjujanos2@gmail.com>
Co-authored-by: Lilla Barancsuk <barancsuklilla@gmail.com>
Replace code which identifies induction phi with helper function
getInductionVariable to improve robustness.
Differential Revision: https://reviews.llvm.org/D106045
This relaxes the VMLAV and VADDV reduction recognition code to handle
smaller than legal types, extending them as needed. That was already
handled for some reductions, this extends it to more types in a more
generic way. If a smaller than legal value is found it is extended to
the legal type as needed.
Differential Revision: https://reviews.llvm.org/D106051
The case for nxv2f32/nxv2i32 was already covered by D104573.
This patch builds on top of that by making the mechanism work for
nxv2[b]f16/nxv2i16, nxv4[b]f16/nxv4i16 as well.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D106138
If a file has no symbols, perhaps because it is a linked executable,
synthesize some symbols by walking the code section. Otherwise the
disassembler will try to treat the whole code section as a function,
which won't parse. Fixes https://bugs.llvm.org/show_bug.cgi?id=50957.
Differential Revision: https://reviews.llvm.org/D105539
D106236 added a new CMake argument for `libomptarget` test, but when user's
input contains white spaces, CMake will add escape char to the final lit command,
which leads to an error. This patch converts the user's input `LIBOMPTARGET_LIT_ARGS`
into a local array, and then passes the array to the function.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D106247
This reverts commit 2a419a0b99.
The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.
The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.
See discussion on https://reviews.llvm.org/D106053 for more background.
Differential Revision: https://reviews.llvm.org/D106222
The `__CUDA__` macro is already defined for openmp/nvptx and is not used by
`__clang_cuda_complex_builtins.h`, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).
Also dropped a cplusplus test from a C++ header as compilation will have
failed on cmath earlier if it was included from C.
Reviewed By: jdoerfert, fodinabor
Differential Revision: https://reviews.llvm.org/D105221
The inttoptr/ptrtoint roundtrip optimization is not always correct.
We are working towards removing this optimization and adding support
to specific cases where this optimization works. This patch is the
first one on this line.
Consider the example:
%i = ptrtoint i8* %X to i64
%p = inttoptr i64 %i to i16*
%cmp = icmp eq i8* %load, %p
In this specific case, the inttoptr/ptrtoint optimization is correct
as it only compares the pointer values. In this patch, we fold
inttoptr/ptrtoint to a bitcast (if src and dest types are different).
Differential Revision: https://reviews.llvm.org/D105088
By default, `lit` uses all threads to invoke tests, which can easily cause out
of memory on GPUs because most of OpenMP offloading test usually take about 1GB
GPU memory, but a typical GPU only has 4-8GB memory. This patch introduce a
CMake argument `LIBOMPTARGET_LIT_ARGS` to allow users to control the behavior of
`libomptarget` tests, similar to `LLVM_LIT_ARGS`.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D106236
I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT
Currently when we compile the project in debug mode, `-g` will not be added to
compilation flag. The bc files generated in different mode are of different size.
When using GPU debuggers like `cuda-gdb`, it is expected to provide more info
with a debug version of bc lib.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D106229
D106053 exposed that we've not been taking into account that by bitcasting smaller elements together and then performing a ComputeKnownBits on the result we'd be allowing a poison element to influence other neighbouring elements being used in the pack. Instead we now peek through any existing bitcast to ensure that the source type already matches the width source of the pack node we're trying to match.
This has also been a chance to stop matchShuffleWithPACK creating unused nodes on the fly which could affect oneuse tests during shuffle lowering/combining.
The only regression we're seeing is due to being unable to peek through a bitcast as its on the other side of a extract_subvector - which should go away once we finally allow shuffle combining across different vector widths (by making matchShuffleWithPACK using const SelectionDAG& we've gotten closer to this - see PR45974).
At most these use the StringRef/Twine wrappers and don't have any implicit uses of std::string.
Move the include down to any cpp implementation where std::string is actually used.
This pattern is visible in unrolled and vectorized loops.
Although the backend seems to be able to reassociate to
ideal form in the examples I looked at, we might as well
do that in IR for efficiency.
This patch handles the `std::swap` function specialization
for `std::unique_ptr`. Implemented to be very similar to
how `swap` method is handled
Differential Revision: https://reviews.llvm.org/D104300
Add a small power 2 srem test to match existing sdiv test. Add
larger power of 2 test to both.
The larger constant test shows materialization of a constant
for an AND in the RV64 code. We should be using W shift instructions
to match the RV32 code.
It's noteworthy that GCC has the same bug here, which is a bit
surprising. Both Clang and GCC's bug is only for function template
arguments that are themselves templates with default template arguments
(f1<t1<int[, missing_default_here]>>). Probably because function name
matching isn't generally necessary - whereas type matching is necessary
for DWARF consumers to associate declarations and definitions across
translation units, so the bug's been addressed there already - but
continued to exist for function templates since it's fairly benign
there.
I came across this while working on a change that could reconstitute
these pretty printed names based on the rest of the DWARF, reducing the
size of the DWARF by not having to encode all the template parameters in
the name string. That reconstitution code can't tell the difference
between a defaulted argument or not, so couldn't create the current
buggy-ish output.
Making the names more consistent between direct and indirect references,
and between function and class templates seems all to the good.
(I fixed the function template version of this a few years back in
9fdd09a4cc - clearly I should've looked
more closely and generalized the code better so it only had to be fixed
once - well, doing that here now)
llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so
we weren't actually computing anything. This was causing a miscompile on an
arm64 stage2 bootstrap clang build.