The linker wrapper is used to perform linking and wrapping of embedded
device object files. Currently its internals are not able to be tested
easily. This patch adds the `--dry-run` and `--print-wrapped-module`
options to investigate the link jobs that will be run along with the
wrapped code that will be created to register the binaries.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D124039
The previous patch introduced the offloading binary format so we can
store some metada along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.
Differential Revision: https://reviews.llvm.org/D122683
Summary:
The changes in D122987 ensures that the offloading sections always have
the SHF_EXCLUDE flag. This means that we do not need to manually strip
these sections for ELF or COFF targets.
Summary:
A previous patch added the option to use the default pipeline when
perfomring LTO rather than the regular LTO pipeline. This greatly
improved performance regressions we were observing with the LTO
pipeline. However, this should not be used if the user explicitly
disables optimizations as the default pipeline expects some
optimizatoins to be perfomed.
Summary:
Currently there is no option to configure the number of thin-backend
threads to use when performing thin-lto on the device, but we should
default to use all the threads rather than just one. In the future we
should use the same arguments that gold / lld use and parse it here.
This patch adds a configuration option to simply use the default pass
pipeline in favor of the LTO-specific one. We observed some severe
performance penalties when uding device-side LTO for OpenMP offloading
applications caused by the LTO-pass pipeline. This is primarily because
OpenMP uses an LLVM bitcode library to implement a GPU runtime library.
In a standard compilation we link this bitcode library into each source
file and optimize it with the default pipeline. When performing LTO we
link it late with all the files, but the bitcode library never has the
regular optimization pipeline applied to it so we miss a few
optimizations just using the LTO pipeline to optimize it.
I'm not committed to this solution, but it's the easiest method to solve
this performance regression when using LTO without changing the
optimizatin pipeline for other users.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D122133
This patch adds the offload kind to the embedded section name in
preparation for offloading to different kinda like CUDA or HIP.
Depends on D120288
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120271
This patch implements a DenseMap info struct for the device file type.
This is used to help grouping device files that have the same triple and
architecture. Because of this the filename, which will always be unique
for each file, is not used.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D120288
Summary:
This patch correctly handles the `--sysroot=` option when passed to the
linker wrapper. This allows users to correctly find libraries that may
contain offloading code if using this option.
Summary;
This path adds printing support for the linker wrapper. When the user
passes `-v` it will not print the commands used by the linker wrapper to
indicate to the user what is happening during the linking.
Summary:
This patch removes the error we recieve when attempting to extract
offloading sections. We shouldn't consider this a failure because
extracting bitcode isn't necessarily required.
Summary:
We were not previously saving strings when saving symbol names during
LTO symbol resolution. This caused a crash inside the dense set when
some of the strings would rarely be moved internally by the object file
class.
This patch adds support for linking CPU offloading applications in the
linker wrapper. We generate the necessary linking job using the host
linker's path and library arguments. This may not be true for more
complex offloading schemes, but this is sufficient for now.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D119613
Summary:
This patch changes the ClangLinkerWrapper to use the executable path
when searching for the lld binary. Previously we relied on the program
name. Also not finding 'llvm-strip' is not considered an error anymore
because it is an optional optimization.
Add the build directory to the search path for llvm-strip instead
of solely relying on the PATH environment variable setting.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D118965
This header is very large (3M Lines once expended) and was included in location
where dwarf-specific information were not needed.
More specifically, this commit suppresses the dependencies on
llvm/BinaryFormat/Dwarf.h in two headers: llvm/IR/IRBuilder.h and
llvm/IR/DebugInfoMetadata.h. As these headers (esp. the former) are widely used,
this has a decent impact on number of preprocessed lines generated during
compilation of LLVM, as showcased below.
This is achieved by moving some definitions back to the .cpp file, no
performance impact implied[0].
As a consequence of that patch, downstream user may need to manually some extra
files:
llvm/IR/IRBuilder.h no longer includes llvm/BinaryFormat/Dwarf.h
llvm/IR/DebugInfoMetadata.h no longer includes llvm/BinaryFormat/Dwarf.h
In some situations, codes maybe relying on the fact that
llvm/BinaryFormat/Dwarf.h was including llvm/ADT/Triple.h, this hidden
dependency now needs to be explicit.
$ clang++ -E -Iinclude -I../llvm/include ../llvm/lib/Transforms/Scalar/*.cpp -std=c++14 -fno-rtti -fno-exceptions | wc -l
after: 10978519
before: 11245451
Related Discourse thread: https://llvm.discourse.group/t/include-what-you-use-include-cleanup
[0] https://llvm-compile-time-tracker.com/compare.php?from=fa7145dfbf94cb93b1c3e610582c495cb806569b&to=995d3e326ee1d9489145e20762c65465a9caeab4&stat=instructions
Differential Revision: https://reviews.llvm.org/D118781
The linker wrapper tool uses the 'nvlink' and 'ptxas' binaries to link
and assemble device files. Previously we searched for this using the
binaries in the user's path. This didn't work in cases where the user
passed in a specific Cuda path to Clang. This patch changes the linker
wrapper to accept an argument for the Cuda path we can get from Clang.
This should fix#53573.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D118944
Summary:
This patch removes the system call to the `clang-offload-wrapper` tool
by replicating its functionality in a new file. This improves
performance and makes the future wrapping functionality easier to
change.
Differential Revision: https://reviews.llvm.org/D118198
Summary:
This patch replaces the system call to the `llc` binary with a library
call to the target machine interface. This should be faster than
relying on an external system call to compile the final wrapper binary.
Differential Revision: https://reviews.llvm.org/D118197
Summary:
This parses the executable name out of the linker arguments so we can
use it to give more informative temporary file names and so we don't
accidentally use it for device linking.
Summary:
This patch implements the `-save-temps` flag for the linker wrapper.
This allows the user to inspect the intermeditary outpout that the
linker wrapper creates.
Summary:
Various changes to the linker wrapper, and the bitcode embedding is not
done after the optimizations have run rather than after linking is done.
This saves time when doing JIT.
This patch improves the symbol resolution done for LTO with offloading
applications. The symbol resolution done here allows the LTO backend to
internalize more functions. The symbol resoltion done is a simplified
view that does not take into account various options like `--wrap` or
`--dyanimic-list` and always assumes we are creating a shared object.
The actual target may be an executable, but semantically it is used as a
shared object because certain objects need to be visible outside of the
executable when they are read by the OpenMP plugin.
Depends on D117246
Differential Revision: https://reviews.llvm.org/D118155
This patch adds support for linking AMDGPU images using the LLD binary.
AMDGPU files are always bitcode images and will always use the LTO
backend. Additionally we now pass the default architecture found with
the `amdgpu-arch` tool to the argument list.
Depends on D117156
Differential Revision: https://reviews.llvm.org/D117246
This patch adds support for a few extra flags in the linker wrapper,
such as debugging flags, verbose output, and passing arguments to ptxas. We also
now forward pass remarks to the LLVM backend so they will show up in the LTO
passes.
Depends on D117049
Differential Revision: https://reviews.llvm.org/D117156
Summary;
This patch adds support for embedding device images in the linker
wrapper tool. This will be used for performing JIT functionality in the
future.
Depends on D117048
Differential Revision: https://reviews.llvm.org/D117049
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.
Depends on D116675
Differential revision: https://reviews.llvm.org/D117048
This patch implements the fist support for handling LTO in the
offloading pipeline. The flag `-foffload-lto` is used to control if
bitcode is embedded into the device. If bitcode is found in the device,
the extracted files will be sent to the LTO pipeline to be linked and
sent to the backend. This implementation does not separately link the
device bitcode libraries yet.
Depends on D116675
Differential Revision: https://reviews.llvm.org/D116975
This patch adds support for searching through the linker library paths
to identify static libraries that may contain device code. If device
code is present it will be extracted. This should ideally fully support
static linking with OpenMP offloading.
Depends on D116627
Differential Revision: https://reviews.llvm.org/D116675
This patch adds the initial support for linking NVPTX offloading code
using the clang-linker-wrapper tool. This uses the extracted device
files and runs `nvlink` on them. Currently this is then passed to the
existing toolchain for creating linkable OpenMP offloading programs
using `clang-offload-wrapper` and compiling it manually using `llc`.
More work is required to support LTO, Bitcode linking, AMDGPU, and x86
offloading.
Depends on D116545
Differential Revision: https://reviews.llvm.org/D116627
This patchs add support for extracting device offloading code from the
linker's input files. If the file contains a section with the name
`.llvm.offloading.<triple>.<arch>` it will be extracted to a new
temporary file to be linked. Addtionally, the host file containing it
will have the section stripped so it does not remain in the executable
once linked.
Depends on D116544
Differential Revision: https://reviews.llvm.org/D116545
This patch introduces a linker wrapper tool that allows us to preprocess
files before they are sent to the linker. This adds a dummy action and
job to the driver stage that builds the linker command as usual and then
replaces the command line with the wrapper tool.
Depends on D116543
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D116544