Commit Graph

53 Commits

Author SHA1 Message Date
Joseph Huber a9fd8b9113 [LinkerWrapper] Fix calls to deleted Error constructor on older compilers
Summary:
A recent patch added some new code paths to the linker wrapper. Older
compilers seem to have problems with returning errors wrapped in
an Excepted type without explicitly moving them. This caused failures in
some of the buildbots. This patch fixes that.
2022-06-22 09:39:23 -04:00
Joseph Huber 958a885050 [LinkerWrapper] Rework the linker wrapper and use owning binaries
The linker wrapper currently eagerly extracts all identified offloading
binaries to a file. This isn't ideal because we will soon open these
files again to examine their symbols for LTO and other things.
Additionally, we may not use every extracted file in the case of static
libraries. This would be very noisy in the case of static libraries that
may contain code for several targets not participating in the current
link.

Recent changes allow us to treat an Offloading binary as a standard
binary class. So that allows us to use an OwningBinary to model the
file. Now we keep it in memory and only write it once we know which
files will be participating in the final link job. This also reworks a
lot of the structure around how we handle this by removing the old
DeviceFile class.

The main benefit from this is that the following doesn't output 32+ files and
instead will only output a single temp file for the linked module.
```
$ clang input.c -fopenmp --offload-arch=sm_70 -foffload-lto -save-temps
```

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D127246
2022-06-22 09:24:10 -04:00
Fangrui Song 95a134254a Remove unneeded cl::ZeroOrMore for cl::opt/cl::list options 2022-06-05 01:07:51 -07:00
Fangrui Song d0d1c416cb Remove unneeded cl::ZeroOrMore for cl::list options 2022-06-04 23:51:13 -07:00
Fangrui Song 734c223445 [clang-link-wrapper] Remove unneeded cl::ZeroOrMore for cl::opt options. NFC
Similar to 557efc9a8b
2022-06-03 22:02:11 -07:00
Joseph Huber 3723868d9e [OpenMP] Fix file arguments for embedding bitcode in the linker wrapper
Summary:
The linker wrapper supports embedding bitcode images instead of linked
device images to facilitate JIT in the device runtime. However, we were
incorrectly passing in the file twice when this option was set. This
patch makes sure we only use the intermediate result of the LTO pass and
don't add the final output to the full job.

In the future we will want to add both of these andle handle that
accoridngly to allow the runtime to either use the AoT compiled version
or JIT compile the bitcode version if availible.
2022-05-24 13:45:52 -04:00
Joseph Huber f37101983f [OpenMP] Add `-Xoffload-linker` to forward input to the device linker
We use the clang-linker-wrapper to perform device linking of embedded
offloading object files. This is done by generating those jobs inside of
the linker-wrapper itself. This patch adds an argument in Clang and the
linker-wrapper that allows users to forward input to the device linking
phase. This can either be done for every device linker, or for a
specific target triple. We use the `-Xoffload-linker <arg>` and the
`-Xoffload-linker-<triple> <arg>` syntax to accomplish this.

Reviewed By: markdewing, tra

Differential Revision: https://reviews.llvm.org/D126226
2022-05-24 09:11:02 -04:00
Joseph Huber 8a0fb965f6 [LinkerWrapper] Group static libraries in their own buffer
Summary:
Static libraries need to be handled differently from regular inpout
files, namely they are loaded lazily. Previously we used a flag to
indicate a file camm from a static library. This patch simplifies this
by simply keeping a different array that contains the static libraries
so we don't need to parse them out again.
2022-05-12 20:45:49 -04:00
Joseph Huber 1bfa88d0c5 [LinkerWrapper] Remove stripping features from the linker wrapper
Summary:
The linker wrapper previously had functionality to strip the sections
manually. We don't use this at all because this is much better done by
the linker via the `SHF_EXCLUDE` flag. This patch simply removes the
support for thi sfeature to simplify the code.
2022-05-12 20:45:49 -04:00
Joseph Huber 42a1fb5ca5 [LinkerWrapper][Fix} Fix bad alignment from extracted archive members
Summary:
We use embedded binaries to extract offloading device code from the host
fatbinary. This uses a binary format whose necessary alignment is
eight bytes. The alignment is included within the ELF section type so
the data extracted from the ELF should always be aligned at that amount.
However, if this file was extraqcted from a static archive, it was being
sent as an offset in the archive file which did not have the same
alignment guaruntees as the ELF file. This was causing errors in the
UB-sanitizer build as it would occasionally try to access a misaligned
address. To fix this, I simply copy the memory directly to a new buffer
which is guarnteed to have worst-case alignment of 16 in the case that
it's not properly aligned.
2022-05-11 16:56:41 -04:00
Joseph Huber f49d576a88 [CUDA] Add wrapper code generation for registering CUDA images
This patch adds the necessary code generation to create the wrapper code
that registers all the globals in CUDA. We create the necessary
functions and iterate through the list of
`__start_cuda_offloading_entries` to find which globals must be
registered. This is very similar to the code generation done currently
in Clang for non-rdc builds, but here we are registering a fully linked
fatbinary and finding the globals via the above sections.

With this we should be able to fully support basic RDC / LTO building of CUDA
code.

It's also worth noting that this does not include the necessary PTX to JIT the
image, so to use this support the offloading architecture must match the
system's architecture.

Depends on D123810

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123812
2022-05-11 07:30:25 -04:00
Joseph Huber e7858a9fab [Cuda] Add initial support for wrapping CUDA images in the new driver.
This patch adds the initial support for wrapping CUDA images. This
requires changing some of the logic for how we bundle images. We now
need to copy the image for all kinds that are active for the
architecture. Then we need to run a separate wrapping job if the Kind is
Cuda. For cuda wrapping we need to use the `fatbinary` program from the
CUDA SDK to bundle all the binaries together. This is then passed to a
new function to perfom the actual module code generation that will be
implemented in a later patch.

Depends on D120273 D123471

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123810
2022-05-11 07:30:23 -04:00
Joseph Huber e12905b4d5 [OpenMP] Add basic support for properly handling static libraries
Currently we handle static libraries like any other object in the
linker wrapper. However, this does not preserve the sematnics that
dictate static libraries should be lazily loaded as the symbols are
needed. This allows us to ignore linking in architectures that are not
used by the main application being compiled. This patch adds the basic
support for detecting if a file came from a static library, and only
including it in the link job if it's used by other object files.

This patch only adds the basic support, to be more correct we should
check the symbols and only inclue the library if the link job contains
symbols that are needed. Ideally we could just put this on the linker
itself, but nvlink doesn't seem to support `.a` files.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D125092
2022-05-06 11:20:58 -04:00
Joseph Huber 46a5a8029e [OpenMP] Fix save-temps name in linker wrapper
Summary:
The wrapped registration code had a typo in the save-temps version of
the name.
2022-05-03 20:51:05 -04:00
Joseph Huber 9f7ac522ae [OpenMP] Fix printing commands twice in verbose mode
Summary:
A previous patch merged the command execution and printing into a helper
function. The old printing code wasn't removed causing each to be
printed twice.
2022-04-29 23:06:22 -04:00
Joseph Huber d9c64d33b9 [OpenMP] Allow CUDA to be linked with OpenMP using the new driver
After basic support for embedding and handling CUDA files was added to
the new driver, we should be able to call CUDA functions from OpenMP
code. This patch makes the necessary changes to successfuly link in CUDA
programs that were compiled using the new driver. With this patch it
should be possible to compile device-only CUDA code (no kernels) and
call it from OpenMP as follows:

```
$ clang++ cuda.cu -fopenmp-new-driver -offload-arch=sm_70 -c
$ clang++ openmp.cpp cuda.o -fopenmp-new-driver -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_70
```

Currently this requires using a host variant to suppress the generation
of a CPU-side fallback call.

Depends on D120272

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120273
2022-04-29 11:38:40 -04:00
Joseph Huber 2fb131668f [OpenMP] Fix incorrect path taken when searching for LLD for offloading
Summary:
A previous patch updated the path searching in the linker wrapper. I
made an error and caused `lld`, which is necessary to link AMDGPU
images, to not be found on some systems. This patch fixes this by
correctly searching that linker-wrapper's binary path first again.
2022-04-26 10:51:04 -04:00
Joseph Huber 3530c35c66 [OpenMP] Use CUDA's non-RDC mode when LTO has whole program visibility
When we do LTO we consider ourselves to have whole program visibility if
every single input file we have contains LLVM bitcode. If we have whole
program visibliity then we can create a single image and utilize CUDA's
non-RDC mode by not passing `-c` to `ptxas` and ignoring the `nvlink`
job. This should be faster for some situations and also saves us the
time executing `nvlink`.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124292
2022-04-23 12:42:40 -04:00
Joseph Huber dbb10f7097 [OpenMP] Fix deleted move constructor failing on some compiles
Summary:
A previous commit added some new errors that were not correctly casted
to an r-value. This doesn't work on some compilers.
2022-04-19 18:40:15 -04:00
Joseph Huber 260c5df2d5 [OpenMP] Add better testing for the linker wrapper
The linker wrapper is used to perform linking and wrapping of embedded
device object files. Currently its internals are not able to be tested
easily. This patch adds the `--dry-run` and `--print-wrapped-module`
options to investigate the link jobs that will be run along with the
wrapped code that will be created to register the binaries.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D124039
2022-04-19 18:37:09 -04:00
Joseph Huber 33b604d1c3 [OpenMP] Fix linting diagnostics in the linker wrapper
Summary:
A previous patch had some linter warnings that should've been addressed.
2022-04-15 21:19:29 -04:00
Joseph Huber 984a0dc386 [OpenMP] Use new offloading binary when embedding offloading images
The previous patch introduced the offloading binary format so we can
store some metada along with the binary image. This patch introduces
using this inside the linker wrapper and Clang instead of the previous
method that embedded the metadata in the section name.

Differential Revision: https://reviews.llvm.org/D122683
2022-04-15 20:35:26 -04:00
Joseph Huber cac81161ed [OpenMP] Don't manually strip sections in the linker wrapper
Summary:
The changes in D122987 ensures that the offloading sections always have
the SHF_EXCLUDE flag. This means that we do not need to manually strip
these sections for ELF or COFF targets.
2022-04-15 20:35:25 -04:00
Joseph Huber a1d57fc225 [OpenMP] Do not use the default pipeline without optimizations
Summary:
A previous patch added the option to use the default pipeline when
perfomring LTO rather than the regular LTO pipeline. This greatly
improved performance regressions we were observing with the LTO
pipeline. However, this should not be used if the user explicitly
disables optimizations as the default pipeline expects some
optimizatoins to be perfomed.
2022-04-11 17:27:38 -04:00
Joseph Huber 69a77771a9 [OpenMP] Make linker wrapper thin-lto default thread count use all
Summary:
Currently there is no option to configure the number of thin-backend
threads to use when performing thin-lto on the device, but we should
default to use all the threads rather than just one. In the future we
should use the same arguments that gold / lld use and parse it here.
2022-04-01 09:44:28 -04:00
Joseph Huber 5856f30b5a [LTO] Add configuartion option to use default optimization pipeline
This patch adds a configuration option to simply use the default pass
pipeline in favor of the LTO-specific one. We observed some severe
performance penalties when uding device-side LTO for OpenMP offloading
applications caused by the LTO-pass pipeline. This is primarily because
OpenMP uses an LLVM bitcode library to implement a GPU runtime library.
In a standard compilation we link this bitcode library into each source
file and optimize it with the default pipeline. When performing LTO we
link it late with all the files, but the bitcode library never has the
regular optimization pipeline applied to it so we miss a few
optimizations just using the LTO pipeline to optimize it.

I'm not committed to this solution, but it's the easiest method to solve
this performance regression when using LTO without changing the
optimizatin pipeline for other users.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D122133
2022-03-22 09:28:45 -04:00
Joseph Huber 9f89769cd7 [Clang] Add offload kind to embedded offload object
This patch adds the offload kind to the embedded section name in
preparation for offloading to different kinda like CUDA or HIP.

Depends on D120288

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120271
2022-03-14 20:08:27 -04:00
Joseph Huber 06b336c4cd [OpenMP] Implement dense map info for device file
This patch implements a DenseMap info struct for the device file type.
This is used to help grouping device files that have the same triple and
architecture. Because of this the filename, which will always be unique
for each file, is not used.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D120288
2022-03-14 20:08:26 -04:00
Joseph Huber 3f7c3ff90e [OpenMP] Handle sysroot option in offloading linker wrapper
Summary:
This patch correctly handles the `--sysroot=` option when passed to the
linker wrapper. This allows users to correctly find libraries that may
contain offloading code if using this option.
2022-03-02 13:02:41 -05:00
Joseph Huber d5b2055769 [OpenMP] Add verbose output for linker wrapper
Summary;
This path adds printing support for the linker wrapper. When the user
passes `-v` it will not print the commands used by the linker wrapper to
indicate to the user what is happening during the linking.
2022-02-28 13:28:19 -05:00
Joseph Huber 6a0b78af91 [OpenMP] Remove static allocator in linker wrapper
Summary:
We don't need this static allocator to survive the entire file, the
strings stored have a defined lifetime.
2022-02-22 21:22:19 -05:00
Joseph Huber 55cb84d9fb [OpenMP] Unrecognized objects should not be considered failure
Summary:
This patch removes the error we recieve when attempting to extract
offloading sections. We shouldn't consider this a failure because
extracting bitcode isn't necessarily required.
2022-02-22 21:22:18 -05:00
Joseph Huber 55639c2f7c [OpenMP] Properly save strings when doing LTO
Summary:
We were not previously saving strings when saving symbol names during
LTO symbol resolution. This caused a crash inside the dense set when
some of the strings would rarely be moved internally by the object file
class.
2022-02-16 16:40:39 -05:00
Joseph Huber 24ecafb413 [OpenMP] Add support for CPU offloading in new driver
This patch adds support for linking CPU offloading applications in the
linker wrapper. We generate the necessary linking job using the host
linker's path and library arguments. This may not be true for more
complex offloading schemes, but this is sufficient for now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D119613
2022-02-15 15:05:30 -05:00
Joseph Huber 7ee8bd60f2 [OpenMP] Use executable path when searching for lld
Summary:
This patch changes the ClangLinkerWrapper to use the executable path
when searching for the lld binary. Previously we relied on the program
name. Also not finding 'llvm-strip' is not considered an error anymore
because it is an optional optimization.
2022-02-07 15:09:51 -05:00
Kelvin Li 8ea4aed50a [OpenMP] Add search path for llvm-strip
Add the build directory to the search path for llvm-strip instead
of solely relying on the PATH environment variable setting.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D118965
2022-02-04 22:15:14 -05:00
Joseph Huber 8cc4ca95b0 [OpenMP] Add Cuda path to linker wrapper tool
The linker wrapper tool uses the 'nvlink' and 'ptxas' binaries to link
and assemble device files. Previously we searched for this using the
binaries in the user's path. This didn't work in cases where the user
passed in a specific Cuda path to Clang. This patch changes the linker
wrapper to accept an argument for the Cuda path we can get from Clang.
This should fix #53573.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D118944
2022-02-03 20:39:18 -05:00
Joseph Huber 19fac745e3 [OpenMP] Remove call to 'clang-offload-wrapper' binary
Summary:
This patch removes the system call to the `clang-offload-wrapper` tool
by replicating its functionality in a new file. This improves
performance and makes the future wrapping functionality easier to
change.

Differential Revision: https://reviews.llvm.org/D118198
2022-01-31 23:11:43 -05:00
Joseph Huber eb6ddf288c [OpenMP] Replace sysmtem call to `llc` with target machine
Summary:
This patch replaces the system call to the `llc` binary with a library
call to the target machine interface. This should be faster than
relying on an external system call to compile the final wrapper binary.

Differential Revision: https://reviews.llvm.org/D118197
2022-01-31 23:11:42 -05:00
Joseph Huber 9375f1563e [OpenMP] Cleanup the Linker Wrapper
Summary:
Various changes and cleanup for the Linker Wrapper tool.
2022-01-31 23:11:42 -05:00
Joseph Huber 58dc981e08 [OpenMP] Include the executable name in the temporary files
Summary:
This parses the executable name out of the linker arguments so we can
use it to give more informative temporary file names and so we don't
accidentally use it for device linking.
2022-01-31 23:11:42 -05:00
Joseph Huber bf499c58af [OpenMP] Implement save temps functionality in linker wrapper
Summary:
This patch implements the `-save-temps` flag for the linker wrapper.
This allows the user to inspect the intermeditary outpout that the
linker wrapper creates.
2022-01-31 23:11:42 -05:00
Joseph Huber a47b1cf306 [OpenMP] Embed bitcode after optimizations instead of linking
Summary:
Various changes to the linker wrapper, and the bitcode embedding is not
done after the optimizations have run rather than after linking is done.
This saves time when doing JIT.
2022-01-31 23:11:42 -05:00
Joseph Huber 46d019041c [OpenMP] Improve symbol resolution for OpenMP Offloading LTO
This patch improves the symbol resolution done for LTO with offloading
applications. The symbol resolution done here allows the LTO backend to
internalize more functions. The symbol resoltion done is a simplified
view that does not take into account various options like `--wrap` or
`--dyanimic-list` and always assumes we are creating a shared object.
The actual target may be an executable, but semantically it is used as a
shared object because certain objects need to be visible outside of the
executable when they are read by the OpenMP plugin.

Depends on D117246

Differential Revision: https://reviews.llvm.org/D118155
2022-01-31 23:11:42 -05:00
Joseph Huber ce16ca3c74 [OpenMP] Add support for linking AMDGPU images
This patch adds support for linking AMDGPU images using the LLD binary.
AMDGPU files are always bitcode images and will always use the LTO
backend. Additionally we now pass the default architecture found with
the `amdgpu-arch` tool to the argument list.

Depends on D117156

Differential Revision: https://reviews.llvm.org/D117246
2022-01-31 23:11:42 -05:00
Joseph Huber cb7cfaec71 [OpenMP] Add extra flag handling to linker wrapper
This patch adds support for a few extra flags in the linker wrapper,
such as debugging flags, verbose output, and passing arguments to ptxas. We also
now forward pass remarks to the LLVM backend so they will show up in the LTO
passes.

Depends on D117049

Differential Revision: https://reviews.llvm.org/D117156
2022-01-31 23:11:41 -05:00
Joseph Huber f28c3153ee [OpenMP] Add support for embedding bitcode images in wrapper tool
Summary;
This patch adds support for embedding device images in the linker
wrapper tool. This will be used for performing JIT functionality in the
future.

Depends on D117048

Differential Revision: https://reviews.llvm.org/D117049
2022-01-31 23:11:41 -05:00
Joseph Huber 3762111aa9 [OpenMP] Link the bitcode library late for device LTO
Summary:
This patch adds support for linking the OpenMP device bitcode library
late when doing LTO. This simply passes it in as an additional device
file when doing the final device linking phase with LTO. This has the
advantage that we don't link it multiple times, and the device
references do not get inlined and prevent us from doing needed OpenMP
optimizations when we have visiblity of the whole module.
Fix some failings where the implicit conversion of an Error to an
Expected triggered the deleted copy constructor.

Depends on D116675

Differential revision: https://reviews.llvm.org/D117048
2022-01-31 23:11:41 -05:00
Joseph Huber c732c3df74 [OpenMP] Initial Implementation of LTO and bitcode linking in linker wrapper
This patch implements the fist support for handling LTO in the
offloading pipeline. The flag `-foffload-lto` is used to control if
bitcode is embedded into the device. If bitcode is found in the device,
the extracted files will be sent to the LTO pipeline to be linked and
sent to the backend. This implementation does not separately link the
device bitcode libraries yet.

Depends on D116675

Differential Revision: https://reviews.llvm.org/D116975
2022-01-31 23:11:41 -05:00
Joseph Huber 0e82c7553b [OpenMP] Search for static libraries in offload linker tool
This patch adds support for searching through the linker library paths
to identify static libraries that may contain device code. If device
code is present it will be extracted. This should ideally fully support
static linking with OpenMP offloading.

Depends on D116627

Differential Revision: https://reviews.llvm.org/D116675
2022-01-31 23:11:41 -05:00