[Clang][Docs] Update information on the new driver now that it's default

Summary:
This patch updates some of the documentation on the new driver now that
it's the default. It also reflects the changed ABI for embedding device images.
Joseph Huber 2022-04-18 15:01:55 -04:00
parent ae23be84cb
commit 15e62062c0
3 changed files with 21 additions and 25 deletions

@@ -801,7 +801,7 @@ Generate Interface Stub Files, emit merged text not binary.
 Extract API information
-.. option:: -fopenmp-new-driver
+.. option:: -fopenmp-new-driver, -fno-openmp-new-driver
 Use the new driver for OpenMP offloading.

@@ -17,11 +17,6 @@ application using Clang.
 OpenMP Offloading
 =================
-.. note::
-This documentation describes Clang's behavior using the new offloading
-driver. This currently must be enabled manually using
-``-fopenmp-new-driver``.
 Clang supports OpenMP target offloading to several different architectures such
 as NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
 Clang and then executed using the ``libomptarget`` runtime and the associated
@@ -226,15 +221,15 @@ A fat binary is a binary file that contains information intended for another
 device. We create a fat object by embedding the output of the device compilation
 stage into the host as a named section. The output from the device compilation
 is passed to the host backend using the ``-fembed-offload-object`` flag. This
-inserts the object as a global in the host's IR. The section name contains the
-target triple and architecture that the data corresponds to for later use.
-Typically we will also add an extra string to the section name to prevent it
-from being merged with other sections if the user performs relocatable linking
-on the object.
+embeds the device image into the ``.llvm.offloading`` section using a special
+binary format that behaves like a string map. This binary format is used to
+bundle metadata about the image so the linker can associate the proper device
+linking action with the image. Each device image will start with the magic bytes
+``0x10FF10AD``.
 .. code-block:: llvm
-@llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading.nvptx64.sm_70."
+@llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
 The device code will then be placed in the corresponding section once the backend
 is run on the host, creating a fat object. Using fat objects allows us to treat
@@ -250,7 +245,7 @@ will use this information when :ref:`Device Linking`.
 +==================================+====================================================================+
 | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry`)  |
 +----------------------------------+--------------------------------------------------------------------+
-| .llvm.offloading.<triple>.<arch> | Embedded device object file for the target device and architecture |
+| .llvm.offloading                 | Embedded device object file for the target device and architecture |
 +----------------------------------+--------------------------------------------------------------------+
 .. _Device Linking:
@@ -262,9 +257,10 @@ Objects containing :ref:`table-offloading_sections` require special handling to
 create an executable device image. This is done using a Clang tool, see
 :doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
 over the host linking job. It scans the input object files for the offloading
-sections and runs the appropriate device linking action. The linked device image
-is then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load the
-device image and link it with the host.
+section ``.llvm.offloading``. The device files stored in this section are then
+extracted and passed to the appropriate linking job. The linked device image is
+then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
+the device image and link it with the host.
 The linker wrapper tool supports linking bitcode files through link time
 optimization (LTO). This is used whenever the object files embedded in the host
@@ -438,19 +434,22 @@ This code is compiled using the following Clang flags.
 $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
-The output section in the object file can be seen using the ``readelf`` utility
+The output section in the object file can be seen using the ``readelf`` utility.
+The ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
+removed from the final executable or shared library by the linker.
 .. code-block:: text
 $ llvm-readelf -WS zaxpy.o
-[Nr] Name Type
-...
-[34] omp_offloading_entries PROGBITS
-[35] .llvm.offloading.nvptx64-nvidia-cuda.sm_70 PROGBITS
+Section Headers:
+[Nr] Name Type Address Off Size ES Flg Lk Inf Al
+[11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00 A 0 0 1
+[12] .llvm.offloading PROGBITS 0000000000000000 000260 030950 00 E 0 0 8
 Compiling this file again will invoke the ``clang-linker-wrapper`` utility to
 extract and link the device code stored at the section named
-``.llvm.offloading.nvptx64-nvidia-cuda.sm_70`` and then use entries stored in
+``.llvm.offloading`` and then use entries stored in
 the section named ``omp_offloading_entries`` to create the symbols necessary for
 ``libomptarget`` to register the device image and call the entry function.

@@ -95,9 +95,6 @@ Features not supported or with limited support for Cuda devices
 - Nested parallelism: inner parallel regions are executed sequentially.
-- Static linking of libraries containing device code is not supported without
-explicitly using ``-fopenmp-new-driver``.
 - Automatic translation of math functions in target regions to device-specific
 math functions is not implemented yet.