[Clang][Docs] Update information on the new driver now that it's default
Summary: This patch updates some of the documentation on the new driver now that it's the default. Also the ABI for embedding these images changed.
parent ae23be84cb
commit 15e62062c0
@@ -801,7 +801,7 @@ Generate Interface Stub Files, emit merged text not binary.
 
 Extract API information
 
-.. option:: -fopenmp-new-driver
+.. option:: -fopenmp-new-driver, -fno-openmp-new-driver
 
 Use the new driver for OpenMP offloading.
 
@@ -17,11 +17,6 @@ application using Clang.
 OpenMP Offloading
 =================
 
-.. note::
-    This documentation describes Clang's behavior using the new offloading
-    driver. This currently must be enabled manually using
-    ``-fopenmp-new-driver``.
-
 Clang supports OpenMP target offloading to several different architectures such
 as NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
 Clang and then executed using the ``libomptarget`` runtime and the associated
@@ -226,15 +221,15 @@ A fat binary is a binary file that contains information intended for another
 device. We create a fat object by embedding the output of the device compilation
 stage into the host as a named section. The output from the device compilation
 is passed to the host backend using the ``-fembed-offload-object`` flag. This
-inserts the object as a global in the host's IR. The section name contains the
-target triple and architecture that the data corresponds to for later use.
-Typically we will also add an extra string to the section name to prevent it
-from being merged with other sections if the user performs relocatable linking
-on the object.
+embeds the device image into the ``.llvm.offloading`` section using a special
+binary format that behaves like a string map. This binary format is used to
+bundle metadata about the image so the linker can associate the proper device
+linking action with the image. Each device image will start with the magic bytes
+``0x10FF10AD``.
 
 .. code-block:: llvm
 
-  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading.nvptx64.sm_70."
+  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
 
 The device code will then be placed in the corresponding section once the backend
 is run on the host, creating a fat object. Using fat objects allows us to treat
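Note on the hunk above: the new text says each embedded device image starts with the magic bytes ``0x10FF10AD``. A minimal C++ sketch of how a tool might test a buffer for that prefix follows. It assumes the four bytes appear in the order written (0x10, 0xFF, 0x10, 0xAD). The names ``kOffloadMagic`` and ``startsWithOffloadMagic`` are hypothetical and are not part of Clang or LLVM.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magic bytes quoted in the documentation text, written as a byte sequence.
// Assumption: the bytes appear in this order at the start of each image.
constexpr std::array<std::uint8_t, 4> kOffloadMagic = {0x10, 0xFF, 0x10, 0xAD};

// Hypothetical helper: returns true if `data` begins with the magic bytes.
// Buffers shorter than the magic sequence never match.
bool startsWithOffloadMagic(const std::vector<std::uint8_t> &data) {
  if (data.size() < kOffloadMagic.size())
    return false;
  for (std::size_t i = 0; i < kOffloadMagic.size(); ++i)
    if (data[i] != kOffloadMagic[i])
      return false;
  return true;
}
```

A caller could run this check on the contents of the ``.llvm.offloading`` section before attempting to parse the rest of the image. The layout of the data after the magic bytes is not described in this patch, so the sketch stops at the prefix test.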
@@ -250,7 +245,7 @@ will use this information when :ref:`Device Linking`.
 +==================================+====================================================================+
 | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry`)  |
 +----------------------------------+--------------------------------------------------------------------+
-| .llvm.offloading.<triple>.<arch> | Embedded device object file for the target device and architecture |
+| .llvm.offloading                 | Embedded device object file for the target device and architecture |
 +----------------------------------+--------------------------------------------------------------------+
 
 .. _Device Linking:
@@ -262,9 +257,10 @@ Objects containing :ref:`table-offloading_sections` require special handling to
 create an executable device image. This is done using a Clang tool, see
 :doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
 over the host linking job. It scans the input object files for the offloading
-sections and runs the appropriate device linking action. The linked device image
-is then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load the
-device image and link it with the host.
+section ``.llvm.offloading``. The device files stored in this section are then
+extracted and passed to the appropriate linking job. The linked device image is
+then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
+the device image and link it with the host.
 
 The linker wrapper tool supports linking bitcode files through link time
 optimization (LTO). This is used whenever the object files embedded in the host
@@ -438,19 +434,22 @@ This code is compiled using the following Clang flags.
 
   $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
 
-The output section in the object file can be seen using the ``readelf`` utility
+The output section in the object file can be seen using the ``readelf`` utility.
+The ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
+removed from the final executable or shared library by the linker.
 
 .. code-block:: text
 
   $ llvm-readelf -WS zaxpy.o
-  [Nr] Name                                       Type
-  ...
-  [34] omp_offloading_entries                     PROGBITS
-  [35] .llvm.offloading.nvptx64-nvidia-cuda.sm_70 PROGBITS
+  Section Headers:
+  [Nr] Name                   Type     Address          Off    Size   ES Flg Lk Inf Al
+  [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00   A  0   0  1
+  [12] .llvm.offloading       PROGBITS 0000000000000000 000260 030950 00   E  0   0  8
+
 
 Compiling this file again will invoke the ``clang-linker-wrapper`` utility to
 extract and link the device code stored at the section named
-``.llvm.offloading.nvptx64-nvidia-cuda.sm_70`` and then use entries stored in
+``.llvm.offloading`` and then use entries stored in
 the section named ``omp_offloading_entries`` to create the symbols necessary for
 ``libomptarget`` to register the device image and call the entry function.
 
@@ -95,9 +95,6 @@ Features not supported or with limited support for Cuda devices
 
 - Nested parallelism: inner parallel regions are executed sequentially.
 
-- Static linking of libraries containing device code is not supported without
-  explicitly using ``-fopenmp-new-driver``.
-
 - Automatic translation of math functions in target regions to device-specific
   math functions is not implemented yet.
 