forked from OSchip/llvm-project
[Clang][Docs] Update information on the new driver now that it's default
Summary: This patch updates some of the documentation on the new driver now that it's the default. Also, the ABI for embedding these images has changed.
parent ae23be84cb
commit 15e62062c0

@@ -801,7 +801,7 @@ Generate Interface Stub Files, emit merged text not binary.
 
 Extract API information
 
-.. option:: -fopenmp-new-driver
+.. option:: -fopenmp-new-driver, -fno-openmp-new-driver
 
 Use the new driver for OpenMP offloading.
 

@@ -17,11 +17,6 @@ application using Clang.
 OpenMP Offloading
 =================
 
-.. note::
-   This documentation describes Clang's behavior using the new offloading
-   driver. This currently must be enabled manually using
-   ``-fopenmp-new-driver``.
-
 Clang supports OpenMP target offloading to several different architectures such
 as NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
 Clang and then executed using the ``libomptarget`` runtime and the associated
@@ -226,15 +221,15 @@ A fat binary is a binary file that contains information intended for another
 device. We create a fat object by embedding the output of the device compilation
 stage into the host as a named section. The output from the device compilation
 is passed to the host backend using the ``-fembed-offload-object`` flag. This
-inserts the object as a global in the host's IR. The section name contains the
-target triple and architecture that the data corresponds to for later use.
-Typically we will also add an extra string to the section name to prevent it
-from being merged with other sections if the user performs relocatable linking
-on the object.
+embeds the device image into the ``.llvm.offloading`` section using a special
+binary format that behaves like a string map. This binary format is used to
+bundle metadata about the image so the linker can associate the proper device
+linking action with the image. Each device image will start with the magic bytes
+``0x10FF10AD``.
 
 .. code-block:: llvm
 
-  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading.nvptx64.sm_70."
+  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
 
 The device code will then be placed in the corresponding section once the backend
 is run on the host, creating a fat object. Using fat objects allows us to treat
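The new format marks each embedded device image with the magic bytes ``0x10FF10AD``. The Python sketch below shows how a tool might recognise such an image, or list candidate image starts in a raw dump of a ``.llvm.offloading`` section that may hold more than one image. It is a hypothetical helper, not LLVM code: ``OFFLOAD_MAGIC``, ``is_offload_image`` and ``find_image_offsets`` are names invented for illustration, and it assumes ``0x10FF10AD`` means the bytes ``10 FF 10 AD`` appear in that order at the start of each image. It does not parse any other part of the binary format.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; these are not LLVM APIs.
# Assumption: "0x10FF10AD" means the bytes 10 FF 10 AD appear in this
# order at the very start of each embedded device image.
OFFLOAD_MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])


def is_offload_image(blob: bytes) -> bool:
    """Return True if `blob` begins with the offloading magic bytes."""
    return blob.startswith(OFFLOAD_MAGIC)


def find_image_offsets(section: bytes) -> list[int]:
    """Return every offset in `section` where the magic bytes occur.

    This is a naive search: payload bytes could match the magic by chance,
    so it only finds candidate image starts and parses no headers.
    """
    offsets = []
    pos = section.find(OFFLOAD_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = section.find(OFFLOAD_MAGIC, pos + 1)
    return offsets


if __name__ == "__main__":
    # Two fake images back to back; the payloads are placeholders,
    # not the real binary format.
    fake_section = OFFLOAD_MAGIC + b"image-one" + OFFLOAD_MAGIC + b"image-two"
    print(is_offload_image(fake_section))    # True
    print(find_image_offsets(fake_section))  # [0, 13]
```

Searching for the magic is only a heuristic; the component that actually consumes this section is ``clang-linker-wrapper``, which reads the full format rather than scanning for four bytes.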
@@ -250,7 +245,7 @@ will use this information when :ref:`Device Linking`.
 +==================================+====================================================================+
 | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry`)  |
 +----------------------------------+--------------------------------------------------------------------+
-| .llvm.offloading.<triple>.<arch> | Embedded device object file for the target device and architecture |
+| .llvm.offloading                 | Embedded device object file for the target device and architecture |
 +----------------------------------+--------------------------------------------------------------------+
 
 .. _Device Linking:
@@ -262,9 +257,10 @@ Objects containing :ref:`table-offloading_sections` require special handling to
 create an executable device image. This is done using a Clang tool, see
 :doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
 over the host linking job. It scans the input object files for the offloading
-sections and runs the appropriate device linking action. The linked device image
-is then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load the
-device image and link it with the host.
+section ``.llvm.offloading``. The device files stored in this section are then
+extracted and passed to the appropriate linking job. The linked device image is
+then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
+the device image and link it with the host.
 
 The linker wrapper tool supports linking bitcode files through link time
 optimization (LTO). This is used whenever the object files embedded in the host
@@ -438,19 +434,22 @@ This code is compiled using the following Clang flags.
 
   $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
 
-The output section in the object file can be seen using the ``readelf`` utility
+The output section in the object file can be seen using the ``readelf`` utility.
+The ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
+removed from the final executable or shared library by the linker.
 
 .. code-block:: text
 
   $ llvm-readelf -WS zaxpy.o
-  [Nr] Name Type
-  ...
-  [34] omp_offloading_entries PROGBITS
-  [35] .llvm.offloading.nvptx64-nvidia-cuda.sm_70 PROGBITS
+  Section Headers:
+  [Nr] Name Type Address Off Size ES Flg Lk Inf Al
+  [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00 A 0 0 1
+  [12] .llvm.offloading PROGBITS 0000000000000000 000260 030950 00 E 0 0 8
 
+
 Compiling this file again will invoke the ``clang-linker-wrapper`` utility to
 extract and link the device code stored at the section named
-``.llvm.offloading.nvptx64-nvidia-cuda.sm_70`` and then use entries stored in
+``.llvm.offloading`` and then use entries stored in
 the section named ``omp_offloading_entries`` to create the symbols necessary for
 ``libomptarget`` to register the device image and call the entry function.
 

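In the new ``llvm-readelf -WS`` output, the ``E`` in the ``Flg`` column is how the tool reports ``SHF_EXCLUDE`` on ``.llvm.offloading``. The Python sketch below, a hypothetical helper rather than part of LLVM, maps each section name in that text to its flag letters so the ``E`` can be checked programmatically. It assumes section names contain no spaces and that ``Lk``, ``Inf`` and ``Al`` are the last three columns, as in the sample copied from the output above; ``SHF_EXCLUDE = 0x80000000`` is the raw ``sh_flags`` bit that the letter stands for.

```python
from __future__ import annotations

import re

# ELF sh_flags bit that llvm-readelf reports as "E" (exclude).
SHF_EXCLUDE = 0x80000000

# Matches a section row such as "[12] .llvm.offloading PROGBITS ..."
# but not the "[Nr] Name Type ..." header row.
ROW_RE = re.compile(r"^\s*\[\s*\d+\]\s+(.*)$")

SAMPLE = """\
Section Headers:
  [Nr] Name Type Address Off Size ES Flg Lk Inf Al
  [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00 A 0 0 1
  [12] .llvm.offloading PROGBITS 0000000000000000 000260 030950 00 E 0 0 8
"""


def section_flags(readelf_output: str) -> dict[str, str]:
    """Map each section name to its Flg letters (empty if it has none).

    Assumes `llvm-readelf -WS` layout: name, type, address, offset, size
    and entry size come first, and Lk, Inf, Al are the last three columns.
    """
    flags: dict[str, str] = {}
    for line in readelf_output.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue
        fields = match.group(1).split()
        if len(fields) < 9:
            continue  # not a full section row
        # Whatever sits between the ES column and Lk is the flag string.
        flags[fields[0]] = "".join(fields[6:-3])
    return flags


def has_shf_exclude(sh_flags: int) -> bool:
    """Check the raw sh_flags value instead of readelf's letter code."""
    return bool(sh_flags & SHF_EXCLUDE)


if __name__ == "__main__":
    flags = section_flags(SAMPLE)
    print(flags)  # {'omp_offloading_entries': 'A', '.llvm.offloading': 'E'}
    print("E" in flags[".llvm.offloading"])  # True
```

Parsing readelf text is fragile across tool versions, which is why the sketch also shows the equivalent check on the raw ``sh_flags`` value.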
@@ -95,9 +95,6 @@ Features not supported or with limited support for Cuda devices
 
 - Nested parallelism: inner parallel regions are executed sequentially.
 
-- Static linking of libraries containing device code is not supported without
-  explicitly using ``-fopenmp-new-driver``.
-
 - Automatic translation of math functions in target regions to device-specific
   math functions is not implemented yet.
 