From a957634942a48c963a8ed99b1bb90f7b985a3602 Mon Sep 17 00:00:00 2001 From: Joseph Huber Date: Tue, 12 Jan 2021 19:19:39 -0500 Subject: [PATCH] [OpenMP] Add documentation for error messages and release notes Add extra information to the runtime page describing the error messages and add information to the release notes for clang 12.0 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D94562 --- openmp/docs/ReleaseNotes.rst | 27 +++++++++- openmp/docs/design/Runtimes.rst | 91 +++++++++++++++++++++++++++++++++ 2 files changed, 116 insertions(+), 2 deletions(-) diff --git a/openmp/docs/ReleaseNotes.rst b/openmp/docs/ReleaseNotes.rst index de0f2018032c..7f40d3c81510 100644 --- a/openmp/docs/ReleaseNotes.rst +++ b/openmp/docs/ReleaseNotes.rst @@ -13,11 +13,34 @@ Introduction ============ This document contains the release notes for the OpenMP runtime, release 12.0.0. -Here we describe the status of openmp, including major improvements -from the previous release. All openmp releases may be downloaded +Here we describe the status of OpenMP, including major improvements +from the previous release. All OpenMP releases may be downloaded from the `LLVM releases web site `_. Non-comprehensive list of changes in this release ================================================= +- Extended the ``libomptarget`` API functions to include source location + information and OpenMP target mapper support. This allows ``libomptarget`` to + know the source location of the OpenMP region it is executing, as well as the + name and declarations of all the variables used inside the region. Each + function generated now uses its ``mapper`` variant. The old API calls now call + into the new API functions with ``nullptr`` arguments for backwards + compatibility with old binaries. Source location information for + ``libomptarget`` is now generated by Clang at any level of debugging + information. +- Added improved error messages for ``libomptarget`` and ``CUDA`` plugins. Error + messages are now presented without requiring a debug build of + ``libomptarget``. The newly added source location information can also be used + to identify which OpenMP target region the failure occurred in. More + information can be found :ref:`here `. + +- Added additional environment variables to control output from the + ``libomptarget`` runtime library. ``LIBOMPTARGET_PROFILE`` to + generate time profile output similar to Clang's ``-ftime-trace`` option. + ``LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`` sets the threshold size for which + the ``libomptarget`` memory manager will handle the allocation. + ``LIBOMPTARGET_INFO`` allows the user to request certain information from the + ``libomptarget`` runtime using a 32-bit field. A full description of each + environment variable is described :ref:`here `. diff --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst index 1d52b6b8378c..85031c66f442 100644 --- a/openmp/docs/design/Runtimes.rst +++ b/openmp/docs/design/Runtimes.rst @@ -16,6 +16,8 @@ the LLVM/OpenMP host runtime, aka. `libomp.so`, is available as a `pdf LLVM/OpenMP Target Host Runtime (``libomptarget``) -------------------------------------------------- +.. _libopenmptarget_environment_vars: + Environment Variables ^^^^^^^^^^^^^^^^^^^^^ @@ -171,6 +173,95 @@ shows that ``D`` will be copied back from the device once the OpenMP device kernel region ends even though it isn't written to. Finally, at the end of the OpenMP data region the entries for ``X`` and ``Y`` are removed from the table. +.. _libopenmptarget_errors: + +Errors: +^^^^^^^ + +``libomptarget`` provides error messages when the program fails inside the +OpenMP target region. Common causes of failure could be an invalid pointer +access, running out of device memory, or trying to offload when the device is +busy. If the application was built with debugging symbols the error messages +will additionally provide the source location of the OpenMP target region. + +For example, consider the following code that implements a simple parallel +reduction on the GPU. This code has a bug that causes it to fail in the +offloading region. + +.. code-block:: c++ + + #include + + double sum(double *A, std::size_t N) { + double sum = 0.0; + #pragma omp target teams distribute parallel for reduction(+:sum) + for (int i = 0; i < N; ++i) + sum += A[i]; + + return sum; + } + + int main() { + const int N = 1024; + double A[N]; + sum(A, N); + } + +If this code is compiled and run, there will be an error message indicating what is +going wrong. + +.. code-block:: console + + $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum + $ ./sum + +.. code-block:: text + + CUDA error: Error when copying data from device to host. + CUDA error: an illegal memory access was encountered + Libomptarget error: Copying data from device failed. + Libomptarget error: Call to targetDataEnd failed, abort target. + Libomptarget error: Failed to process data after launching the kernel. + Libomptarget error: Run with LIBOMPTARGET_INFO=4 to dump host-target pointer mappings. + sum.cpp:5:1: Libomptarget error 1: failure of target construct while offloading is mandatory + +This shows that there is an illegal memory access occuring inside the OpenMP +target region once execution has moved to the CUDA device, suggesting a +segmentation fault. This then causes a chain reaction of failures in +``libomptarget``. Another message suggests using the ``LIBOMPTARGET_INFO`` +environment variable as described in :ref:`libopenmptarget_environment_vars`. If +we do this it will print the sate of the host-target pointer mappings at the +time of failure. + +.. code-block:: console + + $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only sum.cpp -o sum + $ env LIBOMPTARGET_INFO=4 ./sum + +.. code-block:: text + + info: OpenMP Host-Device pointer mappings after block at sum.cpp:5:1: + info: Host Ptr Target Ptr Size (B) RefCount Declaration + info: 0x00007ffc058280f8 0x00007f4186600000 8 1 sum at sum.cpp:4:10 + +This tells us that the only data mapped between the host and the device is the +``sum`` variable that will be copied back from the device once the reduction has +ended. There is no entry mapping the host array ``A`` to the device. In this +situation, the compiler cannot determine the size of the array at compile time +so it will simply assume that the pointer is mapped on the device already by +default. The solution is to add an explicit map clause in the target region. + +.. code-block:: c++ + + double sum(double *A, std::size_t N) { + double sum = 0.0; + #pragma omp target teams distribute parallel for reduction(+:sum) map(to:A[0 : N]) + for (int i = 0; i < N; ++i) + sum += A[i]; + + return sum; + } + .. toctree:: :hidden: :maxdepth: 1