From 194788b2fd0f82a2069899b8eb81648439d8dbab Mon Sep 17 00:00:00 2001
From: Joseph Huber <jhuber6@vols.utk.edu>
Date: Mon, 28 Nov 2022 14:54:30 -0600
Subject: [PATCH] [libc][docs] Add documentation for the new GPU mode

This patch introduces documentation for the new GPU mode added in
D138608. The documentation includes instructions for building and using
the library, along with a description of the supported functions and
headers.

Reviewed By: sivachandra, lntue, michaelrj

Differential Revision: https://reviews.llvm.org/D138856
---
 libc/docs/gpu_mode.rst | 168 +++++++++++++++++++++++++++++++++++++++++
 libc/docs/index.rst    |   1 +
 2 files changed, 169 insertions(+)
 create mode 100644 libc/docs/gpu_mode.rst
diff --git a/libc/docs/gpu_mode.rst b/libc/docs/gpu_mode.rst
new file mode 100644
index 000000000000..1f34f7555d31
--- /dev/null
+++ b/libc/docs/gpu_mode.rst
@@ -0,0 +1,168 @@
+.. _GPU_mode:
+
+==============
+GPU Mode
+==============
+
+.. include:: check.rst
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+.. note:: This feature is very experimental and may change in the future.
+
+The *GPU* mode of LLVM's libc is an experimental mode used to support calling
+libc routines during GPU execution. The goal of this project is to provide
+access to the standard C library on systems running accelerators. To begin using
+this library, build and install the ``libcgpu.a`` static archive following the
+instructions in :ref:`building_gpu_mode` and link it with your offloading
+application.
+
+.. _building_gpu_mode:
+
+Building the GPU library
+========================
+
+LLVM's libc GPU support *must* be built using the same compiler as the final
+application to ensure relative LLVM bitcode compatibility. This can be done
+automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. Furthermore,
+building for the GPU is only supported in :ref:`fullbuild_mode`. To enable the
+GPU build, set the target OS to ``gpu`` via ``LLVM_LIBC_TARGET_OS=gpu``. By
+default, ``libcgpu.a`` will be built for every supported GPU architecture. To
+restrict the set of architectures built, set ``LLVM_LIBC_GPU_ARCHITECTURES``
+to the list of desired architectures or use ``all``. A typical ``cmake``
+configuration will look like this:
+
+.. code-block:: sh
+
+  $> cd llvm-project  # The llvm-project checkout
+  $> mkdir build
+  $> cd build
+  $> cmake ../llvm -G Ninja                                \
+     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
+     -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
+     -DCMAKE_BUILD_TYPE=<Debug|Release>  \ # Select build type
+     -DLLVM_LIBC_FULL_BUILD=ON           \ # We need the full libc
+     -DLLVM_LIBC_TARGET_OS=gpu           \ # Build in GPU mode
+     -DLLVM_LIBC_GPU_ARCHITECTURES=all   \ # Build all supported architectures
+     -DCMAKE_INSTALL_PREFIX=<PATH>         # Where 'libcgpu.a' will live
+  $> ninja install
+
+Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
+toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
+using a compatible compiler and to support ``openmp`` offloading, we list both
+in ``LLVM_ENABLE_RUNTIMES`` so they are built after the enabled projects using
+the newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the directory in
+which to install the ``libcgpu.a`` library along with LLVM.
+
+Usage
+=====
+
+Once the ``libcgpu.a`` static archive has been built in
+:ref:`building_gpu_mode`, it can be linked directly with offloading applications
+as a standard library. This process is described in the `clang documentation
+<https://clang.llvm.org/docs/OffloadingDesign.html>`_. This linking mode is used
+by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
+using the ``--offload-new-driver`` and ``-fgpu-rdc`` flags. A typical usage
+will look like this:
+
+.. code-block:: sh
+
+  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
+
+The ``libcgpu.a`` static archive is a fat binary containing LLVM-IR for each
+supported target device. The supported architectures can be seen using LLVM's
+objdump with the ``--offloading`` flag:
+
+.. code-block:: sh
+
+  $> llvm-objdump --offloading libcgpu.a
+  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64
+
+  OFFLOADING IMAGE [0]:
+  kind            llvm ir
+  arch            gfx90a
+  triple          amdgcn-amd-amdhsa
+  producer        <none>
+
+Because the device code is stored inside a fat binary, it can be difficult to
+inspect the resulting code. This can be done using the following utilities:
+
+.. code-block:: sh
+
+   $> llvm-ar x libcgpu.a strcmp.cpp.o
+   $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
+   $> opt -S gfx90a.bc  # Prints the extracted LLVM-IR for gfx90a
+
+Supported Functions
+===================
+
+The following functions and headers are supported at least partially on the
+device. Currently, only basic device functions that do not require an operating
+system are supported on the device. Supporting functions like ``malloc`` using
+an RPC mechanism is a work in progress.
+
+ctype.h
+-------
+
+=============  =========
+Function Name  Available
+=============  =========
+isalnum        |check|
+isalpha        |check|
+isascii        |check|
+isblank        |check|
+iscntrl        |check|
+isdigit        |check|
+isgraph        |check|
+islower        |check|
+isprint        |check|
+ispunct        |check|
+isspace        |check|
+isupper        |check|
+isxdigit       |check|
+toascii        |check|
+tolower        |check|
+toupper        |check|
+=============  =========
+
+string.h
+--------
+
+=============   =========
+Function Name   Available
+=============   =========
+bcmp            |check|
+bzero           |check|
+memccpy         |check|
+memchr          |check|
+memcmp          |check|
+memcpy          |check|
+memmove         |check|
+mempcpy         |check|
+memrchr         |check|
+memset          |check|
+stpcpy          |check|
+stpncpy         |check|
+strcat          |check|
+strchr          |check|
+strcmp          |check|
+strcpy          |check|
+strcspn         |check|
+strlcat         |check|
+strlcpy         |check|
+strlen          |check|
+strncat         |check|
+strncmp         |check|
+strncpy         |check|
+strnlen         |check|
+strpbrk         |check|
+strrchr         |check|
+strspn          |check|
+strstr          |check|
+strtok          |check|
+strtok_r        |check|
+strdup
+strndup
+=============   =========
diff --git a/libc/docs/index.rst b/libc/docs/index.rst
index c298f00c1e99..10e30a363ec9 100644
--- a/libc/docs/index.rst
+++ b/libc/docs/index.rst
@@ -52,6 +52,7 @@ stages there is no ABI stability in any form.
    usage_modes
    overlay_mode
    fullbuild_mode
+   gpu_mode
 
 .. toctree::
    :hidden: