Commit Graph

39 Commits

Author SHA1 Message Date
Alex Lorenz 6b938d2ead Recommit "[clang][driver] Use the provided arch name for a Darwin target triple"
This ensures that the Darwin driver uses a consistent target triple
representation when the triple is printed out to the user.

This reverts the revert commit ab0df6c034.

Differential Revision: https://reviews.llvm.org/D100807
2021-04-29 15:00:40 -07:00
Shilei Tian c41ae246ac [OpenMP][Clang][NVPTX] Only build one bitcode library for each SM
In D97003, CUDA 9.2 became the minimum requirement for OpenMP offloading on the
NVPTX target. We no longer need macros in the source code to select the right functions
based on CUDA version, we no longer need to compile multiple bitcode libraries for
different CUDA versions for each SM, and we no longer need to worry about future
compatibility with newer CUDA versions.

`-target-feature +ptx61` is used in this patch, which corresponds to the highest
PTX version that CUDA 9.2 can support.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97198
2021-03-08 12:03:04 -05:00
Pushpinder Singh 99951aa68d OpenMP: Fix object clobbering issue when using save-temps
There are two preconditions to reproduce the issue,
 1. Use -save-temps option
 2. Provide the -o option with name equal to the input file name
    without the file extension. For example: clang a.c -o a

With -o specified, the AssembleJobAction after OffloadWrapperJobAction
produces an object file with the same name as the host code object file.
Because of this clash, the OffloadWrapperAction overwrites the initial host
object file, which results in an lld error. This also fixes the `multiple definition of __dummy.omp_offloading.entry'` issue in D96769.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D97273
2021-02-25 00:50:51 -05:00
Shilei Tian 76151acf89 [Clang][OpenMP] Require CUDA 9.2+ for OpenMP offloading on NVPTX target
In the current implementation of `deviceRTLs`, we use some functions
that are CUDA version dependent (if CUDA_VERSION < 9, it is one; otherwise, it
is another one). As a result, we have to compile one bitcode library for each
supported CUDA version. A worse problem is forward compatibility: if a new CUDA
version is released, we have to update the CMake file as well.

CUDA 9.2 has been available for three years. Instead of using various weird tricks
to make `deviceRTLs` work with different CUDA versions and still have forward
compatibility, we can simply drop support for CUDA 9.1 and lower. This has at
least two benefits:
- We don't need to generate bitcode libraries for each CUDA version;
- Clang driver doesn't need to search for the bitcode lib based on CUDA version.

We can claim that starting from LLVM 12, OpenMP offloading on NVPTX target requires
CUDA 9.2+.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D97003
2021-02-22 11:00:33 -05:00
Shilei Tian 33d660939d [Clang][OpenMP] Update driver test case for OpenMP offload to use sm_35
`sm_35` is the minimum requirement for OpenMP offloading on NVPTX devices.
The current driver test case uses `sm_20`. D97003 is going to switch the minimum
CUDA version to 9.2, which only supports `sm_30+`. This patch prepares for that
change.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D97120
2021-02-20 15:14:13 -05:00
Shilei Tian 7c03f7d7d0 [OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language
From this patch (plus some landed patches), `deviceRTLs` is treated as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it to work. For now AMDGCN still uses the original way to compile.) However, some target specific functions are still required, but they're no longer written in a target specific language. For example, the CUDA parts have all been refined by replacing CUDA intrinsics and builtins with LLVM/Clang/NVVM intrinsics.
Here is a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.
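The naming scheme in item 4 can be sketched as follows. This is only an illustration of the pattern quoted above; the helper name `nvptx_bitcode_library_name` is hypothetical and is not part of the Clang driver.

```python
def nvptx_bitcode_library_name(cuda_version: str, sm_number: str) -> str:
    """Compose a bitcode library file name from the pattern in item 4.

    Hypothetical helper: it only mirrors the pattern
    libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc
    and does not reproduce any driver logic.
    """
    return f"libomptarget-nvptx-cuda_{cuda_version}-sm_{sm_number}.bc"


# The example given in the commit message: CUDA 8.0 encoded as "80", SM 20.
example = nvptx_bitcode_library_name("80", "20")
```

With one library per (CUDA version, SM) pair, the number of files grows as the product of the two lists, which is the cost that the later commits in this log remove.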

With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By then, we will also build bitcode libraries for all supported SMs, multiplied by all supported CUDA versions.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94745
2021-01-26 12:28:47 -05:00
Shilei Tian 5ad038aafa [Clang][OpenMP][NVPTX] Replace `libomptarget-nvptx-path` with `libomptarget-nvptx-bc-path`
D94700 removed the static library so we no longer need to pass
`-llibomptarget-nvptx` to `nvlink`. Since the bitcode library is the only device
runtime for now, instead of emitting a warning when it is not found, an error
should be raised. We also set a new option `libomptarget-nvptx-bc-path` to let
user choose which bitcode library is being used.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95161
2021-01-23 14:42:38 -05:00
Nico Weber 956034c6c8 [mac/arm] XFAIL two more tests on arm64-apple
Part of PR46644
2020-12-12 15:20:50 -05:00
Amy Huang 394db22595 Revert "Switch to using -debug-info-kind=constructor as default (from =limited)"
This reverts commit 227db86a1b.

Causing debug info errors in google3 LTO builds; also causes a
debuginfo-test failure.
2020-07-28 11:23:59 -07:00
Amy Huang 227db86a1b Switch to using -debug-info-kind=constructor as default (from =limited)
Summary:
-debug-info-kind=constructor reduces the amount of class debug info that
is emitted; this patch switches to using this as the default.

Constructor homing emits the complete type info for a class only when the
constructor is emitted, so it is expected that there will be some classes that
are not defined in the debug info anymore because they are never constructed,
and we shouldn't need debug info for these classes.

I compared the PDB files for clang, and there are 273 class types that are defined with `=limited`
but not with `=constructor` (out of ~60,000 total class types).
We've looked at a number of the types that are no longer defined with =constructor. The vast
majority of cases are something like class A is used as a parameter in a member function of
some other class B, which is emitted. But the function that uses class A is never called, and class A
is never constructed, and therefore isn't emitted in the debug info.

Bug: https://bugs.llvm.org/show_bug.cgi?id=46537

Subscribers: aprantl, cfe-commits, lldb-commits

Tags: #clang, #lldb

Differential Revision: https://reviews.llvm.org/D79147
2020-07-09 15:26:46 -07:00
Saiyedul Islam 602d9b0afc [OpenMP][AMDGCN] Support OpenMP offloading for AMDGCN architecture - Part 1
Summary:
Allow AMDGCN as a GPU offloading target for OpenMP during compiler
invocation and allow setting CUDAMode for it.

Originally authored by Greg Rodgers (@gregrodgers).

Reviewers: ronlieb, yaxunl, b-sumner, scchan, JonChesterfield, jdoerfert, sameerds, msearles, hliao, arsenm

Reviewed By: sameerds

Subscribers: sstefan1, jvesely, wdng, arsenm, guansong, dexonsmith, cfe-commits, llvm-commits, gregrodgers

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D79754
2020-05-27 07:51:27 +00:00
Sergey Dmitriev a0d83768f1 [Clang][OpenMP Offload] Add new tool for wrapping offload device binaries
This patch removes the remaining part of the OpenMP offload linker scripts, which was used for inserting device binaries into the output linked binary. Device binaries are now inserted into the host binary with the help of a wrapper bitcode file that contains the device binaries as data. The wrapper bitcode file is dynamically created by the clang driver with the help of a new tool, clang-offload-wrapper, which takes device binaries as input and produces a bitcode file with the required contents. The wrapper bitcode is then compiled to an object, and the resulting object is appended to the host link by the clang driver.

This is the second part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).

Differential Revision: https://reviews.llvm.org/D68166

llvm-svn: 374219
2019-10-09 20:42:58 +00:00
Gheorghe-Teodor Bercea e62c693c8e [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: JDevlieghere, mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360265
2019-05-08 15:52:33 +00:00
Jonas Devlieghere fe608c938c Revert "[OpenMP][Clang] Support for target math functions"
This commit appears to be breaking stage-2 builds on GreenDragon. The
OpenMP wrappers for cmath and math.h are copied into the root of the
resource directory and cause a cyclic dependency in module 'Darwin':
Darwin -> std -> Darwin. This blows up when CMake is testing for modules
support and breaks all stage 2 module builds, including the ThinLTO bot
and all LLDB bots.

CMake Error at cmake/modules/HandleLLVMOptions.cmake:497 (message):
  LLVM_ENABLE_MODULES is not supported by this compiler

llvm-svn: 360192
2019-05-07 21:08:15 +00:00
Gheorghe-Teodor Bercea 1e28a668bc [OpenMP][Clang] Support for target math functions
Summary:
In this patch we propose a temporary solution to resolving math functions for the NVPTX toolchain, temporary until OpenMP variant is supported by Clang.

We intercept the inclusion of math.h and cmath headers and if we are in the OpenMP-NVPTX case, we re-use CUDA's math function resolution mechanism.

Authors:
@gtbercea
@jdoerfert

Reviewers: hfinkel, caomhin, ABataev, tra

Reviewed By: hfinkel, ABataev, tra

Subscribers: mgorny, guansong, cfe-commits, jdoerfert

Tags: #clang

Differential Revision: https://reviews.llvm.org/D61399

llvm-svn: 360063
2019-05-06 18:19:15 +00:00
Alexey Bataev 8061acd501 [OPENMP][NVPTX]Use faster teams reduction algorithm.
A faster way to reduce the values in teams reductions was found; the
codegen is updated to use this faster algorithm and new runtime functions.

llvm-svn: 354479
2019-02-20 16:36:22 +00:00
Alexey Bataev c92fc3c8bc [CUDA][OPENMP][NVPTX]Improve logic of the debug info support.
Summary:
Added support for the -gline-directives-only option and fixed the logic of the
debug info for CUDA devices. If the optimization level is O0, the options
--[no-]cuda-noopt-device-debug do not affect the debug info level. If
the optimization level is > O0, debug info is requested, and either
--no-cuda-noopt-device-debug is used or --cuda-noopt-device-debug is not
used, the optimization level for the device code is kept and the
debug directives are emitted.
If the optimization level is > O0, debug info is requested, and the
--cuda-noopt-device-debug option is used, optimization is disabled
for the device code and the required debug info is emitted.
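The three cases above can be read as a small decision function. The sketch below is an interpretation of the summary, not the driver's code: the function name, the string labels, and the behaviour when no debug info is requested (returning "none") are assumptions made for illustration.

```python
from typing import Tuple


def device_debug_plan(opt_level: int,
                      debug_requested: bool,
                      noopt_device_debug: bool) -> Tuple[int, str]:
    """Return (device optimization level, kind of debug output).

    Hypothetical model of the rules in the summary above.
    noopt_device_debug is True when --cuda-noopt-device-debug is in effect.
    """
    if not debug_requested:
        # Assumption: with no debug option there is nothing to emit.
        return opt_level, "none"
    if opt_level == 0:
        # At O0 the --[no-]cuda-noopt-device-debug options have no effect.
        return 0, "requested"
    if noopt_device_debug:
        # Optimization is disabled for device code; requested info is emitted.
        return 0, "requested"
    # Optimization level is kept; debug directives are emitted.
    return opt_level, "debug-directives"
```

For instance, `device_debug_plan(2, True, True)` models an optimized build with debug info and --cuda-noopt-device-debug: device optimization is turned off so the requested debug info can be emitted.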

Reviewers: tra, echristo

Subscribers: aprantl, guansong, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D51554

llvm-svn: 348930
2018-12-12 14:52:27 +00:00
Alexey Bataev a5178f5369 [DRIVER][OFFLOAD] Do not invoke unbundler on unsupported file types.
clang-offload-bundler should not be invoked with the unbundling action
when the input file type does not match the action type. For example,
.so files should be unbundled during linking phase and should be linked
only with the host code.

llvm-svn: 343335
2018-09-28 16:17:59 +00:00
Jonas Hahnfeld a981f67bcd [OpenMP] Improve search for libomptarget-nvptx
When looking for the bclib, Clang considered the default library
path first, while it preferred directories in LIBRARY_PATH when
constructing the invocation of nvlink. The latter actually makes
more sense because during development it allows using a non-default
runtime library. So change the search for the bclib to start
looking in directories given by LIBRARY_PATH.
Additionally, add a new option --libomptarget-nvptx-path= which
will be searched first. This will be handy for testing purposes.
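The resulting search order can be sketched as below, assuming the default library path is tried last. The function and parameter names are hypothetical, and the code only models the ordering described in this message; it is not the driver's implementation.

```python
import os
from typing import Callable, List, Optional


def bclib_search_dirs(option_path: Optional[str],
                      library_path_env: Optional[str],
                      default_lib_dir: str) -> List[str]:
    """Order candidate directories: option first, LIBRARY_PATH next, default last."""
    dirs: List[str] = []
    if option_path:
        # Value of --libomptarget-nvptx-path=, searched first.
        dirs.append(option_path)
    if library_path_env:
        # Entries of LIBRARY_PATH, in the order given; empty entries are skipped.
        dirs.extend(d for d in library_path_env.split(os.pathsep) if d)
    # Assumption: the default library directory is the final fallback.
    dirs.append(default_lib_dir)
    return dirs


def find_bclib(name: str,
               dirs: List[str],
               exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the first existing candidate path, or None if nothing is found."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

Passing `exists` as a parameter keeps the lookup testable without touching the file system.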

Differential Revision: https://reviews.llvm.org/D51686

llvm-svn: 343230
2018-09-27 16:12:32 +00:00
Alexey Bataev 3dfc993437 Revert "[DRIVER][OFFLOAD] Do not invoke unbundler on unsupported file types."

This reverts commit r342991 and several other commits intended to fix the
tests. Some tests still fail and need further investigation.

llvm-svn: 343002
2018-09-25 18:31:56 +00:00
Alexey Bataev 464ab241e7 [DRIVER][OFFLOAD] Do not invoke unbundler on unsupported file types.
clang-offload-bundler should not be invoked with the unbundling action
when the input file type does not match the action type. For example,
.so files should be unbundled during linking phase and should be linked
only with the host code.

llvm-svn: 342991
2018-09-25 17:09:17 +00:00
Alexey Bataev 80a9a61ded [OPENMP][NVPTX] Add options -f[no-]openmp-cuda-force-full-runtime.
Added options -f[no-]openmp-cuda-force-full-runtime to [not] force use
of the full runtime for OpenMP offloading to CUDA devices.

llvm-svn: 341073
2018-08-30 14:45:24 +00:00
Matt Arsenault a13746b7eb Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode
The same semantics work for OpenCL, and probably any offload
language. Keep the old name around as an alias.

llvm-svn: 340193
2018-08-20 18:16:48 +00:00
Alexey Bataev b83b4e40fe [DEBUGINFO] Disable unsupported debug info options for NVPTX target.
Summary:
Some targets, like the NVPTX target, support only the default set of debug
options and do not support additional debug options. The patch introduces a
virtual function supportsDebugInfoOptions() that can be overridden
by the toolchain, checks whether the target supports some debug
options, and emits a warning when an unsupported debug option is
found.

Reviewers: echristo

Subscribers: aprantl, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D49148

llvm-svn: 338155
2018-07-27 19:45:14 +00:00
Alexey Bataev e36c67b35c [NVPTX] Emit debug info in DWARF-2 by default for Cuda devices.
Summary:
NVPTX target supports debug info in DWARF-2 format. Patch adds emission
of debug info in DWARF-2 by default.

Reviewers: tra, jlebar

Subscribers: aprantl, JDevlieghere, cfe-commits

Differential Revision: https://reviews.llvm.org/D42581

llvm-svn: 330272
2018-04-18 16:31:09 +00:00
Gheorghe-Teodor Bercea 0d5aa84ad9 [OpenMP] Add flag for linking runtime bitcode library
Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel

Reviewed By: ABataev, grokos

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43197

llvm-svn: 327460
2018-03-13 23:19:52 +00:00
Gheorghe-Teodor Bercea 0805b80a73 Revert revision 327438.
llvm-svn: 327447
2018-03-13 20:50:12 +00:00
Gheorghe-Teodor Bercea 148046c11b [OpenMP] Add flag for linking runtime bitcode library
Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel

Reviewed By: ABataev, grokos

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D43197

llvm-svn: 327438
2018-03-13 19:39:19 +00:00
Jonas Hahnfeld 4609b25dde Add target triples to openmp-offload-gpu.c
This might fix the failure on Green Dragon.

llvm-svn: 318767
2017-11-21 15:06:28 +00:00
Jonas Hahnfeld 7c78cc5273 [OpenMP] Consistently use cubin extension for nvlink
This was previously done in some places, but for example not for
bundling so that single object compilation with -c failed. In
addition cubin was used for all file types during unbundling which
is incorrect for assembly files that are passed to ptxas.
Tighten up the tests so that we can't regress in that area.

Differential Revision: https://reviews.llvm.org/D40250

llvm-svn: 318763
2017-11-21 14:44:45 +00:00
Jonas Hahnfeld 757e61fa4f [OpenMP] Fix passing of -m arguments to device toolchain
AuxTriple is not set if host and device share a toolchain. Also,
removing an argument modifies the DAL which needs to be returned
for future use.
(Move tests back to offload-openmp.c as they are not related to GPUs.)

Differential Revision: https://reviews.llvm.org/D38258

llvm-svn: 314329
2017-09-27 18:12:34 +00:00
Jonas Hahnfeld 85f19958e9 [OpenMP] Fix memory leak when translating arguments
Parsing the argument after -Xopenmp-target allocates memory that needs
to be freed. Associate it with the final DerivedArgList after we know
which one will be used.

Differential Revision: https://reviews.llvm.org/D38257

llvm-svn: 314328
2017-09-27 18:12:31 +00:00
Gheorghe-Teodor Bercea 5a3608ccfa [OpenMP] Don't throw cudalib not found error if only front-end is required.
Summary: If we only use the compiler front-end, do not throw an error about the CUDA device library not being found. This allows the front-end to be run on systems where no CUDA installation is found.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37914

llvm-svn: 314217
2017-09-26 15:36:20 +00:00
Gheorghe-Teodor Bercea 20789a5f09 [OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain.
Summary: Enable the -nocudalib flag for the OpenMP device offloading toolchain as well. Currently it can only be used for the CUDA toolchain.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, hfinkel, tra

Reviewed By: tra

Subscribers: hfinkel, cfe-commits

Differential Revision: https://reviews.llvm.org/D37913

llvm-svn: 314164
2017-09-25 21:56:32 +00:00
Gheorghe-Teodor Bercea 5636f4b33a [OpenMP] Bugfix: output file name drops the absolute path where full path is needed.
Summary: When composing the output file name, the path to the file is being dropped. The full path is required.

Reviewers: Hahnfeld, ABataev, caomhin, carlo.bertolli, hfinkel, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37912

llvm-svn: 314156
2017-09-25 21:25:38 +00:00
Gheorghe-Teodor Bercea d45720b55a Revert commit with wrong message.
llvm-svn: 314154
2017-09-25 21:22:49 +00:00
Gheorghe-Teodor Bercea 8cf757ceda [OpenMP] Don't throw cudalib not found error if only front-end is required.
Summary: If we only use the compiler front-end, do not throw an error about the CUDA device library not being found. This allows the front-end to be run on systems where no CUDA installation is found.

Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra

Reviewed By: tra

Subscribers: hfinkel, tra, cfe-commits

Differential Revision: https://reviews.llvm.org/D37914

llvm-svn: 314150
2017-09-25 21:07:16 +00:00
Gheorghe-Teodor Bercea 0499212ebf [OpenMP] Move failing flag tests to disabled GPU
offloading test file. This should prevent further errors
with the sanitizer.

Diff: D29660
llvm-svn: 310765
2017-08-11 21:17:50 +00:00
Gheorghe-Teodor Bercea 9c52574886 [OpenMP] Enable previously successful offloading tests.
Create a separate test file to contain all tests for OpenMP
offloading to GPUs.

Make libdevice checking more robust by accounting for
the case in which no libdevice is found.

These changes are in connection with diff: D29660

llvm-svn: 310718
2017-08-11 15:46:22 +00:00