This patch enables SPIR-V binary emission for HIP device code via the
HIPSPV tool chain.
The '--offload' option, envisioned in [1], is added for specifying
offload targets. It overrides the default device target
(amdgcn-amd-amdhsa) for HIP compilation so that device code is emitted
as a SPIR-V binary. The option is handled in getHIPOffloadTargetTriple().
A getOffloadingDeviceToolChain() function (based on the design in the
SYCL repository) is added to select the HIPSPVToolChain when the HIP
offload target is 'spirv64'.
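For illustration, a minimal sketch of that selection, using hypothetical
stand-in types (the real code lives in clang's Driver and works on
llvm::Triple values):

  #include <memory>
  #include <string>

  enum class OffloadKind { Cuda, HIP };

  struct ToolChain { virtual ~ToolChain() = default; };
  struct HIPAMDToolChain : ToolChain {};  // default amdgcn-amd-amdhsa path
  struct HIPSPVToolChain : ToolChain {};  // SPIR-V emission path

  // Pick the HIPSPV tool chain when the HIP offload target is spirv64;
  // otherwise fall back to the default AMDGPU device tool chain.
  std::unique_ptr<ToolChain>
  getOffloadingDeviceToolChain(OffloadKind Kind, const std::string &Triple) {
    if (Kind == OffloadKind::HIP && Triple.rfind("spirv64", 0) == 0)
      return std::make_unique<HIPSPVToolChain>();
    return std::make_unique<HIPAMDToolChain>();
  }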
The HIPActionBuilder is modified to produce LLVM IR at the backend
phase. The HIPSPV tool chain expects to receive HIP device code as LLVM
IR so that it can run external LLVM passes over it; the tool chain is
also responsible for emitting the SPIR-V binary.
A Cuda GPU architecture 'generic' is added. The name is picked from
the LLVM SPIR-V backend. In the HIPSPV code path the architecture
name is inserted into the bundle entry ID as the target ID. The target
ID is expected to always be present, so that a component of the target
triple is not mistaken for a target ID.
Tests are added for checking the HIPSPV tool chain.
[1]: https://lists.llvm.org/pipermail/cfe-dev/2020-December/067362.html
Patch by: Henry Linjamäki
Reviewed by: Yaxun Liu, Artem Belevich, Alexey Bader
Differential Revision: https://reviews.llvm.org/D110622
Always use cuda.h to detect the CUDA version. It is a more universal
approach than version.txt, which is no longer present in recent CUDA
versions.
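The cuda.h-based detection boils down to decoding the CUDA_VERSION
macro (major * 1000 + minor * 10, e.g. 11040 for CUDA-11.4); a minimal
sketch, with illustrative helper names:

  #include <cstdio>

  struct ParsedCudaVersion { int Major, Minor; };

  // Decode the CUDA_VERSION encoding used by cuda.h.
  ParsedCudaVersion parseCudaVersionMacro(int CudaVersionMacro) {
    return {CudaVersionMacro / 1000, (CudaVersionMacro % 1000) / 10};
  }

  int main() {
    ParsedCudaVersion V = parseCudaVersionMacro(11040);
    std::printf("detected CUDA %d.%d\n", V.Major, V.Minor); // CUDA 11.4
  }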
Split the 'unknown CUDA version' warning in two:
* when the detected CUDA version is only partially supported by clang.
It is expected to work in general, at feature parity with the latest
supported CUDA version, but may be missing support for new
features/instructions/GPU variants. Clang will issue a warning.
* when the detected version is newer than any known one. Recent CUDA
versions have worked with clang reasonably well, and a new one will
likely behave like the partially supported ones above, or it may not
work at all. Clang will issue a warning and proceed as if the latest
known CUDA version had been detected.
Differential Revision: https://reviews.llvm.org/D108247
The patch only plumbs through the option necessary for targeting sm_86
GPUs, without adding any new functionality.
Differential Revision: https://reviews.llvm.org/D95974
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.
Differential Revision: https://reviews.llvm.org/D90419
At AMD, in an internal audit of our code, we found some corner cases
where we were not quite differentiating targets enough for some old
hardware. This commit is part of fixing that by adding three new
targets:
* The "Oland" and "Hainan" variants of gfx601 are now split out into
gfx602. LLPC (in the GPUOpen driver) and other front-ends could use
that to avoid using the shaderZExport workaround on gfx602.
* One variant of gfx703 is now split out into gfx705. LLPC and other
front-ends could use that to avoid using the
shaderSpiCsRegAllocFragmentation workaround on gfx705.
* The "TongaPro" variant of gfx802 is now split out into gfx805.
TongaPro has a faster 64-bit shift than its former friends in gfx802,
and a subtarget feature could be set up for that to take advantage of
it. This commit does not make that change; it just adds the target.
V2: Add clang changes. Put TargetParser list in order.
V3: AMDGCNGPUs table in TargetParser.cpp needs to be in GPUKind order,
so fix the GPUKind order.
Differential Revision: https://reviews.llvm.org/D88916
Currently the CUDA/HIP toolchain uses "unknown" as the bound arch of
the offload action for the fat binary. This causes -mcpu or -march with
the value "unknown" to be added in HIPToolChain::TranslateArgs or
CUDAToolChain::TranslateArgs.
This causes an issue for https://reviews.llvm.org/D88377, since the
HIP toolchain needs to check -mcpu in HIPToolChain::TranslateArgs.
The bound arch of the offload action for the fat binary is not really
used; therefore set it to CudaArch::UNUSED.
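A minimal sketch of the idea, with an illustrative enum and function
standing in for clang's CudaArch and TranslateArgs:

  #include <string>
  #include <vector>

  enum class CudaArch { UNUSED, GFX906 };

  void appendArchArgs(CudaArch BoundArch, std::vector<std::string> &Args) {
    // A dedicated sentinel lets the tool chain skip the bound arch
    // entirely instead of appending a bogus -mcpu=unknown.
    if (BoundArch == CudaArch::UNUSED)
      return;
    Args.push_back("-mcpu=gfx906");
  }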
Differential Revision: https://reviews.llvm.org/D88524
Generate PTX using newer versions of PTX and allow using sm_80 with CUDA-11.
None of the new features of CUDA-10.2+ have been implemented yet, so using these
versions will still produce a warning.
Differential Revision: https://reviews.llvm.org/D77670
Instead of hardcoding individual GPU mappings in multiple functions,
keep them all in one table and use it to look up the mappings.
We also don't care much about the 'virtual' architecture, so the API is
trimmed down to a simpler GPU->virtual arch name lookup.
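A minimal sketch of the single-table approach (entries here are
illustrative, not the full list):

  #include <cstring>

  struct ArchEntry {
    const char *ArchName;    // e.g. "sm_70"
    const char *VirtualName; // e.g. "compute_70"
  };

  static const ArchEntry ArchTable[] = {
      {"sm_35", "compute_35"},
      {"sm_60", "compute_60"},
      {"sm_70", "compute_70"},
  };

  // GPU -> virtual arch name, the one lookup the trimmed-down API keeps.
  const char *virtualArchName(const char *GPU) {
    for (const auto &E : ArchTable)
      if (std::strcmp(E.ArchName, GPU) == 0)
        return E.VirtualName;
    return nullptr; // unknown GPU
  }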
Differential Revision: https://reviews.llvm.org/D77665
Summary:
- `RegisterVar` has a `void` return type and takes `size_t` for its
variable-size parameter in HIP and CUDA 9.0+.
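As a hedged sketch of that shape (the surrounding parameters are from
memory and illustrative; only the void return type and the size_t size
parameter are the point here):

  #include <cstddef>

  // Registration hook shape targeted for HIP / CUDA 9.0+: note the
  // void return type and the size_t size parameter.
  extern "C" void __cudaRegisterVar(void **fatCubinHandle, char *hostVar,
                                    char *deviceAddress,
                                    const char *deviceName, int ext,
                                    std::size_t size, int constant,
                                    int global);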
Reviewers: tra, yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77398
This makes clang somewhat forward-compatible with new CUDA releases,
without having to patch it for every minor release that does not add
any new functions.
If an unknown version is found, clang issues a warning (can be disabled
with -Wno-cuda-unknown-version) and assumes that it has detected
the latest known version. CUDA releases are usually supersets
of older ones feature-wise, so it should be sufficient to keep
released clang versions working with minor CUDA updates without
having to upgrade clang, too.
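A minimal sketch of the clamping behavior, with illustrative names:

  enum class CudaVersion { CUDA_100, CUDA_101, LATEST = CUDA_101, UNKNOWN };

  CudaVersion normalizeVersion(CudaVersion Detected, bool &WarnUnknown) {
    // An unrecognized SDK triggers the (suppressible) warning and is
    // then treated as the latest version clang knows about.
    WarnUnknown = (Detected == CudaVersion::UNKNOWN);
    return WarnUnknown ? CudaVersion::LATEST : Detected;
  }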
Differential Revision: https://reviews.llvm.org/D73231
...and use it to control the parts of CUDA compilation that depend on
the specific version of the CUDA SDK.
This patch has a placeholder for 'new launch API' support, which is in
a separate patch. The list will be further extended in an upcoming
patch to support CUDA-10.1.
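Version-gated features then reduce to simple comparisons; a sketch with
an illustrative helper name and cutoff:

  enum class CudaVersion { CUDA_80 = 80, CUDA_90 = 90, CUDA_92 = 92 };

  bool useNewLaunchAPI(CudaVersion V) {
    // The 'new launch API' placeholder above would be enabled only for
    // sufficiently new SDKs; the exact cutoff here is illustrative.
    return V >= CudaVersion::CUDA_92;
  }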
Differential Revision: https://reviews.llvm.org/D57487
llvm-svn: 352798
Clang can use CUDA-9.1 now, though new APIs are not implemented yet.
The major change is that the CUDA-9.1 headers continued the substantial
restructuring that began in CUDA-9.0, which in turn required substantial
changes in the CUDA compatibility headers provided by clang.
There are two major issues:
* CUDA SDK no longer provides declarations for libdevice functions.
* A lot of device-side functions have become nvcc's builtins and
CUDA headers no longer contain their implementations.
This patch changes the way CUDA headers are handled if we compile
with CUDA 9.x. Both 9.0 and 9.1 are affected.
* Clang provides its own declarations of libdevice functions.
* For CUDA-9.x clang now provides an implementation of device-side
'standard library' functions using libdevice.
This patch should not affect compilation with CUDA-8. There may be
some observable differences for CUDA-9.0, though they are not expected
to affect functionality.
Tested: CUDA test-suite tests for all supported combinations of:
CUDA: 7.0,7.5,8.0,9.0,9.1
GPU: sm_20, sm_35, sm_60, sm_70
Differential Revision: https://reviews.llvm.org/D42513
llvm-svn: 323713
Summary:
CUDA 9's minimum sm is sm_30.
Ideally we should also make sm_30 the default when compiling with CUDA
9, but that seems harder than it should be.
Subscribers: sanjoy
Differential Revision: https://reviews.llvm.org/D39109
llvm-svn: 316611
For now CUDA-9 is not included in the list of CUDA versions clang
searches for, so the path to CUDA-9 must be explicitly passed
via --cuda-path=.
On the LLVM side, NVPTX added the sm_70 GPU type, which bumps the
required PTX version to 6.0 but otherwise is equivalent to sm_62 at
the moment.
Differential Revision: https://reviews.llvm.org/D37576
llvm-svn: 312734
Some compilers are too dumb to realize that the switch statement covers
all cases.
(Don't use a "default" label, because we explicitly want to get a warning
if our switch doesn't cover all the cases.)
llvm-svn: 274713
Summary:
Currently our handling of CUDA architectures is scattered all around
clang. This patch centralizes it.
A key advantage of this centralization is that you can now write a C++
switch on e.g. CudaArch and get a compile error if you don't handle one
of the enum values.
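A minimal sketch of the pattern this enables (values illustrative, not
the complete list); the trailing return addresses the compilers
mentioned in the switch-coverage fix above:

  enum class CudaArch { UNKNOWN, SM_20, SM_35, SM_60 };

  // No "default" label: with -Wswitch, adding a new CudaArch value
  // makes the compiler point at every switch that needs updating.
  const char *archName(CudaArch A) {
    switch (A) {
    case CudaArch::UNKNOWN: return "unknown";
    case CudaArch::SM_20: return "sm_20";
    case CudaArch::SM_35: return "sm_35";
    case CudaArch::SM_60: return "sm_60";
    }
    // Unreachable when the switch is exhaustive, but kept so compilers
    // that cannot prove coverage do not warn.
    return "unknown";
  }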
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D21867
llvm-svn: 274681