Commit Graph

22 Commits

Author SHA1 Message Date
Jon Chesterfield fc88d927e3 [clang][amdgpu] Use implicit code object version
[clang][amdgpu] Use implicit code object version

At present, clang always passes amdhsa-code-object-version on to -cc1. That is
great for certainty over what object version is being used when debugging.

Unfortunately, the command line argument is in AMDGPUBaseInfo.cpp in the amdgpu
target. If clang is used with an llvm compiled with DLLVM_TARGETS_TO_BUILD
that excludes amdgpu, this will be diagnosed (as discovered via D98658):

- Unknown command line argument '--amdhsa-code-object-version=4'

This means that clang, built only for X86, can be used to compile the nvptx
devicertl for openmp but not the amdgpu one. That would shortly spawn fragile
logic in the devicertl cmake to try to guess whether the clang used will work.

This change omits the amdhsa-code-object-version parameter when it matches the
default that AMDGPUBaseInfo.cpp specifies, with a comment to indicate why. As
this is the only part of clang's codegen for amdgpu that depends on the target
in the back end it suffices to build the openmp runtime on most (all?) systems.

It is a non-functional change, though observable in the updated tests and when
compiling with -###. It may cause minor disruption to the amd-stg-open branch.

Revision of D98746, builds on refactor in D101077

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D101095
2021-04-23 23:52:50 +01:00
Yaxun (Sam) Liu 4fd05e0ad7 [HIP] Change to code object v4
Change to code object v4 by default to match ROCm 4.1.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99235
2021-04-06 20:22:58 -04:00
Jon Chesterfield c0619d3b21 [NFC] Use regex for code object version in hip tests
[NFC] Use regex for code object version in hip tests

Extracted from D93258. Makes tests robust to changes in default
code object version.

Reviewed By: t-tye

Differential Revision: https://reviews.llvm.org/D93398
2020-12-16 17:00:19 +00:00
Yaxun (Sam) Liu 0b81d9a992 [AMDGPU] add -mcode-object-version=n
Add option -mcode-object-version=n to control code object version for
AMDGPU.

Differential Revision: https://reviews.llvm.org/D91310
2020-12-07 18:08:37 -05:00
Yaxun (Sam) Liu 4bed1d9b32 [HIP] fix bundle entry ID for --
Canonicalize triple used in fat binary. Change from
amdgcn-amd-amdhsa to amdgcn-amd-amdhsa-.

This is part of https://reviews.llvm.org/D60620
2020-12-07 18:08:37 -05:00
Yaxun (Sam) Liu dc6a0b0ec7 [HIP] Align device binary
To facilitate faster loading of device binaries and share them among processes,
HIP runtime favors their alignment being 4096 bytes. HIP runtime can load
unaligned device binaries, however, aligning them at 4096 bytes results in
faster loading and less shared memory usage.

This patch adds an option -bundle-align to clang-offload-bundler which allows
bundles to be aligned at specified alignment. By default it is 1, which is NFC
compared to existing format.

This patch then aligns embedded fat binary and device binary inside fat binary
at 4096 bytes.

It has been verified this change does not cause significant overall file size increase
for typical HIP applications (less than 1%).

Differential Revision: https://reviews.llvm.org/D88734
2020-10-02 18:10:44 -04:00
Yaxun (Sam) Liu c830d517b4 [HIP] Enable -amdgpu-internalize-symbols
Enable -amdgpu-internalize-symbols to eliminate unused functions and global variables
for whole program to speed up compilation and improve performance.

For -fno-gpu-rdc, -amdgpu-internalize-symbols is passed to clang -cc1.

For -fgpu-rdc, -amdgpu-internalize-symbols is passed to lld.

Differential Revision: https://reviews.llvm.org/D81959
2020-06-18 16:34:37 -04:00
Hiroshi Yamauchi ce82b8e8af [HIP] Improve check patterns to avoid test failures in case string "opt",
etc. happens to be in the InstallDir path in the -### output.

Differential Revision: https://reviews.llvm.org/D82046
2020-06-18 10:14:31 -07:00
Yaxun (Sam) Liu 6752786d65 [HIP] Do not use llvm-link/opt/llc for -fgpu-rdc
This patch is a follow up on https://reviews.llvm.org/D81627.

In addition to default -fno-gpu-rdc case, this patches let
HIP toolchain not use llvm-link/opt/llc to link device code
for -fgpu-rdc case. Instead, uses standard lto.

This will eliminate some redundant optimizations and speed
up the compilation/linking.

Differential Revision: https://reviews.llvm.org/D81861
2020-06-15 21:09:18 -04:00
Yaxun (Sam) Liu e8090d83fd [HIP] Do not call opt/llc for -fno-gpu-rdc
Currently HIP toolchain calls clang to emit bitcode then calls opt/llc for device compilation for the default -fno-gpu-rdc
case, which is unnecessary since clang is able to compile a single source file to ISA.

This patch fixes the HIP action builder and toolchain so that the default -fno-gpu-rdc can be done like a canonical
toolchain, i.e. one clang -cc1 invocation to compile source code to ISA.

This can avoid unnecessary processes to speed up the compilation, and avoid redundant LLVM passes which are
performed in clang -cc1 and opt.

Differential Revision: https://reviews.llvm.org/D81627
2020-06-15 18:55:01 -04:00
Michael Liao c97be2c377 [hip] Remove `hip_pinned_shadow`.
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
  surface/texture reference support. So far, both the host and device
  use the same reference type, which could be revised later when
  interface/implementation is stablized.

Reviewers: yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77583
2020-04-07 09:51:49 -04:00
Alexandre Ganea 5896e2df45 [Clang] Fix HIP tests when running on Windows with the LLVM toolchain is in the path
On Windows, when the LLVM toolchain is in the current path (%PATH%), fusing the linker yields c:\{path}\lld.exe whereas the hip tests did not expect the .exe part.
This only happens if LLD is not present in the build folder, which consequently makes the code in Driver::GetProgramPath() to fall back to %PATH% and platform-specific search, which includes the .exe part (see https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/Driver.cpp#L4733).

Differential Revision: https://reviews.llvm.org/D76631
2020-03-23 16:54:48 -04:00
Scott Linder d96ea47c75 [AMDGPU][HIP] Improve opt-level handling
Summary:
The HIP toolchain invokes `llc` without an explicit opt-level, meaning
it always uses the default (-O2). This makes it impossible to use -O1,
for example. The HIP toolchain also coerces -Os/-Oz to -O2 even when
invoking opt, and it coerces -Og to -O2 rather than -O1.

Forward the opt-level to `llc` as well as `opt`, and only coerce levels
where it is required.

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D70987
2019-12-05 11:27:12 -05:00
Michael Liao 5a48678a6a [hip] Allow the declaration of functions with variadic arguments in HIP.
Summary:
- As variadic parameters have the lowest rank in overload resolution,
  without real usage of `va_arg`, they are commonly used as the
  catch-all fallbacks in SFINAE. As the front-end still reports errors
  on calls to `va_arg`, the declaration of functions with variadic
  arguments should be allowed in general.

Reviewers: jlebar, tra, yaxunl

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D69389
2019-10-25 00:39:24 -04:00
Yaxun Liu c3dfe9082b [HIP] Support attribute hip_pinned_shadow
This patch introduces support of hip_pinned_shadow variable for HIP.

A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow.
It has external linkage on device side and has no initializer. It has internal
linkage on host side and has initializer or static constructor. It can be accessed
in both device code and host code.

This allows HIP runtime to implement support of HIP texture reference.

Differential Revision: https://reviews.llvm.org/D62738

llvm-svn: 364381
2019-06-26 03:47:37 +00:00
Reid Kleckner 549ed544c3 [Driver] Move the "-o OUT -x TYPE SRC.c" flags to the end of -cc1
New -cc1 arguments, such as -faddrsig, have started appearing after the
input name. I personally find it convenient for the input to be the last
argument to the compile command line, since I often need to edit it when
running crash reproduction scripts.

Differential Revision: https://reviews.llvm.org/D62270

llvm-svn: 361530
2019-05-23 18:35:43 +00:00
Yaxun Liu 7bd8c37b17 [HIP] Use -mlink-builtin-bitcode to link device library
Use -mlink-builtin-bitcode instead of llvm-link to link
device library so that device library bitcode and user
device code can be compiled in a consistent way.

This is the same approach used by CUDA and OpenMP.

Differential Revision: https://reviews.llvm.org/D60513

llvm-svn: 358290
2019-04-12 16:23:31 +00:00
Douglas Yung 607a1b2234 Relax restriction in tests to where "-emit-llvm-bc" and "-emit-obj" must appear.
The CHECK lines as structured were requiring them to appear only in a certain
position while all that is really needed is to check that they are present.

llvm-svn: 354001
2019-02-14 01:11:32 +00:00
Aaron Enye Shi a1adb80ae7 [HIP] Fix hip-toolchain-rdc tests
Since we removed changed the way HIP Toolchain will propagate -m options into LLC, we need to remove from these older tests.

This is related to rC353880.

Differential Revision: https://reviews.llvm.org/D57977

llvm-svn: 353885
2019-02-12 22:01:19 +00:00
Scott Linder bef2663751 Add -fapply-global-visibility-to-externs for -cc1
Introduce an option to request global visibility settings be applied to
declarations without a definition or an explicit visibility, rather than
the existing behavior of giving these default visibility. When the
visibility of all or most extern definitions are known this allows for
the same optimisations -fvisibility permits without updating source code
to annotate all declarations.

Differential Revision: https://reviews.llvm.org/D56868

llvm-svn: 352391
2019-01-28 17:12:19 +00:00
Yaxun Liu 9b6d9f2a62 Disable code object version 3 for HIP toolchain
AMDGPU backend will switch to code object version 3 by default.
Since HIP runtime is not ready, disable it until the runtime is ready.

Differential Revision: https://reviews.llvm.org/D53325

llvm-svn: 344630
2018-10-16 17:36:23 +00:00
Yaxun Liu 9767089d00 [HIP] Support early finalization of device code for -fno-gpu-rdc
This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original
options as aliases. When -fgpu-rdc is off,
clang will assume the device code in each translation unit does not call
external functions except those in the device library, therefore it is possible
to compile the device code in each translation unit to self-contained kernels
and embed them in the host object, so that the host object behaves like
usual host object which can be linked by lld.

The benefits of this feature is: 1. allow users to create static libraries which
can be linked by host linker; 2. amortized device code linking time.

This patch modifies HIP action builder to insert actions for linking device
code and generating HIP fatbin, and pass HIP fatbin to host backend action.
It extracts code for constructing command for generating HIP fatbin as
a function so that it can be reused by early finalization. It also modifies
codegen of HIP host constructor functions to embed the device fatbin
when it is available.

Differential Revision: https://reviews.llvm.org/D52377

llvm-svn: 343611
2018-10-02 17:48:54 +00:00