Commit Graph

246 Commits

Author SHA1 Message Date
Yaxun (Sam) Liu c4afb5f81b [HIP] Fix linking of asanrt.bc
HIP currently uses -mlink-builtin-bitcode to link all bitcode libraries, which
changes the linkage of functions to be internal once they are linked in. This
works for common bitcode libraries since these functions are not intended
to be exposed for external callers.

However, the functions in the sanitizer bitcode library is intended to be
called by instructions generated by the sanitizer pass. If their linkage is
changed to internal, their parameters may be altered by optimizations before
the sanitizer pass, which renders them unusable by the sanitizer pass.

To fix this issue, HIP toolchain links the sanitizer bitcode library with
-mlink-bitcode-file, which does not change the linkage.

A struct BitCodeLibraryInfo is introduced in ToolChain as a generic
approach to pass the bitcode library information between ToolChain and Tool.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D110304
2021-09-27 13:25:46 -04:00
Jonas Hahnfeld ea08c4cd1c [CUDA] Fix static device variables with -fgpu-rdc
NVPTX does not allow dots in the identifier, so ptxas errors out with
   fatal   : Parsing error near '.static': syntax error
because it parses .static as a directive. Avoid this problem by using
two underscores, similar to what OpenMP does for outlined functions.

Differential Revision: https://reviews.llvm.org/D108456
2021-08-25 09:31:22 +02:00
Anshil Gandhi 7063ac1afa [HIP] Allow target addr space in target builtins
This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr
space cast for non-generic pointer to generic pointer in general, and inserts implicit addr
space cast for generic to non-generic for target builtin arguments only.

It is NFC for non-HIP languages.

Differential Revision: https://reviews.llvm.org/D102405
2021-08-19 23:51:58 -06:00
Anshil Gandhi f5d5f17d3a Revert "[HIP] Allow target addr space in target builtins"
This reverts commit a35008955f.
2021-08-18 21:38:42 -06:00
Anshil Gandhi f22ba51873 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating a
compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-16 14:56:01 -06:00
Dávid Bolvanský 49de6070a2 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit 435785214f. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-15 11:44:13 +02:00
Anshil Gandhi 435785214f [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpand pass to report atomics generating
a compare and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-14 23:37:23 -06:00
Anshil Gandhi 29e11a1aa3 Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"
This reverts commit c4e5425aa5.
2021-08-13 23:58:04 -06:00
Anshil Gandhi c4e5425aa5 [Remarks] Emit optimization remarks for atomics generating CAS loop
Implements ORE in AtomicExpandPass to report atomics generating a compare
and swap loop.

Differential Revision: https://reviews.llvm.org/D106891
2021-08-13 22:44:08 -06:00
Anshil Gandhi a35008955f [HIP] Allow target addr space in target builtins
This patch allows target specific addr space in target builtins for HIP. It inserts implicit addr
space cast for non-generic pointer to generic pointer in general, and inserts implicit addr
space cast for generic to non-generic for target builtin arguments only.

It is NFC for non-HIP languages.

Differential Revision: https://reviews.llvm.org/D102405
2021-08-09 16:38:04 -06:00
Michael Liao 6ec36d18ec [cuda] Mark builtin texture/surface reference variable as 'externally_initialized'.
- They need to be preserved even if there's no reference within the
  device code as the host code may need to initialize them based on the
  application logic.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D107718
2021-08-09 13:27:40 -04:00
Yaxun (Sam) Liu 44dbbe6106 [HIP] Preserve ASAN bitcode library functions
Address sanitizer passes may generate call of ASAN bitcode library
functions after bitcode linking in lld, therefore lld cannot add
those symbols since it does not know they will be used later.

To solve this issue, clang emits a reference to a bicode library
function which calls all ASAN functions which need to be
preserved. This basically force all ASAN functions to be
linked in.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D106315
2021-07-23 10:35:52 -04:00
Michael Liao 4e5d9c8803 [Internalize] Preserve variables externally initialized.
- ``externally_initialized`` variables would be initialized or modified
  elsewhere. Particularly, CUDA or HIP may have host code to initialize
  or modify ``externally_initialized`` device variables, which may not
  be explicitly referenced on the device side but may still be used
  through the host side interfaces. Not preserving them triggers the
  elimination of them in the GlobalDCE and breaks the user code.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D105135
2021-07-08 10:48:19 -04:00
Sameer Sahasrabuddhe 280593bd3f [Clang] [NFC] fix CHECK lines for convergent attribute tests 2021-06-29 00:21:07 +05:30
Jay Foad 157473a58f [IR] Simplify createReplacementInstr
NFCI, although the test change shows that ConstantExpr::getAsInstruction
is better than the old implementation of createReplacementInstr because
it propagates things like the sdiv "exact" flag.

Differential Revision: https://reviews.llvm.org/D104124
2021-06-23 10:47:43 +01:00
Yaxun (Sam) Liu 054cc3b1b4 [CUDA][HIP] Fix store of vtbl in ctor
vtbl itself is in default global address space. When clang emits
ctor, it gets a pointer to the vtbl field based on the this pointer,
then stores vtbl to the pointer.

Since this pointer can point to any address space (e.g. an object
created in stack), this pointer points to default address space, therefore
the pointer to vtbl field in this object should also be in default
address space.

Currently, clang incorrectly casts the pointer to vtbl field in this object
to global address space. This caused assertions in backend.

This patch fixes that by removing the incorrect addr space cast.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103835
2021-06-08 10:24:44 -04:00
Konstantin Zhuravlyov 4d9f8527db CUDA/HIP: Change device-use-host-var.cu's NOT "external" check to include variable name
Otherwise it is causing one of our build jobs to fail,
it is using "external" as directory, and NOT is
failing because "external" is found in ModuleID.

Differential Revision: https://reviews.llvm.org/D103658
2021-06-04 13:10:00 -04:00
Yaxun (Sam) Liu e42def62d8 [HIP] Fix amdgcn builtin for long type
Currently some amdgcn builtins are defined with long int type,
which causes invalid IR on Windows since long int is 32 bit
on Windows whereas these builtins have 64 bit arguments.

long long int type cannot be used since it is 128 bit in OpenCL.

This patch uses 64 bit int type instead of long int to define 64 bit int
arguments or return for amdgcn builtins.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103563
2021-06-03 19:05:56 -04:00
Yaxun (Sam) Liu 04caa7c3e0 [CUDA][HIP] Promote const variables to constant
Recently we added diagnosing ODR-use of host variables
in device functions, which includes ODR-use of const
host variables since they are not really emitted on
device side. This caused regressions since we used
to allow ODR-use of const host variables in device
functions.

This patch allows ODR-use of const variables in device
functions if the const variables can be statically initialized
and have an empty dtor. Such variables are marked with
implicit constant attrs and emitted on device side. This is
in line with what clang does for constexpr variables.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D103108
2021-06-01 21:28:41 -04:00
Yaxun (Sam) Liu 4cb42564ec [CUDA][HIP] Fix device variables used by host
variables emitted on both host and device side with different addresses
when ODR-used by host function should not cause device side counter-part
to be force emitted.

This fixes the regression caused by https://reviews.llvm.org/D102237

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102801
2021-05-20 17:04:29 -04:00
Fangrui Song 37561ba89b -fno-semantic-interposition: Don't set dso_local on GlobalVariable
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.

GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).

This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.

Example:
```
// a.c
int var;
int *addr() { return var; }

// old: cannot be interposed
movslq  .Lvar$local(%rip), %rax
// new: can be interposed
movq    var@GOTPCREL(%rip), %rax
movslq  (%rax), %rax
```

The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D102583
2021-05-19 16:08:28 -07:00
Steffen Larsen f226e28a88 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX redux.sync instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `redux.sync` instructions
for `sm_80` architecture or newer.

PTX ISA description of `redux.sync`:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-redux-sync

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>

Differential Revision: https://reviews.llvm.org/D100124
2021-05-17 09:46:59 -07:00
Yaxun (Sam) Liu 98575708da [CUDA][HIP] Fix device template variables
Currently clang does not emit device template variables
instantiated only in host functions, however, nvcc is
able to do that:

https://godbolt.org/z/fneEfferY

This patch fixes this issue by refactoring and extending
the existing mechanism for emitting static device
var ODR-used by host only. Basically clang records
device variables ODR-used by host code and force
them to be emitted in device compilation. The existing
mechanism makes sure these device variables ODR-used
by host code are added to llvm.compiler-used, therefore
they are guaranteed not to be deleted.

It also fixes non-ODR-use of static device variable by host code
causing static device variable to be emitted and registered,
which should not.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D102237
2021-05-12 11:13:29 -04:00
Yaxun (Sam) Liu c58a6a6fb4 [HIP] Fix device lib selection
Choose optimized device lib bitcode by fp options
for performance.

Reviewed by: Artem Belevich, Fangrui Song

Differential Revision: https://reviews.llvm.org/D101654
2021-05-01 20:31:11 -04:00
Yaxun (Sam) Liu d8805574c1 [CUDA][HIP] Allow non-ODR use of host var in device
Reviewed by: Artem Belevich, Richard Smith

Differential Revision: https://reviews.llvm.org/D98193
2021-04-19 14:45:24 -04:00
Yaxun (Sam) Liu d5c0f00e21 [CUDA][HIP] Mark device var used by host only
Add device variables to llvm.compiler.used if they are
ODR-used by either host or device functions.

This is necessary to prevent them from being
eliminated by whole-program optimization
where the compiler has no way to know a device
variable is used by some host code.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D98814
2021-04-17 11:25:25 -04:00
Yaxun (Sam) Liu 3597f02fd5 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Yaxun (Sam) Liu 61d065e21f Let clang atomic builtins fetch add/sub support floating point types
Recently atomicrmw started to support fadd/fsub:

https://reviews.llvm.org/D53965

However clang atomic builtins fetch add/sub still does not support
emitting atomicrmw fadd/fsub.

This patch adds that.

Reviewed by: John McCall, Artem Belevich, Matt Arsenault, JF Bastien,
James Y Knight, Louis Dionne, Olivier Giroux

Differential Revision: https://reviews.llvm.org/D71726
2021-04-06 15:44:00 -04:00
Yaxun (Sam) Liu 907af84396 [CUDA][HIP] rename -fcuda-flush-denormals-to-zero
Rename it to -fgpu-flush-denormals-to-zero.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99688
2021-04-05 00:13:51 -04:00
Thomas Preud'homme 292726b644 [HIP, test] Fix use of undef FileCheck var
Clang test CodeGenCUDA/kernel-stub-name.cu uses never defined DKERN
variable in a CHECK-NOT directive. This commit replace the variable by a
regex, thereby avoiding the issue.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D99832
2021-04-04 19:30:49 +01:00
Thomas Preud'homme a41b5100e4 [HIP-Clang, test] Fix use of undef FileCheck var
Commit 8129521318 changed a line defining
PREFIX in clang test CodeGenCUDA/device-stub.cu into a CHECK-NOT
directive. All following lines using PREFIX are therefore using an
undefined variable since the pattern defining PREFIX is not supposed to
occur and CHECK-NOT are checked independently.

This commit replaces all uses of PREFIX by the regex used to define it,
thereby avoiding the problem.

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D99831
2021-04-04 19:30:27 +01:00
Yaxun (Sam) Liu cc9477166a [CUDA][HIP] add __builtin_get_device_side_mangled_name
Add builtin function __builtin_get_device_side_mangled_name
to get device side manged name for functions and global
variables, which can be used to get symbol address of kernels
or variables by mangled name in dynamically loaded
bundled code objects at run time.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D99301
2021-03-25 15:25:29 -04:00
Yaxun (Sam) Liu 9ecbb34e1d Fix test cxx-call-kernel.cpp
Only test it with x86 since other target may have an ABI
making it difficult to test.

Change-Id: I85423c8bbbbbb8f24cb3ea4cb64a408069b4d61c
2021-03-01 17:10:53 -05:00
Yaxun (Sam) Liu 5cf2a37f12 [HIP] Emit kernel symbol
Currently clang uses stub function to launch kernel. This is inconvenient
to interop with C++ programs since the stub function has different name
as kernel, which is required by ROCm debugger.

This patch emits a variable symbol which has the same name as the kernel
and uses it to register and launch the kernel. This allows C++ program to
launch a kernel by using the original kernel name.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D86376
2021-03-01 16:31:40 -05:00
Fangrui Song 28cb620321 Change some addUsedGlobal to addUsedOrCompilerUsedGlobal
An global value in the `llvm.used` list does not have GC root semantics on ELF targets.
This will be changed in a subsequent backend patch.

Change some `llvm.used` in the ELF code path to use `llvm.compiler.used` to
prevent undesired GC root semantics.

Change one extern "C" alias (due to `__attribute__((used))` in extern "C") to use `llvm.compiler.used` on all targets.

GNU ld has a rule "`__start_/__stop_` references from a live input section retain the associated C identifier name sections",
which LLD may drop entirely (currently refined to exclude SHF_LINK_ORDER/SHF_GROUP) in a future release (the rule makes it clumsy to GC metadata sections; D96914 added a way to try the potential future behavior).
For `llvm.used` global values defined in a C identifier name section, keep using `llvm.used` so that
the future LLD change will not affect them.

rnk kindly categorized the changes:
```
ObjC/blocks: this wants GC root semantics, since ObjC mainly runs on Mac.
MS C++ ABI stuff: wants GC root semantics, no change
OpenMP: unsure, but GC root semantics probably don't hurt
CodeGenModule: affected in this patch to *not* use GC root semantics so that __attribute__((used)) behavior remains the same on ELF, plus two other minor use cases that don't want GC semantics
Coverage: Probably want GC root semantics
CGExpr.cpp: refers to LTO, wants GC root
CGDeclCXX.cpp: one is MS ABI specific, so yes GC root, one is some other C++ init functionality, which should form GC roots (C++ initializers can have side effects and must run)
CGDecl.cpp: Changed in this patch for __attribute__((used))
```

Differential Revision: https://reviews.llvm.org/D97446
2021-02-26 10:42:07 -08:00
Yaxun (Sam) Liu 47acdec1dd [CUDA][HIP] Support accessing static device variable in host code for -fgpu-rdc
For -fgpu-rdc mode, static device vars in different TU's may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
a distinct name and external linkage. This can be done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.

Since the static device variables have different name across compilation units, now we let
them have external linkage so that they can be looked up by the runtime.

Reviewed by: Artem Belevich, and Jon Chesterfield

Differential Revision: https://reviews.llvm.org/D85223
2021-02-24 18:23:45 -05:00
Yaxun (Sam) Liu a3ce7f5cd2 [HIP] Fix managed variable linkage
Currently managed variables are emitted as undefined symbols, which
causes difficulty for diagnosing undefined symbols for non-managed
variables.

This patch transforms managed variables in device compilation so that
they can be emitted as normal variables.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D96195
2021-02-23 22:34:45 -05:00
Yaxun (Sam) Liu 053e61d54e Relands "[HIP] Change default --gpu-max-threads-per-block value to 1024"
This reverts commit e384e94fbe.
2021-02-12 10:53:59 -05:00
Yaxun (Sam) Liu b008ea304d [CUDA][HIP] Fix device variable linkage
For -fgpu-rdc, shadow variables should not be internalized, otherwise
they cannot be accessed by other TUs. This is necessary because
the shadow variable of external device variables are always
emitted as undefined symbols, which need to resolve to a global
symbols.

Managed variables need to be emitted as undefined symbols
in device compilations.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D95901
2021-02-05 15:11:12 -05:00
Michael Liao 01bf529db2 Recommit of a2fdf9d4d7.
- The failures are all cc1-based tests due to the missing `-aux-triple` options,
which is always prepared by the driver in CUDA/HIP compilation.
- Add extra check on the missing aux-targetinfo to prevent crashing.

[hip][cuda] Enable extended lambda support on Windows.

- On Windows, extended lambda has extra issues due to the numbering
schemes are different between the host compilation (Microsoft C++ ABI)
and the device compilation (Itanium C++ ABI. Additional device side
lambda number is required per lambda for the host compilation to
correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322

This reverts commit 4874ff0241.
2021-02-05 11:27:30 -05:00
Nico Weber 4874ff0241 Revert "[hip][cuda] Enable extended lambda support on Windows."
This reverts commit a2fdf9d4d7.
Slightly speculative, seeing several cuda tests fail on this
Windows bot: http://45.33.8.238/win/32620/step_7.txt
2021-02-04 07:10:46 -05:00
Michael Liao a2fdf9d4d7 [hip][cuda] Enable extended lambda support on Windows.
- On Windows, extended lambda has extra issues due to the numbering
  schemes are different between the host compilation (Microsoft C++ ABI)
  and the device compilation (Itanium C++ ABI. Additional device side
  lambda number is required per lambda for the host compilation to
  correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
  number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322
2021-02-04 01:38:29 -05:00
Yaxun (Sam) Liu 622eaa4a4c [HIP] Support __managed__ attribute
This patch implements codegen for __managed__ variable attribute for HIP.

Diagnostics will be added later.

Differential Revision: https://reviews.llvm.org/D94814
2021-01-22 11:43:58 -05:00
Artem Belevich 127091bfd5 [CUDA] Normalize handling of defauled dtor.
Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.

When Sema::IdentifyCUDATarget() got called on just-created dtor which didn't
have implicit __host__ __device__ attributes applied yet, it would treat it as a
host function.  That happened to (sometimes) hide the error when dtor referred
to a host-only functions.

Even when we had identified defaulted dtor as a HD function, we still treated it
inconsistently during selection of usual deallocators, where we did not allow
referring to wrong-side functions, while it is allowed for other HD functions.

This change brings handling of defaulted dtors in line with other HD functions.

Differential Revision: https://reviews.llvm.org/D94732
2021-01-21 10:48:07 -08:00
Jan Svoboda 7ab803095a [clang][cli] Remove -f[no-]trapping-math from -cc1 command line
This patch removes the -f[no-]trapping-math flags from the -cc1 command line. These flags are ignored in the command line parser and their semantics is fully handled by -ffp-exception-mode.

This patch does not remove -f[no-]trapping-math from the driver command line. The driver flags are being used and do affect compilation.

Reviewed By: dexonsmith, SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D93395
2021-01-12 10:00:23 +01:00
Fangrui Song 219d00e0d9 [test] Make ELF tests immune to dso_local/dso_preemptable/(none) differences
ELF -cc1 -mrelocation-model pic will default to no semantic interposition plus
setting dso_local on default visibility external linkage definitions, so that
COFF, Mach-O and ELF output will be similar.

This patch makes tests immune to the differences.
2020-12-31 13:59:44 -08:00
Fangrui Song fd739804e0 [test] Add {{.*}} to make ELF tests immune to dso_local/dso_preemptable/(none) differences
For a default visibility external linkage definition, dso_local is set for ELF
-fno-pic/-fpie and COFF and Mach-O. Since default clang -cc1 for ELF is similar
to -fpic ("PIC Level" is not set), this nuance causes unneeded binary format differences.

To make emitted IR similar, ELF -cc1 -fpic will default to -fno-semantic-interposition,
which sets dso_local for default visibility external linkage definitions.

To make this flip smooth and enable future (dso_local as definition default),
this patch replaces (function) `define ` with `define{{.*}} `,
(variable/constant/alias) `= ` with `={{.*}} `, or inserts appropriate `{{.*}} `.
2020-12-31 00:27:11 -08:00
Yaxun (Sam) Liu 3a781b912f Fix assertion in tryEmitAsConstant
due to cd95338ee3

Need to check if result is LValue before getLValueBase.
2020-12-02 19:10:01 -05:00
Yaxun (Sam) Liu 5c8911d0ba [CUDA][HIP] Diagnose reference of host variable
This patch diagnoses invalid references of global host variables in device,
global, or host device functions.

Differential Revision: https://reviews.llvm.org/D91281
2020-12-02 10:15:56 -05:00
Yaxun (Sam) Liu cd95338ee3 [CUDA][HIP] Fix capturing reference to host variable
In C++ when a reference variable is captured by copy, the lambda
is supposed to make a copy of the referenced variable in the captures
and refer to the copy in the lambda. Therefore, it is valid to capture
a reference to a host global variable in a device lambda since the
device lambda will refer to the copy of the host global variable instead
of access the host global variable directly.

However, clang tries to avoid capturing of reference to a host global variable
if it determines the use of the reference variable in the lambda function is
not odr-use. Clang also tries to emit load of the reference to a global variable
as load of the global variable if it determines that the reference variable is
a compile-time constant.

For a device lambda to capture a reference variable to host global variable
and use the captured value, clang needs to be taught that in such cases the use of the reference
variable is odr-use and the reference variable is not compile-time constant.

This patch fixes that.

Differential Revision: https://reviews.llvm.org/D91088
2020-12-02 10:14:46 -05:00