Commit Graph

97 Commits

Author SHA1 Message Date
Yaxun Liu aa24601f98 [CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes
There are HIP applications, e.g. Tensorflow 1.3, that use amdgpu kernel attributes; however,
currently these attributes are only allowed on OpenCL kernel functions.

This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__
functions.
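
As an illustrative sketch of what this permits (the attribute is a real clang
spelling, but the bounds here are made-up values):

    __global__ void
    __attribute__((amdgpu_flat_work_group_size(1, 256)))  // previously OpenCL-only
    vector_add(const float *a, const float *b, float *c) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      c[i] = a[i] + b[i];
    }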

Differential Revision: https://reviews.llvm.org/D47958

llvm-svn: 334561
2018-06-12 23:58:59 +00:00
Yaxun Liu 6c10a66ec7 [CUDA][HIP] Set kernel calling convention before arrange function
Currently clang sets the kernel calling convention for CUDA/HIP after
arranging the function, which results in an incorrect kernel function type
since the type depends on the calling convention.

This patch moves setting the kernel calling convention to before arranging
the function.

Differential Revision: https://reviews.llvm.org/D47733

llvm-svn: 334457
2018-06-12 00:16:33 +00:00
Jonas Hahnfeld 3b9cbba9a8 [CUDA] Fix emission of constant strings in sections
CGM.GetAddrOfConstantCString() marks the address of the created GlobalValue
as unnamed. When emitting the object file, LLVM will mark the surrounding
section as SHF_MERGE iff the string is nul-terminated and contains no
other nuls (see IsNullTerminatedString). This results in problems when
saving temporaries because LLVM doesn't set an EntrySize, so reading in
the serialized assembly file fails.
This never happened for the GPU binaries because they usually contain
a nul-character somewhere. Instead this only affected the module ID
when compiling relocatable device code.

However, this points to a potentially larger problem: If we put a
constant string into a named section, we really want the data to end
up in that section in the object file. To avoid LLVM merging sections
this patch unmarks the GlobalVariable's address as unnamed which also
fixes the problem of invalid serialized assembly files when saving
temporaries.

Differential Revision: https://reviews.llvm.org/D47902

llvm-svn: 334281
2018-06-08 11:17:08 +00:00
Yaxun Liu 6328f9a988 [CUDA][HIP] Do not emit type info when compiling for device
CUDA/HIP does not support RTTI on the device side, therefore there
is no point in emitting type info when compiling for the device.

Emitting type info for the device not only clutters the IR with useless
global variables, but also causes undefined symbols at link time,
since the vtable for __cxxabiv1::class_type_info has external linkage.
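
For example, a sketch of device code that previously dragged RTTI globals into
the device IR (the types are invented):

    struct Base {
      __device__ virtual float eval(float x) const { return x; }
    };
    struct Square : Base {
      // Emitting this hierarchy for the device used to pull in type info
      // whose vtable (__cxxabiv1::class_type_info) has no device-side
      // definition, leaving an undefined symbol at link time.
      __device__ float eval(float x) const override { return x * x; }
    };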

Differential Revision: https://reviews.llvm.org/D47694

llvm-svn: 334021
2018-06-05 15:11:02 +00:00
Yaxun Liu 29155b01c1 [HIP] Support offloading by linker script
To support linking device code in different source files, it is necessary to
embed the fat binary at the host linking stage.

This patch emits an external symbol for the fat binary in host codegen, then
embeds the fat binary via lld through a linker script.
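
A sketch of the host-side shape of this, assuming the symbol is spelled
__hip_fatbin (the actual name is an implementation detail):

    // Host codegen references the device image through an undefined symbol;
    // the linker script handed to lld embeds the fat binary and defines the
    // symbol, so the registration code can point the runtime at it.
    extern "C" const char __hip_fatbin[];  // assumed symbol name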

Differential Revision: https://reviews.llvm.org/D46472

llvm-svn: 332724
2018-05-18 15:07:56 +00:00
Yaxun Liu 48390a992f Fix failure in lit test kernel-call.cu due to name mangling
llvm-svn: 330821
2018-04-25 13:07:58 +00:00
Yaxun Liu 997e64f8a6 Fix lit test kernel-call.cu failure on ps4 due to dso_local
llvm-svn: 330795
2018-04-25 03:16:07 +00:00
Yaxun Liu e21278d938 Fix failure in lit test kernel-call.cu
There is a signext attribute on ppc64. Just remove the check for the function argument.

llvm-svn: 330793
2018-04-25 02:34:04 +00:00
Yaxun Liu 887c569bcb [HIP] Add hip input kind and codegen for kernel launching
HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md ).
The language syntax is very similar, which allows a HIP program to be compiled as a CUDA program by Clang. The main difference
is the host API. HIP has a set of vendor-neutral host APIs which can be implemented on different platforms. Currently there is an
open-source implementation of the HIP runtime for the amdgpu target (https://github.com/ROCm-Developer-Tools/HIP).

This patch adds support for the HIP input kind and language standard.

When a HIP file is compiled, both LangOpts.CUDA and LangOpts.HIP are turned on. This allows a HIP program to be compiled as CUDA
in most cases; only where special handling of a HIP program is needed is LangOpts.HIP checked.

This patch also adds support for launching kernels of a HIP program using the HIP host API.

When -x hip is not specified, there is no behaviour change for CUDA.
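
A small sketch of a program that now compiles with -x hip; the launch below is
lowered to HIP host API calls rather than to the CUDA runtime:

    #include <hip/hip_runtime.h>

    __global__ void axpy(float a, const float *x, float *y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      y[i] = a * x[i] + y[i];
    }

    void launch_axpy(float a, const float *x, float *y, int n) {
      // With -x hip, this triple-chevron launch is codegen'ed against the
      // HIP host API instead of cudaLaunch and friends.
      axpy<<<n / 256, 256>>>(a, x, y);
    }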

Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Differential Revision: https://reviews.llvm.org/D44984

llvm-svn: 330790
2018-04-25 01:10:37 +00:00
Yaxun Liu 4306f2086f [CUDA] Set LLVM calling convention for CUDA kernel
Some targets need a special LLVM calling convention for CUDA kernels.
This patch does that through a TargetCodeGenInfo hook.

It only affects the amdgcn target.

Patch by Greg Rodgers.
Revised and lit tests added by Yaxun Liu.

Differential Revision: https://reviews.llvm.org/D45223

llvm-svn: 330447
2018-04-20 17:01:03 +00:00
Jonas Hahnfeld f5527c2381 [CUDA] Register relocatable GPU binaries
nvcc generates a unique registration function for each object file
that contains relocatable device code. Unique names are achieved
with a module id that is also reflected in the function's name.

Differential Revision: https://reviews.llvm.org/D42922

llvm-svn: 330425
2018-04-20 13:04:45 +00:00
Eli Friedman 01d349bab1 Remove -cc1 option "-backend-option".
It means the same thing as -mllvm; there isn't any reason to have two
options which do the same thing.

Differential Revision: https://reviews.llvm.org/D45109

llvm-svn: 329965
2018-04-12 22:21:36 +00:00
Alexander Kornienko 2a8c18d991 Fix typos in clang
Found via codespell -q 3 -I ../clang-whitelist.txt
where the whitelist consists of:

  archtype
  cas
  classs
  checkk
  compres
  definit
  frome
  iff
  inteval
  ith
  lod
  methode
  nd
  optin
  ot
  pres
  statics
  te
  thru

Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few
files that have dubious fixes reverted.)

Differential revision: https://reviews.llvm.org/D44188

llvm-svn: 329399
2018-04-06 15:14:32 +00:00
Artem Belevich 55ebd6cc26 Revert "Set calling convention for CUDA kernel"
This reverts r328795 which introduced an issue with referencing __global__
function templates. More details in the original review D44747.

llvm-svn: 329099
2018-04-03 18:29:31 +00:00
Yaxun Liu a64a491e7b [CUDA] Let device-side shared variables be initialized with undef
CUDA shared variables should be initialized with undef.
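
For example (a sketch), the __shared__ array below now lowers to an LLVM
global initialized with undef rather than zeroinitializer:

    __global__ void block_sum(const float *in, float *out) {
      __shared__ float tile[256];            // no initializer; IR gets undef
      tile[threadIdx.x] = in[threadIdx.x];
      __syncthreads();
      if (threadIdx.x == 0) {
        float s = 0;
        for (int i = 0; i < 256; ++i) s += tile[i];
        *out = s;
      }
    }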

Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Differential Revision: https://reviews.llvm.org/D44985

llvm-svn: 328994
2018-04-02 17:38:24 +00:00
Yaxun Liu b2f2bb26e4 Set calling convention for CUDA kernel
This patch sets target specific calling convention for CUDA kernels in IR.

Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Differential Revision: https://reviews.llvm.org/D44747

llvm-svn: 328795
2018-03-29 15:02:08 +00:00
Yaxun Liu b0eee29c74 Disable emitting static extern C aliases for amdgcn target for CUDA
Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Differential Revision: https://reviews.llvm.org/D44987

llvm-svn: 328793
2018-03-29 14:50:00 +00:00
Rafael Espindola 2a639a4c11 Really fix test on windows.
Sorry for the noise.

llvm-svn: 325943
2018-02-23 19:38:41 +00:00
Rafael Espindola f43c2ff84b Fix one last test on a windows host.
llvm-svn: 325942
2018-02-23 19:36:20 +00:00
Artem Belevich 5ecdb94487 [CUDA] CUDA has no device-side library builtins.
We should (almost) never consider a device-side declaration to match a
library builtin function.  Otherwise clang may ignore the implementation
provided by the CUDA headers and emit clang's idea of the builtin.

Differential Revision: https://reviews.llvm.org/D42319

llvm-svn: 323239
2018-01-23 19:08:18 +00:00
Matthias Braun a451953224 CodeGenModule: Always output wchar_size, check LLVM assumptions.
Re-commit r303463 now that LLVM is fixed and adjust some lit tests.

llvm::TargetLibraryInfo needs to know the size of wchar_t to work on
functions like `wcslen`. This patch changes clang to always emit the
wchar_size module flag (it would only do so for ARM previously).
This also adds an `assert()` to ensure the LLVM defaults based on the
target triple are in sync with clang.

Differential Revision: https://reviews.llvm.org/D32982

llvm-svn: 303478
2017-05-20 01:29:55 +00:00
Adam Nemet 049a31d53d Use FPContractModeKind universally
FPContractModeKind is the codegen option flag which is already ternary (off,
on, fast).  This makes it universally the type for the contractable info
across the front-end:

* In FPOptions (i.e. in the Sema + in the expression nodes).
* In LangOpts::DefaultFPContractMode which is the option that initializes
FPOptions in the Sema.

Another way to look at this change is that before, fp-contractable on/off were
the only states handled by the front-end:
 * For "on", FMA folding was performed by the front-end
 * For "fast", we simply forwarded the flag to TargetOptions to handle it in
 LLVM

Now off/on/fast are all exposed because for fast we will generate
fast-math-flags during CodeGen.

This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.
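
A sketch of what the three states mean for a single expression:

    __device__ float muladd(float a, float b, float c) {
      // -ffp-contract=off:  separate fmul and fadd, never fused.
      // -ffp-contract=on:   the front-end may fold this into fma(a, b, c).
      // -ffp-contract=fast: fast-math flags are emitted so the backend can
      //                     contract, including across statements.
      return a * b + c;
    }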

---
This is a recommit of r299027 with an adjustment to the test
CodeGenCUDA/fp-contract.cu.  The test assumed that even
though -ffp-contract=on is passed FE-based folding of FMA won't happen.

This is obviously wrong since the user is asking for this explicitly with the
option.  CUDA is different in that -ffp-contract=fast is on by default.

The test used to "work" because contract=fast and contract=on were maintained
separately and we didn't fold in the FE because contract=fast was on due to
the target-default.  This patch consolidates the contract=on/fast/off state
into a ternary state hence the change in behavior.
---

Differential Revision: https://reviews.llvm.org/D31167

llvm-svn: 299033
2017-03-29 21:54:24 +00:00
Justin Lebar b080b630b1 [CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules.
Summary:
Now when you ask clang to link in a bitcode module, you can tell it to
set attributes on that module's functions to match what we would have
set if we'd emitted those functions ourselves.

This is particularly important for fast-math attributes in CUDA
compilations.

Each CUDA compilation links in libdevice, a bitcode library provided by
nvidia as part of the CUDA distribution.  Without this patch, if we have
a user-function F that is compiled with -ffast-math that calls a
function G from libdevice, F will have the unsafe-fp-math=true (etc.)
attributes, but G will have no attributes.

Since F calls G, the inliner will merge G's attributes into F's.  It
considers the lack of an unsafe-fp-math=true attribute on G to be
tantamount to unsafe-fp-math=false, so it "merges" these by setting
unsafe-fp-math=false on F.

This then continues up the call graph, until every function that
(transitively) calls something in libdevice gets unsafe-fp-math=false
set, thus disabling fastmath in almost all CUDA code.
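
A hedged sketch of the failure mode, using the real libdevice function
__nv_sinf as the example:

    // __nv_sinf comes from libdevice bitcode and carries no attributes.
    extern "C" __device__ float __nv_sinf(float);

    // Built with -ffast-math, so this function gets unsafe-fp-math=true...
    __device__ float fast_sin_sq(float x) {
      // ...but after the attribute-less __nv_sinf is inlined here, the
      // inliner used to "merge" the result down to unsafe-fp-math=false.
      float s = __nv_sinf(x);
      return s * s;
    }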

Reviewers: echristo

Subscribers: hfinkel, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D28538

llvm-svn: 293097
2017-01-25 21:29:48 +00:00
Artem Belevich 13e9b4d768 [CUDA] Improve target attribute checking for function templates.
* __host__ __device__ functions are no longer considered to be
  redeclarations of __host__ or __device__ functions. This prevents
  unintentional merging of target attributes across them.
* Function target attributes are not considered (and must match) during
  explicit instantiation and specialization of function templates.

Differential Revision: https://reviews.llvm.org/D25809

llvm-svn: 288962
2016-12-07 19:27:16 +00:00
Justin Lebar 2dfbe9a3b4 [CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h.
Summary: This matches the idiom we use for our other CUDA wrapper headers.

Reviewers: tra

Subscribers: beanz, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D24978

llvm-svn: 283679
2016-10-08 22:16:08 +00:00
Justin Lebar 4a759ff44c [CUDA] Add missing ':' to noexcept.cu test.
llvm-svn: 283280
2016-10-05 00:27:38 +00:00
Justin Lebar 3e6449b4f4 [CUDA] Mark device functions as nounwind.
Summary:
This prevents clang from emitting 'invoke's and catch statements.

Things previously mostly worked thanks to TryToMarkNoThrow() in
CodeGenFunction.  But this is not a proper IPO, and it doesn't properly
handle cases like mutual recursion.

Fixes bug 30593.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D25166

llvm-svn: 283272
2016-10-04 23:41:49 +00:00
Justin Lebar e060feb7b1 [CUDA] Disallow overloading destructors.
Summary:
We'd attempted to allow this, but it turns out we were doing a very bad
job.  :)

Making this work properly would be a giant change in clang.  For
example, we'd need to make CXXRecordDecl::getDestructor()
context-sensitive, because the destructor you end up with depends on
where you're calling it from.

For now (and hopefully for ever), just disallow overloading of
destructors in CUDA.
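
For example, a sketch of code that is now rejected:

    struct S {
      __host__ ~S();
      __device__ ~S();  // error: a destructor can no longer be overloaded
                        // on __host__/__device__ attributes in CUDA
    };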

Reviewers: rsmith

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D24571

llvm-svn: 283120
2016-10-03 16:48:23 +00:00
Justin Lebar 18e2d82297 [CUDA] Raise an error if a wrong-side call is codegen'ed.
Summary:
Some function calls in CUDA are allowed to appear in
semantically-correct programs but are an error if they're ever
codegen'ed.  Specifically, a host+device function may call a host
function, but it's an error if such a function is ever codegen'ed in
device mode (and vice versa).

Previously, clang made no attempt to catch these errors.  For the most
part, they would be caught by ptxas, and reported as "call to unknown
function 'foo'".

Now we catch these errors and report them the same as we report other
illegal calls (e.g. a call from a host function to a device function).

This has a small change in error-message behavior for calls that were
previously disallowed (e.g. calls from a host to a device function).
Previously, we'd catch disallowed calls fairly early, before doing
additional semantic checking e.g. of the call's arguments.  Now we catch
these illegal calls at the very end of our semantic checks, so we'll
only emit a "illegal CUDA call" error if the call is otherwise
well-formed.
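
A sketch of the pattern this diagnoses:

    void host_only();  // host function with no device counterpart

    __host__ __device__ void helper() {
      host_only();  // fine to write: helper may only ever run on the host
    }

    __global__ void kernel() {
      helper();  // codegen'ing helper in device mode now reports an illegal
                 // CUDA call instead of a late ptxas failure
    }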

Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D23242

llvm-svn: 278759
2016-08-15 23:00:49 +00:00
Artem Belevich 4c09318be2 [CUDA] Place GPU binary into .nv_fatbin section and align it by 8.
This matches the way nvcc encapsulates GPU binaries into the host object file.
Now cuobjdump can deal with clang-compiled object files.

Differential Revision: https://reviews.llvm.org/D23429

llvm-svn: 278549
2016-08-12 18:44:01 +00:00
Justin Lebar e56360a2cd [CUDA] Align kernel launch args correctly when the LLVM type's alignment is different from the clang type's alignment.
Summary:
Before this patch, we computed the offsets in memory of args passed to
GPU kernel functions by throwing all of the args into an LLVM struct.

clang emits packed llvm structs basically whenever it feels like it, and
packed structs have alignment 1.  So we cannot rely on the llvm type's
alignment matching the C++ type's alignment.

This patch fixes our codegen so we always respect the clang types'
alignments.
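
A sketch of an argument list where the two alignments used to diverge:

    struct alignas(16) Float4 { float x, y, z, w; };

    // Thrown together into one LLVM struct, these args could end up in a
    // packed struct with alignment 1, placing 'v' at offset 1; the C++ type
    // requires 'v' at a 16-byte-aligned offset in the argument buffer.
    __global__ void kernel(char tag, Float4 v);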

Reviewers: rnk

Subscribers: cfe-commits, tra

Differential Revision: https://reviews.llvm.org/D22879

llvm-svn: 276927
2016-07-27 22:36:21 +00:00
Justin Bogner 2d5de7e568 NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones
The ptx spellings were removed from LLVM in r274769.

llvm-svn: 274770
2016-07-07 16:41:08 +00:00
Justin Lebar 27ee130e38 [CUDA] Give templated device functions internal linkage, templated kernels external linkage.
Summary:
This lets LLVM perform IPO over these functions.  In particular, it
allows LLVM to emit ld.global.nc for loads through __restrict pointers in
kernels, when the pointed-to memory is never written.
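
A sketch of a kernel that benefits:

    template <typename T>
    __global__ void grid_copy(const T *__restrict__ in, T *__restrict__ out,
                              int n) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      // With device helpers given internal linkage, LLVM can prove 'in' is
      // never written through and emit ld.global.nc for this load.
      if (i < n) out[i] = in[i];
    }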

Reviewers: rsmith

Subscribers: cfe-commits, tra

Differential Revision: http://reviews.llvm.org/D21337

llvm-svn: 274261
2016-06-30 18:41:33 +00:00
Artem Belevich bcec9dac14 [CUDA] Add implicit conversion of __launch_bounds__ arguments to rvalue.
Fixes clang crash reported in PR27778.

Differential Revision: http://reviews.llvm.org/D20985

llvm-svn: 271951
2016-06-06 22:54:57 +00:00
Justin Lebar f179364341 [CUDA] Conservatively mark inline asm as convergent.
Summary:
This is particularly important because some convergent CUDA intrinsics
(e.g.  __shfl_down) are implemented in terms of inline asm.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D20836

llvm-svn: 271336
2016-05-31 21:27:13 +00:00
Reid Kleckner a769fd50ba Avoid depending on test inputs that aren't in Inputs
Some people have weird CI systems that run each test subdirectory
independently without access to other parallel trees.

Unfortunately, this means we have to suffer some duplication until Art
can sort out how to share these types.

llvm-svn: 270164
2016-05-20 00:38:25 +00:00
Artem Belevich 3650bbeebc [CUDA] Do not allow non-empty destructors for global device-side variables.
According to the CUDA Programming Guide (v7.5, E2.3.1):
> __device__, __constant__ and __shared__ variables defined in namespace
> scope, that are of class type, cannot have a non-empty constructor or a
> non-empty destructor.

Clang already deals with device-side constructors (see D15305).
This patch enforces similar rules for destructors.
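
For example (a sketch):

    struct Empty {
      __device__ ~Empty() {}                 // empty body: OK
    };
    struct NonEmpty {
      __device__ ~NonEmpty() { count = 0; }  // non-empty body
      int count;
    };

    __device__ Empty    e;  // still accepted
    __device__ NonEmpty n;  // now rejected, mirroring the constructor rules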

Differential Revision: http://reviews.llvm.org/D20140

llvm-svn: 270108
2016-05-19 20:13:53 +00:00
Artem Belevich 85b6f63f42 [CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts.
Codegen tests for device-side variable initialization are a subset of the test
cases used to verify Sema's part of the job.
Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to
keep both sides in sync.

Differential Revision: http://reviews.llvm.org/D20139

llvm-svn: 270107
2016-05-19 20:13:39 +00:00
Artem Belevich 31c3bad499 [CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default.
This matches the default nvcc behavior and gives a substantial
performance boost on GPUs, where fmad is much cheaper than separate add and mul.

Differential Revision: http://reviews.llvm.org/D20341

llvm-svn: 270094
2016-05-19 18:44:45 +00:00
Justin Lebar 3b30b7eef6 [CUDA] Fix flush-denormals.cu test so that it checks what it intends to CHECK.
FileCheck does not evaluate plain CHECKs if you pass -check-prefix; you
have to ask for it explicitly.

llvm-svn: 269000
2016-05-10 00:34:50 +00:00
Artem Belevich 4d430badeb [CUDA] Restrict init of local __shared__ variables to empty constructors only.
Allow only empty constructors for local __shared__ variables, in a way
identical to the restrictions imposed on dynamic initializers for global
variables on the device.

Differential Revision: http://reviews.llvm.org/D20039

llvm-svn: 268982
2016-05-09 22:09:56 +00:00
Artem Belevich 0c0ada01b6 [CUDA] Only __shared__ variables can be static local on device side.
According to the CUDA Programming Guide (v7.5):
> E.2.9.4: Within the body of a __device__ or __global__ function, only
> __shared__ variables may be declared with static storage class.
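
For example (a sketch):

    __device__ void f() {
      static __shared__ int cache;  // OK: __shared__ may be static
      static int counter;           // error: only __shared__ variables may
                                    // have static storage class here
    }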

Differential Revision: http://reviews.llvm.org/D20034

llvm-svn: 268962
2016-05-09 19:36:08 +00:00
Artem Belevich ca2b951cbc [CUDA] Make sure device-side __global__ functions are always visible.
__global__ functions are a special case in CUDA.

Even when such a symbol would normally not be externally
visible according to C++ rules, it must still be visible
in the CUDA GPU object so the host-side stub can launch it.

Differential Revision: http://reviews.llvm.org/D19748

llvm-svn: 268299
2016-05-02 20:30:03 +00:00
Justin Lebar d3a44f6885 [CUDA] Add -fcuda-flush-denormals-to-zero.
Summary:
Setting this flag causes all functions to be annotated with the
"nvvm-f32ftz" = "true" attribute.

In addition, we annotate the module with "nvvm-reflect-ftz" set
to 0 or 1, depending on whether -fcuda-flush-denormals-to-zero is set.
This is read by the NVVMReflect pass.

Reviewers: tra, rnk

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18671

llvm-svn: 265435
2016-04-05 18:26:20 +00:00
Justin Lebar 19b648eae3 [CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFC
We already have this flag in most of the file, but we need it everywhere
else, to disable the NVVMReflect pass, which we're explicitly checking
doesn't run here.  (Upcoming changes to llvm will cause it to be run.)

llvm-svn: 264969
2016-03-30 23:45:38 +00:00
Justin Lebar 25c4a81e79 [CUDA] Remove three obsolete CUDA cc1 flags.
Summary:
* -fcuda-target-overloads

  Previously unconditionally set to true by the driver.  Necessary for
  correct functioning of the compiler -- our CUDA headers wrapper won't
  compile without this.

* -fcuda-disable-target-call-checks

  Previously unconditionally set to true by the driver.  Necessary to
  compile almost any external CUDA code -- almost all libraries assume
  that host+device code can call host or device functions.

* -fcuda-allow-host-calls-from-host-device

  No effect when target overloading is enabled.

Reviewers: tra

Subscribers: rsmith, cfe-commits

Differential Revision: http://reviews.llvm.org/D18416

llvm-svn: 264739
2016-03-29 16:24:16 +00:00
Justin Lebar e5eed04d52 [CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu.
Summary:
Previously we were using the codegen test to ensure that we choose the
right overload.  But we can do this within sema, with a bit of
cleverness.

I left the constructor/destructor checks in CodeGen, because these
overloads (particularly on the destructors) are hard to check in Sema.

Reviewers: tra

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18386

llvm-svn: 264207
2016-03-23 22:42:30 +00:00
Artem Belevich 3609085dc4 Fixed test failure on platforms with name mangling different from Linux.
* Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable.
* Use a temporary file as a fake GPU input so its content
  does not interfere with pattern matching.

llvm-svn: 262516
2016-03-02 21:03:20 +00:00
Artem Belevich 8c1ec1ef38 [CUDA] Do not generate unnecessary runtime init code.
Differential Revision: http://reviews.llvm.org/D17780

llvm-svn: 262499
2016-03-02 18:28:53 +00:00
Artem Belevich 42e1949b46 [CUDA] Emit host-side 'shadows' for device-side global variables
... and register them with CUDA runtime.

This is needed for commonly used cudaMemcpy*() APIs that use address of
host-side shadow to access their counterparts on device side.

Fixes PR26340
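
For example, a sketch of the usage pattern this fixes:

    #include <cuda_runtime.h>

    __device__ int threshold;  // device-side global

    void set_threshold(int v) {
      // This takes the address of the host-side shadow of 'threshold'; clang
      // now emits and registers that shadow so the runtime can map it to the
      // device-side variable.
      cudaMemcpyToSymbol(threshold, &v, sizeof(v));
    }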

Differential Revision: http://reviews.llvm.org/D17779

llvm-svn: 262498
2016-03-02 18:28:50 +00:00