llvm-project

Commit Graph

Author	SHA1	Message	Date
Yaxun Liu	c3dfe9082b	[HIP] Support attribute hip_pinned_shadow This patch introduces support of hip_pinned_shadow variable for HIP. A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow. It has external linkage on device side and has no initializer. It has internal linkage on host side and has initializer or static constructor. It can be accessed in both device code and host code. This allows HIP runtime to implement support of HIP texture reference. Differential Revision: https://reviews.llvm.org/D62738 llvm-svn: 364381	2019-06-26 03:47:37 +00:00
Yaxun Liu	cabce71845	[AMDGPU] Enable the implicit arguments for HIP (CLANG) Enable 48-bytes of implicit arguments for HIP as well. Earlier it was enabled for OpenCL. This code is specific to AMDGPU target. Differential Revision: https://reviews.llvm.org/D62244 llvm-svn: 363414	2019-06-14 15:54:47 +00:00
Tim Northover	c46827c7ed	LLVM IR: Generate new-style byval-with-Type from Clang LLVM IR recently added a Type parameter to the byval Attribute, so that when pointers become opaque and no longer have an element type the information will still be present in IR. For now the Type parameter is optional (which is why Clang didn't need this change at the time), but it will become mandatory soon. llvm-svn: 362652	2019-06-05 21:12:14 +00:00
Michael Liao	4b7a713acc	[CUDA][HIP] Skip setting `externally_initialized` for static device variables. Summary: - By declaring device variables as `static`, we assume they won't be addressable from the host side. Thus, no `externally_initialized` is required. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D62603 llvm-svn: 361994	2019-05-29 17:23:27 +00:00
Yaxun Liu	dc805a4906	Fix failure of lit test dependent-libs.cu llvm-svn: 361905	2019-05-29 01:34:44 +00:00
Yaxun Liu	02afe4e077	[CUDA][HIP] Emit dependent libs for host only Recently D60274 was introduced to allow lld to handle dependent libs. However current usage of dependent libs (e.g. pragma comment(lib, *) in windows header files) are intended for host only. Emitting the metadata in device IR causes link error in device path. Until there is a way to different it dependent libs for device or host, metadata for dependent libs should be emitted for host only. This patch enforces that. Differential Revision: https://reviews.llvm.org/D62483 llvm-svn: 361880	2019-05-28 21:18:59 +00:00
Michael Liao	3820506960	[HIP] Fix visibility of `__constant__` variables. Summary: - `__constant__` variables should not be `hidden` as the linker may turn them into `LOCAL` symbols. Reviewers: yaxunl Subscribers: jvesely, nhaehnle, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D61194 llvm-svn: 359344	2019-04-26 19:31:48 +00:00
Aaron Enye Shi	8129521318	[HIP-Clang] Fat binary should not be produced for non GPU code 2 Also for CUDA, we need to disable producing these fat binary functions when there is no GPU code. Reviewers: yaxunl, tra Differential Revision: https://reviews.llvm.org/D60141 llvm-svn: 357526	2019-04-02 20:49:41 +00:00
Aaron Enye Shi	13d8e92940	[HIP-Clang] Fat binary should not be produced for non GPU code Skip producing the fat binary functions for HIP when no device code is present. Reviewers: yaxunl Differential Review: https://reviews.llvm.org/D60141 llvm-svn: 357520	2019-04-02 20:10:18 +00:00
Michael Liao	982cbb6232	[CUDA][HIP][DebugInfo] Skip reference device function Summary: - A device functions could be used as a non-type template parameter in a global/host function template. However, we should not try to retrieve that device function and reference it in the host-side debug info as it's only valid at device side. Subscribers: aprantl, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58992 llvm-svn: 355551	2019-03-06 21:16:27 +00:00
Yaxun Liu	e739ac0e25	[HIP] change kernel stub name Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354948	2019-02-27 02:02:52 +00:00
Yaxun Liu	00ebc0cb92	revert r354615: [HIP] change kernel stub name It caused regressions. Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354651	2019-02-22 04:20:12 +00:00
Yaxun Liu	8d7cf0e2d4	[HIP] change kernel stub name Add .stub to kernel stub function name so that it is different from kernel name in device code. This is necessary to let debugger find correct symbol for kernel Differential Revision: https://reviews.llvm.org/D58518 llvm-svn: 354615	2019-02-21 20:12:16 +00:00
Yaxun Liu	c18e9ecd4f	[CUDA][HIP] Use device side kernel and variable names when registering them __hipRegisterFunction and __hipRegisterVar need to accept device side kernel and variable names so that HIP runtime can associate kernel stub functions in host code with kernel symbols in fat binaries, and associate shadow variables in host code with device variables in fat binaries. Currently, clang assumes kernel functions and device variables have the same name as the kernel stub functions and shadow variables. However, when host is compiled in windows with MSVC C++ ABI and device is compiled with Itanium C++ ABI (e.g. AMDGPU), kernels and device symbols in fat binary are mangled differently than host. This patch gets the device side kernel and variable name by mangling them in the mangle context of aux target. Differential Revision: https://reviews.llvm.org/D58163 llvm-svn: 354004	2019-02-14 02:00:09 +00:00
Alexey Bataev	1a9e05d7da	[DEBUG_INFO][NVPTX] Generate correct data about variable address class. Summary: Added ability to generate correct debug info data about the variable address class. Currently, for all the locals and globals the default values are used, ADDR_local_space(6) for locals and ADDR_global_space(5) for globals. The values are taken from the table in https://docs.nvidia.com/cuda/archive/10.0/ptx-writers-guide-to-interoperability/index.html#cuda-specific-dwarf. We need to emit correct data for address classes of, at least, shared and constant globals. Currently, all these variables are treated by the cuda-gdb debugger as the variables in the global address space and, thus, it require manual data type casting. Reviewers: echristo, probinson Subscribers: jholewinski, aprantl, cfe-commits Differential Revision: https://reviews.llvm.org/D57162 llvm-svn: 353204	2019-02-05 19:45:57 +00:00
Yaxun Liu	277e064bf5	Do not copy long double and 128-bit fp format from aux target for AMDGPU rC352620 caused regressions because it copied floating point format from aux target. floating point format decides whether extended long double is supported. It is x86_fp80 on x86 but IEEE double on amdgcn. Document usage of long doubel type in HIP programming guide https://github.com/ROCm-Developer-Tools/HIP/pull/890 Differential Revision: https://reviews.llvm.org/D57527 llvm-svn: 352801	2019-01-31 21:57:51 +00:00
Artem Belevich	c62214da3d	[CUDA] add support for the new kernel launch API in CUDA-9.2+. Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799	2019-01-31 21:34:03 +00:00
Artem Belevich	9953577cb2	[CUDA] Treat extern global variable shadows same as regular extern vars. This fixes compiler crash when we attempted to compile this code: extern __device__ int data; __device__ int data = 1; Differential Revision: https://reviews.llvm.org/D56033 llvm-svn: 349981	2018-12-22 01:11:09 +00:00
Artem Belevich	7b05666a19	[CUDA] Make all host-side shadows of device-side variables undef. The host-side code can't (and should not) access the values that may only exist on the device side. E.g. address of a __device__ function does not exist on the host side as we don't generate the code for it there. Differential Revision: https://reviews.llvm.org/D55663 llvm-svn: 349087	2018-12-13 21:43:04 +00:00
Sean Fertile	d900dd0c23	Revert "[CodeGenCXX] Treat 'this' as noalias in constructors" This reverts commit https://reviews.llvm.org/rL344150 which causes MachineOutliner related failures on the ppc64le multistage buildbot. llvm-svn: 344526	2018-10-15 15:43:00 +00:00
Anton Bikineev	cc7e74753a	[CodeGenCXX] Treat 'this' as noalias in constructors This is currently a clang extension and a resolution of the defect report in the C++ Standard. Differential Revision: https://reviews.llvm.org/D46441 llvm-svn: 344150	2018-10-10 16:14:51 +00:00
Yaxun Liu	9767089d00	[HIP] Support early finalization of device code for -fno-gpu-rdc This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature is: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611	2018-10-02 17:48:54 +00:00
Artem Belevich	78929efb4d	[CUDA] Ignore uncallable functions when we check for usual deallocators. Previously clang considered function variants from both sides of compilation and that resulted in picking up wrong deallocation function. Differential Revision: https://reviews.llvm.org/D51808 llvm-svn: 342749	2018-09-21 17:29:33 +00:00
Matt Arsenault	a13746b7eb	Rename -mlink-cuda-bitcode to -mlink-builtin-bitcode The same semantics work for OpenCL, and probably any offload language. Keep the old name around as an alias. llvm-svn: 340193	2018-08-20 18:16:48 +00:00
Yaxun Liu	94ff57f5b1	[HIP] Make __hip_gpubin_handle hidden to avoid being merged across different shared libraries Different shared libraries contain different fat binary, which is stored in a global variable __hip_gpubin_handle. Since different compilation units share the same fat binary, this variable has linkonce linkage. However, it should not be merged across different shared libraries. This patch set the visibility of the global variable to be hidden, which will make it invisible in the shared library, therefore preventing it from being merged. Differential Revision: https://reviews.llvm.org/D50596 llvm-svn: 340056	2018-08-17 17:47:31 +00:00
Matt Arsenault	c65f966d76	Try to make builtin address space declarations not useless The way address space declarations for builtins currently work is nearly useless. The code assumes the address spaces used for builtins is a confusingly named "target address space" from user code using __attribute__((address_space(N))) that matches the builtin declaration. There's no way to use this to declare a builtin that returns a language specific address space. The terminology used is highly cofusing since it has nothing to do with the the address space selected by the target to use for a language address space. This feature is essentially unused as-is. AMDGPU and NVPTX are the only in-tree targets attempting to use this. The AMDGPU builtins certainly do not behave as intended (i.e. all of the builtins returning pointers can never compile because the numbered address space never matches the expected named address space). The NVPTX builtins are missing tests for some, and the others seem to rely on an implicit addrspacecast. Change the used address space for builtins based on a target hook to allow using a language address space for a builtin. This allows the same builtin declaration to be used for multiple languages with similarly purposed address spaces (e.g. the same AMDGPU builtin can be used in OpenCL and CUDA even though the constant address spaces are arbitarily different). This breaks the possibility of using arbitrary numbered address spaces alongside the named address spaces for builtins. If this is an issue we probably need to introduce another builtin declaration character to distinguish language address spaces from so-called "target address spaces". llvm-svn: 338707	2018-08-02 12:14:28 +00:00
Yaxun Liu	a4005e13f7	[CUDA][HIP] Allow function-scope static const variable CUDA 8.0 E.3.9.4 says: Within the body of a __device__ or __global__ function, only __shared__ variables or variables without any device memory qualifiers may be declared with static storage class. It is unclear how a function-scope non-const static variable without device memory qualifier is implemented, therefore only static const variable without device memory qualifier is allowed, which can be emitted as a global variable in constant address space. Currently clang only allows function-scope static variable with __shared__ qualifier. This patch also allows function-scope static const variable without device memory qualifier and emits it as a global variable in constant address space. Differential Revision: https://reviews.llvm.org/D49931 llvm-svn: 338188	2018-07-28 03:05:25 +00:00
Yaxun Liu	e1bfbc589f	[HIP] Support -fcuda-flush-denormals-to-zero for amdgcn Differential Revision: https://reviews.llvm.org/D48287 llvm-svn: 337639	2018-07-21 02:02:22 +00:00
Yaxun Liu	f99752b66b	[HIP] Register/unregister device fat binary only once HIP generates one fat binary for all devices after linking. However, for each compilation unit a ctor function is emitted which register the same fat binary. Measures need to be taken to make sure the fat binary is only registered once. Currently each ctor function calls __hipRegisterFatBinary and stores the returned value to __hip_gpubin_handle. This patch changes the linkage of __hip_gpubin_handle to be linkonce so that they are shared between LLVM modules. Then this patch adds check of value of __hip_gpubin_handle to make sure __hipRegisterFatBinary is only called once. The code is equivalent to void *_gpubin_handle; void ctor() { if (__hip_gpubin_handle == 0) { __hip_gpubin_handle = __hipRegisterFatBinary(...); } // register kernels and variables. } The patch also does similar change to dtors so that __hipUnregisterFatBinary is called once. Differential Revision: https://reviews.llvm.org/D49083 llvm-svn: 337631	2018-07-20 22:45:24 +00:00
Joel E. Denny	72c2783012	[FileCheck] Add -allow-deprecated-dag-overlap to failing clang tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47172 llvm-svn: 336844	2018-07-11 20:26:20 +00:00
Artem Belevich	c66d254ded	[CUDA] Use atexit() to call module destructor. This matches the way NVCC does it. Doing module cleanup at global destructor phase used to work, but is, apparently, too late for the CUDA runtime in CUDA-9.2, which ends up crashing with double-free. Differential Revision: https://reviews.llvm.org/D48613 llvm-svn: 335763	2018-06-27 18:32:51 +00:00
Yaxun Liu	aa24601f98	[CUDA][HIP] Allow CUDA __global__ functions to have amdgpu kernel attributes There are HIP applications e.g. Tensorflow 1.3 using amdgpu kernel attributes, however currently they are only allowed on OpenCL kernel functions. This patch will allow amdgpu kernel attributes to be applied to CUDA/HIP __global__ functions. Differential Revision: https://reviews.llvm.org/D47958 llvm-svn: 334561	2018-06-12 23:58:59 +00:00
Yaxun Liu	6c10a66ec7	[CUDA][HIP] Set kernel calling convention before arrange function Currently clang set kernel calling convention for CUDA/HIP after arranging function, which causes incorrect kernel function type since it depends on calling convention. This patch moves setting kernel convention before arranging function. Differential Revision: https://reviews.llvm.org/D47733 llvm-svn: 334457	2018-06-12 00:16:33 +00:00
Jonas Hahnfeld	3b9cbba9a8	[CUDA] Fix emission of constant strings in sections CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue to unnamed. When emitting the object file LLVM will mark the surrounding section as SHF_MERGE iff the string is nul-terminated and contains no other nuls (see IsNullTerminatedString). This results in problems when saving temporaries because LLVM doesn't set an EntrySize, so reading in the serialized assembly file fails. This never happened for the GPU binaries because they usually contain a nul-character somewhere. Instead this only affected the module ID when compiling relocatable device code. However, this points to a potentially larger problem: If we put a constant string into a named section, we really want the data to end up in that section in the object file. To avoid LLVM merging sections this patch unmarks the GlobalVariable's address as unnamed which also fixes the problem of invalid serialized assembly files when saving temporaries. Differential Revision: https://reviews.llvm.org/D47902 llvm-svn: 334281	2018-06-08 11:17:08 +00:00
Yaxun Liu	6328f9a988	[CUDA][HIP] Do not emit type info when compiling for device CUDA/HIP does not support RTTI on device side, therefore there is no point of emitting type info when compiling for device. Emitting type info for device not only clutters the IR with useless global variables, but also causes undefined symbol at linking since vtable for cxxabiv1::class_type_info has external linkage. Differential Revision: https://reviews.llvm.org/D47694 llvm-svn: 334021	2018-06-05 15:11:02 +00:00
Yaxun Liu	29155b01c1	[HIP] Support offloading by linker script To support linking device code in different source files, it is necessary to embed fat binary at host linking stage. This patch emits an external symbol for fat binary in host codegen, then embed the fat binary by lld through a linker script. Differential Revision: https://reviews.llvm.org/D46472 llvm-svn: 332724	2018-05-18 15:07:56 +00:00
Yaxun Liu	48390a992f	Fix failure in lit test kernel-call.cu due to name mangling llvm-svn: 330821	2018-04-25 13:07:58 +00:00
Yaxun Liu	997e64f8a6	Fix lit test kernel-call.cu failure on ps4 due to dso_local llvm-svn: 330795	2018-04-25 03:16:07 +00:00
Yaxun Liu	e21278d938	Fix failure in lit test kernel-call.cu There is signext on ppc64. Just remove check for function argument. llvm-svn: 330793	2018-04-25 02:34:04 +00:00
Yaxun Liu	887c569bcb	[HIP] Add hip input kind and codegen for kernel launching HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md ). The language syntax is very similar, which allows a hip program to be compiled as a CUDA program by Clang. The main difference is the host API. HIP has a set of vendor neutral host API which can be implemented on different platforms. Currently there is open source implementation of HIP runtime on amdgpu target (https://github.com/ROCm-Developer-Tools/HIP). This patch adds support of input kind and language standard hip. When hip file is compiled, both LangOpts.CUDA and LangOpts.HIP is turned on. This allows compilation of hip program as CUDA in most cases and only special handling of hip program is needed LangOpts.HIP is checked. This patch also adds support of kernel launching of HIP program using HIP host API. When -x hip is not specified, there is no behaviour change for CUDA. Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D44984 llvm-svn: 330790	2018-04-25 01:10:37 +00:00
Yaxun Liu	4306f2086f	[CUDA] Set LLVM calling convention for CUDA kernel Some targets need special LLVM calling convention for CUDA kernel. This patch does that through a TargetCodeGenInfo hook. It only affects amdgcn target. Patch by Greg Rodgers. Revised and lit tests added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D45223 llvm-svn: 330447	2018-04-20 17:01:03 +00:00
Jonas Hahnfeld	f5527c2381	[CUDA] Register relocatable GPU binaries nvcc generates a unique registration function for each object file that contains relocatable device code. Unique names are achieved with a module id that is also reflected in the function's name. Differential Revision: https://reviews.llvm.org/D42922 llvm-svn: 330425	2018-04-20 13:04:45 +00:00
Eli Friedman	01d349bab1	Remove -cc1 option "-backend-option". It means the same thing as -mllvm; there isn't any reason to have two options which do the same thing. Differential Revision: https://reviews.llvm.org/D45109 llvm-svn: 329965	2018-04-12 22:21:36 +00:00
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Artem Belevich	55ebd6cc26	Revert "Set calling convention for CUDA kernel" This reverts r328795 which introduced an issue with referencing __global__ function templates. More details in the original review D44747. llvm-svn: 329099	2018-04-03 18:29:31 +00:00
Yaxun Liu	a64a491e7b	[CUDA] Let device-side shared variables be initialized with undef CUDA shared variable should be initialized with undef. Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D44985 llvm-svn: 328994	2018-04-02 17:38:24 +00:00
Yaxun Liu	b2f2bb26e4	Set calling convention for CUDA kernel This patch sets target specific calling convention for CUDA kernels in IR. Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D44747 llvm-svn: 328795	2018-03-29 15:02:08 +00:00
Yaxun Liu	b0eee29c74	Disable emitting static extern C aliases for amdgcn target for CUDA Patch by Greg Rodgers. Revised and lit test added by Yaxun Liu. Differential Revision: https://reviews.llvm.org/D44987 llvm-svn: 328793	2018-03-29 14:50:00 +00:00
Rafael Espindola	2a639a4c11	Really fix test on windows. Sorry for the noise. llvm-svn: 325943	2018-02-23 19:38:41 +00:00
Rafael Espindola	f43c2ff84b	Fix one last test on a windows host. llvm-svn: 325942	2018-02-23 19:36:20 +00:00
Artem Belevich	5ecdb94487	[CUDA] CUDA has no device-side library builtins. We should (almost) never consider a device-side declaration to match a library builtin functio. Otherwise clang may ignore the implementation provided by the CUDA headers and emit clang's idea of the builtin. Differential Revision: https://reviews.llvm.org/D42319 llvm-svn: 323239	2018-01-23 19:08:18 +00:00
Matthias Braun	a451953224	CodeGenModule: Always output wchar_size, check LLVM assumptions. Re-commit r303463 now that LLVM is fixed and adjust some lit tests. llvm::TargetLibraryInfo needs to know the size of wchar_t to work on functions like `wcslen`. This patch changes clang to always emit the wchar_size module flag (it would only do so for ARM previously). This also adds an `assert()` to ensure the LLVM defaults based on the target triple are in sync with clang. Differential Revision: https://reviews.llvm.org/D32982 llvm-svn: 303478	2017-05-20 01:29:55 +00:00
Adam Nemet	049a31d53d	Use FPContractModeKind universally FPContractModeKind is the codegen option flag which is already ternary (off, on, fast). This makes it universally the type for the contractable info across the front-end: * In FPOptions (i.e. in the Sema + in the expression nodes). * In LangOpts::DefaultFPContractMode which is the option that initializes FPOptions in the Sema. Another way to look at this change is that before fp-contractable on/off were the only states handled to the front-end: * For "on", FMA folding was performed by the front-end * For "fast", we simply forwarded the flag to TargetOptions to handle it in LLVM Now off/on/fast are all exposed because for fast we will generate fast-math-flags during CodeGen. This is toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to fix PR25721. --- This is a recommit of r299027 with an adjustment to the test CodeGenCUDA/fp-contract.cu. The test assumed that even though -ffp-contract=on is passed FE-based folding of FMA won't happen. This is obviously wrong since the user is asking for this explicitly with the option. CUDA is different that -ffp-contract=fast is on by default. The test used to "work" because contract=fast and contract=on were maintained separately and we didn't fold in the FE because contract=fast was on due to the target-default. This patch consolidates the contract=on/fast/off state into a ternary state hence the change in behavior. --- Differential Revision: https://reviews.llvm.org/D31167 llvm-svn: 299033	2017-03-29 21:54:24 +00:00
Justin Lebar	b080b630b1	[CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules. Summary: Now when you ask clang to link in a bitcode module, you can tell it to set attributes on that module's functions to match what we would have set if we'd emitted those functions ourselves. This is particularly important for fast-math attributes in CUDA compilations. Each CUDA compilation links in libdevice, a bitcode library provided by nvidia as part of the CUDA distribution. Without this patch, if we have a user-function F that is compiled with -ffast-math that calls a function G from libdevice, F will have the unsafe-fp-math=true (etc.) attributes, but G will have no attributes. Since F calls G, the inliner will merge G's attributes into F's. It considers the lack of an unsafe-fp-math=true attribute on G to be tantamount to unsafe-fp-math=false, so it "merges" these by setting unsafe-fp-math=false on F. This then continues up the call graph, until every function that (transitively) calls something in libdevice gets unsafe-fp-math=false set, thus disabling fastmath in almost all CUDA code. Reviewers: echristo Subscribers: hfinkel, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D28538 llvm-svn: 293097	2017-01-25 21:29:48 +00:00
Artem Belevich	13e9b4d768	[CUDA] Improve target attribute checking for function templates. * __host__ __device__ functions are no longer considered to be redeclarations of __host__ or __device__ functions. This prevents unintentional merging of target attributes across them. * Function target attributes are not considered (and must match) during explicit instantiation and specialization of function templates. Differential Revision: https://reviews.llvm.org/D25809 llvm-svn: 288962	2016-12-07 19:27:16 +00:00
Justin Lebar	2dfbe9a3b4	[CUDA] Rename cuda_builtin_vars.h to __clang_cuda_builtin_vars.h. Summary: This matches the idiom we use for our other CUDA wrapper headers. Reviewers: tra Subscribers: beanz, mgorny, cfe-commits Differential Revision: https://reviews.llvm.org/D24978 llvm-svn: 283679	2016-10-08 22:16:08 +00:00
Justin Lebar	4a759ff44c	[CUDA] Add missing ':' to noexcept.cu test. llvm-svn: 283280	2016-10-05 00:27:38 +00:00
Justin Lebar	3e6449b4f4	[CUDA] Mark device functions as nounwind. Summary: This prevents clang from emitting 'invoke's and catch statements. Things previously mostly worked thanks to TryToMarkNoThrow() in CodeGenFunction. But this is not a proper IPO, and it doesn't properly handle cases like mutual recursion. Fixes bug 30593. Reviewers: tra Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D25166 llvm-svn: 283272	2016-10-04 23:41:49 +00:00
Justin Lebar	e060feb7b1	[CUDA] Disallow overloading destructors. Summary: We'd attempted to allow this, but turns out we were doing a very bad job. :) Making this work properly would be a giant change in clang. For example, we'd need to make CXXRecordDecl::getDestructor() context-sensitive, because the destructor you end up with depends on where you're calling it from. For now (and hopefully for ever), just disallow overloading of destructors in CUDA. Reviewers: rsmith Subscribers: cfe-commits, tra Differential Revision: https://reviews.llvm.org/D24571 llvm-svn: 283120	2016-10-03 16:48:23 +00:00
Justin Lebar	18e2d82297	[CUDA] Raise an error if a wrong-side call is codegen'ed. Summary: Some function calls in CUDA are allowed to appear in semantically-correct programs but are an error if they're ever codegen'ed. Specifically, a host+device function may call a host function, but it's an error if such a function is ever codegen'ed in device mode (and vice versa). Previously, clang made no attempt to catch these errors. For the most part, they would be caught by ptxas, and reported as "call to unknown function 'foo'". Now we catch these errors and report them the same as we report other illegal calls (e.g. a call from a host function to a device function). This has a small change in error-message behavior for calls that were previously disallowed (e.g. calls from a host to a device function). Previously, we'd catch disallowed calls fairly early, before doing additional semantic checking e.g. of the call's arguments. Now we catch these illegal calls at the very end of our semantic checks, so we'll only emit a "illegal CUDA call" error if the call is otherwise well-formed. Reviewers: tra, rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D23242 llvm-svn: 278759	2016-08-15 23:00:49 +00:00
Artem Belevich	4c09318be2	[CUDA] Place GPU binary into .nv_fatbin section and align it by 8. This matches the way nvcc encapsulates GPU binaries into host object file. Now cuobjdump can deal with clang-compiled object files. Differential Revision: https://reviews.llvm.org/D23429 llvm-svn: 278549	2016-08-12 18:44:01 +00:00
Justin Lebar	e56360a2cd	[CUDA] Align kernel launch args correctly when the LLVM type's alignment is different from the clang type's alignment. Summary: Before this patch, we computed the offsets in memory of args passed to GPU kernel functions by throwing all of the args into an LLVM struct. clang emits packed llvm structs basically whenever it feels like it, and packed structs have alignment 1. So we cannot rely on the llvm type's alignment matching the C++ type's alignment. This patch fixes our codegen so we always respect the clang types' alignments. Reviewers: rnk Subscribers: cfe-commits, tra Differential Revision: https://reviews.llvm.org/D22879 llvm-svn: 276927	2016-07-27 22:36:21 +00:00
Justin Bogner	2d5de7e568	NVPTX: Use the nvvm builtins to read SRegs rather than the legacy ptx ones The ptx spellings were removed from LLVM in r274769. llvm-svn: 274770	2016-07-07 16:41:08 +00:00
Justin Lebar	27ee130e38	[CUDA] Give templated device functions internal linkage, templated kernels external linkage. Summary: This lets LLVM perform IPO over these functions. In particular, it allows LLVM to emit ld.global.nc for loads to __restrict pointers in kernels that are never written to. Reviewers: rsmith Subscribers: cfe-commits, tra Differential Revision: http://reviews.llvm.org/D21337 llvm-svn: 274261	2016-06-30 18:41:33 +00:00
Artem Belevich	bcec9dac14	[CUDA] Add implicit conversion of __launch_bounds__ arguments to rvalue. Fixes clang crash reported in PR27778. Differential Revision: http://reviews.llvm.org/D20985 llvm-svn: 271951	2016-06-06 22:54:57 +00:00
Justin Lebar	f179364341	[CUDA] Conservatively mark inline asm as convergent. Summary: This is particularly important because a some convergent CUDA intrinsics (e.g. __shfl_down) are implemented in terms of inline asm. Reviewers: tra Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D20836 llvm-svn: 271336	2016-05-31 21:27:13 +00:00
Reid Kleckner	a769fd50ba	Avoid depending on test inputes that aren't in Inputs Some people have weird CI systems that run each test subdirectory independently without access to other parallel trees. Unfortunately, this means we have to suffer some duplication until Art can sort out how to share these types. llvm-svn: 270164	2016-05-20 00:38:25 +00:00
Artem Belevich	3650bbeebc	[CUDA] Do not allow non-empty destructors for global device-side variables. According to Cuda Programming guide (v7.5, E2.3.1): > __device__, __constant__ and __shared__ variables defined in namespace > scope, that are of class type, cannot have a non-empty constructor or a > non-empty destructor. Clang already deals with device-side constructors (see D15305). This patch enforces similar rules for destructors. Differential Revision: http://reviews.llvm.org/D20140 llvm-svn: 270108	2016-05-19 20:13:53 +00:00
Artem Belevich	85b6f63f42	[CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts. Codegen tests for device-side variable initialization are subset of test cases used to verify Sema's part of the job. Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to keep both sides in sync. Differential Revision: http://reviews.llvm.org/D20139 llvm-svn: 270107	2016-05-19 20:13:39 +00:00
Artem Belevich	31c3bad499	[CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default. This matches default nvcc behavior and gives substantial performance boost on GPU where fmad is much cheaper compared to add+mul. Differential Revision: http://reviews.llvm.org/D20341 llvm-svn: 270094	2016-05-19 18:44:45 +00:00
Justin Lebar	3b30b7eef6	[CUDA] Fix flush-denormals.cu test so that it checks what it intends to CHECK. FileCheck does not evaluate plain CHECKs if you pass -check-prefix; you have to ask for it explicitly. llvm-svn: 269000	2016-05-10 00:34:50 +00:00
Artem Belevich	4d430badeb	[CUDA] Restrict init of local __shared__ variables to empty constructors only. Allow only empty constructors for local __shared__ variables in a way identical to restrictions imposed on dynamic initializers for global variables on device. Differential Revision: http://reviews.llvm.org/D20039 llvm-svn: 268982	2016-05-09 22:09:56 +00:00
Artem Belevich	0c0ada01b6	[CUDA] Only __shared__ variables can be static local on device side. According to CUDA programming guide (v7.5): > E.2.9.4: Within the body of a device or global function, only > shared variables may be declared with static storage class. Differential Revision: http://reviews.llvm.org/D20034 llvm-svn: 268962	2016-05-09 19:36:08 +00:00
Artem Belevich	ca2b951cbc	[CUDA] Make sure device-side __global__ functions are always visible. __global__ functions are a special case in CUDA. Even when the symbol would normally not be externally visible according to C++ rules, they still must be visible in CUDA GPU object so host-side stub can launch them. Differential Revision: http://reviews.llvm.org/D19748 llvm-svn: 268299	2016-05-02 20:30:03 +00:00
Justin Lebar	d3a44f6885	[CUDA] Add -fcuda-flush-denormals-to-zero. Summary: Setting this flag causes all functions are annotated with the "nvvm-f32ftz" = "true" attribute. In addition, we annotate the module with "nvvm-reflect-ftz" set to 0 or 1, depending on whether -cuda-flush-denormals-to-zero is set. This is read by the NVVMReflect pass. Reviewers: tra, rnk Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18671 llvm-svn: 265435	2016-04-05 18:26:20 +00:00
Justin Lebar	19b648eae3	[CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFC We already have this flag in most of the file, but we need it everywhere else, to disable the NVVMReflect pass, which we're explicitly checking doesn't run here. (Upcoming changes to llvm will cause it to be run.) llvm-svn: 264969	2016-03-30 23:45:38 +00:00
Justin Lebar	25c4a81e79	[CUDA] Remove three obsolete CUDA cc1 flags. Summary: * -fcuda-target-overloads Previously unconditionally set to true by the driver. Necessary for correct functioning of the compiler -- our CUDA headers wrapper won't compile without this. * -fcuda-disable-target-call-checks Previously unconditionally set to true by the driver. Necessary to compile almost any external CUDA code -- almost all libraries assume that host+device code can call host or device functions. * -fcuda-allow-host-calls-from-host-device No effect when target overloading is enabled. Reviewers: tra Subscribers: rsmith, cfe-commits Differential Revision: http://reviews.llvm.org/D18416 llvm-svn: 264739	2016-03-29 16:24:16 +00:00
Justin Lebar	e5eed04d52	[CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu. Summary: Previously we were using the codegen test to ensure that we choose the right overload. But we can do this within sema, with a bit of cleverness. I left the constructor/destructor checks in CodeGen, because these overloads (particularly on the destructors) are hard to check in Sema. Reviewers: tra Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18386 llvm-svn: 264207	2016-03-23 22:42:30 +00:00
Artem Belevich	3609085dc4	Fixed test failure platforms with name mangling different from Linux. * Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable. * Use temporary file as a fake GPU input so its content does not interfere with pattern matching. llvm-svn: 262516	2016-03-02 21:03:20 +00:00
Artem Belevich	8c1ec1ef38	[CUDA] Do not generate unnecessary runtime init code. Differential Revision: http://reviews.llvm.org/D17780 llvm-svn: 262499	2016-03-02 18:28:53 +00:00
Artem Belevich	42e1949b46	[CUDA] Emit host-side 'shadows' for device-side global variables ... and register them with CUDA runtime. This is needed for commonly used cudaMemcpy*() APIs that use address of host-side shadow to access their counterparts on device side. Fixes PR26340 Differential Revision: http://reviews.llvm.org/D17779 llvm-svn: 262498	2016-03-02 18:28:50 +00:00
Justin Lebar	ddd97faeec	[CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent. Summary: This is important for e.g. the following case: void sync() { __syncthreads(); } void foo() { do_something(); sync(); do_something_else(): } Without this change, if the optimizer does not inline sync() (which it won't because __syncthreads is also marked as noduplicate, for now anyway), it is free to perform optimizations on sync() that it would not be able to perform on __syncthreads(), because sync() is not marked as convergent. Similarly, we need a notion of convergent calls, since in the case when we can't statically determine a call's target(s), we need to know whether it's safe to perform optimizations around the call. This change is conservative; the optimizer will remove these attrs where it can, see r260318, r260319. Reviewers: majnemer Subscribers: cfe-commits, jhen, echristo, tra Differential Revision: http://reviews.llvm.org/D17056 llvm-svn: 261779	2016-02-24 21:55:11 +00:00
Artem Belevich	186091094a	[CUDA] Tweak attribute-based overload resolution to match nvcc behavior. This is an artefact of split-mode CUDA compilation that we need to mimic. HD functions are sometimes allowed to call H or D functions. Due to split compilation mode device-side compilation will not see host-only function and thus they will not be considered at all. For clang both H and D variants will become function overloads visible to compiler. Normally target attribute is considered only if C++ rules can not determine which function is better. However in this case we need to ignore functions that would not be present during current compilation phase before we apply normal overload resolution rules. Changes: * introduced another level of call preference to better describe possible call combinations. * removed WrongSide functions from consideration if the set contains SameSide function. * disabled H->D, D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in std namespace. We no longer need it after r258880. Differential Revision: http://reviews.llvm.org/D16870 llvm-svn: 260697	2016-02-12 18:29:18 +00:00
Justin Lebar	9a2c0fbaf5	[CUDA] Don't crash when trying to printf a non-scalar object. Summary: We can't do the right thing, since there's no right thing to do, but at least we can not crash the compiler. Reviewers: majnemer, rnk Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D17103 llvm-svn: 260479	2016-02-11 02:00:52 +00:00
Artem Belevich	97c01c35f8	[CUDA] Do not allow dynamic initialization of global device side variables. In general CUDA does not allow dynamic initialization of global device-side variables. One exception is that CUDA allows records with empty constructors as described in section E2.2.1 of CUDA 7.5 Programming guide. This patch applies initializer checks for all device-side variables. Empty constructors are accepted, but no code is generated for them. Differential Revision: http://reviews.llvm.org/D15305 llvm-svn: 259592	2016-02-02 22:29:48 +00:00
Justin Lebar	c0e42750da	[CUDA] Generate CUDA's printf alloca in its function's entry block. Summary: This is necessary to prevent llvm from generating stacksave intrinsics around this alloca. NVVM doesn't have a stack, and we don't handle said intrinsics. Reviewers: rnk, echristo Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16664 llvm-svn: 259122	2016-01-28 23:58:28 +00:00
Justin Lebar	cd2f6bbd5c	[CUDA] Don't generate aliases for static extern "C" functions. Summary: These aliases are done to support inline asm, but there's nothing we can do: NVPTX doesn't support aliases. Reviewers: tra Subscribers: cfe-commits, jhen, echristo Differential Revision: http://reviews.llvm.org/D16501 llvm-svn: 258734	2016-01-25 22:36:37 +00:00
Justin Lebar	3039a593db	[CUDA] Make printf work. Summary: The code in CGCUDACall is largely based on a patch written by Eli Bendersky: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140324/210218.html That patch implemented an LLVM pass lowering printf to vprintf; this one does something similar, but in Clang codegen. Reviewers: echristo Subscribers: cfe-commits, jhen, tra, majnemer Differential Revision: http://reviews.llvm.org/D16372 llvm-svn: 258642	2016-01-23 21:28:14 +00:00
Artem Belevich	9b9294674b	[CUDA] Make vtable construction aware of host/device side of CUDA compilation. C++ emits vtables for classes that have key function present in the current TU. While we compile CUDA the fact that key function was found in this TU does not mean that we are going to generate code for it. E.g. vtable for a class with host-only methods should not (and can not) be generated on device side, because we'll never generate code for them during device-side compilation. This patch adds an extra CUDA-specific check during key method computation and filters out potential key methods that are not suitable for this side of CUDA compilation. When we codegen vtable, entries for unsuitable methods are set to null. Differential Revision: http://reviews.llvm.org/D15309 llvm-svn: 255911	2015-12-17 18:12:36 +00:00
Artem Belevich	5d40ae3a46	Allow linking multiple bitcode files. Linking options for particular file depend on the option that specifies the file. Currently there are two: * -mlink-bitcode-file links in complete content of the specified file. * -mlink-cuda-bitcode links in only the symbols needed by current TU. Linked symbols are internalized. This bitcode linking mode is used to link device-specific bitcode provided by CUDA. Files are linked in order they are specified on command line. -mlink-cuda-bitcode replaces -fcuda-uses-libdevice flag. Differential Revision: http://reviews.llvm.org/D13913 llvm-svn: 251427	2015-10-27 17:56:59 +00:00
Artem Belevich	7b41f70e6c	[CUDA] __global__ functions should always be visible externally. Adjust __global__ functions with DiscardableODR linkage to use StrongODR linkage instead, so they are visible externally. Differential Revision: http://reviews.llvm.org/D13067 llvm-svn: 248400	2015-09-23 17:44:53 +00:00
Artem Belevich	94a55e8169	[CUDA] Allow function overloads in CUDA based on host/device attributes. The patch makes it possible to parse CUDA files that contain host/device functions with identical signatures, but different attributes without having to physically split source into host-only and device-only parts. This change is needed in order to parse CUDA header files that have a lot of name clashes with standard include files. Gory details are in design doc here: https://goo.gl/EXnymm Feel free to leave comments there or in this review thread. This feature is controlled with CC1 option -fcuda-target-overloads and is disabled by default. Differential Revision: http://reviews.llvm.org/D12453 llvm-svn: 248295	2015-09-22 17:22:59 +00:00
Artem Belevich	c3fa25def7	[CUDA] Add implicit __attribute__((used)) to all __global__ functions. This makes sure that we emit kernels that were instantiated from the host code and which would never be explicitly referenced by anything else on device side. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 248293	2015-09-22 17:22:51 +00:00
Artem Belevich	7cb25c9b69	[CUDA] Postprocess bitcode linked in during device-side CUDA compilation. Link in and internalize the symbols we need from supplied bitcode library. Differential Revision: http://reviews.llvm.org/D11664 llvm-svn: 247317	2015-09-10 18:24:23 +00:00
Artem Belevich	da1851ca58	[CUDA] Allow trivial constructors as initializer for __shared__ variables. Differential Revision: http://reviews.llvm.org/D12739 llvm-svn: 247307	2015-09-10 17:26:58 +00:00
Jingyue Wu	284ebe237f	[CUDA] Change initializer for CUDA device code based on CUDA documentation. Summary: According to CUDA documentation, global variables declared with __device__, __constant__ can be initialized from host code, so mark them as externally initialized. Because __shared__ variables cannot have an initialization as part of their declaration and since the value maybe kept across different kernel invocation, the value of __shared__ is effectively undefined instead of zero initialized. Wrongly using zero initializer may cause illegitimate optimization, e.g. removing unused __constant__ variable because it's not updated in the device code and the value is initialized with zero. Test Plan: test/CodeGenCUDA/address-spaces.cu Patch by Xuetian Weng Reviewers: jholewinski, eliben, tra, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12241 llvm-svn: 245786	2015-08-22 05:49:28 +00:00
Daniel Jasper	3b0f87d289	Revert "[CUDA] Add implicit __attribute__((used)) to all __global__ functions." This is breaking internal test. I'll provide a reproduction. llvm-svn: 244583	2015-08-11 11:02:09 +00:00
Artem Belevich	b7e4aab40c	[CUDA] Add implicit __attribute__((used)) to all __global__ functions. This allows emitting kernels that were instantiated from the host code and which would never be explicitly referenced otherwise. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 244501	2015-08-10 20:57:02 +00:00
Artem Belevich	e958275250	[cuda] Fixed test case failure on s390x llvm-svn: 237007	2015-05-11 18:35:58 +00:00
Artem Belevich	8d062ad560	Fixed test failure on machines with 32-bit size_t. llvm-svn: 236773	2015-05-07 21:06:03 +00:00

1 2 3 4

178 Commits