llvm-project

Commit Graph

Author	SHA1	Message	Date
Artem Belevich	3650bbeebc	[CUDA] Do not allow non-empty destructors for global device-side variables. According to Cuda Programming guide (v7.5, E2.3.1): > __device__, __constant__ and __shared__ variables defined in namespace > scope, that are of class type, cannot have a non-empty constructor or a > non-empty destructor. Clang already deals with device-side constructors (see D15305). This patch enforces similar rules for destructors. Differential Revision: http://reviews.llvm.org/D20140 llvm-svn: 270108	2016-05-19 20:13:53 +00:00
Artem Belevich	85b6f63f42	[CUDA] Split device-var-init.cu tests into separate Sema and CodeGen parts. Codegen tests for device-side variable initialization are subset of test cases used to verify Sema's part of the job. Including CodeGenCUDA/device-var-init.cu from SemaCUDA makes it easier to keep both sides in sync. Differential Revision: http://reviews.llvm.org/D20139 llvm-svn: 270107	2016-05-19 20:13:39 +00:00
Artem Belevich	31c3bad499	[CUDA] Enable fusing FP ops (-ffp-contract=fast) for CUDA by default. This matches default nvcc behavior and gives substantial performance boost on GPU where fmad is much cheaper compared to add+mul. Differential Revision: http://reviews.llvm.org/D20341 llvm-svn: 270094	2016-05-19 18:44:45 +00:00
Justin Lebar	3b30b7eef6	[CUDA] Fix flush-denormals.cu test so that it checks what it intends to CHECK. FileCheck does not evaluate plain CHECKs if you pass -check-prefix; you have to ask for it explicitly. llvm-svn: 269000	2016-05-10 00:34:50 +00:00
Artem Belevich	4d430badeb	[CUDA] Restrict init of local __shared__ variables to empty constructors only. Allow only empty constructors for local __shared__ variables in a way identical to restrictions imposed on dynamic initializers for global variables on device. Differential Revision: http://reviews.llvm.org/D20039 llvm-svn: 268982	2016-05-09 22:09:56 +00:00
Artem Belevich	0c0ada01b6	[CUDA] Only __shared__ variables can be static local on device side. According to CUDA programming guide (v7.5): > E.2.9.4: Within the body of a device or global function, only > shared variables may be declared with static storage class. Differential Revision: http://reviews.llvm.org/D20034 llvm-svn: 268962	2016-05-09 19:36:08 +00:00
Artem Belevich	ca2b951cbc	[CUDA] Make sure device-side __global__ functions are always visible. __global__ functions are a special case in CUDA. Even when the symbol would normally not be externally visible according to C++ rules, they still must be visible in CUDA GPU object so host-side stub can launch them. Differential Revision: http://reviews.llvm.org/D19748 llvm-svn: 268299	2016-05-02 20:30:03 +00:00
Justin Lebar	d3a44f6885	[CUDA] Add -fcuda-flush-denormals-to-zero. Summary: Setting this flag causes all functions to be annotated with the "nvvm-f32ftz" = "true" attribute. In addition, we annotate the module with "nvvm-reflect-ftz" set to 0 or 1, depending on whether -fcuda-flush-denormals-to-zero is set. This is read by the NVVMReflect pass. Reviewers: tra, rnk Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18671 llvm-svn: 265435	2016-04-05 18:26:20 +00:00
Justin Lebar	19b648eae3	[CUDA] Add -disable-llvm-passes to CodeGenCUDA/link-device-bitcode.cu. NFC We already have this flag in most of the file, but we need it everywhere else, to disable the NVVMReflect pass, which we're explicitly checking doesn't run here. (Upcoming changes to llvm will cause it to be run.) llvm-svn: 264969	2016-03-30 23:45:38 +00:00
Justin Lebar	25c4a81e79	[CUDA] Remove three obsolete CUDA cc1 flags. Summary: * -fcuda-target-overloads Previously unconditionally set to true by the driver. Necessary for correct functioning of the compiler -- our CUDA headers wrapper won't compile without this. * -fcuda-disable-target-call-checks Previously unconditionally set to true by the driver. Necessary to compile almost any external CUDA code -- almost all libraries assume that host+device code can call host or device functions. * -fcuda-allow-host-calls-from-host-device No effect when target overloading is enabled. Reviewers: tra Subscribers: rsmith, cfe-commits Differential Revision: http://reviews.llvm.org/D18416 llvm-svn: 264739	2016-03-29 16:24:16 +00:00
Justin Lebar	e5eed04d52	[CUDA] Merge most of CodeGenCUDA/function-overload.cu into SemaCUDA/function-overload.cu. Summary: Previously we were using the codegen test to ensure that we choose the right overload. But we can do this within sema, with a bit of cleverness. I left the constructor/destructor checks in CodeGen, because these overloads (particularly on the destructors) are hard to check in Sema. Reviewers: tra Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18386 llvm-svn: 264207	2016-03-23 22:42:30 +00:00
Artem Belevich	3609085dc4	Fixed test failure on platforms with name mangling different from Linux. * Run cc with -triple x86_64-linux-gnu to make symbol mangling predictable. * Use temporary file as a fake GPU input so its content does not interfere with pattern matching. llvm-svn: 262516	2016-03-02 21:03:20 +00:00
Artem Belevich	8c1ec1ef38	[CUDA] Do not generate unnecessary runtime init code. Differential Revision: http://reviews.llvm.org/D17780 llvm-svn: 262499	2016-03-02 18:28:53 +00:00
Artem Belevich	42e1949b46	[CUDA] Emit host-side 'shadows' for device-side global variables ... and register them with CUDA runtime. This is needed for commonly used cudaMemcpy*() APIs that use address of host-side shadow to access their counterparts on device side. Fixes PR26340 Differential Revision: http://reviews.llvm.org/D17779 llvm-svn: 262498	2016-03-02 18:28:50 +00:00
Justin Lebar	ddd97faeec	[CUDA] Mark all CUDA device-side function defs, decls, and calls as convergent. Summary: This is important for e.g. the following case: void sync() { __syncthreads(); } void foo() { do_something(); sync(); do_something_else(); } Without this change, if the optimizer does not inline sync() (which it won't because __syncthreads is also marked as noduplicate, for now anyway), it is free to perform optimizations on sync() that it would not be able to perform on __syncthreads(), because sync() is not marked as convergent. Similarly, we need a notion of convergent calls, since in the case when we can't statically determine a call's target(s), we need to know whether it's safe to perform optimizations around the call. This change is conservative; the optimizer will remove these attrs where it can, see r260318, r260319. Reviewers: majnemer Subscribers: cfe-commits, jhen, echristo, tra Differential Revision: http://reviews.llvm.org/D17056 llvm-svn: 261779	2016-02-24 21:55:11 +00:00
Artem Belevich	186091094a	[CUDA] Tweak attribute-based overload resolution to match nvcc behavior. This is an artefact of split-mode CUDA compilation that we need to mimic. HD functions are sometimes allowed to call H or D functions. Due to split compilation mode device-side compilation will not see host-only function and thus they will not be considered at all. For clang both H and D variants will become function overloads visible to compiler. Normally target attribute is considered only if C++ rules can not determine which function is better. However in this case we need to ignore functions that would not be present during current compilation phase before we apply normal overload resolution rules. Changes: * introduced another level of call preference to better describe possible call combinations. * removed WrongSide functions from consideration if the set contains SameSide function. * disabled H->D, D->H and G->H calls. These combinations are not allowed by CUDA and we were reluctantly allowing them to work around device-side calls to math functions in std namespace. We no longer need it after r258880. Differential Revision: http://reviews.llvm.org/D16870 llvm-svn: 260697	2016-02-12 18:29:18 +00:00
Justin Lebar	9a2c0fbaf5	[CUDA] Don't crash when trying to printf a non-scalar object. Summary: We can't do the right thing, since there's no right thing to do, but at least we can not crash the compiler. Reviewers: majnemer, rnk Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D17103 llvm-svn: 260479	2016-02-11 02:00:52 +00:00
Artem Belevich	97c01c35f8	[CUDA] Do not allow dynamic initialization of global device side variables. In general CUDA does not allow dynamic initialization of global device-side variables. One exception is that CUDA allows records with empty constructors as described in section E2.2.1 of CUDA 7.5 Programming guide. This patch applies initializer checks for all device-side variables. Empty constructors are accepted, but no code is generated for them. Differential Revision: http://reviews.llvm.org/D15305 llvm-svn: 259592	2016-02-02 22:29:48 +00:00
Justin Lebar	c0e42750da	[CUDA] Generate CUDA's printf alloca in its function's entry block. Summary: This is necessary to prevent llvm from generating stacksave intrinsics around this alloca. NVVM doesn't have a stack, and we don't handle said intrinsics. Reviewers: rnk, echristo Subscribers: cfe-commits, jhen, tra Differential Revision: http://reviews.llvm.org/D16664 llvm-svn: 259122	2016-01-28 23:58:28 +00:00
Justin Lebar	cd2f6bbd5c	[CUDA] Don't generate aliases for static extern "C" functions. Summary: These aliases are done to support inline asm, but there's nothing we can do: NVPTX doesn't support aliases. Reviewers: tra Subscribers: cfe-commits, jhen, echristo Differential Revision: http://reviews.llvm.org/D16501 llvm-svn: 258734	2016-01-25 22:36:37 +00:00
Justin Lebar	3039a593db	[CUDA] Make printf work. Summary: The code in CGCUDACall is largely based on a patch written by Eli Bendersky: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140324/210218.html That patch implemented an LLVM pass lowering printf to vprintf; this one does something similar, but in Clang codegen. Reviewers: echristo Subscribers: cfe-commits, jhen, tra, majnemer Differential Revision: http://reviews.llvm.org/D16372 llvm-svn: 258642	2016-01-23 21:28:14 +00:00
Artem Belevich	9b9294674b	[CUDA] Make vtable construction aware of host/device side of CUDA compilation. C++ emits vtables for classes that have key function present in the current TU. While we compile CUDA the fact that key function was found in this TU does not mean that we are going to generate code for it. E.g. vtable for a class with host-only methods should not (and can not) be generated on device side, because we'll never generate code for them during device-side compilation. This patch adds an extra CUDA-specific check during key method computation and filters out potential key methods that are not suitable for this side of CUDA compilation. When we codegen vtable, entries for unsuitable methods are set to null. Differential Revision: http://reviews.llvm.org/D15309 llvm-svn: 255911	2015-12-17 18:12:36 +00:00
Artem Belevich	5d40ae3a46	Allow linking multiple bitcode files. Linking options for particular file depend on the option that specifies the file. Currently there are two: * -mlink-bitcode-file links in complete content of the specified file. * -mlink-cuda-bitcode links in only the symbols needed by current TU. Linked symbols are internalized. This bitcode linking mode is used to link device-specific bitcode provided by CUDA. Files are linked in order they are specified on command line. -mlink-cuda-bitcode replaces -fcuda-uses-libdevice flag. Differential Revision: http://reviews.llvm.org/D13913 llvm-svn: 251427	2015-10-27 17:56:59 +00:00
Artem Belevich	7b41f70e6c	[CUDA] __global__ functions should always be visible externally. Adjust __global__ functions with DiscardableODR linkage to use StrongODR linkage instead, so they are visible externally. Differential Revision: http://reviews.llvm.org/D13067 llvm-svn: 248400	2015-09-23 17:44:53 +00:00
Artem Belevich	94a55e8169	[CUDA] Allow function overloads in CUDA based on host/device attributes. The patch makes it possible to parse CUDA files that contain host/device functions with identical signatures, but different attributes without having to physically split source into host-only and device-only parts. This change is needed in order to parse CUDA header files that have a lot of name clashes with standard include files. Gory details are in design doc here: https://goo.gl/EXnymm Feel free to leave comments there or in this review thread. This feature is controlled with CC1 option -fcuda-target-overloads and is disabled by default. Differential Revision: http://reviews.llvm.org/D12453 llvm-svn: 248295	2015-09-22 17:22:59 +00:00
Artem Belevich	c3fa25def7	[CUDA] Add implicit __attribute__((used)) to all __global__ functions. This makes sure that we emit kernels that were instantiated from the host code and which would never be explicitly referenced by anything else on device side. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 248293	2015-09-22 17:22:51 +00:00
Artem Belevich	7cb25c9b69	[CUDA] Postprocess bitcode linked in during device-side CUDA compilation. Link in and internalize the symbols we need from supplied bitcode library. Differential Revision: http://reviews.llvm.org/D11664 llvm-svn: 247317	2015-09-10 18:24:23 +00:00
Artem Belevich	da1851ca58	[CUDA] Allow trivial constructors as initializer for __shared__ variables. Differential Revision: http://reviews.llvm.org/D12739 llvm-svn: 247307	2015-09-10 17:26:58 +00:00
Jingyue Wu	284ebe237f	[CUDA] Change initializer for CUDA device code based on CUDA documentation. Summary: According to CUDA documentation, global variables declared with __device__, __constant__ can be initialized from host code, so mark them as externally initialized. Because __shared__ variables cannot have an initialization as part of their declaration and since the value may be kept across different kernel invocations, the value of __shared__ is effectively undefined instead of zero initialized. Wrongly using zero initializer may cause illegitimate optimization, e.g. removing unused __constant__ variable because it's not updated in the device code and the value is initialized with zero. Test Plan: test/CodeGenCUDA/address-spaces.cu Patch by Xuetian Weng Reviewers: jholewinski, eliben, tra, jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12241 llvm-svn: 245786	2015-08-22 05:49:28 +00:00
Daniel Jasper	3b0f87d289	Revert "[CUDA] Add implicit __attribute__((used)) to all __global__ functions." This is breaking internal test. I'll provide a reproduction. llvm-svn: 244583	2015-08-11 11:02:09 +00:00
Artem Belevich	b7e4aab40c	[CUDA] Add implicit __attribute__((used)) to all __global__ functions. This allows emitting kernels that were instantiated from the host code and which would never be explicitly referenced otherwise. Differential Revision: http://reviews.llvm.org/D11666 llvm-svn: 244501	2015-08-10 20:57:02 +00:00
Artem Belevich	e958275250	[cuda] Fixed test case failure on s390x llvm-svn: 237007	2015-05-11 18:35:58 +00:00
Artem Belevich	8d062ad560	Fixed test failure on machines with 32-bit size_t. llvm-svn: 236773	2015-05-07 21:06:03 +00:00
Artem Belevich	52cc487ba8	[cuda] Include GPU binary into host object file and generate init/deinit code. - added -fcuda-include-gpubinary option to incorporate results of device-side compilation into host-side one. - generate code to register GPU binaries and associated kernels with CUDA runtime and clean-up on exit. - added test case for init/deinit code generation. Differential Revision: http://reviews.llvm.org/D9507 llvm-svn: 236765	2015-05-07 19:34:16 +00:00
Artem Belevich	0488d1e4ba	[cuda] treat file scope __asm as __host__ and ignore it during device-side compilation. Currently clang emits file-scope asm during both host and device compilation modes which is usually a wrong thing to do. There's no way to attach any attribute to an __asm statement, so there's no way to differentiate between host-side and device-side file-scope asm. This patch makes clang match nvcc behavior and emit file-scope-asm only during host-side compilation. Differential Revision: http://reviews.llvm.org/D9270 llvm-svn: 235905	2015-04-27 18:52:00 +00:00
Artem Belevich	7093e40641	[cuda] Allow using integral non-type template parameters as launch_bounds attribute arguments. - Changed CUDALaunchBounds arguments from integers to Expr* so they can be saved in AST for instantiation. - Added support for template instantiation of launch_bounds attribute. - Moved evaluation of launch_bounds arguments to NVPTXTargetCodeGenInfo:: SetTargetAttributes() where it can be done after template instantiation. - Added a warning on negative launch_bounds arguments. - Amended test cases. Differential Revision: http://reviews.llvm.org/D8985 llvm-svn: 235452	2015-04-21 22:55:54 +00:00
Artem Belevich	4e192df778	[cuda] Added support for CUDA built-in variables. Added cuda_builtin_vars.h which implements built-in CUDA variables using __declattr(property). Fields of built-in variables (except for warpSize) are implemented using __declattr(property) which replaces read/write of a member field with a call to a getter/setter member function, in this case with appropriate NVPTX builtin. Added a test case to check diagnostics on attempt to construct or improperly access a built-in variable. Differential Revision: http://reviews.llvm.org/D9064 llvm-svn: 235448	2015-04-21 22:14:13 +00:00
Artem Belevich	a050112bba	Revert r235398 "[cuda] Added support for CUDA built-in variables." r235398 was causing buildbot break due to missing Makefile changes. llvm-svn: 235401	2015-04-21 18:36:42 +00:00
Artem Belevich	d0a2ae054f	[cuda] Added support for CUDA built-in variables. Added cuda_builtin_vars.h which implements built-in CUDA variables using __declattr(property). Fields of built-in variables (except for warpSize) are implemented using __declattr(property) which replaces read/write of a member field with a call to a getter/setter member function, in this case with appropriate NVPTX builtin. Added a test case to check diagnostics on attempt to construct or improperly access a built-in variable. Differential Revision: http://reviews.llvm.org/D9064 llvm-svn: 235398	2015-04-21 17:39:06 +00:00
Jingyue Wu	4f7b9eb217	Fix addrspace when emitting constructors of static local variables Summary: Due to CUDA's implicit address space casting, the type of a static local variable may be more specific (i.e. with address space qualifiers) than the type expected by the constructor. Emit an addrspacecast in that case. Test Plan: Clang used to crash on the added test. Reviewers: nlewycky, pcc, eliben, rsmith Reviewed By: eliben, rsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8575 llvm-svn: 233208	2015-03-25 20:06:28 +00:00
David Blaikie	bdf40a62a7	Test case updates for explicit type parameter to the gep operator llvm-svn: 232187	2015-03-13 18:21:46 +00:00
David Blaikie	a953f2825b	Update Clang tests to handle explicitly typed load changes in LLVM. llvm-svn: 230795	2015-02-27 21:19:58 +00:00
Jacques Pienaar	a50178c23e	CUDA: Add option to allow host device functions to call host functions Committing code from review http://reviews.llvm.org/D7841 llvm-svn: 230385	2015-02-24 21:45:33 +00:00
Justin Holewinski	f37f3d35eb	When generating llvm.used, we may need an addrspacecast instead of a bitcast. Summary: This is especially important for targets that use multiple address spaces, and commonly place global variables in address spaces other than zero. Fixes PR22383 Test Plan: New test case added: llvm-used.cu Reviewers: jingyue Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7345 llvm-svn: 227861	2015-02-02 21:05:49 +00:00
Duncan P. N. Exon Smith	b3a66691f8	IR: Make metadata typeless in assembly, clang side Match LLVM changes from r224257. llvm-svn: 224259	2014-12-15 19:10:08 +00:00
Eli Bendersky	3468d9d929	Move all CUDA testing inputs to Inputs/ subdirectory inside the tests. llvm-svn: 207453	2014-04-28 22:21:28 +00:00
Eli Bendersky	8578c8f1e3	Add test case for r206302 llvm-svn: 206303	2014-04-15 16:57:53 +00:00
Eli Bendersky	cb39943f6f	Proper handling of static local variables with address space qualifiers. Similar to the implementation for globals in r157167. Patch by Jingyue Wu. llvm-svn: 204677	2014-03-24 22:05:38 +00:00
Hans Wennborg	c9bd88e681	Remove the -cxx-abi command-line flag. This makes the C++ ABI depend entirely on the target: MS ABI for -win32 triples, Itanium otherwise. It's no longer possible to do weird combinations. To be able to run a test with a specific ABI without constraining it to a specific triple, new substitutions are added to lit: %itanium_abi_triple and %ms_abi_triple can be used to get the current target triple adjusted to the desired ABI. For example, if the test suite is running with the i686-pc-win32 target, %itanium_abi_triple will expand to i686-pc-mingw32. Differential Revision: http://llvm-reviews.chandlerc.com/D2545 llvm-svn: 199250	2014-01-14 19:35:09 +00:00
Hans Wennborg	9125b08b52	Update tests in preparation for using the MS ABI for Win32 targets In preparation for making the Win32 triple imply MS ABI mode, make all tests pass in this mode, or make them use the Itanium mode explicitly. Differential Revision: http://llvm-reviews.chandlerc.com/D2401 llvm-svn: 199130	2014-01-13 19:48:13 +00:00
Matt Arsenault	00e65b2afe	Fix test failures after addrspacecast added. Bitcasts between address spaces are no longer allowed. llvm-svn: 194765	2013-11-15 02:19:52 +00:00
Stephen Lin	4362261b00	CHECK-LABEL-ify some code gen tests to improve diagnostic experience when tests fail. llvm-svn: 188447	2013-08-15 06:47:53 +00:00
Justin Holewinski	368374308d	Use kernel metadata to differentiate between kernel and device functions for the NVPTX target. llvm-svn: 178418	2013-03-30 14:38:24 +00:00
Peter Collingbourne	c6b0857e95	CUDA: give static storage class to __shared__ and __constant__ variables without a storage class within a function, to implement CUDA B.2.5: "__shared__ and __constant__ variables have implied static storage [duration]." llvm-svn: 162788	2012-08-28 20:37:50 +00:00
Peter Collingbourne	ee0502d551	CUDA: give correct address space to globals declared in functions llvm-svn: 162787	2012-08-28 20:37:10 +00:00
Justin Holewinski	83e9668133	Replace PTX back-end with NVPTX back-end in all places where Clang cares NV_CONTRIB llvm-svn: 157403	2012-05-24 17:43:12 +00:00
Peter Collingbourne	f44bdf9c5f	CUDA: add CodeGen support for global variable address spaces. Because in CUDA types do not have associated address spaces, globals are declared in their "native" address space, and accessed by bitcasting the pointer to address space 0. This relies on address space 0 being a unified address space. llvm-svn: 157167	2012-05-20 21:08:35 +00:00
Peter Collingbourne	fa4d6033a3	CUDA: IR generation support for device stubs llvm-svn: 141304	2011-10-06 18:51:56 +00:00
Peter Collingbourne	a9455ec9f8	CUDA: add -fcuda-is-device flag This frontend-only flag is used by the IR generator to determine whether to filter CUDA declarations for the host or for the device. llvm-svn: 141301	2011-10-06 18:29:46 +00:00
Peter Collingbourne	fe88342240	CUDA: IR generation support for kernel call expressions llvm-svn: 141300	2011-10-06 18:29:37 +00:00
Peter Collingbourne	5bad4afa2f	CUDA: set proper calling conventions for PTX llvm-svn: 141296	2011-10-06 16:49:54 +00:00
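
Note on commit 31c3bad499 (-ffp-contract=fast by default for CUDA): fusing a multiply and an add into a single fmad affects rounding as well as speed, because the product is no longer rounded before the addition. The host-side C++ sketch below illustrates only that numerical effect. It is not code from the commit; the function names mul_then_add, fused_mul_add and print_contraction_demo are hypothetical, and std::fma is used here as a stand-in for the GPU's fused instruction.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Unfused form: a*b is rounded to double, then the sum is rounded again.
// The volatile store keeps the host compiler from contracting the two
// operations into an FMA on its own.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused form: a*b+c is computed with a single rounding at the end.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Illustrative inputs (an assumption of this sketch): a*a equals
// 1 + 2^-26 + 2^-54, and the 2^-54 term is lost when the product is
// rounded to double before the addition.
void print_contraction_demo() {
    const double a = 1.0 + std::ldexp(1.0, -27);
    const double c = -(1.0 + std::ldexp(1.0, -26));
    const double two_step = mul_then_add(a, a, c);
    const double fused = fused_mul_add(a, a, c);
    assert(two_step != fused);  // the two forms disagree for these inputs
    std::printf("mul then add: %.17g\n", two_step);
    std::printf("fused:        %.17g\n", fused);
}
```

Calling print_contraction_demo() from main shows the two forms disagreeing for these inputs, which is why code that depends on a specific rounding sequence may need contraction turned off explicitly.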


111 Commits