For -fgpu-rdc mode, static device variables in different TUs may have the same name.
To support accessing file-scope static device variables in host code, we need to give them
distinct names and external linkage. This is done by postfixing each static device variable with
a distinct CUID (Compilation Unit ID) hash.
Since the static device variables then have distinct names across compilation units, we can give
them external linkage so that they can be looked up by the runtime.
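As a minimal sketch (hypothetical file and names) of what this enables: host
code can look up a file-scope static device variable through the runtime,
which requires the variable's symbol to be unique and externally visible:

  // a.cu
  #include <cuda_runtime.h>

  static __device__ int counter = 0;  // file-scope static device variable

  int readCounter() {
    int host_val = 0;
    // The runtime looks the symbol up by name, so in -fgpu-rdc mode the
    // compiler must give 'counter' a distinct, externally visible name.
    cudaMemcpyFromSymbol(&host_val, counter, sizeof(host_val));
    return host_val;
  }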
Reviewed by: Artem Belevich and Jon Chesterfield
Differential Revision: https://reviews.llvm.org/D85223
Defer constant checking of dependent initializers to template instantiation,
since it cannot be done for dependent values.
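A hypothetical illustration of the affected pattern: a device variable
template whose initializer depends on a template parameter cannot be checked
until instantiation:

  template <int X>
  __device__ int a = X + 1;  // dependent initializer; the constant-initializer
                             // check is deferred until e.g. a<2> is instantiated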
Reviewed by: Artem Belevich
Differential Revision: https://reviews.llvm.org/D95840
This patch implements codegen for the __managed__ variable attribute for HIP.
Diagnostics will be added later.
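For context, a minimal sketch (hypothetical names) of a managed variable,
which is accessible from both host and device without explicit copies:

  __managed__ int x = 0;  // lives in managed memory, visible to both sides

  __global__ void inc() { x++; }

  // After launching inc() and synchronizing, host code can read and write
  // 'x' directly, without an explicit memcpy.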
Differential Revision: https://reviews.llvm.org/D94814
Defaulted destructor was treated inconsistently, compared to other
compiler-generated functions.
When Sema::IdentifyCUDATarget() was called on a just-created dtor which didn't
have the implicit __host__ __device__ attributes applied yet, it would treat it as a
host function. That happened to (sometimes) hide the error when the dtor referred
to host-only functions.
Even when we had identified a defaulted dtor as an HD function, we still treated it
inconsistently during selection of usual deallocators: we did not allow it to refer
to wrong-side functions, while that is allowed for other HD functions.
This change brings handling of defaulted dtors in line with other HD functions.
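A minimal sketch (hypothetical type) of the now-consistent treatment:

  struct S {
    ~S() = default;  // implicitly __host__ __device__, like other
                     // compiler-generated special member functions
  };

  __global__ void k() { S s; }  // the defaulted dtor is usable in device code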
Differential Revision: https://reviews.llvm.org/D94732
`isCUDADeviceBuiltinSurfaceType()`/`isCUDADeviceBuiltinTextureType()` do not
work on dependent types as they rely on specific type attributes.
Differential Revision: https://reviews.llvm.org/D92893
This patch implements correct hostness-based overload resolution
in isBetterOverloadCandidate.
Based on hostness, if one candidate is emittable and the other
candidate is not, the emittable candidate is better.
If both candidates are emittable, or neither is emittable based on hostness,
other rules should be used to determine which is better. This is because
hostness-based overload resolution is mostly for determining the
viability of a function. If two functions are both viable, other factors
should take precedence in preference.
If the other rules cannot determine which is better, CUDA preference is
used again as a tiebreaker.
However, correct hostness-based overload resolution
requires overload resolution diagnostics to be deferred,
which is not on by default. The rationale is that deferring
overload resolution diagnostics may hide overload resolution
issues in header files.
An option -fgpu-exclude-wrong-side-overloads is added, which is off by
default.
When -fgpu-exclude-wrong-side-overloads is off, the original behavior is kept,
that is, wrong-side overloads are excluded only if there are same-side overloads.
This may result in incorrect overload resolution when there are no
same-side candidates, but is sufficient for most CUDA/HIP applications.
When -fgpu-exclude-wrong-side-overloads is on, deferred overload resolution
diagnostics are enabled along with correct hostness-based overload
resolution, i.e., wrong-side overloads are always excluded.
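A hedged sketch (hypothetical overloads) of the difference in device
compilation when there are no same-side candidates:

  __host__ int f(int);  // wrong-side candidate in device compilation

  __device__ int g() {
    // With the option off, f(int) is kept because no same-side overload
    // exists, and resolution may succeed incorrectly. With the option on,
    // f(int) is always excluded and the failure is reported through the
    // deferred diagnostics machinery.
    return f(1);
  }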
Differential Revision: https://reviews.llvm.org/D80450
This patch diagnoses invalid references to global host variables in device,
global, or host device functions.
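For illustration (hypothetical names), the kind of reference now diagnosed:

  int host_var;  // global host variable

  __device__ int bad() {
    return host_var;  // error: reference to a host variable from a device function
  }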
Differential Revision: https://reviews.llvm.org/D91281
callee in constant evaluation.
We previously made a deep copy of function parameters of class type when
passing them, resulting in the destructor for the parameter applying to
the original argument value, ignoring any modifications made in the
function body. This also meant that the 'this' pointer of the function
parameter could be observed changing between the caller and the callee.
This change completely reimplements how we model function parameters
during constant evaluation. We now model them roughly as if they were
variables living in the caller, albeit with an artificially reduced
scope that covers only the duration of the function call, instead of
modeling them as temporaries in the caller that we partially "reparent"
into the callee at the point of the call. This brings some minor
diagnostic improvements, as well as significantly reduced stack usage
during constant evaluation.
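A minimal sketch (C++20, hypothetical names) of the observable difference:
the parameter's destructor must see modifications made in the callee rather
than a stale copy owned by the caller:

  struct Tracker {
    int n = 0;
    constexpr ~Tracker() {}  // under the old model this ran on the caller's
                             // deep copy, never seeing the callee's writes
  };

  constexpr int f(Tracker t) {
    t.n = 42;    // modification made in the function body
    return t.n;
  }

  static_assert(f(Tracker{}) == 42);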
In CUDA/HIP a function may become an implicit host device function by
pragma or constexpr. A host device function is checked in both
host and device compilation. However, it may be emitted only
on the host or device side, therefore the diagnostics should be
deferred until it is known to be emitted.
Currently clang is only able to defer certain diagnostics. This causes
false alarms and limits the usefulness of host device functions.
This patch lets clang defer all overload resolution diagnostics for host
device functions.
An option -fgpu-defer-diag is added to control this behavior. By default
it is off.
This is NFC for other languages.
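As a hedged sketch (hypothetical functions): a constexpr function is an
implicit host device function, and its calls may resolve to host-only
functions without harm as long as it is never emitted for the device:

  __host__ int pick(int x);  // host-only

  constexpr int wrap(int x) {  // implicitly __host__ __device__
    return pick(x);            // with -fgpu-defer-diag, any overload resolution
  }                            // diagnostic here is deferred until wrap() is
                               // known to be emitted on the device side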
Differential Revision: https://reviews.llvm.org/D84364
When a device function calls a host function or vice versa, this is a wrong-sided
reference. Currently clang diagnoses it immediately. This differs from nvcc's
behavior, where it is diagnosed only if the function is actually emitted.
The current clang behavior causes false alarms for valid use cases.
This patch lets clang always defer diagnostics for wrong-sided
references.
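A minimal sketch (hypothetical names): the wrong-sided call below becomes an
error only if d() is actually emitted for the device:

  __host__ int h();

  inline __device__ int d() {
    return h();  // wrong-sided reference; deferred, so it is diagnosed only
                 // if d() ends up being emitted in device compilation
  }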
Differential Revision: https://reviews.llvm.org/D83893
This patch lets lambdas be host device by default and adds a diagnostic for
capturing a host variable by reference in a device lambda.
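For illustration (hypothetical code), the newly diagnosed capture:

  void host_fn() {
    int x = 0;
    auto l = [&x] __device__ () { return x; };  // diagnosed: a device lambda
                                                // captures the host variable
                                                // 'x' by reference
  }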
Differential Revision: https://reviews.llvm.org/D78655
This reverts commit 263390d4f5.
This can still cause bogus errors:
eigen3/Eigen/src/Core/CoreEvaluators.h:94:38: error: call to implicitly-deleted copy constructor of 'unary_evaluator<Eigen::Inverse<Eigen::Matrix<double, 4, 4, 0, 4, 4>>>'
thrust/system/detail/generic/for_each.h:49:3: error: implicit instantiation of undefined template
'thrust::detail::STATIC_ASSERTION_FAILURE<false>'
Recommit e03394c6a6 with a fix.
When an implicit HD function calls a function in device compilation,
and one candidate is an implicit HD function, the current resolution rule is:
- D wins over HD and H
- HD and H are equal
This caused a regression when there is an otherwise-worse D candidate.
This patch changes that so that D, HD and H are all equal.
The rationale is that we already know for host compilation there is
a valid candidate among the HD and H candidates that will not cause an error.
Allowing HD and H gives us a fallback candidate that will not cause an error.
If D wins, it must be a better match by the other criteria, and therefore
should also be a valid candidate that will not cause an error. In this way, we
can guarantee no regression.
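A hedged sketch (hypothetical overloads) of the regression being avoided:

  __device__ int f(double);  // D candidate, a worse match for f(1)
  __host__   int f(int);     // H candidate, the better match

  constexpr int g() {  // an implicit HD function
    return f(1);  // old rule: D beats H in device compilation, so the worse
                  // f(double) wins; new rule: D, HD and H are equal, so the
                  // better-matching f(int) wins on both sides
  }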
Differential Revision: https://reviews.llvm.org/D80450
constexpr variables are compile-time constants and implicitly const, therefore
they are safe to emit on both the device and host sides. Besides, in many cases
they are intended for both device and host, therefore it makes sense
to emit them on both sides if necessary.
In most cases constexpr variables are used as rvalues and the variables
themselves do not need to be emitted. However, if their address is taken,
they need to be emitted.
For C++14, clang is able to handle that, since clang emits them with
available_externally linkage together with the initializer.
For C++17, however, constexpr static data members of a class or class template
implicitly become inline variables. They are therefore definitions with
linkonce_odr or weak_odr linkage, and as such they cannot have
available_externally linkage.
This patch fixes that by adding an implicit constant attribute to
file-scope constexpr variables and constexpr static data members
in device compilation.
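A minimal sketch (hypothetical names) of the C++17 case this addresses:

  struct A {
    static constexpr int x = 42;  // C++17: implicitly an inline variable,
                                  // hence a linkonce_odr definition; now it
                                  // gets an implicit __constant__ attribute
                                  // in device compilation
  };

  __global__ void k(const int **p) {
    *p = &A::x;  // address taken: x must actually be emitted on the device
  }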
Differential Revision: https://reviews.llvm.org/D79237
Recommit c77a4078e0 with a fix.
https://reviews.llvm.org/D77954 caused regressions due to diagnostics in implicit
host device functions.
For now, the most feasible workaround seems to be to treat implicit and explicit host
device functions differently. Basically, in device compilation, keep the old behavior for
implicit host device functions, i.e. give host device candidates and wrong-sided candidates
equal preference; for explicit host device functions, favor host device candidates over
wrong-sided candidates.
The rationale is that explicit host device functions are blessed by the user as valid host
device functions, that is, they should not cause diagnostics in either host or device
compilation, and if diagnostics occur, the user is able to fix them. However, there is no
guarantee that an implicit host device function can be compiled in device compilation,
therefore we need to preserve its overload resolution in device compilation.
Differential Revision: https://reviews.llvm.org/D79526
A union ctor does not call the ctors of its data members, and a union dtor does not call
the dtors of its data members. A union also has no base classes.
Currently, when clang checks whether a union has an empty ctor/dtor, it checks the
ctors/dtors of its data members. This causes device-side global variables and shared
variables to be incorrectly diagnosed as having non-empty ctors/dtors.
This patch fixes that.
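A minimal sketch (hypothetical types) of the corrected analysis:

  struct S {
    int n;
    S() : n(1) {}    // non-empty ctor
    ~S() { n = 0; }  // non-empty dtor
  };

  union U {
    S s;
    U() {}   // empty: does not call S::S()
    ~U() {}  // empty: does not call S::~S()
  };

  __device__ U u;  // previously misdiagnosed as having non-empty ctor/dtor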
Differential Revision: https://reviews.llvm.org/D79367
https://reviews.llvm.org/D77954 caused a regression: ambiguity of operator new
at file scope.
This patch restores the previous behavior for comparisons without a caller.
This is a workaround. The real fix needs D71227.
https://reviews.llvm.org/D78970
Currently clang fails to compile the following CUDA program in device compilation:
  __host__ int foo(int x) {
    return 1;
  }

  template<class T>
  __device__ __host__ int foo(T x) {
    return 2;
  }

  __device__ __host__ int bar() {
    return foo(1);
  }

  __global__ void test(int *a) {
    *a = bar();
  }
This is because foo is resolved to the __host__ foo instead of the __device__ __host__ foo.
This seems to be a bug, since the __device__ __host__ foo is a viable callee whereas
clang is unable to choose it.
This patch fixes that.
Differential Revision: https://reviews.llvm.org/D77954
Summary:
- Use `device_builtin_surface` and `device_builtin_texture` for
  surface/texture reference support. So far, both the host and device
  use the same reference type, which could be revised later when the
  interface/implementation is stabilized.
Reviewers: yaxunl
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D77583
Move function emitDeferredDiags from Sema to DeferredDiagsEmitter since it
is only used by DeferredDiagsEmitter.
Also skip visited functions to avoid exponential compile time.
Differential Revision: https://reviews.llvm.org/D77028
Summary:
- Even though the bindless surface/texture interfaces are promoted,
there are still code using surface/texture references. For example,
[PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the
compilation issue for code using `tex2D` with texture references. For
better compatibility, this patch proposes the support of
surface/texture references.
- Due to the absent documentation and magic headers, it's believed that
`nvcc` does use builtins for texture support. From the limited NVVM
documentation[^nvvm] and NVPTX backend texture/surface related
tests[^test], it's believed that surface/texture references are
supported by replacing their reference types, which are annotated with
`device_builtin_surface_type`/`device_builtin_texture_type`, with the
corresponding handle-like object types, `cudaSurfaceObject_t` or
`cudaTextureObject_t`, in the device-side compilation. On the host
side, those global handle variables are registered and will be
established and updated later when the corresponding binding/unbinding
APIs are called[^bind]. Surface/texture references thus behave much like
device global variables, but are represented with different types on the
host and device sides.
- In this patch, the following changes are proposed to support that
behavior:
+ Refine `device_builtin_surface_type` and
`device_builtin_texture_type` attributes to be applied on `Type`
decl only to check whether a variable is of the surface/texture
reference type.
+ Add hooks in code generation to replace those reference types with
the corresponding object types, as well as all accesses to them. In
particular, `nvvm.texsurf.handle.internal` should be used to load
object handles from global reference variables[^texsurf], as well as
metadata annotations; see the sketch after this list.
+ Generate host-side registration with proper template argument
parsing.
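A minimal sketch (assuming the classic CUDA texture reference API) of the
code this enables:

  texture<float, 2, cudaReadModeElementType> tex;  // reference type annotated
                                                   // with device_builtin_texture_type

  __global__ void k(float *out, float x, float y) {
    // In device compilation, 'tex' is lowered to a cudaTextureObject_t handle
    // loaded via nvvm.texsurf.handle.internal.
    *out = tex2D(tex, x, y);
  }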
---
[^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf
[^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll
[^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf).
[^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later.
Reviewers: tra, rjmccall, yaxunl, a.sidorin
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76365
This patch removes the explicit call graph for CUDA/HIP/OpenMP deferred
diagnostics generated during parsing, since it is error-prone due to
incomplete information about function declarations during parsing. Instead,
this patch does a post-parsing AST traversal and emits deferred diagnostics
based on the use graph implicitly generated during the traversal.
Differential Revision: https://reviews.llvm.org/D70172
The norecurse function attribute indicates that the function is not called
recursively, directly or indirectly.
Add norecurse to OpenCL functions and SYCL functions in device compilation,
and to CUDA/HIP kernels.
Although there is an LLVM pass that adds norecurse to functions, it only works
for whole-program compilation. Also, the frontend adding norecurse can make that
pass run faster, since functions already marked norecurse do not need to be
checked again.
Differential Revision: https://reviews.llvm.org/D73651
A CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to the host
compilation to take advantage of multi-threaded computation.
CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed, removing them
once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions for
determining whether a function is sure to be emitted.
To check host/device correctness for CUDA/HIP when -fopenmp is enabled, there needs to be a unified
logic to determine whether a function is to be emitted. The logic needs to be aware of both the CUDA
and OpenMP logic.
Differential Revision: https://reviews.llvm.org/D67837
llvm-svn: 374263
This matches how GCC handles it; see e.g. https://gcc.godbolt.org/z/HPplnl.
GCC documents the gnu_inline attribute with "In C++, this attribute does
not depend on extern in any way, but it still requires the inline keyword
to enable its special behavior."
The previous behaviour of gnu_inline in C++, without the extern
keyword, can be traced back to the original commit that added
support for gnu_inline, SVN r69045.
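A minimal sketch (hypothetical function) of the semantics in C++ after this
change: the inline keyword is required, but extern is not.

  // The body is available for inlining only; no out-of-line definition is
  // emitted, and non-inlined calls resolve to an external definition.
  inline __attribute__((gnu_inline)) int twice(int x) { return 2 * x; }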
Differential Revision: https://reviews.llvm.org/D67414
llvm-svn: 373078
Summary:
- Even though only `void` is still accepted as the deduced return type,
enabling deduction/instantiation on the return type allows more
consistent coding.
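Assuming this concerns kernel return type deduction, a minimal sketch:

  __global__ auto kernel(int *p) {  // deduced return type; must deduce to void
    *p = 42;
  }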
Reviewers: tra, jlebar
Subscribers: cfe-commits, yaxunl
Tags: #clang
Differential Revision: https://reviews.llvm.org/D68031
llvm-svn: 372898