llvm-project

Commit Graph

Author	SHA1	Message	Date
Fangrui Song	dfc0d94755	Revert D80450 "[CUDA][HIP] Fix implicit HD function resolution" This reverts commit `263390d4f5`. This can still cause bogus errors: eigen3/Eigen/src/Core/CoreEvaluators.h:94:38: error: call to implicitly-deleted copy constructor of 'unary_evaluator<Eigen::Inverse<Eigen::Matrix<double, 4, 4, 0, 4, 4>>>' thrust/system/detail/generic/for_each.h:49:3: error: implicit instantiation of undefined template 'thrust::detail::STATIC_ASSERTION_FAILURE<false>'	2020-06-10 17:42:28 -07:00
Yaxun (Sam) Liu	263390d4f5	[CUDA][HIP] Fix implicit HD function resolution recommit `e03394c6a6` with fix. When an implicit HD function calls a function in device compilation, if one candidate is an implicit HD function, the current resolution rule is: D wins over HD and H; HD and H are equal. This caused a regression when there is an otherwise worse D candidate. This patch changes that to: D, HD and H are all equal. The rationale is that we already know for host compilation there is already a valid candidate in HD and H candidates that will not cause error. Allowing HD and H gives us a fall back candidate that will not cause error. If D wins, that means D has to be a better match otherwise, therefore D should also be a valid candidate that will not cause error. In this way, we can guarantee no regression. Differential Revision: https://reviews.llvm.org/D80450	2020-06-04 16:54:52 -04:00
Yaxun (Sam) Liu	049d860707	[CUDA][HIP] Fix constexpr variables for C++17 constexpr variables are compile time constants and implicitly const, therefore they are safe to emit on both device and host side. Besides, in many cases they are intended for both device and host, therefore it makes sense to emit them on both device and host sides if necessary. In most cases constexpr variables are used as rvalue and the variables themselves do not need to be emitted. However if their address is taken, then they need to be emitted. For C++14, clang is able to handle that since clang emits them with available_externally linkage together with the initializer. However for C++17, the constexpr static data member of a class or template class become inline variables implicitly. Therefore they become definitions with linkonce_odr or weak_odr linkages. As such, they can not have available_externally linkage. This patch fixes that by adding implicit constant attribute to file scope constexpr variables and constexpr static data members in device compilation. Differential Revision: https://reviews.llvm.org/D79237	2020-06-03 21:56:52 -04:00
Artem Belevich	ef649e8fd5	Revert "[CUDA][HIP] Workaround for resolving host device function against wrong-sided function" Still breaks CUDA compilation. This reverts commit `e03394c6a6`.	2020-05-18 12:22:55 -07:00
Yaxun (Sam) Liu	e03394c6a6	[CUDA][HIP] Workaround for resolving host device function against wrong-sided function recommit `c77a4078e0` with fix https://reviews.llvm.org/D77954 caused regressions due to diagnostics in implicit host device functions. For now, it seems the most feasible workaround is to treat implicit host device function and explicit host device function differently. Basically in device compilation for implicit host device functions, keep the old behavior, i.e. give host device candidates and wrong-sided candidates equal preference. For explicit host device functions, favor host device candidates against wrong-sided candidates. The rationale is that explicit host device functions are blessed by the user to be valid host device functions, that is, they should not cause diagnostics in both host and device compilation. If diagnostics occur, user is able to fix them. However, there is no guarantee that implicit host device function can be compiled in device compilation, therefore we need to preserve its overloading resolution in device compilation. Differential Revision: https://reviews.llvm.org/D79526	2020-05-12 08:27:50 -04:00
Artem Belevich	bf6a26b066	Revert D77954 -- it breaks Eigen & Tensorflow. This reverts commit `55bcb96f31`.	2020-05-05 14:07:31 -07:00
Yaxun (Sam) Liu	d75a6e93ae	[CUDA][HIP] Fix empty ctor/dtor check for union union ctor does not call ctors of its data members. union dtor does not call dtors of its data members. Also union does not have base class. Currently when clang checks whether union has an empty ctor/dtor, it checks the ctors/dtors of its data members. This causes clang to incorrectly diagnose device side global variables and shared variables as having non-empty ctors/dtors. This patch fixes that. Differential Revision: https://reviews.llvm.org/D79367	2020-05-04 21:52:04 -04:00
Yaxun (Sam) Liu	55bcb96f31	recommit `c77a4078e0` with fix https://reviews.llvm.org/D77954 caused a regression about ambiguity of new operator in file scope. This patch recovered the previous behavior for comparison without a caller. This is a workaround. For real fix we need D71227 https://reviews.llvm.org/D78970	2020-04-28 09:14:13 -04:00
Dmitri Gribenko	8c8aae852b	Revert "recommit c77a4078e01033aa2206c31a579d217c8a07569b" This reverts commit `b46b1a916d`. It broke overload resolution for operator 'new' -- see reproducer in https://reviews.llvm.org/D77954.	2020-04-27 16:41:35 +02:00
Yaxun (Sam) Liu	b46b1a916d	recommit `c77a4078e0`	2020-04-24 16:53:18 -04:00
Yaxun (Sam) Liu	7eae00477f	Revert "[CUDA][HIP] Fix host/device based overload resolution" This reverts commit `c77a4078e0`.	2020-04-24 14:57:10 -04:00
Yaxun (Sam) Liu	c77a4078e0	[CUDA][HIP] Fix host/device based overload resolution Currently clang fails to compile the following CUDA program in device compilation: __host__ int foo(int x) { return 1; } template<class T> __device__ __host__ int foo(T x) { return 2; } __device__ __host__ int bar() { return foo(1); } __global__ void test(int a) { a = bar(); } This is due to foo is resolved to the __host__ foo instead of __device__ __host__ foo. This seems to be a bug since __device__ __host__ foo is a viable callee for foo whereas clang is unable to choose it. This patch fixes that. Differential Revision: https://reviews.llvm.org/D77954	2020-04-24 14:55:18 -04:00
Michael Liao	86e3b735cd	[hip] Claim builtin type `__float128` supported if the host target supports it. Reviewers: tra, yaxunl Subscribers: jvesely, nhaehnle, kerbowa, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D78513	2020-04-21 15:56:40 -04:00
Michael Liao	c97be2c377	[hip] Remove `hip_pinned_shadow`. Summary: - Use `device_builtin_surface` and `device_builtin_texture` for surface/texture reference support. So far, both the host and device use the same reference type, which could be revised later when interface/implementation is stabilized. Reviewers: yaxunl Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77583	2020-04-07 09:51:49 -04:00
Yaxun (Sam) Liu	2c31aa2de1	Speed up deferred diagnostic emitter Move function emitDeferredDiags from Sema to DeferredDiagsEmitter since it is only used by DeferredDiagsEmitter. Also skip visited functions to avoid exponential compile time. Differential Revision: https://reviews.llvm.org/D77028	2020-04-06 13:07:43 -04:00
Michael Liao	5be9b8cbe2	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Re-commit after fix Sema checks on partial template specialization. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 17:18:49 -04:00
Artem Belevich	fe8063e1a0	Revert "[cuda][hip] Add CUDA builtin surface/texture reference support." This reverts commit `6a9ad5f3f4`. The patch breaks CUDA compilation. Differential Revision: https://reviews.llvm.org/D76365	2020-03-27 10:01:38 -07:00
Michael Liao	6a9ad5f3f4	[cuda][hip] Add CUDA builtin surface/texture reference support. Summary: - Even though the bindless surface/texture interfaces are promoted, there are still code using surface/texture references. For example, [PR#26400](https://bugs.llvm.org/show_bug.cgi?id=26400) reports the compilation issue for code using `tex2D` with texture references. For better compatibility, this patch proposes the support of surface/texture references. - Due to the absent documentation and magic headers, it's believed that `nvcc` does use builtins for texture support. From the limited NVVM documentation[^nvvm] and NVPTX backend texture/surface related tests[^test], it's believed that surface/texture references are supported by replacing their reference types, which are annotated with `device_builtin_surface_type`/`device_builtin_texture_type`, with the corresponding handle-like object types, `cudaSurfaceObject_t` or `cudaTextureObject_t`, in the device-side compilation. On the host side, that global handle variables are registered and will be established and updated later when corresponding binding/unbinding APIs are called[^bind]. Surface/texture references are most like device global variables but represented in different types on the host and device sides. - In this patch, the following changes are proposed to support that behavior: + Refine `device_builtin_surface_type` and `device_builtin_texture_type` attributes to be applied on `Type` decl only to check whether a variable is of the surface/texture reference type. + Add hooks in code generation to replace that reference types with the correponding object types as well as all accesses to them. In particular, `nvvm.texsurf.handle.internal` should be used to load object handles from global reference variables[^texsurf] as well as metadata annotations. + Generate host-side registration with proper template argument parsing. 
--- [^nvvm]: https://docs.nvidia.com/cuda/pdf/NVVM_IR_Specification.pdf [^test]: https://raw.githubusercontent.com/llvm/llvm-project/master/llvm/test/CodeGen/NVPTX/tex-read-cuda.ll [^bind]: See section 3.2.11.1.2 `Texture reference API` in [CUDA C Programming Guide](https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf). [^texsurf]: According to NVVM IR, `nvvm.texsurf.handle` should be used. But, the current backend doesn't have that supported. We may revise that later. Reviewers: tra, rjmccall, yaxunl, a.sidorin Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76365	2020-03-26 14:44:52 -04:00
Yaxun (Sam) Liu	b670ab7b6b	recommit `1b978ddba0` [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese Differential Revision: https://reviews.llvm.org/D70172	2020-03-23 12:09:07 -04:00
Yaxun (Sam) Liu	bcadb1f2e6	Revert "[CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese" This reverts commit `1b978ddba0`.	2020-02-18 14:45:34 -05:00
Yaxun (Sam) Liu	1b978ddba0	[CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese This patch removes the explicit call graph for CUDA/HIP/OpenMP deferred diagnostics generated during parsing since it is error prone due to incomplete information about function declarations during parsing. Instead, this patch does a post-parsing AST traverse and emits deferred diagnostics based on the use graph implicitly generated during the traverse. Differential Revision: https://reviews.llvm.org/D70172	2020-02-16 22:44:33 -05:00
Yaxun (Sam) Liu	fb44b9db95	[OpenCL][CUDA][HIP][SYCL] Add norecurse norecurse function attr indicates the function is not called recursively directly or indirectly. Add norecurse to OpenCL functions, SYCL functions in device compilation and CUDA/HIP kernels. Although there is LLVM pass adding norecurse to functions, it only works for whole-program compilation. Also FE adding norecurse can make that pass run faster since functions with norecurse do not need to be checked again. Differential Revision: https://reviews.llvm.org/D73651	2020-02-16 20:41:00 -05:00
Yaxun Liu	c1f8e04eee	[CUDA][HIP] Add a test for constexpr default ctor Differential Revision: https://reviews.llvm.org/D68753 llvm-svn: 374502	2019-10-11 02:43:28 +00:00
Yaxun Liu	229c78d3a5	[CUDA][HIP] Fix host/device check with -fopenmp CUDA/HIP program may be compiled with -fopenmp. In this case, -fopenmp is only passed to host compilation to take advantage of multi-threaded computation. CUDA/HIP and OpenMP both use Sema::DeviceCallGraph to store functions to be analyzed and remove them once they decide the function is sure to be emitted. CUDA/HIP and OpenMP have different functions to determine if a function is sure to be emitted. To check host/device correctly for CUDA/HIP when -fopenmp is enabled, a unified logic is needed to determine whether a function is to be emitted. The logic needs to be aware of both CUDA and OpenMP logic. Differential Revision: https://reviews.llvm.org/D67837 llvm-svn: 374263	2019-10-09 23:54:10 +00:00
Martin Storsjo	71decf841c	[clang] [AST] Treat "inline gnu_inline" the same way as "extern inline gnu_inline" in C++ mode This matches how GCC handles it, see e.g. https://gcc.godbolt.org/z/HPplnl. GCC documents the gnu_inline attribute with "In C++, this attribute does not depend on extern in any way, but it still requires the inline keyword to enable its special behavior." The previous behaviour of gnu_inline in C++, without the extern keyword, can be traced back to the original commit that added support for gnu_inline, SVN r69045. Differential Revision: https://reviews.llvm.org/D67414 llvm-svn: 373078	2019-09-27 12:25:19 +00:00
Michael Liao	24337db616	[CUDA][HIP] Enable kernel function return type deduction. Summary: - Even though only `void` is still accepted as the deduced return type, enabling deduction/instantiation on the return type allows more consistent coding. Reviewers: tra, jlebar Subscribers: cfe-commits, yaxunl Tags: #clang Differential Revision: https://reviews.llvm.org/D68031 llvm-svn: 372898	2019-09-25 16:51:45 +00:00
Yaxun Liu	e5d17c511f	[CUDA][HIP] Fix hostness of defaulted constructor Clang does not respect the explicit device host attributes of defaulted special members. Also clang does not respect the hostness of special members determined by their first declarations. Clang also adds duplicate implicit device or host attributes in certain cases. This patch fixes that. Differential Revision: https://reviews.llvm.org/D67509 llvm-svn: 372394	2019-09-20 14:28:09 +00:00
Michael Liao	b8fc6a9116	[CUDA][HIP] Re-apply part of r372318. - r372318 causes violation of `use-of-uninitialized-value` detected by MemorySanitizer. Once `Viable` field is set to false, `FailureKind` needs setting as well as it will be checked during destruction if `Viable` is not true. - Revert the part trying to skip `std::vector` erasing. llvm-svn: 372356	2019-09-19 21:26:18 +00:00
Mitch Phillips	08f938bd1a	Revert "[CUDA][HIP] Fix typo in `BestViableFunction`" Broke the msan buildbots (see comments on rL372318 for more details). This reverts commit `eb231d1582`. llvm-svn: 372353	2019-09-19 21:11:28 +00:00
Michael Liao	eb231d1582	[CUDA][HIP] Fix typo in `BestViableFunction` Summary: - Should consider viable ones only when checking SameSide candidates. - Replace erasing with clearing viable flag to reduce data moving/copying. - Add one and revise another one as the diagnostic message are more relevant compared to previous one. Reviewers: tra Subscribers: cfe-commits, yaxunl Tags: #clang Differential Revision: https://reviews.llvm.org/D67730 llvm-svn: 372318	2019-09-19 13:14:03 +00:00
Sunil Srivastava	85d667fcb6	Renamed and changed the wording of warn_cconv_ignored As discussed in D64780 the wording of this warning message is being changed to say 'is not supported' instead of 'ignored', and the diag ID itself is being changed to warn_cconv_not_supported. llvm-svn: 366368	2019-07-17 20:41:26 +00:00
Yaxun Liu	c3dfe9082b	[HIP] Support attribute hip_pinned_shadow This patch introduces support of hip_pinned_shadow variable for HIP. A hip_pinned_shadow variable is a global variable with attribute hip_pinned_shadow. It has external linkage on device side and has no initializer. It has internal linkage on host side and has initializer or static constructor. It can be accessed in both device code and host code. This allows HIP runtime to implement support of HIP texture reference. Differential Revision: https://reviews.llvm.org/D62738 llvm-svn: 364381	2019-06-26 03:47:37 +00:00
Erich Keane	505427cb2f	Permit redeclarations of a builtin to specify calling convention. After https://reviews.llvm.org/rL355317 we noticed that quite a decent amount of code redeclares builtins (memcpy in particular, I believe reduced from an MSVC header) with a calling convention specified. This gets particularly troublesome when the user specifies a new 'default' calling convention on the command line. When looking to add a diagnostic for this case, it was noticed that we had 3 other diagnostics that differed only slightly. This patch ALSO unifies those under a 'select'. Unfortunately, the order of words in ONE of these diagnostics was reversed ("'thiscall' calling convention" vs "calling convention 'thiscall'"), so this patch also standardizes on the former. Differential Revision: https://reviews.llvm.org/D59560 Change-Id: I79f99fe7c2301640755ffdd774b46eb44526bb22 llvm-svn: 356663	2019-03-21 13:30:56 +00:00
Yaxun Liu	c5be267003	[CUDA][HIP][Sema] Fix template kernel with function as template parameter If a kernel template has a function as its template parameter, a device function should be allowed as template argument since a kernel can call a device function. However, currently if the kernel template is instantiated in a host function, clang will emit an error message saying the device function is an invalid candidate for the template parameter. This happens because clang checks the reference to the device function during parsing the template arguments. At this point, the template is not instantiated yet. Clang incorrectly assumes the device function is called by the host function and emits the error message. This patch fixes the issue by disabling checking of device function during parsing template arguments and deferring the check to the instantiation of the template. At that point, the template decl is already available, therefore the check can be done against the instantiated function template decl. Differential Revision: https://reviews.llvm.org/D56411 llvm-svn: 355421	2019-03-05 18:19:35 +00:00
Yaxun Liu	fa49c3a888	[CUDA][HIP] Check calling convention based on function target MSVC header files using vectorcall to differentiate overloaded functions, which causes failure for AMDGPU target. This is because clang does not check function calling convention based on function target. This patch checks calling convention using the proper target info. Differential Revision: https://reviews.llvm.org/D57716 llvm-svn: 354929	2019-02-26 22:24:49 +00:00
Alexey Bataev	305b6b9647	[OPENMP][CUDA]Do not emit warnings for variables in late-reported asm statements. If the assembler instruction is not generated and the delayed diagnostic is emitted, we may end up with extra warning message for variables used in the asm statement. Since the asm statement is not built, the variables may be left non-referenced and it may produce a warning about a use of the non-initialized variables. llvm-svn: 354928	2019-02-26 21:51:16 +00:00
Michael Liao	7557afa000	[AMDGPU] Allow using integral non-type template parameters Summary: - Allow using integral non-type template parameters in the following attributes __attribute__((amdgpu_flat_work_group_size(<min>, <max>))) __attribute__((amdgpu_waves_per_eu(<min>[, <max>]))) Reviewers: kzhuravl, yaxunl Subscribers: jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58623 llvm-svn: 354909	2019-02-26 18:49:36 +00:00
Alexey Bataev	e69f94e022	[OPENMP] Delayed diagnostics for VLA support. Generalized processing of the deferred diagnostics for OpenMP/CUDA code. llvm-svn: 354690	2019-02-22 20:36:10 +00:00
Alexey Bataev	bbd5c55c66	Revert "[OPENMP] Delayed diagnostics for VLA support." This reverts commit r354679 to fix the problem with the Windows buildbots llvm-svn: 354680	2019-02-22 17:16:50 +00:00
Alexey Bataev	b09bcf8efd	[OPENMP] Delayed diagnostics for VLA support. Generalized processing of the deferred diagnostics for OpenMP/CUDA code. llvm-svn: 354679	2019-02-22 16:49:13 +00:00
Alexey Bataev	3167b3035e	[CUDA]Delayed diagnostics for the asm instructions. Adapted targetDiag for the CUDA and used for the delayed diagnostics in asm constructs. Works for both host and device compilation sides. Differential Revision: https://reviews.llvm.org/D58463 llvm-svn: 354671	2019-02-22 14:42:48 +00:00
Alexey Bataev	12a21e4b69	Revert "[CUDA]Delayed diagnostics for the asm instructions." This reverts commit r354593 to fix the problem with the crash on windows. llvm-svn: 354596	2019-02-21 16:40:21 +00:00
Alexey Bataev	16d3e1a4d2	[CUDA]Delayed diagnostics for the asm instructions. Summary: Adapted targetDiag for the CUDA and used for the delayed diagnostics in asm constructs. Works for both host and device compilation sides. Reviewers: tra, jlebar Subscribers: jdoerfert, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D58463 llvm-svn: 354593	2019-02-21 15:51:30 +00:00
Artem Belevich	c62214da3d	[CUDA] add support for the new kernel launch API in CUDA-9.2+. Instead of calling CUDA runtime to arrange function arguments, the new API constructs arguments in a local array and the kernels are launched with __cudaLaunchKernel(). The old API has been deprecated and is expected to go away in the next CUDA release. Differential Revision: https://reviews.llvm.org/D57488 llvm-svn: 352799	2019-01-31 21:34:03 +00:00
Yaxun Liu	95f2ca541f	[HIP] Fix size_t for MSVC environment In 64 bit MSVC environment size_t is defined as unsigned long long. In single source language like HIP, data layout should be consistent in device and host compilation, therefore copy data layout controlling fields from Aux target for AMDGPU target. Differential Revision: https://reviews.llvm.org/D56318 llvm-svn: 352620	2019-01-30 12:26:54 +00:00
Yaxun Liu	d442500f5d	[CUDA][HIP] Do not diagnose use of _Float16 r352221 caused regressions in CUDA/HIP since device function may use _Float16 whereas host does not support it. In this case host compilation should not diagnose usage of _Float16 in device functions or variables. For now just do not diagnose _Float16 for CUDA/HIP. In the future we should have more precise check. Differential Revision: https://reviews.llvm.org/D57369 llvm-svn: 352488	2019-01-29 13:20:23 +00:00
Yaxun Liu	a461174cfd	[CUDA][HIP] Fix ShouldDeleteSpecialMember for inherited constructors ShouldDeleteSpecialMember is called upon inherited constructors. It calls inferCUDATargetForImplicitSpecialMember. Normally the special member enum passed to ShouldDeleteSpecialMember matches the constructor. However this is not true when inherited constructor is passed, where DefaultConstructor is passed to treat the inherited constructor as DefaultConstructor. However inferCUDATargetForImplicitSpecialMember expects the special member enum argument to match the constructor, which results in assertion when this expectation is not satisfied. This patch checks whether the constructor is inherited. If true it will get the real special member enum for the constructor and pass it to inferCUDATargetForImplicitSpecialMember. Differential Revision: https://reviews.llvm.org/D51809 llvm-svn: 344057	2018-10-09 15:53:14 +00:00
Yaxun Liu	9767089d00	[HIP] Support early finalization of device code for -fno-gpu-rdc This patch renames -f{no-}cuda-rdc to -f{no-}gpu-rdc and keeps the original options as aliases. When -fgpu-rdc is off, clang will assume the device code in each translation unit does not call external functions except those in the device library, therefore it is possible to compile the device code in each translation unit to self-contained kernels and embed them in the host object, so that the host object behaves like usual host object which can be linked by lld. The benefits of this feature are: 1. allow users to create static libraries which can be linked by host linker; 2. amortized device code linking time. This patch modifies HIP action builder to insert actions for linking device code and generating HIP fatbin, and pass HIP fatbin to host backend action. It extracts code for constructing command for generating HIP fatbin as a function so that it can be reused by early finalization. It also modifies codegen of HIP host constructor functions to embed the device fatbin when it is available. Differential Revision: https://reviews.llvm.org/D52377 llvm-svn: 343611	2018-10-02 17:48:54 +00:00
Richard Smith	9b2c5e7c44	[cxx2a] P0641R2: (Some) type mismatches on defaulted functions only render the function deleted instead of rendering the program ill-formed. This change also adds an enabled-by-default warning for the case where an explicitly-defaulted special member function of a non-template class is implicitly deleted by the type checking rules. (This fires either due to this language change or due to pre-C++20 reasons for the member being implicitly deleted). I've tested this on a large codebase and found only bugs (where the program means something that's clearly different from what the programmer intended), so this is enabled by default, but we should revisit this if there are problems with this being enabled by default. llvm-svn: 343285	2018-09-28 01:16:43 +00:00
Artem Belevich	78929efb4d	[CUDA] Ignore uncallable functions when we check for usual deallocators. Previously clang considered function variants from both sides of compilation and that resulted in picking up wrong deallocation function. Differential Revision: https://reviews.llvm.org/D51808 llvm-svn: 342749	2018-09-21 17:29:33 +00:00

163 Commits