llvm-project

Commit Graph

Author	SHA1	Message	Date
Arpith Chacko Jacob	fc711b1f47	[OpenMP] Teams reduction on the NVPTX device. This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result. This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29879 llvm-svn: 295335	2017-02-16 16:48:49 +00:00
Arpith Chacko Jacob	101e8fb1f3	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333	2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob	bd6344c0be	Revert r295319 while investigating buildbot failure. llvm-svn: 295323	2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob	8e170fc857	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319	2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob	cca61a3a74	[OpenMP] Codegen support for 'target teams' on the NVPTX device. This is a simple patch to teach OpenMP codegen to emit the construct in Generic mode. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29143 llvm-svn: 293183	2017-01-26 15:43:27 +00:00
Arpith Chacko Jacob	2cd6eeabfd	[OpenMP] Support for the proc_bind-clause on 'target parallel' on the NVPTX device. This patch adds support for the proc_bind clause on the Spmd construct 'target parallel' on the NVPTX device. Since the parallel region is created upon kernel launch, this clause can be safely ignored on the NVPTX device at codegen time for level 0 parallelism. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29128 llvm-svn: 293069	2017-01-25 16:55:10 +00:00
Arpith Chacko Jacob	86f9e46365	Reverting commit because an NVPTX patch sneaked in. Break up into two patches. llvm-svn: 293003	2017-01-25 01:45:59 +00:00
Arpith Chacko Jacob	4dbf368e14	[OpenMP] Codegen support for 'target teams' on the host. This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293001	2017-01-25 01:38:33 +00:00
Arpith Chacko Jacob	e04da5dee2	[OpenMP] Support for the num_threads-clause on 'target parallel' on the NVPTX device. This patch adds support for the Spmd construct 'target parallel' on the NVPTX device. This involves ignoring the num_threads clause on the device since the number of threads in this combined construct is already set on the host through the call to __tgt_target_teams(). Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29083 llvm-svn: 292999	2017-01-25 01:18:34 +00:00
Arpith Chacko Jacob	44a87c9f1b	[OpenMP] Codegen for the 'target parallel' directive on the NVPTX device. This patch adds codegen for the 'target parallel' directive on the NVPTX device. We term offload OpenMP directives such as 'target parallel' and 'target teams distribute parallel for' as SPMD constructs. SPMD constructs, in contrast to Generic ones like the plain 'target', can never contain a serial region. SPMD constructs can be handled more efficiently on the GPU and do not require the Warp Loop of the Generic codegen scheme. This patch adds SPMD codegen support for 'target parallel' on the NVPTX device and can be reused for other SPMD constructs. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28755 llvm-svn: 292428	2017-01-18 19:35:00 +00:00
Arpith Chacko Jacob	19b911cb75	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292419	2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob	42793e000a	Revert r292374 to debug Windows buildbot failure. llvm-svn: 292400	2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob	68019578a3	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292374	2017-01-18 15:14:52 +00:00
Malcolm Parsons	c6e4583dbb	Remove unused lambda captures. NFC llvm-svn: 291939	2017-01-13 18:55:32 +00:00
Arpith Chacko Jacob	bb36fe8dba	[OpenMP] Basic support for a parallel directive in a target region on an NVPTX device Summary: This patch introduces support for the execution of parallel constructs in a target region on the NVPTX device. Parallel regions must be in the lexical scope of the target directive. The master thread in the master warp signals parallel work for worker threads in worker warps on encountering a parallel region. Note: The patch does not yet support capture of arguments in a parallel region so the test cases are simple. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28145 llvm-svn: 291565	2017-01-10 15:42:51 +00:00
Samuel Antao	f83efdb77a	[OpenMP] Add fields for flags in the offload entry descriptor. Summary: This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use. Currently, these fields are only filled with zeros in the current code generation, but that will change when `declare target` is added. The reason, we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031. Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld Subscribers: cfe-commits, caomhin, jholewinski Differential Revision: https://reviews.llvm.org/D28298 llvm-svn: 291124	2017-01-05 16:02:49 +00:00
Arpith Chacko Jacob	406acdba61	[OpenMP] Update target codegen for NVPTX device. This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it. This allows codegen for a target directive and serial execution on the NVPTX device. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28125 llvm-svn: 291121	2017-01-05 15:24:05 +00:00
Arpith Chacko Jacob	b0d96f5375	Reverting commit r290983 while debugging test failure on windows. llvm-svn: 290989	2017-01-04 19:14:43 +00:00
Arpith Chacko Jacob	4a24ad0a81	[OpenMP] Update target codegen for NVPTX device. This patch includes updates for codegen of the target region for the NVPTX device. It moves initializers from the compiler to the runtime and updates the worker loop to assume parallel work is retrieved from the runtime. A subsequent patch will update the codegen to retrieve the parallel work using calls to the runtime. It includes the removal of the inline attribute for the worker loop and disabling debug info in it. This allows codegen for a target directive and serial execution on the NVPTX device. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28125 llvm-svn: 290983	2017-01-04 18:44:50 +00:00
Arpith Chacko Jacob	ccf2f7352f	[OpenMP] Code cleanup for NVPTX OpenMP codegen This patch cleans up private methods for NVPTX OpenMP codegen. It converts private members to static functions to follow the coding style of CGOpenMPRuntime.cpp and declutter the header file. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28124 llvm-svn: 290904	2017-01-03 20:19:56 +00:00
Chandler Carruth	fcd33149b4	Cleanup the handling of noinline function attributes, -fno-inline, -fno-inline-functions, -O0, and optnone. These were really, really tangled together: - We used the noinline LLVM attribute for -fno-inline - But not for -fno-inline-functions (breaking LTO) - But we did use it for -finline-hint-functions (yay, LTO is happy!) - But we didn't for -O0 (LTO is sad yet again...) - We had weird structuring of CodeGenOpts with both an inlining enumeration and a boolean. They interacted in weird ways and needlessly. - A lot of set smashing went on with setting these, and then got worse when we considered optnone and other inlining-effecting attributes. - A bunch of inline affecting attributes were managed in a completely different place from -fno-inline. - Even with -fno-inline we failed to put the LLVM noinline attribute onto many generated function definitions because they didn't show up as AST-level functions. - If you passed -O0 but -finline-functions we would run the normal inliner pass in LLVM despite it being in the O0 pipeline, which really doesn't make much sense. - Lastly, we used things like '-fno-inline' to manipulate the pass pipeline which forced the pass pipeline to be much more parameterizable than it really needs to be. Instead we can just use the optimization level to select a pipeline and control the rest via attributes. Sadly, this causes a bunch of churn in tests because we don't run the optimizer in the tests and check the contents of attribute sets. It would be awesome if attribute sets were a bit more FileCheck friendly, but oh well. I think this is a significant improvement and should remove the semantic need to change what inliner pass we run in order to comply with the requested inlining semantics by relying completely on attributes. It also cleans up tho optnone and related handling a bit. One unfortunate aspect of this is that for generating alwaysinline routines like those in OpenMP we end up removing noinline and then adding alwaysinline. I tried a bunch of other approaches, but because we recompute function attributes from scratch and don't have a declaration here I couldn't find anything substantially cleaner than this. Differential Revision: https://reviews.llvm.org/D28053 llvm-svn: 290398	2016-12-23 01:24:49 +00:00
Alexey Bataev	5e87c3465e	[OPENMP] Fix for PR31417: assert failure when compiling trivial openmp program Offload related code is not quite ready yet, but some simple examples must not crash the compiler. Patch fixes the problem in offloading code with exceptions. llvm-svn: 290364	2016-12-22 19:44:05 +00:00
Carlo Bertolli	c687225b43	[OPENMP] Codegen for teams directive for NVPTX This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it: Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region. Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime. Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region. http://reviews.llvm.org/D17963 llvm-svn: 265304	2016-04-04 15:55:02 +00:00
Alexey Bataev	14fa1c6b60	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264700	2016-03-29 05:34:15 +00:00
Alexey Bataev	f539faa733	Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions." Reverting because of failed tests. llvm-svn: 264577	2016-03-28 12:58:34 +00:00
Alexey Bataev	424be92831	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264576	2016-03-28 12:52:58 +00:00
Alexey Bataev	f662b5943c	Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions." This reverts commit 3ee791165100607178073f14531a0dc90c622b36. llvm-svn: 264570	2016-03-28 10:12:03 +00:00
Alexey Bataev	b8c425c4f7	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264569	2016-03-28 09:53:43 +00:00
Vasileios Kalintiris	e5c095923d	Fix warning about extra semicolon. NFC. llvm-svn: 264035	2016-03-22 10:41:20 +00:00
Arpith Chacko Jacob	5c309e475d	[OpenMP] Base support for target directive codegen on NVPTX device. Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 Reworked test case after buildbot failure on windows. Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp. llvm-svn: 264018	2016-03-22 01:48:56 +00:00
Arpith Chacko Jacob	129fa9a048	Revert r263783 as buildbot failure is being investigated. llvm-svn: 263784	2016-03-18 12:39:40 +00:00
Arpith Chacko Jacob	ac563708ab	[OpenMP] Base support for target directive codegen on NVPTX device. Summary: Reworked test case after buildbot failure on windows. This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263783	2016-03-18 11:47:43 +00:00
Arpith Chacko Jacob	9cb61faa61	Revert commit http://reviews.llvm.org/D17877 to fix tests on x86. llvm-svn: 263589	2016-03-15 21:26:34 +00:00
Arpith Chacko Jacob	5e1493b560	[OpenMP] Base support for target directive codegen on NVPTX device. Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263587	2016-03-15 21:04:57 +00:00
Arpith Chacko Jacob	fc46c25d74	Reverted http://reviews.llvm.org/D17877 to fix tests. llvm-svn: 263555	2016-03-15 16:19:13 +00:00
Arpith Chacko Jacob	c61744c26b	[OpenMP] Base support for target directive codegen on NVPTX device. Summary: This patch adds base support for codegen of the target directive on the NVPTX device. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D17877 llvm-svn: 263552	2016-03-15 15:24:52 +00:00
Alexey Bataev	c5b1d320b8	[OPENMP 4.0] Codegen for 'declare reduction' construct. Emit function for 'combiner' part of 'declare reduction' construct and 'initialilzer' part, if any. llvm-svn: 262699	2016-03-04 09:22:22 +00:00
Samuel Antao	45bfe4cc8a	Re-apply for the 2nd-time r259977 - [OpenMP] Reorganize code to allow specialized code generation for different devices. This was reverted by r260036, but was not the cause of the problem in the buildbot. llvm-svn: 260106	2016-02-08 15:59:20 +00:00
Renato Golin	1cf4c0a6ee	Revert "Re-apply r259977 - [OpenMP] Reorganize code to allow specialized code generation for different devices." This reverts commit r259985, as it still fails one buildbot. llvm-svn: 260036	2016-02-07 15:43:09 +00:00
Samuel Antao	0572837eff	Re-apply r259977 - [OpenMP] Reorganize code to allow specialized code generation for different devices. This was reverted due to a failure in a buildbot, but it turned out the failure was unrelated. llvm-svn: 259985	2016-02-06 06:52:48 +00:00
Samuel Antao	0a1eaf8025	Revert r259977 - [OpenMP] Reorganize code to allow specialized code generation for different devices. It triggered some problem in the configuration related with zlib and exposed in the driver. llvm-svn: 259984	2016-02-06 06:22:46 +00:00
Samuel Antao	3f465c095b	[OpenMP] Reorganize code to allow specialized code generation for different devices. Summary: Different devices may in some cases require different code generation schemes in order to implement OpenMP. This is required not only for performance reasons, but also because it may not be possible to have the current (default) implementation working for these devices. E.g. GPU's cannot implement the same scheme a target such as powerpc or x86b would use, in the sense that it does not have the ability to fork threads, instead all the threads are always executing and need to be managed by the implementation. This patch proposes a reorganization of the code in the OpenMP code generation to pave the way to have specialized implementation of OpenMP support. More than a "real" patch this is more a request for comments in order to understand if what is proposed is acceptable or if there are better/easier ways to do it. In this patch part of the common OpenMP codegen infrastructure is moved to a new file under a new namespace (CGOpenMPCommon) so it can be shared between the default implementation and the specialized one. When CGOpenMPRuntime is created, an attempt to select a specialized implementation is done. In the patch a specialization for nvptx targets is done which currently checks if the target is an OpenMP device and trap if it is not. Let me know comments suggestions you may have. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: Hahnfeld, cfe-commits, fraggamuffin, caomhin, jholewinski Differential Revision: http://reviews.llvm.org/D16784 llvm-svn: 259977	2016-02-06 02:12:34 +00:00

42 Commits