llvm-project

Commit Graph

Author	SHA1	Message	Date
Arpith Chacko Jacob	101e8fb1f3	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared on the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different from that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333	2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob	bd6344c0be	Revert r295319 while investigating buildbot failure. llvm-svn: 295323	2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob	8e170fc857	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared on the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that use these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different from that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319	2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob	99a1e0eba5	[OpenMP] Codegen support for 'target teams' on the host. This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293005	2017-01-25 02:18:43 +00:00
Arpith Chacko Jacob	86f9e46365	Reverting commit because an NVPTX patch sneaked in. Break up into two patches. llvm-svn: 293003	2017-01-25 01:45:59 +00:00
Arpith Chacko Jacob	4dbf368e14	[OpenMP] Codegen support for 'target teams' on the host. This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293001	2017-01-25 01:38:33 +00:00
Arpith Chacko Jacob	fe4890a68b	[OpenMP] Support for the if-clause on the combined directive 'target parallel'. The if-clause on the combined directive potentially applies to both the 'target' and the 'parallel' regions. Codegen'ing the if-clause on the combined directive requires additional support because the expression in the clause must be captured by the 'target' capture statement but not the 'parallel' capture statement. Note that this situation arises for other clauses such as num_threads. The OMPIfClause class inherits OMPClauseWithPreInit to support capturing of expressions in the clause. A member CaptureRegion is added to OMPClauseWithPreInit to indicate which captured statement (in this case 'target' but not 'parallel') captures these expressions. To ensure correct codegen of captured expressions in the presence of combined 'target' directives, OMPParallelScope was added to 'parallel' codegen. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28781 llvm-svn: 292437	2017-01-18 20:40:48 +00:00
Arpith Chacko Jacob	19b911cb75	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292419	2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob	42793e000a	Revert r292374 to debug Windows buildbot failure. llvm-svn: 292400	2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob	68019578a3	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292374	2017-01-18 15:14:52 +00:00
Arpith Chacko Jacob	43a8b7bc8c	[OpenMP] Refactor code that calls codegen for target regions on the device. This patch refactors code that calls codegen for target regions. Currently the codebase only supports the 'target' directive. The patch pulls out common target processing code into a static function that can be called by codegen for any target directive. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28752 llvm-svn: 292134	2017-01-16 15:26:02 +00:00
Malcolm Parsons	c6e4583dbb	Remove unused lambda captures. NFC llvm-svn: 291939	2017-01-13 18:55:32 +00:00
Kelvin Li	da68118729	[OpenMP] Sema and parsing for 'target teams distribute simd' pragma This patch is to implement sema and parsing for 'target teams distribute simd' pragma. Differential Revision: https://reviews.llvm.org/D28252 llvm-svn: 291579	2017-01-10 18:08:18 +00:00
Carlo Bertolli	962bb807ec	[OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation https://reviews.llvm.org/D17840 This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive. Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring to a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared). For this reason, I moved the global declarations into local variables. llvm-svn: 290898	2017-01-03 18:24:42 +00:00
Kelvin Li	1851df563d	[OpenMP] Sema and parsing for 'target teams distribute parallel for simd' pragma This patch is to implement sema and parsing for 'target teams distribute parallel for simd' pragma. Differential Revision: https://reviews.llvm.org/D28202 llvm-svn: 290862	2017-01-03 05:23:48 +00:00
Kelvin Li	80e8f56284	[OpenMP] Sema and parsing for 'target teams distribute parallel for' pragma This patch is to implement sema and parsing for 'target teams distribute parallel for' pragma. Differential Revision: https://reviews.llvm.org/D28160 llvm-svn: 290725	2016-12-29 22:16:30 +00:00
Kelvin Li	26fd21ab80	Fix format. NFC llvm-svn: 290673	2016-12-28 17:57:07 +00:00
Kelvin Li	83c451e998	[OpenMP] Sema and parsing for 'target teams distribute' pragma This patch is to implement sema and parsing for 'target teams distribute' pragma. Differential Revision: https://reviews.llvm.org/D28015 llvm-svn: 290508	2016-12-25 04:52:54 +00:00
Kelvin Li	bf594a5600	[OpenMP] Sema and parsing for 'target teams' pragma This patch is to implement sema and parsing for 'target teams' pragma. Differential Revision: https://reviews.llvm.org/D27818 llvm-svn: 290038	2016-12-17 05:48:59 +00:00
Kelvin Li	51336dd0b4	Fix typo in comment. NFC. llvm-svn: 289836	2016-12-15 17:55:32 +00:00
Kelvin Li	7ade93f5e2	[OpenMP] Sema and parsing for 'teams distribute parallel for' pragma This patch is to implement sema and parsing for 'teams distribute parallel for' pragma. Differential Revision: https://reviews.llvm.org/D27345 llvm-svn: 289179	2016-12-09 03:24:30 +00:00
Kelvin Li	579e41ced2	[OpenMP] Sema and parsing for 'teams distribute parallel for simd' pragma This patch is to implement sema and parsing for 'teams distribute parallel for simd' pragma. Differential Revision: https://reviews.llvm.org/D27084 llvm-svn: 288294	2016-11-30 23:51:03 +00:00
Alexey Bataev	957d856e7e	[OPENMP] Fixed codegen for 'omp cancel' construct. If the 'omp cancel' construct is used in a worksharing construct, the program may hang when a reduction clause is used. The patch fixes this problem by avoiding extra reduction processing for branches that were canceled. llvm-svn: 287227	2016-11-17 15:12:05 +00:00
Vitaly Buka	2d15858e40	Revert "[OPENMP] Fixed codegen for 'omp cancel' construct." Summary: r286944 introduced bugs detected by ASAN as use-after-return. r287025 has not fixed them completely. This reverts commits r286944 and r287025. Reviewers: ABataev Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D26720 llvm-svn: 287069	2016-11-16 01:01:22 +00:00
Alexey Bataev	ba002163c9	[OPENMP] Fix stack use after delete, NFC. Fixed possible use of stack variable after deletion. llvm-svn: 287025	2016-11-15 20:57:18 +00:00
Alexey Bataev	473a3e7fed	[OPENMP] Fixed codegen for 'omp cancel' construct. If the 'omp cancel' construct is used in a worksharing construct, the program may hang when a reduction clause is used. The patch fixes this problem by avoiding extra reduction processing for branches that were canceled. llvm-svn: 286944	2016-11-15 09:11:50 +00:00
Amara Emerson	652795db16	Add the loop end location to the loop metadata. This additional information can be used to improve the locations when generating remarks for loops. Depends on the companion LLVM change r286227. Patch by Florian Hahn. Differential Revision: https://reviews.llvm.org/D25764 llvm-svn: 286456	2016-11-10 14:44:30 +00:00
Alexey Bataev	ac5eabb0b9	[OPENMP] Fixed capturing of VLA variables. After some changes in codegen, capturing of VLA variables in OpenMP regions was broken, causing a compiler crash. This patch fixes the issue. llvm-svn: 286103	2016-11-07 11:16:04 +00:00
Diana Picus	1e2b7e6672	Revert "[OPENMP] Fixed capturing of VLA variables." This reverts commit r286098 because the modified test breaks on many of the buildbots. llvm-svn: 286102	2016-11-07 10:01:43 +00:00
Alexey Bataev	420537fad8	[OPENMP] Fixed capturing of VLA variables. After some changes in codegen, capturing of VLA variables in OpenMP regions was broken, causing a compiler crash. This patch fixes the issue. llvm-svn: 286098	2016-11-07 08:07:25 +00:00
Kelvin Li	4e325f77a9	Re-apply patch r279045. llvm-svn: 285066	2016-10-25 12:50:55 +00:00
Akira Hatanaka	642f799b0d	[CodeGen][ObjC] Do not call objc_storeStrong when initializing a constexpr variable. When compiling a constexpr NSString initialized with an objective-c string literal, CodeGen emits objc_storeStrong on an uninitialized alloca, which causes a crash. This patch folds the code in EmitScalarInit into EmitStoreThroughLValue and fixes the crash by calling objc_retain on the string instead of using objc_storeStrong. rdar://problem/28562009 Differential Revision: https://reviews.llvm.org/D25547 llvm-svn: 284516	2016-10-18 19:05:41 +00:00
Alexey Bataev	2f5ed34279	Fix for PR30639: CGDebugInfo Null dereference with OpenMP array access, by Erich Keane OpenMP creates a variable array type with a null size-expr. The Debug generation failed due to this. This patch corrects the OpenMP implementation, updates the tests, and adds a new one for this condition. Differential Revision: https://reviews.llvm.org/D25373 llvm-svn: 284110	2016-10-13 09:52:46 +00:00
Diana Picus	8b44bbc077	Revert "[OpenMP] Sema and parsing for 'teams distribute simd' pragma" This reverts commit r279003 as it breaks some of our buildbots (e.g. clang-cmake-aarch64-quick, clang-x86_64-linux-selfhost-modules). The error is in OpenMP/teams_distribute_simd_ast_print.cpp: clang: /home/buildslave/buildslave/clang-cmake-aarch64-quick/llvm/include/llvm/ADT/DenseMap.h:527: bool llvm::DenseMapBase<DerivedT, KeyT, ValueT, KeyInfoT, BucketT>::LookupBucketFor(const LookupKeyT&, const BucketT&) const [with LookupKeyT = clang::Stmt; DerivedT = llvm::DenseMap<clang::Stmt, long unsigned int>; KeyT = clang::Stmt; ValueT = long unsigned int; KeyInfoT = llvm::DenseMapInfo<clang::Stmt>; BucketT = llvm::detail::DenseMapPair<clang::Stmt, long unsigned int>]: Assertion `!KeyInfoT::isEqual(Val, EmptyKey) && !KeyInfoT::isEqual(Val, TombstoneKey) && "Empty/Tombstone value shouldn't be inserted into map!"' failed. llvm-svn: 279045	2016-08-18 09:25:07 +00:00
Kelvin Li	0e3bde8216	[OpenMP] Sema and parsing for 'teams distribute simd' pragma This patch is to implement sema and parsing for 'teams distribute simd' pragma. This patch is originated by Carlo Bertolli. Differential Revision: https://reviews.llvm.org/D23528 llvm-svn: 279003	2016-08-17 23:13:03 +00:00
Kelvin Li	0253287633	[OpenMP] Sema and parsing for 'teams distribute' pragma This patch is to implement sema and parsing for 'teams distribute' pragma. Differential Revision: https://reviews.llvm.org/D23189 llvm-svn: 277818	2016-08-05 14:37:37 +00:00
Samuel Antao	cc10b85789	[OpenMP] Codegen for use_device_ptr clause. Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22691 llvm-svn: 276977	2016-07-28 14:23:26 +00:00
Samuel Antao	403ffd409f	[OpenMP] Add support for mapping array sections through pointer references. Summary: This patch fixes a bug in the map of array sections whose base is a reference to a pointer. The existing mapping support was not prepared to deal with it, causing the compiler to crash. Mapping a reference to a pointer has the same characteristics as mapping a regular pointer, i.e., it is passed by value. Therefore, the reference has to be materialized in the target region. Reviewers: hfinkel, carlo.bertolli, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22690 llvm-svn: 276933	2016-07-27 22:49:49 +00:00
Kelvin Li	986330c190	[OpenMP] Sema and parsing for 'target simd' pragma This patch is to implement sema and parsing for 'target simd' pragma. Differential Revision: https://reviews.llvm.org/D22479 llvm-svn: 276203	2016-07-20 22:57:10 +00:00
Alexey Bataev	5140e748b5	[OPENMP] Improved processing of 'priority' clause, NFC. Removed some old comments + improved handling of 'priority' clause value during codegen after comments from Richard Smith. llvm-svn: 275945	2016-07-19 04:21:09 +00:00
Kelvin Li	a579b9196c	[OpenMP] Sema and parsing for 'target parallel for simd' pragma This patch is to implement sema and parsing for 'target parallel for simd' pragma. Differential Revision: http://reviews.llvm.org/D22096 llvm-svn: 275365	2016-07-14 02:54:56 +00:00
Carlo Bertolli	70594e9282	[OpenMP] Initial implementation of parse+sema for OpenMP clause 'is_device_ptr' of target http://reviews.llvm.org/D22070 llvm-svn: 275282	2016-07-13 17:16:49 +00:00
Carlo Bertolli	2404b17192	[OpenMP] Initial implementation of parse+sema for clause use_device_ptr of 'target data' http://reviews.llvm.org/D21904 This patch is similar to the implementation of 'private' clause: it adds a list of private pointers to be used within the target data region to store the device pointers returned by the runtime. Please refer to the following document for a full description of what the runtime will return in this case (pages 10 and 11): https://github.com/clang-omp/OffloadingDesign I am happy to answer any question related to the runtime interface to help review this patch. llvm-svn: 275271	2016-07-13 15:37:16 +00:00
Kelvin Li	787f3fcc6b	[OpenMP] Sema and parsing for 'distribute simd' pragma Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute simd'. Differential Revision: http://reviews.llvm.org/D22007 llvm-svn: 274604	2016-07-06 04:45:38 +00:00
Kelvin Li	4a39add05e	[OpenMP] Sema and parse for 'distribute parallel for simd' Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute parallel for simd'. Differential Revision: http://reviews.llvm.org/D21977 llvm-svn: 274530	2016-07-05 05:00:15 +00:00
Carlo Bertolli	9925f15661	Resubmission of http://reviews.llvm.org/D21564 after fixes. [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' This patch is an initial implementation for #distribute parallel for. The main differences that affect other pragmas are: The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds. As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value. As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound. llvm-svn: 273884	2016-06-27 14:55:37 +00:00
Carlo Bertolli	b8503d5399	Revert r273705 [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' llvm-svn: 273709	2016-06-24 19:20:02 +00:00
Carlo Bertolli	e77d6e0e4d	[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' http://reviews.llvm.org/D21564 This patch is an initial implementation for #distribute parallel for. The main differences that affect other pragmas are: The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds. As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value. As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound. llvm-svn: 273705	2016-06-24 18:53:35 +00:00
Samuel Antao	6d0042642a	Re-apply r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects. An issue in one of the regression tests was fixed for 32-bit hosts. llvm-svn: 272931	2016-06-16 18:39:34 +00:00
Samuel Antao	b1f9501242	Revert r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects. Was causing trouble in one of the regression tests for a 32-bit address space. llvm-svn: 272908	2016-06-16 16:06:22 +00:00
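
The blocking scheme described in the 'distribute parallel for' entries above (r273705, r273884) can be sketched roughly as follows. This is only an illustration under stated assumptions, not clang's generated code: the function names `ensure_upper_bound` and `run_blocked` are hypothetical, the loop is simulated sequentially, and bounds are treated as inclusive. It shows the outer 'distribute' stride being the block size and the inner 'for' bound being computed as UB = min(UB, PrevUB).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: clamp the loop's own upper bound to the
// team-assigned block upper bound, i.e. UB = min(UB, PrevUB).
long ensure_upper_bound(long ub, long prev_ub) {
  return std::min(ub, prev_ub);
}

// Hypothetical sequential simulation of a blocked loop
//   for (i = 0; i < n; i++)
// Each outer step stands in for one block handed to a team; the inner
// loop stands in for the iterations scheduled to threads in that team.
std::vector<long> run_blocked(long n, long block_size) {
  assert(block_size > 0);
  std::vector<long> visited;
  const long global_ub = n - 1;  // inclusive upper bound of the original loop
  // The stride of the outer ('distribute') loop is the block size, not 1.
  for (long lb = 0; lb <= global_ub; lb += block_size) {
    // Block upper bound before clamping; the last block may overshoot.
    const long prev_ub = lb + block_size - 1;
    // Upper bound actually used by the inner ('for') loop.
    const long ub = ensure_upper_bound(global_ub, prev_ub);
    for (long i = lb; i <= ub; ++i) {
      visited.push_back(i);
    }
  }
  return visited;
}
```

Calling `run_blocked(10, 4)` in this sketch walks the blocks [0, 3], [4, 7], and [8, 9]; the clamp is what keeps the last block from running past the original loop bound.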

307 Commits