llvm-project

Commit Graph

Author	SHA1	Message	Date
Adam Nemet	484aa45153	Encapsulate FPOptions and use it consistently Sema holds the current FPOptions which is adjusted by 'pragma STDC FP_CONTRACT'. This then gets propagated into expression nodes as they are built. This encapsulates FPOptions so that this propagation happens opaquely rather than directly with the fp_contractable on/off bit. This allows controlled transitioning of fp_contractable to a ternary value (off, on, fast). It will also allow adding more fast-math flags later. This is toward moving fp-contraction=fast from an LLVM TargetOption to a FastMathFlag in order to fix PR25721. Differential Revision: https://reviews.llvm.org/D31166 llvm-svn: 298877	2017-03-27 19:17:25 +00:00
Arpith Chacko Jacob	fc711b1f47	[OpenMP] Teams reduction on the NVPTX device. This patch implements codegen for the reduction clause on any teams construct for elementary data types. It builds on parallel reductions on the GPU. Subsequently, the team master writes to a unique location in a global memory scratchpad. The last team to do so loads and reduces this array to calculate the final result. This patch emits two helper functions that are used by the OpenMP runtime on the GPU to perform reductions across teams. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29879 llvm-svn: 295335	2017-02-16 16:48:49 +00:00
Arpith Chacko Jacob	101e8fb1f3	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295333	2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob	bd6344c0be	Revert r295319 while investigating buildbot failure. llvm-svn: 295323	2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob	8e170fc857	[OpenMP] Parallel reduction on the NVPTX device. This patch implements codegen for the reduction clause on any parallel construct for elementary data types. An efficient implementation requires hierarchical reduction within a warp and a threadblock. It is complicated by the fact that variables declared in the stack of a CUDA thread cannot be shared with other threads. The patch creates a struct to hold reduction variables and a number of helper functions. The OpenMP runtime on the GPU implements reduction algorithms that uses these helper functions to perform reductions within a team. Variables are shared between CUDA threads using shuffle intrinsics. An implementation of reductions on the NVPTX device is substantially different to that of CPUs. However, this patch is written so that there are minimal changes to the rest of OpenMP codegen. The implemented design allows the compiler and runtime to be decoupled, i.e., the runtime does not need to know of the reduction operation(s), the type of the reduction variable(s), or the number of reductions. The design also allows reuse of host codegen, with appropriate specialization for the NVPTX device. While the patch does introduce a number of abstractions, the expected use case calls for inlining of the GPU OpenMP runtime. After inlining and optimizations in LLVM, these abstractions are unwound and performance of OpenMP reductions is comparable to CUDA-canonical code. Patch by Tian Jin in collaboration with Arpith Jacob Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29758 llvm-svn: 295319	2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob	99a1e0eba5	[OpenMP] Codegen support for 'target teams' on the host. This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293005	2017-01-25 02:18:43 +00:00
Arpith Chacko Jacob	86f9e46365	Reverting commit because an NVPTX patch sneaked in. Break up into two patches. llvm-svn: 293003	2017-01-25 01:45:59 +00:00
Arpith Chacko Jacob	4dbf368e14	[OpenMP] Codegen support for 'target teams' on the host. This patch adds support for codegen of 'target teams' on the host. This combined directive has two captured statements, one for the 'teams' region, and the other for the 'parallel'. This target teams region is offloaded using the __tgt_target_teams() call. The patch sets the number of teams as an argument to this call. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D29084 llvm-svn: 293001	2017-01-25 01:38:33 +00:00
Arpith Chacko Jacob	fe4890a68b	[OpenMP] Support for the if-clause on the combined directive 'target parallel'. The if-clause on the combined directive potentially applies to both the 'target' and the 'parallel' regions. Codegen'ing the if-clause on the combined directive requires additional support because the expression in the clause must be captured by the 'target' capture statement but not the 'parallel' capture statement. Note that this situation arises for other clauses such as num_threads. The OMPIfClause class inherits OMPClauseWithPreInit to support capturing of expressions in the clause. A member CaptureRegion is added to OMPClauseWithPreInit to indicate which captured statement (in this case 'target' but not 'parallel') captures these expressions. To ensure correct codegen of captured expressions in the presence of combined 'target' directives, OMPParallelScope was added to 'parallel' codegen. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28781 llvm-svn: 292437	2017-01-18 20:40:48 +00:00
Arpith Chacko Jacob	19b911cb75	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292419	2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob	42793e000a	Revert r292374 to debug Windows buildbot failure. llvm-svn: 292400	2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob	68019578a3	[OpenMP] Codegen support for 'target parallel' on the host. This patch adds support for codegen of 'target parallel' on the host. It is also the first combined directive that requires two or more captured statements. Support for this functionality is included in the patch. A combined directive such as 'target parallel' has two captured statements, one for the 'target' and the other for the 'parallel' region. Two captured statements are required because each has different implicit parameters (see SemaOpenMP.cpp). For example, the 'parallel' has 'global_tid' and 'bound_tid' while the 'target' does not. The patch adds support for handling multiple captured statements based on the combined directive. When codegen'ing the 'target parallel' directive, the 'target' outlined function is created using the outer captured statement and the 'parallel' outlined function is created using the inner captured statement. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28753 llvm-svn: 292374	2017-01-18 15:14:52 +00:00
Arpith Chacko Jacob	43a8b7bc8c	[OpenMP] Refactor code that calls codegen for target regions on the device. This patch refactors code that calls codegen for target regions. Currently the codebase only supports the 'target' directive. The patch pulls out common target processing code into a static function that can be called by codegen for any target directive. Reviewers: ABataev Differential Revision: https://reviews.llvm.org/D28752 llvm-svn: 292134	2017-01-16 15:26:02 +00:00
Malcolm Parsons	c6e4583dbb	Remove unused lambda captures. NFC llvm-svn: 291939	2017-01-13 18:55:32 +00:00
Kelvin Li	da68118729	[OpenMP] Sema and parsing for 'target teams distribute simd’ pragma This patch is to implement sema and parsing for 'target teams distribute simd’ pragma. Differential Revision: https://reviews.llvm.org/D28252 llvm-svn: 291579	2017-01-10 18:08:18 +00:00
Carlo Bertolli	962bb807ec	[OPENMP] Private, firstprivate, and lastprivate clauses for distribute, host code generation https://reviews.llvm.org/D17840 This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive. Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared). For this reason, I moved the global declarations into local variables. llvm-svn: 290898	2017-01-03 18:24:42 +00:00
Kelvin Li	1851df563d	[OpenMP] Sema and parsing for 'target teams distribute parallel for simd’ pragma This patch is to implement sema and parsing for 'target teams distribute parallel for simd’ pragma. Differential Revision: https://reviews.llvm.org/D28202 llvm-svn: 290862	2017-01-03 05:23:48 +00:00
Kelvin Li	80e8f56284	[OpenMP] Sema and parsing for 'target teams distribute parallel for’ pragma This patch is to implement sema and parsing for 'target teams distribute parallel for’ pragma. Differential Revision: https://reviews.llvm.org/D28160 llvm-svn: 290725	2016-12-29 22:16:30 +00:00
Kelvin Li	26fd21ab80	Fix format. NFC llvm-svn: 290673	2016-12-28 17:57:07 +00:00
Kelvin Li	83c451e998	[OpenMP] Sema and parsing for 'target teams distribute' pragma This patch is to implement sema and parsing for 'target teams distribute' pragma. Differential Revision: https://reviews.llvm.org/D28015 llvm-svn: 290508	2016-12-25 04:52:54 +00:00
Kelvin Li	bf594a5600	[OpenMP] Sema and parsing for 'target teams' pragma This patch is to implement sema and parsing for 'target teams' pragma. Differential Revision: https://reviews.llvm.org/D27818 llvm-svn: 290038	2016-12-17 05:48:59 +00:00
Kelvin Li	51336dd0b4	Fix typo in comment. NFC. llvm-svn: 289836	2016-12-15 17:55:32 +00:00
Kelvin Li	7ade93f5e2	[OpenMP] Sema and parsing for 'teams distribute parallel for' pragma This patch is to implement sema and parsing for 'teams distribute parallel for' pragma. Differential Revision: https://reviews.llvm.org/D27345 llvm-svn: 289179	2016-12-09 03:24:30 +00:00
Kelvin Li	579e41ced2	[OpenMP] Sema and parsing for 'teams distribute parallel for simd' pragma This patch is to implement sema and parsing for 'teams distribute parallel for simd' pragma. Differential Revision: https://reviews.llvm.org/D27084 llvm-svn: 288294	2016-11-30 23:51:03 +00:00
Alexey Bataev	957d856e7e	[OPENMP] Fixed codegen for 'omp cancel' construct. If 'omp cancel' construct is used in a worksharing construct it may cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled. llvm-svn: 287227	2016-11-17 15:12:05 +00:00
Vitaly Buka	2d15858e40	Revert "[OPENMP] Fixed codegen for 'omp cancel' construct." Summary: r286944 introduced bugs detected by ASAN as use-after-return. r287025 have not fixed them completely. This reverts commit r286944 and r287025. Reviewers: ABataev Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D26720 llvm-svn: 287069	2016-11-16 01:01:22 +00:00
Alexey Bataev	ba002163c9	[OPENMP] Fix stack use after delete, NFC. Fixed possible use of stack variable after deletion. llvm-svn: 287025	2016-11-15 20:57:18 +00:00
Alexey Bataev	473a3e7fed	[OPENMP] Fixed codegen for 'omp cancel' construct. If 'omp cancel' construct is used in a worksharing construct it may cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled. llvm-svn: 286944	2016-11-15 09:11:50 +00:00
Amara Emerson	652795db16	Add the loop end location to the loop metadata. This additional information can be used to improve the locations when generating remarks for loops. Depends on the companion LLVM change r286227. Patch by Florian Hahn. Differential Revision: https://reviews.llvm.org/D25764 llvm-svn: 286456	2016-11-10 14:44:30 +00:00
Alexey Bataev	ac5eabb0b9	[OPENMP] Fixed capturing of VLA variables. After some changes in codegen capturing of VLA variables in OpenMP regions was broken, causing compiler crash. Patch fixes this issue. llvm-svn: 286103	2016-11-07 11:16:04 +00:00
Diana Picus	1e2b7e6672	Revert "[OPENMP] Fixed capturing of VLA variables." This reverts commit r286098 because the modified test breaks on many of the buildbots. llvm-svn: 286102	2016-11-07 10:01:43 +00:00
Alexey Bataev	420537fad8	[OPENMP] Fixed capturing of VLA variables. After some changes in codegen capturing of VLA variables in OpenMP regions was broken, causing compiler crash. Patch fixes this issue. llvm-svn: 286098	2016-11-07 08:07:25 +00:00
Kelvin Li	4e325f77a9	Re-apply patch r279045. llvm-svn: 285066	2016-10-25 12:50:55 +00:00
Akira Hatanaka	642f799b0d	[CodeGen][ObjC] Do not call objc_storeStrong when initializing a constexpr variable. When compiling a constexpr NSString initialized with an objective-c string literal, CodeGen emits objc_storeStrong on an uninitialized alloca, which causes a crash. This patch folds the code in EmitScalarInit into EmitStoreThroughLValue and fixes the crash by calling objc_retain on the string instead of using objc_storeStrong. rdar://problem/28562009 Differential Revision: https://reviews.llvm.org/D25547 llvm-svn: 284516	2016-10-18 19:05:41 +00:00
Alexey Bataev	2f5ed34279	Fix for PR30639: CGDebugInfo Null dereference with OpenMP array access, by Erich Keane OpenMP creates a variable array type with a a null size-expr. The Debug generation failed to due to this. This patch corrects the openmp implementation, updates the tests, and adds a new one for this condition. Differential Revision: https://reviews.llvm.org/D25373 llvm-svn: 284110	2016-10-13 09:52:46 +00:00
Diana Picus	8b44bbc077	Revert "[OpenMP] Sema and parsing for 'teams distribute simd’ pragma" This reverts commit r279003 as it breaks some of our buildbots (e.g. clang-cmake-aarch64-quick, clang-x86_64-linux-selfhost-modules). The error is in OpenMP/teams_distribute_simd_ast_print.cpp: clang: /home/buildslave/buildslave/clang-cmake-aarch64-quick/llvm/include/llvm/ADT/DenseMap.h:527: bool llvm::DenseMapBase<DerivedT, KeyT, ValueT, KeyInfoT, BucketT>::LookupBucketFor(const LookupKeyT&, const BucketT&) const [with LookupKeyT = clang::Stmt; DerivedT = llvm::DenseMap<clang::Stmt, long unsigned int>; KeyT = clang::Stmt; ValueT = long unsigned int; KeyInfoT = llvm::DenseMapInfo<clang::Stmt>; BucketT = llvm::detail::DenseMapPair<clang::Stmt, long unsigned int>]: Assertion `!KeyInfoT::isEqual(Val, EmptyKey) && !KeyInfoT::isEqual(Val, TombstoneKey) && "Empty/Tombstone value shouldn't be inserted into map!"' failed. llvm-svn: 279045	2016-08-18 09:25:07 +00:00
Kelvin Li	0e3bde8216	[OpenMP] Sema and parsing for 'teams distribute simd’ pragma This patch is to implement sema and parsing for 'teams distribute simd’ pragma. This patch is originated by Carlo Bertolli. Differential Revision: https://reviews.llvm.org/D23528 llvm-svn: 279003	2016-08-17 23:13:03 +00:00
Kelvin Li	0253287633	[OpenMP] Sema and parsing for 'teams distribute' pragma This patch is to implement sema and parsing for 'teams distribute' pragma. Differential Revision: https://reviews.llvm.org/D23189 llvm-svn: 277818	2016-08-05 14:37:37 +00:00
Samuel Antao	cc10b85789	[OpenMP] Codegen for use_device_ptr clause. Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22691 llvm-svn: 276977	2016-07-28 14:23:26 +00:00
Samuel Antao	403ffd409f	[OpenMP] Add support for mapping array sections through pointer references. Summary: This patch fixes a bug in the map of array sections whose base is a reference to a pointer. The existing mapping support was not prepared to deal with it, causing the compiler to crash. Mapping a reference to a pointer enjoys the same characteristics of a regular pointer, i.e., it is passed by value. Therefore, the reference has to be materialized in the target region. Reviewers: hfinkel, carlo.bertolli, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: https://reviews.llvm.org/D22690 llvm-svn: 276933	2016-07-27 22:49:49 +00:00
Kelvin Li	986330c190	[OpenMP] Sema and parsing for 'target simd' pragma This patch is to implement sema and parsing for 'target simd' pragma. Differential Revision: https://reviews.llvm.org/D22479 llvm-svn: 276203	2016-07-20 22:57:10 +00:00
Alexey Bataev	5140e748b5	[OPENMP] Improved processing of 'priority' clause, NFC. Removed some old comments + improved handling of 'priority' clause value during codegen after comments from Richard Smith. llvm-svn: 275945	2016-07-19 04:21:09 +00:00
Kelvin Li	a579b9196c	[OpenMP] Sema and parsing for 'target parallel for simd' pragma This patch is to implement sema and parsing for 'target parallel for simd' pragma. Differential Revision: http://reviews.llvm.org/D22096 llvm-svn: 275365	2016-07-14 02:54:56 +00:00
Carlo Bertolli	70594e9282	[OpenMP] Initial implementation of parse+sema for OpenMP clause 'is_device_ptr' of target http://reviews.llvm.org/D22070 llvm-svn: 275282	2016-07-13 17:16:49 +00:00
Carlo Bertolli	2404b17192	[OpenMP] Initial implementation of parse+sema for clause use_device_ptr of 'target data' http://reviews.llvm.org/D21904 This patch is similar to the implementation of 'private' clause: it adds a list of private pointers to be used within the target data region to store the device pointers returned by the runtime. Please refer to the following document for a full description of what the runtime witll return in this case (page 10 and 11): https://github.com/clang-omp/OffloadingDesign I am happy to answer any question related to the runtime interface to help reviewing this patch. llvm-svn: 275271	2016-07-13 15:37:16 +00:00
Kelvin Li	787f3fcc6b	[OpenMP] Sema and parsing for 'distribute simd' pragma Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute simd'. Differential Revision: http://reviews.llvm.org/D22007 llvm-svn: 274604	2016-07-06 04:45:38 +00:00
Kelvin Li	4a39add05e	[OpenMP] Sema and parse for 'distribute parallel for simd' Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute parallel for simd'. Differential Revision: http://reviews.llvm.org/D21977 llvm-svn: 274530	2016-07-05 05:00:15 +00:00
Carlo Bertolli	9925f15661	Resubmission of http://reviews.llvm.org/D21564 after fixes. [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' This patch is an initial implementation for #distribute parallel for. The main differences that affect other pragmas are: The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds. As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value. As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound. llvm-svn: 273884	2016-06-27 14:55:37 +00:00
Carlo Bertolli	b8503d5399	Revert r273705 [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' llvm-svn: 273709	2016-06-24 19:20:02 +00:00
Carlo Bertolli	e77d6e0e4d	[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for' http://reviews.llvm.org/D21564 This patch is an initial implementation for #distribute parallel for. The main differences that affect other pragmas are: The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds. As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value. As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound. llvm-svn: 273705	2016-06-24 18:53:35 +00:00
Samuel Antao	6d0042642a	Re-apply r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects. An issue in one of the regression tests was fixed for 32-bit hosts. llvm-svn: 272931	2016-06-16 18:39:34 +00:00
Samuel Antao	b1f9501242	Revert r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects. Was causing trouble in one of the regression tests for a 32-bit address space. llvm-svn: 272908	2016-06-16 16:06:22 +00:00
Samuel Antao	4951617980	[OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects. Summary: This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects causing malfunction in some targets. This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outline function and arguments and updates the regression tests accordingly. Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev Subscribers: cfe-commits, caomhin Differential Revision: http://reviews.llvm.org/D21150 llvm-svn: 272900	2016-06-16 15:09:31 +00:00
Alexey Bataev	ad537bb8a0	[OPENMP 4.5] Fixed codegen for 'priority' and destructors in task-based directives. 'kmp_task_t' record type added a new field for 'priority' clause and changed the representation of pointer to destructors for privates used within loop-based directives. Old representation: typedef struct kmp_task { /* GEH: Shouldn't this be aligned somehow? / void shareds; /*< pointer to block of pointers to shared vars / kmp_routine_entry_t routine; /*< pointer to routine to call for executing task / kmp_int32 part_id; /*< part id for the task / kmp_routine_entry_t destructors; /* pointer to function to invoke deconstructors of firstprivate C++ objects / / private vars / } kmp_task_t; New representation: typedef struct kmp_task { / GEH: Shouldn't this be aligned somehow? / void shareds; /*< pointer to block of pointers to shared vars / kmp_routine_entry_t routine; /*< pointer to routine to call for executing task / kmp_int32 part_id; /*< part id for the task / kmp_cmplrdata_t data1; /* Two known optional additions: destructors and priority / kmp_cmplrdata_t data2; / Process destructors first, priority second / / future data / / private vars */ } kmp_task_t; Also excessive initialization of 'destructors' fields to 'null' was removed from codegen if it is known that no destructors shal be used. Currently a special bit is used in 'kmp_tasking_flags_t' bitfields ('destructors_thunk' bitfield). llvm-svn: 271201	2016-05-30 09:06:50 +00:00
Samuel Antao	8d2d730f2a	[OpenMP] Codegen for target update directive. Summary: This patch implements the code generation for the `target update` directive. The implemntation relies on the logic already in place for target data standalone directives, i.e. target enter/exit data. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D20650 llvm-svn: 270886	2016-05-26 18:30:22 +00:00
Samuel Antao	ec172c6da0	[OpenMP] Parsing and sema support for the from clause Summary: The patch contains the parsing and sema support for the `from` clause. Patch based on the original post by Kelvin Li. Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D18488 llvm-svn: 270882	2016-05-26 17:49:04 +00:00
Samuel Antao	661c0904e1	[OpenMP] Parsing and sema support for the to clause Summary: The patch contains the parsing and sema support for the `to` clause. Patch based on the original post by Kelvin Li. Reviewers: carlo.bertolli, hfinkel, kkwli0, arpith-jacob, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D18597 llvm-svn: 270880	2016-05-26 17:39:58 +00:00
Samuel Antao	686c70c3dc	[OpenMP] Parsing and sema support for target update directive Summary: This patch is to add parsing and sema support for `target update` directive. Support for the `to` and `from` clauses will be added by a different patch. This patch also adds support for other clauses that are already implemented upstream and apply to `target update`, e.g. `device` and `if`. This patch is based on the original post by Kelvin Li. Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev Subscribers: caomhin, cfe-commits Differential Revision: http://reviews.llvm.org/D15944 llvm-svn: 270878	2016-05-26 17:30:50 +00:00
Hal Finkel	c07e19b2c1	Add a loop's debug location to its llvm.loop metadata Getting accurate locations for loops is important, because those locations are used by the frontend to generate optimization remarks. Currently, optimization remarks for loops often appear on the wrong line, often the first line of the loop body instead of the loop itself. This is confusing because that line might itself be another loop, or might be somewhere else completely if the body was an inlined function call. This happens because of the way we find the loop's starting location. First, we look for a preheader, and if we find one, and its terminator has a debug location, then we use that. Otherwise, we look for a location on an instruction in the loop header. The fallback heuristic is not bad, but will almost always find the beginning of the body, and not the loop statement itself. The preheader location search often fails because there's often not a preheader, and even when there is a preheader, depending on how it was formed, it sometimes carries the location of some preceeding code. I don't see any good theoretical way to fix this problem. On the other hand, this seems like a straightforward solution: Put the debug location in the loop's llvm.loop metadata. When emitting debug information, this commit causes us to add the debug location as an operand to each loop's llvm.loop metadata. Thus, we now generate this metadata for all loops (not just loops with optimization hints) when we're otherwise generating debug information. The remark test case changes depend on the companion LLVM commit r270771. llvm-svn: 270772	2016-05-25 21:53:24 +00:00
Alexey Bataev	8b42706a6e	[OPENMP 4.5] Codegen for dacross loop synchronization constructs. OpenMP 4.5 adds support for doacross loop synchronization. Patch implements codegen for this construct. llvm-svn: 270690	2016-05-25 12:36:08 +00:00
Alexey Bataev	9afe57541e	[OPENMP] Fixed codegen for firstprivate vars in standalone worksharing directives. If firstprivate variable is is captured by value in outlined region and then used as firstprivate variable in inner worksharing directive, the copy for this firstprivate variable was not created. Fixed this bug. llvm-svn: 270536	2016-05-24 07:40:12 +00:00
Alexey Bataev	7ace49dff1	[OPENMP] Pass scalar firstprivate vars by value. For better performance and to unify code with offloading part we pass scalar firstprivate values by value, instead of by reference. It will remove some extra copying operations. llvm-svn: 269751	2016-05-17 08:55:33 +00:00
Alexey Bataev	1e1e286a6b	[OPENMP 4.5] Initial codegen for 'priority' clause in task-based directives. OpenMP 4.5 supports clause 'priority' in task-based directives. Patch adds initial codegen support for this clause in codegen. llvm-svn: 269050	2016-05-10 12:21:02 +00:00
Alexey Bataev	9ebd742748	[OPENMP 4.5] Add codegen support in runtime for '[non]monotonic' schedule modifiers. Runtime library expects some additional data in schedule argument for loop-based directives, that have additional schedule modifiers 'monotonic\|nonmonotonic'. llvm-svn: 269035	2016-05-10 09:57:36 +00:00
Alexey Bataev	f93095a003	[OPENMP 4.5] Codegen for 'lastprivate' clauses in 'taskloop' directives. OpenMP 4.5 adds taskloop/taskloop simd directives. These directives allow to use lastprivate clause. Patch adds codegen for this clause. llvm-svn: 268618	2016-05-05 08:46:22 +00:00
Alexey Bataev	1e73ef3882	[OPENMP 4.5] Initial codegen for 'taskloop simd' directive. OpenMP 4.5 defines 'taskloop simd' directive, which is combined directive for 'taskloop' and 'simd' directives. Patch adds initial codegen support for this directive and its 2 basic clauses 'safelen' and 'simdlen'. llvm-svn: 267872	2016-04-28 12:14:51 +00:00
Alexey Bataev	24b5baed27	[OPENMP] Simplified interface for codegen of tasks, NFC. Reduced number of arguments in member functions of runtime support library for task-based directives. llvm-svn: 267863	2016-04-28 09:23:51 +00:00
Alexey Bataev	2b19a6fe53	[OPENMP 4.5] Codegen for 'grainsize/num_tasks' clauses of 'taskloop' directive. OpenMP 4.5 defines 'taskloop' directive and 2 additional clauses 'grainsize' and 'num_tasks' for this directive. Patch adds codegen for these clauses. These clauses are generated as arguments of the '__kmpc_taskloop' libcall and are encoded the following way: void __kmpc_taskloop(ident_t loc, int gtid, kmp_task_t task, int if_val, kmp_uint64 lb, kmp_uint64 ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup); If 'grainsize' is specified, 'sched' argument must be set to '1' and 'grainsize' argument must be set to the value of the 'grainsize' clause. If 'num_tasks' is specified, 'sched' argument must be set to '2' and 'grainsize' argument must be set to the value of the 'num_tasks' clause. It is possible because these 2 clauses are mutually exclusive and can't be used at the same time on the same directive. If none of these clauses is specified, 'sched' argument must be set to '0'. llvm-svn: 267862	2016-04-28 09:15:06 +00:00
Samuel Antao	8dd6628743	[OpenMP] Code generation for target exit data directive Summary: This patch adds support for the target exit data directive code generation. Given that, apart from the employed runtime call, target exit data requires the same code generation pattern as target enter data, the OpenMP codegen entry point was renamed and reused for both. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: cfe-commits, fraggamuffin, caomhin Differential Revision: http://reviews.llvm.org/D17369 llvm-svn: 267814	2016-04-27 23:14:30 +00:00
Samuel Antao	bd0ae2e14c	[OpenMP] Code generation for target enter data directive Summary: This patch adds support for the target enter data directive code generation. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: cfe-commits, fraggamuffin, caomhin Differential Revision: http://reviews.llvm.org/D17368 llvm-svn: 267812	2016-04-27 23:07:29 +00:00
Samuel Antao	df158d5567	[OpenMP] Code generation for target data directive Summary: This patch adds support for the target data directive code generation. Part of the already existent functionality related with data maps is moved to a new function so that it could be reused. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev Subscribers: cfe-commits, fraggamuffin, caomhin Differential Revision: http://reviews.llvm.org/D17367 llvm-svn: 267811	2016-04-27 22:58:19 +00:00
Alexey Bataev	8fbae8cf09	[OPENMP] Fix crash on initialization of classes with no init clause in declare reductions. If reduction clause is applied to instance of class with user-defined reduction operation without initialization clause, it may cause a crash. Patch fixes this issue. llvm-svn: 267695	2016-04-27 11:38:05 +00:00
Alexey Bataev	4ba78a46ff	[OPENMP] Fix for codegen of captured variables in inlined directives. Currently there is a problem with codegen of inlined directives inside lambdas, it may cause a crash during codegen because of incorrect capturing of variables. Patch fixes this problem. llvm-svn: 267677	2016-04-27 07:56:03 +00:00
Alexey Bataev	7292c29bb5	[OPENMP 4.5] Codegen for 'taskloop' directive. The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed. The next code will be generated for the taskloop directive: #pragma omp taskloop num_tasks(N) lastprivate(j) for( i=0; i<NGRAINSTRIDE-1; i+=STRIDE ) { int th = omp_get_thread_num(); #pragma omp atomic counter++; #pragma omp atomic th_counter[th]++; j = i; } Generated code: task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct task),sizeof(struct shar),&task_entry); psh = task->shareds; psh->pth_counter = &th_counter; psh->pcounter = &counter; psh->pj = &j; task->lb = 0; task->ub = NGRAINSTRIDE-2; task->st = STRIDE; __kmpc_taskloop( NULL, // location gtid, // gtid task, // task structure 1, // if clause value &task->lb, // lower bound &task->ub, // upper bound STRIDE, // loop increment 0, // 1 if nogroup specified 2, // schedule type: 0-none, 1-grainsize, 2-num_tasks N, // schedule value (ignored for type 0) (void*)&__task_dup_entry // tasks duplication routine ); llvm-svn: 267395	2016-04-25 12:22:29 +00:00
Alexey Bataev	feddd64bff	[OPENMP] Fix for PR27463: Privatizing struct fields with array type causes code generation failure. The codegen part of firstprivate clause for member decls used type of original variable without skipping reference type from OMPCapturedExprDecl. Patch fixes this problem. llvm-svn: 267125	2016-04-22 09:05:03 +00:00
Alexey Bataev	5dff95c04d	[OPENMP] Fix for LCV in simd directives in explicit clauses. If loop control variable for simd-based directives is explicitly marked as linear/lastprivate in clauses, codegen for such construct would crash. Patch fixes this problem. llvm-svn: 267101	2016-04-22 03:56:56 +00:00
Alexey Bataev	48591dd98c	[OPENMP] Codegen for untied tasks. If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks. llvm-svn: 266853	2016-04-20 04:01:36 +00:00
Alexey Bataev	995e861ba6	Revert "[OPENMP] Codegen for untied tasks." This reverts commit r266754. llvm-svn: 266755	2016-04-19 16:36:01 +00:00
Alexey Bataev	823acfacdf	[OPENMP] Codegen for untied tasks. If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks. llvm-svn: 266754	2016-04-19 16:27:55 +00:00
Alexey Bataev	bec9572213	Revert "[OPENMP] Codegen for untied tasks." This reverts commit 266722. llvm-svn: 266724	2016-04-19 09:27:38 +00:00
Alexey Bataev	26b2577f6b	[OPENMP] Codegen for untied tasks. If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks. llvm-svn: 266722	2016-04-19 09:10:27 +00:00
Alexey Bataev	e48a5fc56d	[OPENMP 4.0] Support for 'uniform' clause in 'declare simd' directive. OpenMP 4.0 defines clause 'uniform' in 'declare simd' directive: 'uniform' '(' <argument-list> ')' The uniform clause declares one or more arguments to have an invariant value for all concurrent invocations of the function in the execution of a single SIMD loop. The special this pointer can be used as if was one of the arguments to the function in any of the linear, aligned, or uniform clauses. llvm-svn: 266041	2016-04-12 05:28:34 +00:00
JF Bastien	92f4ef1017	NFC: make AtomicOrdering an enum class Summary: See LLVM change D18775 for details, this change depends on it. Reviewers: jyknight, reames Subscribers: cfe-commits Differential Revision: http://reviews.llvm.org/D18776 llvm-svn: 265569	2016-04-06 17:26:42 +00:00
Carlo Bertolli	c687225b43	[OPENMP] Codegen for teams directive for NVPTX This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it: Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region. Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime. Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region. http://reviews.llvm.org/D17963 llvm-svn: 265304	2016-04-04 15:55:02 +00:00
Alexey Bataev	5a3af13d93	[OPENMP] Remove extra code transformation. For better support of some specific GNU extensions some extra transformation of AST nodes were introduced. These transformations are very hard to handle. The code is improved in handling of these extensions by using captured expressions construct. llvm-svn: 264709	2016-03-29 08:58:54 +00:00
Alexey Bataev	14fa1c6b60	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264700	2016-03-29 05:34:15 +00:00
Alexey Bataev	f539faa733	Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions." Reverting because of failed tests. llvm-svn: 264577	2016-03-28 12:58:34 +00:00
Alexey Bataev	424be92831	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264576	2016-03-28 12:52:58 +00:00
Alexey Bataev	f662b5943c	Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions." This reverts commit 3ee791165100607178073f14531a0dc90c622b36. llvm-svn: 264570	2016-03-28 10:12:03 +00:00
Alexey Bataev	b8c425c4f7	[OPENMP] Allow runtime insert its own code inside OpenMP regions. Solution unifies interface of RegionCodeGenTy type to allow insert runtime-specific code before/after main codegen action defined in CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy for general OpenMP directives, but must be allowed to insert its own (required) code to support target specific codegen. llvm-svn: 264569	2016-03-28 09:53:43 +00:00
Alexey Bataev	a839dddf92	[OPENMP 4.0] Use 'declare reduction' constructs in 'reduction' clauses. OpenMP 4.0 allows to define custom reduction operations using '#pragma omp declare reduction' construct. Patch allows to use this custom defined reduction operations in 'reduction' clauses. llvm-svn: 263701	2016-03-17 10:19:46 +00:00
John McCall	c56a8b3284	Preserve ExtParameterInfos into CGFunctionInfo. As part of this, make the function-arrangement interfaces a little simpler and more semantic. NFC. llvm-svn: 263191	2016-03-11 04:30:31 +00:00
Alexey Bataev	ef549a8955	[OPENMP 4.5] Codegen for data members in 'linear' clause OpenMP 4.5 allows privatization of non-static data members in OpenMP constructs. Patch adds proper codegen support for data members in 'linear' clause llvm-svn: 263003	2016-03-09 09:49:09 +00:00
Alexey Bataev	78849fb464	[OPENMP 4.5] Codegen for data members in 'linear' clause. OpenMP 4.5 allows to use data members in private clauses. Patch adds codegen support for 'linear' clause. llvm-svn: 263002	2016-03-09 09:49:00 +00:00
Carlo Bertolli	0ff587d4a4	[OPENMP] Codegen for distribute directive: fix bug in ordering of parameters. llvm-svn: 262833	2016-03-07 16:19:13 +00:00
Carlo Bertolli	fc35ad2bbc	Reapply r262741 [OPENMP] Codegen for distribute directive This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling. http://reviews.llvm.org/D17170 llvm-svn: 262832	2016-03-07 16:04:49 +00:00
Samuel Antao	bf4d18d3d2	Revert r262741 - [OPENMP] Codegen for distribute directive Was causing a failure in one of the buildbot slaves. llvm-svn: 262744	2016-03-04 21:02:14 +00:00
Carlo Bertolli	4a56e3831d	[OPENMP] Codegen for distribute directive This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling. http://reviews.llvm.org/D17170 llvm-svn: 262741	2016-03-04 20:24:58 +00:00
Carlo Bertolli	6ad7b5aff2	[OPENMP] firstprivate and private clauses of teams, host codegeneration Add code generation support for firstprivate and private clauses of teams on the host. Add extensive regression tests including lambda functions and vla testing. http://reviews.llvm.org/D17582 llvm-svn: 262663	2016-03-03 22:09:40 +00:00
Carlo Bertolli	430d8ecc55	Add code generation for teams directive inside target region llvm-svn: 262652	2016-03-03 20:34:23 +00:00
Samuel Antao	b68e2db8f9	[OpenMP] Code generation for teams - kernel launching Summary: This patch implements the launching of a target region in the presence of a nested teams region, i.e calls tgt_target_teams with the required arguments gathered from the enclosed teams directive. The actual codegen of the region enclosed by the teams construct will be contributed in a separate patch. Reviewers: hfinkel, arpith-jacob, kkwli0, carlo.bertolli, ABataev Subscribers: cfe-commits, caomhin, fraggamuffin Differential Revision: http://reviews.llvm.org/D17019 llvm-svn: 262625	2016-03-03 16:20:23 +00:00
Alexey Bataev	2bbf7217ea	[OPENMP 4.5] Initial support for data members in 'linear' clause. OpenMP 4.5 allows to privatize data members of current class in member functions. Patch adds initial support for privatization of data members in 'linear' clause, no codegen support. llvm-svn: 262578	2016-03-03 03:52:24 +00:00
Alexey Bataev	61205070c4	[OPENMP 4.5] Codegen for data members in 'reduction' clause. OpenMP 4.5 allows to privatize non-static data members of current class in non-static member functions. Patch supports codegen for non-static data members in 'reduction' clauses. llvm-svn: 262460	2016-03-02 04:57:40 +00:00
Alexey Bataev	005248ac8a	[OPENMP 4.5] Codegen for member decls in 'lastprivate' clause. OpenMP 4.5 allows to privatize non-static member decls in non-static member functions. Patch captures such decls by reference in general (for bitfields, by value) and then operates with this capture. For bitfields, at the end of codegen for lastprivates original bitfield is updated with the value of captured copy. llvm-svn: 261824	2016-02-25 05:25:57 +00:00
Richard Trieu	cc3949d99a	Remove use of builtin comma operator. Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261271	2016-02-18 22:34:54 +00:00
Alexey Bataev	8ffcc949b1	[OPENMP] Fix codegen for lastprivate loop counters. Patch fixes bug with codegen for lastprivate loop counters. Also it may improve performance for lastprivates calculations in some cases. llvm-svn: 261209	2016-02-18 13:48:15 +00:00
Alexey Bataev	417089fc7e	[OPENMP 4.5] Codegen support for data members in 'firstprivate' clause. Added codegen for captured data members in non-static member functions. llvm-svn: 261089	2016-02-17 13:19:37 +00:00
Alexey Bataev	3392d76081	[OPENMP] Improved handling of pseudo-captured expressions in OpenMP. Expressions inside 'schedule'\|'dist_schedule' clause must be captured in combined directives to avoid possible crash during codegen. Patch improves handling of such constructs llvm-svn: 260954	2016-02-16 11:18:12 +00:00
Alexey Bataev	cd8b6a2cf1	[OPENMP] Remove extra sync barriers for 'firstprivate' clause. Sync barrier will be emitted after generation of firstprivate variables only if one of the firstprivate vars is used in lastprivate clause. llvm-svn: 260877	2016-02-15 08:07:17 +00:00
Alexey Bataev	4244be25bd	[OPENMP] Rename OMPCapturedFieldDecl to OMPCapturedExprDecl, NFC. OMPCapturedExprDecl allows caopturing not only of fielddecls, but also other expressions. It also allows to simplify codegen for several clauses. llvm-svn: 260492	2016-02-11 05:35:55 +00:00
Alexey Bataev	31300ed0a5	[OPENMP 4.0] Fixed support of array sections/array subscripts. Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types. llvm-svn: 259776	2016-02-04 11:27:03 +00:00
Arpith Chacko Jacob	05bebb578a	[OpenMP] Parsing + sema for target parallel for directive. Summary: This patch adds parsing + sema for the target parallel for directive along with testcases. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D16759 llvm-svn: 259654	2016-02-03 15:46:42 +00:00
Arpith Chacko Jacob	e955b3d3fe	[OpenMP] Parsing + sema for target parallel directive. Summary: This patch adds parsing + sema for the target parallel directive and its clauses along with testcases. Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D16553 Rebased to current trunk and updated test cases. llvm-svn: 258832	2016-01-26 18:48:41 +00:00
Arpith Chacko Jacob	3cf89040b0	[OpenMP] Parsing + sema for defaultmap clause. Summary: This patch adds parsing + sema for the defaultmap clause associated with the target directive (among others). Reviewers: ABataev Differential Revision: http://reviews.llvm.org/D16527 llvm-svn: 258817	2016-01-26 16:37:23 +00:00
Alexey Bataev	1189bd0205	[OPENMP 4.5] Allow arrays in 'reduction' clause. OpenMP 4.5, alogn with array sections, allows to use variables of array type in reductions. llvm-svn: 258804	2016-01-26 12:20:39 +00:00
Alexey Bataev	3015bcc62a	[OPENMP] Generalize codegen for 'sections'-based directive. If 'sections' directive has only one sub-section, the code for 'single'-based directive was emitted. Removed this codegen, because it causes crashes in different cases. llvm-svn: 258495	2016-01-22 08:56:50 +00:00
Alexey Bataev	8524d15954	[OPENMP] Fix crash on reduction for complex variables. reworked codegen for reduction operation for complex types to avoid crash llvm-svn: 258394	2016-01-21 12:35:58 +00:00
Alexey Bataev	9619f04c0e	[OPENMP 4.0] Fix for codegen of 'cancel' directive within 'sections' directive. Allow to emit code for 'cancel' directive within 'sections' directive with single sub-section. llvm-svn: 258307	2016-01-20 12:29:47 +00:00
Samuel Antao	7259076032	[OpenMP] Parsing + sema for "target exit data" directive. Patch by Arpith Jacob. Thanks! llvm-svn: 258177	2016-01-19 20:04:50 +00:00
Samuel Antao	df67fc468e	[OpenMP] Parsing + sema for "target enter data" directive. Patch by Arpith Jacob. Thanks! llvm-svn: 258165	2016-01-19 19:15:56 +00:00
Carlo Bertolli	b4adf55e0f	Add OpenMP dist_schedule clause to distribute directive and related regression tests. llvm-svn: 257917	2016-01-15 18:50:31 +00:00
Samuel Antao	ee8fb302f5	[OpenMP] Reapply rL256842: [OpenMP] Offloading descriptor registration and device codegen. This patch attempts to fix the regressions identified when the patch was committed initially. Thanks to Michael Liao for identifying the fix in the offloading metadata generation related with side effects in evaluation of function arguments. llvm-svn: 256933	2016-01-06 13:42:12 +00:00
Samuel Antao	7d5de9a1ee	[OpenMP] Revert rL256842: [OpenMP] Offloading descriptor registration and device codegen. It was causing two regression, so I'm reverting until the cause is found. llvm-svn: 256858	2016-01-05 19:16:13 +00:00
Samuel Antao	4d5f0bbea1	[OpenMP] Offloading descriptor registration and device codegen. Summary: In order to offloading work properly two things need to be in place: - a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library. - all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device. This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented. About offloading descriptor: The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type: ``` struct __tgt_offload_entry{ void addr; char name; int64_t size; }; ``` and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping. The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run. The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple. About target codegen: The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this: ``` !omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6} !0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4} !1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5} !2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6} !3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0} !4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3} !5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1} !6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2} ``` The fields in each metadata entry are (in sequence): Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region". Entry 2) a unique ID of the device where the input source file that contain the target region lives. Entry 3) a unique ID of the file where the input source file that contain the target region lives. Entry 4) a mangled name of the function that encloses the target region. Entries 5) and 6) line and column number where the target region was found. Entry 7) is the order the entry was emitted. Entry 2) and 3) are required to distinguish files that have the same function name. Entry 4) is required to distinguish different instances of the same declaration (usually templated ones) Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. ) This patch replaces http://reviews.llvm.org/D12306. Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits Differential Revision: http://reviews.llvm.org/D12614 llvm-svn: 256842	2016-01-05 16:23:04 +00:00
Alexey Bataev	a6f2a14b94	[OPENMP 4.5] Codegen for 'schedule' clause with monotonic/nonmonotonic modifiers. OpenMP 4.5 adds support for monotonic/nonmonotonic modifiers in 'schedule' clause. Add codegen for these modifiers. llvm-svn: 256666	2015-12-31 06:52:34 +00:00
Alexey Bataev	6f531ec0a2	[OPENMP] Remove explicit call for implicit barrier #pragma omp parallel needs an implicit barrier that is currently done by an explicit call to __kmpc_barrier. However, the runtime already ensures a barrier in __kmpc_fork_call which currently leads to two barriers per region per thread. Differential Revision: http://reviews.llvm.org/D15561 llvm-svn: 255992	2015-12-18 10:24:53 +00:00
Alexey Bataev	8ef3141127	[OPENMP] Fix for http://llvm.org/PR25878 : Error compiling an OpenMP program OpenMP codegen tried to emit the code for its constructs even if it was detected as a dead-code. Added checks to ensure that the code is emitted if the code is not dead. llvm-svn: 255990	2015-12-18 07:58:25 +00:00
Alexey Bataev	fc57d1601d	[OPENMP 4.5] Codegen for 'hint' clause of 'critical' directive OpenMP 4.5 defines 'hint' clause for 'critical' directive. Patch adds codegen for this clause. llvm-svn: 255639	2015-12-15 10:55:09 +00:00
Alexey Bataev	28c75417b2	[OPENMP 4.5] Parsing/sema for 'hint' clause of 'critical' directive. OpenMP 4.5 adds 'hint' clause to critical directive. Patch adds parsing/semantic analysis for this clause. llvm-svn: 255625	2015-12-15 08:19:24 +00:00
Carlo Bertolli	6200a3d0f3	Add parse and sema of OpenMP distribute directive with all clauses except dist_schedule llvm-svn: 255498	2015-12-14 14:51:25 +00:00
Alexey Bataev	33c56402d8	[OPENMP] Fix debug info for 'atomic' construct. Debug info for statement under 'atomic' construct must point exactly to that statement, not the directive itself. llvm-svn: 255487	2015-12-14 09:26:19 +00:00
NAKAMURA Takumi	2d5c6ddf74	Revert r255001, "Add parse and sema for OpenMP distribute directive and all its clauses excluding dist_schedule." It causes memory leak. Some tests in test/OpenMP would fail. llvm-svn: 255094	2015-12-09 04:35:57 +00:00
Alexey Bataev	382967a2e4	[OPENMP 4.5] Parsing/sema for 'num_tasks' clause. OpenMP 4.5 adds directives 'taskloop' and 'taskloop simd'. These directives support clause 'num_tasks'. Patch adds parsing/semantic analysis for this clause. llvm-svn: 255008	2015-12-08 12:06:20 +00:00
Carlo Bertolli	b9bfa75b28	Add parse and sema for OpenMP distribute directive and all its clauses excluding dist_schedule. llvm-svn: 255001	2015-12-08 04:21:03 +00:00
Alexey Bataev	1fd4aed26b	[OPENMP 4.5] parsing/sema support for 'grainsize' clause. OpenMP 4.5 adds 'taksloop' and 'taskloop simd' directives, which have 'grainsize' clause. Patch adds parsing/sema analysis of this clause. llvm-svn: 254903	2015-12-07 12:52:51 +00:00
Alexey Bataev	b825de17b7	[OPENMP 4.5] parsing/sema support for 'nogroup' clause. OpenMP 4.5 adds 'taskloop' and 'taskloop simd' directives. These directives have new 'nogroup' clause. Patch adds basic parsing/sema support for this clause. llvm-svn: 254899	2015-12-07 10:51:44 +00:00
Serge Pavlov	3a5614599a	[PGO] Instrument only base constructors and destructors. Constructors and destructors may be represented by several functions in IR. Only base structors correspond to source code, others are small pieces of code and eventually call the base variant. In this case instrumentation of non-base structors has little sense, this fix remove it. Now profile data of a declaration corresponds to exactly one function in IR, it agrees with the current logic of the profile data loading. This change fixes PR24996. Differential Revision: http://reviews.llvm.org/D15158 llvm-svn: 254876	2015-12-06 14:32:39 +00:00
Alexey Bataev	0a6ed84a0d	[OPENMP 4.5] Parsing/sema support for 'omp taskloop simd' directive. OpenMP 4.5 adds directive 'taskloop simd'. Patch adds parsing/sema analysis for 'taskloop simd' directive and its clauses. llvm-svn: 254597	2015-12-03 09:40:15 +00:00
Samuel Antao	4af1b7b693	[OpenMP] Update target directive codegen to use 4.5 implicit data mappings. Summary: This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed. Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available. Reviewers: hfinkel, rjmccall, ABataev Subscribers: cfe-commits, carlo.bertolli Differential Revision: http://reviews.llvm.org/D14940 llvm-svn: 254521	2015-12-02 17:44:43 +00:00
Alexey Bataev	a056935a2f	[OPENMP 4.5] Parsing/sema analysis for 'priority' clause. OpenMP 4.5 defines new clause 'priority' for 'task', 'taskloop' and 'taskloop simd' directives. Added parsing and sema analysis for 'priority' clause in 'task' and 'taskloop' directives. llvm-svn: 254398	2015-12-01 10:17:31 +00:00
Alexey Bataev	49f6e78d71	[OPENMP 4.5] Parsing/sema analysis for 'taskloop' directive. Adds initial parsing and semantic analysis for 'taskloop' directive. llvm-svn: 254367	2015-12-01 04:18:41 +00:00
Kelvin Li	a15fb1a110	[OpenMP] Parsing and sema support for thread_limit clause. http://reviews.llvm.org/D15029 llvm-svn: 254207	2015-11-27 18:47:36 +00:00
Kelvin Li	099bb8c65d	[OpenMP] Parsing and sema support for num_teams clause http://reviews.llvm.org/D14802 llvm-svn: 254019	2015-11-24 20:50:12 +00:00
Kelvin Li	0bff7afab5	[OpenMP] Parsing and sema support for map clause http://reviews.llvm.org/D14134 llvm-svn: 253849	2015-11-23 05:32:03 +00:00
NAKAMURA Takumi	811a09ec5b	CGStmtOpenMP.cpp: Prune redundant \param. [-Wdocumentation] llvm-svn: 249698	2015-10-08 16:41:42 +00:00
Alexey Bataev	f24e7b1f60	[OPENMP 4.1] Codegen for array sections/subscripts in 'reduction' clause. OpenMP 4.1 adds support for array sections/subscripts in 'reduction' clause. Patch adds codegen for this feature. llvm-svn: 249672	2015-10-08 09:10:53 +00:00
Samuel Antao	bed3c46632	[OpenMP] Target directive host codegen. This patch implements the outlining for offloading functions for code annotated with the OpenMP target directive. It uses a temporary naming of the outlined functions that will have to be updated later on once target side codegen and registration of offloading libraries is implemented - the naming needs to be made unique in the produced library. llvm-svn: 249148	2015-10-02 16:14:20 +00:00
Alexey Bataev	5f600d6a49	[OPENMP 4.1] Codegen for ‘simd’ clause in ‘ordered’ directive. Description. If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations. Restrictions. An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region. An ordered directive with ‘simd’ clause is generated as an outlined function and corresponding function call to prevent this part of code from vectorization later in backend. llvm-svn: 248772	2015-09-29 03:48:57 +00:00
Alexey Bataev	d14d1e6f25	[OPENMP 4.1] Add 'simd' clause for 'ordered' directive. Parsing and sema analysis for 'simd' clause in 'ordered' directive. Description If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations. Restrictions An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region llvm-svn: 248696	2015-09-28 06:39:35 +00:00
Alexey Bataev	346265e3bc	[OPENMP 4.1] Add 'threads' clause for '#pragma omp ordered'. OpenMP 4.1 extends format of '#pragma omp ordered'. It adds 3 additional clauses: 'threads', 'simd' and 'depend'. If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team executing the loop region execute ordered regions sequentially in the order of the loop iterations. The loop region to which an ordered region without any clause or with a threads clause binds must have an ordered clause without the parameter specified on the corresponding loop directive. llvm-svn: 248569	2015-09-25 10:37:12 +00:00

1 2 3 4 5 ...

409 Commits