Sema holds the current FPOptions which is adjusted by 'pragma STDC
FP_CONTRACT'. This then gets propagated into expression nodes as they are
built.
This encapsulates FPOptions so that this propagation happens opaquely rather
than directly with the fp_contractable on/off bit. This allows controlled
transitioning of fp_contractable to a ternary value (off, on, fast). It will
also allow adding more fast-math flags later.
This is toward moving fp-contraction=fast from an LLVM TargetOption to a
FastMathFlag in order to fix PR25721.
Differential Revision: https://reviews.llvm.org/D31166
llvm-svn: 298877
This patch implements codegen for the reduction clause on
any teams construct for elementary data types. It builds
on parallel reductions on the GPU. Subsequently,
the team master writes to a unique location in a global
memory scratchpad. The last team to do so loads and
reduces this array to calculate the final result.
This patch emits two helper functions that are used by
the OpenMP runtime on the GPU to perform reductions across
teams.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29879
llvm-svn: 295335
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295333
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types. An efficient
implementation requires hierarchical reduction within a
warp and a threadblock. It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.
The patch creates a struct to hold reduction variables and
a number of helper functions. The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team. Variables are
shared between CUDA threads using shuffle intrinsics.
An implementation of reductions on the NVPTX device is
substantially different to that of CPUs. However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.
The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions. The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.
While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.
Patch by Tian Jin in collaboration with Arpith Jacob
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758
llvm-svn: 295319
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.
This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084
llvm-svn: 293005
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.
This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084
llvm-svn: 293001
The if-clause on the combined directive potentially applies to both the
'target' and the 'parallel' regions. Codegen'ing the if-clause on the
combined directive requires additional support because the expression in
the clause must be captured by the 'target' capture statement but not
the 'parallel' capture statement. Note that this situation arises for
other clauses such as num_threads.
The OMPIfClause class inherits OMPClauseWithPreInit to support capturing
of expressions in the clause. A member CaptureRegion is added to
OMPClauseWithPreInit to indicate which captured statement (in this case
'target' but not 'parallel') captures these expressions.
To ensure correct codegen of captured expressions in the presence of
combined 'target' directives, OMPParallelScope was added to 'parallel'
codegen.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28781
llvm-svn: 292437
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292419
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292374
This patch refactors code that calls codegen for target regions. Currently
the codebase only supports the 'target' directive. The patch pulls out
common target processing code into a static function that can be called
by codegen for any target directive.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28752
llvm-svn: 292134
This patch is to implement sema and parsing for 'target teams distribute simd’ pragma.
Differential Revision: https://reviews.llvm.org/D28252
llvm-svn: 291579
https://reviews.llvm.org/D17840
This patch enables private, firstprivate, and lastprivate clauses for the OpenMP distribute directive.
Regression tests differ from the similar case of the same clauses on the for directive, by removing a reference to two global variables g and g1. This is necessary because: 1. a distribute pragma is only allowed inside a target region; 2. referring a global variable (e.g. g and g1) in a target region requires the program to enclose the variable in a "declare target" region; 3. declare target pragmas, which are used to define a declare target region, are currently unavailable in clang (patch being prepared).
For this reason, I moved the global declarations into local variables.
llvm-svn: 290898
This patch is to implement sema and parsing for 'target teams distribute parallel for simd’ pragma.
Differential Revision: https://reviews.llvm.org/D28202
llvm-svn: 290862
This patch is to implement sema and parsing for 'target teams distribute parallel for’ pragma.
Differential Revision: https://reviews.llvm.org/D28160
llvm-svn: 290725
This patch is to implement sema and parsing for 'target teams distribute' pragma.
Differential Revision: https://reviews.llvm.org/D28015
llvm-svn: 290508
This patch is to implement sema and parsing for 'teams distribute parallel for' pragma.
Differential Revision: https://reviews.llvm.org/D27345
llvm-svn: 289179
This patch is to implement sema and parsing for 'teams distribute parallel for simd' pragma.
Differential Revision: https://reviews.llvm.org/D27084
llvm-svn: 288294
If 'omp cancel' construct is used in a worksharing construct it may
cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled.
llvm-svn: 287227
Summary:
r286944 introduced bugs detected by ASAN as use-after-return.
r287025 have not fixed them completely.
This reverts commit r286944 and r287025.
Reviewers: ABataev
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D26720
llvm-svn: 287069
If 'omp cancel' construct is used in a worksharing construct it may cause
hanging of the software in case if reduction clause is used. Patch fixes
this problem by avoiding extra reduction processing for branches that
were canceled.
llvm-svn: 286944
can be used to improve the locations when generating remarks for loops.
Depends on the companion LLVM change r286227.
Patch by Florian Hahn.
Differential Revision: https://reviews.llvm.org/D25764
llvm-svn: 286456
After some changes in codegen capturing of VLA variables in OpenMP regions was broken, causing compiler crash. Patch fixes this issue.
llvm-svn: 286103
After some changes in codegen capturing of VLA variables in OpenMP
regions was broken, causing compiler crash. Patch fixes this issue.
llvm-svn: 286098
constexpr variable.
When compiling a constexpr NSString initialized with an objective-c
string literal, CodeGen emits objc_storeStrong on an uninitialized
alloca, which causes a crash.
This patch folds the code in EmitScalarInit into EmitStoreThroughLValue
and fixes the crash by calling objc_retain on the string instead of
using objc_storeStrong.
rdar://problem/28562009
Differential Revision: https://reviews.llvm.org/D25547
llvm-svn: 284516
access, by Erich Keane
OpenMP creates a variable array type with a a null size-expr. The Debug
generation failed to due to this. This patch corrects the openmp
implementation, updates the tests, and adds a new one for this
condition.
Differential Revision: https://reviews.llvm.org/D25373
llvm-svn: 284110
This reverts commit r279003 as it breaks some of our buildbots (e.g.
clang-cmake-aarch64-quick, clang-x86_64-linux-selfhost-modules).
The error is in OpenMP/teams_distribute_simd_ast_print.cpp:
clang: /home/buildslave/buildslave/clang-cmake-aarch64-quick/llvm/include/llvm/ADT/DenseMap.h:527:
bool llvm::DenseMapBase<DerivedT, KeyT, ValueT, KeyInfoT, BucketT>::LookupBucketFor(const LookupKeyT&, const BucketT*&) const
[with LookupKeyT = clang::Stmt*; DerivedT = llvm::DenseMap<clang::Stmt*, long unsigned int>;
KeyT = clang::Stmt*; ValueT = long unsigned int;
KeyInfoT = llvm::DenseMapInfo<clang::Stmt*>;
BucketT = llvm::detail::DenseMapPair<clang::Stmt*, long unsigned int>]:
Assertion `!KeyInfoT::isEqual(Val, EmptyKey) && !KeyInfoT::isEqual(Val, TombstoneKey) &&
"Empty/Tombstone value shouldn't be inserted into map!"' failed.
llvm-svn: 279045
This patch is to implement sema and parsing for 'teams distribute simd’ pragma.
This patch is originated by Carlo Bertolli.
Differential Revision: https://reviews.llvm.org/D23528
llvm-svn: 279003
Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: https://reviews.llvm.org/D22691
llvm-svn: 276977
Summary:
This patch fixes a bug in the map of array sections whose base is a reference to a pointer. The existing mapping support was not prepared to deal with it, causing the compiler to crash.
Mapping a reference to a pointer enjoys the same characteristics of a regular pointer, i.e., it is passed by value. Therefore, the reference has to be materialized in the target region.
Reviewers: hfinkel, carlo.bertolli, kkwli0, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: https://reviews.llvm.org/D22690
llvm-svn: 276933
This patch is to implement sema and parsing for 'target parallel for simd' pragma.
Differential Revision: http://reviews.llvm.org/D22096
llvm-svn: 275365
http://reviews.llvm.org/D21904
This patch is similar to the implementation of 'private' clause: it adds a list of private pointers to be used within the target data region to store the device pointers returned by the runtime.
Please refer to the following document for a full description of what the runtime witll return in this case (page 10 and 11):
https://github.com/clang-omp/OffloadingDesign
I am happy to answer any question related to the runtime interface to help reviewing this patch.
llvm-svn: 275271
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute simd'.
Differential Revision: http://reviews.llvm.org/D22007
llvm-svn: 274604
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute parallel for simd'.
Differential Revision: http://reviews.llvm.org/D21977
llvm-svn: 274530
[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for'
This patch is an initial implementation for #distribute parallel for.
The main differences that affect other pragmas are:
The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound.
llvm-svn: 273884
http://reviews.llvm.org/D21564
This patch is an initial implementation for #distribute parallel for.
The main differences that affect other pragmas are:
The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound.
llvm-svn: 273705
Summary:
This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects causing malfunction in some targets.
This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outline function and arguments and updates the regression tests accordingly.
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, caomhin
Differential Revision: http://reviews.llvm.org/D21150
llvm-svn: 272900
directives.
'kmp_task_t' record type added a new field for 'priority' clause and
changed the representation of pointer to destructors for privates used
within loop-based directives.
Old representation:
typedef struct kmp_task { /* GEH: Shouldn't this be
aligned somehow? */
void *shareds; /**< pointer to block of
pointers to shared vars */
kmp_routine_entry_t routine; /**< pointer to routine
to call for executing task */
kmp_int32 part_id; /**< part id for the
task */
kmp_routine_entry_t destructors; /* pointer to function to
invoke deconstructors of firstprivate C++ objects */
/* private vars */
} kmp_task_t;
New representation:
typedef struct kmp_task { /* GEH: Shouldn't this be
aligned somehow? */
void *shareds; /**< pointer to block of
pointers to shared vars */
kmp_routine_entry_t routine; /**< pointer to routine
to call for executing task */
kmp_int32 part_id; /**< part id for the
task */
kmp_cmplrdata_t data1; /* Two known
optional additions: destructors and priority */
kmp_cmplrdata_t data2; /* Process
destructors first, priority second */
/* future data */
/* private vars */
} kmp_task_t;
Also excessive initialization of 'destructors' fields to 'null' was
removed from codegen if it is known that no destructors shal be used.
Currently a special bit is used in 'kmp_tasking_flags_t' bitfields
('destructors_thunk' bitfield).
llvm-svn: 271201
Summary: This patch implements the code generation for the `target update` directive. The implemntation relies on the logic already in place for target data standalone directives, i.e. target enter/exit data.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D20650
llvm-svn: 270886
Summary:
The patch contains the parsing and sema support for the `from` clause.
Patch based on the original post by Kelvin Li.
Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D18488
llvm-svn: 270882
Summary:
The patch contains the parsing and sema support for the `to` clause.
Patch based on the original post by Kelvin Li.
Reviewers: carlo.bertolli, hfinkel, kkwli0, arpith-jacob, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D18597
llvm-svn: 270880
Summary:
This patch is to add parsing and sema support for `target update` directive. Support for the `to` and `from` clauses will be added by a different patch. This patch also adds support for other clauses that are already implemented upstream and apply to `target update`, e.g. `device` and `if`.
This patch is based on the original post by Kelvin Li.
Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D15944
llvm-svn: 270878
Getting accurate locations for loops is important, because those locations are
used by the frontend to generate optimization remarks. Currently, optimization
remarks for loops often appear on the wrong line, often the first line of the
loop body instead of the loop itself. This is confusing because that line might
itself be another loop, or might be somewhere else completely if the body was
an inlined function call. This happens because of the way we find the loop's
starting location. First, we look for a preheader, and if we find one, and its
terminator has a debug location, then we use that. Otherwise, we look for a
location on an instruction in the loop header.
The fallback heuristic is not bad, but will almost always find the beginning of
the body, and not the loop statement itself. The preheader location search
often fails because there's often not a preheader, and even when there is a
preheader, depending on how it was formed, it sometimes carries the location of
some preceeding code.
I don't see any good theoretical way to fix this problem. On the other hand,
this seems like a straightforward solution: Put the debug location in the
loop's llvm.loop metadata. When emitting debug information, this commit causes
us to add the debug location as an operand to each loop's llvm.loop metadata.
Thus, we now generate this metadata for all loops (not just loops with
optimization hints) when we're otherwise generating debug information.
The remark test case changes depend on the companion LLVM commit r270771.
llvm-svn: 270772
directives.
If firstprivate variable is is captured by value in outlined region and then used as firstprivate variable in inner worksharing directive, the copy for this firstprivate variable was not created. Fixed this bug.
llvm-svn: 270536
For better performance and to unify code with offloading part we pass
scalar firstprivate values by value, instead of by reference. It will
remove some extra copying operations.
llvm-svn: 269751
directives.
OpenMP 4.5 supports clause 'priority' in task-based directives. Patch
adds initial codegen support for this clause in codegen.
llvm-svn: 269050
schedule modifiers.
Runtime library expects some additional data in schedule argument for
loop-based directives, that have additional schedule modifiers
'monotonic|nonmonotonic'.
llvm-svn: 269035
OpenMP 4.5 adds taskloop/taskloop simd directives. These directives
allow to use lastprivate clause. Patch adds codegen for this clause.
llvm-svn: 268618
OpenMP 4.5 defines 'taskloop simd' directive, which is combined
directive for 'taskloop' and 'simd' directives. Patch adds initial
codegen support for this directive and its 2 basic clauses 'safelen' and
'simdlen'.
llvm-svn: 267872
directive.
OpenMP 4.5 defines 'taskloop' directive and 2 additional clauses
'grainsize' and 'num_tasks' for this directive. Patch adds codegen for
these clauses.
These clauses are generated as arguments of the '__kmpc_taskloop'
libcall and are encoded the following way:
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int sched, kmp_uint64 grainsize, void *task_dup);
If 'grainsize' is specified, 'sched' argument must be set to '1' and
'grainsize' argument must be set to the value of the 'grainsize' clause.
If 'num_tasks' is specified, 'sched' argument must be set to '2' and
'grainsize' argument must be set to the value of the 'num_tasks' clause.
It is possible because these 2 clauses are mutually exclusive and can't
be used at the same time on the same directive.
If none of these clauses is specified, 'sched' argument must be set to
'0'.
llvm-svn: 267862
Summary:
This patch adds support for the target exit data directive code generation.
Given that, apart from the employed runtime call, target exit data requires the same code generation pattern as target enter data, the OpenMP codegen entry point was renamed and reused for both.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17369
llvm-svn: 267814
Summary: This patch adds support for the target enter data directive code generation.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17368
llvm-svn: 267812
Summary:
This patch adds support for the target data directive code generation.
Part of the already existent functionality related with data maps is moved to a new function so that it could be reused.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17367
llvm-svn: 267811
declare reductions.
If reduction clause is applied to instance of class with user-defined
reduction operation without initialization clause, it may cause a crash.
Patch fixes this issue.
llvm-svn: 267695
Currently there is a problem with codegen of inlined directives inside
lambdas, it may cause a crash during codegen because of incorrect
capturing of variables. Patch fixes this problem.
llvm-svn: 267677
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The next code will be generated for the taskloop directive:
#pragma omp taskloop num_tasks(N) lastprivate(j)
for( i=0; i<N*GRAIN*STRIDE-1; i+=STRIDE ) {
int th = omp_get_thread_num();
#pragma omp atomic
counter++;
#pragma omp atomic
th_counter[th]++;
j = i;
}
Generated code:
task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct
task),sizeof(struct shar),&task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
NULL, // location
gtid, // gtid
task, // task structure
1, // if clause value
&task->lb, // lower bound
&task->ub, // upper bound
STRIDE, // loop increment
0, // 1 if nogroup specified
2, // schedule type: 0-none, 1-grainsize, 2-num_tasks
N, // schedule value (ignored for type 0)
(void*)&__task_dup_entry // tasks duplication routine
);
llvm-svn: 267395
causes code generation failure.
The codegen part of firstprivate clause for member decls used type of
original variable without skipping reference type from
OMPCapturedExprDecl. Patch fixes this problem.
llvm-svn: 267125
If loop control variable for simd-based directives is explicitly marked
as linear/lastprivate in clauses, codegen for such construct would
crash. Patch fixes this problem.
llvm-svn: 267101
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266853
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266754
If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks.
llvm-svn: 266722
OpenMP 4.0 defines clause 'uniform' in 'declare simd' directive:
'uniform' '(' <argument-list> ')'
The uniform clause declares one or more arguments to have an invariant value for all concurrent invocations of the function in the execution of a single SIMD loop.
The special this pointer can be used as if was one of the arguments to the function in any of the linear, aligned, or uniform clauses.
llvm-svn: 266041
Summary: See LLVM change D18775 for details, this change depends on it.
Reviewers: jyknight, reames
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D18776
llvm-svn: 265569
This patch implements the teams directive for the NVPTX backend. It is different from the host code generation path as it:
Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
Does not call kmpc_push_num_teams even if a num_teams of thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
http://reviews.llvm.org/D17963
llvm-svn: 265304
For better support of some specific GNU extensions some extra
transformation of AST nodes were introduced. These transformations are
very hard to handle. The code is improved in handling of these
extensions by using captured expressions construct.
llvm-svn: 264709
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264700
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264576
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
(required) code to support target specific codegen.
llvm-svn: 264569
OpenMP 4.0 allows to define custom reduction operations using '#pragma
omp declare reduction' construct. Patch allows to use this custom
defined reduction operations in 'reduction' clauses.
llvm-svn: 263701
OpenMP 4.5 allows privatization of non-static data members in OpenMP
constructs. Patch adds proper codegen support for data members in
'linear' clause
llvm-svn: 263003
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262832
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262741
Add code generation support for firstprivate and private clauses of teams on the host. Add extensive regression tests including lambda functions and vla testing.
http://reviews.llvm.org/D17582
llvm-svn: 262663
Summary:
This patch implements the launching of a target region in the presence of a nested teams region, i.e calls tgt_target_teams with the required arguments gathered from the enclosed teams directive.
The actual codegen of the region enclosed by the teams construct will be contributed in a separate patch.
Reviewers: hfinkel, arpith-jacob, kkwli0, carlo.bertolli, ABataev
Subscribers: cfe-commits, caomhin, fraggamuffin
Differential Revision: http://reviews.llvm.org/D17019
llvm-svn: 262625
OpenMP 4.5 allows to privatize data members of current class in member
functions. Patch adds initial support for privatization of data members
in 'linear' clause, no codegen support.
llvm-svn: 262578
OpenMP 4.5 allows to privatize non-static data members of current class
in non-static member functions. Patch supports codegen for non-static
data members in 'reduction' clauses.
llvm-svn: 262460
OpenMP 4.5 allows to privatize non-static member decls in non-static
member functions. Patch captures such decls by reference in general (for
bitfields, by value) and then operates with this capture. For bitfields,
at the end of codegen for lastprivates original bitfield is updated with the value of captured copy.
llvm-svn: 261824
Patch fixes bug with codegen for lastprivate loop counters. Also it may
improve performance for lastprivates calculations in some cases.
llvm-svn: 261209
Expressions inside 'schedule'|'dist_schedule' clause must be captured in
combined directives to avoid possible crash during codegen. Patch
improves handling of such constructs
llvm-svn: 260954
Sync barrier will be emitted after generation of firstprivate variables
only if one of the firstprivate vars is used in lastprivate clause.
llvm-svn: 260877
OMPCapturedExprDecl allows caopturing not only of fielddecls, but also
other expressions. It also allows to simplify codegen for several
clauses.
llvm-svn: 260492
Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types.
llvm-svn: 259776
Summary:
This patch adds parsing + sema for the target parallel for directive along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16759
llvm-svn: 259654
Summary:
This patch adds parsing + sema for the target parallel directive and its clauses along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16553
Rebased to current trunk and updated test cases.
llvm-svn: 258832
Summary:
This patch adds parsing + sema for the defaultmap clause associated with the target directive (among others).
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16527
llvm-svn: 258817
If 'sections' directive has only one sub-section, the code for 'single'-based directive was emitted. Removed this codegen, because it causes crashes in different cases.
llvm-svn: 258495
This patch attempts to fix the regressions identified when the patch was committed initially.
Thanks to Michael Liao for identifying the fix in the offloading metadata generation
related with side effects in evaluation of function arguments.
llvm-svn: 256933
Summary:
In order to offloading work properly two things need to be in place:
- a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library.
- all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device.
This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented.
About offloading descriptor:
The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type:
```
struct __tgt_offload_entry{
void *addr;
char *name;
int64_t size;
};
```
and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping.
The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run.
The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple.
About target codegen:
The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this:
```
!omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6}
!0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4}
!1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5}
!2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6}
!3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0}
!4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3}
!5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1}
!6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2}
```
The fields in each metadata entry are (in sequence):
Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region".
Entry 2) a unique ID of the device where the input source file that contain the target region lives.
Entry 3) a unique ID of the file where the input source file that contain the target region lives.
Entry 4) a mangled name of the function that encloses the target region.
Entries 5) and 6) line and column number where the target region was found.
Entry 7) is the order the entry was emitted.
Entry 2) and 3) are required to distinguish files that have the same function name.
Entry 4) is required to distinguish different instances of the same declaration (usually templated ones)
Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. )
This patch replaces http://reviews.llvm.org/D12306.
Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao
Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits
Differential Revision: http://reviews.llvm.org/D12614
llvm-svn: 256842
#pragma omp parallel needs an implicit barrier that is currently done by an explicit call to __kmpc_barrier. However, the runtime already ensures a barrier in __kmpc_fork_call which currently leads to two barriers per region per thread.
Differential Revision: http://reviews.llvm.org/D15561
llvm-svn: 255992
OpenMP codegen tried to emit the code for its constructs even if it was detected as a dead-code. Added checks to ensure that the code is emitted if the code is not dead.
llvm-svn: 255990
OpenMP 4.5 adds directives 'taskloop' and 'taskloop simd'. These directives support clause 'num_tasks'. Patch adds parsing/semantic analysis for this clause.
llvm-svn: 255008
OpenMP 4.5 adds 'taksloop' and 'taskloop simd' directives, which have 'grainsize' clause. Patch adds parsing/sema analysis of this clause.
llvm-svn: 254903
OpenMP 4.5 adds 'taskloop' and 'taskloop simd' directives. These directives have new 'nogroup' clause. Patch adds basic parsing/sema support for this clause.
llvm-svn: 254899
Constructors and destructors may be represented by several functions
in IR. Only base structors correspond to source code, others are
small pieces of code and eventually call the base variant. In this
case instrumentation of non-base structors has little sense, this
fix remove it. Now profile data of a declaration corresponds to
exactly one function in IR, it agrees with the current logic of the
profile data loading.
This change fixes PR24996.
Differential Revision: http://reviews.llvm.org/D15158
llvm-svn: 254876
Summary:
This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed.
Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available.
Reviewers: hfinkel, rjmccall, ABataev
Subscribers: cfe-commits, carlo.bertolli
Differential Revision: http://reviews.llvm.org/D14940
llvm-svn: 254521
OpenMP 4.5 defines new clause 'priority' for 'task', 'taskloop' and 'taskloop simd' directives. Added parsing and sema analysis for 'priority' clause in 'task' and 'taskloop' directives.
llvm-svn: 254398
This patch implements the outlining for offloading functions for code
annotated with the OpenMP target directive. It uses a temporary naming
of the outlined functions that will have to be updated later on once
target side codegen and registration of offloading libraries is
implemented - the naming needs to be made unique in the produced
library.
llvm-svn: 249148
Description.
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations.
Restrictions.
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region.
An ordered directive with ‘simd’ clause is generated as an outlined function and corresponding function call to prevent this part of code from vectorization later in backend.
llvm-svn: 248772
Parsing and sema analysis for 'simd' clause in 'ordered' directive.
Description
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered
regions in the order of the loop iterations.
Restrictions
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region
llvm-svn: 248696
OpenMP 4.1 extends format of '#pragma omp ordered'. It adds 3 additional clauses: 'threads', 'simd' and 'depend'.
If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team executing the loop region execute ordered regions sequentially in the order of the loop iterations.
The loop region to which an ordered region without any clause or with a threads clause binds must have an ordered clause without the parameter specified on the corresponding loop directive.
llvm-svn: 248569