This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
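The nesting described above can be modelled by hand, as in the minimal sketch below. The names `Captures`, `target_outlined`, `parallel_outlined` and `run_target_parallel` are hypothetical and do not match what the compiler emits, and the thread fork is simulated with a serial loop. The sketch only shows that the inner function takes the 'global_tid' and 'bound_tid' implicit parameters and the outer one does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the variables captured by both regions.
struct Captures {
  std::vector<int> *data;
};

// Inner captured statement ('parallel'): receives the implicit
// global_tid / bound_tid parameters in addition to the captures.
static void parallel_outlined(std::int32_t *global_tid, std::int32_t *bound_tid,
                              Captures *c) {
  (void)bound_tid; // unused in this sketch
  (*c->data)[static_cast<std::size_t>(*global_tid)] += 1;
}

// Outer captured statement ('target'): no thread-id parameters.
// A real runtime would fork threads; here each "thread" runs in turn.
static void target_outlined(Captures *c, std::int32_t num_threads) {
  for (std::int32_t tid = 0; tid < num_threads; ++tid) {
    std::int32_t bound = 0;
    parallel_outlined(&tid, &bound, c);
  }
}

// Models: #pragma omp target parallel
//         { data[omp_get_thread_num()] += 1; }
std::vector<int> run_target_parallel(std::int32_t num_threads) {
  assert(num_threads > 0);
  std::vector<int> data(static_cast<std::size_t>(num_threads), 0);
  Captures c{&data};
  target_outlined(&c, num_threads);
  return data;
}
```

Each simulated thread touches only its own slot, so the result has one increment per thread.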
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292419
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements. Support for this functionality is included in
the patch.
A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region. Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp). For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not. The patch adds support for handling multiple captured
statements based on the combined directive.
When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753
llvm-svn: 292374
Summary:
This patch introduces support for the execution of parallel constructs in a target
region on the NVPTX device. Parallel regions must be in the lexical scope of the
target directive.
The master thread in the master warp signals parallel work for worker threads in worker
warps on encountering a parallel region.
Note: The patch does not yet support capture of arguments in a parallel region so
the test cases are simple.
Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28145
llvm-svn: 291565
Summary:
This patch adds two fields to the offload entry descriptor. One field is meant to signal Ctors/Dtors and `link` global variables, and the other is reserved for runtime library use.
Currently, the code generation only fills these fields with zeros, but that will change when `declare target` is added.
The reason we are adding these fields now is to make the code generation consistent with the runtime library proposal under review in https://reviews.llvm.org/D14031.
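A minimal sketch of what such a descriptor could look like is given below. The type name `OffloadEntry`, the helper `make_entry` and the first three fields are assumptions made for illustration; only the two trailing fields and their zero initialization come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of an offload entry descriptor.
struct OffloadEntry {
  void *addr;            // assumed: host address of the entry
  const char *name;      // assumed: unique name of the entry
  std::size_t size;      // assumed: size of the entry in bytes
  std::int32_t flags;    // new: meant to signal ctors/dtors and `link` globals
  std::int32_t reserved; // new: reserved for runtime library use
};

// Builds an entry the way the current code generation is described:
// both new fields are filled with zeros.
OffloadEntry make_entry(void *addr, const char *name, std::size_t size) {
  assert(name != nullptr);
  return OffloadEntry{addr, name, size, /*flags=*/0, /*reserved=*/0};
}
```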
Reviewers: ABataev, hfinkel, carlo.bertolli, kkwli0, arpith-jacob, Hahnfeld
Subscribers: cfe-commits, caomhin, jholewinski
Differential Revision: https://reviews.llvm.org/D28298
llvm-svn: 291124
Summary: This patch adds support for the use_device_ptr clause. It includes changes in Sema that could not be tested without codegen, namely the use of the firstprivate logic and support for mappable expressions.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: https://reviews.llvm.org/D22691
llvm-svn: 276977
Summary: This patch implements the code generation for the `target update` directive. The implementation relies on the logic already in place for the standalone target data directives, i.e. target enter/exit data.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: caomhin, cfe-commits
Differential Revision: http://reviews.llvm.org/D20650
llvm-svn: 270886
directives.
OpenMP 4.5 supports the 'priority' clause in task-based directives. This
patch adds initial codegen support for this clause.
llvm-svn: 269050
If private variables require a destructor call when the task is
deleted, an additional flag must be set in the task flags. This patch
fixes the problem.
llvm-svn: 269039
schedule modifiers.
The runtime library expects additional data in the schedule argument
for loop-based directives that have one of the schedule modifiers
'monotonic' or 'nonmonotonic'.
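One way to carry such additional data is to OR modifier bits into the schedule kind, as in the sketch below. The base schedule values and the bit positions are assumptions chosen for illustration, not values taken from the runtime headers; `encode_schedule` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, for illustration only.
enum : std::int32_t {
  kSchedStatic = 34,
  kSchedDynamicChunked = 35,
  kModifierMonotonic = 1 << 29,
  kModifierNonmonotonic = 1 << 30,
};

enum class Modifier { None, Monotonic, Nonmonotonic };

// Combines a base schedule kind with an optional modifier bit.
std::int32_t encode_schedule(std::int32_t base, Modifier m) {
  // The base kind must not already carry a modifier bit.
  assert((base & (kModifierMonotonic | kModifierNonmonotonic)) == 0);
  switch (m) {
  case Modifier::Monotonic:
    return base | kModifierMonotonic;
  case Modifier::Nonmonotonic:
    return base | kModifierNonmonotonic;
  case Modifier::None:
    break;
  }
  return base;
}
```

Keeping the modifier in otherwise unused high bits lets the runtime recover the base schedule by masking them off.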
llvm-svn: 269035
OpenMP 4.5 adds the taskloop and taskloop simd directives. These directives
allow the lastprivate clause. This patch adds codegen for this clause.
llvm-svn: 268618
directive.
OpenMP 4.5 defines the 'taskloop' directive and two additional clauses,
'grainsize' and 'num_tasks', for this directive. This patch adds codegen
for these clauses.
These clauses are generated as arguments of the '__kmpc_taskloop'
libcall and are encoded in the following way:
  void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
                       kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st,
                       int nogroup, int sched, kmp_uint64 grainsize,
                       void *task_dup);
If 'grainsize' is specified, the 'sched' argument must be set to '1' and
the 'grainsize' argument must be set to the value of the 'grainsize' clause.
If 'num_tasks' is specified, the 'sched' argument must be set to '2' and
the 'grainsize' argument must be set to the value of the 'num_tasks' clause.
This is possible because the two clauses are mutually exclusive and cannot
be used at the same time on the same directive.
If neither clause is specified, the 'sched' argument must be set to
'0'.
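The encoding rules above can be sketched as a small helper. `TaskloopClause`, `TaskloopSchedule` and `encode_taskloop_schedule` are hypothetical names used only for this illustration; the value passed for the no-clause case (0) is also an assumption, since the text only fixes 'sched' in that case.

```cpp
#include <cassert>
#include <cstdint>

// Which of the two mutually exclusive clauses is present, if any.
enum class TaskloopClause { None, Grainsize, NumTasks };

// The pair of arguments passed as 'sched' and 'grainsize'.
struct TaskloopSchedule {
  int sched;
  std::uint64_t grainsize;
};

// Maps the clause and its value to the 'sched' / 'grainsize' arguments.
TaskloopSchedule encode_taskloop_schedule(TaskloopClause clause,
                                          std::uint64_t value) {
  switch (clause) {
  case TaskloopClause::Grainsize:
    return TaskloopSchedule{1, value};
  case TaskloopClause::NumTasks:
    return TaskloopSchedule{2, value};
  case TaskloopClause::None:
    break;
  }
  return TaskloopSchedule{0, 0}; // value is not meaningful without a clause
}

// Usage example: num_tasks(8) becomes sched = 2, grainsize = 8.
void taskloop_schedule_example() {
  TaskloopSchedule s = encode_taskloop_schedule(TaskloopClause::NumTasks, 8);
  assert(s.sched == 2 && s.grainsize == 8);
}
```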
llvm-svn: 267862
Summary:
This patch adds support for the target exit data directive code generation.
Given that, apart from the employed runtime call, target exit data requires the same code generation pattern as target enter data, the OpenMP codegen entry point was renamed and reused for both.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17369
llvm-svn: 267814
Summary: This patch adds support for the target enter data directive code generation.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17368
llvm-svn: 267812
Summary:
This patch adds support for the target data directive code generation.
Part of the existing functionality related to data maps is moved to a new function so that it can be reused.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17367
llvm-svn: 267811
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
Consider the following use of the taskloop directive:
  #pragma omp taskloop num_tasks(N) lastprivate(j)
  for (i = 0; i < N*GRAIN*STRIDE-1; i += STRIDE) {
    int th = omp_get_thread_num();
    #pragma omp atomic
    counter++;
    #pragma omp atomic
    th_counter[th]++;
    j = i;
  }
Generated code:
  task = __kmpc_omp_task_alloc(NULL, gtid, 1, sizeof(struct task),
                               sizeof(struct shar), &task_entry);
  psh = task->shareds;
  psh->pth_counter = &th_counter;
  psh->pcounter = &counter;
  psh->pj = &j;
  task->lb = 0;
  task->ub = N*GRAIN*STRIDE-2;
  task->st = STRIDE;
  __kmpc_taskloop(
      NULL,                    // location
      gtid,                    // gtid
      task,                    // task structure
      1,                       // if clause value
      &task->lb,               // lower bound
      &task->ub,               // upper bound
      STRIDE,                  // loop increment
      0,                       // 1 if nogroup specified
      2,                       // schedule type: 0-none, 1-grainsize, 2-num_tasks
      N,                       // schedule value (ignored for type 0)
      (void*)&__task_dup_entry // tasks duplication routine
  );
llvm-svn: 267395
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266853
If the untied clause is present on a task construct, any thread in the
team can resume the task region after a suspension. Patch adds proper
codegen for untied tasks.
llvm-svn: 266754
If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. Patch adds proper codegen for untied tasks.
llvm-svn: 266722
This patch implements the teams directive for the NVPTX backend. It differs from the host code generation path in that it:
- Does not call kmpc_fork_teams. All necessary teams and threads are started upon reaching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
- Does not call kmpc_push_num_teams even if a num_teams or thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.
Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.
http://reviews.llvm.org/D17963
llvm-svn: 265304
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before and after the main codegen action
defined in the CGStmtOpenMP.cpp file. A runtime should not define its own
RegionCodeGenTy for general OpenMP directives, but must be allowed to insert
its own (required) code to support target-specific codegen.
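The idea of one shared region callback with optional runtime hooks around it can be sketched as below. This is a simplified model, not the actual interface: `PrePostAction`, `RegionCodeGen`, `TargetAction` and the string log standing in for emitted code are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Log = std::vector<std::string>; // stands in for emitted code

// Hypothetical hook that a runtime can implement.
struct PrePostAction {
  virtual ~PrePostAction() = default;
  virtual void enter(Log &) {}
  virtual void exit(Log &) {}
};

// Wraps the common codegen callback; the runtime attaches an action
// instead of defining its own region callback.
class RegionCodeGen {
  std::function<void(Log &)> body_;
  PrePostAction *action_ = nullptr;

public:
  explicit RegionCodeGen(std::function<void(Log &)> body)
      : body_(std::move(body)) {}
  void setAction(PrePostAction &action) { action_ = &action; }
  void operator()(Log &log) const {
    assert(body_ && "region must have a codegen callback");
    if (action_) action_->enter(log);
    body_(log);
    if (action_) action_->exit(log);
  }
};

// Example of a runtime-specific action.
struct TargetAction : PrePostAction {
  void enter(Log &log) override { log.push_back("runtime: before"); }
  void exit(Log &log) override { log.push_back("runtime: after"); }
};

Log emit_region_example() {
  Log log;
  RegionCodeGen gen([](Log &l) { l.push_back("directive body"); });
  TargetAction action;
  gen.setAction(action);
  gen(log);
  return log;
}
```

The directive body is defined once, and the runtime contributes only the code that surrounds it.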
llvm-svn: 264700
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before and after the main codegen action
defined in the CGStmtOpenMP.cpp file. A runtime should not define its own
RegionCodeGenTy for general OpenMP directives, but must be allowed to insert
its own (required) code to support target-specific codegen.
llvm-svn: 264576
The solution unifies the interface of the RegionCodeGenTy type to allow
inserting runtime-specific code before and after the main codegen action
defined in the CGStmtOpenMP.cpp file. A runtime should not define its own
RegionCodeGenTy for general OpenMP directives, but must be allowed to insert
its own (required) code to support target-specific codegen.
llvm-svn: 264569
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
Reworked test case after buildbot failure on windows.
Updated patch to integrate r263837 and test case nvptx_target_firstprivate_codegen.cpp.
llvm-svn: 264018
Summary:
Reworked test case after buildbot failure on windows.
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263783
OpenMP 4.0 allows defining custom reduction operations using the '#pragma
omp declare reduction' construct. This patch allows these custom-defined
reduction operations to be used in 'reduction' clauses.
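As an illustration of the feature, the sketch below declares a custom reduction and uses it in a 'reduction' clause. The `MinMax` type, the `combine` helper and the reduction name `minmax` are made up for this example. Built without OpenMP enabled, the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <climits>

struct MinMax {
  int lo;
  int hi;
};

// Combiner used by the custom reduction: keeps the smaller 'lo'
// and the larger 'hi' of the two operands.
static MinMax combine(MinMax a, MinMax b) {
  return MinMax{a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi};
}

#pragma omp declare reduction(minmax : MinMax : omp_out = combine(omp_out, omp_in)) \
    initializer(omp_priv = MinMax{INT_MAX, INT_MIN})

// Finds the minimum and maximum of an array using the custom reduction.
MinMax find_min_max(const int *values, int n) {
  assert(values != nullptr && n > 0);
  MinMax result{INT_MAX, INT_MIN};
#pragma omp parallel for reduction(minmax : result)
  for (int i = 0; i < n; ++i)
    result = combine(result, MinMax{values[i], values[i]});
  return result;
}
```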
llvm-svn: 263701
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263587
Summary:
This patch adds base support for codegen of the target directive on the NVPTX device.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D17877
llvm-svn: 263552
This patch provides a basic implementation of codegen for the teams directive, excluding all clauses except dist_schedule. It also fixes parts of the AST reader/writer to enable correct precompiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262832
This patch provides a basic implementation of codegen for the teams directive, excluding all clauses except dist_schedule. It also fixes parts of the AST reader/writer to enable correct precompiled header handling.
http://reviews.llvm.org/D17170
llvm-svn: 262741
Summary:
Unlike other outlined regions in OpenMP, offloading entry points have to be visible (external linkage) to the device side. Using dots in the names of the entries can therefore be problematic for some toolchains, e.g. NVPTX.
The patch also drops the column information from the unique name of the entry points. The parsing of directives ignores unknown tokens, which prevents several target regions from being implemented on the same line. Therefore, the line information is sufficient for the name to be unique. In addition, the preprocessor printer does not preserve the column information, causing offloading-entry detection issues if the host uses an integrated preprocessor and the target doesn't (or vice versa).
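A name that follows these constraints (no dots, line number but no column) could be built as in the sketch below. The `__omp_offloading` prefix, the hexadecimal device and file identifiers, and the helper `offload_entry_name` are assumptions for illustration; the text above only fixes the absence of dots and of column information.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical builder for a unique, dot-free entry point name.
std::string offload_entry_name(unsigned device_id, unsigned file_id,
                               const std::string &parent_name, unsigned line) {
  // The parent function name must not reintroduce dots.
  assert(parent_name.find('.') == std::string::npos);
  char ids[64];
  std::snprintf(ids, sizeof ids, "_%x_%x_", device_id, file_id);
  // Only the line is appended; the column is deliberately left out.
  return "__omp_offloading" + std::string(ids) + parent_name + "_l" +
         std::to_string(line);
}
```

Because the name depends only on the file and line, host and device compilations can agree on it even if one of them works from preprocessed output.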
Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D17179
llvm-svn: 260837