Summary:
In order to offloading work properly two things need to be in place:
- a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library.
- all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device.
This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented.
About offloading descriptor:
The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type:
```
struct __tgt_offload_entry{
void *addr;
char *name;
int64_t size;
};
```
and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping.
The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run.
The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple.
About target codegen:
The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this:
```
!omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6}
!0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4}
!1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5}
!2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6}
!3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0}
!4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3}
!5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1}
!6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2}
```
The fields in each metadata entry are (in sequence):
Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region".
Entry 2) a unique ID of the device where the input source file that contain the target region lives.
Entry 3) a unique ID of the file where the input source file that contain the target region lives.
Entry 4) a mangled name of the function that encloses the target region.
Entries 5) and 6) line and column number where the target region was found.
Entry 7) is the order the entry was emitted.
Entry 2) and 3) are required to distinguish files that have the same function name.
Entry 4) is required to distinguish different instances of the same declaration (usually templated ones)
Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. )
This patch replaces http://reviews.llvm.org/D12306.
Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao
Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits
Differential Revision: http://reviews.llvm.org/D12614
llvm-svn: 256842
#pragma omp parallel needs an implicit barrier that is currently done by an explicit call to __kmpc_barrier. However, the runtime already ensures a barrier in __kmpc_fork_call which currently leads to two barriers per region per thread.
Differential Revision: http://reviews.llvm.org/D15561
llvm-svn: 255992
OpenMP codegen tried to emit the code for its constructs even if it was detected as a dead-code. Added checks to ensure that the code is emitted if the code is not dead.
llvm-svn: 255990
OpenMP 4.5 adds directives 'taskloop' and 'taskloop simd'. These directives support clause 'num_tasks'. Patch adds parsing/semantic analysis for this clause.
llvm-svn: 255008
OpenMP 4.5 adds 'taksloop' and 'taskloop simd' directives, which have 'grainsize' clause. Patch adds parsing/sema analysis of this clause.
llvm-svn: 254903
OpenMP 4.5 adds 'taskloop' and 'taskloop simd' directives. These directives have new 'nogroup' clause. Patch adds basic parsing/sema support for this clause.
llvm-svn: 254899
Constructors and destructors may be represented by several functions
in IR. Only base structors correspond to source code, others are
small pieces of code and eventually call the base variant. In this
case instrumentation of non-base structors has little sense, this
fix remove it. Now profile data of a declaration corresponds to
exactly one function in IR, it agrees with the current logic of the
profile data loading.
This change fixes PR24996.
Differential Revision: http://reviews.llvm.org/D15158
llvm-svn: 254876
Summary:
This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed.
Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available.
Reviewers: hfinkel, rjmccall, ABataev
Subscribers: cfe-commits, carlo.bertolli
Differential Revision: http://reviews.llvm.org/D14940
llvm-svn: 254521
OpenMP 4.5 defines new clause 'priority' for 'task', 'taskloop' and 'taskloop simd' directives. Added parsing and sema analysis for 'priority' clause in 'task' and 'taskloop' directives.
llvm-svn: 254398
This patch implements the outlining for offloading functions for code
annotated with the OpenMP target directive. It uses a temporary naming
of the outlined functions that will have to be updated later on once
target side codegen and registration of offloading libraries is
implemented - the naming needs to be made unique in the produced
library.
llvm-svn: 249148
Description.
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations.
Restrictions.
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region.
An ordered directive with ‘simd’ clause is generated as an outlined function and corresponding function call to prevent this part of code from vectorization later in backend.
llvm-svn: 248772
Parsing and sema analysis for 'simd' clause in 'ordered' directive.
Description
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered
regions in the order of the loop iterations.
Restrictions
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region
llvm-svn: 248696
OpenMP 4.1 extends format of '#pragma omp ordered'. It adds 3 additional clauses: 'threads', 'simd' and 'depend'.
If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team executing the loop region execute ordered regions sequentially in the order of the loop iterations.
The loop region to which an ordered region without any clause or with a threads clause binds must have an ordered clause without the parameter specified on the corresponding loop directive.
llvm-svn: 248569
Patch improves codegen for OpenMP constructs. If the OpenMP region does not have internal 'cancel' construct, a call to 'void __kmpc_barrier()' runtime function is generated for all implicit/explicit barriers. If the region has inner 'cancel' directive, then
```
if (__kmpc_cancel_barrier())
exit from outer construct;
```
code is generated.
Also, the code for 'canellation point' directive is not generated if parent directive does not have 'cancel' directive.
llvm-svn: 247681
Currently all variables used in OpenMP regions are captured into a record and passed to outlined functions in this record. It may result in some poor performance because of too complex analysis later in optimization passes. Patch makes to emit outlined functions for parallel-based regions with a list of captured variables. It reduces code for 2*n GEPs, stores and loads at least.
Codegen for task-based regions remains unchanged because runtime requires that all captured variables are passed in captured record.
llvm-svn: 247251
Introduce an Address type to bundle a pointer value with an
alignment. Introduce APIs on CGBuilderTy to work with Address
values. Change core APIs on CGF/CGM to traffic in Address where
appropriate. Require alignments to be non-zero. Update a ton
of code to compute and propagate alignment information.
As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment
helper function to CGF and made use of it in a number of places in
the expression emitter.
The end result is that we should now be significantly more correct
when performing operations on objects that are locally known to
be under-aligned. Since alignment is not reliably tracked in the
type system, there are inherent limits to this, but at least we
are no longer confused by standard operations like derived-to-base
conversions and array-to-pointer decay. I've also fixed a large
number of bugs where we were applying the complete-object alignment
to a pointer instead of the non-virtual alignment, although most of
these were hidden by the very conservative approach we took with
member alignment.
Also, because IRGen now reliably asserts on zero alignments, we
should no longer be subject to an absurd but frustrating recurring
bug where an incomplete type would report a zero alignment and then
we'd naively do a alignmentAtOffset on it and emit code using an
alignment equal to the largest power-of-two factor of the offset.
We should also now be emitting much more aggressive alignment
attributes in the presence of over-alignment. In particular,
field access now uses alignmentAtOffset instead of min.
Several times in this patch, I had to change the existing
code-generation pattern in order to more effectively use
the Address APIs. For the most part, this seems to be a strict
improvement, like doing pointer arithmetic with GEPs instead of
ptrtoint. That said, I've tried very hard to not change semantics,
but it is likely that I've failed in a few places, for which I
apologize.
ABIArgInfo now always carries the assumed alignment of indirect and
indirect byval arguments. In order to cut down on what was already
a dauntingly large patch, I changed the code to never set align
attributes in the IR on non-byval indirect arguments. That is,
we still generate code which assumes that indirect arguments have
the given alignment, but we don't express this information to the
backend except where it's semantically required (i.e. on byvals).
This is likely a minor regression for those targets that did provide
this information, but it'll be trivial to add it back in a later
patch.
I partially punted on applying this work to CGBuiltin. Please
do not add more uses of the CreateDefaultAligned{Load,Store}
APIs; they will be going away eventually.
llvm-svn: 246985
Fix processing of shared variables with reference types in OpenMP constructs. Previously, if the variable was not marked in one of the private clauses, the reference to this variable was emitted incorrectly and caused an assertion later.
llvm-svn: 246846
Fixed codegen for extended format of 'if' clauses with special 'directive-name-modifier' + ast-print tests for extended format of 'if' clause.
llvm-svn: 246748
This replaces the filtered generic iterator with a type-specfic one based
on dyn_cast instead of comparing the kind enum. This allows us to use
range-based for loops and eliminates casts. No functionality change
intended.
llvm-svn: 246384
Add emission of metadata for simd loops in presence of 'simdlen' clause.
If 'simdlen' clause is provided without 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are marked with '!llvm.mem.parallel_loop_access' metadata.
If 'simdlen' clause is provided along with 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata.
If 'safelen' clause is provided without 'simdlen' clause, the vectorizer width for the loop is set to value of 'safelen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata.
llvm-svn: 245697
Add parsing/sema analysis for 'simdlen' clause in simd directives. Also add check that if both 'safelen' and 'simdlen' clauses are specified, the value of 'simdlen' parameter is less than the value of 'safelen' parameter.
llvm-svn: 245692
OpenMP 4.1 allows to use variables with reference types in all private clauses (private, firstprivate, lastprivate, linear etc.). Patch allows to use such variables and fixes codegen for linear variables with reference types.
llvm-svn: 245268
blender uses statements expression in condition of the loop under control of the '#pragma omp parallel for'. This condition is used several times in different expressions required for codegen of the loop directive. If there are some variables defined in statement expression, it fires an assert during codegen because of redefinition of the same variables.
We have to rebuild several expression to be sure that all variables are unique.
llvm-svn: 245041
Summary:
float_cast_overflow is the only UBSan check without a source location attached.
This patch propagates SourceLocations where necessary to get them to the
EmitCheck() call.
Reviewers: rsmith, ABataev, rjmccall
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D11757
llvm-svn: 244568
OpenMP 4.1 allows to use variables with reference types in private clauses and, therefore, in init expressions of the cannonical loop forms.
llvm-svn: 244209
The next code is generated for this construct:
```
if (__kmpc_cancellationpoint(ident_t *loc, kmp_int32 global_tid, kmp_int32 cncl_kind) != 0)
<exit from outer innermost construct>;
```
llvm-svn: 241239
If task directive has associated 'depend' clause then function kmp_int32 __kmpc_omp_task_with_deps ( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_task, kmp_int32 ndeps, kmp_depend_info_t *dep_list,kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) must be called instead of __kmpc_omp_task().
If this directive has associated 'if' clause then also before a call of kmpc_omp_task_begin_if0() a function void __kmpc_omp_wait_deps ( ident_t *loc_ref, kmp_int32 gtid, kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) must be called.
Array sections are not supported yet.
llvm-svn: 240532
Parsing and sema analysis (without support for array sections in arguments) for 'depend' clause (used in 'task' directive, OpenMP 4.0).
llvm-svn: 240409
Added parsing, sema analysis and codegen for '#pragma omp taskgroup' directive (OpenMP 4.0).
The code for directive is generated the following way:
#pragma omp taskgroup
<body>
void __kmpc_taskgroup(<loc>, thread_id);
<body>
void __kmpc_end_taskgroup(<loc>, thread_id);
llvm-svn: 240011
Added codegen for combined 'omp for simd' directives, that is a combination of 'omp for' directive followed by 'omp simd' directive. Includes support for all clauses.
llvm-svn: 239990
The following code is generated for reduction clause within 'omp simd' loop construct:
#pragma omp simd reduction(op:var)
for (...)
<body>
alloca priv_var
priv_var = <initial reduction value>;
<loop_start>:
<body> // references to original 'var' are replaced by 'priv_var'
<loop_end>:
var op= priv_var;
llvm-svn: 239881
Previously the last iteration for simd loop-based OpenMP constructs were generated as a separate code. This feature is not required and codegen is simplified.
llvm-svn: 239810
If loop control variable in a worksharing construct is marked as lastprivate, we should copy last calculated value of private counter back to original variable.
llvm-svn: 237879
This modification generates proper copyin/initialization sequences for array variables/parameters. Before they were considered as pointers, not arrays.
llvm-svn: 237691
'schedule' clause for combined directives requires additional processing. Special helper variable is generated, that is captured in the outlined parallel region for 'parallel for' region. This captured variable is used to store chunk expression from the 'schedule' clause in this 'parallel for' region.
llvm-svn: 237100
Inner bodies of OpenMP worksharing loop-based constructs with dynamic or guided scheduling are allowed to be marked with !llvm.mem.parallel_loop_access metadata for better optimization. Worksharing constructs with static scheduling cannot be marked this way (according to OpenMP standard "A data dependence between the same logical iterations in two such loops is guaranteed").
Constructs with auto and runtime scheduling are also not marked because automatically chosen scheduling may be static also.
Differential Revision: http://reviews.llvm.org/D9518
llvm-svn: 236693
For tasks codegen for private/firstprivate variables are different rather than for other directives.
1. Build an internal structure of privates for each private variable:
struct .kmp_privates_t. {
Ty1 var1;
...
Tyn varn;
};
2. Add a new field to kmp_task_t type with list of privates.
struct kmp_task_t {
void * shareds;
kmp_routine_entry_t routine;
kmp_int32 part_id;
kmp_routine_entry_t destructors;
.kmp_privates_t. privates;
};
3. Create a function with destructors calls for all privates after end of task region.
kmp_int32 .omp_task_destructor.(kmp_int32 gtid, kmp_task_t *tt) {
~Destructor(&tt->privates.var1);
...
~Destructor(&tt->privates.varn);
return 0;
}
4. Perform initialization of all firstprivate fields (by simple copying for POD data, copy constructor calls for classes) + provide address of a destructor function after kmpc_omp_task_alloc() and before kmpc_omp_task() calls.
kmp_task_t *new_task = __kmpc_omp_task_alloc(ident_t *, kmp_int32 gtid, kmp_int32 flags, size_t sizeof_kmp_task_t, size_t sizeof_shareds, kmp_routine_entry_t *task_entry);
CopyConstructor(new_task->privates.var1, *new_task->shareds.var1_ref);
new_task->shareds.var1_ref = &new_task->privates.var1;
...
CopyConstructor(new_task->privates.varn, *new_task->shareds.varn_ref);
new_task->shareds.varn_ref = &new_task->privates.varn;
new_task->destructors = .omp_task_destructor.;
kmp_int32 __kmpc_omp_task(ident_t *, kmp_int32 gtid, kmp_task_t *new_task)
Differential Revision: http://reviews.llvm.org/D9370
llvm-svn: 236479
For tasks codegen for private/firstprivate variables are different rather than for other directives.
1. Build an internal structure of privates for each private variable:
struct .kmp_privates_t. {
Ty1 var1;
...
Tyn varn;
};
2. Add a new field to kmp_task_t type with list of privates.
struct kmp_task_t {
void * shareds;
kmp_routine_entry_t routine;
kmp_int32 part_id;
kmp_routine_entry_t destructors;
.kmp_privates_t. privates;
};
3. Create a function with destructors calls for all privates after end of task region.
kmp_int32 .omp_task_destructor.(kmp_int32 gtid, kmp_task_t *tt) {
~Destructor(&tt->privates.var1);
...
~Destructor(&tt->privates.varn);
return 0;
}
4. Perform default initialization of all private fields (no initialization for POD data, default constructor calls for classes) + provide address of a destructor function after kmpc_omp_task_alloc() and before kmpc_omp_task() calls.
kmp_task_t *new_task = __kmpc_omp_task_alloc(ident_t *, kmp_int32 gtid, kmp_int32 flags, size_t sizeof_kmp_task_t, size_t sizeof_shareds, kmp_routine_entry_t *task_entry);
DefaultConstructor(new_task->privates.var1);
new_task->shareds.var1_ref = &new_task->privates.var1;
...
DefaultConstructor(new_task->privates.varn);
new_task->shareds.varn_ref = &new_task->privates.varn;
new_task->destructors = .omp_task_destructor.;
kmp_int32 __kmpc_omp_task(ident_t *, kmp_int32 gtid, kmp_task_t *new_task)
Differential Revision: http://reviews.llvm.org/D9322
llvm-svn: 236207
Emit the following code for 'taskwait' directive within tied task:
call i32 @__kmpc_omp_taskwait(<loc>, i32 <thread_id>);
Differential Revision: http://reviews.llvm.org/D9245
llvm-svn: 235836
Emit a code for reduction clause. Next code should be emitted for reductions:
static kmp_critical_name lock = { 0 };
void reduce_func(void *lhs[<n>], void *rhs[<n>]) {
*(Type0*)lhs[0] = ReductionOperation0(*(Type0*)lhs[0], *(Type0*)rhs[0]);
...
*(Type<n>-1*)lhs[<n>-1] =
ReductionOperation<n>-1(*(Type<n>-1*)lhs[<n>-1],
*(Type<n>-1*)rhs[<n>-1]);
}
...
void *RedList[<n>] = {&<RHSExprs>[0], ..., &<RHSExprs>[<n>-1]};
switch (__kmpc_reduce{_nowait}(<loc>, <gtid>, <n>, sizeof(RedList), RedList, reduce_func, &<lock>)) {
case 1:
<LHSExprs>[0] = ReductionOperation0(*<LHSExprs>[0], *<RHSExprs>[0]);
...
<LHSExprs>[<n>-1] = ReductionOperation<n>-1(*<LHSExprs>[<n>-1], *<RHSExprs>[<n>-1]);
__kmpc_end_reduce{_nowait}(<loc>, <gtid>, &<lock>);
break;
case 2:
Atomic(<LHSExprs>[0] = ReductionOperation0(*<LHSExprs>[0], *<RHSExprs>[0]));
...
Atomic(<LHSExprs>[<n>-1] = ReductionOperation<n>-1(*<LHSExprs>[<n>-1], *<RHSExprs>[<n>-1]));
break;
default:;
}
Reduction variables are a kind of a private variables, they have private copies, but initial values are chosen in accordance with the reduction operation.
If sections directive has only single section, then original shared variables are used instead with barrier at the end of the directive.
Differential Revision: http://reviews.llvm.org/D9242
llvm-svn: 235835
#pragma omp sections lastprivate(<var>)
<BODY>;
This construct is translated into something like:
<last_iter> = alloca i32
<init for lastprivates>;
<last_iter> = 0
; No initializer for simple variables or a default constructor is called for objects.
; For arrays perform element by element initialization by the call of the default constructor.
...
OMP_FOR_START(...,<last_iter>, ..); sets <last_iter> to 1 if this is the last iteration.
<BODY>
...
OMP_FOR_END
if (<last_iter> != 0) {
<final copy for lastprivate>; Update original variable with the lastprivate value.
}
call __kmpc_cancel_barrier() ; an implicit barrier to avoid possible data race.
If there is only one section, there is no special code generation, original shared variables are used + barrier is emitted at the end of the directive.
Differential Revision: http://reviews.llvm.org/D9240
llvm-svn: 235834
If there are 2 or more sections in a 'section' directive the following code is generated:
<default init for privates>
@__kmpc_for_static_init_4();
<BODY for sections directive>
@__kmpc_for_static_fini()
If there is only one section, the following code is generated:
if (@__kmpc_single()) {
<default init for privates>
@__kmpc_end_single();
}
Differential Revision: http://reviews.llvm.org/D9239
llvm-svn: 235833
Emit the following code for 'single' directive with 'private' clause:
if (@__kmpc_single()) {
<default init for privates>
@__kmpc_end_single();
}
Differential Revision: http://reviews.llvm.org/D9238
llvm-svn: 235832
Emit the following code for 'single' directive with 'firtstprivate' clause:
if (@__kmpc_single()) {
<init for firstprivates>
@__kmpc_end_single();
}
@__kmpc_cancel_barrier(); // To avoid data race in firstprivate init
Differential Revision: http://reviews.llvm.org/D9223
llvm-svn: 235694
Runtime function for 'copyprivate' directive generates implicit barriers, so no need to emit it.
Differential Revision: http://reviews.llvm.org/D9215
llvm-svn: 235692
If there are 2 or more sections in a 'section' directive the following code is generated:
<init for firstprivates>
@__kmpc_cancel_barrier();// To avoid data race in firstprivate init
@__kmpc_for_static_init_4();
<BODY for sections directive>
@__kmpc_for_static_fini()
If there is only one section, the following code is generated:
if (@__kmpc_single()) {
<init for firstprivates>
@__kmpc_end_single();
}
@__kmpc_cancel_barrier(); // To avoid data race in firstprivate init
Differential Revision: http://reviews.llvm.org/D9214
llvm-svn: 235691
The RegionCounter type does a lot of legwork, but most of it is only
meaningful within the implementation of CodeGenPGO. The uses elsewhere
in CodeGen generally just want to increment or read counters, so do
that directly.
llvm-svn: 235664
Adds codegen for 'atomic capture' constructs with the following forms of expressions/statements:
v = x binop= expr;
v = x++;
v = ++x;
v = x--;
v = --x;
v = x = x binop expr;
v = x = expr binop x;
{v = x; x = binop= expr;}
{v = x; x++;}
{v = x; ++x;}
{v = x; x--;}
{v = x; --x;}
{x = x binop expr; v = x;}
{x binop= expr; v = x;}
{x++; v = x;}
{++x; v = x;}
{x--; v = x;}
{--x; v = x;}
{x = x binop expr; v = x;}
{x = expr binop x; v = x;}
{v = x; x = expr;}
If x and expr are integer and binop is associative or x is a LHS in a RHS of the assignment expression, and atomics are allowed for type of x on the target platform atomicrmw instruction is emitted.
Otherwise compare-and-swap sequence is emitted.
Update of 'v' is not required to be be atomic with respect to the read or write of the 'x'.
bb:
...
atomic load <x>
cont:
<expected> = phi [ <x>, label %bb ], [ <new_failed>, %cont ]
<desired> = <expected> binop <expr>
<res> = cmpxchg atomic &<x>, desired, expected
<new_failed> = <res>.field1;
br <res>field2, label %exit, label %cont
exit:
atomic store <old/new x>, <v>
...
Differential Revision: http://reviews.llvm.org/D9049
llvm-svn: 235573
If condition evaluates to true, the code executes task by calling @__kmpc_omp_task() runtime function.
If condition evaluates to false, the code executes serial version of the code by executing the following code:
call void @__kmpc_omp_task_begin_if0(<loc>, <threadid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
proxy_task_entry(<gtid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
call void @__kmpc_omp_task_complete_if0(<loc>, <threadid>, <task_t_ptr, returned by @__kmpc_omp_task_alloc()>);
Also it checks if the condition is constant and if it is constant it evaluates its value and then generates either parallel version of the code (if the condition evaluates to true), or the serial version of the code (if the condition evaluates to false).
Differential Revision: http://reviews.llvm.org/D9143
llvm-svn: 235507
This patch generates helper variables which used as a private copies of the corresponding original variables inside an OpenMP 'for' directive. These generated variables are initialized by default (with the default constructor, if any). In OpenMP region references to original variables are replaced by the references to these private helper variables.
Differential Revision: http://reviews.llvm.org/D9106
llvm-svn: 235503
Patch fixes bugs in codegen for loops with unsigned counters and zero trip count. Previously preconditions for all loops were built using logic (Upper - Lower) > 0. But if the loop is a loop with zero trip count, then Upper - Lower is < 0 only for signed integer, for unsigned we're running into an underflow situation.
In this patch we're using original Lower<Upper condition to check that loop body can be executed at least once. Also this allows to skip code generation for loops, if it is known that preconditions for the loop are always false.
Differential Revision: http://reviews.llvm.org/D9103
llvm-svn: 235500
Add codegen for 'ordered' directive:
__kmpc_ordered(ident_t *, gtid);
<associated statement>;
__kmpc_end_ordered(ident_t *, gtid);
Also for 'for' directives with the dynamic scheduling and an 'ordered' clause added a call to '__kmpc_dispatch_fini_(4|8)[u]()' function after increment expression for loop control variable:
while(__kmpc_dispatch_next(&LB, &UB)) {
idx = LB;
while (idx <= UB) { BODY; ++idx;
__kmpc_dispatch_fini_(4|8)[u](); // For ordered loops only.
} // inner loop
}
Differential Revision: http://reviews.llvm.org/D9070
llvm-svn: 235496
Emits the following code for the clause at the beginning of the outlined function for implicit threads:
if (<not a master thread>) {
...
<thread local copy of var> = <master thread local copy of var>;
...
}
<sync point>;
Checking for a non-master thread is performed by comparing of the address of the thread local variable with the address of the master's variable. Master thread always uses original variables, so you always know the address of the variable in the master thread.
Differential Revision: http://reviews.llvm.org/D9026
llvm-svn: 235075
#pragma omp for lastprivate(<var>)
for (i = a; i < b; ++b)
<BODY>;
This construct is translated into something like:
<last_iter> = alloca i32
<lastprivate_var> = alloca <type>
<last_iter> = 0
; No initializer for simple variables or a default constructor is called for objects.
; For arrays perform element by element initialization by the call of the default constructor.
...
OMP_FOR_START(...,<last_iter>, ..); sets <last_iter> to 1 if this is the last iteration.
<BODY>
...
OMP_FOR_END
if (<last_iter> != 0) {
<var> = <lastprivate_var> ; Update original variable with the lastprivate value.
}
call __kmpc_cancel_barrier() ; an implicit barrier to avoid possible data race.
Differential Revision: http://reviews.llvm.org/D8658
llvm-svn: 235074
Adds proper codegen for 'firstprivate' clause in for directive. Initially codegen for 'firstprivate' clause was implemented for 'parallel' directive only.
Also this patch emits sync point only after initialization of firstprivate variables, not all private variables. This sync point is not required for privates, lastprivates etc., only for initialization of firstprivate variables.
Differential Revision: http://reviews.llvm.org/D8660
llvm-svn: 234978
Fixed a bug with codegen of variables with array types specified in 'copyprivate' clause of 'single' directive.
Differential Revision: http://reviews.llvm.org/D8914
llvm-svn: 234856
Adds atomic update codegen for the following forms of expressions:
x binop= expr;
x++;
++x;
x--;
--x;
x = x binop expr;
x = expr binop x;
If x and expr are integer and binop is associative or x is a LHS in a RHS of the assignment expression, and atomics are allowed for type of x on the target platform atomicrmw instruction is emitted.
Otherwise compare-and-swap sequence is emitted:
bb:
...
atomic load <x>
cont:
<expected> = phi [ <x>, label %bb ], [ <new_failed>, %cont ]
<desired> = <expected> binop <expr>
<res> = cmpxchg atomic &<x>, desired, expected
<new_failed> = <res>.field1;
br <res>field2, label %exit, label %cont
exit:
...
Differential Revision: http://reviews.llvm.org/D8536
llvm-svn: 233513
Replace boolean IsExplicit parameter of OpenMPRuntime::emitBarrierCall() method by OpenMPDirectiveKind Kind for better compatibility with the runtime library. Also add processing of 'nowait' clause on worksharing directives.
Differential Revision: http://reviews.llvm.org/D8659
llvm-svn: 233511
If there is at least one 'copyprivate' clause is associated with the single directive, the following code is generated:
```
i32 did_it = 0; \\ for 'copyprivate' clause
if(__kmpc_single(ident_t *, gtid)) {
SingleOpGen();
__kmpc_end_single(ident_t *, gtid);
did_it = 1; \\ for 'copyprivate' clause
}
<copyprivate_list>[0] = &var0;
...
<copyprivate_list>[n] = &varn;
call __kmpc_copyprivate(ident_t *, gtid, <copyprivate_list_size>,
<copyprivate_list>, <copy_func>, did_it);
...
void<copy_func>(void *LHSArg, void *RHSArg) {
Dst = (void * [n])(LHSArg);
Src = (void * [n])(RHSArg);
Dst[0] = Src[0];
... Dst[n] = Src[n];
}
```
All list items from all 'copyprivate' clauses are gathered into single <copyprivate list> (<copyprivate_list_size> is a size in bytes of this list) and <copy_func> is used to propagate values of private or threadprivate variables from the 'single' region to other implicit threads from outer 'parallel' region.
Differential Revision: http://reviews.llvm.org/D8410
llvm-svn: 232932
The linear variable is privatized (similar to 'private') and its
value on current iteration is calculated, similar to the loop
counter variables.
Differential revision: http://reviews.llvm.org/D8375
llvm-svn: 232890
This patch allows using of ExprWithCleanups expressions and other complex expressions in 'omp atomic' construct
Differential Revision: http://reviews.llvm.org/D8200
llvm-svn: 231905
The task region is emmitted in several steps:
Emit a call to kmp_task_t *__kmpc_omp_task_alloc(ident_t *, kmp_int32 gtid, kmp_int32 flags, size_t sizeof_kmp_task_t, size_t sizeof_shareds, kmp_routine_entry_t *task_entry).
Here task_entry is a pointer to the function:
kmp_int32 .omp_task_entry.(kmp_int32 gtid, kmp_task_t *tt) {
TaskFunction(gtid, tt->part_id, tt->shareds);
return 0;
}
Copy a list of shared variables to field shareds of the resulting structure kmp_task_t returned by the previous call (if any).
Copy a pointer to destructions function to field destructions of the resulting structure kmp_task_t.
Emit a call to kmp_int32 __kmpc_omp_task(ident_t *, kmp_int32 gtid, kmp_task_t *new_task), where new_task is a resulting structure from previous items.
Differential Revision: http://reviews.llvm.org/D7560
llvm-svn: 231762
Patch adds proper generation of debug info for all OpenMP regions. Also, all OpenMP regions are generated in a termination scope, because standard does not allow to throw exceptions out of structured blocks, associated with the OpenMP regions
Differential Revision: http://reviews.llvm.org/D7935
llvm-svn: 231757
This reverts commit r231752.
It was failing to link with cmake:
lib64/libclangCodeGen.a(CGOpenMPRuntime.cpp.o):/home/espindola/llvm/llvm/tools/clang/lib/CodeGen/CGOpenMPRuntime.cpp:function clang::CodeGen::InlinedOpenMPRegionRAII::~InlinedOpenMPRegionRAII(): error: undefined reference to 'clang::CodeGen::EHScopeStack::popTerminate()'
clang-3.7: error: linker command failed with exit code 1 (use -v to see invocation)
llvm-svn: 231754
Patch adds proper generation of debug info for all OpenMP regions. Also, all OpenMP regions are generated in a termination scope, because standard does not allow to throw exceptions out of structured blocks, associated with the OpenMP regions
Differential Revision: http://reviews.llvm.org/D7935
llvm-svn: 231752
For global reg lvalue - use regular store through global register.
For simple lvalue - use simple atomic store.
For bitfields, vector element, extended vector elements - the original value of the whole storage (for vector elements) or of some aligned value (for bitfields) is atomically read, the part of this value for the given lvalue is modified and then use atomic compare-and-exchange operation to try to atomically write modified value (if it was not modified).
Also, changes in this patch fix the bug for '#pragma omp atomic read' applied to extended vector elements.
Differential Revision: http://reviews.llvm.org/D7369
llvm-svn: 230736
The /volatile:ms semantics turn volatile loads and stores into atomic
acquire and release operations. This distinction is important because
volatile memory operations do not form a happens-before relationship
with non-atomic memory. This means that a volatile store is not
sufficient for implementing a mutex unlock routine.
Differential Revision: http://reviews.llvm.org/D7580
llvm-svn: 229082
This patch emits the following code for the single directive:
#pragma omp single
<body>
<---->
if(__kmpc_single(...)) {
<body>
__kmpc_end_single(...);
}
Differential Revision: http://reviews.llvm.org/D7045
llvm-svn: 228275
For 'taskyield' directive emit call to kmp_int32 __kmpc_omp_taskyield(ident_t *,
kmp_int32 global_tid, int end_part); runtime function call with end_part arg set
to 0 (it is ignored).
Differential Revision: http://reviews.llvm.org/D7047
llvm-svn: 228272
distinction between the different use-cases. With the previous default
behavior we would occasionally emit empty debug locations in situations
where they actually were strictly required (= on invoke insns).
We now have a choice between defaulting to an empty location or an
artificial location.
Specifically, this fixes a bug caused by a missing debug location when
emitting C++ EH cleanup blocks from within an artificial function, such as
an ObjC destroy helper function.
rdar://problem/19670595
llvm-svn: 228003
"omp atomic read [seq_cst]" accepts expressions "v=x;". In this patch we perform
an atomic load of "x" (using builtin atomic loading instructions or a call to
"atomic_load()" for simple lvalues and "kmpc_atomic_start();load
<x>;kmpc_atomic_end();" for other lvalues), convert the result of loading to
type of "v" (using EmitScalarConversion() for simple types and
EmitComplexToScalarConversion() for conversions from complex to scalar) and then
store the result in "v".)
Differential Revision: http://reviews.llvm.org/D6431
llvm-svn: 226788
"omp atomic read [seq_cst]" accepts expressions "v=x;". In this patch we perform
an atomic load of "x" (using builtin atomic loading instructions or a call to
"atomic_load()" for simple lvalues and "kmpc_atomic_start();load
<x>;kmpc_atomic_end();" for other lvalues), convert the result of loading to
type of "v" (using EmitScalarConversion() for simple types and
EmitComplexToScalarConversion() for conversions from complex to scalar) and then
store the result in "v".)
Differential Revision: http://reviews.llvm.org/D6431
llvm-svn: 226786
"omp atomic read [seq_cst]" accepts expressions "v=x;". In this patch we perform
an atomic load of "x" (using builtin atomic loading instructions or a call to
"atomic_load()" for simple lvalues and "kmpc_atomic_start();load
<x>;kmpc_atomic_end();" for other lvalues), convert the result of loading to
type of "v" (using EmitScalarConversion() for simple types and
EmitComplexToScalarConversion() for conversions from complex to scalar) and then
store the result in "v".
Differential Revision: http://reviews.llvm.org/D6431
llvm-svn: 226784
Sorry for the noise, I managed to miss a bunch of recent regressions of
include orderings here. This should actually sort all the includes for
Clang. Again, no functionality changed, this is just a mechanical
cleanup that I try to run periodically to keep the #include lines as
regular as possible across the project.
llvm-svn: 225979
Several pieces of code were relying on implicit debug location setting
which usually lead to incorrect line information anyway. So I've fixed
those (in r225955 and r225845) separately which should pave the way for
this commit to be cleanly reapplied.
The reason these implicit dependencies resulted in crashes with this
patch is that the debug location would no longer implicitly leak from
one place to another, but be set back to invalid. Once a call with
no/invalid location was emitted, if that call was ever inlined it could
produce invalid debugloc chains and assert during LLVM's codegen.
There may be further cases of such bugs in this patch - they're hard to
flush out with regression testing, so I'll keep an eye out for reports
and investigate/fix them ASAP if they come up.
Original commit message:
Reapply "DebugInfo: Generalize debug info location handling"
Originally committed in r224385 and reverted in r224441 due to concerns
this change might've introduced a crash. Turns out this change fixes the
crash introduced by one of my earlier more specific location handling
changes (those specific fixes are reverted by this patch, in favor of
the more general solution).
Recommitted in r224941 and reverted in r224970 after it caused a crash
when building compiler-rt. Looks to be due to this change zeroing out
the debug location when emitting default arguments (which were meant to
inherit their outer expression's location) thus creating call
instructions without locations - these create problems for inlining and
must not be created. That is fixed and tested in this version of the
change.
Original commit message:
This is a more scalable (fixed in mostly one place, rather than many
places that will need constant improvement/maintenance) solution to
several commits I've made recently to increase source fidelity for
subexpressions.
This resetting had to be done at the DebugLoc level (not the
SourceLocation level) to preserve scoping information (if the resetting
was done with CGDebugInfo::EmitLocation, it would've caused the tail end
of an expression's codegen to end up in a potentially different scope
than the start, even though it was at the same source location). The
drawback to this is that it might leave CGDebugInfo out of sync. Ideally
CGDebugInfo shouldn't have a duplicate sense of the current
SourceLocation, but for now it seems it does... - I don't think I'm
going to tackle removing that just now.
I expect this'll probably cause some more buildbot fallout & I'll
investigate that as it comes up.
Also these sort of improvements might be starting to show a weakness/bug
in LLVM's line table handling: we don't correctly emit is_stmt for
statements, we just put it on every line table entry. This means one
statement split over multiple lines appears as multiple 'statements' and
two statements on one line (without column info) are treated as one
statement.
I don't think we have any IR representation of statements that would
help us distinguish these cases and identify the beginning of each
statement - so that might be something we need to add (possibly to the
lexical scope chain - a scope for each statement). This does cause some
problems for GDB and possibly other DWARF consumers.
llvm-svn: 225956
This reverts commit r225000, r225021, r225083, r225086, r225090.
The root change (r225000) still has several issues where it's caused
calls to be emitted without debug locations. This causes assertion
failures if/when those calls are inlined.
I'll work up some test cases and fixes before recommitting this.
llvm-svn: 225555
Originally committed in r224385 and reverted in r224441 due to concerns
this change might've introduced a crash. Turns out this change fixes the
crash introduced by one of my earlier more specific location handling
changes (those specific fixes are reverted by this patch, in favor of
the more general solution).
Recommitted in r224941 and reverted in r224970 after it caused a crash
when building compiler-rt. Looks to be due to this change zeroing out
the debug location when emitting default arguments (which were meant to
inherit their outer expression's location) thus creating call
instructions without locations - these create problems for inlining and
must not be created. That is fixed and tested in this version of the
change.
Original commit message:
This is a more scalable (fixed in mostly one place, rather than many
places that will need constant improvement/maintenance) solution to
several commits I've made recently to increase source fidelity for
subexpressions.
This resetting had to be done at the DebugLoc level (not the
SourceLocation level) to preserve scoping information (if the resetting
was done with CGDebugInfo::EmitLocation, it would've caused the tail end
of an expression's codegen to end up in a potentially different scope
than the start, even though it was at the same source location). The
drawback to this is that it might leave CGDebugInfo out of sync. Ideally
CGDebugInfo shouldn't have a duplicate sense of the current
SourceLocation, but for now it seems it does... - I don't think I'm
going to tackle removing that just now.
I expect this'll probably cause some more buildbot fallout & I'll
investigate that as it comes up.
Also these sort of improvements might be starting to show a weakness/bug
in LLVM's line table handling: we don't correctly emit is_stmt for
statements, we just put it on every line table entry. This means one
statement split over multiple lines appears as multiple 'statements' and
two statements on one line (without column info) are treated as one
statement.
I don't think we have any IR representation of statements that would
help us distinguish these cases and identify the beginning of each
statement - so that might be something we need to add (possibly to the
lexical scope chain - a scope for each statement). This does cause some
problems for GDB and possibly other DWARF consumers.
llvm-svn: 225000
Originally committed in r224385 and reverted in r224441 due to concerns
this change might've introduced a crash. Turns out this change fixes the
crash introduced by one of my earlier more specific location handling
changes (those specific fixes are reverted by this patch, in favor of
the more general solution).
Original commit message:
This is a more scalable (fixed in mostly one place, rather than many
places that will need constant improvement/maintenance) solution to
several commits I've made recently to increase source fidelity for
subexpressions.
This resetting had to be done at the DebugLoc level (not the
SourceLocation level) to preserve scoping information (if the resetting
was done with CGDebugInfo::EmitLocation, it would've caused the tail end
of an expression's codegen to end up in a potentially different scope
than the start, even though it was at the same source location). The
drawback to this is that it might leave CGDebugInfo out of sync. Ideally
CGDebugInfo shouldn't have a duplicate sense of the current
SourceLocation, but for now it seems it does... - I don't think I'm
going to tackle removing that just now.
I expect this'll probably cause some more buildbot fallout & I'll
investigate that as it comes up.
Also these sort of improvements might be starting to show a weakness/bug
in LLVM's line table handling: we don't correctly emit is_stmt for
statements, we just put it on every line table entry. This means one
statement split over multiple lines appears as multiple 'statements' and
two statements on one line (without column info) are treated as one
statement.
I don't think we have any IR representation of statements that would
help us distinguish these cases and identify the beginning of each
statement - so that might be something we need to add (possibly to the
lexical scope chain - a scope for each statement). This does cause some
problems for GDB and possibly other DWARF consumers.
llvm-svn: 224941
This is a more scalable (fixed in mostly one place, rather than many
places that will need constant improvement/maintenance) solution to
several commits I've made recently to increase source fidelity for
subexpressions.
This resetting had to be done at the DebugLoc level (not the
SourceLocation level) to preserve scoping information (if the resetting
was done with CGDebugInfo::EmitLocation, it would've caused the tail end
of an expression's codegen to end up in a potentially different scope
than the start, even though it was at the same source location). The
drawback to this is that it might leave CGDebugInfo out of sync. Ideally
CGDebugInfo shouldn't have a duplicate sense of the current
SourceLocation, but for now it seems it does... - I don't think I'm
going to tackle removing that just now.
I expect this'll probably cause some more buildbot fallout & I'll
investigate that as it comes up.
Also these sort of improvements might be starting to show a weakness/bug
in LLVM's line table handling: we don't correctly emit is_stmt for
statements, we just put it on every line table entry. This means one
statement split over multiple lines appears as multiple 'statements' and
two statements on one line (without column info) are treated as one
statement.
I don't think we have any IR representation of statements that would
help us distinguish these cases and identify the beginning of each
statement - so that might be something we need to add (possibly to the
lexical scope chain - a scope for each statement). This does cause some
problems for GDB and possibly other DWARF consumers.
llvm-svn: 224385
Currently, if global variable is marked as a private OpenMP variable, the compiler crashes in debug version or generates incorrect code in release version. It happens because in the OpenMP region the original global variable is used instead of the generated private copy. It happens because currently globals variables are not captured in the OpenMP region.
This patch adds capturing of global variables iff private copy of the global variable must be used in the OpenMP region.
Differential Revision: http://reviews.llvm.org/D6259
llvm-svn: 224323
the simplest case, which is used when no chunk_size is specified in
the schedule(static) or no 'schedule' clause is specified - the
iteration space is divided by the library into chunks that are
approximately equal in size, and at most one chunk is distributed
to each thread. In this case, we do not need an outer loop in each
thread - each thread requests once which iterations range it should
handle (using __kmpc_for_static_init runtime call) and then runs the
inner loop on this range.
Differential Revision: http://reviews.llvm.org/D5865
llvm-svn: 224233
Adds generation of call to "i32 kmpc_cancel_barrier(ident_t *, i32)" libcall for explicitly specified barriers (OMP_IDENT_BARRIER_EXPL flag is added to "flags" field of "ident_t" structure).
Also this patch replaces all calls to "kmpc_barrier" function by calls of "__kmpc_cancel_barrier" function which provides additional functionality for OpenMP 4.0.
Also, library specific enum OpenMPLocationFlags moved to private section of CGOpenMPRuntime class to make it more independent from library implementation.
Differential Revision: http://reviews.llvm.org/D6447
llvm-svn: 223444
For each "omp flush" directive a call to "void kmpc_flush(ident_t *, ...)" function is generated.
Directive "omp flush" may have an associated list of variables to flush, but currently runtime function ignores them. So the patch generates just "call kmpc_flush(ident_t *<loc>, i32 0)".
Differential Revision: http://reviews.llvm.org/D6292
llvm-svn: 222409
This patch generates some helper variables which used as a private copies of the corresponding original variables inside an OpenMP 'parallel' directive. These generated variables are initialized by default (with the default constructor, if any). In outlined function references to original variables are replaced by the references to these private helper variables. At the end of the initialization of the private variables and implicit barier is set by calling __kmpc_barrier(...) runtime function to be sure that all threads were initialized using original values of the variables.
Differential Revision: http://reviews.llvm.org/D4752
llvm-svn: 220262
This patch generates call to "kmpc_push_num_threads(ident_t *loc, kmp_int32 global_tid, kmp_int32 num_threads);" library function before calling "kmpc_fork_call" each time there is an associated "num_threads" clause in the "omp parallel" directive.
Differential Revision: http://reviews.llvm.org/D5145
llvm-svn: 219599
Adds codegen for 'if' clause. Currently only for 'if' clause used with the 'parallel' directive.
If condition evaluates to true, the code executes parallel version of the code by calling __kmpc_fork_call(loc, 1, microtask, captured_struct/*context*/), where loc - debug location, 1 - number of additional parameters after "microtask" argument, microtask - is outlined finction for the code associated with the 'parallel' directive, captured_struct - list of variables captured in this outlined function.
If condition evaluates to false, the code executes serial version of the code by executing the following code:
global_thread_id.addr = alloca i32
store i32 global_thread_id, global_thread_id.addr
zero.addr = alloca i32
store i32 0, zero.addr
kmpc_serialized_parallel(loc, global_thread_id);
microtask(global_thread_id.addr, zero.addr, captured_struct/*context*/);
kmpc_end_serialized_parallel(loc, global_thread_id);
Where loc - debug location, global_thread_id - global thread id, returned by __kmpc_global_thread_num() call or passed as a first parameter in microtask() call, global_thread_id.addr - address of the variable, where stored global_thread_id value, zero.addr - implicit bound thread id (should be set to 0 for serial call), microtask() and captured_struct are the same as in parallel call.
Also this patch checks if the condition is constant and if it is constant it evaluates its value and then generates either parallel version of the code (if the condition evaluates to true), or the serial version of the code (if the condition evaluates to false).
Differential Revision: http://reviews.llvm.org/D4716
llvm-svn: 219597
Moved CGOpenMPRegionInfo from CGOpenMPRuntime.h to CGOpenMPRuntime.cpp file and reworked the code for this change. Also added processing of ThreadID variable passed as an argument in outlined functions in parallel and task directives.
llvm-svn: 219490
This patch makes class OMPPrivateScope a common class for all private variables. Reworked processing of firstprivate variables (now it is based on OMPPrivateScope too).
llvm-svn: 219486