Summary:
Different devices may in some cases require different code generation schemes in order to implement OpenMP. This is required not only for performance reasons, but also because it may not be possible to have the current (default) implementation working for these devices. E.g. GPU's cannot implement the same scheme a target such as powerpc or x86b would use, in the sense that it does not have the ability to fork threads, instead all the threads are always executing and need to be managed by the implementation.
This patch proposes a reorganization of the code in the OpenMP code generation to pave the way to have specialized implementation of OpenMP support. More than a "real" patch this is more a request for comments in order to understand if what is proposed is acceptable or if there are better/easier ways to do it.
In this patch part of the common OpenMP codegen infrastructure is moved to a new file under a new namespace (CGOpenMPCommon) so it can be shared between the default implementation and the specialized one. When CGOpenMPRuntime is created, an attempt to select a specialized implementation is done.
In the patch a specialization for nvptx targets is done which currently checks if the target is an OpenMP device and trap if it is not.
Let me know comments suggestions you may have.
Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev
Subscribers: Hahnfeld, cfe-commits, fraggamuffin, caomhin, jholewinski
Differential Revision: http://reviews.llvm.org/D16784
llvm-svn: 259977
Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types.
llvm-svn: 259776
Summary:
This patch adds parsing + sema for the target parallel for directive along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16759
llvm-svn: 259654
Summary:
This patch enhances Sema to check for the following restriction:
OpenMP 4.5 [2.17 Nesting of Regions]
If a target, target update, target data, target enter data, or
target exit data construct is encountered during execution of a
target region, the behavior is unspecified.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16758
llvm-svn: 259464
Summary:
This patch enhances Sema to check for the following restriction:
OpenMP 4.5 [2.17 Nesting of Regions]
If a target, target update, target data, target enter data, or
target exit data construct is encountered during execution of a
target region, the behavior is unspecified.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16758
llvm-svn: 259366
Summary:
This patch adds parsing + sema for the target parallel directive and its clauses along with testcases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16553
Rebased to current trunk and updated test cases.
llvm-svn: 258832
Summary:
This patch adds parsing + sema for the defaultmap clause associated with the target directive (among others).
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16527
llvm-svn: 258817
Summary:
Extend support in the map clause SEMA for the expressions supported in the OpenMP 4.5 specification, namely member expressions.
Fix some bugs in the previous implementation of SEMA related with expressions that do not consist of single variable references.
Fix bug in parsing when the expression in the map clause do not start with an identifier: accept any expression in the map clause and check for validity in SEMA instead of just ignoring it.
Reviewers: hfinkel, kkwli0, arpith-jacob, carlo.bertolli, ABataev
Subscribers: cfe-commits, fraggamuffin, caomhin
Differential Revision: http://reviews.llvm.org/D16385
llvm-svn: 258543
Summary:
Accept depend clause on target exit data directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16401
llvm-svn: 258502
If 'sections' directive has only one sub-section, the code for 'single'-based directive was emitted. Removed this codegen, because it causes crashes in different cases.
llvm-svn: 258495
Summary:
Accept depend clause on target enter data directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16400
llvm-svn: 258466
Summary:
Accept depend clause on target directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16375
llvm-svn: 258460
Summary:
Accept nowait clause on target exit data directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16362
llvm-svn: 258459
Summary:
Accept nowait clause on target enter data directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16361
llvm-svn: 258457
Summary:
Allow nowait clause on target directive in sema and add test cases.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16358
llvm-svn: 258441
Summary:
Adds the following restriction in the OpenMP specifications.
OpenMP [2.10.1, Restrictions, p. 97]
At least one map clause must appear on the directive.
Reviewers: ABataev
Differential Revision: http://reviews.llvm.org/D16341
llvm-svn: 258425
OpenMP 4.5 allows to use non-static members of current class in non-static member functions in 'private' clause. Patch adds initial support for privatizing data members.
llvm-svn: 258299
Support for the following OpenMP 4.5 restriction on 'target enter data' and 'target exit data':
- A map-type must be specified in all map clauses.
I have to save 'IsMapTypeImplicit' when parsing a map clause to support this constraint and for more informative error messages. This helps me support the following case:
#pragma omp target enter data map(r) // expected-error {{map type must be specified for '#pragma omp target enter data'}}
and distinguish it from:
#pragma omp target enter data map(tofrom: r) // expected-error {{map type 'tofrom' is not allowed for '#pragma omp target enter data'}}
Patch by Arpith Jacob. Thanks!
llvm-svn: 258179
This commit is a follow-up to r251734, r251476, and r249735, which fixes
a bug where function attributes were not attached to thread local
wrapper functions.
rdar://problem/20828324
llvm-svn: 257865
Fixes processing of declarative directives and standalone executable directives. Declarative directives should not be allowed as an immediate statements and standalone executable directives are allowed to be used in case-stmt constructs.
llvm-svn: 257586
- Allow device ID to be signed.
- Add missing semicolon to some of the CHECK directives.
Thanks to Amjad Aboud for detecting the issue.
llvm-svn: 257065
This patch attempts to fix the regressions identified when the patch was committed initially.
Thanks to Michael Liao for identifying the fix in the offloading metadata generation
related with side effects in evaluation of function arguments.
llvm-svn: 256933
Summary:
In order to offloading work properly two things need to be in place:
- a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library.
- all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device.
This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented.
About offloading descriptor:
The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type:
```
struct __tgt_offload_entry{
void *addr;
char *name;
int64_t size;
};
```
and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping.
The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run.
The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple.
About target codegen:
The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this:
```
!omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6}
!0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4}
!1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5}
!2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6}
!3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0}
!4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3}
!5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1}
!6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2}
```
The fields in each metadata entry are (in sequence):
Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region".
Entry 2) a unique ID of the device where the input source file that contain the target region lives.
Entry 3) a unique ID of the file where the input source file that contain the target region lives.
Entry 4) a mangled name of the function that encloses the target region.
Entries 5) and 6) line and column number where the target region was found.
Entry 7) is the order the entry was emitted.
Entry 2) and 3) are required to distinguish files that have the same function name.
Entry 4) is required to distinguish different instances of the same declaration (usually templated ones)
Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. )
This patch replaces http://reviews.llvm.org/D12306.
Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao
Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits
Differential Revision: http://reviews.llvm.org/D12614
llvm-svn: 256842
OpenMP 4.0-3.1 supports the next format of ‘schedule’ clause: schedule(kind[, chunk_size])
Where kind can be one of ‘static’, ‘dynamic’, ‘guided’, ‘auto’ or ‘runtime’.
OpenMP 4.5 defines the format: schedule([modifier [, modifier]:]kind[, chunk_size])
Modifier can be one of ‘monotonic’, ‘nonmonotonic’ or ‘simd’.
llvm-svn: 256487
OpenMP 4.5 adds 'depend(sink:vec)' in 'ordered' directive for doacross loop synchronization. Patch adds parsing and semantic analysis for this clause.
llvm-svn: 256330
OpenMP 4.5 adds 'depend(sink:vec)' in 'ordered' directive for doacross loop synchronization. Patch adds parsing and semantic analysis for this clause.
llvm-svn: 256238
It resolves clang selfhosting with std::once() for Cygwin.
FIXME: It may be EmulatedTLS-generic also for X86-Android.
FIXME: Pass EmulatedTLS to LLVM CodeGen from Clang with -femulated-tls.
llvm-svn: 256134
#pragma omp parallel needs an implicit barrier that is currently done by an explicit call to __kmpc_barrier. However, the runtime already ensures a barrier in __kmpc_fork_call which currently leads to two barriers per region per thread.
Differential Revision: http://reviews.llvm.org/D15561
llvm-svn: 255992
OpenMP codegen tried to emit the code for its constructs even if it was detected as a dead-code. Added checks to ensure that the code is emitted if the code is not dead.
llvm-svn: 255990
OpenMP 4.5 adds 'depend(source)' clause for 'ordered' directive to support cross-iteration dependence. Patch adds parsing and semantic analysis for this construct.
llvm-svn: 255986
Predetermined data-shared attributes for local variables are now considered as implicit. Also, patch prohibits changin of DSA for static memebers of classes.
llvm-svn: 255229
OpenMP 4.5 adds directives 'taskloop' and 'taskloop simd'. These directives support clause 'num_tasks'. Patch adds parsing/semantic analysis for this clause.
llvm-svn: 255008
OpenMP 4.5 adds 'taksloop' and 'taskloop simd' directives, which have 'grainsize' clause. Patch adds parsing/sema analysis of this clause.
llvm-svn: 254903
OpenMP 4.5 adds 'taskloop' and 'taskloop simd' directives. These directives have new 'nogroup' clause. Patch adds basic parsing/sema support for this clause.
llvm-svn: 254899
Summary:
This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed.
Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available.
Reviewers: hfinkel, rjmccall, ABataev
Subscribers: cfe-commits, carlo.bertolli
Differential Revision: http://reviews.llvm.org/D14940
llvm-svn: 254521
OpenMP 4.5 defines new clause 'priority' for 'task', 'taskloop' and 'taskloop simd' directives. Added parsing and sema analysis for 'priority' clause in 'task' and 'taskloop' directives.
llvm-svn: 254398
According to OpenMP 4.5 the parameter of 'ordered' clause must be greater than or equal to the parameter of 'collapse' clause. Patch adds this rule.
llvm-svn: 254141
Runtime library requires, that codegen for 'depend' clause for 'out' dependency kind must be the same as codegen for 'depend' clause with 'inout' dependency.
llvm-svn: 253866
This is a follow on from a similar LLVM commit: r253511.
Note, this was reviewed (and more details are in) http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151109/312083.html
These intrinsics currently have an explicit alignment argument which is
required to be a constant integer. It represents the alignment of the
source and dest, and so must be the minimum of those.
This change allows source and dest to each have their own alignments
by using the alignment attribute on their arguments. The alignment
argument itself is removed.
The only code change to clang is hidden in CGBuilder.h which now passes
both dest and source alignment to IRBuilder, instead of taking the minimum of
dest and source alignments.
Reviewed by Hal Finkel.
llvm-svn: 253512
Currently, when there is a global register variable in a program that
is bound to an invalid register, clang/llvm prints an error message that
is not very user-friendly.
This commit improves the diagnostic and moves the check that used to be
in the backend to Sema. In addition, it makes changes to error out if
the size of the register doesn't match the declared variable size.
e.g., volatile register int B asm ("rbp");
rdar://problem/23084219
Differential Revision: http://reviews.llvm.org/D13834
llvm-svn: 253405
We used to emit the store prior to branch in the entry block. To make it more
efficient, this commit moves it to the init block. We still mark as initialized
before initializing anything else.
llvm-svn: 252777
With this change, most 'g' options are rejected by CompilerInvocation.
They remain only as Driver options. The new way to request debug info
from cc1 is with "-debug-info-kind={line-tables-only|limited|standalone}"
and "-dwarf-version={2|3|4}". In the absence of a command-line option
to specify Dwarf version, the Toolchain decides it, rather than placing
Toolchain-specific logic in CompilerInvocation.
Also fix a bug in the Windows compatibility argument parsing
in which the "rightmost argument wins" principle failed.
Differential Revision: http://reviews.llvm.org/D13221
llvm-svn: 249655
All global variables that are not enclosed in a declare target region
must be captured in the target region as local variables do. Currently,
there is no support for declare target, so this patch adds support for
capturing all the global variables used in a the target region.
llvm-svn: 249154
This patch implements the outlining for offloading functions for code
annotated with the OpenMP target directive. It uses a temporary naming
of the outlined functions that will have to be updated later on once
target side codegen and registration of offloading libraries is
implemented - the naming needs to be made unique in the produced
library.
llvm-svn: 249148
Description.
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered regions in the order of the loop iterations.
Restrictions.
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region.
An ordered directive with ‘simd’ clause is generated as an outlined function and corresponding function call to prevent this part of code from vectorization later in backend.
llvm-svn: 248772
Parsing and sema analysis for 'simd' clause in 'ordered' directive.
Description
If the simd clause is specified, the ordered regions encountered by any thread will use only a single SIMD lane to execute the ordered
regions in the order of the loop iterations.
Restrictions
An ordered construct with the simd clause is the only OpenMP construct that can appear in the simd region
llvm-svn: 248696
OpenMP 4.1 extends format of '#pragma omp ordered'. It adds 3 additional clauses: 'threads', 'simd' and 'depend'.
If no clause is specified, the ordered construct behaves as if the threads clause had been specified. If the threads clause is specified, the threads in the team executing the loop region execute ordered regions sequentially in the order of the loop iterations.
The loop region to which an ordered region without any clause or with a threads clause binds must have an ordered clause without the parameter specified on the corresponding loop directive.
llvm-svn: 248569
Patch adds emission of additional note for 'if' clauses with name modifiers in case if 'if' clause without name modified was specified or 'if' clause with the same name modifier was specified.
llvm-svn: 247706
Patch improves codegen for OpenMP constructs. If the OpenMP region does not have internal 'cancel' construct, a call to 'void __kmpc_barrier()' runtime function is generated for all implicit/explicit barriers. If the region has inner 'cancel' directive, then
```
if (__kmpc_cancel_barrier())
exit from outer construct;
```
code is generated.
Also, the code for 'canellation point' directive is not generated if parent directive does not have 'cancel' directive.
llvm-svn: 247681
If target supports TLS all threadprivates are generated as TLS. If target does not support TLS, use runtime calls for proper codegen of threadprivate variables.
llvm-svn: 247273
Currently private copies of captured variables have default alignment. Patch makes private variables to have same alignment as original variables.
llvm-svn: 247260
Currently all variables used in OpenMP regions are captured into a record and passed to outlined functions in this record. It may result in some poor performance because of too complex analysis later in optimization passes. Patch makes to emit outlined functions for parallel-based regions with a list of captured variables. It reduces code for 2*n GEPs, stores and loads at least.
Codegen for task-based regions remains unchanged because runtime requires that all captured variables are passed in captured record.
llvm-svn: 247251
Introduce an Address type to bundle a pointer value with an
alignment. Introduce APIs on CGBuilderTy to work with Address
values. Change core APIs on CGF/CGM to traffic in Address where
appropriate. Require alignments to be non-zero. Update a ton
of code to compute and propagate alignment information.
As part of this, I've promoted CGBuiltin's EmitPointerWithAlignment
helper function to CGF and made use of it in a number of places in
the expression emitter.
The end result is that we should now be significantly more correct
when performing operations on objects that are locally known to
be under-aligned. Since alignment is not reliably tracked in the
type system, there are inherent limits to this, but at least we
are no longer confused by standard operations like derived-to-base
conversions and array-to-pointer decay. I've also fixed a large
number of bugs where we were applying the complete-object alignment
to a pointer instead of the non-virtual alignment, although most of
these were hidden by the very conservative approach we took with
member alignment.
Also, because IRGen now reliably asserts on zero alignments, we
should no longer be subject to an absurd but frustrating recurring
bug where an incomplete type would report a zero alignment and then
we'd naively do a alignmentAtOffset on it and emit code using an
alignment equal to the largest power-of-two factor of the offset.
We should also now be emitting much more aggressive alignment
attributes in the presence of over-alignment. In particular,
field access now uses alignmentAtOffset instead of min.
Several times in this patch, I had to change the existing
code-generation pattern in order to more effectively use
the Address APIs. For the most part, this seems to be a strict
improvement, like doing pointer arithmetic with GEPs instead of
ptrtoint. That said, I've tried very hard to not change semantics,
but it is likely that I've failed in a few places, for which I
apologize.
ABIArgInfo now always carries the assumed alignment of indirect and
indirect byval arguments. In order to cut down on what was already
a dauntingly large patch, I changed the code to never set align
attributes in the IR on non-byval indirect arguments. That is,
we still generate code which assumes that indirect arguments have
the given alignment, but we don't express this information to the
backend except where it's semantically required (i.e. on byvals).
This is likely a minor regression for those targets that did provide
this information, but it'll be trivial to add it back in a later
patch.
I partially punted on applying this work to CGBuiltin. Please
do not add more uses of the CreateDefaultAligned{Load,Store}
APIs; they will be going away eventually.
llvm-svn: 246985
Some of instantiation-dependent expressions could cause false diagnostic to be emitted about unsupported atomic constructs. Relaxed rules for detection of incorrect expressions.
llvm-svn: 246853
Fix processing of shared variables with reference types in OpenMP constructs. Previously, if the variable was not marked in one of the private clauses, the reference to this variable was emitted incorrectly and caused an assertion later.
llvm-svn: 246846
Fixed codegen for extended format of 'if' clauses with special 'directive-name-modifier' + ast-print tests for extended format of 'if' clause.
llvm-svn: 246748
OpenMP 4.1 added special 'directive-name-modifier' to the 'if' clause.
Format of 'if' clause is as follows:
```
if([ directive-name-modifier :] scalar-logical-expression)
```
The restriction rules are also changed.
1. If any 'if' clause on the directive includes a 'directive-name-modifier' then all 'if' clauses on the directive must include a 'directive-name-modifier'.
2. At most one 'if' clause without a 'directive-name-modifier' can appear on the directive.
3. At most one 'if' clause with some particular 'directive-name-modifier' can appear on the directive.
'directive-name-modifier' is important for combined directives and allows to separate conditions in 'if' clause for simple sub-directives in combined directive. This 'directive-name-modifier' identifies the sub-directive to which this 'if' clause must be applied.
llvm-svn: 246747
Added codegen for array section in 'depend' clause of 'task' directive. It emits to pointers, one for the begin of array section and another for the end of array section. Size of the section is calculated as (end + 1 - start) * sizeof(basic_element_type).
llvm-svn: 246422
Added codegen for array section in 'depend' clause of 'task' directive. It emits to pointers, one for the begin of array section and another for the end of array section. Size of the section is calculated as (end + 1 - start) * sizeof(basic_element_type).
llvm-svn: 246278
The ABI string only exists to communicate with TargetCodeGenInfo.
Concretely, since we only used "avx*" ABI strings on x86_64 (as AVX
doesn't affect the i386 ABIs), this meant that, when initializing
SimdDefaultAlign, we would ignore AVX/AVX512 on i386, for no good
reason.
Instead, directly check the features. A similar change for
MaxVectorAlign will follow.
Differential Revision: http://reviews.llvm.org/D12390
llvm-svn: 246228
Adds parsing/sema analysis/serialization/deserialization for array sections in OpenMP constructs (introduced in OpenMP 4.0).
Currently it is allowed to use array sections only in OpenMP clauses that accepts list of expressions.
Differential Revision: http://reviews.llvm.org/D10732
llvm-svn: 245937
Add emission of metadata for simd loops in presence of 'simdlen' clause.
If 'simdlen' clause is provided without 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are marked with '!llvm.mem.parallel_loop_access' metadata.
If 'simdlen' clause is provided along with 'safelen' clause, the vectorizer width for the loop is set to value of 'simdlen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata.
If 'safelen' clause is provided without 'simdlen' clause, the vectorizer width for the loop is set to value of 'safelen' clause + all read/write ops in loop are not marked with '!llvm.mem.parallel_loop_access' metadata.
llvm-svn: 245697
Add parsing/sema analysis for 'simdlen' clause in simd directives. Also add check that if both 'safelen' and 'simdlen' clauses are specified, the value of 'simdlen' parameter is less than the value of 'safelen' parameter.
llvm-svn: 245692
According to standard the 'uval' modifier declares the address of the original list item to have an invariant value for all iterations of the associated loop(s). Patch improves codegen for this qualifier by removing usage of the original reference variable and replacing by referenced l-value.
llvm-svn: 245674
Standard allows to use 'uval' and 'ref' modifiers in 'linear' clause for variables with reference types only. Added check for it and modified test.
llvm-svn: 245556
OpenMP 4.1 adds 3 optional modifiers to 'linear' clause.
Format of 'linear' clause has changed to:
```
linear(linear-list[ : linear-step])
```
where linear-list is one of the following
```
list
modifier(list)
```
where modifier is one of the following:
```
ref (C++)
val (C/C++)
uval (C++)
```
Patch adds parsing and sema analysis for these modifiers.
llvm-svn: 245550
Adds libomp.lib for -fopenmp=libomp and libiomp5md.lib for -fopenmp=libiomp5 on Windows
Differential Revision: http://reviews.llvm.org/D11932
llvm-svn: 245414
OpenMP 4.1 allows to use variables with reference types in all private clauses (private, firstprivate, lastprivate, linear etc.). Patch allows to use such variables and fixes codegen for linear variables with reference types.
llvm-svn: 245268
blender uses statements expression in condition of the loop under control of the '#pragma omp parallel for'. This condition is used several times in different expressions required for codegen of the loop directive. If there are some variables defined in statement expression, it fires an assert during codegen because of redefinition of the same variables.
We have to rebuild several expression to be sure that all variables are unique.
llvm-svn: 245041
OpenMP 4.1 allows to use variables with reference types in private clauses and, therefore, in init expressions of the cannonical loop forms.
llvm-svn: 244209
If a global variable is marked as private in OpenMP construct and then is used in of the private clauses of the same construct, it might cause compiler crash because of incorrect capturing.
llvm-svn: 243964
OpenMP 4.1 introduces optional argument '(n)' for 'ordered' clause, where 'n' is a number of loops that immediately follow the directive.
'n' must be constant positive integer expressions and it must be less or equal than the number of the loops in the resulting loop nest.
Patch adds parsing and semantic analysis for this optional argument.
llvm-svn: 243635
If the variable is marked as private in OpenMP construct, the reference to this variable should not keep type qualifiers for the original variable. Private copy is not volatile or constant, so we can use unqualified type for private copy.
llvm-svn: 242133
This patch adds ObjectFilePCHContainerOperations uses the LLVM backend
to put the contents of a PCH into a __clangast section inside a COFF, ELF,
or Mach-O object file container.
This is done to facilitate module debugging by makeing it possible to
store the debug info for the types defined by a module alongside the AST.
rdar://problem/20091852
llvm-svn: 241620
The next code is generated for this construct:
```
if (__kmpc_cancellationpoint(ident_t *loc, kmp_int32 global_tid, kmp_int32 cncl_kind) != 0)
<exit from outer innermost construct>;
```
llvm-svn: 241239
Several tests wouldn't pass when executed on an armv7a_pc_linux triple
due to the non-default arm_aapcs calling convention produced on the
function definitions in the IR output. Account for this with the
application of a little regex.
Patch by Ying Yi.
llvm-svn: 240971
If task directive has associated 'depend' clause then function kmp_int32 __kmpc_omp_task_with_deps ( ident_t *loc_ref, kmp_int32 gtid, kmp_task_t * new_task, kmp_int32 ndeps, kmp_depend_info_t *dep_list,kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) must be called instead of __kmpc_omp_task().
If this directive has associated 'if' clause then also before a call of kmpc_omp_task_begin_if0() a function void __kmpc_omp_wait_deps ( ident_t *loc_ref, kmp_int32 gtid, kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) must be called.
Array sections are not supported yet.
llvm-svn: 240532
Parsing and sema analysis (without support for array sections in arguments) for 'depend' clause (used in 'task' directive, OpenMP 4.0).
llvm-svn: 240409
Currently if the variable is captured in captured region, capture record for this region stores reference to this variable for future use. But we don't need to provide the reference to the original variable if it was explicitly marked as private in the 'private' clause of the OpenMP construct, this variable is replaced by private copy.
Differential Revision: http://reviews.llvm.org/D9550
llvm-svn: 240377
As specified in the SysV AVX512 ABI drafts. It follows the same scheme
as AVX2:
Arguments of type __m512 are split into eight eightbyte chunks.
The least significant one belongs to class SSE and all the others
to class SSEUP.
This also means we change the OpenMP SIMD default alignment on AVX512.
Based on r240337.
Differential Revision: http://reviews.llvm.org/D9894
llvm-svn: 240338
Added parsing, sema analysis and codegen for '#pragma omp taskgroup' directive (OpenMP 4.0).
The code for directive is generated the following way:
#pragma omp taskgroup
<body>
void __kmpc_taskgroup(<loc>, thread_id);
<body>
void __kmpc_end_taskgroup(<loc>, thread_id);
llvm-svn: 240011
Added codegen for combined 'omp for simd' directives, that is a combination of 'omp for' directive followed by 'omp simd' directive. Includes support for all clauses.
llvm-svn: 239990
The following code is generated for reduction clause within 'omp simd' loop construct:
#pragma omp simd reduction(op:var)
for (...)
<body>
alloca priv_var
priv_var = <initial reduction value>;
<loop_start>:
<body> // references to original 'var' are replaced by 'priv_var'
<loop_end>:
var op= priv_var;
llvm-svn: 239881
Previously the last iteration for simd loop-based OpenMP constructs were generated as a separate code. This feature is not required and codegen is simplified.
llvm-svn: 239810
If loop control variable in a worksharing construct is marked as lastprivate, we should copy last calculated value of private counter back to original variable.
llvm-svn: 237879
-fopenmp turns on OpenMP support and links libiomp5 as OpenMP library. Also there is -fopenmp={libiomp5|libgomp} option that allows to override effect of -fopenmp and link libgomp library (if -fopenmp=libgomp is specified).
Differential Revision: http://reviews.llvm.org/D9736
llvm-svn: 237769
Patch fixes codegen for aggregate copying of VLAs. Currently method CodeGenFunction::EmitAggregateCopy() does not support copying of VLAs. Patch checks if the size of the type is 0, then checks if the type is actually a variable-length array. Then it calculates total length for this array and calculates total size of the array in bytes:
<total number of elements in array> * aligned_sizeof(ElementType) (if copy assignment is requested).
If simple copying is requested, size is calculated like:
<total number of elements in array> * aligned_sizeof(ElementType) - aligned_sizeof(ElementType) + sizeof(ElementType).
memcpy() is used with this calculated size of the VLA.
Differential Revision: http://reviews.llvm.org/D9851
llvm-svn: 237768
This modification generates proper copyin/initialization sequences for array variables/parameters. Before they were considered as pointers, not arrays.
llvm-svn: 237691
Internal task structure must be generated like
typedef struct kmp_task {
void * shareds;
kmp_routine_entry_t routine;
kmp_int32 part_id;
kmp_routine_entry_t destructors;
} kmp_task_t;
struct kmp_task_t_with_privates {
kmp_task_t task_data;
.kmp_private. privates;
};
to avoid possible additional alignment bytes in first fields (shareds, routine, part_id and destructors). Runtime library is not aware of such kind additional alignment bytes.
llvm-svn: 237561
'schedule' clause for combined directives requires additional processing. Special helper variable is generated, that is captured in the outlined parallel region for 'parallel for' region. This captured variable is used to store chunk expression from the 'schedule' clause in this 'parallel for' region.
llvm-svn: 237100
Inner bodies of OpenMP worksharing loop-based constructs with dynamic or guided scheduling are allowed to be marked with !llvm.mem.parallel_loop_access metadata for better optimization. Worksharing constructs with static scheduling cannot be marked this way (according to OpenMP standard "A data dependence between the same logical iterations in two such loops is guaranteed").
Constructs with auto and runtime scheduling are also not marked because automatically chosen scheduling may be static also.
Differential Revision: http://reviews.llvm.org/D9518
llvm-svn: 236693
Fixed codegen for reduction operations min, max, && and ||. Codegen for them is quite similar and I was confused by this similarity.
Also added a call to kmpc_end_reduce() in atomic part of reduction codegen (call to kmpc_end_reduce_nowait() is not required).
Differential Revision: http://reviews.llvm.org/D9513
llvm-svn: 236689
For tasks codegen for private/firstprivate variables are different rather than for other directives.
1. Build an internal structure of privates for each private variable:
struct .kmp_privates_t. {
Ty1 var1;
...
Tyn varn;
};
2. Add a new field to kmp_task_t type with list of privates.
struct kmp_task_t {
void * shareds;
kmp_routine_entry_t routine;
kmp_int32 part_id;
kmp_routine_entry_t destructors;
.kmp_privates_t. privates;
};
3. Create a function with destructors calls for all privates after end of task region.
kmp_int32 .omp_task_destructor.(kmp_int32 gtid, kmp_task_t *tt) {
~Destructor(&tt->privates.var1);
...
~Destructor(&tt->privates.varn);
return 0;
}
4. Perform initialization of all firstprivate fields (by simple copying for POD data, copy constructor calls for classes) + provide address of a destructor function after kmpc_omp_task_alloc() and before kmpc_omp_task() calls.
kmp_task_t *new_task = __kmpc_omp_task_alloc(ident_t *, kmp_int32 gtid, kmp_int32 flags, size_t sizeof_kmp_task_t, size_t sizeof_shareds, kmp_routine_entry_t *task_entry);
CopyConstructor(new_task->privates.var1, *new_task->shareds.var1_ref);
new_task->shareds.var1_ref = &new_task->privates.var1;
...
CopyConstructor(new_task->privates.varn, *new_task->shareds.varn_ref);
new_task->shareds.varn_ref = &new_task->privates.varn;
new_task->destructors = .omp_task_destructor.;
kmp_int32 __kmpc_omp_task(ident_t *, kmp_int32 gtid, kmp_task_t *new_task)
Differential Revision: http://reviews.llvm.org/D9370
llvm-svn: 236479
For tasks codegen for private/firstprivate variables are different rather than for other directives.
1. Build an internal structure of privates for each private variable:
struct .kmp_privates_t. {
Ty1 var1;
...
Tyn varn;
};
2. Add a new field to kmp_task_t type with list of privates.
struct kmp_task_t {
void * shareds;
kmp_routine_entry_t routine;
kmp_int32 part_id;
kmp_routine_entry_t destructors;
.kmp_privates_t. privates;
};
3. Create a function with destructors calls for all privates after end of task region.
kmp_int32 .omp_task_destructor.(kmp_int32 gtid, kmp_task_t *tt) {
~Destructor(&tt->privates.var1);
...
~Destructor(&tt->privates.varn);
return 0;
}
4. Perform default initialization of all private fields (no initialization for POD data, default constructor calls for classes) + provide address of a destructor function after kmpc_omp_task_alloc() and before kmpc_omp_task() calls.
kmp_task_t *new_task = __kmpc_omp_task_alloc(ident_t *, kmp_int32 gtid, kmp_int32 flags, size_t sizeof_kmp_task_t, size_t sizeof_shareds, kmp_routine_entry_t *task_entry);
DefaultConstructor(new_task->privates.var1);
new_task->shareds.var1_ref = &new_task->privates.var1;
...
DefaultConstructor(new_task->privates.varn);
new_task->shareds.varn_ref = &new_task->privates.varn;
new_task->destructors = .omp_task_destructor.;
kmp_int32 __kmpc_omp_task(ident_t *, kmp_int32 gtid, kmp_task_t *new_task)
Differential Revision: http://reviews.llvm.org/D9322
llvm-svn: 236207
For proper codegen we need to capture variable in the OpenMP region. In loop-based directives loop control variables are private by default and they must be captured in this region. There was a problem with capturing of globals, used as lcv, as they was not marked as private by default.
Differential Revision: http://reviews.llvm.org/D9336
llvm-svn: 236201
Fixed initialization of 'single' region completion + changed type of the third argument of __kmpc_copyprivate() runtime function to size_t.
llvm-svn: 236198
LLVM r236120 renamed debug info IR constructs to use a `DI` prefix, now
that the `DIDescriptor` hierarchy has been gone for about a week. This
commit was generated using the rename-md-di-nodes.sh upgrade script
attached to PR23080, followed by running clang-format-diff.py on the
`lib/` portion of the patch.
llvm-svn: 236121