This patch adds unique idenfitiers to the existing OpenMP remarks. This makes
it easier to identify the corresponding documentation for each remark that will
be hosted in the OpenMP webpage.
Depends on D105898
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105939
This patch rewrites and reworks a few of the existing remarks to make the mmore
concise and consistent prior to writing the documentation for them.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D105898
OpenMP 5.1 added support for writing OpenMP directives using [[]]
syntax in addition to using #pragma and this introduces support for the
new syntax.
In OpenMP, the attributes take one of two forms:
[[omp::directive(...)]] or [[omp::sequence(...)]]. A directive
attribute contains an OpenMP directive clause that is identical to the
analogous #pragma syntax. A sequence attribute can contain either
sequence or directive arguments and is used to ensure that the
attributes are processed sequentially for situations where the order of
the attributes matter (remember:
https://eel.is/c++draft/dcl.attr.grammar#4.sentence-4).
The approach taken here is somewhat novel and deserves mention. We
could refactor much of the OpenMP parsing logic to work for either
pragma annotation tokens or for attribute clauses. It would be a fair
amount of effort to share the logic for both, but it's certainly
doable. However, the semantic attribute system is not designed to
handle the arbitrarily complex arguments that OpenMP directives
contain. Adding support to thread the novel parsed information until we
can produce a semantic attribute would be considerably more effort.
What's more, existing OpenMP constructs are not (often) represented as
semantic attributes. So doing this through Attr.td would be a massive
undertaking that would likely only benefit OpenMP and comes with
additional risks. Rather than walk down that path, I am taking
advantage of the fact that the syntax of the directives within the
directive clause is identical to that of the #pragma form. Once the
parser recognizes that we're processing an OpenMP attribute, it caches
all of the directive argument tokens and then replays them as though
the user wrote a pragma. This reuses the same OpenMP parsing and
semantic logic directly, but does come with a risk if the OpenMP
committee decides to purposefully diverge their pragma and attribute
syntaxes. So, despite this being a novel approach that does token
replay, I think it's actually a better approach than trying to do this
through the declarative syntax in Attr.td.
In the spirit of TRegions [0], this patch analyzes a kernel and tracks
if it can be executed in SPMD-mode. If so, we flip the arguments of
the __kmpc_target_init and deinit call to enable the mode. We also
update the `<kernel>_exec_mode` flag to indicate to the runtime we
changed the mode to SPMD.
The code analysis is done interprocedurally by extending the
AAKernelInfo abstract attribute to track SPMD compatibility as well.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
Differential Revision: https://reviews.llvm.org/D102307
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.
The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.
[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11
This was in parts extracted from D59319.
Reviewed By: ABataev, JonChesterfield
Differential Revision: https://reviews.llvm.org/D101976
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.
Differential Revision: https://reviews.llvm.org/D105562
No need to emit private copyfor firstprivate constants in target
regions, we can use the original copy instead.
Differential Revision: https://reviews.llvm.org/D105647
No need to try to create the default constructor for private copy, it
will be called automatically in the initializer of the declare
reduction. Fixes balance between constructors/destructors calls.
Differential Revision: https://reviews.llvm.org/D105143
This patch adds a module level metadata flag indicating that the module
was compiled with the `-fopenmp` flag. This will make it easier for
passes like OpenMPOpt to determine if it should be run.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102361
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102824
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.
This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
The codegen for simd constructs was affected by the presence (or
absence) of the 'monotonic' schedule modifier for worksharing
loops. The modifier is only intended to apply to the scheduling of
chunks for a thread, not iterations of a loop inside a chunk.
In addition, the monotonic modifier was applied to worksharing loops
by default if no schedule clause was present; the referenced part of
the OpenMP 4.5 spec in the code (section 2.7.1) only applies if the
user specified a schedule clause with a static kind but no modifier.
Without a user-specified schedule clause we should default to
nonmonotonic scheduling.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103793
Need to capture locals in aligned clauses for the combined directives to
be fix the crash in the codegen.
Differential Revision: https://reviews.llvm.org/D104258
Added support for CXXRewrittenBinaryOperator as a condition in ranged
for loops. This is a new kind of expression, need to extend support for
C++20 constructs.
It fixes PR49970: range-based for compilation fails for libstdc++ vector
with -std=c++20.
Differential Revision: https://reviews.llvm.org/D104240
The parser code assumes building with C++ compiler and asserts when using clang (not clang++) on C file. I made the code dependent on input language. This shows up for amdgpu target.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103899
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D99459
Clang will create a global value put in constant memory if an aggregate value
is declared firstprivate in the target device. The symbol name only uses the
name of the firstprivate variable, so symbol name conflicts will occur if the
variable is allowed to have different types through templates. An example of
this behvaiour is shown in https://godbolt.org/z/EsMjYh47n. This patch adds the
mangled type name to the symbol to avoid such naming conflicts. This fixes
https://bugs.llvm.org/show_bug.cgi?id=50642.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D103995
Refactored code of dependence processing and added new inoutset dependence type.
Compiler can set dependence flag to 0x8 when call __kmpc_omp_task_with_deps.
Size of type of the dependence flag changed from 1 to 4 bytes in clang.
All dependence flags library gets so far and corresponding dependence types:
1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET.
Differential Revision: https://reviews.llvm.org/D97085
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.
Differential Revision: https://reviews.llvm.org/D103646
Patch allows using of constexpr vars evaluatable to constant calue to be
used in declare mapper construct.
Differential Revision: https://reviews.llvm.org/D103642
The PreInits of a loop transformation (atm moment only tile) include the computation of the trip count. The trip count is needed by any loop-associated directives that consumes the transformation-generated loop. Hence, we must ensure that the PreInits of consumed loop transformations are emitted with the consuming directive.
This is done by addinging the inner loop transformation's PreInits to the outer loop-directive's PreInits. The outer loop-directive will consume the de-sugared AST such that the inner PreInits are not emitted twice. The PreInits of a loop transformation are still emitted directly if its generated loop(s) are not associated with another loop-associated directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D102180
If a test does not contain an " simd" but -fopenmp-simd RUN lines we can
just check that we do not create __kmpc|__tgt calls.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101973
`clang -fpic -fno-semantic-interposition` may set dso_local on variables for -fpic.
GCC folks consider there are 'address interposition' and 'semantic interposition',
and 'disabling semantic interposition' can optimize function calls but
cannot change variable references to use local aliases
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100483).
This patch removes dso_local for variables in
`clang -fpic -fno-semantic-interposition` mode so that the built shared objects can
work with copy relocations. Building llvm-project tiself with
-fno-semantic-interposition (D102453) should now be safe with trunk Clang.
Example:
```
// a.c
int var;
int *addr() { return var; }
// old: cannot be interposed
movslq .Lvar$local(%rip), %rax
// new: can be interposed
movq var@GOTPCREL(%rip), %rax
movslq (%rax), %rax
```
The local alias lowering for `GlobalVariable`s is kept in case there is a
future option allowing local aliases.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102583
Summary:
Currently, only `OptimizationRemarks` can be emitted using a Function.
Add constructors to allow this for `OptimizationRemarksAnalysis` and
`OptimizationRemarkMissed` as well.
Reviewed By: jdoerfert thegameg
Differential Revision: https://reviews.llvm.org/D102784
Revert recent commit to require x86-registered-target (e4b790c5e3).
Remove -O1 from the run lines so they are less dependent on backend passes.
Update the CHECK6 and CHECK10 lines with script.
Differential Revision: https://reviews.llvm.org/D102720
The original change was reverted because it was discovered
that clang mishandles thunks, and they receive wrong
attributes for their this/return types - the ones for the function
they will call, not the ones they have.
While i have tried to fix this in https://reviews.llvm.org/D100388
that patch has been up and stuck for a month now,
with little signs of progress.
So while it will be good to solve this for real,
for now we can simply avoid introducing the bug,
by not annotating this/return for thunks.
This reverts commit 6270b3a1ea,
relanding 0aa0458f14.
The OpenMP spec seems to require the compound operators be used for
+, *, &, |, and ^ reduction. So use these if a class has those operators.
If not try the simple operators as we did previously to limit the impact
to existing code.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48584
Differential Revision: https://reviews.llvm.org/D101941
Follow the more general patch for now, do not try to SPMDize the kernel
if the variable is used and local.
Differential Revision: https://reviews.llvm.org/D101911
This patch fixes various issues with our prior `declare target` handling
and extends it to support `omp begin declare target` as well.
This started with PR49649 in mind, trying to provide a way for users to
avoid the "ref" global use introduced for globals with internal linkage.
From there it went down the rabbit hole, e.g., all variables, even
`nohost` ones, were emitted into the device code so it was impossible to
determine if "ref" was needed late in the game (based on the name only).
To make it really useful, `begin declare target` was needed as it can
carry the `device_type`. Not emitting variables eagerly had a ripple
effect. Finally, the precedence of the (explicit) declare target list
items needed to be taken into account, that meant we cannot just look
for any declare target attribute to make a decision. This caused the
handling of functions to require fixup as well.
I tried to clean up things while I was at it, e.g., we should not "parse
declarations and defintions" as part of OpenMP parsing, this will always
break at some point. Instead, we keep track what region we are in and
act on definitions and declarations instead, this is what we do for
declare variant and other begin/end directives already.
Highlights:
- new diagnosis for restrictions specificed in the standard,
- delayed emission of globals not mentioned in an explicit
list of a declare target,
- omission of `nohost` globals on the host and `host` globals on the
device,
- no explicit parsing of declarations in-between `omp [begin] declare
variant` and the corresponding end anymore, regular parsing instead,
- precedence for explicit mentions in `declare target` lists over
implicit mentions in the declaration-definition-seq, and
- `omp allocate` declarations will now replace an earlier emitted
global, if necessary.
---
Notes:
The patch is larger than I hoped but it turns out that most changes do
on their own lead to "inconsistent states", which seem less desirable
overall.
After working through this I feel the standard should remove the
explicit declare target forms as the delayed emission is horrible.
That said, while we delay things anyway, it seems to me we check too
often for the current status even though that is often not sufficient to
act upon. There seems to be a lot of duplication that can probably be
trimmed down. Eagerly emitting some things seems pretty weak as an
argument to keep so much logic around.
---
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D101030
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
This patch renames the replace-function-regex to replace-value-regex to indicate that the existing regex replacement functionality can replace any IR value besides functions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101934
This patch refactors a subset of Clang OpenMP tests, generating checklines using the update_cc_test_checks script. This refactoring facilitates updating the Clang OpenMP code generation codebase by automating test generation.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101849
as used.
The problem only happens with constexpr variable, for constexpr variable,
variable is not marked during parser variable. This is because compiler
might find some var's associate expressions may not actully an odr-used
later, the variables get kept in MaybeODRUseExprs, in normal case, at
end of process fullExpr, the variable will be marked during the call to
CleanupVarDeclMarking(). Since we are processing expression of OpenMP
clauses, and the ActOnFinishFullExpr is not getting called that casue
variable is not get marked.
One way to fix this is to call CleanupVarDeclMarking() in EndOpenMPClause
for each omp directive.
This to fix https://bugs.llvm.org/show_bug.cgi?id=50206
Differential Revision: https://reviews.llvm.org/D101781
This patch fixes a bug from D89802. For example, without it, Clang
generates x as the debug map name for both x and y in the following
example:
```
#pragma omp target map(to: x, y)
x = y = 1;
```
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D101564
This patch is child of D89671, contains the clang
implementation to use the OpenMP IRBuilder's section
construct.
Co-author: @anchu-rajendran
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D91054
Need to respect mapping/privatization of declare target variables in the
target regions if explicitly specified by the user.
Differential Revision: https://reviews.llvm.org/D99530
When an object is allocated in a non-default address space we do not
need to check for a constructor if it is not initialized and has a
trivial constructor (which we won't call then).
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D100929
This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions.
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D95976
The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.
Differential Revision: https://reviews.llvm.org/D100673