The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.
In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D113623
WG14 adopted the _ExtInt feature from Clang for C23, but renamed the
type to be _BitInt. This patch does the vast majority of the work to
rename _ExtInt to _BitInt, which accounts for most of its size. The new
type is exposed in older C modes and all C++ modes as a conforming
extension. However, there are functional changes worth calling out:
* Deprecates _ExtInt with a fix-it to help users migrate to _BitInt.
* Updates the mangling for the type.
* Updates the documentation and adds a release note to warn users what
is going on.
* Adds new diagnostics for use of _BitInt to call out when it's used as
a Clang extension or as a pre-C23 compatibility concern.
* Adds new tests for the new diagnostic behaviors.
I want to call out the ABI break specifically. We do not believe that
this break will cause a significant imposition for early adopters of
the feature, and so this is being done as a full break. If it turns out
there are critical uses where recompilation is not an option for some
reason, we can consider using ABI tags to ease the transition.
Need to do the analysis of the captured expressions in the clauses.
Previously the compiler ignored them and it may lead to a compiler crash
trying to get the address of the mapped variables.
Differential Revision: https://reviews.llvm.org/D114546
Currently the last value of linear is calculated as var = init + num_iters * step.
Replaced it with var = var_priv, i.e. original variable gets the value
of the last private copy.
Differential Revision: https://reviews.llvm.org/D105151
Need to postpone anlysis of the ranged for loops till the actual
instantiation to avoid erroneous emission of error messages.
Differential Revision: https://reviews.llvm.org/D114560
Need to postpone analysis for addressable lvalue in a depend clause with
iterators, otherwise the incorrect error message is emitted.
Differential Revision: https://reviews.llvm.org/D114653
Fix for the case when there are no instructions in the entry basic block before the call
to `createSections`
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D114143
Currently variables appearing inside private/firstprivate/lastprivate
clause of openmp task construct are not visible inside lldb debugger.
This is because compiler does not generate debug info for it.
Please consider the testcase debug_private.c attached with patch.
```
28 #pragma omp task shared(res) private(priv1, priv2) firstprivate(fpriv)
29 {
30 priv1 = n;
31 priv2 = n + 2;
32 printf("Task n=%d,priv1=%d,priv2=%d,fpriv=%d\n",n,priv1,priv2,fpriv);
33
-> 34 res = priv1 + priv2 + fpriv + foo(n - 1);
35 }
36 #pragma omp taskwait
37 return res;
(lldb) p priv1
error: <user expression 0>:1:1: use of undeclared identifier 'priv1'
priv1
^
(lldb) p priv2
error: <user expression 1>:1:1: use of undeclared identifier 'priv2'
priv2
^
(lldb) p fpriv
error: <user expression 2>:1:1: use of undeclared identifier 'fpriv'
fpriv
^
```
After the current patch, lldb is able to show the variables
```
(lldb) p priv1
(int) $0 = 10
(lldb) p priv2
(int) $1 = 12
(lldb) p fpriv
(int) $2 = 14
```
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D114504
Eachempati.
This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D113540
After the changes introduced by D106799 it is possible to tag
outlined function with both AlwaysInline and NoInline attributes using
-fno-inline command line options.
This issue is similiar to D107649.
Differential Revision: https://reviews.llvm.org/D112645
Extension of D112504. Lower amdgpu printf to `__llvm_omp_vprintf`
which takes the same const char*, void* arguments as cuda vprintf and also
passes the size of the void* alloca which will be needed by a non-stub
implementation of `__llvm_omp_vprintf` for amdgpu.
This removes the amdgpu link error on any printf in a target region in favour
of silently compiling code that doesn't print anything to stdout.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D112680
at the start of the entry block, which in turn would aid better code transformation/optimization.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D110257
This patch removes the assumption propagation that was added in D110655
primarily to get assumption informatino on opaque call sites for
optimizations. The analysis done in D111445 allows us to do this more
intelligently in the back-end.
Depends on D111445
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111463
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.
Reviewed By: jdoerfert, JonChesterfield, tianshilei1992
Differential Revision: https://reviews.llvm.org/D113421
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[Clang/Test]: Rename enable_noundef_analysis to disable-noundef-analysis and turn it off by default (2)
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
Resolve lit failures in clang after 8ca4b3e's land
Fix lit test failures in clang-ppc* and clang-x64-windows-msvc
Fix missing failures in clang-ppc64be* and retry fixing clang-x64-windows-msvc
Fix internal_clone(aarch64) inline assembly
Turning on `enable_noundef_analysis` flag allows better codegen by removing freeze instructions.
I modified clang by renaming `enable_noundef_analysis` flag to `disable-noundef-analysis` and turning it off by default.
Test updates are made as a separate patch: D108453
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D105169
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.
Reviewed By: #libc, ldionne, mehdi_amini
Differential Revision: https://reviews.llvm.org/D113186
There is no need to check for deferred diag when device compilation or target is
not given. This results in considerable build time improvement in some cases.
Differential Revision: https://reviews.llvm.org/D109175
Adds initial parsing and sema for the 'append_args' clause.
Note that an AST clause is not created as it instead adds its values
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D111854
Based on post-commit review discussion on
2bd8493847 with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
This was originally committed in 277623f4d5
Reverted in f9ad1d1c77 due to breakages
outside of clang - lldb seems to have some strange/strong dependence on
"char [N]" versus "char[N]" when printing strings (not due to that name
appearing in DWARF, but probably due to using clang to stringify type
names) that'll need to be addressed, plus a few other odds and ends in
other subprojects (clang-tools-extra, compiler-rt, etc).
This patch updates test files after D105169.
Autogenerated test codes are changed by `utils/update_cc_test_checks.py,` and non-autogenerated test codes are changed as follows:
(1) I wrote a python script that (partially) updates the tests using regex: {F18594904} The script is not perfect, but I believe it gives hints about which patterns are updated to have `noundef` attached.
(2) The remaining tests are updated manually.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108453
This was committed as ec6c847179, but then reverted after a failure
in: https://lab.llvm.org/buildbot/#/builders/84/builds/13983
I was not able to reproduce the problem, but I added an extra check
for a NULL QualType just in case.
Original comit message:
The patch adds missing diagnostics for cases like:
float F3 = ((__float128)F1 * (__float128)F2) / 2.0f;
Sema::checkDeviceDecl (renamed to checkTypeSupport) is changed to work
with a type without the corresponding ValueDecl. It is also refactored
so that host diagnostics for unsupported types can be added here as
well.
Differential Revision: https://reviews.llvm.org/D109315
Looks like lldb has some issues with this - somehow it causes lldb to
treat a "char[N]" type as an array of chars (prints them out
individually) but a "char [N]" is printed as a string. (even though the
DWARF doesn't have this string in it - it's something to do with the
string lldb generates for itself using clang)
This reverts commit 277623f4d5.
Based on post-commit review discussion on
2bd8493847 with Richard Smith.
Other uses of forcing HasEmptyPlaceHolder to false seem OK to me -
they're all around pointer/reference types where the pointer/reference
token will appear at the rightmost side of the left side of the type
name, so they make nested types (eg: the "int" in "int *") behave as
though there is a non-empty placeholder (because the "*" is essentially
the placeholder as far as the "int" is concerned).
Adds initial parsing and sema for the 'adjust_args' clause.
Note that an AST clause is not created as it instead adds its expressions
to the OMPDeclareVariantAttr.
Differential Revision: https://reviews.llvm.org/D99905
Sequel patch to https://reviews.llvm.org/D111293.
Remove call to CodeGenFunction::InitTempAlloca() from OpenMP related
codegen part.
Also remove the metadata `!llvm.access.group` from the updated lit
tests.
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D111316
This patch adds support for the
`__kmpc_get_hardware_num_threads_in_block` function that returns the
number of threads. This was missing in the new runtime and was used by
the AMDGPU plugin which prevented it from using the new runtime. This
patchs also unified the interface for getting the thread numbers in the
frontend.
Originally authored by jdoerfert.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D111475
This patch adds two flags to be supported for the new runtime. The flags
are `-fopenmp-assume-threads-oversubscription` and
-fopenmp-assume-teams-oversubscription`. These add global values that
can be checked by the work sharing runtime functions to make better
judgements about how to distribute work between the threads.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D111348
adjustMemberOfForLambdaCaptures.
The problem is happening when user passes lambda function with reference
type in the map clause.
The natural of the problem when processing generateInfoForCapture,
the BasePointer is generated with new load for a lambda variable with
reference type. It is not expected in adjustMemberOfForLambdaCaptures.
One way to fix this is to skipping call to generateInfoForCapture for
map(to:lambda). The map info will be generated later in the call to
generateDefaultMapInfo samiler as firsprivate clase.
This to fix https://bugs.llvm.org/show_bug.cgi?id=52071
Differential Revision:https://reviews.llvm.org/D111115
Clang would reject
#pragma omp for
#pragma omp tile sizes(P)
for (int i = 0; i < 128; ++i) {}
where P is a template parameter, but the loop itself is not
template-dependent. Because P context-dependent, the TransformedStmt
cannot be generated and therefore is nullptr (until the template is
instantiated by TreeTransform). The OMPForDirective would still expect
the a loop is the dependent context and trigger an error.
Fix by introducing a NumGeneratedLoops field to OMPLoopTransformation.
This is used to distinguish the case where no TransformedStmt will be
generated at all (e.g. #pragma omp unroll full) and template
instantiation is needed. In the latter case, delay resolving the
iteration space like when the for-loop itself is template-dependent
until the template instatiation.
A more radical solution would always delay the iteration space analysis
until template instantiation, but would also break many test cases.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111124
by Raul Penacoba.
The size of kmp_depend_info and the number of dependencies are computed multiplying the iterator sizes, which not right.
Now size is computed as:
itersize1*numclausedeps1 + itersize2*numclausedeps2 + ... + itersizeN*numclausedepsN
where itersizeX is the size of the iterator and numclausedepsX the number of dependencies in that depend clause.
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D111045
This patch adds OpenMP assumption attributes to call sites in applicable
regions. Currently this applies the caller's assumption attributes to
any calls contained within it. So, if a call occurs inside an OpenMP
assumes region to a function outside that region, we will assume that
call respects the assumptions. This is primarily useful for inline
assembly calls used heavily in the OpenMP GPU device runtime, which
allows us to then make judgements about what the ASM will do.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110655
This patch adds a new RTL function for worksharing. Currently we use
`__kmpc_for_static_init` for both the `distribute` and `parallel`
portion of the loop clause. This patch replaces the `distribute` portion
with a new runtime call `__kmpc_distribute_static_init`. Currently this
will be used exactly the same way, but will make it easier in the future
to fine-tune the distribute and parallel portion of the loop.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110429
This is a follow-up of D110029, which uses bitset to indicate execution mode. This patches makes the changes in the function call.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D110279