.. _omp112:

Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
=====================================================================================================

This missed remark indicates that a globalized value was found on the target
device that was replaced with neither stack memory by :ref:`OMP110 <omp110>`
nor shared memory by :ref:`OMP111 <omp111>`. Globalization that has not been
removed must be handled by the runtime and will significantly degrade
performance.

The OpenMP standard requires that threads be able to share their data with
each other. However, this is not the case by default when offloading to a
target device such as a GPU. Threads on a GPU keep their local variables in
registers or thread-private memory, so they cannot share that data unless it
is first placed in global or shared memory. In order to create
standards-compliant code, the Clang compiler will globalize any variables that
could potentially be shared between the threads. In the majority of cases,
globalized variables can either be returned to a thread-local stack or pushed
to shared memory. However, in a few cases the globalization cannot be removed,
and the resulting runtime allocation causes a performance penalty.

Examples
--------

This example shows legitimate data sharing on the device. It is a convoluted
example, but it is fully compliant with the OpenMP standard. Without
globalization, the other threads would not be able to read thread 1's copy of
``x`` through ``p``, so the program could produce different results on
different target devices.

.. code-block:: c++

   #include <omp.h>
   #include <cstdio>

   #pragma omp declare target
   static int *p;
   #pragma omp end declare target

   void foo() {
     int x = omp_get_thread_num();
     if (omp_get_thread_num() == 1)
       p = &x;

     // Wait until thread 1 has published the address of its x.
   #pragma omp barrier

     printf("Thread %d: %d\n", omp_get_thread_num(), *p);

     // Keep thread 1's x alive until every thread has finished reading it.
   #pragma omp barrier
   }

   int main() {
   #pragma omp target parallel
     foo();
   }

.. code-block:: console

   $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
   omp112.cpp:9:7: remark: Found thread data sharing on the GPU. Expect degraded performance
   due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
     int x = omp_get_thread_num();
         ^
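If the sharing in this example is intentional, one option is to keep the shared value in storage that is already global on the device instead of pointing other threads at a local variable. The following is a minimal sketch under stated assumptions, not a documented fix: ``published`` and ``exchange`` are hypothetical names, and we expect, but have not verified, that Clang reports no OMP112 remark for it because no local variable has its address taken.

.. code-block:: c++

   // Hypothetical device-global slot that holds the value thread 1 shares.
   // Variables in a declare target region live in global device memory, so
   // every thread can read them without any local variable being globalized.
   #pragma omp declare target
   static int published;
   #pragma omp end declare target

   // Hypothetical helper, meant to be called as exchange(omp_get_thread_num())
   // by every thread of a parallel region. Thread 1 writes the global, the
   // barrier (which implies a flush) makes the write visible, and then every
   // thread reads it back.
   int exchange(int tid) {
     if (tid == 1)
       published = tid;
   #pragma omp barrier
     return published;
   }

Inside ``foo``, each thread would call ``exchange(omp_get_thread_num())`` and print the result, which should be ``1`` for every thread as long as the team has at least two threads. Because ``published`` is a single device-global variable, repeated or concurrent calls would need additional synchronization.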

A less convoluted example of globalization that cannot be removed occurs when
the address of a local variable is passed to a function that is not visible
from the current translation unit. The compiler cannot prove that the callee
does not share the pointer with other threads, so the variable stays
globalized.

.. code-block:: c++

   extern void use(int *x);

   void foo() {
     int x;
     use(&x);  // The address of x escapes to a function defined elsewhere.
   }

   int main() {
   #pragma omp target parallel
     foo();
   }

.. code-block:: console

   $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O1 -Rpass-missed=openmp-opt omp112.cpp
   omp112.cpp:4:7: remark: Found thread data sharing on the GPU. Expect degraded performance
   due to data globalization. [OMP112] [-Rpass-missed=openmp-opt]
     int x;
         ^
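When the external function is known not to keep the pointer after it returns, that promise can be stated on its declaration with Clang's ``noescape`` parameter attribute, which may allow the optimizer to move ``x`` back to the thread's stack. This is a sketch assuming ``use`` really does not retain the pointer (the programmer is responsible for keeping that promise); we have not verified that it removes the remark in every configuration. ``foo`` returns ``x`` here only so the sketch can be checked.

.. code-block:: c++

   // The second example, with the external function's pointer parameter marked
   // noescape: use() promises not to keep `x` (or anything derived from it)
   // after it returns, so `x` need not be visible to other threads.
   extern void use(__attribute__((noescape)) int *x);

   int foo() {
     int x = 0;
     use(&x);
     return x;  // Returned only so the sketch's behavior can be checked.
   }

   // Stand-in definition so the sketch links on its own. In the scenario above,
   // use() lives in another translation unit; its declaration must carry the
   // same promise and the real definition must honor it.
   void use(__attribute__((noescape)) int *x) { *x += 1; }

In the offloading program, ``main`` would still call ``foo()`` inside ``#pragma omp target parallel``. The stand-in definition exists only so the sketch links; when the definition is visible in the same translation unit, the compiler can often infer the same property without the attribute.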

Diagnostic Scope
----------------

OpenMP target offloading missed remark.