forked from OSchip/llvm-project
56 lines
2.9 KiB
ReStructuredText
56 lines
2.9 KiB
ReStructuredText
OpenMP Optimization Remarks
|
|
===========================
|
|
|
|
The :doc:`OpenMP-Aware optimization pass </optimizations/OpenMPOpt>` is able to
|
|
generate compiler remarks for performed and missed optimisations. To emit them,
|
|
pass ``-Rpass=openmp-opt``, ``-Rpass-analysis=openmp-opt``, and
|
|
``-Rpass-missed=openmp-opt`` to the Clang invocation. For more information and
|
|
features of the remark system the clang documentation should be consulted:
|
|
|
|
+ `Clang options to emit optimization reports <https://clang.llvm.org/docs/UsersManual.html#options-to-emit-optimization-reports>`_
|
|
+ `Clang diagnostic and remark flags <https://clang.llvm.org/docs/ClangCommandLineReference.html#diagnostic-flags>`_
|
|
+ The `-foptimization-record-file flag
|
|
<https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-foptimization-record-file>`_
|
|
and the `-fsave-optimization-record flag
|
|
<https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang1-fsave-optimization-record>`_
|
|
|
|
|
|
.. _ompXXX:
|
|
|
|
Some OpenMP remarks start with a "tag", like `[OMP100]`, which indicates that
|
|
there is further information about them on this page. To directly jump to the
|
|
respective entry, navigate to
|
|
`https://openmp.llvm.org/docs/remarks/OptimizationRemarks.html#ompXXX <https://openmp.llvm.org/docs/remarks/OptimizationRemarks.html#ompXXX>`_ where `XXX` is
|
|
the three digit code shown in the tag.
|
|
|
|
|
|
----
|
|
|
|
|
|
.. _omp100:
|
|
.. _omp_no_external_caller_in_target_region:
|
|
|
|
`[OMP100]` Potentially unknown OpenMP target region caller
|
|
----------------------------------------------------------
|
|
|
|
A function remark that indicates the function, when compiled for a GPU, is
|
|
potentially called from outside the translation unit. Note that a remark is
|
|
only issued if we tried to perform an optimization which would require us to
|
|
know all callers on the GPU.
|
|
|
|
To facilitate OpenMP semantics on GPUs we provide a runtime mechanism through
|
|
which the code that makes up the body of a parallel region is shared with the
|
|
threads in the team. Generally we use the address of the outlined parallel
|
|
region to identify the code that needs to be executed. If we know all target
|
|
regions that reach the parallel region we can avoid this function pointer
|
|
passing scheme and often improve the register usage on the GPU. However, If a
|
|
parallel region on the GPU is in a function with external linkage we may not
|
|
know all callers statically. If there are outside callers within target
|
|
regions, this remark is to be ignored. If there are no such callers, users can
|
|
modify the linkage and thereby help optimization with a `static` or
|
|
`__attribute__((internal))` function annotation. If changing the linkage is
|
|
impossible, e.g., because there are outside callers on the host, one can split
|
|
the function into an external visible interface which is not compiled for
|
|
the target and an internal implementation which is compiled for the target
|
|
and should be called from within the target region.
|