Link error occurred when time profiling in libomp is enabled by default
because `libomp` is assumed to be a C library but the dependence on
`libLLVMSupport` for profiling is a C++ library. Currently the issue blocks all
OpenMP tests in Phabricator.
This patch set a new CMake option `OPENMP_ENABLE_LIBOMP_PROFILING` to
enable/disable the feature. By default it is disabled. Note that once time
profiling is enabled for `libomp`, it becomes a C++ library.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95585
When OMP_PLACES contains an invalid value, the warning informs the user
that the fallback is OMP_PLACES=threads, but the actual internal setting
is OMP_PLACES=cores and is detected as such with KMP_SETTINGS=1.
This patch informs the user that OMP_PLACES=cores is being used instead
of OMP_PLACES=threads.
Differential Revision: https://reviews.llvm.org/D95170
This patch adds the new algorithm for topology discovery using cpuid
leaf 1f. Only the new die level is detected and integrated into the
current affinity mechanisms including KMP_AFFINITY (granularity level
and compact/scatter algorithm), OMP_PLACES=dies, and KMP_HW_SUBSET.
Differential Revision: https://reviews.llvm.org/D95157
HWLOC 2.0 has numa nodes as separate children and are not in the main
parent/child topology tree anymore. This change takes this into
account. The main topology detection loop in the create_hwloc_map()
routine starts at a hardware thread within the initial affinity mask and
goes up the topology tree setting the socket/core/thread labels
correctly.
This change also introduces some of the more generic changes that the
future kmp_topology_t structure will take advantage of including a
generic ratio & count array (finding all ratios of topology layers like
threads/core cores/socket and finding all counts of each topology
layer), generic radix1 reduction step, generic uniformity check, and
generic printing of topology (en_US.txt)
Differential Revision: https://reviews.llvm.org/D95156
Problem reported by Joseph Shen <joseph.smeng@gmail.com>.
The patch changes *(&<atomic-var>) to (&<atomic-var>)->load().
Differential Revision: https://reviews.llvm.org/D95485
This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.
Differential Revision: https://reviews.llvm.org/D94985
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
The buckets are initialized in __kmp_dephash_create but when they are extended
the memory is allocated but not NULL'd, potentially leaving some buckets
uninitialized after all entries have been copied into the new allocation.
This commit makes sure the buckets are properly initialized with NULL before
copying the entries.
Differential Revision: https://reviews.llvm.org/D95167
Profiling has been recently implemented in libomptarget (D93055). This patch enables time profiling support for libomptarget in libomp, to support profiling of multi-threaded execution of offloaded regions.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94855
The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use `pthread_create` to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls `__kmpc_fork_call`. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via `pthread_create` on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.
Two environment variables, `LIBOMP_NUM_HIDDEN_HELPER_THREADS` and `LIBOMP_USE_HIDDEN_HELPER_TASK`, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.
Here are some open issues to be discussed:
1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D77609
Hierarchical barrier is an experimental barrier algorithm that uses aspects
of machine hierarchy to define the barrier tree structure. This patch fixes
offset calculation in hierarchical barrier. The offset is used to store info
on a flag about sleeping threads waiting on a location stored in the flag.
This commit also fixes a potential deadlock in hierarchical barrier when
using infinite blocktime by adjusting the offset value of leaf kids so that
it matches the value of leaf state. It also adds testing of default barriers
with infinite blocktime, and also tests hierarchical barrier algorithm with
both default and infinite blocktime.
Patch by Terry Wilmarth and Nawrin Sultana.
Differential Revision: https://reviews.llvm.org/D94241
This change enables volatile use of persistent memory for omp_large_cap_mem*
on supported systems. It depends on libmemkind's support for persistent memory,
and requirements/details can be found at the following url.
https://pmem.io/2020/01/20/memkind-dax-kmem.html
Differential Revision: https://reviews.llvm.org/D94353
Fugaku supercomputer is built with the Fujitsu A64FX microprocessor, whose cache line is 256. In current libomp, we only have cache line size 128 for PPC64 and otherwise 64. This patch added the support of cache line 256 for A64FX. It's worth noting that although A64FX is a variant of AArch64, this property is not shared. As a result, in light of UCX source code (392443ab92/src/ucs/arch/aarch64/cpu.c (L17)), we can only determine by checking whether the CPU is FUJITSU A64FX.
Reviewed By: jdoerfert, Hahnfeld
Differential Revision: https://reviews.llvm.org/D93169
This patch partially prepares the runtime source code to be built with
-Wconversion, which should trigger warnings if any implicit conversions
can possibly change a value. For builds done with icc or gcc, all such
warnings are handled in this patch. clang gives a much longer list of
warnings, particularly for sign conversions, which the other compilers
don't report. The -Wconversion flag is commented into cmake files, but
I'm not going to turn it on. If someone thinks it is important, and wants
to fix all the clang warnings, they are welcome to.
Types of changes made here involve either improving the consistency of types
used so that no conversion is needed, or else performing careful explicit
conversions, when we're sure a problem won't arise.
Patch is a combination of changes by Terry Wilmarth and Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D92942
Introduce new kmp_safe_raii_file_t class with RAII semantics for file
open/close. It is essentially a wrapper around the C-style FILE* object.
This also unifies the way we error report if a file can't be opened.
Differential Revision: https://reviews.llvm.org/D92604
This patch enables serial initialization in the forked child process
to fix unstable runtime behavior when used with Python-based AI tools.
Differential Revision: https://reviews.llvm.org/D93230
This patch introduces a new RTM lock type based on spin lock which is
used for OMP lock with speculative hint on supported architecture.
Differential Revision: https://reviews.llvm.org/D92615
This patch adds new API __kmpc_taskloop_5 to accomadate strict
modifier (introduced in OpenMP 5.1) in num_tasks and grainsize
clause.
Differential Revision: https://reviews.llvm.org/D92352
KMP_AFFINITY=norespect was triggering an error because the underlying
process affinity mask was not updated to include the entire machine.
The Windows documentation states that the thread affinities must be
subsets of the process affinity. This patch also moves the printing
(for KMP_AFFINITY=verbose) of whether the initial mask was respected
out of each topology detection function and to one location where the
initial affinity mask is read.
Differential Revision: https://reviews.llvm.org/D92587
Check pointer returned by strchr, as it can be NULL in case of broken
format of input string. Introduced new function __kmp_str_loc_numbers
for fast parsing of numbers only in the location string.
Also made some cleanup of __kmp_str_loc_init declaration and usage:
- changed type of init_fname parameter to bool;
- changed input from true to false in places where fname is not used.
Differential Revision: https://reviews.llvm.org/D90962
D91692 missed various locations in kmp_gsupport, where the scope for
OMPT_STORE_RETURN_ADDRESS is too narrow, i.e. the scope ends before the OMPT
callback is called in some nested function.
This patch fixes the scoping issue, so that all OMPT tests pass, when the
tests are built with gcc.
Differential Revision: https://reviews.llvm.org/D92121
These changes add support for Intel's umonitor/umwait usage in wait
code, for architectures that support those intrinsic functions. Usage of
umonitor/umwait is off by default, but can be turned on by setting the
KMP_USER_LEVEL_MWAIT environment variable.
Differential Revision: https://reviews.llvm.org/D91189
Added UNLIKELY hint to one-time or rarely executed branches.
This improves performance of the library on some tasking benchmarks.
Differential Revision: https://reviews.llvm.org/D92322
With the change to using shared memory, there were a few problems that need to be fixed.
- The previous filename that was used for SHM only used process id. Given that process is
usually based on 16bit number, this was causing some conflicts on machines. Thus we add
UID to the name to prevent this.
- It appears under some conditions (SIGTERM, etc) the shared memory files were not getting
cleaned up. Added a call to clean up the shm files under those conditions. For this user
needs to set envirable KMP_HANDLE_SIGNALS to true.
Patch by Erdner, Todd <todd.erdner@intel.com>
Differential Revision: https://reviews.llvm.org/D91869
Once __kmp_task_finish is not executed for proxy tasks,
move mutexinoutset dependency code to __kmp_release_deps
which is executed for all task kinds.
Differential Revision: https://reviews.llvm.org/D92326
This is an alternative approach to address inconsistencies pointed out in: D90078
This patch makes sure that the return address is reset, when leaving the scope.
In some cases, I had to move the macro out of an if-statement to have it in the
right scope, in some cases I added an additional block to restrict the scope.
This patch does not handle inconsistencies, which might occur if the return
address is still set when we call into the application.
Test case (repeated_calls.c) provided by @hbae
Differential Revision: https://reviews.llvm.org/D91692
OpenMP 5.1 introduces the new env variable
OMP_TOOL_VERBOSE_INIT=(disabled|stdout|stderr|<filename>) to enable verbose
loading and initialization of OMPT tools.
This env variable helps to understand the cause when loading of a tool fails
(e.g., undefined symbols or dependency not in LD_LIBRARY_PATH)
Output of OMP_TOOL_VERBOSE_INIT is added for OMP_DISPLAY_ENV
Tests for this patch are integrated into the different existing tool loading
tests, making these tests more verbose. An Archer specific verbose test is
integrated into an existing Archer test.
Patch prepared by: Isabel Thärigen
Differential Revision: https://reviews.llvm.org/D91464
Adjusted external reference for Darwin/AARCH64 link compatibility.
Made size directive conditional only if __ELF__ defined.
Patch by Michael_Pique <mpique@icloud.com>
Differential Revision: https://reviews.llvm.org/D88252