This change fixes the implementation of OMP_THREAD_LIMIT. The implementation of
this previously was not restricted to a contention group (but it should be,
according to the spec), and this is fixed here. A field is added to root thread
to store a counter of the threads in the contention group. An extra check is
added when reserving threads for a parallel region that checks this variable and
compares to threadlimit-var, which is implemented as a new global variable,
kmp_cg_max_nth. Associated settings changes were also made, and clean up of
comments that referred to OMP_THREAD_LIMIT, but should refer to the new
KMP_DEVICE_THREAD_LIMIT (added in an earlier patch).
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35912
llvm-svn: 309319
This change drops in KMP_DEVICE_THREAD_LIMIT to replace KMP_MAX_THREADS. It's
possible there will eventually be a OMP_DEVICE_THREAD_LIMIT, and we need
something to distinguish from OMP_THREAD_LIMIT, which is currently implemented
incorrectly (the fix for that will be added soon in a separate patch).
KMP_ALL_THREADS is deprecated here, but we can keep the "all" option on
KMP_DEVICE_THREAD_LIMIT to support that functionality. KMP_DEVICE_THREAD_LIMIT
now has priority over its deprecated rival KMP_ALL_THREADS. I also cleaned up
some comments that incorrectly referred to non-existent kmp_max_threads variable
instead of kmp_max_nth.
I've left the name of where this setting eventually ends up as
__kmp_max_nth, for now.
This change does not change much in the way of functionality. It does NOT change
OMP_THREAD_LIMIT. It's just cleaning up and setting up for that.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35860
llvm-svn: 309168
Introduce OPENMP_ENABLE_LIBOMPTARGET which defaults to OFF at the moment.
libomptarget is not yet ready for prime time:
- Offloading to NVIDIA GPUs is not completed yet (compiler, device RTL)
- The generic ELF plugin for offloading to the host (meant for testing)
uses a single instance of the OpenMP runtime (libomp). That is why
omp_is_initial_device() returns 1 which makes the tests fail.
Because of these reasons, we want to disable building (and testing!)
for release 5.0.
See https://bugs.llvm.org/show_bug.cgi?id=33859
Differential Revision: https://reviews.llvm.org/D35719
llvm-svn: 309115
Removed unused __kmp_env_* variables. Also clangified other people's code.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35808
llvm-svn: 309000
Summary:
The kmp_os.h header is defining the `PAGE_SIZE` macro unconditionally,
even while it is only used directly after its definition, for the
Windows implementation of the `KMP_GET_PAGE_SIZE()` macro.
On at least FreeBSD, but likely all other BSDs too, this macro conflicts
with the one defined in system headers, so remove it, since nothing else
uses it. Make all Unixes use `getpagesize()` instead, and use
`GetSystemInfo()` for the Windows case.
Reviewers: jlpeyton, jcownie, emaste, AndreyChurbanov
Reviewed By: AndreyChurbanov
Subscribers: AndreyChurbanov, hfinkel, zturner
Differential Revision: https://reviews.llvm.org/D35072
llvm-svn: 308355
Summary:
Taskloop implementation is extended by using recursive task scheduling.
Envirable KMP_TASKLOOP_MIN_TASKS added as a manual threshold for the user
to switch from recursive to linear tasks scheduling.
Details:
* The calculations for the loop parameters are moved from __kmp_taskloop_linear
upper level
* Initial calculation is done in the __kmpc_taskloop, further range splitting
is done in the __kmp_taskloop_recur.
* Added threshold to switch from recursive to linear tasks scheduling;
* One half of split range is scheduled as an internal task which just moves
sub-range parameters to the stealing thread that continues recursive
scheduling (if number of tasks still enough), the other half is processed
recursively;
* Internal task duplication routine fixed to assign parent task, that was not
needed when all tasks were scheduled by same thread, but is needed now.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D35273
llvm-svn: 308338
The internal details of this setting are not meant to be user visible and only create confusion.
Differential Revision: https://reviews.llvm.org/D35269
llvm-svn: 308189
Changes are: got all atomics to accept volatile pointers that allowed
to simplify many type conversions. Windows specific code fixed correspondingly.
Differential Revision: https://reviews.llvm.org/D35417
llvm-svn: 308164
The first bit is actually the "untied" flag. That is why the condition was
wrong and has to be inverted to set the flag correctly.
Found and initial patch by Simon Convent!
llvm-svn: 307899
Summary:
On Unix, a .S file is normally an assembly source which must be
preprocessed with a C preprocessor, while a .s file is "plain" assembly.
The former is handled by the compiler driver (cc), the latter is
directly passed to the assembler binary (as).
Because z_Linux_asm.s is supposed to be preprocessed, rename it to .S,
so it can be automatically picked up correctly by build systems.
Reviewers: AndreyChurbanov, emaste, jlpeyton
Reviewed By: AndreyChurbanov
Subscribers: mgorny, openmp-commits
Differential Revision: https://reviews.llvm.org/D35171
llvm-svn: 307680
While importing libomp into the FreeBSD base system we encountered
Clang warnings that "'register' storage class specifier is deprecated
and incompatible with C++1z [-Wdeprecated-register]".
Differential Revision: https://reviews.llvm.org/D35124
llvm-svn: 307441
Changes are: replaced C-style casts with cons_cast and reinterpret_cast;
type of several counters changed to signed; type of parameters of 32-bit and
64-bit AND and OR intrinsics changes to unsigned; changed files formatted
using clang-format version 3.8.1.
Differential Revision: https://reviews.llvm.org/D34759
llvm-svn: 307020
I've found it very difficult to get test/parallel/omp_nested.c to pass
consistently across my build environments. The problem is that it creates N^2
threads (it is testing nested parallel regions), and that often exceeds the
thread limits on systems with many cores. We do raise the process limits in
lit, and that often helps, but if running lit with a smaller number of threads
or on a system where we're otherwise resource constrained, this particular test
tends to fail (because the runtime cannot create a sufficient number of
threads).
This seems to work: if the maximum number of threads is more than some small
number, then cap the number of threads used for the parallel region. The choice
of 4 here is somewhat arbitrary.
Differential Revision: https://reviews.llvm.org/D32033
llvm-svn: 306357
Summary: On BSDs, there is no `libdl.so`, and functions like `dlopen`
are implemented in the main C library instead. Use the `CMAKE_DL_LIBS`
variable instead of hardcoding a dependency on the `dl` library.
Reviewers: grokos, joerg, emaste
Reviewed By: emaste
Subscribers: jlpeyton, mgorny, openmp-commits
Differential Revision: https://reviews.llvm.org/D34632
llvm-svn: 306319
Reset affinity to none (false for proc-bind-var) so that threads in the child
processes are not bound tightly, unless the user explicitly sets this in
KMP_AFFINITY/OMP_PROC_BIND, in child processes. This can improve
performance for scripting languages which fork for parallelism like Python's
multiprocessing module.
Differential Revision: https://reviews.llvm.org/D34154
llvm-svn: 305513
If OpenMP is initialized before fork()-ing occurs and affinity is set to
something like compact, then the master thread will be pinned to a single HW
thread/core after initialization. If the master (or any other thread) then
forks N processes, all N processes will then be pinned to that same single HW
thread/core. To reset the affinity for the new child process, the atfork
handler for the child process can call kmp_set_thread_affinity_mask_initial()
to reset its affinity to the initial affinity of the application before it
re-initializes libomp. The parent process will not be affected and still
keeps its affinity setting.
Differential Revision: https://reviews.llvm.org/D34118
llvm-svn: 305306
Fix static initializers to use the proper unlocked value for the poll
field of the tas and futex locks.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D33794
llvm-svn: 304828
Some code was restructured to move it under KMP_DEBUG. The rest is
formatting changes to fix some things broken by clang-format
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D33744
llvm-svn: 304438
With these settings, the create_hwloc_map() method was being called causing an
assert(). After some consideration, it was determined that disabling affinity
explicitly should just disable hwloc as well. i.e., KMP_AFFINITY overrides
KMP_TOPOLOGY_METHOD. This lets the user know that the Hwloc mechanism is being
ignored when KMP_AFFINITY=disabled.
Differential Revision: https://reviews.llvm.org/D33208
llvm-svn: 304344
This change checks if the initial affinity mask is equal to exactly one
Windows processor group's affinity mask. If it is, then the code does not
respect the initial affinity mask and uses the entire machine instead.
The reasoning behind this is that, by default, Windows assigns exactly one
processor group as the initial affinity mask even when there are multiple
Windows processor groups available. User's typically want to use the whole
machine, so we ignore this special case and use the entire machine.
If the initial affinity mask is a proper subset of one group, or spans multiple
groups, then the initial affinity mask is respected since we can assume that the
operating system did not assign this initial affinity mask. This change only
affects machines with multiple processor groups
Differential Revision: https://reviews.llvm.org/D33210
llvm-svn: 304343
An assert() was being tripped when KMP_AFFINITY=respect + Multiple Processor
Groups. Let __kmp_affinity_create_proc_group_map() function be able to create
address2os object which contains a single group by deleting restriction that
process affinity mask must span multiple groups.
llvm-svn: 303101
This patch contains the clang-format and cleanup of the entire code base. Some
of clang-formats changes made the code look worse in places. A best effort was
made to resolve the bulk of these problems, but many remain. Most of the
problems were mangling line-breaks and tabbing of comments.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D32659
llvm-svn: 302929
This patch chagnes the plugin interface so that:
1) future plugins can take advantage of systems with shared CPU/device storage
2) instead of using base addresses, target regions are launched by providing target addresseds and base offsets explicitly.
Differential revision: https://reviews.llvm.org/D33028
llvm-svn: 302663
Older Hwloc libraries (< 1.10.0) don't offer the HWLOC_OBJ_NUMANODE nor
HWLOC_OBJ_PACKAGE types. Instead they are named HWLOC_OBJ_NODE and
HWLOC_OBJ_SOCKET instead. This patch just defines the newer names based on
the older names when using an older Hwloc.
Differential Revision: https://reviews.llvm.org/D32496
llvm-svn: 301349
Without this fix cancellation status for parallel, sections and for persists
across construct boundaries.
Differential Revision: https://reviews.llvm.org/D31419
llvm-svn: 299434