* Incorrect lock value written in __kmp_test_futex_lock
* Incorrect lock value check in tas/futex lock with USE_LOCK_PROFILE on
Patch by Hansang Bae
llvm-svn: 274053
UNICODE and _UNICODE definitions were added in the LLVM CMake build system.
While on Unices, the UNICODE/_UNICODE macros don't cause problems, on Windows
only ittnotify_static.c should be compiled using -DUNICODE. We are still
looking at a proper fix, but this change sets the build back to exactly what it
was doing before. Also, a comment and TODO were added in the src/CMakeLists.txt
file to help explain.
llvm-svn: 274052
That patch made all LLVM projects build with -DUNICODE. However, this doesn't
work for the OpenMP runtime.
But just overriding the flag with -UUNICODE breaks compiling ittnotify_static.c,
which for some reason needs to be compiled with -DUNICODE. Note that compiling
ittnotify.h with -DUNICODE does not work though.
This seems like a mess. This commit fixes it for now, but it would be great
if someone who works on the OpenMP runtime could fix it properly.
llvm-svn: 273898
Fix for a hang when omp task and nested parallelism are used together.
Some problems remain with task state saving/restoring, but the
user's case works fine now. All tasking unit tests pass as well.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21558
llvm-svn: 273297
Replaced reads of nproc from the team structure with reads from the
thread structure to improve performance.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21559
llvm-svn: 273293
The removal of legacy code to support long-deprecated debugger support library
resulted in some whitespace changes. Comments from that legacy code were made
public as they may be useful for other debuggers.
Patch by Olga Malysheva.
Differential Revision: http://reviews.llvm.org/D21391
llvm-svn: 273282
A couple of improvements:
1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources.
2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21528
llvm-svn: 273278
There was a segfault in the stubs library in posix_memalign because
of a bad parameter. The fix is to pass the address of the pointer as the
parameter. Also added a check of the result of posix_memalign.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21529
llvm-svn: 273276
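A minimal sketch of the calling convention behind the posix_memalign fix
above. The helper name aligned_alloc_checked is hypothetical and is not the
stubs library routine; it only illustrates passing the address of the pointer
and checking the return code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper (not the stubs library code): returns an aligned
// block, or nullptr if posix_memalign reports an error.
void *aligned_alloc_checked(std::size_t alignment, std::size_t size) {
    void *ptr = nullptr;
    // posix_memalign expects the ADDRESS of the pointer (void **), not the
    // pointer itself, and reports failure through its return value.
    int rc = posix_memalign(&ptr, alignment, size);
    if (rc != 0) {
        return nullptr;  // e.g. EINVAL for a bad alignment, ENOMEM otherwise
    }
    return ptr;
}
```

The caller releases the block with free(). The alignment must be a power of
two and a multiple of sizeof(void *), otherwise the call fails.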
This change appends the process id to the KMP_STATS_FILE (if specified) which
enables MPI processes to output their stats to separate files.
Differential Revision: http://reviews.llvm.org/D21386
llvm-svn: 273273
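A rough sketch of the per-process stats file naming described above. The
helper names and the "." separator are assumptions made for illustration; the
exact file name format used by the runtime is not stated in this message.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical helper: append a process id to the user-supplied file name.
// The "." separator is an assumption, not the runtime's documented format.
std::string stats_file_for_process(const std::string &base, long pid) {
    return base + "." + std::to_string(pid);
}

// Same, using the id of the calling process, so that each MPI rank
// (a separate process) ends up with its own output file.
std::string stats_file_for_this_process(const std::string &base) {
    return stats_file_for_process(base, static_cast<long>(getpid()));
}
```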
Change the hwloc discovery algorithm to print topology for only accessible
resources, and report uniformity correspondingly, similar to what other topology
discovery algorithms do. Fixes a minor inconsistency between the total topology
reported and the resources used for thread binding when hwloc is used.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21389
llvm-svn: 272952
This patch allows a user to enable Hwloc on Windows. There are three main
changes in here:
1. kmp.h - Move definitions/declarations out of the KMP_OS_WINDOWS guard (our
Windows implementation of affinity) because they need to be defined when
KMP_USE_HWLOC is on as well.
2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity,
__kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3. Teach CMake how to include hwloc when building on Windows.
Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED. This is to prevent Mac
builds from requiring anything from Hwloc.
Differential Revision: http://reviews.llvm.org/D21441
llvm-svn: 272951
With a single thread, using __kmpc_omp_wait_deps segfaults in the OpenMP runtime.
Offloading with depend also encounters this problem when we generate
kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps.
Patch by Alex Duran
Differential Revision: http://reviews.llvm.org/D21384
llvm-svn: 272949
Cleanup: fixed missing memory cleanup in a couple of corner cases, which
could leak memory.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21355
llvm-svn: 272946
Improved performance of ittnotify calls at the request of the ittnotify
owner: each call to __itt_string_handle_create is now made only once
(previously it was called multiple times).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21353
llvm-svn: 272945
Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion
about its purpose and function among users. KMP_HW_SUBSET is an environment
variable which allows users to easily pick a subset of the hardware topology to
use. e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21340
llvm-svn: 272937
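To illustrate how a value such as KMP_HW_SUBSET=30c,2t from the rename above
breaks down, here is a small parsing sketch. It is deliberately simplified and
is not the runtime's parser: it accepts only comma-separated "<count>c" and
"<count>t" items, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct HwSubset {
    int cores = 0;    // from "<n>c"
    int threads = 0;  // threads per core, from "<n>t"
    bool ok = true;   // false if the string did not match the simple grammar
};

// Simplified parser: items are "<count><unit>" separated by commas, where
// unit is 'c' (cores) or 't' (threads per core). Anything else is rejected.
HwSubset parse_hw_subset(const std::string &value) {
    HwSubset r;
    std::size_t i = 0;
    while (i < value.size()) {
        if (!std::isdigit(static_cast<unsigned char>(value[i]))) {
            r.ok = false;
            return r;
        }
        int n = 0;
        while (i < value.size() &&
               std::isdigit(static_cast<unsigned char>(value[i]))) {
            n = n * 10 + (value[i] - '0');
            ++i;
        }
        if (i == value.size()) {  // count without a unit letter
            r.ok = false;
            return r;
        }
        char unit = value[i++];
        if (unit == 'c') {
            r.cores = n;
        } else if (unit == 't') {
            r.threads = n;
        } else {
            r.ok = false;
            return r;
        }
        if (i < value.size()) {
            // Another item must follow a comma; a trailing comma is an error.
            if (value[i] != ',' || i + 1 == value.size()) {
                r.ok = false;
                return r;
            }
            ++i;
        }
    }
    return r;
}
```

With this sketch, parse_hw_subset("30c,2t") yields 30 cores and 2 threads per
core.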
Added argv array check/allocation for a parallel region directly nested inside
the teams construct, because the upcoming Fortran codegen passes parameters
directly into kmpc_fork_call that are missing from kmpc_fork_teams (earlier
codegen passed to parallel a subset of the parameters passed to teams, and thus
no check/allocation was needed).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21336
llvm-svn: 272935
Currently, there is a big overhead in reporting loop metadata through
ittnotify. The pair of functions __kmp_str_loc_init/__kmp_str_loc_free is
replaced with strchr/atoi calls. This skips a lot of time-consuming work:
many memory allocations/deallocations, heavy string duplication, etc.
The loop metadata only needs line and column info from the source string, so no
allocations or string splitting are actually needed.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21309
llvm-svn: 272698
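The strchr/atoi approach above can be sketched as follows. The layout of the
source string, ";file;func;line;col;;", is an assumption made for this
example, and the struct and function names are hypothetical; only strchr and
atoi come from the message itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical result type for this sketch.
struct LineCol {
    int line;
    int col;
    bool ok;  // false if the string has fewer fields than expected
};

// Assumed layout: ";file;func;line;col;;". The line number follows the
// third semicolon and the column follows the fourth. No memory is
// allocated and the string is not copied or split.
LineCol parse_line_col(const char *psource) {
    LineCol r = {0, 0, false};
    if (psource == nullptr) return r;
    const char *s_line = std::strchr(psource, ';');          // 1st semicolon
    if (s_line) s_line = std::strchr(s_line + 1, ';');       // 2nd semicolon
    if (s_line) s_line = std::strchr(s_line + 1, ';');       // 3rd semicolon
    if (s_line == nullptr) return r;
    const char *s_col = std::strchr(s_line + 1, ';');        // 4th semicolon
    if (s_col == nullptr) return r;
    r.line = std::atoi(s_line + 1);  // atoi stops at the next ';'
    r.col = std::atoi(s_col + 1);
    r.ok = true;
    return r;
}
```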
Cleanup - unused code removal.
TODO: consider also removing the kmp_wait_64 and kmp_release_64 routines
(replacing them with flag class methods).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21332
llvm-svn: 272697
OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with
45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.
llvm-svn: 272687
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed functions
whose body consisted only of a KMP_ASSERT(0) statement. A possible runtime crash
thus becomes a compile-time error, which is preferable because the error is
detected sooner.
TODO: consider C++11 static assert as an alternative, which could
make the diagnostics better.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21304
llvm-svn: 272590
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
Patch by Brian Bliss.
Differential Revision: http://reviews.llvm.org/D21300
llvm-svn: 272589
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.
Differential Revision: http://reviews.llvm.org/D21196
llvm-svn: 272570
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some omp functions). An application has a loop where it dynamically
loads libfoo.so, calls a function from it, and unloads libfoo.so. After several
loop iterations the application crashes with a message about a lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
The problem is that pthread_kill() was not followed by pthread_join() when the
thread had already terminated. This patch fixes the problem for both worker and
monitor threads.
Differential Revision: http://reviews.llvm.org/D21200
llvm-svn: 272567
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. i.e., the three level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.
For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.
There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.
Differential Revision: http://reviews.llvm.org/D21195
llvm-svn: 272565
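The idea behind counting objects under another object, regardless of
intermediate levels, can be sketched on a plain tree. This example does not
use the hwloc API; TopoObj, ObjType and count_objs_under are hypothetical
stand-ins for the hwloc topology tree and for __kmp_hwloc_get_nobjs_under_obj.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object kinds; a real topology may contain more levels.
enum class ObjType { Package, Cache, Core, Thread };

// Hypothetical stand-in for a node of the topology tree.
struct TopoObj {
    ObjType type;
    std::vector<TopoObj> children;
};

// Count objects of the wanted type anywhere below obj. Intermediate levels
// (caches here) are simply walked through, so the answer does not depend
// on how many levels sit between a package and its cores.
int count_objs_under(const TopoObj &obj, ObjType wanted) {
    int n = 0;
    for (const TopoObj &child : obj.children) {
        if (child.type == wanted) {
            ++n;  // found one; objects of a type do not nest in this sketch
        } else {
            n += count_objs_under(child, wanted);
        }
    }
    return n;
}
```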
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
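The complement fix above amounts to one extra AND. A minimal sketch, using
std::bitset in place of the runtime's affinity mask type; the 1024-bit width
mirrors the example in the message, and the names are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed mask width for this sketch, matching the 0..1023 example.
constexpr std::size_t kMaxProcs = 1024;
using Mask = std::bitset<kMaxProcs>;

// Complement of mask restricted to processors that actually exist.
// Without the AND, ~{0} would contain every id from 1 to 1023.
Mask complement_within(const Mask &mask, const Mask &full_mask) {
    return ~mask & full_mask;
}
```

On a 4-processor system (full mask {0,1,2,3}), the complement of {0} is then
{1,2,3} rather than {1,...,1023}.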
Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.
Patch by Terry Wilmarth
Differential Revision: http://reviews.llvm.org/D20879
llvm-svn: 272286
MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.
Fixing this with an #ifdef to unbreak the build for now.
llvm-svn: 272271
After I replaced the no-op TCR_4 with actual code, the compiler complained when building the debug build.
This patch moves the 'cast to int' to the correct place.
Extension to Differential Revision: http://reviews.llvm.org/D19880
llvm-svn: 271377
This patch replaces the use of compiler builtin atomics with
C++11 atomics in the ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.
The main reason for this change is a problem with the work
stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.
Two assertions were added to kmp_tasking.c. They are useful
for detecting early symptoms of something going wrong with
work stealing, which were among the possible outcomes of the
race condition.
Differential Revision: http://reviews.llvm.org/D19878
llvm-svn: 271324
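For readers unfamiliar with the technique, here is a generic ticket lock
written with C++11 atomics. It is a textbook sketch, not the runtime's
implementation: the class name, the member names and the chosen memory orders
are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>

// Generic ticket lock sketch (not the libomp code).
class TicketLock {
    std::atomic<unsigned> next_ticket{0};  // next ticket to hand out
    std::atomic<unsigned> now_serving{0};  // ticket currently allowed in

public:
    void lock() {
        // Take a ticket, then spin until it is being served.
        unsigned my = next_ticket.fetch_add(1, std::memory_order_relaxed);
        while (now_serving.load(std::memory_order_acquire) != my) {
            // busy-wait; a real lock would yield or back off here
        }
    }

    bool try_lock() {
        // The lock is free only if the next ticket is the one being served.
        unsigned serving = now_serving.load(std::memory_order_acquire);
        unsigned expected = serving;
        return next_ticket.compare_exchange_strong(
            expected, serving + 1,
            std::memory_order_acquire, std::memory_order_relaxed);
    }

    void unlock() {
        // Publish the critical section's writes and admit the next ticket.
        now_serving.fetch_add(1, std::memory_order_release);
    }
};
```

The acquire load in lock() pairs with the release increment in unlock(), which
is the ordering guarantee that explicit C++11 atomics make visible in the code.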
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.
Patch by Alex Duran.
Differential Revision: http://reviews.llvm.org/D20699
llvm-svn: 271320
When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task". When the task deques are full, a dead-lock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.
This patch implements a solution where the task deques are doubled in size when
a task is being queued from a COI thread.
Differential Revision: http://reviews.llvm.org/D20733
llvm-svn: 271319
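A simplified sketch of the doubling idea above, using a ring buffer of task
ids. The class and its allow_grow flag are hypothetical: the flag stands in for
"the task is being queued from a COI thread", and all locking that a real
shared deque needs is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task deque; ints stand in for task pointers.
// Synchronization is intentionally left out of this sketch.
class TaskDeque {
    std::vector<int> slots;
    std::size_t head = 0;   // index of the oldest task
    std::size_t count = 0;  // number of queued tasks

    // Double the capacity, unrolling the ring so the oldest task is at 0.
    void grow() {
        std::vector<int> bigger(slots.size() * 2);
        for (std::size_t i = 0; i < count; ++i) {
            bigger[i] = slots[(head + i) % slots.size()];
        }
        slots.swap(bigger);
        head = 0;
    }

public:
    explicit TaskDeque(std::size_t capacity)
        : slots(capacity ? capacity : 1) {}

    std::size_t capacity() const { return slots.size(); }
    std::size_t size() const { return count; }

    // Returns false when full and growing is not allowed, which mirrors the
    // ordinary "deque is full" path; with allow_grow the push always succeeds.
    bool push(int task, bool allow_grow) {
        if (count == slots.size()) {
            if (!allow_grow) return false;
            grow();
        }
        slots[(head + count) % slots.size()] = task;
        ++count;
        return true;
    }

    bool pop_front(int &task) {
        if (count == 0) return false;
        task = slots[head];
        head = (head + 1) % slots.size();
        --count;
        return true;
    }
};
```

Because the growing push never has to wait for a free slot, the thread doing
it cannot get stuck behind a full deque.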
The problem is a lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
7 built-in dispatch buffers, but dozens or hundreds of buffers are
needed.
The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. To
give users the same possibility, I changed the build-time control into a
run-time one, adding an API just in case.
This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).
The KMP_DISP_NUM_BUFFERS environment variable works only before serial
initialization, because during serial initialization we already allocate buffers
for the hot team, so it is too late to change the number of buffers later (or we
would need to reallocate buffers for all teams, which sounds too complicated).
The kmp_set_defaults() routine does not work for this environment variable,
because it calls serial initialization before reading the parameter string. So a
new routine, kmp_set_disp_num_buffers(), is created so that it can set our
internal global variable before the library initialization. If both the
environment variable and the API are used, the environment variable wins.
Differential Revision: http://reviews.llvm.org/D20697
llvm-svn: 271318
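The ordering rules described above (the setting is honored only before serial
initialization, and the environment variable beats the API) can be sketched
like this. RuntimeConfig and its members are hypothetical; the sketch takes
the environment value as a parameter instead of reading it, and it ignores
values that are not positive, which is an assumption.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical model of the relevant runtime state.
struct RuntimeConfig {
    int disp_num_buffers = 7;        // built-in default mentioned in the text
    bool serial_initialized = false;

    // Models kmp_set_disp_num_buffers(): only effective before serial
    // initialization, because buffers are allocated during it.
    void set_disp_num_buffers(int num_buffers) {
        if (!serial_initialized && num_buffers > 0) {
            disp_num_buffers = num_buffers;
        }
    }

    // Models serial initialization. env_value is what
    // std::getenv("KMP_DISP_NUM_BUFFERS") would return (may be nullptr).
    // It is applied last, so it overrides an earlier API call.
    void serial_initialize(const char *env_value) {
        if (serial_initialized) return;
        if (env_value != nullptr) {
            int n = std::atoi(env_value);
            if (n > 0) disp_num_buffers = n;
        }
        serial_initialized = true;
    }
};
```

A caller would typically invoke
serial_initialize(std::getenv("KMP_DISP_NUM_BUFFERS")).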
The OMP_PROC_BIND=spread strategy fails to assign the master thread the
correct place partition after the first parallel region. Other threads in the
hot team will remember their place_partition, but the master's place partition
is restored to what it was before entering the parallel region. So when the hot
team is used for subsequent parallel regions, the master has lost this info.
This fix calls __kmp_partition_places to update only the master thread's place
partition in the spread case when there are no other changes to the hot team.
Patch by Terry Wilmarth
Differential Revision: http://reviews.llvm.org/D20539
llvm-svn: 270890
On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D20517
llvm-svn: 270884
Clang no longer restricts itself to generating microtasks with a small number
of arguments, and so an assembly implementation is required to prevent hitting
the parameter limit present in the C implementation. This adds an
implementation for ppc64[le].
llvm-svn: 270821
Most of this is modifications to check for differences before updating data
fields in team struct. There is also some rearrangement of the team struct.
Patch by Diego Caballero
Differential Revision: http://reviews.llvm.org/D20487
llvm-svn: 270468
These changes allow testing on Windows using clang.exe.
There are two main changes:
1. Only link to -lm when it actually exists on the system
2. Create basic versions of pthread_create() and pthread_join() for Windows.
They are not POSIX compliant by any stretch but will allow any existing
and future tests to use pthread_create() and pthread_join() for testing
interactions of libomp with os threads.
Differential Revision: http://reviews.llvm.org/D20391
llvm-svn: 270464