When linking with libhwloc, the ORDERED EPCC test slows down on big
machines (> 48 cores). Performance analysis showed that a cache thrash
was occurring and this padding helps alleviate the problem.
Also, inside the main spin-wait loop in kmp_wait_release.h, we can eliminate
the references to the global shared variables by creating a local
variable, oversubscribed, and checking that instead.
Differential Revision: http://reviews.llvm.org/D22093
llvm-svn: 274894
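A minimal sketch of the padding idea from the entry above, assuming a 64-byte
cache line. The names kCacheLine, PaddedCounter and on_distinct_lines are
hypothetical and are not taken from the runtime. Each slot is aligned and
padded to a full line, so threads updating neighbouring slots do not
invalidate each other's cache lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache-line size; the real value is platform-dependent.
constexpr std::size_t kCacheLine = 64;

// Hypothetical per-thread slot. alignas rounds the size of the struct up to a
// full cache line, so neighbouring slots in an array never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Returns true when the two addresses fall on different cache lines.
inline bool on_distinct_lines(const void *a, const void *b) {
    auto line_a = reinterpret_cast<std::uintptr_t>(a) / kCacheLine;
    auto line_b = reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
    return line_a != line_b;
}
```

The cost is memory: every slot occupies a full line even though it holds only
eight bytes of data.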
These tests are now modeled after the sections nowait test where threads wait
to be released in the first construct (either for or single) and the last thread
skips the last for/single construct and releases those threads. If the test
fails, then it hangs because an unnecessary barrier is executed in between the
constructs.
llvm-svn: 274641
If update_master_only is set the place list is not completely traversed
and therefore this assertion failed. Make it only trigger if
update_master_only is false.
(was introduced by D20539)
Differential Revision: http://reviews.llvm.org/D21925
llvm-svn: 274482
This change fixes an error in comparing the existing schedule on the team to
the new schedule, in the chunk field. Also added additional checks and used
KMP_CHECK_UPDATE where appropriate.
Patch by Terry Wilmarth.
Differential Revision: http://reviews.llvm.org/D21897
llvm-svn: 274371
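A sketch of a check-before-update comparison that includes the chunk field.
It assumes that KMP_CHECK_UPDATE stores only when the new value differs from
the old one, which the message does not state. check_update, Sched and
update_sched are hypothetical stand-ins, not the runtime's definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for the assumed behaviour of KMP_CHECK_UPDATE:
// store only when the value actually changes. Returns true if a store happened.
template <typename T>
bool check_update(T &dst, const T &src) {
    if (dst != src) {
        dst = src;
        return true;
    }
    return false;
}

// Hypothetical schedule descriptor with a kind and a chunk field.
struct Sched {
    int kind;
    int chunk;
};

// Compares and updates field by field, including the chunk.
// Returns true if any field was written.
bool update_sched(Sched &team, const Sched &next) {
    bool kind_changed = check_update(team.kind, next.kind);
    bool chunk_changed = check_update(team.chunk, next.chunk);
    return kind_changed || chunk_changed;
}
```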
EPCC performance of the single construct is considerably worse than that of a
plain barrier. Adding a read-only check to the code before the atomic
compare-and-store helps considerably.
Patch by Terry Wilmarth.
Differential Revision: http://reviews.llvm.org/D21893
llvm-svn: 274369
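The read-only check described in the entry above can be sketched with C++11
atomics as follows. try_claim and its counter protocol are hypothetical and
only illustrate the pattern. The runtime's actual code is not reproduced here.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical claim protocol: the first thread to move the counter from
// `expected` to `expected + 1` wins the construct.
bool try_claim(std::atomic<int> &counter, int expected) {
    // Read-only check first: threads that have already lost return without
    // attempting a write, so they do not request the cache line exclusively.
    if (counter.load(std::memory_order_relaxed) != expected)
        return false;
    // Only plausible winners pay for the atomic compare-and-store.
    return counter.compare_exchange_strong(expected, expected + 1,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed);
}
```

The plain load is cheap because the line can stay shared among all readers.
Only the compare-and-store forces exclusive ownership.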
This rewrite of the omp_sections_nowait.c test file causes it to hang if the
nowait is not respected. If the nowait isn't respected, the lone thread which
can escape the first sections construct will just sleep at a barrier which
shouldn't exist. All reliance on timers is taken out. For good measure, the test
makes sure that all eight sections are executed as well. The test should take no
longer than a few seconds on any modern machine.
Differential Revision: http://reviews.llvm.org/D21842
llvm-svn: 274151
* Incorrect lock value written in __kmp_test_futex_lock
* Incorrect lock value check in tas/futex lock with USE_LOCK_PROFILE on
Patch by Hansang Bae
llvm-svn: 274053
UNICODE and _UNICODE definitions were added in the LLVM CMake build system.
While on Unices, the UNICODE/_UNICODE macros don't cause problems, on Windows
only ittnotify_static.c should be compiled using -DUNICODE. We are still
looking at a proper fix, but this change sets the build back to exactly what it
was doing before. Also, a comment and TODO were added in the src/CMakeLists.txt
file to help explain.
llvm-svn: 274052
That patch made all LLVM projects build with -DUNICODE. However, this doesn't
work for the OpenMP runtime.
But just overriding the flag with -UUNICODE breaks compiling ittnotify_static.c,
which for some reason needs to be compiled with -DUNICODE. Note that compiling
ittnotify.h with -DUNICODE does not work though.
This seems like a mess. This commit fixes it for now, but it would be great
if someone who works on the OpenMP runtime could fix it properly.
llvm-svn: 273898
Bug fix for a hang when omp task and nested parallelism are used together.
Some problems still remain with task state saving/restoring, but the
user's case works fine now. All tasking unit tests pass as well.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21558
llvm-svn: 273297
Replaced reads of nproc from the team structure with reads from the
thread structure to improve performance.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21559
llvm-svn: 273293
The removal of legacy code to support long-deprecated debugger support library
resulted in some whitespace changes. Comments from that legacy code were made
public as they may be useful for other debuggers.
Patch by Olga Malysheva.
Differential Revision: http://reviews.llvm.org/D21391
llvm-svn: 273282
A couple improvements:
1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources.
2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21528
llvm-svn: 273278
There was a segfault in the stubs library in posix_memalign because
of a bad parameter. The fix is to pass the address of the pointer as the
parameter. Also added a check of the result of posix_memalign.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21529
llvm-svn: 273276
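A minimal sketch of the calling convention behind the fix above, for POSIX
systems. aligned_alloc_checked is a hypothetical wrapper, not the stubs
library's function. posix_memalign expects the address of the pointer and
reports failure through its return value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>  // posix_memalign is declared in <stdlib.h> on POSIX systems

// Hypothetical wrapper: returns aligned memory, or nullptr on failure.
void *aligned_alloc_checked(std::size_t alignment, std::size_t size) {
    void *ptr = nullptr;
    // The first argument is the ADDRESS of the pointer (void **), not the
    // pointer itself.
    int err = posix_memalign(&ptr, alignment, size);
    // Failure is reported through the return value, so check it explicitly.
    if (err != 0)
        return nullptr;
    return ptr;
}
```

The alignment must be a power of two and a multiple of sizeof(void *);
otherwise the call fails with EINVAL.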
This change appends the process id to the KMP_STATS_FILE (if specified) which
enables MPI processes to output their stats to separate files.
Differential Revision: http://reviews.llvm.org/D21386
llvm-svn: 273273
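A sketch of per-process stats file naming on a POSIX system. The "." separator
and both helper names are assumptions. The message above does not say how the
process id is attached to the file name.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // getpid (POSIX)

// Hypothetical helper: append a process id to a base file name.
// The "." separator is an assumption made for this sketch.
std::string stats_file_for(const std::string &base, long pid) {
    return base + "." + std::to_string(pid);
}

// Same, using the id of the calling process.
std::string stats_file_for_this_process(const std::string &base) {
    return stats_file_for(base, static_cast<long>(getpid()));
}
```

Because every MPI rank is a separate process with its own id, each rank ends
up writing to a different file.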
Change the hwloc discovery algorithm to print the topology for only accessible
resources and to report uniformity correspondingly, similar to what the other
topology discovery algorithms do. This fixes a minor inconsistency between the
total topology reported and the resources used for thread binding when hwloc is
used.
Patch by Andrey Churbanov.
Differential Revision: http://reviews.llvm.org/D21389
llvm-svn: 272952
This patch allows a user to enable hwloc on Windows. There are three main
changes in here:
1. kmp.h - Move definitions/declarations out of the KMP_OS_WINDOWS guard (our
Windows implementation of affinity) because they need to be defined when
KMP_USE_HWLOC is on as well.
2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity,
__kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3. Teach CMake how to include hwloc when building on Windows.
Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED. This is to prevent Mac
builds from requiring anything from hwloc.
Differential Revision: http://reviews.llvm.org/D21441
llvm-svn: 272951
With a single thread, using __kmpc_omp_wait_deps segfaults in the OpenMP
runtime. Offloading with depend also encounters this problem when we generate
kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps.
Patch by Alex Duran
Differential Revision: http://reviews.llvm.org/D21384
llvm-svn: 272949
Cleanup: added missing memory cleanup in a couple of corner cases. This fixes a
possible memory leak in those cases.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21355
llvm-svn: 272946
Improved performance of ittnotify calls at the request of the ittnotify
owner: calls to __itt_string_handle_create are now made only once
(previously it was called multiple times).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21353
llvm-svn: 272945
Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion
about its purpose and function among users. KMP_HW_SUBSET is an environment
variable that allows users to easily pick a subset of the hardware topology to
use. For example, KMP_HW_SUBSET=30c,2t means use 30 cores with 2 threads per
core.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21340
llvm-svn: 272937
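To illustrate how a value such as 30c,2t breaks down, here is a simplified
parser sketch. It handles only the "<N>c,<M>t" form quoted in the entry above.
HwSubset and parse_hw_subset are hypothetical, and the runtime's real parser
and full syntax are not reproduced here.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical result type for this sketch.
struct HwSubset {
    int cores = 0;             // 0 means "not specified"
    int threads_per_core = 0;  // 0 means "not specified"
    bool ok = true;            // false if an item could not be parsed
};

// Parses comma-separated items of the form <number><unit>, where the unit is
// 'c' (cores) or 't' (threads per core).
HwSubset parse_hw_subset(const std::string &spec) {
    HwSubset out;
    std::size_t pos = 0;
    while (pos < spec.size()) {
        std::size_t end = spec.find(',', pos);
        if (end == std::string::npos) end = spec.size();
        std::string item = spec.substr(pos, end - pos);
        pos = end + 1;
        if (item.size() < 2) { out.ok = false; break; }
        char unit = item.back();
        char *rest = nullptr;
        long n = std::strtol(item.c_str(), &rest, 10);
        // The digits must run right up to the one-letter unit suffix.
        if (n <= 0 || rest != item.c_str() + item.size() - 1) {
            out.ok = false;
            break;
        }
        if (unit == 'c') out.cores = static_cast<int>(n);
        else if (unit == 't') out.threads_per_core = static_cast<int>(n);
        else { out.ok = false; break; }
    }
    return out;
}
```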
Added an argv array check/allocation for a parallel region nested directly
inside the teams construct, because the upcoming Fortran codegen passes
parameters directly into kmpc_fork_call without passing the same parameters to
kmpc_fork_teams (earlier codegen passed to parallel a subset of the parameters
passed to teams, and thus no check/allocation was needed).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21336
llvm-svn: 272935
Currently, there is a big overhead in reporting loop metadata through
ittnotify. The pair of functions __kmp_str_loc_init/__kmp_str_loc_free is
replaced with strchr/atoi calls. This skips many time-consuming actions:
numerous memory allocations/deallocations, heavy string duplication, etc.
The loop metadata only needs line and column info from the source string, so no
allocations or string splitting are actually needed.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21309
llvm-svn: 272698
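A sketch of allocation-free extraction with strchr/atoi, as described in the
entry above. It assumes the source string is laid out as
";file;func;line;col;;", which the message does not state. parse_line_col is a
hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Extracts line and column from a source string without allocating or
// copying. Assumed layout: ";file;func;line;col;;".
bool parse_line_col(const char *psource, int *line, int *col) {
    const char *p = psource ? std::strchr(psource, ';') : nullptr;  // 1st ';'
    if (p) p = std::strchr(p + 1, ';');  // 2nd ';' (end of file name)
    if (p) p = std::strchr(p + 1, ';');  // 3rd ';' (end of function name)
    if (!p) return false;
    const char *q = std::strchr(p + 1, ';');  // 4th ';' (end of line number)
    if (!q) return false;
    *line = std::atoi(p + 1);  // atoi stops at the next ';'
    *col = std::atoi(q + 1);
    return true;
}
```

The function only walks the string in place, so its cost is a handful of
character scans instead of several allocations and copies.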
Cleanup: removal of unused code.
TODO: consider also removing the kmp_wait_64 and kmp_release_64 routines
(replacing them with flag class methods).
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21332
llvm-svn: 272697
OpenMP 4.1 is now OpenMP 4.5. Any mention of 41 or 4.1 is replaced with
45 or 4.5. Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.
llvm-svn: 272687
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602. Removed function
bodies that consisted only of a KMP_ASSERT(0) statement. A possible runtime
crash is thus converted into a compile-time error, which is preferable because
the error is detected earlier.
TODO: consider a C++11 static_assert as an alternative that could
make the diagnostics better.
Patch by Andrey Churbanov
Differential Revision: http://reviews.llvm.org/D21304
llvm-svn: 272590
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).
Patch by Brian Bliss.
Differential Revision: http://reviews.llvm.org/D21300
llvm-svn: 272589
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.
Differential Revision: http://reviews.llvm.org/D21196
llvm-svn: 272570
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some omp functions). An application has a loop where it dynamically
loads libfoo.so, calls a function from it, and unloads libfoo.so. After several
loop iterations the application crashes with a message about lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
The problem is that pthread_kill() was not followed by pthread_join() when the
thread had already terminated. This patch fixes this problem for both worker and
monitor threads.
Differential Revision: http://reviews.llvm.org/D21200
llvm-svn: 272567
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three-level topology and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. That is, the three-level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.
For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.
There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.
Differential Revision: http://reviews.llvm.org/D21195
llvm-svn: 272565
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
Differential Revision: http://reviews.llvm.org/D21245
llvm-svn: 272561
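The masked complement from the entry above, sketched with std::bitset. The
1024-bit width mirrors the example in the message. Mask, make_full_mask and
complement are hypothetical names, not the runtime's affinity mask API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaskBits = 1024;  // assumed mask width

using Mask = std::bitset<kMaskBits>;

// Hypothetical helper: full mask with one bit per existing processor.
Mask make_full_mask(std::size_t nprocs) {
    Mask m;
    for (std::size_t i = 0; i < nprocs && i < kMaskBits; ++i) m.set(i);
    return m;
}

// Complement restricted to processors that actually exist: a plain ~m would
// also set every bit above the highest real processor id.
Mask complement(const Mask &m, const Mask &full) {
    return ~m & full;
}
```

On a machine with 8 processors, the complement of {0} computed this way is
{1,...,7} rather than {1,...,1023}.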
Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.
Patch by Terry Wilmarth
Differential Revision: http://reviews.llvm.org/D20879
llvm-svn: 272286
MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.
Fixing this with an #ifdef to unbreak the build for now.
llvm-svn: 272271
As I replaced the no-op TCR_4 with actual code, the compiler complained while
building the debug build.
This patch moves 'cast to int' to the correct place.
Extension to Differential Revision: http://reviews.llvm.org/D19880
llvm-svn: 271377
This patch replaces use of compiler builtin atomics with
C++11 atomics for ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.
The main reason this change was introduced is a problem
with the work-stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.
Two assertions were added to kmp_tasking.c. They are useful
for detecting early symptoms of something going wrong with
work stealing, which were among the possible outcomes of the
race condition.
Differential Revision: http://reviews.llvm.org/D19878
llvm-svn: 271324
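For readers unfamiliar with ticket locks, here is a minimal sketch built on
C++11 atomics with explicit memory ordering. TicketLock is a hypothetical
illustration. It is not the runtime's lock structure and omits everything the
real implementation needs beyond the core protocol.

```cpp
#include <atomic>
#include <cassert>

// Minimal ticket lock: threads take a ticket and wait until it is served.
struct TicketLock {
    std::atomic<unsigned> next_ticket{0};
    std::atomic<unsigned> now_serving{0};

    void lock() {
        // Taking a ticket needs no ordering by itself.
        unsigned my = next_ticket.fetch_add(1, std::memory_order_relaxed);
        // The acquire load pairs with the release store in unlock(), so the
        // previous owner's writes are visible once the lock is held.
        while (now_serving.load(std::memory_order_acquire) != my) {
            // spin
        }
    }

    bool try_lock() {
        unsigned serving = now_serving.load(std::memory_order_acquire);
        unsigned expected = serving;
        // Succeeds only if nobody holds the lock or is waiting for it.
        return next_ticket.compare_exchange_strong(
            expected, serving + 1, std::memory_order_acquire,
            std::memory_order_relaxed);
    }

    void unlock() {
        // Only the owner writes now_serving, so load-then-store is safe.
        unsigned cur = now_serving.load(std::memory_order_relaxed);
        now_serving.store(cur + 1, std::memory_order_release);
    }
};
```

Spelling out acquire and release makes the required ordering part of the code,
which matters on weakly ordered architectures such as ARM.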
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.
Patch by Alex Duran.
Differential Revision: http://reviews.llvm.org/D20699
llvm-svn: 271320