Commit Graph

399 Commits

Author SHA1 Message Date
Jonas Hahnfeld 170fcc8772 __kmp_partition_places: Update assertion for new parameter update_master_only
If update_master_only is set the place list is not completely traversed
and therefore this assertion failed. Make it only trigger if
update_master_only is false.

(was introduced by D20539)

Differential Revision: http://reviews.llvm.org/D21925

llvm-svn: 274482
2016-07-04 05:58:10 +00:00
Jonathan Peyton 6b560f0dd9 Fix checks on schedule struct
This change fixes an error in comparing the existing schedule on the team to
the new schedule, in the chunk field. Also added additional checks and used
KMP_CHECK_UPDATE where appropriate.

Patch by Terry Wilmarth.

Differential Revision: http://reviews.llvm.org/D21897

llvm-svn: 274371
2016-07-01 17:54:32 +00:00
Jonathan Peyton c1666960f9 Improve performance of #pragma omp single
On the EPCC benchmarks, single performs considerably worse than a plain
barrier. Adding a read-only check to the code before the atomic
compare-and-store helps considerably.
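
The read-only check can be sketched as follows. This is a minimal
illustration of the general technique (test before compare-and-swap), not the
runtime's code; try_claim_single and its parameters are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the slot that records which "single" construct
// has already been claimed. Returns true only for the first claiming caller.
inline bool try_claim_single(std::atomic<int> &owner_slot, int prev_id,
                             int my_id) {
  assert(prev_id != my_id);
  // Cheap read-only check first: if another thread already advanced the
  // slot, skip the compare-and-swap entirely. A plain load does not need
  // exclusive ownership of the cache line, so losers stay cheap.
  if (owner_slot.load(std::memory_order_relaxed) != prev_id)
    return false;
  // Only threads that still see the old value attempt the atomic update.
  int expected = prev_id;
  return owner_slot.compare_exchange_strong(expected, my_id,
                                            std::memory_order_acq_rel);
}
```

The first caller wins the compare-and-swap; later callers usually return from
the load alone.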

Patch by Terry Wilmarth.

Differential Revision: http://reviews.llvm.org/D21893

llvm-svn: 274369
2016-07-01 17:37:49 +00:00
Jonathan Peyton fdcca8cd55 Fix omp_sections_nowait.c test to address Bugzilla Bug 28336
This rewrite of the omp_sections_nowait.c test file causes it to hang if the
nowait is not respected. If the nowait isn't respected, the lone thread which
can escape the first sections construct will just sleep at a barrier which
shouldn't exist. All reliance on timers is taken out. For good measure, the test
makes sure that all eight sections are executed as well. The test should take no
longer than a few seconds on any modern machine.

Differential Revision: http://reviews.llvm.org/D21842

llvm-svn: 274151
2016-06-29 19:46:52 +00:00
Jonathan Peyton ac7ba406ed Fix bugs in TAS and futex lock
* Incorrect lock value written in __kmp_test_futex_lock
* Incorrect lock value check in tas/futex lock with USE_LOCK_PROFILE on

Patch by Hansang Bae

llvm-svn: 274053
2016-06-28 19:37:24 +00:00
Jonathan Peyton cceebeef17 Revert r273898's UNICODE quick fix in favor of CMake's remove_definitions()
UNICODE and _UNICODE definitions were added in the LLVM CMake build system.
While on Unices, the UNICODE/_UNICODE macros don't cause problems, on Windows
only ittnotify_static.c should be compiled using -DUNICODE.  We are still
looking at a proper fix, but this change sets the build back to exactly what it
was doing before.  Also, a comment and TODO were added in the src/CMakeLists.txt
file to help explain.

llvm-svn: 274052
2016-06-28 19:25:13 +00:00
Hans Wennborg 8065c51875 Fix the Windows build after r273599
That patch made all LLVM projects build with -DUNICODE. However, this doesn't
work for the OpenMP runtime.

But just overriding the flag with -UUNICODE breaks compiling ittnotify_static.c,
which for some reason needs to be compiled with -DUNICODE. Note that compiling
ittnotify.h with -DUNICODE does not work though.

This seems like a mess. This commit fixes it for now, but it would be great
if someone who works on the OpenMP runtime could fix it properly.

llvm-svn: 273898
2016-06-27 18:03:45 +00:00
Jonathan Peyton e119e8e5b5 Remove redundant %libomp-compile step from test/lock/omp_lock.c
llvm-svn: 273576
2016-06-23 16:18:59 +00:00
Jonathan Peyton eeec4c8364 Fix bug in futex fast path inside kmp_csupport.c
llvm-svn: 273439
2016-06-22 16:36:07 +00:00
Jonathan Peyton 9d2412c9e5 Apply the KMP_USE_FUTEX feature macro everywhere
llvm-svn: 273438
2016-06-22 16:35:12 +00:00
Jonathan Peyton d4f397741b Add debug trace messages for taskloop
llvm-svn: 273299
2016-06-21 19:18:13 +00:00
Jonathan Peyton c76f9f0df8 Bug fix for hang when tasks used in nested parallel
Bug fix for a hang when omp task and nested parallelism are used together.
Some problems remain with task state saving/restoring, but the
user's case works fine now. All tasking unit tests pass as well.

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21558

llvm-svn: 273297
2016-06-21 19:12:07 +00:00
Jonathan Peyton ff5ca8b4cf Performance improvement: accessing thread struct as opposed to team struct
Replaced readings of nproc from team structure with ones from
thread structure to improve performance.

Patch by Andrey Churbanov.

Differential Revision: http://reviews.llvm.org/D21559

llvm-svn: 273293
2016-06-21 18:30:15 +00:00
Jonathan Peyton 8c61c597be Addition of debugger comments and whitespace
The removal of legacy code to support long-deprecated debugger support library
resulted in some whitespace changes. Comments from that legacy code were made
public as they may be useful for other debuggers.

Patch by Olga Malysheva.

Differential Revision: http://reviews.llvm.org/D21391

llvm-svn: 273282
2016-06-21 15:59:34 +00:00
Jonathan Peyton fd7cc42fed Improvements to process affinity mask setting
A couple of improvements:
1) Add ability to limit fullMask size when KMP_HW_SUBSET limits resources.
2) Make KMP_HW_SUBSET work for affinity_none, and only limit fullMask in this case.

Patch by Andrey Churbanov.

Differential Revision: http://reviews.llvm.org/D21528

llvm-svn: 273278
2016-06-21 15:54:38 +00:00
Jonathan Peyton 5a276c45c2 Bug fix for segfault in stubs library
There was a segfault in the stubs library in posix_memalign because
of a bad parameter. The fix is to pass the address of the pointer as the
parameter. Also added a check of the result of posix_memalign.
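
A minimal sketch of the corrected calling pattern, assuming a POSIX system.
aligned_alloc_checked is a hypothetical helper, not a function from the stubs
library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h> // posix_memalign, free (POSIX)

// Hypothetical helper: returns aligned memory, or nullptr on failure.
inline void *aligned_alloc_checked(std::size_t alignment, std::size_t size) {
  void *ptr = nullptr;
  // posix_memalign stores the allocation through its first argument, so it
  // must receive the address of the pointer, not the pointer value itself.
  int rc = posix_memalign(&ptr, alignment, size);
  // It returns 0 on success and an error number otherwise, so the result
  // has to be checked before the pointer is used.
  return rc == 0 ? ptr : nullptr;
}
```

Memory obtained this way is released with free().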

Patch by Andrey Churbanov.

Differential Revision: http://reviews.llvm.org/D21529

llvm-svn: 273276
2016-06-21 15:39:08 +00:00
Jonathan Peyton 98b76f6f87 [STATS] Adding process id to output filename
This change appends the process id to the KMP_STATS_FILE (if specified) which
enables MPI processes to output their stats to separate files.
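
The idea can be sketched as below. The "." separator and the helper names
are assumptions for illustration; the commit message only says that the
process id is appended.

```cpp
#include <cassert>
#include <string>
#include <unistd.h> // getpid (POSIX)

// Hypothetical helper: derive a per-process file name from a base name.
// The "." separator is an assumption, not the runtime's actual format.
inline std::string stats_file_for(const std::string &base, long pid) {
  assert(!base.empty());
  return base + "." + std::to_string(pid);
}

// Convenience wrapper that uses the calling process's id.
inline std::string stats_file_for_this_process(const std::string &base) {
  return stats_file_for(base, static_cast<long>(getpid()));
}
```

Each MPI rank is a separate process with its own pid, so ranks that share a
base name end up writing to distinct files.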

Differential Revision: http://reviews.llvm.org/D21386

llvm-svn: 273273
2016-06-21 15:20:33 +00:00
Jonathan Peyton ea26f3f82a Fix typos in Fortran headers
Fix typos in Fortran headers to match spec.

Patch by Andrey Churbanov.

Differential Revision: http://reviews.llvm.org/D21531

llvm-svn: 273272
2016-06-21 15:16:51 +00:00
Jonathan Peyton bf35771bcc Change hwloc discovery algorithm to print topology only for accessible resources
Change hwloc discovery algorithm to print topology for only accessible
resources, and report uniformity correspondingly, similar to what other topology
discovery algorithms do. Fixes a minor inconsistency between the total topology
reported and the resources used for thread binding when hwloc is used.

Patch by Andrey Churbanov.

Differential Revision: http://reviews.llvm.org/D21389

llvm-svn: 272952
2016-06-16 20:31:19 +00:00
Jonathan Peyton 0f3c2b921d Teach OpenMP Library to use Hwloc on Windows
This patch allows a user to enable Hwloc on Windows. There are three main
changes in here:
1. kmp.h - Move definitions/declarations out of the KMP_OS_WINDOWS guard (our
   Windows implementation of affinity) because they need to be defined when
   KMP_USE_HWLOC is on as well.
2. Teach __kmp_set_system_affinity, __kmp_get_system_affinity,
   __kmp_get_proc_group, and __kmp_affinity_bind_thread how to use hwloc.
3. Teach CMake how to include hwloc when building on Windows.

Another minor change in here is to make sure that anything under KMP_USE_HWLOC
is also guarded by KMP_AFFINITY_SUPPORTED. This is to prevent Mac
builds from requiring anything from Hwloc.

Differential Revision: http://reviews.llvm.org/D21441

llvm-svn: 272951
2016-06-16 20:23:11 +00:00
Jonathan Peyton c505ab6733 Fix for crash in task dependencies
With a single thread, using __kmpc_omp_wait_deps segfaults in the OpenMP
runtime. Offloading with depend also encounters this problem when we generate
kmpc_omp_wait_deps instead of kmpc_omp_task_with_deps.

Patch by Alex Duran

Differential Revision: http://reviews.llvm.org/D21384

llvm-svn: 272949
2016-06-16 20:18:31 +00:00
Jonathan Peyton 72a8498e08 Fixed missing memory cleanup in __kmp_affinity_create_hwloc_map()
Cleanup: fixed missing memory cleanup in a couple of corner cases, which could
leak memory.

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21355

llvm-svn: 272946
2016-06-16 20:14:54 +00:00
Jonathan Peyton 4ba3b0cda9 Reduce perf impact of redundant ittnotify calls
Improved performance of ittnotify calls at the request of the ittnotify
owner: calls to __itt_string_handle_create are now made only once
(previously it was called multiple times).

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21353

llvm-svn: 272945
2016-06-16 20:11:51 +00:00
Jonathan Peyton b9d28fbeb3 Deprecate KMP_PLACE_THREADS and rename as KMP_HW_SUBSET
Deprecate KMP_PLACE_THREADS and rename it to KMP_HW_SUBSET due to confusion
about its purpose and function among users.  KMP_HW_SUBSET is an environment
variable which allows users to easily pick a subset of the hardware topology to
use.  e.g., KMP_HW_SUBSET=30c,2t means use 30 cores, 2 threads per core.
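
To make the 30c,2t example concrete, here is a deliberately simplified
parser. It only understands comma-separated "<N>c" and "<N>t" items as in the
example above; the runtime's real grammar is not shown in this log, and
HwSubset and parse_hw_subset are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

struct HwSubset {
  int cores = 0;            // value of the "<N>c" item, 0 if absent
  int threads_per_core = 0; // value of the "<N>t" item, 0 if absent
  bool ok = false;          // true only if the whole string parsed
};

// Simplified sketch: accepts strings such as "30c,2t", "4c" or "2t".
inline HwSubset parse_hw_subset(const std::string &value) {
  HwSubset out;
  std::size_t pos = 0;
  while (pos < value.size()) {
    std::size_t end = value.find(',', pos);
    if (end == std::string::npos)
      end = value.size();
    const std::string item = value.substr(pos, end - pos);
    if (item.size() < 2)
      return out; // need at least one digit plus a unit letter
    char *stop = nullptr;
    const long n = std::strtol(item.c_str(), &stop, 10);
    // The number must cover everything except the trailing unit letter.
    if (stop != item.c_str() + item.size() - 1 || n <= 0)
      return out;
    const char unit = item.back();
    if (unit == 'c')
      out.cores = static_cast<int>(n);
    else if (unit == 't')
      out.threads_per_core = static_cast<int>(n);
    else
      return out; // unknown unit
    pos = end + 1;
  }
  out.ok = out.cores > 0 || out.threads_per_core > 0;
  return out;
}
```

With this sketch, "30c,2t" yields 30 cores and 2 threads per core, while an
item with an unknown unit leaves ok set to false.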

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21340

llvm-svn: 272937
2016-06-16 18:53:48 +00:00
Jonathan Peyton 7cf08d4299 Bug fix: crash if teams executed on host
Added an argv array check/allocation for a parallel directly nested inside the
teams construct, because the upcoming Fortran codegen passes parameters directly
into kmpc_fork_call that are missing from kmpc_fork_teams (earlier codegen
passed to parallel a subset of the parameters passed to teams, and thus
no check/allocation was needed).

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21336

llvm-svn: 272935
2016-06-16 18:47:38 +00:00
Jonathan Peyton 614bb6618e Fix large overhead with itt notifications on region/barrier name composing
Currently, there is a big overhead in reporting loop metadata through
ittnotify.  The pair of functions __kmp_str_loc_init/__kmp_str_loc_free is
replaced with strchr/atoi calls.  Thus, a lot of time-consuming actions are
skipped: many memory allocations/deallocations, heavy string duplication, etc.
The loop metadata only needs line and column info from the source string, so no
allocations or string splitting are actually needed.
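
The allocation-free approach can be sketched like this. The field layout
";file;func;line;col;;" is an assumption made for the example, and
parse_line_col is a hypothetical name; only the use of strchr and atoi comes
from the commit message.

```cpp
#include <cassert>
#include <cstdlib> // atoi
#include <cstring> // strchr

struct LineCol {
  int line = 0;
  int col = 0;
};

// Assumed input layout (illustrative only): ";file;func;line;col;;"
// Walks the string in place: no copies, no allocations, no splitting.
inline LineCol parse_line_col(const char *source) {
  LineCol out;
  if (source == nullptr)
    return out;
  const char *p = source;
  // Skip three semicolons: the leading one, the one after the file name
  // and the one after the function name.
  for (int i = 0; i < 3; ++i) {
    p = std::strchr(p, ';');
    if (p == nullptr)
      return out;
    ++p;
  }
  out.line = std::atoi(p); // atoi stops at the next ';'
  p = std::strchr(p, ';');
  if (p != nullptr)
    out.col = std::atoi(p + 1);
  return out;
}
```

A string that lacks the expected separators simply yields zeros.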

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21309

llvm-svn: 272698
2016-06-14 19:27:22 +00:00
Jonathan Peyton e85ba3f58f Remove unused wait/release code.
Cleanup - unused code removal.
TODO: consider also removing the kmp_wait_64 and kmp_release_64 routines
(replacing them with flag class methods).

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21332

llvm-svn: 272697
2016-06-14 19:15:40 +00:00
Jonathan Peyton 957a151fd1 Whitespace cleanup of dllexports
Differential Revision: http://reviews.llvm.org/D21331

llvm-svn: 272691
2016-06-14 18:47:47 +00:00
Jonathan Peyton df6818bea4 Renaming change: 41 -> 45 and 4.1 -> 4.5
OpenMP 4.1 is now OpenMP 4.5.  Any mention of 41 or 4.1 is replaced with
45 or 4.5.  Also, if the CMake option LIBOMP_OMP_VERSION is 41, CMake warns that
41 is deprecated and to use 45 instead.

llvm-svn: 272687
2016-06-14 17:57:47 +00:00
Jonathan Peyton e1890e12f0 Bug fix for Bugzilla bug 26602: Remove function bodies with KMP_ASSERT(0)
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602.  Removed function
bodies that consisted only of a KMP_ASSERT(0) statement.  A possible runtime
crash is thus converted to a compile-time error, which looks preferable (faster
error detection).

TODO: consider C++11 static assert as an alternative, which could
make the diagnostics better.

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21304

llvm-svn: 272590
2016-06-13 21:33:30 +00:00
Jonathan Peyton c5304aa3c4 Affinity mask processing improvements
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).

Patch by Brian Bliss.

Differential Revision: http://reviews.llvm.org/D21300

llvm-svn: 272589
2016-06-13 21:28:03 +00:00
Jonathan Peyton 8cb45c838f Exclude untied tasks from task stealing constraint
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.

Differential Revision: http://reviews.llvm.org/D21196

llvm-svn: 272570
2016-06-13 17:51:59 +00:00
Jonathan Peyton 93495de265 Fix crash when libomp loaded/unloaded multiple times
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some omp functions).  An application has a loop where it dynamically
loads libfoo.so, calls a function from it, and unloads libfoo.so.  After several
loop iterations the application crashes with a message about lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:

The problem is that pthread_kill() was not followed by pthread_join() for a
terminated thread. This patch fixes this problem for both worker and monitor
threads.

Differential Revision: http://reviews.llvm.org/D21200

llvm-svn: 272567
2016-06-13 17:36:40 +00:00
Jonathan Peyton 202a24dd9b Hwloc refactoring patch
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. i.e., the three level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.

For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.

There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.

Differential Revision: http://reviews.llvm.org/D21195

llvm-svn: 272565
2016-06-13 17:30:08 +00:00
Jonathan Peyton 34c72c4773 Fix bitmask complement operation
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.
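
A minimal sketch of the masked complement, using a 64-bit word in place of
the runtime's affinity mask type. complement_mask is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Complement a processor set, keeping only processors that exist.
// full_mask has one bit set per valid processor id.
inline std::uint64_t complement_mask(std::uint64_t mask,
                                     std::uint64_t full_mask) {
  assert((mask & ~full_mask) == 0); // input should only name valid procs
  // A bare ~mask would set every bit up to the word size; and-ing with
  // full_mask drops the ids of processors that are not present.
  return ~mask & full_mask;
}
```

On a four-processor machine (full mask 0xF), the complement of {0} is
{1,2,3}, i.e. 0xE, instead of every remaining bit in the word.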

Differential Revision: http://reviews.llvm.org/D21245

llvm-svn: 272561
2016-06-13 17:01:26 +00:00
Jonathan Peyton 5a299da55d [STATS] Add stats gathering for taskloop construct
llvm-svn: 272560
2016-06-13 16:56:41 +00:00
Jonathan Peyton b6f0f521f5 Fix spelling in comment
llvm-svn: 272291
2016-06-09 18:51:17 +00:00
Jonathan Peyton 61fdddfd64 Revert accidental commit to lit.cfg
llvm-svn: 272287
2016-06-09 18:29:36 +00:00
Jonathan Peyton c4c722ac0d Refactor __kmp_execute_tasks_template function
Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20879

llvm-svn: 272286
2016-06-09 18:27:03 +00:00
Hans Wennborg 5b89fbc822 kmp_lock.h: Fix VS2013 build after r271324
MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.

Fixing this with an #ifdef to unbreak the build for now.

llvm-svn: 272271
2016-06-09 15:54:43 +00:00
Paul Osmialowski 9cc353e2b3 Fine tuning of TC* macros - small followup
As I replaced no-op TCR_4 with actual code, compiler complained while building debug build.
This patch moves 'cast to int' to the correct place.

Extension to Differential Revision: http://reviews.llvm.org/D19880

llvm-svn: 271377
2016-06-01 09:59:26 +00:00
Paul Osmialowski f7cc6affdb Use C++11 atomics for ticket locks implementation
This patch replaces use of compiler builtin atomics with
C++11 atomics for ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.

The main reason this change was introduced is a problem
with the work stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.
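
For readers unfamiliar with the technique, a textbook ticket lock built on
C++11 atomics looks roughly like this. It is a generic sketch, not the code
from kmp_lock.h; the class and member names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

class TicketLock {
  std::atomic<unsigned> next_ticket{0}; // next ticket to hand out
  std::atomic<unsigned> now_serving{0}; // ticket currently allowed in

public:
  void lock() {
    // Take a ticket, then spin until it is being served.
    const unsigned my = next_ticket.fetch_add(1, std::memory_order_relaxed);
    while (now_serving.load(std::memory_order_acquire) != my) {
      // busy-wait
    }
  }

  bool try_lock() {
    // The lock is free only when no ticket is outstanding, that is when
    // next_ticket equals now_serving. Claim that ticket with one CAS.
    const unsigned serving = now_serving.load(std::memory_order_acquire);
    unsigned expected = serving;
    return next_ticket.compare_exchange_strong(expected, serving + 1,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
  }

  void unlock() {
    // Only the holder writes now_serving, so a load plus store is enough.
    // The release store publishes the critical section to the next waiter.
    const unsigned cur = now_serving.load(std::memory_order_relaxed);
    now_serving.store(cur + 1, std::memory_order_release);
  }
};
```

The explicit acquire and release orderings are the relevant part on weakly
ordered architectures such as ARM: they prevent critical-section accesses
from being reordered across the lock and unlock operations.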

Two assertions were added into kmp_tasking.c which are useful
for detecting early symptoms of something wrong going on with
work stealing, which were among the possible outcomes of the
race condition.

Differential Revision: http://reviews.llvm.org/D19878

llvm-svn: 271324
2016-05-31 20:20:32 +00:00
Jonathan Peyton ef7347994e Addition of OpenMP 4.5 feature: schedule(simd:static)
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.

Patch by Alex Duran.

Differential Revision: http://reviews.llvm.org/D20699

llvm-svn: 271320
2016-05-31 19:12:18 +00:00
Jonathan Peyton f4f969569d Avoid deadlock with COI
When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task".  When the task deques are full, a dead-lock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.

This patch implements the solution where the task deques are doubled in size
when a task is being queued from a COI thread.
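
The growth policy can be illustrated with a small ring buffer. This is a
single-threaded sketch with hypothetical names (TaskDeque, allow_grow); the
real deques hold task pointers and are protected by locks, which are omitted
here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class TaskDeque {
  std::vector<int> slots; // ring buffer storage (ints stand in for tasks)
  std::size_t head = 0;   // index of the oldest element
  std::size_t count = 0;  // number of queued elements

public:
  explicit TaskDeque(std::size_t capacity) : slots(capacity) {
    assert(capacity > 0);
  }
  std::size_t capacity() const { return slots.size(); }
  std::size_t size() const { return count; }

  // allow_grow models "the caller is a COI thread": instead of failing on
  // a full deque, double the storage so the caller never has to wait.
  bool push(int task, bool allow_grow) {
    if (count == slots.size()) {
      if (!allow_grow)
        return false;
      std::vector<int> bigger(slots.size() * 2);
      for (std::size_t i = 0; i < count; ++i)
        bigger[i] = slots[(head + i) % slots.size()];
      slots.swap(bigger);
      head = 0;
    }
    slots[(head + count) % slots.size()] = task;
    ++count;
    return true;
  }

  bool pop_front(int &task) {
    if (count == 0)
      return false;
    task = slots[head];
    head = (head + 1) % slots.size();
    --count;
    return true;
  }
};
```

An ordinary push on a full deque still reports failure; only the growing
variant is guaranteed to find a slot.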

Differential Revision: http://reviews.llvm.org/D20733

llvm-svn: 271319
2016-05-31 19:07:00 +00:00
Jonathan Peyton 067325f935 Offer API for setting number of loop dispatch buffers
The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
7 built-in dispatch buffers, but there is a need for dozens or hundreds of
buffers.

The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order
to give users the same possibility I changed the build-time control into a
run-time one, adding an API just in case.

This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).

The KMP_DISP_NUM_BUFFERS environment variable works only before serial
initialization, because during the serial initialization we already allocate
buffers for the hot team, so it is too late to change the number of buffers
later (or we would need to reallocate buffers for all teams, which sounds too
complicated). The kmp_set_defaults() routine does not work for this environment
variable, because it calls serial initialization before reading the parameter
string. So a new routine, kmp_set_disp_num_buffers(), is created so that it can
set our internal global variable before the library initialization. If both the
environment variable and the API are used, the environment variable wins.
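
The precedence rule can be sketched as a small pure function.
resolve_disp_num_buffers is a hypothetical name, and treating non-positive or
unparsable values as "not set" is an assumption; only the default of 7 and
the precedence order come from the text above.

```cpp
#include <cassert>
#include <cstdlib> // atoi

constexpr int kDefaultDispBuffers = 7; // built-in count named above

// env_value: contents of KMP_DISP_NUM_BUFFERS, or nullptr if unset.
// api_value: value passed to kmp_set_disp_num_buffers(), or 0 if not called.
inline int resolve_disp_num_buffers(const char *env_value, int api_value) {
  assert(api_value >= 0);
  if (env_value != nullptr) {
    const int n = std::atoi(env_value);
    if (n > 0)
      return n; // the environment variable takes precedence
  }
  if (api_value > 0)
    return api_value; // otherwise the API setting applies
  return kDefaultDispBuffers;
}
```

In an application the API call would have to come before the first OpenMP
construct, since the buffers are allocated during serial initialization.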

Differential Revision: http://reviews.llvm.org/D20697

llvm-svn: 271318
2016-05-31 19:01:15 +00:00
Hal Finkel 49bee007d0 Fix storing the frame pointer for OMP-T during ppc64 microtask dispatch
Thanks to John Mellor-Crummey for reporting the omission.

llvm-svn: 271035
2016-05-27 19:04:05 +00:00
Jonathan Peyton 50eae7f8b2 Add missing OpenMP 4.5 device entries to stubs library.
llvm-svn: 271006
2016-05-27 15:51:14 +00:00
Jonathan Peyton 7ba9baef6d Fix for OMP_PROC_BIND=spread strategy
The OMP_PROC_BIND=spread strategy fails to assign the master thread the
correct place partition after the first parallel region. Other threads in the
hot team will remember their place_partition, but the master's place partition
is restored to what it was before entering the parallel region. So when the hot
team is used for subsequent parallel regions, the master has lost this info.
This fix calls __kmp_partition_places to update only the master thread's place
partition in the spread case when there are no other changes to the hot team.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20539

llvm-svn: 270890
2016-05-26 19:09:46 +00:00
Jonathan Peyton 7abf9d5927 Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled
On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.

Patch by John Mellor-Crummey

Differential Revision: http://reviews.llvm.org/D20517

llvm-svn: 270884
2016-05-26 18:19:10 +00:00
Hal Finkel 0a665a83da Add a test case for microtask dispatch with many arguments
This is a cleaned-up version of the test case posted in the D19879 review.

llvm-svn: 270867
2016-05-26 16:34:05 +00:00