Commit Graph

370 Commits

Author SHA1 Message Date
Jonathan Peyton e1890e12f0 Bug fix for Bugzilla bug 26602: Remove function bodies with KMP_ASSERT(0)
Fix for bugzilla https://llvm.org/bugs/show_bug.cgi?id=26602.  Removed functions
whose bodies consisted only of a KMP_ASSERT(0) statement.  Thus a possible runtime
crash is converted to a compile-time error, which looks preferable (faster possible
error detection).

TODO: consider C++11 static assert as an alternative, that could
make the diagnostics better.

Patch by Andrey Churbanov

Differential Revision: http://reviews.llvm.org/D21304

llvm-svn: 272590
2016-06-13 21:33:30 +00:00
Jonathan Peyton c5304aa3c4 Affinity mask processing improvements
Remove static specifier from var fullMask and remove kmp_get_fullMask() routine.
When iterating through procs in a mask, always check if proc is in fullMask
(this check was missing in a few places).

Patch by Brian Bliss.

Differential Revision: http://reviews.llvm.org/D21300

llvm-svn: 272589
2016-06-13 21:28:03 +00:00
Jonathan Peyton 8cb45c838f Exclude untied tasks from task stealing constraint
If either current_task or new_task is untied then skip task scheduling
constraint checks, because untied tasks are not affected by the task
scheduling constraints.

Differential Revision: http://reviews.llvm.org/D21196

llvm-svn: 272570
2016-06-13 17:51:59 +00:00
Jonathan Peyton 93495de265 Fix crash when libomp loaded/unloaded multiple times
The problem scenario is the following:
A dynamic library, libfoo.so, depends on libomp.so (it creates a parallel region
and calls some omp functions).  An application has a loop where it dynamically
loads libfoo.so, calls a function from it, and unloads libfoo.so.  After several
loop iterations the application crashes with a message about lack of resources:
OMP: Error #34: System unable to allocate necessary resources for OMP thread:

The problem is that pthread_kill() was not followed by pthread_join() for a
terminated thread. This patch fixes this problem for both worker and monitor
threads.

Differential Revision: http://reviews.llvm.org/D21200

llvm-svn: 272567
2016-06-13 17:36:40 +00:00
Jonathan Peyton 202a24dd9b Hwloc refactoring patch
These changes remove the hwloc_topology_ignore_type function which doesn't exist
in the hwloc 2.0 API. In the existing code, the topology extracted from hwloc
has the cache levels stripped out and then assumes the final stripped topology
follows the typical three-level topology: packages -> cores -> HW threads.
But the code is doing unclean manipulations to determine at what level those
resources are located and also assumes too much about what hwloc is detecting
(there could be intermediate levels in between socket and core for instance).
This new way of extracting the topology doesn't strip out any hardware objects
that hwloc detects. It does not assume the three level topology, and instead
searches for the relevant three levels within the topology for each bit of
information using hwloc interface functions. i.e., the three level topology
subset that our affinity code is interested in is extracted from the hwloc
topology tree directly.

For example, the new __kmp_hwloc_get_nobjs_under_obj function gives the user the
number of cores under a socket reliably without worrying if there are unexpected
objects between the socket object and core object in the hwloc topology
structure. Also, now that all topology information is kept, there are also
possibilities of using the caches/numa nodes to determine more sophisticated
affinity settings in the future.

There is also some cleanup code added for the destruction of the
__kmp_hwloc_topology object.

Differential Revision: http://reviews.llvm.org/D21195

llvm-svn: 272565
2016-06-13 17:30:08 +00:00
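
The idea behind counting cores under a socket without assuming a fixed depth, as described in the "Hwloc refactoring patch" entry above, can be sketched as a recursive walk over a topology tree. This is only an illustration: `ObjType`, `TopoObj`, and `count_objs_under` are hypothetical names, and the code does not use the hwloc API or reproduce `__kmp_hwloc_get_nobjs_under_obj`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object kinds; a real topology may contain more levels.
enum class ObjType { Machine, Package, Cache, Core, HwThread };

// Hypothetical tree node standing in for a topology object.
struct TopoObj {
    ObjType type;
    std::vector<TopoObj> children;
};

// Count the objects of kind `wanted` below `root`, no matter how many
// unexpected intermediate levels (caches, NUMA nodes, ...) sit in between.
// The walk does not descend further once it finds a matching object.
inline int count_objs_under(const TopoObj& root, ObjType wanted) {
    int n = 0;
    for (const TopoObj& child : root.children) {
        if (child.type == wanted) {
            ++n;
        } else {
            n += count_objs_under(child, wanted);
        }
    }
    return n;
}
```

With a package that holds two caches, each holding two cores, `count_objs_under(package, ObjType::Core)` yields 4 even though the cores are not direct children of the package.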
Jonathan Peyton 34c72c4773 Fix bitmask complement operation
The bitmask complement operation doesn't consider the max proc id which means
something like !{0} will be translated to {1,2,3,4,...,600,601,...,1023} on a
Linux system even though there aren't 600 processors on said system. This
change has the complement bitmask and-ed with the fullmask so that it will only
contain valid processors.

Differential Revision: http://reviews.llvm.org/D21245

llvm-svn: 272561
2016-06-13 17:01:26 +00:00
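
The "Fix bitmask complement operation" entry above amounts to and-ing the complement with the full mask. A minimal sketch, assuming a fixed 1024-bit mask (the width mentioned in the message) and using the hypothetical names `ProcMask`, `make_full_mask`, and `complement_within`, which are not taken from the runtime:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed mask width; 1024 mirrors the example in the commit message.
constexpr std::size_t kMaxProcs = 1024;
using ProcMask = std::bitset<kMaxProcs>;

// Build a mask with bits 0..num_procs-1 set (stand-in for the full mask).
inline ProcMask make_full_mask(std::size_t num_procs) {
    assert(num_procs <= kMaxProcs);
    ProcMask full;
    for (std::size_t i = 0; i < num_procs; ++i) {
        full.set(i);
    }
    return full;
}

// Complement of `mask`, restricted to processors that actually exist.
inline ProcMask complement_within(const ProcMask& mask, const ProcMask& full) {
    return ~mask & full;
}
```

On a hypothetical 8-processor system, the plain complement of {0} has 1023 bits set, while `complement_within` returns only {1,...,7}.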
Jonathan Peyton 5a299da55d [STATS] Add stats gathering for taskloop construct
llvm-svn: 272560
2016-06-13 16:56:41 +00:00
Jonathan Peyton b6f0f521f5 Fix spelling in comment
llvm-svn: 272291
2016-06-09 18:51:17 +00:00
Jonathan Peyton 61fdddfd64 Revert accidental commit to lit.cfg
llvm-svn: 272287
2016-06-09 18:29:36 +00:00
Jonathan Peyton c4c722ac0d Refactor __kmp_execute_tasks_template function
Refactored __kmp_execute_tasks_template to shorten and remove code redundancy.
The original code for __kmp_execute_tasks_template was very redundant with
large sections of repeated code that needed to be kept consistent, and goto
statements that made the control flow difficult to discern. This refactoring
removes all gotos and redundancy.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20879

llvm-svn: 272286
2016-06-09 18:27:03 +00:00
Hans Wennborg 5b89fbc822 kmp_lock.h: Fix VS2013 build after r271324
MSVC doesn't allow std::atomic<>s in a union since they don't have a trivial
copy constructor. Replacing them with e.g. std::atomic_int works, but that
breaks the GCC build on Linux, because then calls to e.g. std::atomic_load_explicit
fail, as they expect a real std::atomic<> pointer.

Fixing this with an #ifdef to unbreak the build for now.

llvm-svn: 272271
2016-06-09 15:54:43 +00:00
Paul Osmialowski 9cc353e2b3 Fine tuning of TC* macros - small followup
As I replaced the no-op TCR_4 with actual code, the compiler complained while building the debug build.
This patch moves 'cast to int' to the correct place.

Extension to Differential Revision: http://reviews.llvm.org/D19880

llvm-svn: 271377
2016-06-01 09:59:26 +00:00
Paul Osmialowski f7cc6affdb Use C++11 atomics for ticket locks implementation
This patch replaces use of compiler builtin atomics with
C++11 atomics for ticket locks implementation. Ticket locks
are used in critical places of the runtime, e.g. in the tasking
mechanism.

The main reason this change was introduced is a problem
with the work stealing function on the ARM architecture, which suffered
from a nasty race condition. It turned out that the root cause of
the problem lies in the way ticket locks are implemented. Changing
compiler builtins into C++11 atomics solves the problem.

Two assertions were added into kmp_tasking.c which are useful
for detecting early symptoms of something wrong going on with
work stealing, which were among the possible outcomes of the
race condition.

Differential Revision: http://reviews.llvm.org/D19878

llvm-svn: 271324
2016-05-31 20:20:32 +00:00
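
For readers unfamiliar with the technique named in the "Use C++11 atomics for ticket locks implementation" entry above, here is a minimal ticket lock written with `std::atomic`. It is a generic textbook sketch, not the code in kmp_lock.h; the class name, member names, and the chosen memory orderings are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Generic ticket lock: each locker takes a ticket and waits until the
// "now serving" counter reaches it, which gives FIFO ordering.
class TicketLock {
public:
    void lock() {
        // Taking a ticket needs atomicity only; ordering comes from the wait.
        const unsigned my_ticket =
            next_ticket_.fetch_add(1, std::memory_order_relaxed);
        // Acquire pairs with the release in unlock(), so writes made inside
        // the previous critical section are visible here.
        while (now_serving_.load(std::memory_order_acquire) != my_ticket) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        assert(is_locked());  // unlocking a lock nobody holds is a bug
        now_serving_.fetch_add(1, std::memory_order_release);
    }

    // True while some ticket has been issued but not yet released.
    bool is_locked() const {
        return next_ticket_.load(std::memory_order_relaxed) !=
               now_serving_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<unsigned> next_ticket_{0};
    std::atomic<unsigned> now_serving_{0};
};
```

Because the class provides `lock()` and `unlock()`, it can be used with `std::lock_guard<TicketLock>`.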
Jonathan Peyton ef7347994e Addition of OpenMP 4.5 feature: schedule(simd:static)
This patch implements the new kmp_sch_static_balanced_chunked schedule kind that
the compiler will generate when it encounters schedule(simd: static). It just
adds the new constant and the new switch case in __kmp_for_static_init.

Patch by Alex Duran.

Differential Revision: http://reviews.llvm.org/D20699

llvm-svn: 271320
2016-05-31 19:12:18 +00:00
Jonathan Peyton f4f969569d Avoid deadlock with COI
When an asynchronous offload task is completed, COI calls the runtime to queue
a "destructor task".  When the task deques are full, a dead-lock situation
arises where the OpenMP threads are inside but cannot progress because the COI
thread is stuck inside the runtime trying to find a slot in a deque.

This patch implements a solution where the task deques are doubled in size when
a task is being queued from a COI thread.

Differential Revision: http://reviews.llvm.org/D20733

llvm-svn: 271319
2016-05-31 19:07:00 +00:00
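
The deque-doubling approach from the "Avoid deadlock with COI" entry above can be illustrated with a small ring buffer. This is a sketch under assumptions: `TaskDeque`, its members, and the `may_grow` flag are hypothetical, tasks are reduced to integer ids, and the behaviour of a normal push into a full deque (reporting failure) is a simplification rather than the runtime's actual policy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring buffer standing in for a task deque.
struct TaskDeque {
    std::vector<int> slots;  // task ids; capacity is slots.size()
    std::size_t head = 0;    // index of the oldest queued task
    std::size_t count = 0;   // number of queued tasks

    explicit TaskDeque(std::size_t capacity) : slots(capacity) {
        assert(capacity > 0);
    }

    bool full() const { return count == slots.size(); }

    // Double the capacity, copying tasks so their order is preserved.
    void grow() {
        std::vector<int> bigger(slots.size() * 2);
        for (std::size_t i = 0; i < count; ++i) {
            bigger[i] = slots[(head + i) % slots.size()];
        }
        slots.swap(bigger);
        head = 0;
    }

    // Returns false when the deque is full and growing is not allowed.
    // A caller that must not block (the COI thread in the commit message)
    // would pass may_grow = true.
    bool push(int task, bool may_grow) {
        if (full()) {
            if (!may_grow) {
                return false;
            }
            grow();
        }
        slots[(head + count) % slots.size()] = task;
        ++count;
        return true;
    }

    int pop_front() {
        assert(count > 0);
        const int task = slots[head];
        head = (head + 1) % slots.size();
        --count;
        return task;
    }
};
```

Growing only on the special path keeps the common case allocation-free while guaranteeing that the thread which cannot wait always finds a slot.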
Jonathan Peyton 067325f935 Offer API for setting number of loop dispatch buffers
The problem is the lack of dispatch buffers when thousands of loops with nowait,
about 10 iterations each, are executed by hundreds of threads. We only have
7 built-in dispatch buffers, but there is a need for dozens or hundreds of
buffers.

The problem can be fixed by setting KMP_MAX_DISP_BUF to a bigger value. In order
to give users the same possibility I changed the build-time control into a
run-time one, adding an API just in case.

This change adds an environment variable KMP_DISP_NUM_BUFFERS and a new API
function kmp_set_disp_num_buffers(int num_buffers).

The KMP_DISP_NUM_BUFFERS envirable works only before serial initialization,
because during the serial initialization we already allocate buffers for the hot
team, so it is too late to change the number of buffers later (or we need to
reallocate buffers for all teams which sounds too complicated). The
kmp_set_defaults() routine does not work for this envirable, because it calls
serial initialization before reading the parameter string. So a new routine,
kmp_set_disp_num_buffers(), is created so that it can set our internal global
variable before the library initialization. If both the envirable and the API are
used, the envirable wins.

Differential Revision: http://reviews.llvm.org/D20697

llvm-svn: 271318
2016-05-31 19:01:15 +00:00
Hal Finkel 49bee007d0 Fix storing the frame pointer for OMP-T during ppc64 microtask dispatch
Thanks to John Mellor-Crummey for reporting the omission.

llvm-svn: 271035
2016-05-27 19:04:05 +00:00
Jonathan Peyton 50eae7f8b2 Add missing OpenMP 4.5 device entries to stubs library.
llvm-svn: 271006
2016-05-27 15:51:14 +00:00
Jonathan Peyton 7ba9baef6d Fix for OMP_PROC_BIND=spread strategy
The OMP_PROC_BIND=spread strategy fails to assign the master thread the
correct place partition after the first parallel region. Other threads in the
hot team will remember their place_partition, but the master's place partition
is restored to what it was before entering the parallel region. So when the hot
team is used for subsequent parallel regions, the master has lost this info.
This fix calls __kmp_partition_places to update only the master thread's place
partition in the spread case when there are no other changes to the hot team.

Patch by Terry Wilmarth

Differential Revision: http://reviews.llvm.org/D20539

llvm-svn: 270890
2016-05-26 19:09:46 +00:00
Jonathan Peyton 7abf9d5927 Make LIBOMP_USE_ITT_NOTIFY a setting that can be enabled or disabled
On Blue Gene/Q, having LIBOMP_USE_ITT_NOTIFY support compiled into a
statically-linked binary causes a failure at runtime because dlopen fails.
This patch changes LIBOMP_USE_ITT_NOTIFY to a cacheable configuration setting
that can be disabled.

Patch by John Mellor-Crummey

Differential Revision: http://reviews.llvm.org/D20517

llvm-svn: 270884
2016-05-26 18:19:10 +00:00
Hal Finkel 0a665a83da Add a test case for microtask dispatch with many arguments
This is a cleaned-up version of the test case posted in the D19879 review.

llvm-svn: 270867
2016-05-26 16:34:05 +00:00
Hal Finkel 91e19a3de4 Add an assembly __kmp_invoke_microtask for ppc64[le]
Clang no longer restricts itself to generating microtasks with a small number
of arguments, and so an assembly implementation is required to prevent hitting
the parameter limit present in the C implementation. This adds an
implementation for ppc64[le].

llvm-svn: 270821
2016-05-26 04:48:14 +00:00
Andrey Churbanov 2fd1654278 D20525: Use more general function for getting gtid which may be faster than specific one.
llvm-svn: 270694
2016-05-25 12:53:17 +00:00
Jonathan Peyton b044e4fa31 Fork performance improvements
Most of this is modifications to check for differences before updating data
fields in team struct. There is also some rearrangement of the team struct.

Patch by Diego Caballero

Differential Revision: http://reviews.llvm.org/D20487

llvm-svn: 270468
2016-05-23 18:01:19 +00:00
Jonathan Peyton 1ab887d403 Allow unit testing on Windows
These changes allow testing on Windows using clang.exe.
There are two main changes:
1. Only link to -lm when it actually exists on the system
2. Create basic versions of pthread_create() and pthread_join() for windows.
   They are not POSIX compliant by any stretch but will allow any existing
   and future tests to use pthread_create() and pthread_join() for testing
   interactions of libomp with os threads.

Differential Revision: http://reviews.llvm.org/D20391

llvm-svn: 270464
2016-05-23 17:50:32 +00:00
Jonathan Peyton b2b6d4e2e1 Changed parameter names in Fortran modules to correspond with OpenMP 4.5 specification
llvm-svn: 270447
2016-05-23 16:24:39 +00:00
Jonathan Peyton 611184919f Remove trailing whitespace in src/ directory
This patch doesn't affect D19878's context.  So D19878 still cleanly applies.

llvm-svn: 270252
2016-05-20 19:03:38 +00:00
Jonathan Peyton aa7d2d781b Remove unnecessary unistd.h header from tests.
llvm-svn: 269987
2016-05-18 21:36:34 +00:00
Jonathan Peyton 096ccdd389 Remove trailing whitespace in files in doc/ directory
llvm-svn: 269842
2016-05-17 21:12:48 +00:00
Jonathan Peyton 3731076997 Remove trailing whitespace from tests
llvm-svn: 269841
2016-05-17 21:08:52 +00:00
Jonathan Peyton 0c3a85a327 Remove trailing whitespace in files in tools/ directory
llvm-svn: 269837
2016-05-17 20:54:10 +00:00
Jonathan Peyton 975dabc96e Remove trailing whitespace in CMake files
llvm-svn: 269836
2016-05-17 20:51:24 +00:00
Jonathan Peyton 924a6627ea Remove trailing whitespace in READMEs, CREDITS.txt and index.html
llvm-svn: 269835
2016-05-17 20:48:42 +00:00
Jonathan Peyton 0e8f053023 [OpenMP Testing] Have lit.py be a valid lit executable
Users can use either llvm-lit (generated during llvm build) or lit.py which
exists in llvm/utils/lit.

llvm-svn: 269774
2016-05-17 15:12:11 +00:00
Paul Osmialowski fb043fdfff Clean all the mess around KMP_USE_FUTEX and kmp_lock.h
The KMP_USE_FUTEX preprocessor definition defined in kmp_lock.h is used
inconsistently throughout the LLVM libomp code.

* some .c files that use this define do not include the kmp_lock.h file,
  so the guarded parts of the code are never compiled
* some places in the code use architecture-dependent preprocessor
  logic expressions which effectively disable use of Futex for the
  AArch64 architecture; all these places should use
  '#if KMP_USE_FUTEX' instead to avoid any further confusion
* some places use KMP_HAS_FUTEX, which is defined nowhere;
  KMP_USE_FUTEX should be used instead

Differential Revision: http://reviews.llvm.org/D19629

llvm-svn: 269642
2016-05-16 09:44:11 +00:00
Paul Osmialowski 97ae10c67c NFC fix indent (relates to my previous commit)
llvm-svn: 269443
2016-05-13 17:45:49 +00:00
Paul Osmialowski 7e5e8684fb Solve 'Too many args to microtask' problem
This patch solves 'Too many args to microtask' problem which occurs
while executing lulesh2.0.3 benchmark on AArch64.

To solve this I had to write an AArch64 assembly version of the
__kmp_invoke_microtask() function, similar to x86 and x86_64
implementations.

Differential Revision: http://reviews.llvm.org/D19879

llvm-svn: 269399
2016-05-13 08:26:42 +00:00
Jonathan Peyton f83ae31caf Adding new kmp_aligned_malloc() entry point
This change adds a new entry point,
kmp_aligned_malloc(size_t size, size_t alignment), an entry point corresponding
to kmp_malloc() but with the capability to return aligned memory as well.
Other allocator routines have been adjusted so that kmp_free() can be used for
freeing memory blocks allocated by any kmp_*alloc() routine, including the new
kmp_aligned_malloc() routine.

Differential Revision: http://reviews.llvm.org/D19814

llvm-svn: 269365
2016-05-12 22:00:37 +00:00
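
One common way to let a single free routine release both aligned and unaligned blocks, as the "Adding new kmp_aligned_malloc() entry point" entry above requires, is to stash the raw pointer just before the block handed to the caller. The sketch below shows that general technique only; `sketch_aligned_malloc`, `sketch_malloc`, and `sketch_free` are hypothetical names, and nothing here is claimed to match how the runtime's allocator is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Over-allocate, round up to the requested alignment, and store the raw
// malloc() pointer in the word immediately before the returned address.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment) {
    // The alignment must be a nonzero power of two.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (alignment < sizeof(void*)) {
        alignment = sizeof(void*);  // keep the stashed pointer itself aligned
    }
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) {
        return nullptr;
    }
    const std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    const std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// The unaligned allocator goes through the same path, so every block
// carries the same hidden header.
inline void* sketch_malloc(std::size_t size) {
    return sketch_aligned_malloc(size, sizeof(void*));
}

// Works for blocks from either allocator above; null is ignored.
inline void sketch_free(void* ptr) {
    if (ptr != nullptr) {
        std::free(static_cast<void**>(ptr)[-1]);
    }
}
```

The cost of this design is one pointer plus up to `alignment` bytes of padding per block, in exchange for a free routine that needs no knowledge of how the block was requested.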
Jonathan Peyton 2b749b33cc Fix team reuse with foreign threads
After hot teams were enabled by default, the library started using levels kept
in the team structure. The levels are broken when a foreign thread exits and
puts its team into the pool, which is then re-used by another foreign thread.
The broken behavior observed is when printing the levels for each new team, one
gets 1, 2, 1, 2, 1, 2, etc. This makes the library believe that every other
team is nested which is incorrect. What is wanted is for the levels to be
1, 1, 1, etc.

Differential Revision: http://reviews.llvm.org/D19980

llvm-svn: 269363
2016-05-12 21:54:30 +00:00
Paul Osmialowski 562a3c2b66 New hwloc API compatibility
Differential Revision: http://reviews.llvm.org/D19628

llvm-svn: 269284
2016-05-12 11:46:40 +00:00
Hal Finkel 55acbf8877 Restore NULL flag check in __kmp_null_resume_wrapper
This reverts a presumably unintentional change in:

  r268640 - [STATS] Use partitioned timer scheme

and fixes segfaults in an x86_64 debug build of the runtime library.

llvm-svn: 269259
2016-05-12 00:54:08 +00:00
Paul Osmialowski 52bef53f86 Fine tuning of TC* macros
This patch introduces the following:
* TCI_* and TCD_* macros for incrementation and decrementation
* Fix for invalid use of TCR_8 in one expression

Differential Revision: http://reviews.llvm.org/D19880

llvm-svn: 268826
2016-05-07 00:00:00 +00:00
Jonathan Peyton 11dc82fa83 [STATS] Use partitioned timer scheme
This change replaces the current timers with ones that partition time properly.
The current timers are nested, so that if a new timer, B, starts when the
current timer, A, is already timing, A's time will include B's. To eliminate
this problem, the partitioned timers are designed to stop the current timer (A),
let the new timer run (B), and when the new timer is finished, restart the
previously running timer (A). With this partitioning of time, a thread's timers
all sum up to the OMP_worker_thread_life time and can now easily show the
percentage of time a thread is spending in different parts of the runtime or
user code.

There is also a new state variable associated with each thread which tells where
it is executing a task. This corresponds with the timers: OMP_task_*, e.g., if
time is spent in OMP_task_taskwait, then that thread executed tasks inside a
#pragma omp taskwait construct.

The changes are mostly changing the MACROs to use the new PARTITIONED_* macros,
the new partitionedTimers class and its methods, and new state logic.

Differential Revision: http://reviews.llvm.org/D19229

llvm-svn: 268640
2016-05-05 16:15:57 +00:00
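
The stop/run/restart behaviour described in the "[STATS] Use partitioned timer scheme" entry above can be modelled with a stack of timer names. The sketch below is illustrative only: `PartitionedTimers` and its methods are hypothetical and are not the runtime's partitionedTimers class, and timestamps are passed in explicitly instead of being read from a clock so that the arithmetic is easy to follow.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Only the timer on top of the stack accumulates time, so the totals of all
// timers add up to the elapsed time instead of double counting nested work.
class PartitionedTimers {
public:
    // Start `name` at time `now`, pausing whichever timer was running.
    void push(const std::string& name, long now) {
        if (!stack_.empty()) {
            totals_[stack_.back()] += now - last_start_;
        }
        stack_.push_back(name);
        last_start_ = now;
    }

    // Stop the running timer at time `now` and resume the one beneath it.
    void pop(long now) {
        assert(!stack_.empty());
        totals_[stack_.back()] += now - last_start_;
        stack_.pop_back();
        last_start_ = now;
    }

    // Accumulated time for `name`, or 0 if it never ran.
    long total(const std::string& name) const {
        const auto it = totals_.find(name);
        return it == totals_.end() ? 0 : it->second;
    }

private:
    std::vector<std::string> stack_;      // running timer is stack_.back()
    std::map<std::string, long> totals_;  // accumulated time per timer
    long last_start_ = 0;                 // when the running timer last resumed
};
```

If A starts at 0, B starts at 10 and stops at 25, and A stops at 40, then A totals 25 and B totals 15, which together account for the full 40 units. With nested timers A would have reported 40.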
Paul Osmialowski fedce46bbd NFC remove unneeded spaces (test commit)
llvm-svn: 268462
2016-05-03 23:10:20 +00:00
Jonathan Peyton 8407f5b3bd Remove architecture dependent Hwloc DEBUG section
This debug section's functionality can be replicated using the environment
variable KMP_TOPOLOGY_METHOD with different values and KMP_AFFINITY=verbose

llvm-svn: 267472
2016-04-25 21:11:26 +00:00
Jonathan Peyton 1d5487c5d0 Fix buffer problem with printing long Hwloc affinity mask
This change has the hwloc_bitmap_list_snprintf() function use the entire buffer
to print the mask.  There is no need to shorten the buffer length by 7.  It only
needs to be shortened by one byte.

llvm-svn: 267470
2016-04-25 21:08:31 +00:00
Jonathan Peyton a1202bf594 [ITTNOTIFY] Remove serialized parallel regions from frame notification
llvm-svn: 266760
2016-04-19 16:55:17 +00:00
Jonathan Peyton 5235a1b603 Fix trip count calculation for parallel loops in runtime
The trip count calculation was incorrect for loops with large bounds. For example,
for(int i=-2000000000; i < 2000000000; i+=50000000), the trip count
calculation had overflow (trying to calculate 2,000,000,000 + 2,000,000,000 with
signed integers) and wasn't giving the right value. This patch fixes this error
in the runtime by using unsigned integers instead. There is still a bug in the
clang compiler component because it warns that there is overflow in the
test case file when there isn't. This error isn't there for the Intel Compiler.
So for now, the test case is designated as XFAIL.

Differential Revision: http://reviews.llvm.org/D19078

llvm-svn: 266677
2016-04-18 21:38:29 +00:00
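
The overflow described in the "Fix trip count calculation for parallel loops in runtime" entry above, and the unsigned-arithmetic remedy, can be sketched as follows. The function name `trip_count` and the convention of an inclusive upper bound with a positive stride are assumptions made for this illustration; the runtime's own code handles more cases.

```cpp
#include <cassert>
#include <cstdint>

// Trip count of `for (i = lb; i <= ub; i += st)` with st > 0 and lb <= ub.
// Computing ub - lb in signed 32-bit arithmetic can overflow when the bounds
// are far apart. Converting to unsigned first makes the subtraction wrap
// modulo 2^32, and since the true distance is below 2^32 the result is exact.
inline std::uint32_t trip_count(std::int32_t lb, std::int32_t ub,
                                std::int32_t st) {
    assert(st > 0 && lb <= ub);
    const std::uint32_t span =
        static_cast<std::uint32_t>(ub) - static_cast<std::uint32_t>(lb);
    return span / static_cast<std::uint32_t>(st) + 1u;
}
```

For the loop in the message, the inclusive bounds are -2000000000 and 1999999999 with a stride of 50000000, and the function returns 80. One remaining limitation of this sketch: a stride of 1 across the entire 32-bit range has 2^32 iterations, which does not fit in the 32-bit return type.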
Jonathan Peyton e6643daa18 Runtime support for untied tasks
Introduced a counter of parts of an untied task submitted for execution. The
counter controls whether all parts of the task are already finished. The
compiler should itself generate re-submission of a partially executed untied
task before exiting each task part except for the lexically last part.

Differential Revision: http://reviews.llvm.org/D19026

llvm-svn: 266675
2016-04-18 21:35:14 +00:00
Jonathan Peyton f252010f69 Fix for pthread_setspecific (TLS and shutdown) problem
Some codes that use TLS fail intermittently because one thread tries to write
TLS values after the TLS key has been destroyed by another thread. This happens
when one thread executes library shutdown (and destroys TLS keys), while another
thread starts to execute the TLS key destructor routine. Before this change, the
kmp_init_runtime flag was checked before calling pthread_* TLS functions, but
this flag is set to FALSE later than the destruction of the TLS keys, which
leads to failure. The fix is to check kmp_init_gtid instead, as this flag is
unset *before* the destruction of TLS keys.

Differential Revision: http://reviews.llvm.org/D19022

llvm-svn: 266674
2016-04-18 21:33:01 +00:00