Make sure that OMPT is enabled in runtime entry points that access internals
of the runtime. Otherwise, return an appropriate value indicating an error or that
the data is not available.
Patch provided by @sconvent
Reviewers: jlpeyton, omalyshe, hbae, Hahnfeld, joachim.protze
Reviewed By: joachim.protze
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D47717
llvm-svn: 351311
Using proc_bind clause on a nested #pragma omp parallel region
with KMP_AFFINITY set causes an assertion error. This assertion occurs because
the place-partition-var is not properly initialized in the nested master threads.
Trying to get an intuitive result with KMP_AFFINITY + proc_bind is difficult
because of how the KMP_AFFINITY gtid-to-place mapping occurs. This
patch creates an initial place list no matter what affinity mechanism is used.
For KMP_AFFINITY, the place-partition-var is initialized to all the places.
Differential Revision: https://reviews.llvm.org/D55795
llvm-svn: 351227
The omp-tools.h file is generated from the OpenMP spec to ensure that the interface
is implemented as specified.
The other changes are necessary to update the interface implementation to the
final version as published in 5.0.
The omp-tools.h header was previously called ompt.h; currently a copy under this
name is installed for legacy tools.
Patch partially prepared by @sconvent
Reviewers: AndreyChurbanov, hbae, Hahnfeld
Reviewed By: hbae
Tags: #openmp, #ompt
Differential Revision: https://reviews.llvm.org/D55579
llvm-svn: 351197
Add omp_get_device_num() function for 5.0 which returns the number of the
device the current thread is running on. Currently, we are leaving it to the
compiler to handle this properly if it is called inside target.
Also, did some cleanup and updating of duplicate device API functions (in both
libomp and libomptarget) to make them into weak functions that check for the
symbol from libomptarget, and will call the version in libomptarget if it is
present. If any additional device API functions are implemented also in
libomptarget in the future, we should add the dlsym calls to the host functions.
Also, if the omp_target_* functions are to be implemented for the host (this has
been requested), they should attempt to call the libomptarget versions as well.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D55578
llvm-svn: 350352
Fix the newly-added tests to use %python substitution in order to use
the correct path to Python interpreter. Otherwise, they fail on NetBSD
where there is no 'python', just 'pythonX.Y'.
Differential Revision: https://reviews.llvm.org/D56048
llvm-svn: 350001
XFAIL two tests that fail on PowerPC LE Linux due
to the change of default from PIC to no-PIC on that
platform.
A bug has been opened for this:
https://bugs.llvm.org/show_bug.cgi?id=40082
The tests are:
runtime/test/ompt/misc/control_tool.c
runtime/test/ompt/synchronization/taskwait.c
llvm-svn: 349512
This patch updates the implementation of the ompt_frame_t, ompt_wait_id_t
and ompt_state_t. The final version of the OpenMP 5.0 spec added the "t"
for these types.
Furthermore, the structure of ompt_frame_t changed and now allows specifying
that the reenter frame belongs to the runtime.
Patch partially prepared by Simon Convent
Reviewers: hbae
llvm-svn: 349458
Summary:
I discovered this because I wanted to experiment with
building static libomp (with openmp-4.0 support only)
for debugging purposes.
There are three kinds of problems here:
1. `__kmp_compare_and_store_acq()` simply does not exist.
It was added in D47903 by @jlpeyton.
I'm guessing `__kmp_atomic_compare_store_acq()` was meant.
2. In `__kmp_is_ticket_lock_initialized()`,
`lck->lk.initialized` is `std::atomic<bool>`,
while `lck` is `kmp_ticket_lock_t *`.
Naturally, they can't be equality-compared.
Either it should return the value read from `lck->lk.initialized`,
or do what `__kmp_is_queuing_lock_initialized()` does:
compare the passed pointer with the field in the struct
it points to. I think the latter is the more correct choice here.
3. Tests were not versioned.
They assume that `LIBOMP_OMP_VERSION` is at the latest version.
This does not touch LIBOMP_OMP_VERSION=30. That is still broken.
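The second option in item 2 can be sketched in plain C11 as follows. This is an illustration only, not the runtime's code: the struct layout, the `self` field, and both function names are hypothetical, and the flag and pointer checks are combined here as an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified lock; the real structure has more fields. */
typedef struct ticket_lock ticket_lock_t;
struct ticket_lock {
  atomic_bool initialized;
  ticket_lock_t *self; /* set to the lock's own address on initialization */
};

static void ticket_lock_init(ticket_lock_t *lck) {
  lck->self = lck;
  atomic_store_explicit(&lck->initialized, true, memory_order_release);
}

/* Initialized means: the flag is set AND the stored pointer matches the
 * address that was passed in. A pointer is compared with a pointer, and
 * the atomic flag is read as a value, so no mixed-type comparison occurs. */
static bool ticket_lock_is_initialized(ticket_lock_t *lck) {
  return atomic_load_explicit(&lck->initialized, memory_order_acquire) &&
         lck->self == lck;
}
```

Comparing the stored pointer with the passed pointer also rejects a lock whose memory was merely copied from an initialized one.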
Reviewers: jlpeyton, Hahnfeld, AndreyChurbanov
Reviewed By: AndreyChurbanov
Subscribers: guansong, jfb, openmp-commits, jlpeyton
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D55496
llvm-svn: 349260
This patch adds the affinity format functionality introduced in OpenMP 5.0.
This patch adds two new environment variables:
OMP_DISPLAY_AFFINITY=TRUE|FALSE
OMP_AFFINITY_FORMAT=<string>
and four new API functions:
1) omp_set_affinity_format()
2) omp_get_affinity_format()
3) omp_display_affinity()
4) omp_capture_affinity()
The affinity format functionality has two ICVs associated with it:
affinity-display-var (bool) and affinity-format-var (string).
The affinity-display-var enables/disables the functionality through the
environment variable OMP_DISPLAY_AFFINITY. The affinity-format-var is a
formatted string with the special field types beginning with a '%' character,
similar to printf.
For example, the affinity-format-var could be:
"OMP: host:%H pid:%P OStid:%i num_threads:%N thread_num:%n affinity:{%A}"
The affinity-format-var is displayed by every thread implicitly at the beginning
of a parallel region when any thread's affinity has changed (including a brand
new thread being spawned), or explicitly using the omp_display_affinity() API.
The omp_capture_affinity() function can capture the affinity-format-var in a
char buffer. And omp_set|get_affinity_format() allow the user to set|get the
affinity-format-var explicitly at runtime. omp_capture_affinity() and
omp_get_affinity_format() both return the number of characters needed to hold
the entire string they tried to produce (not including the terminating null
character). If not enough buffer space is available, both functions truncate
their output.
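The return-value contract described above resembles snprintf. A minimal sketch in plain C, assuming the output is always null-terminated when the buffer has room for at least one character; `capture_truncating` is a hypothetical helper for illustration and not part of the runtime:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mimicking the described contract: copy as much of
 * `full` as fits into `buf` (null-terminated when size > 0) and return the
 * length of the complete string, excluding the terminator. */
static size_t capture_truncating(char *buf, size_t size, const char *full) {
  size_t needed = strlen(full);
  if (buf != NULL && size > 0) {
    size_t n = needed < size - 1 ? needed : size - 1;
    memcpy(buf, full, n);
    buf[n] = '\0';
  }
  return needed;
}
```

A caller can detect truncation by checking whether the returned length is greater than or equal to the buffer size, and can then retry with a buffer of the returned length plus one.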
Differential Revision: https://reviews.llvm.org/D55148
llvm-svn: 349089
Increase the range for omp_get_wtick() test to allow for 0.01
(from <0.01). This is needed for NetBSD where it returns exactly that
value due to CLOCKS_PER_SEC being 100. This should not cause
a significant difference from e.g. FreeBSD where it is 128,
and especially from Linux where CLOCKS_PER_SEC is apparently meaningless
and sysconf(_SC_CLK_TCK) gives 100 as well.
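The relaxed bound can be sketched as a small predicate. `wtick_in_range` is a hypothetical name and the positive lower bound is an assumption; only the inclusive upper bound of 0.01 comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical helper: accept a timer resolution that is positive and at
 * most 0.01 s. The inclusive upper bound admits a 100 Hz clock. */
static bool wtick_in_range(double tick) {
  return tick > 0.0 && tick <= 0.01;
}
```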
Differential Revision: https://reviews.llvm.org/D55493
llvm-svn: 348857
On NetBSD, alloca() is in stdlib.h and there is no alloca.h. Adjust
the includes appropriately.
Differential Revision: https://reviews.llvm.org/D55487
llvm-svn: 348856
Pass `-n -s` instead of `--numeric --stable` to sort(1), as long options
are not supported by NetBSD sort implementation. `-n` is defined
by POSIX, so it should be fully portable. `-s` is used consistently
at least in GNU sort and FreeBSD sort, and I honestly doubt it would
cause issues with any other implementation supporting `--stable`.
Differential Revision: https://reviews.llvm.org/D55479
llvm-svn: 348855
This change renames ompt_mutex_impl_unknown to ompt_mutex_impl_none,
following the name change in the specification.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D54347
llvm-svn: 347802
Some types and callback signatures have changed from TR6 to TR7.
Major changes (only adding signatures and stubs):
- remove idle callback (done by D48362)
- add reduction and dispatch callbacks
- add get_task_memory and finalize_tool runtime entry points
- ompt_invoker_t becomes ompt_parallel_flag_t
- more types of sync_regions
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D50774
llvm-svn: 341834
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the runtime uses
the default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran: pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
This is a follow-up to r341371: The new test for PR38704 doesn't
work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that
hasn't worked because the compiler features weren't known to lit.
llvm-svn: 341448
The idle callback was removed from the spec as of TR7.
This removes it from the implementation.
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D48362
llvm-svn: 339771
This patch adds a test using the doacross clauses in OpenMP and removes gcc from
testing kmp_doacross_check.c which is only testing the kmp rather than the
gomp interface.
Differential Revision: https://reviews.llvm.org/D50014
llvm-svn: 338757
Only supported since GCC 6 and Intel 17.0. However GCC 6.3.0 is
crashing on two of the tests, so disable them as well...
Differential Revision: https://reviews.llvm.org/D50085
llvm-svn: 338720
The taskloop testcase had scheduling effects. Tasks of the taskloop would
sometimes be scheduled before all tasks were created. The testing is now
split into two phases. First, the task creation on the master is tested,
then the scheduling events of the tasks are tested. Thus, the order of
creation and scheduling events is irrelevant.
Patch by Simon Convent
Reviewed by: protze.joachim, Hahnfeld
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D50140
llvm-svn: 338580
GCC 4.8.5 defaults to this old C standard. I think we should make the
tests pass a newer -std=c99|c11 but that's too intrusive for now...
Differential Revision: https://reviews.llvm.org/D50084
llvm-svn: 338490
From the bug report, the runtime needs to initialize the nproc variables
(inside middle init) for each root when the task is encountered, otherwise,
a segfault can occur.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36720
Differential Revision: https://reviews.llvm.org/D49996
llvm-svn: 338313
Fix the order of callbacks related to the taskloop construct.
Add the iteration_count to work callbacks (according to the spec).
Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks.
Add a testcase.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D47709
llvm-svn: 338146
The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now,
frame addresses are tested and two scheduling points are added at which the
task can switch to another thread. Due to scheduling effects, the frame address
could be NULL.
This needed a restructure of the way OMPT callbacks are called.
__ompt_task_finish() now has an extra parameter indicating whether a task is completed.
Its invocation has been moved into __kmp_task_finish(). Thus, the order of the
writes to the frame addresses is not subject to scheduling effects anymore.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49181
llvm-svn: 338145
Two more outputs are needed to match the return addresses when using the
Intel Compiler, as it generates more instructions between the fuzzy-printing
of the address and the runtime call.
Patch by Simon Convent
Reviewed By: protze.joachim, hbae
Differential Revision: https://reviews.llvm.org/D49373
llvm-svn: 338144
The initial commit said that the test passes with Intel Compiler,
so change XFAIL to only list clang and gcc.
Differential Revision: https://reviews.llvm.org/D49801
llvm-svn: 338051
The flag "--no-as-needed" is not recognized by the linker on macOS making the following tests fail:
ompt/loadtool/tool_available/tool_available.c
ompt/loadtool/tool_not_available/tool_not_available.c
This patch removes this flag for macOS and adds it only for Linux and Windows.
I tested it on Ubuntu 16.04 and macOS High Sierra, with Clang/LLVM 6.0.1 and OpenMP trunk.
This solution was also discussed in the OpenMP-dev mailing list.
Patch provided by Simone Atzeni
Differential Revision: https://reviews.llvm.org/D48888
llvm-svn: 336327
The testcase potentially fails when a thread is reused.
The added synchronization makes sure this does not happen.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48932
llvm-svn: 336326
When compiling with icc, there is a problem with reenter frame addresses in
parallel_begin callbacks in the interoperability.c testcase. (The address is
not available, thus NULL.)
Using alloca() forces availability of the frame pointer.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48282
llvm-svn: 336088
Several runtime entry points have not been tested from non-OpenMP threads. This
adds tests to an existing testcase. While at it, the testcase was reformatted.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48124
llvm-svn: 336087
Especially the thread_end callback has not been tested before.
This adds a testcase for nested and non-nested threads.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D47824
llvm-svn: 336086
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
Upcoming changes to FileCheck will modify CHECK-DAG to not match
overlapping regions of the input. This test was found to be affected
because it expects to find four threads to invoke events of type
ompt_event_implicit_task_begin. It turns out this is wrong because
OMP_THREAD_LIMIT is set to 2, so there are only two threads. The
rest of the test got it right so it went unnoticed until now.
(Rewrite test and apply clang-format to it as discussed in the past.)
Differential Revision: https://reviews.llvm.org/D47119
llvm-svn: 333361
The api_calls_misc.c testcase tests the following api calls:
ompt_get_callback()
ompt_get_state()
ompt_enumerate_states()
ompt_enumerate_mutex_impls()
These have not been tested previously.
The api_calls.c testcase has been renamed to api_calls_places.c because it only tests api calls that are related to places.
Differential Revision: https://reviews.llvm.org/D42523
llvm-svn: 331631
This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
version symbols. Being a wrapper around __kmpc_taskloop, the function
creates a task with the loop bounds properly nested in the shareds so that
the GOMP task thunk will work properly. Also, the firstprivate copy constructors
are properly handled using the __kmp_gomp_task_dup() auxiliary function.
Currently, only linear spawning of tasks is supported
for the GOMP_taskloop interface.
Differential Revision: https://reviews.llvm.org/D45327
llvm-svn: 330282
We have to ensure that the runtime is initialized _before_ waiting
for the two started threads to guarantee that the master threads
post their ompt_event_thread_begin before the worker threads. This
is not guaranteed in the parallel region where one worker thread
could start before the other master thread has invoked the callback.
The problem did not happen with Clang because the generated code
calls __kmpc_global_thread_num() and caches its result for functions
that contain OpenMP pragmas.
Differential Revision: https://reviews.llvm.org/D43882
llvm-svn: 326435
The thread_num parameter of ompt_get_task_info() was not being used previously,
but needs to be set.
The print_task_type() function (from the task-types.c testcase) was merged into
the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
added to the task-types.c testcase. It was not tested extensively previously.
Differential Revision: https://reviews.llvm.org/D42472
llvm-svn: 326338
The main change of this patch is to insert {{.*}} in current_address=[[RETURN_ADDRESS_END]].
This is needed to match any of the alternatively printed addresses.
Additionally, clang-format is applied to the two tests.
Differential Revision: https://reviews.llvm.org/D43115
llvm-svn: 326312
This is required to be NULL for implicit barriers at the end of a
parallel region. Noticed in review of D43191.
Differential Revision: https://reviews.llvm.org/D43308
llvm-svn: 325922
The compiler inlines the user code in the task. Check for that case at
runtime by comparing the frame addresses and print the expected exit
address.
Also showcase how I think the OMPT tests could be reformatted to match
LLVM's code style. In my opinion it would be great to apply that kind of change
to all tests that need to be touched for whatever reason...
Differential Revision: https://reviews.llvm.org/D43191
llvm-svn: 325921
Test whether OMPT-callbacks for two threads that initiate a parallel region are correct.
Differential Revision: https://reviews.llvm.org/D41942
llvm-svn: 325423
Only use ompt_ functions when testing OMPT in api_calls testcase.
Add size parameter to print_list.
Fix small bug in implementation of ompt_get_partition_place_nums(): return correct length.
Differential Revision: https://reviews.llvm.org/D42162
llvm-svn: 325422
This affects all outlined functions, not just tasks! Only show warning
when using Clang 5.0 or later.
Differential Revision: https://reviews.llvm.org/D43190
llvm-svn: 325131
Tests the search for tools as defined in the spec. The OMP_TOOL_LIBRARIES
environment variable contains paths to the following files (in that order):
- a nonexistent file
- a shared library that does not have an ompt_start_tool function
- a shared library that has an ompt_start_tool implementation returning NULL
- a shared library that has an ompt_start_tool implementation returning a
  pointer to a valid instance of ompt_start_tool_result_t
The expected result is that the last tool becomes active and can print in the
thread-begin callback.
Differential Revision: https://reviews.llvm.org/D42166
llvm-svn: 324588
Add a testcase that checks whether the runtime can handle an ompt_start_tool
method that returns NULL indicating that no tool shall be loaded.
All tool_available testcases need a separate folder to avoid file conflicts for
the generated tools.
Differential Revision: https://reviews.llvm.org/D41904
llvm-svn: 324587
Use fuzzy return addresses in lock testcases so that these
testcases can also be run using the Intel Compiler.
Patch by Simon Convent!
Differential Revision: https://reviews.llvm.org/D41896
llvm-svn: 323529
Add Workaround for Intel Compiler Bug with Case#: 03138964
A critical region within a nested task causes a segfault in icc 14-18:
    #include <stdio.h>

    int main()
    {
    #pragma omp parallel num_threads(2)
    #pragma omp master
    #pragma omp task
    #pragma omp task
    #pragma omp critical
      printf("test\n");
    }
When the critical region is in a separate function, the segfault does not occur.
So we add noinline to make sure that the function call stays there.
Differential Revision: https://reviews.llvm.org/D41182
llvm-svn: 322622
When the current thread is not an (initialized) OpenMP thread, the runtime
entry points return values that correspond to "not available" or similar.
Differential Revision: https://reviews.llvm.org/D41167
llvm-svn: 322620
As for normal task creation, the task frame addresses need to be stored
for the encountering task.
Differential Revision: https://reviews.llvm.org/D41165
llvm-svn: 321421
The compiler warns that _BSD_SOURCE is deprecated and _DEFAULT_SOURCE should
be used instead. We keep _BSD_SOURCE for older compilers that don't know
about _DEFAULT_SOURCE.
The linker drops the tool when linking, since there is no visible need for
the library. So we need to tell the linker that the tool should be linked
anyway.
Differential Revision: https://reviews.llvm.org/D41499
llvm-svn: 321362
This function is defined in OpenMP-TR6 section 4.1.5.1.6.
The function was not implemented yet.
Since ompt-functions can only be called after the runtime was initialized and
has loaded a tool, it can assume the runtime to be initialized, in contrast
to omp_get_num_procs, which needs to check whether the runtime is initialized.
Differential Revision: https://reviews.llvm.org/D40949
llvm-svn: 321269
This revision fixes failing testcases with parallel for loops and the gomp
interface. The return address needs to be stored at entry to runtime.
The storage is cleared on usage, so we need to update the storage again before
calling internal functions that will trigger event callbacks.
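The clear-on-use behavior can be sketched in plain C. The slot and both function names are hypothetical, and a plain global stands in for whatever per-thread storage the runtime actually uses:

```c
#include <stddef.h>

/* Hypothetical slot; a plain global keeps the sketch simple. */
static void *stored_return_address = NULL;

static void store_return_address(void *addr) { stored_return_address = addr; }

/* Reading consumes the value, so a second read without a new store
 * yields NULL. */
static void *load_and_clear_return_address(void) {
  void *addr = stored_return_address;
  stored_return_address = NULL;
  return addr;
}
```

Under this scheme, an entry point that calls two callback-triggering internal functions in a row has to store the address before each call, otherwise the second callback sees NULL.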
Differential Revision: https://reviews.llvm.org/D41181
llvm-svn: 321265
Clang 5 or higher adds an intermediate function call in certain cases when
compiling with debug flag. This revision updates the testcases to work
correctly.
Differential Revision: https://reviews.llvm.org/D40595
llvm-svn: 321263
Reasons for expected failures are mainly bugs when using labels in OpenMP regions
or missing support of some OpenMP features.
For some worksharing clauses, support to distinguish the kind of workshare was
added just recently.
If an issue was fixed in a minor release version of a compiler, we flag the
test as unsupported for this compiler version to avoid false positives.
Same for fixes that were backported to older compiler versions.
Differential Revision: https://reviews.llvm.org/D40384
llvm-svn: 321262
Otherwise I see hangs in the omp_single_copyprivate test when
compiling in release mode. With the debug assertions, I get a
failure `head > 0 && tail > 0`.
Differential Revision: https://reviews.llvm.org/D40722
llvm-svn: 320150
The runtime will use the global kmp_critical_name as a lock and
tries to atomically store a pointer in there. This will fail
if the global is only aligned by 4 bytes, the size of one int32_t
element. Use a union to ensure the global is aligned to the size
of a pointer on the current platform.
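The union trick can be sketched as follows. The type names are hypothetical stand-ins, and the eight-word array size is an assumption used for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for the lock storage: eight 32-bit words, which
 * on their own only require 4-byte alignment. */
typedef int32_t critical_name_words[8];

/* Wrapping the array in a union with a pointer member raises the alignment
 * of the whole object to at least that of a pointer. */
typedef union {
  critical_name_words words;
  void *align_as_pointer; /* never read; present only for alignment */
} aligned_critical_name;

static aligned_critical_name crit_lock; /* zero-initialized global */
```

Because a union is aligned for its strictest member, a pointer-sized atomic store into the start of `crit_lock` is then properly aligned on both 32-bit and 64-bit platforms.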
llvm-svn: 319811
__kmpc_reduce_nowait() correctly swapped the teams for reductions
in a teams construct. Apply the same logic to __kmpc_reduce() and
__kmpc_reduce_end().
Differential Revision: https://reviews.llvm.org/D40753
llvm-svn: 319788
Perform a nested CMake invocation to avoid writing our own parser
for compiler versions when we are not testing the in-tree compiler.
Use the extracted information to mark a test as unsupported that
hangs with Clang prior to version 4.0.1 and restrict tests for
libomptarget to Clang version 6.0.0 and later.
Differential Revision: https://reviews.llvm.org/D40083
llvm-svn: 319448
The code for the two OpenMP runtime libraries was very similar.
Move to common CMake file that is included and provides a simple
interface for adding testsuites. Also add a common check-openmp
target that runs all testsuites that have been registered.
Note that this renames all test options to the common OPENMP
namespace, for example OPENMP_TEST_C_COMPILER instead of
LIBOMP_TEST_COMPILER and so on.
Differential Revision: https://reviews.llvm.org/D40082
llvm-svn: 319343
As a first step, this allows us to generalize the detection of
standalone builds and make it fully compatible when building in
llvm/runtimes/ which automatically sets OPENMP_STANDALONE_BUILD.
Differential Revision: https://reviews.llvm.org/D40080
llvm-svn: 319341
Power has a weak consistency model so we need memory barriers to
make writes (both from runtime and from user code) available for
all threads.
Differential Revision: https://reviews.llvm.org/D40175
llvm-svn: 318848
These tests were failing rarely on my MacBook when there was some
activity in the background. Read: one of a thousand executions?
* sections.c missed the sorting based on thread ids. This worked
as long as the master thread finished its section before the
worker thread started the second one but failed if the master
thread was put to sleep by the OS.
* The checks in single.c assumed that the master thread executes
the single region which works most of the time because it is
usually faster than the newly spawned worker thread.
Differential Revision: https://reviews.llvm.org/D39853
llvm-svn: 318527
Traditionally, the library had a weak symbol for ompt_start_tool()
that served as fallback and disabled OMPT if called. Tools could
provide their own version and replace the default implementation
to register callbacks and lookup functions. This mechanism has
worked reasonably well on Linux systems where this interface was
initially developed.
On Darwin / Mac OS X the situation is a bit more complicated and
the weak symbol doesn't work out-of-the-box. In my tests, the
library with the tool needed to link against the OpenMP runtime
to make the process work. This would effectively mean that a tool
needed to choose a runtime library whereas one design goal of the
interface was to allow tools that are agnostic of the runtime.
The solution is to use dlsym() with the argument RTLD_DEFAULT so
that static implementations of ompt_start_tool() are found in the
main executable. This works because the linker on Mac OS X includes
all symbols of an executable in the global symbol table by default.
To use the same code path on Linux, the application would need to
be built with -Wl,--export-dynamic. To avoid this restriction, we
continue to use weak symbols on Linux systems as before.
Finally this patch extends the existing test to cover all possible
ways of initializing the tool as described by the standard. It
also fixes ompt_finalize() to not call omp_get_thread_num() when
the library is shut down which resulted in hangs on Darwin.
The changes have been tested on Linux to make sure that it passes
the current tests as well as the newly extended one.
Differential Revision: https://reviews.llvm.org/D39801
llvm-svn: 317980
If a parallel region is cancelled, execution resumes at the end
of the structured block. That is why this test cannot use the
"normal" macros that print right after inserting the label.
Instead it previously printed the addresses before the pragma
and swapped the checks compared to the other tests.
However, this does not work because FileCheck's '*' is greedy
so that RETURN_ADDRESS always matched the second address. This
makes the test fail when an "overflow" occurs and the first
address matches the value of codeptr_ra.
I discovered this on my MacBook but I'm unable to reproduce the
failure with the current version. Nevertheless we should fix this
problem to avoid that this test fails later after an unrelated change.
Differential Revision: https://reviews.llvm.org/D39708
llvm-svn: 317787
Return addresses are determined based on the address of a label
that is inserted directly after a pragma / API call. In some cases
the tests can assume a known number of instructions between the
addresses. However, the instructions and their encoded lengths
depend on the target that the test is compiled on.
Firstly, this patch refactors the macro print_current_address() to
allow such target dependent modifications and adds information for
the observed instructions on POWER. Secondly, it adapts the related
macro print_fuzzy_address() to reuse much of "hacky" code and fixes
the used formatting strings in the printf() call. Finally, it also
adds documentation about how these macros are intended to work.
Differential Revision: https://reviews.llvm.org/D39699
llvm-svn: 317786
The TR6 document is expected to be publicly released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317436
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317435
The TR6 document is expected to be publicly released around November 15.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D39182
llvm-svn: 317339
This is part of the renaming of data types from OpenMP TR4 to TR6
Patch by Simon Convent
Differential Revision: https://reviews.llvm.org/D39326
llvm-svn: 317338
This is a partial fix for bug 34050.
This prevents callers of omp_set_lock (which does not hold __kmp_global_lock)
from ever seeing an uninitialized version of __kmp_i_lock_table.table.
It does not solve a use-after-free race condition if omp_set_lock obtains a
pointer to __kmp_i_lock_table.table before it is updated and then attempts to
dereference afterwards. That race is far less likely and can be handled in a
separate patch.
The unit test usually segfaults on the current trunk revision. It passes with
the patch.
Patch by Adam Azarchs
Differential Revision: https://reviews.llvm.org/D39439
llvm-svn: 317115
The code is tested to work with the latest Clang, GNU, and Intel compilers. The
implementation is optimized for low overhead when no tool is attached, shifting
the cost to execution with a tool attached.
This patch does not implement OMPT for libomptarget.
Patch by Simon Convent and Joachim Protze
Differential Revision: https://reviews.llvm.org/D38185
llvm-svn: 317085
This change fixes the implementation of OMP_THREAD_LIMIT. The implementation of
this previously was not restricted to a contention group (but it should be,
according to the spec), and this is fixed here. A field is added to root thread
to store a counter of the threads in the contention group. An extra check is
added when reserving threads for a parallel region that checks this variable and
compares to threadlimit-var, which is implemented as a new global variable,
kmp_cg_max_nth. Associated settings changes were also made, and clean up of
comments that referred to OMP_THREAD_LIMIT, but should refer to the new
KMP_DEVICE_THREAD_LIMIT (added in an earlier patch).
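The extra check when reserving threads can be sketched as a clamp against the room left in the contention group. The function name and its exact accounting are hypothetical; the runtime's bookkeeping (for example, how the master thread is counted) may differ:

```c
/* Hypothetical sketch: grant at most as many additional threads as the
 * contention group still has room for.
 *   wanted      - additional threads requested for the parallel region
 *   cg_nthreads - threads currently counted in the contention group
 *   cg_max_nth  - thread limit for the contention group */
static int grant_additional_threads(int wanted, int cg_nthreads,
                                    int cg_max_nth) {
  int room = cg_max_nth - cg_nthreads;
  if (room < 0)
    room = 0;
  return wanted < room ? wanted : room;
}
```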
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D35912
llvm-svn: 309319
Summary:
Taskloop implementation is extended by using recursive task scheduling.
Environment variable KMP_TASKLOOP_MIN_TASKS added as a manual threshold for the
user to switch from recursive to linear task scheduling.
Details:
* The calculations for the loop parameters are moved from __kmp_taskloop_linear
to the upper level;
* Initial calculation is done in __kmpc_taskloop; further range splitting
is done in __kmp_taskloop_recur;
* Added a threshold to switch from recursive to linear task scheduling;
* One half of the split range is scheduled as an internal task which just moves
sub-range parameters to the stealing thread that continues recursive
scheduling (if the number of tasks is still large enough); the other half is
processed recursively;
* Internal task duplication routine fixed to assign the parent task, which was
not needed when all tasks were scheduled by the same thread, but is needed now.
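The splitting scheme listed above can be sketched in plain C as a counting model. All names are hypothetical, the comparison used for the threshold is an assumption, and both halves simply recurse here instead of one being handed to a stealing thread:

```c
/* Hypothetical bookkeeping for the sketch. */
typedef struct {
  long linear_batches; /* how many times the linear path ran */
  long tasks_created;  /* total leaf tasks */
} split_stats;

/* Linear path: one thread creates all tasks of its sub-range in a loop. */
static void schedule_linear(long num_tasks, split_stats *st) {
  st->linear_batches += 1;
  st->tasks_created += num_tasks;
}

/* Recursive path: halve the range until it is no larger than min_tasks. */
static void schedule_recursive(long num_tasks, long min_tasks,
                               split_stats *st) {
  if (num_tasks <= min_tasks || num_tasks < 2) {
    schedule_linear(num_tasks, st);
    return;
  }
  long first_half = num_tasks / 2;
  long second_half = num_tasks - first_half;
  /* In a runtime, one half would be packaged as an internal task that a
   * stealing thread continues to split; the other half is split in place. */
  schedule_recursive(first_half, min_tasks, st);
  schedule_recursive(second_half, min_tasks, st);
}
```

For example, 16 tasks with a threshold of 4 are split twice and then created linearly in four batches of four, so task creation can proceed on up to four threads instead of one.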
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D35273
llvm-svn: 308338
I've found it very difficult to get test/parallel/omp_nested.c to pass
consistently across my build environments. The problem is that it creates N^2
threads (it is testing nested parallel regions), and that often exceeds the
thread limits on systems with many cores. We do raise the process limits in
lit, and that often helps, but if running lit with a smaller number of threads
or on a system where we're otherwise resource constrained, this particular test
tends to fail (because the runtime cannot create a sufficient number of
threads).
This seems to work: if the maximum number of threads is more than some small
number, then cap the number of threads used for the parallel region. The choice
of 4 here is somewhat arbitrary.
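The cap can be sketched as a small helper. Both function names and the constant name are hypothetical; only the value 4 comes from the description above:

```c
/* Hypothetical constant: the cap of 4 follows the description above and is
 * otherwise arbitrary. */
enum { NESTED_TEST_THREAD_CAP = 4 };

/* Limit the team size used by the nested-parallel test. */
static int capped_team_size(int max_threads) {
  return max_threads > NESTED_TEST_THREAD_CAP ? NESTED_TEST_THREAD_CAP
                                              : max_threads;
}

/* Upper bound on threads for two nesting levels after capping. */
static int nested_thread_bound(int max_threads) {
  int n = capped_team_size(max_threads);
  return n * n;
}
```

On a 64-core machine this bounds the test at 16 threads rather than 4096.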
Differential Revision: https://reviews.llvm.org/D32033
llvm-svn: 306357