#endif was one line too low. If KMP_USE_ADAPTIVE_LOCKS is 0,
then queuing locks would incorrectly use drdpa lock mechanism.
This is a fix for https://llvm.org/bugs/show_bug.cgi?id=26649
llvm-svn: 264934
Removed reference to "ref ct" in a comment, as ref_ct no longer exists. Also
moved the comment to where the task_team is about to be tested if NULL.
llvm-svn: 264786
The problem is that the definition of kmp_cpuinfo_t contains:
char name [3*sizeof (kmp_cpuid_t)]; // CPUID(0x80000002,0x80000003,0x80000004)
and kmp_cpuid_t is only defined when compiling for x86.
Differential Revision: http://reviews.llvm.org/D18245
llvm-svn: 264535
For serialized parallel regions, wrong ids were reported. Now the same code is
used as in kmp_dispatch.cpp which emits the correct ids.
Differential Revision: http://reviews.llvm.org/D18348
llvm-svn: 264266
For non-serialized parallel regions the master thread issued two callbacks:
The first one in kmp_gsupport.c and the second in __kmp_join_call. Therefore
only trigger the callback in kmp_gsupport.c for serialized parallel regions.
Differential Revision: http://reviews.llvm.org/D16716
llvm-svn: 264264
Have Visual Studio use MemoryBarrier() instead of _mm_mfence() and remove
__declspec align attribute from function parameters in kmp_atomic.h
llvm-svn: 264166
Some basic checks next to the implementation should futher lower the
possibility to introduce regressions. (Note that this would have catched
the ordering issue fixed in rL258866 and pointed to rL263940.)
The tests are implementation dependent in one point because they assume that
thread ids are assigned in ascending order. This is not defined by the standard
but currently ensured in libomp. We have to think about another way of ordering
the threads should this ever be subject to change...
Note that this isn't aiming at replacing the implementation independent
test-suite at https://github.com/OpenMPToolsInterface/ompt-test-suite!
Differential Revision: http://reviews.llvm.org/D16715
llvm-svn: 264027
This change logically separates the stats_flags_e::noTotal bit flag from the
stats_flags_e::onlyInMaster and stats_flags_e::noUnits bit flags. If no
TOTAL_foo output is wanted for a particular statistic, the flag must be
explicitly included in that statistic's flags.
Differential Revision: http://reviews.llvm.org/D18198
llvm-svn: 263954
Building libomp using CMake versions < 3.3 caused a link time error. These
errors occurred because when assembling z_Windows_NT-586_asm.asm, the
definitions: OMPT_SUPPORT, _M_AMD64|_M_IA32 weren't defined on the command line.
To fix the problem, the COMPILE_FLAGS property for the assembly file is appended
to instead of the COMPILE_DEFINITIONS property being set. For whatever reason, the
COMPILE_DEFINITIONS property doesn't pick up the definitions for assembly files
for the older CMake versions.
llvm-svn: 263651
This change adds a header to the printout of the statistics which includes the
time, machine name, and processor info if available. This change also includes
some cosmetic changes like using enum casting for timer and counter iteration.
Differential Revision: http://reviews.llvm.org/D18153
llvm-svn: 263580
This change removes synthesized stats and instead has all timers print out a
total which is the aggregate statistics across threads. This is displayed as
"Total_foo" at the end of program. The stats_flags_e::synthesized flag is
removed and the printStats() function is split into two separate functions:
printTimerStats() which can display the aggregate total and printCounterStats().
Differential Revision: http://reviews.llvm.org/D17869
llvm-svn: 263290
From the standard: The taskloop construct specifies that the iterations of one
or more associated loops will be executed in parallel using OpenMP tasks. The
iterations are distributed across tasks created by the construct and scheduled
to be executed.
This initial implementation uses a simple linear tasks distribution algorithm.
Later we can add other algorithms to speedup generation of huge number of tasks
(i.e., tree-like tasks generation should be faster).
This needs to be put into the OpenMP runtime library in order for the
compiler team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17404
llvm-svn: 262535
From the standard: A doacross loop nest is a loop nest that has cross-iteration
dependence. An iteration is dependent on one or more lexicographically earlier
iterations. The ordered clause parameter on a loop directive identifies the
loop(s) associated with the doacross loop nest.
The init/fini routines allocate/free doacross buffer(s) for each loop for each
thread. The wait routine waits for a flag designated by the dependence vector.
The post routine sets the flag designated by current iteration vector. We use
a similar technique of shared buffer indices that covers up to 7 nowait loops
executed simultaneously by different threads (number 7 has no real meaning,
just heuristic value). Also, the size of structures are kept intact via
reducing dummy arrays.
This needs to be put into the OpenMP runtime library in order for the compiler
team to develop the compiler side of the implementation.
Differential Revision: http://reviews.llvm.org/D17399
llvm-svn: 262532
This change introduces the new OpenMP 4.5 affinity api surrounding
OpenMP Places. There are six new entry points:
Typically called in serial region:
* omp_get_num_places - returns the number of places available to the execution
environment in the place list.
* omp_get_place_num_procs - returns the number of processors available to the
execution environment in the specified place.
* omp_get_place_proc_ids - returns the numerical identifiers of the processors
available to the execution environment in the specified place.
Typically called inside parallel region:
* omp_get_place_num - returns the place number of the place to which the
encountering thread is bound.
* omp_get_partition_num_places - returns the number of places in the place
partition of the innermost implicit task.
* omp_get_partition_place_nums - returns the list of place numbers
corresponding to the places in the place-var ICV of the innermost
implicit task.
Differential Revision: http://reviews.llvm.org/D17417
llvm-svn: 261915
The maximum task priority value is read from envirable: OMP_MAX_TASK_PRIORITY.
But as of now, nothing is done with it. We just handle the environment variable
and add the new api: omp_get_max_task_priority() which returns that value or
zero if it is not set.
Differential Revision: http://reviews.llvm.org/D17411
llvm-svn: 261908
The monotonic/non-monotonic flags are sent to the runtime via the sched_type by
setting the 30th (non-monotonic) or 29th (monotonic) bit in the sched_type.
Macros are added to probe if monotonic or non-monotonic is specified
(SCHEDULE_HAS_[NON]MONOTONIC & SCHEDULE_HAS_NO_MODIFIERS)
and also to to get the base sched_type (SCHEDULE_WITHOUT_MODIFIERS)
Currently, nothing is done with the modifiers.
Also, this patch adds some comments on the use of the enumerations in at least
one place where it is subtle.
Differential Revision: http://reviews.llvm.org/D17406
llvm-svn: 261906
For pragma omp taskwait the runtime is called from the task context.
Therefore, the reentry frame information should be updated.
The information should be available for both taskwait event calls; therefore,
set before the first event and reset after the last event.
Patch by Joachim Protze
Differential Revision: http://reviews.llvm.org/D17145
llvm-svn: 260674
When a target task finishes and it tries to access the th_task_team from the
threads in the team where it was created, th_task_team can be NULL or point to
a different place when that thread started a nested region that is still
running. Finding the exact task_team that the threads were using is difficult
as it would require to unwind the task_state_memo_stack. So a new field was added
in the taskdata structure to point to the active task_team when the task was
created.
llvm-svn: 260615
The problem is that the master's thread state was not saved before entering a
parallel region so it does not remember tasks when it returns.
llvm-svn: 260306
The -install_name linker flag will use "@rpath/" when supported in CMake
which is the recommended usage for dynamic libraries on Mac OSX.
llvm-svn: 260300
(libgomp has bool as well)
This was causing a test failure in omp_test_if.c when building with GCC in
Debug mode. I have verified that GCC versions 4.9.2 and 5.3.0 now work and
compile-tested this change with clang 3.7.1 and Intel Compiler 16.0.
Differential Revision: http://reviews.llvm.org/D16921
llvm-svn: 260204
This will be used in a later patch to find additional LLVM tools for tests and
enables reusability for libomptarget that is currently under review.
Differential Revision: http://reviews.llvm.org/D16713
llvm-svn: 259876
When building executables for Cray supercomputers, statically-linked executables
are preferred. This patch makes it possible to build the OpenMP runtime as an
archive for building statically-linked executables. The patch adds the flag
LIBOMP_ENABLE_SHARED, which defaults to true. When true, a build of the OpenMP
runtime yields dynamic libraries. When false, a build of the OpenMP runtime
yields static libraries. There is no setting that allows both kinds of libraries
to be built.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D16525
llvm-svn: 259817
In: http://lists.llvm.org/pipermail/openmp-dev/2015-August/000858.html, a
performance issue was found with libomp's task dependencies. The task
dependencies hash table has an issue with collisions. The current table size is
a power of two. This combined with the current hash function causes a large
number of collisions to occurr. Also, the current size (64) is too small for
larger applications so the table size is increased.
This patch creates a two level hash table approach for task dependencies. The
implicit task is considered the "master" or "top-level" task which has a large
static sized hash table (997), and nested tasks will have smaller hash
tables (97). Prime numbers were chosen to help reduce collisions.
Differential Revision: http://reviews.llvm.org/D16640
llvm-svn: 259113
The attached patch adds support for ompt_event_task_dependences and
ompt_event_task_dependence_pair events from the OMPT specification [1]. These
events only apply to OpenMP 4.0 and 4.1 (aka 4.5) because task dependencies
were introduced in 4.0.
With respect to the changes:
ompt_event_task_dependences
According to the specification, this event is raised after the task has been
created, thefore this event needs to be raised after ompt_event_task_begin
(in __kmp_task_start). However, the dependencies are known at
__kmpc_omp_task_with_deps which occurs before __kmp_task_start. My modifications
extend the ompt_task_info_t struct in order to store the dependencies of the
task when _kmpc_omp_task_with_deps occurs and then they are emitted in
__kmp_task_start just after raising the ompt_event_task_begin. The deps field
is allocated and valid until the event is raised and it is freed and set
to null afterwards.
ompt_event_task_dependence_pair
The processing of the dependences (i.e. checking whenever a dependence is
already satisfied) is done within __kmp_process_deps. That function checks
every dependence and calls the __kmp_track_dependence routine which gives some
support for graphical output. I used that routine to emit the dependence pair
but I also needed to know the sink_task. Despite the fact that the code within
KMP_SUPPORT_GRAPH_OUTPUT refers to task_sink it may be null because
sink->dn.task (there's a comment regarding this) and in fact it does not point
to a proper pointer value because the value is set in node->dn.task = task;
after the __kmp_process_deps calls in __kmp_check_deps. I have extended the
__kmp_process_deps and __kmp_track_dependence parameter list to receive the
sink_task.
[1] https://github.com/OpenMPToolsInterface/OMPT-Technical-Report/blob/target/ompt-tr.pdf
Patch by Harald Servat
Differential Revision: http://reviews.llvm.org/D14746
llvm-svn: 259038
When the code behind the barrier is executed, the master thread may have
already resumed execution. That's why we cannot safely assume that *pteam
is not yet freed.
This has been introduced by r258866.
llvm-svn: 259037
Current clang trunk reports _OPENMP to be 201307 = OpenMP 4.0. It doesn't
recognize '#pragma omp declare target' though (patch still pending) and
therefore fails compilation.
Differential Revision: http://reviews.llvm.org/D16631
llvm-svn: 259026
For implcit barriers in simple parallel for loops, the order of the OMPT events
was wrong. The barrier_{begin,end} events came after the implcit_task_end
event for the implcit barrier at the end of the parallel region. This is wrong
because the implicit task executes the barrier before ending. This patch fixes
the order of the event: It will be triggerd now just before
__kmp_pop_current_task_from_thread() is called.
Patch by Tim Cramer
Differential Revision: http://reviews.llvm.org/D16347
llvm-svn: 258866
This change fixes the bug: https://llvm.org/bugs/show_bug.cgi?id=25975
by bypassing the perl module files which try to deduce system information.
These perl modules files don't offer useful information and are from the
original build system. They can be removed after this change.
llvm-svn: 258843
This change fixes one issue reported at https://llvm.org/bugs/show_bug.cgi?id=26184
There was missing cleanup code for the cached indirect lock pool. The change
will fix the reported case where it tries to initialize a lock after runtime
cleanup/reinitialization, but it is still possible that the user program runs
into another problem because most test programs have a call to __kmpc_set_lock
after cleanup/reinitialization without calling __kmpc_init_lock causing a crash/hang.
llvm-svn: 258528
The release builds are configured to be reproducible, so that the
binaries compare equal between bootstrap iterations. The OpenMP
run-time build was failing like this:
runtime/src/kmp_version.c:108:79: error: expansion of date or time macro is not reproducible [-Werror,-Wdate-time]
char const __kmp_version_build_time[] = KMP_VERSION_PREFIX "build time: " __DATE__ " " __TIME__;
Figuring as the build currently doesn't set LIBOMP_DATE, it's probably
OK to skip setting the build time here too.
llvm-svn: 257833
This new API, int kmp_set_thread_affinity_mask_initial(), is available for use
by other parallel runtime libraries inside a possibly OpenMP-registered thread.
This entry point restores the current thread's affinity mask to the affinity
mask of the application when it first began. If -1 is returned it can be assumed
that either the thread hasn't called affinity initialization or that the thread
isn't registered with the OpenMP library. If 0 is returned then, then the call
was successful. Any return value greater than zero indicates an error occurred
when setting affinity.
Differential Revision: http://reviews.llvm.org/D15867
llvm-svn: 257489
Recent changes to support dynamic locks didn't consider the code compiled when
OMPT_SUPPORT=true. As a result, the OMPT support was broken by recent changes
to nested locks to support dynamic locks. For OMPT to work with dynamic locks,
they need to provide a return code indicating whether a nested lock acquisition
was the first or not.
This patch moves the OMPT support for nested locks into the #else case when
DYNAMIC locks were not used. New support is needed for dynamic locks. This patch
fixes the build and leaves a placeholder where the missing OMPT callbacks can be
added either the author of the OMPT support for locks, or the dynamic
locking support.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D15656
llvm-svn: 256314
When users sets envirable KMP_BLOCKTIME to "infinite" (the time one busy-waits
at barrieres, etc.), the monitor thread is not useful and can be ignored. This
change prevents the creation of the monitor thread when the users sets
KMP_BLOCKTIME to "infinite".
Differential Revision: http://reviews.llvm.org/D15628
llvm-svn: 256061
This change allows clang to build the stats library for every architecture
which supports __builtin_readcyclecounter(). CMake also checks for all
necessary features for stats and will error out if the platform does not
support it.
Patch by Hal Finkel and Johnny Peyton
llvm-svn: 256002
This change set includes all changes to make the code conform to the OMP 4.5 specification:
* Removed hint / hinted_init definitions from include/40 files
* Hint values are powers of 2 to enable composition (4.5 spec)
* Hinted lock initialization functions were renamed (4.5 spec)
kmp_init_lock_hinted -> omp_init_lock_with_hint
kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
* __kmpc_critical_section_with_hint was added to support a critical section with
a hint (4.5 spec)
* __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
an internal lock type
* kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
internal entries for the hinted lock initializers. The preivous internal
functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
places
* Added the two init functions to dllexports
* KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
Differential Revision: http://reviews.llvm.org/D15205
llvm-svn: 255376
* Added a new user TSX lock implementation, RTM, This implementation is a
light-weight version of the adaptive lock implementation, omitting the
back-off logic for deciding when to specualte (or not). The fall-back lock is
still the queuing lock.
* Changed indirect lock table management. The data for indirect lock management
was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table
dimension was changed to 2D (was linear), and each entry is a
kmp_indirect_lock_t object now (was a pointer to an object).
* Some clean up in the critical section code
* Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS
* KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches:
KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS
Differential Revision: http://reviews.llvm.org/D15204
llvm-svn: 255375
There are going to be two more patches which bring this feature up to date and in line with OpenMP 4.5.
* Renamed jump tables for the lock functions (and some clean up).
* Renamed some macros to be in KMP_ namespace.
* Return type of unset functions changed from void to int.
* Enabled use of _xebgin() et al. intrinsics for accessing TSX instructions.
Differential Revision: http://reviews.llvm.org/D15199
llvm-svn: 255373
Fix for crash in the teams construct in case user sets OMP_THREAD_LIMIT to a
number less than the number of processors. Now the number of threads will be
silently reduced if the user didn't specify teams parameters or with a
warning if the user specified teams parameters conflicting with
OMP_THREAD_LIMIT.
Differential Revision: http://reviews.llvm.org/D14732
llvm-svn: 254322
The task_team pointer is dereferenced unconditionally which causes a SEGFAULT
when it is NULL (e.g. for serialized parallel, that can happen for "teams"
construct or for "target nowait"). The solution is to skip second task team
setup for single thread team.
Differential Revision: http://reviews.llvm.org/D14729
llvm-svn: 254321
These changes allow libhwloc to be used as the topology discovery/affinity
mechanism for libomp. It is supported on Unices. The code additions:
* Canonicalize KMP_CPU_* interface macros so bitmask operations are
implementation independent and work with both hwloc bitmaps and libomp
bitmaps. So there are new KMP_CPU_ALLOC_* and KMP_CPU_ITERATE() macros and
the like. These are all in kmp.h and appropriately placed.
* Hwloc topology discovery code in kmp_affinity.cpp. This uses the hwloc
interface to create a libomp address2os object which the rest of libomp knows
how to handle already.
* To build, use -DLIBOMP_USE_HWLOC=on and
-DLIBOMP_HWLOC_INSTALL_DIR=/path/to/install/dir [default /usr/local]. If CMake
can't find the library or hwloc.h, then it will tell you and exit.
Differential Revision: http://reviews.llvm.org/D13991
llvm-svn: 254320
Fix ittnotify loop metadata reporting for schedule(runtime) and
chunked schedule set via OMP_SCHEDULE. The bug was that chunk=1
reported always.
llvm-svn: 252952
The patch adds support for ompt_event_task_switch into LLVM/OpenMP. Note that
the patch has also updated the signature of ompt_event_task_switch to
ompt_task_pair_callback_t (rather than the previous ompt_task_switch_callback_t).
Patch by Harald Servat
Differential Revision: http://reviews.llvm.org/D14566
llvm-svn: 252761
1) Add get_ptr_type() method to all wait flag types.
2) Flag in sleep_loc may change type by the time the resume is called from
__kmp_null_resume_wrapper. We use get_ptr_type to obtain the real type
and compare it to the casted object received. If they don't match, we know
the flag has changed (already resumed and replaced by another flag). If they
match, it doesn't hurt to go ahead and resume it.
Differential Revision: http://reviews.llvm.org/D14458
llvm-svn: 252487
1) When the number of threads in a team increases, new threads need to have all
their barrier struct fields initialized. We were missing the parent_bar and
team fields.
2) For non-forkjoin barriers, we now do the __kmp_task_team_setup before the
gather. The setup now sets up the task_team that all the threads will switch
to after the barrier, but it needs to be done before other threads do the
switch.
3) Remove an unneeded assignment of tt_found_tasks in task team free function.
Differential Revision: http://reviews.llvm.org/D14456
llvm-svn: 252486
These changes include:
1) Machine hierarchy now uses the base_num_threads field to indicate the
maximum number of threads the current hierarchy can handle without a resize.
2) In __kmp_get_hierarchy, we need to get depth after any potential resize
is done.
3) Cleanup of hierarchy resize code to support 1 above.
Differential Revision: http://reviews.llvm.org/D14455
llvm-svn: 252475
Setting dynamic schedule with chunk size 0 via omp_set_schedule(dynamic,0)
and then using "schedule (runtime)" causes infinite loop because for the
chunked dynamic schedule we didn't correct zero chunk to the default (1).
llvm-svn: 252338
Use of #ifdef OMPT_DEBUG was causing messages to be generated under normal
operation when the OpenMP library was compiled with KMP_DEBUG enabled.
Elsewhere, KMP_DEBUG evaluates assertions, but never produces messages during
normal operation. To avoid this inconsistency, set OMPT_DEBUG using a cmake
variable LIBOMP_OMPT_DEBUG.
While I was editing the associated ompt-specific.h and ompt-general.c files,
make the spacing and comments consistent.
Patch by John Mellor-Crummey
Differential Revision: http://reviews.llvm.org/D14355
llvm-svn: 252173
This is a refactoring of the task_team code that more elegantly handles the two
task_team case. Two task_teams per team are kept in use for the lifetime of the
team. Thus no reference counting is needed.
Differential Revision: http://reviews.llvm.org/D13993
llvm-svn: 252082
Add additional dependency to clang/clang-headers/FileCheck to avoid possible troubles with in-tree build/test of libomp + allow parallel testing of libomp. Also includes bugfixes for tests + improvements to avoid possible race conditions.
Differential Revision: http://reviews.llvm.org/D14055
llvm-svn: 251797
The problem is that the ompt_tool() function (which must be implemented by a
performance tool) should be defined in the RTL as well to cover the case when
the tool is not present in the address space of the process. This functionality
is accomplished with weak symbols in Unices. Unfortunately, Windows does not
support weak symbols.
The solution in these changes is to grab the list of all modules loaded by the
process and then search for symbol "ompt_tool()" within them. The function
ompt_tool_windows() performs the search of the ompt_tool symbol. If ompt_tool is
found, then its return value is used to initialize the tool. If ompt_tool is not
found, then ompt_tool_windows() returns NULL and OMPT is thus, disabled.
While doing these changes, the OMPT_SUPPORT detection in CMake was changed to
test for the required featuers for OMPT_SUPPORT, namely: builtin_frame_address()
existence, weak attribute existence and psapi.dll existence. For
LIBOMP_HAVE_OMPT_SUPPORT to be true, it must be that the builtin_frame_address()
intrinsic exists AND one of: either weak attributes exist or psapi.dll exists.
Also, since Process Status API is used I had to add new dependency -- psapi.dll
to the library dependency micro test.
Differential Revision: http://reviews.llvm.org/D14027
llvm-svn: 251654
The th.th_task_state for the master thread at the start of a nested parallel
should not be zeroed in __kmp_allocate_team() because it is later put in the
stack of states in __kmp_fork_call() for further re-use after exiting the
nested region. It is zeroed after being put in the stack.
Differential Revision: http://reviews.llvm.org/D13702
llvm-svn: 250847
Moved '@' from delimiters to offset designators for the KMP_PLACE_THREADS
environment variable. Only one of: postfix "o" or prefix @, should be used
in the value of KMP_PLACE_THREADS. For example, '2s@2,4c@2,1t'. This is also
the format of KMP_SETTINGS=1 output now (removed "o" from there).
e.g., 2s,2o,4c,2o,1t.
Differential Revision: http://reviews.llvm.org/D13701
llvm-svn: 250846