Summary:
Removed the function that used a lock and varargs
Used the same mechanism as for debug messages
Reviewers: ABataev, gtbercea, grokos, Hahnfeld
Reviewed By: gtbercea, Hahnfeld
Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51226
llvm-svn: 340767
The __kmp_execute_tasks_template() function reads the task_team and
current_task from the thread structure. There appears to be a pathological
timing where the number of threads in the hot team decreases and so a
thread is put in the pool via __kmp_free_thread(). It could be the case that:
1) A thread reads th_task_team into task_team local variables
and is then interrupted by the OS
2) Master frees the thread and sets current task and task team to NULL
3) The thread reads current_task as NULL
When this happens, current_task is dereferenced and a segfault occurs.
This patch just checks for current_task to not be NULL as well.
Differential Revision: https://reviews.llvm.org/D50651
llvm-svn: 340632
If hot teams are not being used, this code could seg fault without the added
check, and does so when composability is used in conjunction with nesting.
The fix prevents the segfault.
Differential Revision: https://reviews.llvm.org/D50649
llvm-svn: 340629
Exclude nested explicit tasks from timing, only outer level explicit task
counted and its time added to barrier arrive time for the thread.
Differential Revision: https://reviews.llvm.org/D50584
llvm-svn: 340628
Summary:
Right now, only the OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other MANDATORY and DEFAULT values.
Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50522
llvm-svn: 340542
The idle callback was removed from the spec as of TR7.
This removes it from the implementation.
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D48362
llvm-svn: 339771
This change fixes an incorrect behavior of the omp_control_tool function when
called from Fortran applications. A tool callback function for this event is
supposed to get NULL for the third argument according to the specification, but
the current implementation just passes a garbage value. A possible fix is to use
the OPTIONAL attribute for the third argument.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D50565
llvm-svn: 339585
This patch cleans up unused functions, variables, sign compare issues, and
addresses some -Warning flags which are now enabled including -Wcast-qual.
Not all the warning flags in LibompHandleFlags.cmake are enabled, but some
are with this patch.
Some __kmp_gtid_from_* macros in kmp.h are switched to static inline functions
which allows us to remove the awkward definition of KMP_DEBUG_ASSERT() and
KMP_ASSERT() macros which used the comma operator. This had to be done for the
innumerable -Wunused-value warnings related to KMP_DEBUG_ASSERT()
Differential Revision: https://reviews.llvm.org/D49105
llvm-svn: 339393
This patch adds a test using the doacross clauses in OpenMP and removes gcc from
testing kmp_doacross_check.c which is only testing the kmp rather than the
gomp interface.
Differential Revision: https://reviews.llvm.org/D50014
llvm-svn: 338757
This is broken per PR36561 and PR36574, so disable it for now until
somebody interested can take a look. OMPT can still be activated manually
by passing -DLIBOMP_OMPT_SUPPORT=ON during configuration.
Differential Revision: https://reviews.llvm.org/D50086
llvm-svn: 338721
Only supported since GCC 6 and Intel 17.0. However GCC 6.3.0 is
crashing on two of the tests, so disable them as well...
Differential Revision: https://reviews.llvm.org/D50085
llvm-svn: 338720
The taskloop testcase had scheduling effects. Tasks of the taskloop would
sometimes be scheduled before all task were created. The testing is now
split into two phases. First, the task creation on the master is tested,
than the scheduling events of the tasks are tested. Thus, the order of
creation and scheduling events is irrelavant.
Patch by Simon Convent
Reviewed by: protze.joachim, Hahnfeld
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D50140
llvm-svn: 338580
GCC 4.8.5 defaults to this old C standard. I think we should make the
tests pass a newer -std=c99|c11 but that's too intrusive for now...
Differential Revision: https://reviews.llvm.org/D50084
llvm-svn: 338490
From the bug report, the runtime needs to initialize the nproc variables
(inside middle init) for each root when the task is encountered, otherwise,
a segfault can occur.
Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36720
Differential Revision: https://reviews.llvm.org/D49996
llvm-svn: 338313
Summary:
When OMPT is not supported the __kmp_omp_task() function is passed the parameters in the wrong order. This is a fix related to patch D47709.
Reviewers: Hahnfeld, sconvent, caomhin, jlpeyton
Reviewed By: Hahnfeld
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50001
llvm-svn: 338295
GCC 8 produces false-positives with this:
In file included from <openmp>/src/runtime/src/kmp_os.h:950,
from <openmp>/src/runtime/src/kmp.h:78,
from <openmp>/src/runtime/src/kmp_environment.cpp:54:
<openmp>/src/runtime/src/kmp_environment.cpp: In function ‘char* __kmp_env_get(const char*)’:
<openmp>/src/runtime/src/kmp_safe_c_api.h:52:50: warning: ‘char* strncpy(char*, const char*, size_t)’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
#define KMP_STRNCPY_S(dst, bsz, src, cnt) strncpy(dst, src, cnt)
~~~~~~~^~~~~~~~~~~~~~~
<openmp>/src/runtime/src/kmp_environment.cpp:97:5: note: in expansion of macro ‘KMP_STRNCPY_S’
KMP_STRNCPY_S(result, len, value, len);
^~~~~~~~~~~~~
<openmp>/src/runtime/src/kmp_environment.cpp:92:28: note: length computed here
size_t len = KMP_STRLEN(value) + 1;
This is stupid because result is allocated with KMP_INTERNAL_MALLOC(len),
so the arguments are correct.
Differential Revision: https://reviews.llvm.org/D49904
llvm-svn: 338283
This change introduces GOMP doacross compatibility. There are 12 new interface
functions 6 for long type and 6 for unsigned long long type:
GOMP_doacross_post, GOMP_doacross_wait, GOMP_loop_doacross_[schedule]_start
where schedule can be static, dynamic, guided, or runtime.
These functions just translate the parameters if necessary and send them
to the corresponding kmp function.
E.g., GOMP_doacross_post() -> __kmpc_doacross_post()
For the GOMP_doacross_post function, there is template specialization to
account for when long is a four byte vs an eight byte type. If it is a
four byte type, then a temporary array has to be created to convert the
four byte integers into eight byte integers and then sending that into
__kmpc_doacross_post(). Because GOMP_doacross_wait uses varargs, it
always needs a temporary array and does not need template specialization.
Differential Revision: https://reviews.llvm.org/D49857
llvm-svn: 338280
This change fixes build errors when building a runtime with adaptive lock stats
enabled. Most of the errors were due to the recent changes in the runtime, but
it seems that we have not tried to build this debug runtime on Windows for a
long time.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49823
llvm-svn: 338277
1) Remove unnecessary data from list node structure
2) Remove timerPair in favor of pushing/popping explicitTimers.
This way, nested timers will work properly.
3) Fix #pragma omp critical timers
4) Add histogram capability
5) Add KMP_STATS_FILE formatting capability
6) Have time partitioned into serial & parallel by introducing
partitionedTimers::exchange(). This also counts the number of serial regions
in the executable.
7) Fix up the timers around OMP loops so that scheduling overhead and work are
both counted correctly.
8) Fix up the iterations statistics so they count the number of iterations the
thread receives at each loop scheduling event
9) Change timers so there is only one RDTSC read per event change
10) Fix up the outdated comments for the timers
Differential Revision: https://reviews.llvm.org/D49699
llvm-svn: 338276
Fix the order of callbacks related to the taskloop construct.
Add the iteration_count to work callbacks (according to the spec).
Use kmpc_omp_task() instead of kmp_omp_task() to include OMPT callbacks.
Add a testcase.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D47709
llvm-svn: 338146
The ompt/tasks/task_types.c testcase did not test untied tasks properly. Now,
frame addresses are tested and two scheduling points are added at which the
task can switch to another thread. Due to scheduling effects, the frame address
could be NULL.
This needed a restructure of the way OMPT callbacks are called.
__ompt_task_finish() now as an extra parameter, whether a task is completed.
Its invocation has been moved into __kmp_task_finish(). Thus, the order of the
writes to the frame addresses is not subject to scheduling effects anymore.
Patch by Simon Convent
Reviewed by: protze.joachim, hbae
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49181
llvm-svn: 338145
The two more outputs are needed to match the return addresses when using the
Intel Compiler, as it generates more instructions between the fuzzy-printing
of the address and the runtime call.
Patch by Simon Convent
Reviewed By: protze.joachim, hbae
Differential Revision: https://reviews.llvm.org/D49373
llvm-svn: 338144
This function was not enabled by default and not exported when manually
tweaking the build flags. Additionally it was hard to use since there
is no corresponding __kmp_ft_page_free().
The code itself is questionable because the returned memory address
is padded by an extra pointer which stores the unpadded start of the
allocated region (this would need to be freed).
Differential Revision: https://reviews.llvm.org/D49802
llvm-svn: 338052
The initial commit said that the test passes with Intel Compiler,
so change XFAIL to only list clang and gcc.
Differential Revision: https://reviews.llvm.org/D49801
llvm-svn: 338051
Summary:
1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode
synchronization function should be called only in L1 parallel level.
2. Removed some extra code for synchronization inside of the code, used
`__kmpc_barrier` instead.
3. Some code cleanup.
Reviewers: gtbercea, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49564
llvm-svn: 337691
This change fixes possibly invalid access to the internal data structure during
library shutdown. In a heavily oversubscribed situation, the library shutdown
sequence can reach the point where resources are deallocated while there still
exist threads in their final spinning loop. The added loop in
__kmp_internal_end() checks if there are such busy-waiting threads and blocks
the shutdown sequence if that is the case. Two versions of kmp_wait_template()
are now used to minimize performance impact.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D49452
llvm-svn: 337486
This patch removes the translation code since this functionality is now implemented in the compiler.
target_data_begin and target_data_end are also patched to handle some special cases that used to be
handled by the obsolete translation function, namely ensure proper alignment of struct members when
we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning)
can result in distortion of the alignment for some of its member fields. Padding restores the original
(proper) alignment.
Differential revision: https://reviews.llvm.org/D44186
llvm-svn: 337455
In revision r336569 (D49036) libomptarget support for multiple nvidia images
has been fixed in case a target region resides inside one or multiple
libraries and in the compiled application. But the issues is still present
for elf images.
This fix will also support multiple images for elf.
Patch by Jannis Klinkenberg
Reviewers: protze.joachim, ABataev, grokos
Reviewed By: protze.joachim, ABataev, grokos
Subscribers: openmp-commits
Differential Revision: https://reviews.llvm.org/D49418
llvm-svn: 337355
Summary:
Should be variable name instead of variable reference. If the variable is
somehow unset, it messes up the if condition expression and causes a CMake
error.
Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: mgorny, llvm-commits, openmp-commits
Differential Revision: https://reviews.llvm.org/D47221
llvm-svn: 337133
Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: ABataev, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D49204
llvm-svn: 337013
Summary:
Patch fixes the next problems.
1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext
class + simplified data members.
2. Fixed calculation of loop boundaries for dynamic loops with static
scheduling.
3. Introduced saving/restoring of the dynamic loop boundaries to support
several nested parallel dynamic loops.
Reviewers: grokos
Subscribers: guansong, kkwli0, openmp-commits
Differential Revision: https://reviews.llvm.org/D49241
llvm-svn: 336915
This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"
Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.
[ Global iteration space (0-999) ]
(use static)
[ L1 | L1 | L1 | L1 ]
(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]
In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA
OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before
I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.
Differential Revision: https://reviews.llvm.org/D47962
llvm-svn: 336571
Summary:
Currently Cuda plugin supports loading of the single image, though we
may have the executable with the several images, if it has target
regions inside of the dynamically loaded library. Patch allows to load
multiple images.
Reviewers: grokos
Subscribers: guansong, openmp-commits, kkwli0
Differential Revision: https://reviews.llvm.org/D49036
llvm-svn: 336569
This patch reorganizes the loop scheduling code in order to allow hierarchical
scheduling to use it more effectively. In particular, the goal of this patch
is to separate the algorithmic parts of the scheduling from the thread
logistics code.
Moves declarations & structures to kmp_dispatch.h for easier access in
other files. Extracts the algorithmic part of __kmp_dispatch_init() and
__kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
__kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
__kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
hierarchical scheduler needs to access the scheduling logic without the
bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a
new flags variable is created which stores the ordered and nomerge flags instead
of them being in two separate variables. This will keep the
dispatch_private_info_t structure the same size.
Differential Revision: https://reviews.llvm.org/D47961
llvm-svn: 336568
These are preliminary changes that attempt to use C++11 Atomics in the runtime.
We are expecting better portability with this change across architectures/OSes.
Here is the summary of the changes.
Most variables that need synchronization operation were converted to generic
atomic variables (std::atomic<T>). Variables that are updated with combined CAS
are packed into a single atomic variable, and partial read/write is done
through unpacking/packing
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D47903
llvm-svn: 336563
The flag "--no-as-needed" is not recognized by the linker on macOS making the following tests fail:
ompt/loadtool/tool_available/tool_available.c
ompt/loadtool/tool_not_available/tool_not_available.c
This patch removes this flag for macOS and adds it only for Linux and Windows.
I tested it on Ubuntu 16.04 and macOS HighSierra, with Clang/LLVM 6.0.1 and OpenMP trunk.
This solution was also discussed in the OpenMP-dev mailing list.
Patch provided by Simone Atzeni
Differential Revision: https://reviews.llvm.org/D48888
llvm-svn: 336327
The testcase potentially fails when a thread is reused.
The added synchronization makes sure this does not happen.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48932
llvm-svn: 336326
When compiling with icc, there is a problem with reenter frame addresses in
parallel_begin callbacks in the interoperability.c testcase. (The address is
not available. thus NULL)
Using alloca() forces availability of the frame pointer.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48282
llvm-svn: 336088
Several runtime entry points have not been tested from non-OpenMP threads. This
adds tests to an existing testcase. While at it, the testcase was reformatted
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D48124
llvm-svn: 336087
Especially the thread_end callback has not been tested before.
This adds a testcase for nested and non-nested threads.
Patch provided by Simon Convent
Differential Revision: https://reviews.llvm.org/D47824
llvm-svn: 336086
The current implementation always provides the thread-num for the current
parallel region. This patch fixes the behavior for ancestor levels >0.
Differential Revision: https://reviews.llvm.org/D46533
llvm-svn: 336085
Summary:
Patch fixes several problems in the implementation of NVPTX RTL.
1. Detection of the last iteration for loops with static scheduling, no chunks.
2. Fixes reductions for the serialized parallel constructs.
3. Fixes handling of the barriers.
Reviewers: grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D48480
llvm-svn: 335469
Upcoming changes to FileCheck will modify CHECK-DAG to not match
overlapping regions of the input. This test was found to be affected
because it expects to find four threads to invoke events of type
ompt_event_implicit_task_begin. It turns out this is wrong because
OMP_THREAD_LIMIT is set to 2, so there are only two threads. The
rest of the test got it right so it went unnoticed until now.
(Rewrite test and apply clang-format to it as discussed in the past.)
Differential Revision: https://reviews.llvm.org/D47119
llvm-svn: 333361
The generic entry points for static loop scheduling previously
hardcoded that the runtime was initialized. This can be wrong if
the compiler analyzes that the runtime is not needed and calls
the init functions accordingly.
This didn't affect clang-ykt because they have entry points for
different combinations of SPMD x Runtime not needed. I didn't do
measurements yet but with inlining we might get away with always
calling the generic interface and letting compiler and runtime
figure out the rest.
In any case, a correct runtime is always better than having
functions that may only be called if previous calls passed in
a specific set of arguments!
Differential Revision: https://reviews.llvm.org/D47131
llvm-svn: 333285
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.
Differential Revision: https://reviews.llvm.org/D47130
llvm-svn: 333284
The existing implementation of the dynamic scheduling
breaks the contract introduced by the original openmp
runtime and, thus, is incorrect. Patch fixes it and
introduces correct dynamic scheduling model.
Thanks to Alexey Bataev for submitting this patch.
Differential Revision: https://reviews.llvm.org/D47333
llvm-svn: 333225
We already know where the CUDA SDK is, so there is no point in
letting Clang search for it again and possibly finding no or
a different installation.
--cuda-path is supported since the beginning of CUDA support in
Clang, so making this required doesn't impose additional restrictions.
Differential Revision: https://reviews.llvm.org/D46930
llvm-svn: 332495
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.
As a result this change also enables building the library by default
if all prerequisites are met.
Differential Revision: https://reviews.llvm.org/D46901
llvm-svn: 332494
Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D46840
llvm-svn: 332360
implicit_task_end callbacks in nested parallel regions did not always give the
correct thread_num, since the inner parallel region may have already been
finalized.
Now, the thread_num is stored at the beginning of the implicit task and
retrieved at the end, whenever necessary.
A testcase was added as well.
Differential Revision: https://reviews.llvm.org/D46260
llvm-svn: 331632
The api_calls_misc.c testcase tests the following api calls:
ompt_get_callback()
ompt_get_state()
ompt_enumerate_states()
ompt_enumerate_mutex_impls()
These have not been tested previously.
The api_calls.c testcase has been renamed to api_calls_places.c because it only tests api calls that are related to places.
Differential Revision: https://reviews.llvm.org/D42523
llvm-svn: 331631
Summary:
Enable the device side debug messages at compile time, use env var to control at runtime.
To achieve this, an environment data block is passed to the device lib when it is loaded.
By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1.
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D46210
llvm-svn: 331550
Summary: Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled.
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45528
llvm-svn: 330944
Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well.
Reviewers: grokos
Subscribers: Hahnfeld, openmp-commits, mgorny
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45530
llvm-svn: 330477
Currently, the affinity API reports garbage for the initial place list and any
thread's place lists when using KMP_AFFINITY=none|compact|scatter.
This patch does two things:
for KMP_AFFINITY=none, Creates a one entry table for the places, this way, the
initial place list is just a single place with all the proc ids in it. We also
set the initial place of any thread to 0 instead of KMP_PLACE_ALL so that the
thread reports that single place (place 0) instead of garbage (-1) when using
the affinity API.
When non-OMP_PROC_BIND affinity is used
(including KMP_AFFINITY=compact|scatter), a thread's place list is populated
correctly. We assume that each thread is assigned to a single place. This is
implemented in two of the affinity API functions
Differential Revision: https://reviews.llvm.org/D45527
llvm-svn: 330283
This patch introduces GOMP_taskloop to our API. It adds GOMP_4.5 to our
version symbols. Being a wrapper around __kmpc_taskloop, the function
creates a task with the loop bounds properly nested in the shareds so that
the GOMP task thunk will work properly. Also, the firstprivate copy constructors
are properly handled using the __kmp_gomp_task_dup() auxiliary function.
Currently, only linear spawning of tasks is supported
for the GOMP_taskloop interface.
Differential Revision: https://reviews.llvm.org/D45327
llvm-svn: 330282
Summary:
This one line change is to remove this warning message
"warning: integer conversion resulted in a change of sign"
Reviewers: grokos
Reviewed By: grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D45415
llvm-svn: 329713
This change removes the unnecessary lock operation on __kmp_initz_lock inside
the __kmp_atfork_child() function for Linux; the lock variable is initialized
in the same function later.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D44949
llvm-svn: 328900
The summarizeStats.py script processes raw data provided by the
instrumented (stats-gathering) OpenMP* runtime library. It provides:
1) A radar chart which plots counters as frequency (per GigaTick) of use within
the program. The frequencies are plotted as log10, however values less than
one are kept as it is and represented in red color. This was done to help
visualize the differences better.
2) Pie charts separating total time as compute and non-compute. The compute and
non-compute times have their own pie charts showing the constructs that
contributed to them. The percentages listed are with respect to the total
time.
3) '.csv' file with percentage of time spent within the different constructs.
The script can be used as:
$ python $PATH_TO_SCRIPT/summarizeStats.py instrumented1.csv instrumented2.csv
Patch by Taru Doodi
Differential Revision: https://reviews.llvm.org/D41838
llvm-svn: 328568
Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.
Reviewers: grokos, carlo.bertolli, ABataev, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44754
llvm-svn: 328148
Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1.
Reviewers: grokos, carlo.bertolli, ABataev, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44537
llvm-svn: 328146
Summary:
This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44487
llvm-svn: 328144
Added settings code to read OMP_TARGET_OFFLOAD environment variable. Added
target-offload-var ICV as __kmp_target_offload, set via OMP_TARGET_OFFLOAD,
if available, otherwise defaulting to DEFAULT. Valid values for the ICV are
specified as enum values {0,1,2} for disabled, default, and mandatory. An
internal API access function __kmpc_get_target_offload is provided.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D44577
llvm-svn: 328046
Summary:
Allow the runtime to use the existing shared memory statically allocated slots.
When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow that the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot.
Reviewers: ABataev, carlo.bertolli, grokos, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44486
llvm-svn: 327639
Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44470
llvm-svn: 327637
Summary:
This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team.
The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface.
Limitations to be addressed in future patches:
- This current patch only employs global memory. In a future patch we will enable to usage for shared memory as an optimization.
- Allow the allocation of several requested sizes in the same slot.
Reviewers: ABataev, grokos, caomhin, carlo.bertolli
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44260
llvm-svn: 327440
Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.
Reviewers: grokos, carlo.bertolli, caomhin, ABataev
Reviewed By: grokos, ABataev
Subscribers: fedor.sergeev, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D44254
llvm-svn: 327040
Summary:
This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625
Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.
Reviewers: ABataev, grokos, carlo.bertolli, caomhin
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D43626
llvm-svn: 326950
We have to ensure that the runtime is initialized _before_ waiting
for the two started threads to guarantee that the master threads
post their ompt_event_thread_begin before the worker threads. This
is not guaranteed in the parallel region where one worker thread
could start before the other master thread has invoked the callback.
The problem did not happen with Clang becauses the generated code
calls __kmpc_global_thread_num() and cashes its result for functions
that contain OpenMP pragmas.
Differential Revision: https://reviews.llvm.org/D43882
llvm-svn: 326435
The thread_num parameter of ompt_get_task_info() was not being used previously,
but need to be set.
The print_task_type() function (form the task-types.c testcase) was merged into
the print_ids() function (in callback.h). Testing of ompt_get_task_info() was
added to the task-types.c testcase. It was not tested extensively previously.
Differential Revision: https://reviews.llvm.org/D42472
llvm-svn: 326338