Initializing an ompt_data_t object using the pointer union member is potentially
unsafe in 32-bit programs. This change fixes the issue
by using the constant, ompt_data_none.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52046
llvm-svn: 343785
On Windows, child workers are terminated by the parent during the normal
program exit process (ExitProcess()) and they are not able to finish generating
their OpenMP events. We can force manual library shut down in __kmpc_end() to
fix this at least for the cases where __kmpc_end() is properly inserted.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D52628
llvm-svn: 343619
If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in
the bitcode library. Everything else will have very unpleasent
effects because asserts will appear when falling back to the static
library libomptarget-nvptx.a.
Differential Revision: https://reviews.llvm.org/D52701
llvm-svn: 343477
Pass in the correct value of isRuntimeUninitialized() which solves
parallel reductions as reported on the mailing list.
For reference: r333285 did the same for loop scheduling.
Differential Revision: https://reviews.llvm.org/D52725
llvm-svn: 343476
Patch suggested by Kelvin Li: removed optional "kind=" part of kind-selector
for variables with long names and kind names.
Differential Revision: https://reviews.llvm.org/D52712
llvm-svn: 343475
NVPTX requires addresses of pointer locations to be 8-byte aligned
or there will be an exception during runtime.
This could happen without this patch as shown in the added test:
getId() requires 4 byte of stack and putValueInParallel() uses 16
bytes to store the addresses of the captured variables.
Differential Revision: https://reviews.llvm.org/D52655
llvm-svn: 343402
According to OpenMP 4.5, p250:12-14:
If the requested nest level is outside the range of 0 and the
nest level of the current thread, as returned by the omp_get_level
routine, the routine returns -1.
The SPMD code path will need a similar fix.
Differential Revision: https://reviews.llvm.org/D51787
llvm-svn: 343401
Clang trunk will serialize nested parallel regions. Check that this
is correctly reflected in various API methods.
Differential Revision: https://reviews.llvm.org/D51786
llvm-svn: 343382
There is no support and according to the OpenMP 4.5, p238:7-9:
For implementations that do not support dynamic adjustment
of the number of threads this routine has no effect: the
value of dyn-var remains false.
Add a test that cancellation and nested parallelism aren't
supported either.
Differential Revision: https://reviews.llvm.org/D51785
llvm-svn: 343381
If there is no num_threads() clause we must consider the
nthreads-var ICV. Its value is set by omp_set_num_threads()
and can be queried using omp_get_max_num_threads().
The rewritten code now closely resembles the algorithm given
in the OpenMP standard.
Differential Revision: https://reviews.llvm.org/D51783
llvm-svn: 343380
infinite loop on removing non-mapped pointer-with-object.
Added test to check that libomptarget does not cause infinite loop when
trying to unmap the pointer-with-object data that was not previously
mapped.
llvm-svn: 343344
This patch also introduces testing for libomptarget-nvptx
which has been missing until now. I propose to add tests for
all bugs that are fixed in the future.
The target check-libomptarget-nvptx is not run by default because
- we can't determine if there is a GPU plugged into the system.
- it will require the latest Clang compiler. Keeping compatibility
with older releases would prevent testing newer code generation
developed in trunk.
Differential Revision: https://reviews.llvm.org/D51687
llvm-svn: 343324
This patch puts the __kmpc_critical_with_hint function in dllexports
and also replaces some OMP_45_ENABLED to OMP_50_ENABLED
Differential Revision: https://reviews.llvm.org/D52380
llvm-svn: 343143
Balanced affinity only updated the thread's affinity with the operating system.
This change also has the thread's private mask reflect that change as well so
that any API that probes the thread's affinity mask will report the correct
mask value.
Differential Revision: https://reviews.llvm.org/D52379
llvm-svn: 343142
This patch updates the ittnotify sources to the latest
corresponding with Intel(R) VTune(TM) Amplifier 2018
Differential Revision: https://reviews.llvm.org/D52378
llvm-svn: 343139
This change improves the performance of 376.kdtree by giving the compiler an
opportunity to do inlining and other optimizations for the call path,
__kmpc_omp_task_complete_if0()->__kmp_task_finish(), which is one of the hot
paths in the program; some functions in kmp_taskdeps.cpp were moved to the new
header file, kmp_taskdeps.h to achieve this.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D51889
llvm-svn: 343138
This change includes miscellaneous improvements as follows:
1) Added ompt_get_proc_id() implementation for Windows
2) Added parser and print tool for omp-tool-var, just in case it needs
to be printed (OMP_DISPLAY_ENV)
3) omp_control_tool is exported on Windows
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D50538
llvm-svn: 343137
Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer.
Reviewers: ABataev, caomhin, Hahnfeld
Reviewed By: ABataev
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D52122
llvm-svn: 343006
Summary:
There is currently no supported situation where the warp master is not the first thread in the warp.
This also avoids the device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less that the size of a warp.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50188
llvm-svn: 342972
Summary:
We need the support for per-team shared variables to support codegen for
lastprivates/reductions. Patch adds this support by using shared memory
if the total size of the reductions/lastprivates is <= 128 bytes,
then pre-allocated buffer in global memory if size is <= 4K bytes,or
uses malloc/free, otherwise.
Reviewers: gtbercea, kkwli0, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51875
llvm-svn: 342737
Summary:
Missed operation of the incrementing iterator when required just to
continue execution.
Reviewers: kkwli0, gtbercea, grokos
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51937
llvm-svn: 341964
Some types and callback signatures have changed from TR6 to TR7.
Major changes (only adding signatures and stubs):
(-remove idle callback) done by D48362
-add reduction and dispatch callback
-add get_task_memory and finalize_tool runtime entry points
-ompt_invoker_t becomes ompt_parallel_flag_t
-more types of sync_regions
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D50774
llvm-svn: 341834
Add atomic hint flags to the enum.
The hint parameter type was changed to uint32_t in __kmpc_critical_with_hint()
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51235
llvm-svn: 341694
ident flags reserved for atomic hints.
This patch adds omp_sync_hint_t to omp.h and omp_sync_hint_kind to omp_lib.h.
For better maintainability the list of macros for ident flags was replaced with
a enum. The new KMP_IDENT_ATOMIC_HINT_MASK was added to the enum to
support possible future atomic hints.
Also fix omp_lib.h.var to be under 72 chars again after 5.0 OpenMP Memory commit
Patch by Olga Malysheva
Differential Revision: https://reviews.llvm.org/D51233
llvm-svn: 341693
Implemented omp_alloc, omp_free, omp_{set,get}_default_allocator entries,
and OMP_ALLOCATOR environment variable.
Added support for HBW memory on Linux if libmemkind.so library is accessible
(dynamic library only, no support for static libraries).
Only used stable API (hbwmalloc) of the memkind library
though we may consider using experimental API in future.
The ICV def-allocator-var is implemented per implicit task similar to
place-partition-var. In the absence of a requested allocator, the uses the
default allocator.
Predefined allocators (the only ones currently available) are made similar
for C and Fortran, - pointers (long integers) with values 1 to 8.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D51232
llvm-svn: 341687
This is a follow-up to r341371: The new test for PR38704 doesn't
work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that
hasn't worked because the compiler features weren't known to lit.
llvm-svn: 341448
cuDeviceGetProperties has apparently been deprecated since CUDA 5.0.
Nvidia started using annotations only in CUDA 9.2, so nobody noticed
nor cared before.
The new function returns the same values, tested with a P100.
Differential Revision: https://reviews.llvm.org/D51624
llvm-svn: 341372
* cg and HasCancel in WorkDescr were never read and can be removed.
* This eliminates the last use of priv in ThreadPrivateContext.
* CounterGroup is unused afterwards.
* Remove duplicate external declares in omptarget-nvptx.cu that are
already in the header omptarget-nvptx.h.
Differential Revision: https://reviews.llvm.org/D51622
llvm-svn: 341370
If the runtime is uninitialized the master thread must Enqueue the
state object, and ALL threads must return immediately.
Found post-commit of https://reviews.llvm.org/D51222.
llvm-svn: 341328
Summary:
Implemented simple and lightweight runtime support for SPMD mode-based
constructs. It adds support for L2 sequential parallelism wihtout full
runtime support. Also, patch fixes some use cases for
uninitialized|lightweight runtime.
Reviewers: grokos, kkwli0, Hahnfeld, gtbercea
Subscribers: guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51222
llvm-svn: 340944
Summary:
Removed the function that used a lock and varargs
Used the same mechanism as for debug messages
Reviewers: ABataev, gtbercea, grokos, Hahnfeld
Reviewed By: gtbercea, Hahnfeld
Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D51226
llvm-svn: 340767
The __kmp_execute_tasks_template() function reads the task_team and
current_task from the thread structure. There appears to be a pathological
timing where the number of threads in the hot team decreases and so a
thread is put in the pool via __kmp_free_thread(). It could be the case that:
1) A thread reads th_task_team into task_team local variables
and is then interrupted by the OS
2) Master frees the thread and sets current task and task team to NULL
3) The thread reads current_task as NULL
When this happens, current_task is dereferenced and a segfault occurs.
This patch just checks for current_task to not be NULL as well.
Differential Revision: https://reviews.llvm.org/D50651
llvm-svn: 340632
If hot teams are not being used, this code could seg fault without the added
check, and does so when composability is used in conjunction with nesting.
The fix prevents the segfault.
Differential Revision: https://reviews.llvm.org/D50649
llvm-svn: 340629
Exclude nested explicit tasks from timing, only outer level explicit task
counted and its time added to barrier arrive time for the thread.
Differential Revision: https://reviews.llvm.org/D50584
llvm-svn: 340628
Summary:
Right now, only the OMP_TARGET_OFFLOAD=DISABLED was implemented. Added support for the other MANDATORY and DEFAULT values.
Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld
Reviewed By: Hahnfeld
Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits
Differential Revision: https://reviews.llvm.org/D50522
llvm-svn: 340542
The idle callback was removed from the spec as of TR7.
This removes it from the implementation.
Patch provided by Simon Convent
Reviewers: hbae, protze.joachim
Differential Revision: https://reviews.llvm.org/D48362
llvm-svn: 339771
This change fixes an incorrect behavior of the omp_control_tool function when
called from Fortran applications. A tool callback function for this event is
supposed to get NULL for the third argument according to the specification, but
the current implementation just passes a garbage value. A possible fix is to use
the OPTIONAL attribute for the third argument.
Patch by Hansang Bae
Differential Revision: https://reviews.llvm.org/D50565
llvm-svn: 339585