Summary:
The termination function duplicated the functionality of the
__attribute((destructor))-annotated function __kmp_internal_end_fini,
and we have no indication that this doesn't work.
The function might cause issues with link-time optimization turned on:
until very recently, none of the usual linkers was reporting functions
named in -Wl,-fini as used to the LTO plugin, so it might be dropped.
If the function is dropped, -Wl,-fini=__kmp_internal_end_fini doesn't
do what we want: with ld.bfd and lld it drops the FINI attribute from
.dynamic and with gold we get FINI = 0x0, which leads to a crash on
cleanup. This can be reproduced by building with
-DLLVM_ENABLE_PROJECTS="clang;openmp" \
-DLLVM_ENABLE_LTO=Thin \
-DLLVM_USE_LINKER=gold
The issue in lld has been fixed in f95273f75a, but gold remains without
fix so far.
Fixes PR43927.
Reviewers: JonChesterfield, jdoerfert, AndreyChurbanov
Reviewed By: AndreyChurbanov
Differential Revision: https://reviews.llvm.org/D69927
Summary:
[libomptarget][nfc] Move some source into common from nvptx
Moves some source that compiles cleanly under amdgcn into a common subdirectory
Includes some non-trivial files and some headers. Keeps the cuda file extension.
The build systems for different architectures seem unlikely to have much in
common. The idea is therefore to set include paths such that files under
common/src compile as if they were under arch/src as the mechanism for sharing.
In particular, files under common/src need to be able to include target_impl.h.
The corresponding -Icommon is left out in favour of explicit includes on the
basis that the it makes it clearer which files under common are used by a given
architecture.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: ABataev
Subscribers: jfb, mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70328
The tool provides TSAN annotations for OpenMP synchronization. The tool
is activated if no other OMPT tool is loaded.
The tool detects whether the application was built with TSan and rejects
activation according to the OMPT protocol if there is no TSan-rt.
Differential Revision: https://reviews.llvm.org/D45890
Summary:
[libomptarget][nfc] Use cuda variable wrappers from support.h
Reimplementation of D69693, after the revert of D69885
Use the wrappers in support.h for cuda builtin variables at all call sites.
Localises use of cuda and removes WARPSIZE==32 assumption in debug.h.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70186
Summary:
[libomptarget] Move supporti.h to support.cu
Reimplementation of D69652, without the unity build and refactors.
Will need a clean build of libomptarget as the cmakelists changed.
Reviewers: ABataev, jdoerfert
Reviewed By: jdoerfert
Subscribers: mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D70131
Summary:
[libomptarget] Revert all improvements to support
The change to unity build for nvcc has broken the build for some developers.
This patch reverts to a known-working state.
There has been some confusion over exactly how the build broke. I think we
have reached a common understanding that the disappearing symbols are from
the bitcode library built by clang. The static archive built by nvcc may show the
same problem. Some of the confusion arose from building the deviceRTL twice
and using one or the other library based on various environmental factors.
I'm pretty sure the problem is clang expanding `__forceinline__` into both `__inline__`
and `attribute(("always_inline"))`. The `__inline__` attribute resolves to linkonce_odr
which is not safe for exporting symbols from translation units.
"always_inline" is the desired semantic for small functions defined in one translation
unit that are intended to be inlined at link time. "inline" is not.
This therefore reintroduces the dependency hazard of supporti.h and some code
duplication, and blocks progress separating deviceRTL into reusable components.
See also D69857, D69859 for attempts at a fix instead of a revert.
Reviewers: ABataev, jdoerfert, grokos, ikitayama, tianshilei1992
Reviewed By: ABataev
Subscribers: mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69885
Summary:
[libomptarget] Implement target_impl for amdgcn
Smallest atomic addition for a new target. Implements enough of the amdgcn
specific code that some of the source files under nvptx/src could be compiled,
without modification, to run on amdgcn.
This foreshadows a work in progress patch to move said source out of nvptx/src.
Patch based on fork at https://github.com/ROCm-Developer-Tools/llvm-project
Reviewers: ABataev, jdoerfert, grokos, ronlieb
Subscribers: jvesely, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69718
Summary:
[nfc][omptarget] Use builtin var abstraction. Second pass at D69476
Use the wrappers in support.h for cuda builtin variables at all call sites.
Localises use of cuda and removes WARPSIZE==32 assumption in debug.h.
Reviewers: ABataev, jdoerfert, grokos
Reviewed By: jdoerfert
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69693
Summary:
[nfc][libomptarget] Reorganise support header
All functions defined in support implementation are now declared in support.h
Reordered functions in support implementation to match the sequence in support.h
Added include guards to support.h
Added #include interface to support.h to provide kmp_Ident declaration
Move supporti.h to support.cu and s/INLINE/EXTERN/g
Add remaining includes to support.cu
A minor side effect is to change the name mangling of the support functions to
extern "C". If this matters another macro along the lines of INLINE/EXTERN
can be added - perhaps DEVICE as that's the obvious implementation.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: jdoerfert
Subscribers: mgorny, jfb, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69652
Summary:
[libomptarget] Change nvcc compilation to use a unity build
This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).
This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.
Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers
Reviewed By: jdoerfert
Subscribers: mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69489
Summary:
[libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper
NFC for release, adds some verbosity to debug printing. Motivation is to provide
one place where local modifications can be made to the behaviour of all heap
allocation or deallocation while debugging.
Reviewers: jdoerfert, ABataev, grokos
Reviewed By: ABataev
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69492
Summary:
[nfc][libomptarget] Decrease coupling between files
debug.h used the symbol omptarget_device_environment so implicitly required
an include of omptarget-nvptx.h to compile. Similarly interface.h uses size_t.
Moving this declaration to a new header means cancel, critical can now build
without omptarget-nvptx.h. After this change, debug.h, cancel.cu, critical.cu
could move under a common source directory.
Reviewers: ABataev, jdoerfert, grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69473
Summary:
[nfc][libomptarget] Inline option into target_impl
Subset of D69423. The macros that were in option.h are all target dependent.
Inlining the header simplifies the dependency graph when looking to move code
into a common subdir.
Reviewers: ABataev, jdoerfert, grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69472
Summary:
[NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h
Strictly there is one remaining difference wrt amdgcn - parallelLevel is
volatile qualified on amdgcn and not on nvptx. Determining whether this is
correct - and how to represent the different semantics of 'volatile' under
various conditions - is beyond the scope of this code motion patch.
Reviewers: ABataev, jdoerfert, grokos
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D69424
Details:
- nconflicts field initialized;
- formatting fix (moved declaration out of the long line);
- count conflicts in new hash as opposed to old one.
Differential Revision: https://reviews.llvm.org/D68036
Summary:
[libomptarget][nfc] Make interface.h target independent
Move interface.h under a top level include directory.
Remove #includes to avoid the interface depending on the implementation.
Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
Reviewed By: jdoerfert
Subscribers: mgorny, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D68615
llvm-svn: 374919
Summary:
[libomptarget][nfc] Update remaining uint32 to use lanemask_t
Update a few functions in the API to use lanemask_t instead of i32. NFC for
nvptx. Also update the ActiveThreads type in DataSharingStateTy.
This removes a lot of #ifdef from the downsteam amdgcn implementation.
Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D68513
llvm-svn: 373806
/proc unless Linux layer compatibility is activated for CentOS is activated is not present
thus relying on a more native for checking the address.
Reviewers: Hahnfeld, kongyl, jdoerfert, jlpeyton, AndreyChurbanov, emaster, dim
Reviewed By: Hahnfeld
Differential Revision: https://reviews.llvm.org/D67326
llvm-svn: 373152
Linker automatically provides __start_<section name> and __stop_<section name> symbols to satisfy unresolved references if <section name> is representable as a C identifier (see https://sourceware.org/binutils/docs/ld/Input-Section-Example.html for details). These symbols indicate the start address and end address of the output section respectively. Therefore, renaming OpenMP offload entries section name from ".omp.offloading_entries" to "omp_offloading_entries" to use this feature.
This is the first part of the patch for eliminating OpenMP linker script (please see https://reviews.llvm.org/D64943).
Differential Revision: https://reviews.llvm.org/D68070
llvm-svn: 373118
There's no need to initialize variables with static storage duration
because they're implicitly initialized to zero. See
https://en.cppreference.com/w/c/language/initialization#Implicit_initialization
I think that's already relied upon because the supplied 0 only sets
'kmp_time_global_t g_time;' in 'struct kmp_base_global'. The other fields
are not set in the code, but implicitly initialized by the compiler.
Differential Revision: https://reviews.llvm.org/D66292
llvm-svn: 370943
Summary:
In non-SPMD mode we may end up with the divergent threads when trying to
increment/decrement parallel level counter. It may lead to incorrect
calculations of the parallel level and wrong results when threads are
divergent. We need to reconverge the threads before trying to modify the
parallel level counter.
Reviewers: grokos, jdoerfert
Subscribers: guansong, openmp-commits, caomhin, kkwli0
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66802
llvm-svn: 370803
Summary:
Use target_impl functions to replace more inline asm
Follow on from D65836. Removes remaining asm shuffles and lanemask accessors
Also changes the types of target_impl bitwise functions to unsigned.
Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66809
llvm-svn: 370216
Summary:
[libomptarget] Refactor syncthreads macro to inline function
See also abandoned D66846, split into this diff and others.
Rev 2 of D66855
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66861
llvm-svn: 370210
Summary:
[libomptarget] Refactor syncwarp macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66857
llvm-svn: 370149
Summary:
[libomptarget] Refactor shfl_down_sync macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66853
llvm-svn: 370146
Summary:
[libomptarget] Refactor shfl_sync macro to inline function
See also abandoned D66846, split into this diff and others.
Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66852
llvm-svn: 370144
Summary:
Added function void __kmpc_syncwarp(int32_t) to expose it to the
compiler. It is required to fix the problem with the critical regions in
Cuda9.0+. We cannot use barrier in the critical region, but still need
to reconverge the threads in the warp after. This function allows to do
this.
Reviewers: grokos, jdoerfert
Subscribers: guansong, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D66672
llvm-svn: 369933
Summary:
In Cuda 9.0 it is not guaranteed that threads in the warps are
convergent. We need to use __syncwarp() function to reconverge
the threads and to guarantee the memory ordering among threads in the
warps.
This is the first patch to fix the problem with the test
libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+.
This patch just replaces calls to __shfl_sync() function with the call
of __syncwarp() function where we need to reconverge the threads when we
try to modify the value of the parallel level counter.
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65013
llvm-svn: 369796
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=42906, via adding
adjustment of number of threads on enter to the teams construct on host
according to user settings. This allows to pass checks and avoid assertions
at time of team of threads creation.
Patch by Andrey Churbanov
Differential Revision: https://reviews.llvm.org/D66351
llvm-svn: 369430
Fix last warned location in ittnotify_static.cpp using the defined
macro KMP_FALLTHROUGH().
Differential Revision: https://reviews.llvm.org/D65871
llvm-svn: 369003
The variables in kmp_lock.cpp are really arrays of function pointers
that return void or int, not pointers to functions that return void*
or int*. The other changes are only cosmetic.
Differential Revision: https://reviews.llvm.org/D65870
llvm-svn: 369002
The implementation status can only be one of
ompt_event_UNIMPLEMENTED = ompt_set_never = 1
ompt_event_MAY_ALWAYS = ompt_set_always = 5
In both cases, the condition was already true, so just remove
the check.
Differential Revision: https://reviews.llvm.org/D65869
llvm-svn: 369001
Instead, maintain a list of disabled options to still build libomp and
libomptarget without warnings. This includes -Wno-error and -Wno-pedantic
to silence warnings that LLVM enables when building in-tree.
I tested the following compilers:
* Clang 6.0, 7.0, 8.0
* GCC 4.8.5 (CentOS 7), GCC 6, 7, 8, 9
* Intel Compiler 16, 17, 18, 19
RFC thread on openmp-dev mailing list:
http://lists.llvm.org/pipermail/openmp-dev/2019-August/002668.html
Differential Revision: https://reviews.llvm.org/D65867
llvm-svn: 368999
Summary:
[libomptarget] Factor architecture dependent code out of loop.cu
Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope.
This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn.
Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob
Reviewed By: jdoerfert, ABataev
Subscribers: dexonsmith, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65836
llvm-svn: 368751
This patch fixes problem raised in post-review comments of the
https://reviews.llvm.org/D65285. Developers of ittnotify confirmed
that dll_path_ptr field of the __itt_global structure is never used
by ittnotify library, so it is safe to remove the dll_path array.
Differential Revision: https://reviews.llvm.org/D65885
llvm-svn: 368559
Summary:
This patch adds support for the close map modifier.
The close map modifier will overwrite the unified shared memory requirement and create a device copy of the data.
Reviewers: ABataev, Hahnfeld, caomhin, grokos, jdoerfert, AlexEichenberger
Reviewed By: Hahnfeld, AlexEichenberger
Subscribers: guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65340
llvm-svn: 368488
We have one global RTLs.RequiresFlags, I don't see a need to make a
copy per device that the runtime manages. This was problematic anyway
because the copy happened during the first __tgt_register_lib(). This
made it impossible to call __tgt_register_requires() from normal user
funtions for testing.
Hence, this change also fixes unified_shared_memory/shared_update.c for
older versions of Clang that don't call __tgt_register_requires() before
__tgt_register_lib().
Differential Revision: https://reviews.llvm.org/D66019
llvm-svn: 368465
Summary:
This patch adds support for using unified memory in the case of regular maps that happen when a target region is offloaded to the device.
For cases where only a single version of the data is required then the host address can be used. When variables need to be privatized in any way or globalized, then the copy to the device is still required for correctness.
Reviewers: ABataev, jdoerfert, Hahnfeld, AlexEichenberger, caomhin, grokos
Reviewed By: Hahnfeld
Subscribers: mgorny, guansong, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65001
llvm-svn: 368192
Summary:
[libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library
Suggested in D65836
Reviewers: ABataev, jdoerfert, grokos, gregrodgers
Subscribers: openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65876
llvm-svn: 368177
New OMPT tests with teams construct should be disabled for GCC as it
emits code with a GOMP entry not supported in the LLVM runtime.
Differential Revision: https://reviews.llvm.org/D65757
llvm-svn: 367939
Ensures that CUDA fail reasons (such as "No CUDA-capable device detected")
are printed together with libomptarget's debug message
(e.g. "Error when setting CUDA context"). Previously, the former was
printed only in CMAKE_BUILD_TYPE=Debug builds while the latter was
enabled by LIBOMPTARGET_ENABLE_DEBUG.
With this change, also only call cuGetErrorString when the error will be
printed.
Suggested-by: Ye Luo <xw111luoye@gmail.com>
Differential Revision: https://reviews.llvm.org/D65687
llvm-svn: 367910
This patch implements the libomptarget runtime interface for OpenMP 5.0
declare mapper functions. The declare mapper functions generated by
Clang will call them to complete the mapping of members.
kmpc_mapper_num_components gets the current number of components for a
user-defined mapper; kmpc_push_mapper_component pushes back one
component for a user-defined mapper.
The design slides can be found at
https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
Patch by Lingda Li <lildmh@gmail.com>
Differential Revision: https://reviews.llvm.org/D60972
llvm-svn: 367772
All other files are already C++ and the build system has always
passed '-x c++' for C files, effectively compiling them as C++.
To stay warning free we need one fix in ittnotify_static.{c,cpp}:
The variable dll_path can be written to, so it must not be const.
GCC complained with -Wcast-qual and I think it's right.
Differential Revision: https://reviews.llvm.org/D65285
llvm-svn: 367343
Round the stack size to a multiple of the page size. Older versions of
Android (until KitKat) would fail pthread_attr_setstacksize with
EINVAL if the stack size was not a multiple of the page size.
Patch by Dan Albert <danalbert@google.com>.
Test: Build, copied into the NDK, passed openmp test on ICS.
Bug: https://github.com/android-ndk/ndk/issues/9
llvm-svn: 367070
Both Clang and GCC complained that they cannot initialize a return
object of type 'kmp_proc_bind_t' with an 'int'. While at it, also
fix a warning about missing parentheses thrown by Clang.
Differential Revision: https://reviews.llvm.org/D65284
llvm-svn: 367041
Summary:
According to the OpenMP standard, barrier operation must perform
implicit flush operation. Currently, if there is only one thread in the
team, barrier does not flush the memory. Patch fixes this problem.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62398
llvm-svn: 367024
This is a port of libomp for the RISC-V 64-bit Linux target.
We have tested this port on a HiFive Unleashed development board
using a downstream LLVM that has support for the missing bits in
upstream. As of now, all tests are passing, including OMPT.
Patch by Ferran Pallarès!
Differential Revision: https://reviews.llvm.org/D59880
llvm-svn: 367021
If the first target region in a program calls the push_tripcount
function, libomptarget didn't handle the offload policy correctly.
This could lead to unexpected error messages as seen in
http://lists.llvm.org/pipermail/openmp-dev/2019-June/002561.html
To solve this, add a check calling IsOffloadDisabled() as all other
entry points already do. If this method returns false, libomptarget
is effectively disabled.
Differential Revision: https://reviews.llvm.org/D64626
llvm-svn: 366810
This is done at call-site and does not need to be handled in
__kmp_invoke_microtask. It was already absent from the x86
and x86_64 assembly, this patch removes it from the generic
implementation in z_Linux_util.cpp and adds documentation for
AArch64 and PPC64 that it's actually not needed. I can't test
on these architectures, so I don't want to change the code just
because it looks right :)
While at it, rename some variables for consistency and add a
check in test/ompt/parallel/normal.c that the pointer was reset
before entering the barrier.
Differential Revision: https://reviews.llvm.org/D64442
llvm-svn: 366721
Remove loopTripCnt from threaded device stack after consuming it.
Added a libomptarget DP message to aid in future debugging and to
validate the added testcase, which only runs in Debug build.
Differential Revision: https://reviews.llvm.org/D64808
llvm-svn: 366349
This is a follow up patch to D64534 (r365963) which removed all OMP
spec versioning within the OpenMP runtime codebase. This patch removes
REQUIRES: openmp-x.y lines from lit tests.
llvm-svn: 366341
This leads to problems when compiling C++ code with libc++ for Nvidia GPUs
because Clang now uses wrappers for math functions that might include
C++ templates not allowed in 'extern "C"'.
Differentiel Revision: https://reviews.llvm.org/D64625
llvm-svn: 366229
Summary:
We used CUDART_VERSION macro to check for the installed cuda version
but this macro is defined in cuda_runtime_api.h, which is not used by
project. Better to use CUDA_VERSION macro, which is defined in cuda.h.
Also, added the check if this macro is defined. If macro is undefined,
there is something wrong with the cuda configuration and we should not
continue the compilation.
This also fixes problems with runtime building in cuda 10+.
Reviewers: grokos
Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64648
llvm-svn: 366224
Summary:
We used to call __kmpc_omp_taskwait function with global threadid set to
0. It may crash the application at the runtime if the thread executing
target region is not a master thread.
Reviewers: grokos, kkwli0
Subscribers: guansong, jdoerfert, caomhin, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64571
llvm-svn: 366220
Remove all older OMP spec versioning from the runtime and build system.
Patch by Terry Wilmarth
Differential Revision: https://reviews.llvm.org/D64534
llvm-svn: 365963
These entry points are never called by Clang trunk nor clang-ykt. If
XL doesn't use them either, they can finally go away.
Differential Revision: https://reviews.llvm.org/D52700
llvm-svn: 365817
Summary:
__kmpc_push_tripcount function is not thread safe and may lead to data
race when the target regions are executed in parallel threads. The patch
makes loopTripCnt counter thread aware and stores the tripcount value
per thread in the map. Access to map is guarded by mutex to prevent
data race in the map itself.
Test is for NVPTX target because it does not work correctly on the
host. Seems to me, there is a problem in libomp with target regions in
the parallel threads.
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D64080
llvm-svn: 365332
Summary:
According to the OpenMP standard, flush makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.
__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.
Reviewers: grokos, gtbercea, kkwli0
Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D62397
llvm-svn: 364572
Bug reported in https://bugs.llvm.org/show_bug.cgi?id=42269.
Freeing of the contention group (CG) stucture by master thread looks wrong,
because workers can leave the CG later on. Intead the freeing
is now done by the last thread leaving the CG.
Differential Revision: https://reviews.llvm.org/D63599
llvm-svn: 364456
Summary:
This patch adds support for handling variables under the:
```
#pragma omp declare target to()
```
clause when the
```
#pragma omp requires unified_shared_memory
```
is used.
The address of the host variable is copied into the device pointer just like for the declare target link case.
Reviewers: ABataev, caomhin, grokos, AlexEichenberger
Reviewed By: grokos
Subscribers: jcownie, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63106
llvm-svn: 363825
Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.
Reviewers: grokos
Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D63515
llvm-svn: 363807
Summary: This patch enables the usage of a host variable on the device for declare target link variables when unified memory is available.
Reviewers: ABataev, caomhin, grokos
Reviewed By: grokos
Subscribers: Hahnfeld, guansong, jdoerfert, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D60884
llvm-svn: 362505
Made type of depth of hwloc object to correapond with
change from unsigned in hwloc 1,x to int in hwloc 2.x.
This eliminates the warning on signed-unsigned comparison.
Differential Revision: https://reviews.llvm.org/D62332
llvm-svn: 362401
Current parsing allows trailing string after the permitted value,
MANDATORY|DISABLED|DEFAULT -- e.g., "mandatorynot" is also recognized
as "MANDATORY". Such cases should be recognized as incorrect/unknown
value.
Differential Revision: https://reviews.llvm.org/D62431
llvm-svn: 362125
This change adds checks before dereferencing a pointer returned from a
function.
Differential Revision: https://reviews.llvm.org/D62224
llvm-svn: 362111
The omp_taskloop_num_tasks and omp_taskwait have deadlooped
on the NetBSD buildbot previously, practically hanging the host running
it. Disable them until we can find a good solution, or make the kernel
less fragile.
llvm-svn: 361825