Commit Graph

264 Commits

Author SHA1 Message Date
Alexey Bataev e57f8ad914 [LIBOMPTARGET]Call GetLaneId function, do not use its address in debug
log functions.
2019-11-01 09:43:47 -04:00
JonChesterfield 9b06ac98d0 [nfc][omptarget] Use builtin var abstraction. Second pass at D69476
Summary:
[nfc][omptarget] Use builtin var abstraction. Second pass at D69476

Use the wrappers in support.h for cuda builtin variables at all call sites.
Localises use of cuda and removes WARPSIZE==32 assumption in debug.h.

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69693
2019-11-01 02:21:44 +00:00
JonChesterfield 764c8420e4 [nfc][libomptarget] Reorganise support header
Summary:
[nfc][libomptarget] Reorganise support header

All functions defined in support implementation are now declared in support.h
Reordered functions in support implementation to match the sequence in support.h
Added include guards to support.h
Added #include interface to support.h to provide kmp_Ident declaration
Move supporti.h to support.cu and s/INLINE/EXTERN/g
Add remaining includes to support.cu

A minor side effect is to change the name mangling of the support functions to
extern "C". If this matters another macro along the lines of INLINE/EXTERN
can be added - perhaps DEVICE as that's the obvious implementation.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: mgorny, jfb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69652
2019-10-31 17:15:02 +00:00
Jon Chesterfield e9f9dfab82 [libomptarget] Change nvcc compilation to use a unity build
Summary:
[libomptarget] Change nvcc compilation to use a unity build

This allows nvcc to inline functions between what would otherwise be distinct
translation units, which in turn removes any runtime cost from implementing
functions in source files (as opposed to inline in headers).

This will then allow the circular dependencies in deviceRTL to be readily
broken and individual components more easily shared between architectures.

Reviewers: ABataev, jdoerfert, grokos, RaviNarayanaswamy, hfinkel, ronlieb, gregrodgers

Reviewed By: jdoerfert

Subscribers: mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69489
2019-10-31 01:58:51 +00:00
Jon Chesterfield 8548e2f543 [nfc][libomptarget] Move named_sync() into target_impl
Summary: [nfc][libomptarget] Move named_sync() into target_impl

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: ABataev

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69487
2019-10-30 16:25:05 +00:00
Jon Chesterfield 74bb5ee674 [nfc][libomptarget] Move smid() into target_impl
Summary: [nfc][libomptarget] Move smid() into target_impl

Reviewers: ABataev, jdoerfert, grokos

Reviewed By: ABataev

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69485
2019-10-30 13:39:15 +00:00
Jon Chesterfield 62a161cc00 [libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper
Summary:
[libomptarget] Always call malloc, free via SafeMalloc, SafeFree wrapper

NFC for release, adds some verbosity to debug printing. Motivation is to provide
one place where local modifications can be made to the behaviour of all heap
allocation or deallocation while debugging.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: ABataev

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69492
2019-10-30 13:35:34 +00:00
Alexey Bataev d7941a6ab9 [LIBOMPTARGET]Fix build, NFC.
Need to include nvptx_interface.h in target_impl.h, otherwise the build
is failed because of missing __kmpc_impl_lanemask_t type.
2019-10-28 10:43:00 -04:00
Jon Chesterfield 174967f153 [nfc][libomptarget] Decrease coupling between files
Summary:
[nfc][libomptarget] Decrease coupling between files

debug.h used the symbol omptarget_device_environment so implicitly required
an include of omptarget-nvptx.h to compile. Similarly interface.h uses size_t.

Moving this declaration to a new header means cancel, critical can now build
without omptarget-nvptx.h. After this change, debug.h, cancel.cu, critical.cu
could move under a common source directory.

Reviewers: ABataev, jdoerfert, grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69473
2019-10-27 14:27:54 +00:00
Jon Chesterfield ad4c42666d [nfc][libomptarget] Inline option into target_impl
Summary:
[nfc][libomptarget] Inline option into target_impl

Subset of D69423. The macros that were in option.h are all target dependent.
Inlining the header simplifies the dependency graph when looking to move code
into a common subdir.

Reviewers: ABataev, jdoerfert, grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69472
2019-10-27 14:26:55 +00:00
Jon Chesterfield f7c3c640af [NFC][libomptarget]Remove TRUE,FALSE macros from option.h
Summary:
[NFC][libomptarget]Remove TRUE,FALSE macros from option.h
Subset of D69423. Patch series ends with removing option.h.

Reviewers: ABataev, jdoerfert, grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69463
2019-10-27 01:31:12 +01:00
Jon Chesterfield 197b7b24c3 [NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h
Summary:
[NFC][libomptarget] move remaining device specific code out of omptarget-nvptx.h

Strictly there is one remaining difference wrt amdgcn - parallelLevel is
volatile qualified on amdgcn and not on nvptx. Determining whether this is
correct - and how to represent the different semantics of 'volatile' under
various conditions - is beyond the scope of this code motion patch.

Reviewers: ABataev, jdoerfert, grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D69424
2019-10-25 18:58:31 +01:00
Jon Chesterfield d69d1aa131 [libomptarget][nfc] Make interface.h target independent
Summary:
[libomptarget][nfc] Make interface.h target independent

Move interface.h under a top level include directory.
Remove #includes to avoid the interface depending on the implementation.

Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy

Reviewed By: jdoerfert

Subscribers: mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D68615

llvm-svn: 374919
2019-10-15 17:15:26 +00:00
Jon Chesterfield 58fd6b5b9c [libomptarget][nfc] Update remaining uint32 to use lanemask_t
Summary:
[libomptarget][nfc] Update remaining uint32 to use lanemask_t

Update a few functions in the API to use lanemask_t instead of i32. NFC for
nvptx. Also update the ActiveThreads type in DataSharingStateTy.
This removes a lot of #ifdef from the downsteam amdgcn implementation.

Reviewers: ABataev, jdoerfert, grokos, ronlieb, RaviNarayanaswamy

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D68513

llvm-svn: 373806
2019-10-04 22:30:28 +00:00
Jon Chesterfield 4f75a73796 Use named constant to indicate all lanes, to handle 32 and 64 wide architectures
Summary: Use named constant to indicate all lanes, to handle 32 and 64 wide architectures

Reviewers: ABataev, jdoerfert, grokos, ronlieb

Reviewed By: grokos

Subscribers: ronlieb, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D68369

llvm-svn: 373793
2019-10-04 21:39:22 +00:00
Alexey Bataev 4812941776 [OPENMP][NVPTX]Fix parallel level counter in non-SPMD mode.
Summary:
In non-SPMD mode we may end up with the divergent threads when trying to
increment/decrement parallel level counter. It may lead to incorrect
calculations of the parallel level and wrong results when threads are
divergent. We need to reconverge the threads before trying to modify the
parallel level counter.

Reviewers: grokos, jdoerfert

Subscribers: guansong, openmp-commits, caomhin, kkwli0

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66802

llvm-svn: 370803
2019-09-03 18:11:50 +00:00
Jon Chesterfield bbdd282371 [libomptarget] Refactor activemask macro to inline function
Summary:
[libomptarget] Refactor activemask macro to inline function
See also abandoned D66846, split into this diff and others.

Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers

Reviewed By: jdoerfert, ABataev

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66851

llvm-svn: 370781
2019-09-03 16:31:30 +00:00
Jon Chesterfield 3294421926 Use target_impl functions to replace more inline asm
Summary:
Use target_impl functions to replace more inline asm
Follow on from D65836. Removes remaining asm shuffles and lanemask accessors
Also changes the types of target_impl bitwise functions to unsigned.

Reviewers: jdoerfert, ABataev, grokos, Hahnfeld, gregrodgers, ronlieb, hfinkel

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66809

llvm-svn: 370216
2019-08-28 15:04:06 +00:00
Jon Chesterfield 80f9a38a76 [libomptarget] Refactor syncthreads macro to inline function
Summary:
[libomptarget] Refactor syncthreads macro to inline function
See also abandoned D66846, split into this diff and others.

Rev 2 of D66855

Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66861

llvm-svn: 370210
2019-08-28 14:22:35 +00:00
Jon Chesterfield be3d487313 [libomptarget] Refactor syncwarp macro to inline function
Summary:
[libomptarget] Refactor syncwarp macro to inline function
See also abandoned D66846, split into this diff and others.

Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66857

llvm-svn: 370149
2019-08-28 02:02:53 +00:00
Jon Chesterfield e73e3013a6 Fix build break due to close brace lost in merge
llvm-svn: 370148
2019-08-28 01:56:26 +00:00
Jon Chesterfield 327aa81123 [libomptarget] Refactor shfl_down_sync macro to inline function
Summary:
[libomptarget] Refactor shfl_down_sync macro to inline function
See also abandoned D66846, split into this diff and others.

Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66853

llvm-svn: 370146
2019-08-28 01:47:41 +00:00
Jon Chesterfield b9b712df82 [libomptarget] Refactor shfl_sync macro to inline function
Summary:
[libomptarget] Refactor shfl_sync macro to inline function
See also abandoned D66846, split into this diff and others.

Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66852

llvm-svn: 370144
2019-08-28 01:31:04 +00:00
Alexey Bataev da8b5cc9f1 [OPENMP][NVPTX]Add __kmpc_syncwarp(int32_t) function.
Summary:
Added function void __kmpc_syncwarp(int32_t) to expose it to the
compiler. It is required to fix the problem with the critical regions in
Cuda9.0+. We cannot use barrier in the critical region, but still need
to reconverge the threads in the warp after. This function allows to do
this.

Reviewers: grokos, jdoerfert

Subscribers: guansong, openmp-commits, kkwli0, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D66672

llvm-svn: 369933
2019-08-26 17:32:45 +00:00
Alexey Bataev 0366168f3a [OPENMP][NVPTX]Use __syncwarp() to reconverge the threads.
Summary:
In Cuda 9.0 it is not guaranteed that threads in the warps are
convergent. We need to use __syncwarp() function to reconverge
the threads and to guarantee the memory ordering among threads in the
warps.
This is the first patch to fix the problem with the test
libomptarget/deviceRTLs/nvptx/src/sync.cu on Cuda9+.
This patch just replaces calls to __shfl_sync() function with the call
of __syncwarp() function where we need to reconverge the threads when we
try to modify the value of the parallel level counter.

Reviewers: grokos

Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D65013

llvm-svn: 369796
2019-08-23 18:34:48 +00:00
Jon Chesterfield ed3324f6b6 Factor architecture dependent code out of loop.cu
Summary:
[libomptarget] Factor architecture dependent code out of loop.cu

Related to the patch series starting D64217. Added subscribers to said series as reviewers. This effort is smaller in scope.

This patch factors out just enough architecture dependent code from loop.cu to allow the same source to be used with amdgcn, given a different target_impl.h. Testing is that the same bitcode (modulo variable names) is generated for libomptarget before and after the refactor, for nvptx and the out of tree amdgcn.

Reviewers: jdoerfert, ABataev, bollu, jfb, tra, grokos, Hahnfeld, guansong, xtian, gregrodgers, ronlieb, hfinkel, gtbercea, guraypp, arpith-jacob

Reviewed By: jdoerfert, ABataev

Subscribers: dexonsmith, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D65836

llvm-svn: 368751
2019-08-13 21:41:47 +00:00
Jon Chesterfield ae0178bee7 Use forceinline. Necessary for nvcc to inline small functions within the bitcode library
Summary:
[libomptarget] Use forceinline. Necessary for nvcc to inline small functions within the bitcode library
Suggested in D65836

Reviewers: ABataev, jdoerfert, grokos, gregrodgers

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D65876

llvm-svn: 368177
2019-08-07 15:24:12 +00:00
Alexey Bataev ca424d100c [OPENMP][NVPTX]Perform memory flush if number of threads to sync is 1 or less.
Summary:
According to the OpenMP standard, barrier operation must perform
implicit flush operation. Currently, if there is only one thread in the
team, barrier does not flush the memory. Patch fixes this problem.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D62398

llvm-svn: 367024
2019-07-25 15:02:28 +00:00
Alexey Bataev 85b9651edd [OPENMP][NVPTX]Fixed checks for cuda versions.
Summary:
We used CUDART_VERSION macro to check for the installed cuda version
but this macro is defined in cuda_runtime_api.h, which is not used by
project. Better to use CUDA_VERSION macro, which is defined in cuda.h.
Also, added the check if this macro is defined. If macro is undefined,
there is something wrong with the cuda configuration and we should not
continue the compilation.
This also fixes problems with runtime building in cuda 10+.

Reviewers: grokos

Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D64648

llvm-svn: 366224
2019-07-16 16:07:10 +00:00
Jonas Hahnfeld 2dfc5179f6 [libomptarget-nvptx] Remove dead functions
These entry points are never called by Clang trunk nor clang-ykt. If
XL doesn't use them either, they can finally go away.

Differential Revision: https://reviews.llvm.org/D52700

llvm-svn: 365817
2019-07-11 20:12:51 +00:00
Alexey Bataev 060921dee7 [OPENMP]Make __kmpc_push_tripcount thread safe.
Summary:
__kmpc_push_tripcount function is not thread safe and may lead to data
race when the target regions are executed in parallel threads. The patch
makes loopTripCnt counter thread aware and stores the tripcount value
per thread in the map. Access to map is guarded by mutex to prevent
data race in the map itself.
Test is for NVPTX target because it does not work correctly on the
host. Seems to me, there is a problem in libomp with target regions in
the parallel threads.

Reviewers: grokos

Subscribers: guansong, jfb, jdoerfert, openmp-commits, kkwli0, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D64080

llvm-svn: 365332
2019-07-08 15:30:23 +00:00
Alexey Bataev bb55ece269 [OPENMP][NVPTX]Relax flush directive.
Summary:
According to the OpenMP standard, flush  makes a thread’s temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.

According to the Cuda toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#memory-fence-functions), __threadfence() functions provides required functionality.

__threadfence_system() also provides required functionality, but it also
includes some extra functionality, like synchronization of page-locked
host memory, synchronization for the host, etc. It is not required per
the standard and we can use more relaxed version of memory fence
operation.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D62397

llvm-svn: 364572
2019-06-27 18:33:09 +00:00
Alexey Bataev 8a2bd361eb [OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0.
Summary:
The problems with __syncthreads() were fixed in clang >= 9.0 and the
original __syncthreads() can be used instead of the ptx instruction.

Reviewers: grokos

Subscribers: guansong, jdoerfert, openmp-commits, kkwli0, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D63515

llvm-svn: 363807
2019-06-19 14:20:34 +00:00
Alexey Bataev e1947b84c1 Revert "[OPENMP][NVPTX]Fix barriers and parallel level counters, NFC."
This reverts commit r361421 to split the patch into 3 parts.

llvm-svn: 361638
2019-05-24 14:06:47 +00:00
Alexey Bataev 9d9e406684 [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC.
Summary:
Parallel level counter should be volatile to prevent some dangerous
optimiations by the ptxas. Otherwise, ptxas optimizations lead to
undefined behaviour in some cases.
Also, use __threadfence() for #pragma omp flush and if the barrier
should not be used (we have only one thread in the team), still perform
flush operation since the standard requires implicit flush when
executing barriers.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D62199

llvm-svn: 361421
2019-05-22 19:50:32 +00:00
Alexey Bataev f9e00db818 [OPENMP][NVPTX]Simplify handling of thread limit, NFC.
Summary:
Patch improves performance of the full runtime mode by moving
threads limit counter to the shared memory. It also allows to save
global memory.

Reviewers: grokos, kkwli0, gtbercea

Subscribers: guansong, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61801

llvm-svn: 360584
2019-05-13 14:21:46 +00:00
Alexey Bataev f62c266de7 [OPENMP][NVPTX]Improve number of threads counter, NFC.
Summary:
Patch improves performance of the full runtime mode by moving
number-of-threads counter to the shared memory. It also allows to save
global memory.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jfb, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61785

llvm-svn: 360457
2019-05-10 18:56:05 +00:00
Alexey Bataev a857e31011 [OPENMP][NVPTX]Improve thread limit counter, NFC.
Summary:
Patch improves performance of the full runtime mode by moving
thread-limit counter to the shared memory. It also allows to save
global memory.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jdoerfert, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61526

llvm-svn: 359922
2019-05-03 20:00:38 +00:00
Alexey Bataev e031e17919 [OPENMP][NVPTX]Improved several standard OpenMP functions, NFC.
Summary:
Used parallelLevel[] counter to simplify and improve implementation of
the existing standard OpenMP functions. Functions are tested already in
several tests, the patch is NFC.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jdoerfert, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61459

llvm-svn: 359892
2019-05-03 14:47:20 +00:00
Alexey Bataev 8ccb8f8647 [OPENMP][NVPTX]Improve code by using parallel level counter.
Summary:
Previously for the different purposes we need to get the active/common
parallel level and with full runtime we iterated over all the records to
calculate this level. Instead, we can used the warp-based parallel level
counters used in no-runtime mode.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jfb, jdoerfert, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61395

llvm-svn: 359822
2019-05-02 20:05:01 +00:00
Alexey Bataev 4ad6dbc5fd [OPENMP][NVPTX]Improve omp_get_max_threads() function.
Summary:
Function omp_get_max_threads() can always return 1 if current execution
mode is SPMD.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jdoerfert, caomhin, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61379

llvm-svn: 359792
2019-05-02 14:52:52 +00:00
Alexey Bataev 8e6bf88cf7 [OPENMP][NVPTX]Improved omp_get_thread_limit() function.
Summary:
Function omp_get_thread_limit() in SPMD mode can return the maximum
available number of threads as a result.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D61378

llvm-svn: 359790
2019-05-02 14:46:32 +00:00
Alexey Bataev c03fe73176 [OPENMP][NVPTX]Correctly handle L2 parallelism in SPMD mode.
Summary:
The parallelLevel counter must be on per-thread basis to fully support
L2+ parallelism, otherwise we may end up with undefined behavior.
Introduce the parallelLevel on per-warp basis using shared memory. It
allows to avoid the problems with the synchronization and allows fully
support L2+ parallelism in SPMD mode with no runtime.

Reviewers: gtbercea, grokos

Subscribers: guansong, jdoerfert, caomhin, kkwli0, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D60918

llvm-svn: 359341
2019-04-26 19:30:34 +00:00
Alexey Bataev 5de5d74c8d [OPENMP][NVPTX] Fix the test, NFC.
Fix the test to run it really in SPMD mode without runtime. Previously
it was run in SPMD + full runtime mode and does not allow to cehck the
functionality correctly.

llvm-svn: 358902
2019-04-22 17:25:31 +00:00
Alexey Bataev 13532ea623 [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions.
Summary:
If the kernel is executed in SPMD mode and the L2+ parallel for region
with the dynamic scheduling is executed, dynamic scheduling functions
are called. They expect full runtime support, but SPMD kernels may be
executed without the full runtime. It leads to the runtime crash of the
compiled program. Patch fixes this problem + fixes handling of the
parallelism level in SPMD mode, which is required as part of this patch.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, jdoerfert, openmp-commits, caomhin

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D60578

llvm-svn: 358442
2019-04-15 20:15:20 +00:00
Gheorghe-Teodor Bercea 06e08f0b0a [OpenMP][libomptarget] New reduction scheme for team reductions
Summary:
This patch adds a more sophisticated team reduction scheme to the OpenMP libomptarget-nvptx runtime.

The scheme uses a fixed size global memory buffer whose length can be adjusted via compiler flag:
```
-fopenmp-cuda-teams-reduction-recs-num=1024
```
The global buffer is a structure of arrays (with default size of 1024 each and controlled by the above flag), one array for each reduction variable.

Values in the buffer are processed by the last team to finish executing the body of the target region.

In addition to adding support for the new flag, the compiler also emits special functions used for the reduction of the intermediate reduction values. These changes will be added in a separate compiler patch following this one.




Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, jfb, jdoerfert, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D58409

llvm-svn: 354471
2019-02-20 14:55:55 +00:00
Chandler Carruth 57b08b0944 Update more file headers across all of the LLVM projects in the monorepo
to reflect the new license. These used slightly different spellings that
defeated my regular expressions.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351648
2019-01-19 10:56:40 +00:00
Gheorghe-Teodor Bercea 1653633a1c [OpenMP][libomptarget] Use shared memory variable for tracking parallel level
Summary: Replace existing infrastructure for tracking parallel level using global memory with a per-team shared memory variable. This minimizes the impact of the overhead of tracking the parallel level for non-nested cases.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D55773

llvm-svn: 350747
2019-01-09 18:30:14 +00:00
Alexey Bataev 26e6c86b79 [OPENMP][NVPTX]Fix dynamic scheduling.
Summary:
Previous implementation may cause the runtime crash when the number of
teams is > 1024. Patch fixes this problem + reduces number of the atomic
operations by 32 times.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jfb, openmp-commits, caomhin

Differential Revision: https://reviews.llvm.org/D56332

llvm-svn: 350524
2019-01-07 14:25:25 +00:00
Alexey Bataev 6b3153ada0 [OPENMP][NVPTX]General formatting/code improvement, NFC.
Summary: Formatting.

Reviewers: gtbercea, grokos, kkwli0

Subscribers: guansong, openmp-commits, caomhin

Differential Revision: https://reviews.llvm.org/D56290

llvm-svn: 350431
2019-01-04 20:16:54 +00:00
Alexey Bataev dcf2edcdf5 [OPENMP][NVPTX]Improve performance + reduce number of used registers.
Summary:
Reduced number of the used register + improved performance propagating
the information about current execution/data sharing mode directly from
the compiler, where it is possible.
In some cases, it requires new/reworked interfaces of the runtime
external functions. Old functions are marked as deprecated.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, jfb, openmp-commits, caomhin

Differential Revision: https://reviews.llvm.org/D56278

llvm-svn: 350405
2019-01-04 17:09:12 +00:00
Alexey Bataev 3c74be8049 [OPENMP][NVPTX]Fix incompatibility of __syncthreads with LLVM, NFC.
Summary:
One of the LLVM optimizations, split critical edges, also clones tail
instructions. This is a dangerous operation for __syncthreads()
functions and this transformation leads to undefined behavior or
incorrect results. Patch fixes this problem by replacing __syncthreads()
function with the assembler instruction, which cost is too high and
wich cannot be copied.

Reviewers: grokos, gtbercea, kkwli0

Subscribers: guansong, openmp-commits, caomhin

Differential Revision: https://reviews.llvm.org/D56274

llvm-svn: 350333
2019-01-03 17:43:46 +00:00
Alexey Bataev d1cd005ec5 [OPENMP][NVPTX]Added/fixed debugging messages, NFC.
Summary: Added or fixed new/old debugging messages for the better diagnostics.

Reviewers: gtbercea, kkwli0, grokos

Reviewed By: grokos

Subscribers: caomhin, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D56102

llvm-svn: 350137
2018-12-28 21:36:09 +00:00
Alexey Bataev 28eccf5ba0 [OPENMP][NVPTX]Fixed initialization of the data-sharing interface.
Summary:
Avoid using of the atomic loop to wait for the completion of the
data-sharing interface initialization, use __shfl_sync instead for the
communication within the warp to signal other threads in the warp about
completion of the initialization.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, jfb, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D56100

llvm-svn: 350129
2018-12-28 17:31:06 +00:00
Alexey Bataev 1708858dbd [OPENMP][NVPTX]Outline assert into noinline function, NFC.
Summary:
At high optimization level asserts lead to some unexpected results
because of auto-inserted unreachable instructions. This outlining
prevents some of such dangerous optimizations and leads to better
stability.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D56101

llvm-svn: 350128
2018-12-28 17:29:47 +00:00
Alexey Bataev 9056f1116d [OPENMP][NVPTX]Revert __kmpc_shuffle_int64 to its original form.
Summary:
Use the original shuffle implementation for __kmpc_shuffle_int64 since
default implementation uses the same implementation.

Reviewers: gtbercea

Subscribers: guansong, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D55514

llvm-svn: 348772
2018-12-10 16:50:36 +00:00
Alexey Bataev cc6cf64c38 [OPENMP][NVPTX]Enable fast shuffles on 64bit values only if CUDA >= 9.
Summary:
Shuffle on 64bit data is allowed only for CUDA >= 9.0. Also, fixed the
constant for the mask, need one extra L in the end.

Reviewers: gtbercea, kkwli0

Subscribers: guansong, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D55440

llvm-svn: 348758
2018-12-10 14:29:05 +00:00
Alexey Bataev 8acafff404 [OPENMP][NVPTX]Save registers for optimized builds with enabled logging.
Summary:
Introduced special noinline function log that allows to save some
registers for optimized builds but with enabled logging. Also, it
increases the stability of the optimized builds with inlined runtime.

Reviewers: gtbercea, kkwli0

Reviewed By: gtbercea

Subscribers: caomhin, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D55436

llvm-svn: 348606
2018-12-07 16:08:29 +00:00
Alexey Bataev 653e8ba79a [OPENMP][NVPTX]Correct type casting for printf args + simplified shfl64 function.
Summary:
Explicitly casted printf's args to the required types + simplified
shfl64 function.

Reviewers: gtbercea, kkwli0

Subscribers: guansong, jfb, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D55379

llvm-svn: 348521
2018-12-06 19:45:48 +00:00
Alexey Bataev 5442f3e549 [OPENMP][NVPTX]Fix __kmpc_flush to flush the memory per system, not per block.
Summary:
According to the standard, after memory flushing the changes in the
memory must be visible to all the threads in all teams. Patch fixes
this.

Reviewers: gtbercea, kkwli0

Subscribers: guansong, jfb, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D55370

llvm-svn: 348491
2018-12-06 15:27:58 +00:00
Gheorghe-Teodor Bercea 10b2e60b7e [OpenMP][libomptarget] Flush intermediate values during team reduction
Summary: Ensure intermediate values of a team reduction are flushed to memory.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, jfb, openmp-commits

Differential Revision: https://reviews.llvm.org/D55219

llvm-svn: 348148
2018-12-03 15:21:49 +00:00
Alexey Bataev 0f221f53d8 [OPENMP][NVPTX]Make runtime compatible with the original runtime.
Summary:
Reworked runtime to make it compatible with the requirements of the
original runtime library. Also, simplified some code to reduce number of
function calls.

Reviewers: gtbercea, kkwli0

Subscribers: guansong, jfb, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D55130

llvm-svn: 348003
2018-11-30 16:52:38 +00:00
Gheorghe-Teodor Bercea 31c1589ab0 [OpenMP][libomptarget] Add new version of SPMD deinit kernel function with argument
Summary: To enable the compiler to optimize parts of the function that are not needed when runtime can be omitted, a new version of the SPMD deinit kernel function is needed. This function takes the runtime required flag as an argument.

Reviewers: ABataev, kkwli0, caomhin

Reviewed By: ABataev

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D54969

llvm-svn: 347714
2018-11-27 21:23:40 +00:00
Alexey Bataev d4de439cf4 [OPENMP][NVPTX]Basic support for reductions across the teams.
Summary:
Added functions __kmpc_nvptx_teams_reduce_nowait_simple and
__kmpc_nvptx_teams_end_reduce_nowait_simple to implement basic support
for reductions across the teams.

Reviewers: gtbercea, kkwli0

Subscribers: guansong, jfb, caomhin, openmp-commits

Differential Revision: https://reviews.llvm.org/D54967

llvm-svn: 347710
2018-11-27 21:06:09 +00:00
Gheorghe-Teodor Bercea ad8632a9ba [OpenMP][libomptarget] Refactor SPMD and runtime requirement checking
Summary: Refactor the checking for SPMD mode and whether the runtime is initialized or not. This uses constant flags which enables the runtime to optimize out unused sections of code that depend on these flags.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, jfb, openmp-commits

Differential Revision: https://reviews.llvm.org/D54960

llvm-svn: 347698
2018-11-27 19:45:10 +00:00
Alexey Bataev 8ab0924ab4 [OPENMP][NVPTX]Improved lock/critical constructs.
Summary: Improved support for critical constructs + omp_..._lock... constructs.

Reviewers: gtbercea, kkwli0, caomhin

Subscribers: guansong, jfb, openmp-commits

Differential Revision: https://reviews.llvm.org/D54766

llvm-svn: 347342
2018-11-20 20:19:36 +00:00
Alexey Bataev 463e9f3224 [OPENMP][NVPTX]Fixed/improved support for globalization in team contexts.
Summary:
Current globalization scheme works correctly only for SPMD+lightweight
runtime mode and does not work for full runtime. Patch improves support
for the globalization scheme + reduces global memory consumption in
  lightweight runtime mode.
Patch adds runtime functions to work with the statically allocated
global memory. It allows to improve performance and memory consumption.
This global memory must be allocated by the compiler.

Reviewers: grokos, kkwli0, gtbercea, caomhin

Subscribers: guansong, jfb, openmp-commits

Differential Revision: https://reviews.llvm.org/D53943

llvm-svn: 345976
2018-11-02 14:43:23 +00:00
Gheorghe-Teodor Bercea b10bacf122 [OpenMP][libomptarget] Add runtime function for pushing coalesced global records
Summary: In the case of coalesced global records, we need to push the exact data size passed in. This patch fixes this by outlining the common functionality of the previous push function and by adding a separate entry point for coalesced pushes. The pop function remains unchanged.

Reviewers: ABataev, grokos, caomhin

Reviewed By: ABataev, grokos

Subscribers: jholewinski, cfe-commits, Hahnfeld, guansong, jfb, openmp-commits

Differential Revision: https://reviews.llvm.org/D53141

llvm-svn: 345867
2018-11-01 18:08:12 +00:00
Jonas Hahnfeld a762bfc03a [libomptarget-nvptx] Enable asserts in bclib
If the user requested LIBOMPTARGET_NVPTX_DEBUG, include asserts in
the bitcode library. Everything else will have very unpleasent
effects because asserts will appear when falling back to the static
library libomptarget-nvptx.a.

Differential Revision: https://reviews.llvm.org/D52701

llvm-svn: 343477
2018-10-01 14:16:55 +00:00
Jonas Hahnfeld a1100e6b9a [libomptarget-nvptx] reduction: Determine if runtime uninitialized
Pass in the correct value of isRuntimeUninitialized() which solves
parallel reductions as reported on the mailing list.
For reference: r333285 did the same for loop scheduling.

Differential Revision: https://reviews.llvm.org/D52725

llvm-svn: 343476
2018-10-01 14:14:26 +00:00
Jonas Hahnfeld 1bf767fb8e [libomptarget-nvptx] Align data sharing stack
NVPTX requires addresses of pointer locations to be 8-byte aligned
or there will be an exception during runtime.
This could happen without this patch as shown in the added test:
getId() requires 4 byte of stack and putValueInParallel() uses 16
bytes to store the addresses of the captured variables.

Differential Revision: https://reviews.llvm.org/D52655

llvm-svn: 343402
2018-09-30 09:23:21 +00:00
Jonas Hahnfeld 067235f227 [libomptarget-nvptx] Fix ancestor_thread_num and team_size (non-SPMD)
According to OpenMP 4.5, p250:12-14:

    If the requested nest level is outside the range of 0 and the
    nest level of the current thread, as returned by the omp_get_level
    routine, the routine returns -1.

The SPMD code path will need a similar fix.

Differential Revision: https://reviews.llvm.org/D51787

llvm-svn: 343401
2018-09-30 09:23:14 +00:00
Jonas Hahnfeld fb1b80191e [libomptarget-nvptx] Add tests for nested parallelism
Clang trunk will serialize nested parallel regions. Check that this
is correctly reflected in various API methods.

Differential Revision: https://reviews.llvm.org/D51786

llvm-svn: 343382
2018-09-29 16:02:32 +00:00
Jonas Hahnfeld c89a14f5d2 [libomptarget-nvptx] Ignore calls to dynamic API
There is no support and according to the OpenMP 4.5, p238:7-9:

    For implementations that do not support dynamic adjustment
    of the number of threads this routine has no effect: the
    value of dyn-var remains false.

Add a test that cancellation and nested parallelism aren't
supported either.

Differential Revision: https://reviews.llvm.org/D51785

llvm-svn: 343381
2018-09-29 16:02:25 +00:00
Jonas Hahnfeld a743c04412 [libomptarget-nvptx] Fix number of threads in parallel
If there is no num_threads() clause we must consider the
nthreads-var ICV. Its value is set by omp_set_num_threads()
and can be queried using omp_get_max_num_threads().
The rewritten code now closely resembles the algorithm given
in the OpenMP standard.

Differential Revision: https://reviews.llvm.org/D51783

llvm-svn: 343380
2018-09-29 16:02:17 +00:00
Jonas Hahnfeld 122dbb5dce [libomptarget-nvptx] Add testing infrastructure
This patch also introduces testing for libomptarget-nvptx
which has been missing until now. I propose to add tests for
all bugs that are fixed in the future.
The target check-libomptarget-nvptx is not run by default because
 - we can't determine if there is a GPU plugged into the system.
 - it will require the latest Clang compiler. Keeping compatibility
   with older releases would prevent testing newer code generation
   developed in trunk.

Differential Revision: https://reviews.llvm.org/D51687

llvm-svn: 343324
2018-09-28 15:05:43 +00:00
Gheorghe-Teodor Bercea f7256a593f [OpenMP][libomptarget] Set the frame pointer then test empty slot condition
Summary: NFC - just fixing a bug: the empty slot test was before the re-setting of the Stack pointer. 

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D52122

llvm-svn: 343006
2018-09-25 18:48:14 +00:00
Gheorghe-Teodor Bercea 9bc3bfffb4 [OpenMP][libomptarget] Simplify warp master selection for data sharing
Summary:
There is currently no supported situation where the warp master is not the first thread in the warp.

This also avoids the device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less that the size of a warp.


Reviewers: ABataev, caomhin, grokos

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D50188

llvm-svn: 342972
2018-09-25 13:23:32 +00:00
Alexey Bataev 022bf16b41 [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.
Summary:
We need the support for per-team shared variables to support codegen for
lastprivates/reductions. Patch adds this support by using shared memory
if the total size of the reductions/lastprivates is <= 128 bytes,
then  pre-allocated buffer in global memory if size is <= 4K bytes,or
uses malloc/free, otherwise.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51875

llvm-svn: 342737
2018-09-21 14:11:41 +00:00
Jonas Hahnfeld dc79c7187c [libomptarget-nvptx] Remove last mentions of __kmpc_print_*
Their implementation was removed during review, delete their
prototype declarations.

llvm-svn: 341748
2018-09-08 12:10:19 +00:00
Jonas Hahnfeld 82d20201d0 [libomptarget][NVPTX] Drop dead code and data structures, NFCI.
* cg and HasCancel in WorkDescr were never read and can be removed.
 * This eliminates the last use of priv in ThreadPrivateContext.
 * CounterGroup is unused afterwards.
 * Remove duplicate external declares in omptarget-nvptx.cu that are
   already in the header omptarget-nvptx.h.

Differential Revision: https://reviews.llvm.org/D51622

llvm-svn: 341370
2018-09-04 15:13:17 +00:00
Jonas Hahnfeld 96c13488ab [libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinit
If the runtime is uninitialized the master thread must Enqueue the
state object, and ALL threads must return immediately.
Found post-commit of https://reviews.llvm.org/D51222.

llvm-svn: 341328
2018-09-03 17:24:23 +00:00
Alexey Bataev 39a4724095 [OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC.
Required to fix the buildbots.

llvm-svn: 340956
2018-08-29 19:22:06 +00:00
Alexey Bataev b7a5d38cf5 [OPENMP][NVPTX] Lightweight runtime support for SPMD mode.
Summary:
Implemented simple and lightweight runtime support for SPMD mode-based
constructs. It adds support for L2 sequential parallelism wihtout full
runtime support. Also, patch fixes some use cases for
uninitialized|lightweight runtime.

Reviewers: grokos, kkwli0, Hahnfeld, gtbercea

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51222

llvm-svn: 340944
2018-08-29 17:35:09 +00:00
Alexey Bataev 37d4156b11 [OPNEMP, NVPTX] Fixed sychronization construct + code cleanup.
Summary:
1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode
synchronization function should be called only in L1 parallel level.
2. Removed some extra code for synchronization inside of the code, used
`__kmpc_barrier` instead.
3. Some code cleanup.

Reviewers: gtbercea, grokos

Subscribers: openmp-commits

Differential Revision: https://reviews.llvm.org/D49564

llvm-svn: 337691
2018-07-23 13:52:12 +00:00
Gheorghe-Teodor Bercea 9e94326185 [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode
Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: ABataev, grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D49204

llvm-svn: 337013
2018-07-13 16:14:22 +00:00
Alexey Bataev c2c0138a04 [OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops.
Summary:
Patch fixes the next problems.
1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext
class + simplified data members.
2. Fixed calculation of loop boundaries for dynamic loops with static
scheduling.
3. Introduced saving/restoring of the dynamic loop boundaries to support
several nested parallel dynamic loops.

Reviewers: grokos

Subscribers: guansong, kkwli0, openmp-commits

Differential Revision: https://reviews.llvm.org/D49241

llvm-svn: 336915
2018-07-12 15:18:28 +00:00
Alexey Bataev 3994bafbc7 [OPENMP, NVPTX] Sync threads before start ordered loops.
Summary: Threads must be synchronized before starting ordered construct.

Reviewers: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D48732

llvm-svn: 335987
2018-06-29 16:16:00 +00:00
Alexey Bataev 0ac29350b5 [OPENMP, NVPTX] Fixes for NVPTX RTL
Summary:
Patch fixes several problems in the implementation of NVPTX RTL.
1. Detection of the last iteration for loops with static scheduling, no chunks.
2. Fixes reductions for the serialized parallel constructs.
3. Fixes handling of the barriers.

Reviewers: grokos

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D48480

llvm-svn: 335469
2018-06-25 13:43:35 +00:00
Guansong Zhang f9e56e5982 [OpenMP] [CUDA] Expose teamid to the debug path
Summary: Small bug fix for debug build. A previous fix causing trouble for debug build.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D48286

llvm-svn: 335046
2018-06-19 14:05:38 +00:00
Jonas Hahnfeld 17aabf83e9 [libomptarget-nvptx] loop: Determine if runtime uninitialized
The generic entry points for static loop scheduling previously
hardcoded that the runtime was initialized. This can be wrong if
the compiler analyzes that the runtime is not needed and calls
the init functions accordingly.

This didn't affect clang-ykt because they have entry points for
different combinations of SPMD x Runtime not needed. I didn't do
measurements yet but with inlining we might get away with always
calling the generic interface and letting compiler and runtime
figure out the rest.
In any case, a correct runtime is always better than having
functions that may only be called if previous calls passed in
a specific set of arguments!

Differential Revision: https://reviews.llvm.org/D47131

llvm-svn: 333285
2018-05-25 15:56:48 +00:00
Jonas Hahnfeld 65e0b8784c [CMake] Unify install path for libraries
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.

Differential Revision: https://reviews.llvm.org/D47130

llvm-svn: 333284
2018-05-25 15:56:41 +00:00
George Rokos 6da6f433a0 [CUDA]Fix dynamic|guided scheduling.
The existing implementation of the dynamic scheduling
breaks the contract introduced by the original openmp
runtime and, thus, is incorrect. Patch fixes it and
introduces correct dynamic scheduling model.

Thanks to Alexey Bataev for submitting this patch.

Differential Revision: https://reviews.llvm.org/D47333

llvm-svn: 333225
2018-05-24 21:12:41 +00:00
Jonas Hahnfeld 37bbe1a698 [libomptarget-nvptx] Test bitcode compiler flags and enable by default
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.

As a result this change also enables building the library by default
if all prerequisites are met.

Differential Revision: https://reviews.llvm.org/D46901

llvm-svn: 332494
2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea 787a350021 [OpenMP][libomptarget] Add function for checking SPMD mode
Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D46840

llvm-svn: 332360
2018-05-15 15:16:43 +00:00
Guansong Zhang e1c7a46d5b [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side
Summary:
Enable the device side debug messages at compile time, use env var to control at runtime.

To achieve this, an environment data block is passed to the device lib when it is loaded.

By default, the message is off, to enable it, a user need to set LIBOMPDEVICE_DEBUG=1.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D46210

llvm-svn: 331550
2018-05-04 19:29:28 +00:00
Guansong Zhang ad6c26516b [OpenMP] Remove compilation warning when using clang to compile bc files.
Summary: Minor printf format correction. NVCC ignore those. Clang will give warning on these if debug is enabled.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45528

llvm-svn: 330944
2018-04-26 14:06:53 +00:00
Guansong Zhang 334c379e32 [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag
Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is inconsistent between using nvcc to generate .a file and clang to generate .bc file. Sync the two setting so we can get debug messages from the bc file path as well.

Reviewers: grokos

Subscribers: Hahnfeld, openmp-commits, mgorny

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45530

llvm-svn: 330477
2018-04-20 20:41:00 +00:00
Guansong Zhang f679431f91 [OpenMP] Remove extra warning when we build
Summary:
This one line change is to remove this warning message

"warning: integer conversion resulted in a change of sign"

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45415

llvm-svn: 329713
2018-04-10 15:28:31 +00:00
Guansong Zhang f0029a7738 Revert "[OpenMP] enable bc file compilation using the latest clang"
This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207.

llvm-svn: 329576
2018-04-09 14:45:41 +00:00
Guansong Zhang e47fbc9da8 [OpenMP] enable bc file compilation using the latest clang
Summary: adding cuda-rdc flag to allow extern global data

Reviewers: grokos

Reviewed By: grokos

Subscribers: gregrodgers, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D44992

llvm-svn: 329072
2018-04-03 15:01:34 +00:00
Gheorghe-Teodor Bercea 4bc36a06e2 [OpenMP][libomptarget] Initialize global memory stack only once.
Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44754

llvm-svn: 328148
2018-03-21 21:02:55 +00:00
Gheorghe-Teodor Bercea b4332ca3da [OpenMP][libomptarget] Fix master warp check
Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is equal to the last active warp not necessarily WARPSIZE - 1.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44537

llvm-svn: 328146
2018-03-21 20:51:16 +00:00
Gheorghe-Teodor Bercea c8d395a168 [OpenMP][libomptarget] Enable globalization for workers
Summary:
This patch allows worker to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context it then needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44487

llvm-svn: 328144
2018-03-21 20:34:19 +00:00
Gheorghe-Teodor Bercea 876c1ed2e5 [OpenMP][libomptarget] Enable usage of shared memory slots
Summary:
Allow the runtime to use the existing shared memory statically allocated slots.

When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow that the storage to use a limited amount of shared memory that has been statically allocated already. Only if shared memory doesn't prove to be enough do we then invoke malloc() to create a new global memory slot.

Reviewers: ABataev, carlo.bertolli, grokos, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44486

llvm-svn: 327639
2018-03-15 16:05:34 +00:00
Gheorghe-Teodor Bercea f3de222b0d [OpenMP][libomptarget] Enable multiple frames per global memory slot
Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44470

llvm-svn: 327637
2018-03-15 15:56:04 +00:00
George Rokos 59be4b434f [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
llvm-svn: 327556
2018-03-14 19:11:36 +00:00
Gheorghe-Teodor Bercea 49b62649cf [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing.
Summary:
This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team.
The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. 

Limitations to be addressed in future patches:
- This current patch only employs global memory. In a future patch we will enable to usage for shared memory as an optimization.
- Allow the allocation of several requested sizes in the same slot.

Reviewers: ABataev, grokos, caomhin, carlo.bertolli

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44260

llvm-svn: 327440
2018-03-13 19:44:53 +00:00
Gheorghe-Teodor Bercea d5e5992f9a [OpenMP][libomptarget] Fix union.
Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.

Reviewers: grokos, carlo.bertolli, caomhin, ABataev

Reviewed By: grokos, ABataev

Subscribers: fedor.sergeev, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44254

llvm-svn: 327040
2018-03-08 18:44:02 +00:00
Gheorghe-Teodor Bercea 7a5fa21ae2 [OpenMP] Remove implicit data sharing using device shared memory from libomptarget
Summary:
This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625

Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.


Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D43626

llvm-svn: 326950
2018-03-07 22:10:10 +00:00
Gheorghe-Teodor Bercea d5ae4e6501 [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining
Summary:
Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs.
To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID.
Depends on D14254

Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos

Reviewed By: Hahnfeld, grokos

Subscribers: guansong, mgorny, openmp-commits

Differential Revision: https://reviews.llvm.org/D41724

llvm-svn: 324904
2018-02-12 16:45:20 +00:00
Gheorghe-Teodor Bercea aaeab8d4ef [OpenMP][libomptarget] Add data sharing support in libomptarget
Summary: This patch extends the libomptarget functionality in patch D14254 with support for the data sharing scheme for supporting implicitly shared variables. The runtime therefore maintains a list of references to shared variables.

Reviewers: carlo.bertolli, ABataev, Hahnfeld, grokos, caomhin, hfinkel

Reviewed By: Hahnfeld, grokos

Subscribers: guansong, llvm-commits, openmp-commits

Differential Revision: https://reviews.llvm.org/D41485

llvm-svn: 324495
2018-02-07 18:21:55 +00:00
Carlo Bertolli 57e9f44a8c [OpenMP-RT] Fix debug string for NVPTX runtime library
https://reviews.llvm.org/D42757

The method ThreadsInTeam is used to determine the number of threads to be used in a parallel region under SPMD mode (see line 127 of supporti.h in libomptarget/deviceRTLs/nvptx/src/). This patch fixes the corresponding debug print upon initialization of the kernel in SPMD mode.

llvm-svn: 323978
2018-02-01 16:12:16 +00:00
George Rokos 0dd6ed74fd [OpenMP] Initial implementation of OpenMP offloading library - libomptarget device RTLs.
This patch implements the device runtime library whose interface is used in the code generation for OpenMP offloading devices.
Currently there is a single device RTL written in CUDA meant to CUDA enabled GPUs.
The interface is a variation of the kmpc interface that includes some extra calls to do thread and storage management that only make sense for a GPU target.

Differential revision: https://reviews.llvm.org/D14254

llvm-svn: 323649
2018-01-29 13:59:35 +00:00