96 Commits

Author SHA1 Message Date
Gheorghe-Teodor Bercea f7256a593f [OpenMP][libomptarget] Set the frame pointer then test empty slot condition
Summary: NFC, just a bug fix: the empty slot test ran before the stack pointer was reset.

Reviewers: ABataev, caomhin, Hahnfeld

Reviewed By: ABataev

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D52122

llvm-svn: 343006
2018-09-25 18:48:14 +00:00
Gheorghe-Teodor Bercea 9bc3bfffb4 [OpenMP][libomptarget] Simplify warp master selection for data sharing
Summary:
There is currently no supported situation where the warp master is not the first thread in the warp.

This also prevents device execution from hanging on Volta GPUs when ballot_sync is called by a number of threads that is less than the size of a warp.


Reviewers: ABataev, caomhin, grokos

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D50188

llvm-svn: 342972
2018-09-25 13:23:32 +00:00
Alexey Bataev 022bf16b41 [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.
Summary:
We need per-team shared variables to support codegen for
lastprivates/reductions. The patch adds this support by using shared memory
if the total size of the reductions/lastprivates is <= 128 bytes, a
pre-allocated buffer in global memory if the size is <= 4K bytes, and
malloc/free otherwise.

Reviewers: gtbercea, kkwli0, grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51875

llvm-svn: 342737
2018-09-21 14:11:41 +00:00
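
The size-based storage choice described in 022bf16b41 can be sketched as a small selection function. This is a minimal illustration only: the enum, the constant names, and `chooseTeamStorage` are hypothetical, and "4K" is assumed to mean 4096 bytes. The thresholds themselves come from the commit message.

```cpp
#include <cstddef>

// Hypothetical names for the three storage tiers named in the commit message.
enum class TeamStorage { SharedMemory, PreallocatedGlobal, Malloc };

const std::size_t kSharedLimitBytes = 128;       // "<= 128 bytes"
const std::size_t kGlobalLimitBytes = 4 * 1024;  // "<= 4K bytes", assumed to be 4096

// Pick where per-team lastprivate/reduction data would live for a given size.
inline TeamStorage chooseTeamStorage(std::size_t totalBytes) {
  if (totalBytes <= kSharedLimitBytes)
    return TeamStorage::SharedMemory;
  if (totalBytes <= kGlobalLimitBytes)
    return TeamStorage::PreallocatedGlobal;
  return TeamStorage::Malloc; // anything larger falls back to malloc/free
}
```

Ordering the checks from the smallest tier upward keeps the cheapest storage as the first choice.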
Alexey Bataev 06b6e0f406 [OPENMP]Increment iterator when the loop is continued.
Summary:
The iterator increment was missing on the path that only needs to
continue execution.

Reviewers: kkwli0, gtbercea, grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51937

llvm-svn: 341964
2018-09-11 17:16:26 +00:00
Jonas Hahnfeld dc79c7187c [libomptarget-nvptx] Remove last mentions of __kmpc_print_*
Their implementation was removed during review, delete their
prototype declarations.

llvm-svn: 341748
2018-09-08 12:10:19 +00:00
Jonas Hahnfeld 21e3ee0afe [libomptarget] Remove two unneeded includes, NFCI.
Follow-up to r340542 and r340767.

llvm-svn: 341563
2018-09-06 17:00:57 +00:00
Jonas Hahnfeld f27dcf01d2 [libomptarget][test] Announce compiler features
This is a follow-up to r341371: The new test for PR38704 doesn't
work with Clang 6.0. It uses an UNSUPPORTED: clang-6, but that
hasn't worked because the compiler features weren't known to lit.

llvm-svn: 341448
2018-09-05 07:26:00 +00:00
Sergey Dmitriev b4dc69ff80 [libomptarget] Remove `Devices` from `RTLInfoTy`
This patch removes unused field `Devices` from `RTLInfoTy`.

Differential Revision: https://reviews.llvm.org/D51653

llvm-svn: 341399
2018-09-04 20:23:09 +00:00
Jonas Hahnfeld bb51d39871 [libomptarget][CUDA] Use cuDeviceGetAttribute, NFCI.
cuDeviceGetProperties has apparently been deprecated since CUDA 5.0.
Nvidia started using annotations only in CUDA 9.2, so nobody noticed
nor cared before.
The new function returns the same values, tested with a P100.

Differential Revision: https://reviews.llvm.org/D51624

llvm-svn: 341372
2018-09-04 15:13:28 +00:00
Jonas Hahnfeld f7f86971e6 [libomptarget] PR38704: Fix erase of ShadowPtrMap
erase() invalidates the iterator and returns a new one pointing
to the following element. The code now follows the example at
https://en.cppreference.com/w/cpp/container/map/erase.
(The added testcase crashes without this patch.)

Reported by David Binderman (https://llvm.org/PR38704)!

Differential Revision: https://reviews.llvm.org/D51623

llvm-svn: 341371
2018-09-04 15:13:23 +00:00
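
The erase pattern referred to in f7f86971e6 can be shown with a plain `std::map`. This is a sketch of the general idiom, not the ShadowPtrMap code: the map type, the predicate, and the name `eraseNegative` are illustrative assumptions.

```cpp
#include <map>

// Remove every entry whose value is negative and return how many were removed.
// The map type and predicate are illustrative, not the ShadowPtrMap layout.
inline int eraseNegative(std::map<int, int> &m) {
  int removed = 0;
  for (auto it = m.begin(); it != m.end();) {
    if (it->second < 0) {
      // erase() invalidates `it` and returns the iterator to the next element,
      // so the loop must continue from the returned value.
      it = m.erase(it);
      ++removed;
    } else {
      ++it; // only advance manually when nothing was erased
    }
  }
  return removed;
}
```

Incrementing an iterator after it has been passed to `erase()` is undefined behavior, which is consistent with the crash the commit message mentions.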
Jonas Hahnfeld 82d20201d0 [libomptarget][NVPTX] Drop dead code and data structures, NFCI.
 * cg and HasCancel in WorkDescr were never read and can be removed.
 * This eliminates the last use of priv in ThreadPrivateContext.
 * CounterGroup is unused afterwards.
 * Remove duplicate extern declarations in omptarget-nvptx.cu that are
   already in the header omptarget-nvptx.h.

Differential Revision: https://reviews.llvm.org/D51622

llvm-svn: 341370
2018-09-04 15:13:17 +00:00
Jonas Hahnfeld 96c13488ab [libomptarget][NVPTX] Fix __kmpc_spmd_kernel_deinit
If the runtime is uninitialized the master thread must Enqueue the
state object, and ALL threads must return immediately.
Found post-commit of https://reviews.llvm.org/D51222.

llvm-svn: 341328
2018-09-03 17:24:23 +00:00
Alexey Bataev 39a4724095 [OPENMP][NVPTX] Replace assert() by ASSERT0() macro, NFC.
Required to fix the buildbots.

llvm-svn: 340956
2018-08-29 19:22:06 +00:00
Alexey Bataev b7a5d38cf5 [OPENMP][NVPTX] Lightweight runtime support for SPMD mode.
Summary:
Implemented simple and lightweight runtime support for SPMD mode-based
constructs. It adds support for L2 sequential parallelism without full
runtime support. The patch also fixes some use cases for the
uninitialized|lightweight runtime.

Reviewers: grokos, kkwli0, Hahnfeld, gtbercea

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51222

llvm-svn: 340944
2018-08-29 17:35:09 +00:00
Alexandre Eichenberger e9b7d8dcd6 [OpenMP][libomptarget] rework of fatal error reporting
Summary:
Removed the function that used a lock and varargs
Used the same mechanism as for debug messages

Reviewers: ABataev, gtbercea, grokos, Hahnfeld

Reviewed By: gtbercea, Hahnfeld

Subscribers: mikerice, ABataev, RaviNarayanaswamy, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D51226

llvm-svn: 340767
2018-08-27 18:20:15 +00:00
Alexandre Eichenberger 1b4a666ba5 [OpenMP][libomptarget] Bringing up to spec with respect to OMP_TARGET_OFFLOAD env var
Summary:
Until now, only OMP_TARGET_OFFLOAD=DISABLED was implemented. This patch adds support for the remaining values, MANDATORY and DEFAULT.


Reviewers: gtbercea, ABataev, grokos, caomhin, Hahnfeld

Reviewed By: Hahnfeld

Subscribers: protze.joachim, gtbercea, AlexEichenberger, RaviNarayanaswamy, Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D50522

llvm-svn: 340542
2018-08-23 16:22:42 +00:00
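
A minimal sketch of reading the three `OMP_TARGET_OFFLOAD` values named in 1b4a666ba5 follows. The enum and function names are hypothetical and this is not the libomptarget implementation. The sketch assumes exact upper-case spellings and treats an unset or unrecognized value as DEFAULT.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical policy type covering the three values from the commit message.
enum class OffloadPolicy { Default, Disabled, Mandatory };

// Map the textual value to a policy. Unset or unrecognized values fall back
// to Default, which is an assumption of this sketch.
inline OffloadPolicy parseOffloadPolicy(const char *value) {
  if (value == nullptr)
    return OffloadPolicy::Default;
  const std::string v(value);
  if (v == "DISABLED")
    return OffloadPolicy::Disabled;
  if (v == "MANDATORY")
    return OffloadPolicy::Mandatory;
  return OffloadPolicy::Default;
}

// Read the policy from the process environment.
inline OffloadPolicy readOffloadPolicy() {
  return parseOffloadPolicy(std::getenv("OMP_TARGET_OFFLOAD"));
}
```

Keeping the parsing separate from the `getenv` call makes the mapping easy to test without touching the environment.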
Alexey Bataev 37d4156b11 [OPENMP, NVPTX] Fixed synchronization construct + code cleanup.
Summary:
1. Fixed internal problem in `__kmpc_barrier` function: SPMD mode
synchronization function should be called only in L1 parallel level.
2. Removed some extra code for synchronization inside of the code, used
`__kmpc_barrier` instead.
3. Some code cleanup.

Reviewers: gtbercea, grokos

Subscribers: openmp-commits

Differential Revision: https://reviews.llvm.org/D49564

llvm-svn: 337691
2018-07-23 13:52:12 +00:00
George Rokos a0da24683b [OpenMP][libomptarget] New map interface: remove translation code and ensure proper alignment of struct members
This patch removes the translation code since this functionality is now implemented in the compiler.
target_data_begin and target_data_end are also patched to handle some special cases that used to be
handled by the obsolete translation function, namely ensure proper alignment of struct members when
we have partially mapped structs. Mapping a struct from a higher address (i.e. not from its beginning)
can result in distortion of the alignment for some of its member fields. Padding restores the original
(proper) alignment.

Differential revision: https://reviews.llvm.org/D44186

llvm-svn: 337455
2018-07-19 13:41:03 +00:00
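
The padding idea in a0da24683b can be illustrated with simple address arithmetic. This is a sketch under stated assumptions: the 8-byte alignment, the `PaddedRange` type, and the function names are hypothetical and are not taken from the patch. The idea shown is that when a mapping starts in the middle of a struct, extending the range downward to an aligned address keeps each member at the same offset modulo the alignment.

```cpp
#include <cstdint>

const std::uintptr_t kAlignment = 8; // assumed alignment for this sketch

// Number of bytes by which the start address is past an aligned boundary.
inline std::uintptr_t paddingFor(std::uintptr_t hostBegin) {
  return hostBegin % kAlignment;
}

// Hypothetical description of the range that actually gets mapped.
struct PaddedRange {
  std::uintptr_t begin;
  std::uintptr_t size;
};

// Extend the requested range downward so that it starts on an aligned address.
inline PaddedRange padMapping(std::uintptr_t hostBegin, std::uintptr_t size) {
  const std::uintptr_t pad = paddingFor(hostBegin);
  return PaddedRange{hostBegin - pad, size + pad};
}
```

For example, a 12-byte mapping that starts at address 0x1004 would become a 16-byte mapping starting at 0x1000, while a mapping that already starts on an aligned address is left unchanged.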
Joachim Protze bb869f42b7 [libomptarget] Also support several images for elf
Revision r336569 (D49036) fixed libomptarget support for multiple NVIDIA
images for the case where target regions reside both in one or more
libraries and in the compiled application. The issue is still present
for ELF images, however.
This fix adds support for multiple ELF images as well.

Patch by Jannis Klinkenberg

Reviewers: protze.joachim, ABataev, grokos

Reviewed By: protze.joachim, ABataev, grokos

Subscribers: openmp-commits

Differential Revision: https://reviews.llvm.org/D49418

llvm-svn: 337355
2018-07-18 07:23:46 +00:00
Azharuddin Mohammed 6712b8675b [cmake] Fix libomptarget/test/CMakeLists.txt
Summary:
Should be variable name instead of variable reference. If the variable is
somehow unset, it messes up the if condition expression and causes a CMake
error.

Reviewers: jlpeyton, AndreyChurbanov, Hahnfeld

Reviewed By: Hahnfeld

Subscribers: mgorny, llvm-commits, openmp-commits

Differential Revision: https://reviews.llvm.org/D47221

llvm-svn: 337133
2018-07-15 17:29:43 +00:00
Gheorghe-Teodor Bercea 9e94326185 [OpenMP][libomptarget] Fix data sharing and globalization infrastructure to work in SPMD mode
Summary: This patch fixes the data sharing infrastructure to work for the SPMD and non-SPMD cases.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: ABataev, grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D49204

llvm-svn: 337013
2018-07-13 16:14:22 +00:00
Alexey Bataev c2c0138a04 [OPENMP, NVPTX] Fix loop boundaries calculation for dynamic loops.
Summary:
This patch fixes the following problems.
1. Removes unused functions from omptarget_nvptx_ThreadPrivateContext
class + simplified data members.
2. Fixed calculation of loop boundaries for dynamic loops with static
scheduling.
3. Introduced saving/restoring of the dynamic loop boundaries to support
several nested parallel dynamic loops.

Reviewers: grokos

Subscribers: guansong, kkwli0, openmp-commits

Differential Revision: https://reviews.llvm.org/D49241

llvm-svn: 336915
2018-07-12 15:18:28 +00:00
Alexey Bataev 2622e9e5b3 [OPENMP, NVPTX] Support several images in the executable.
Summary:
Currently the CUDA plugin supports loading only a single image, although
an executable may contain several images if it has target
regions inside a dynamically loaded library. This patch allows
multiple images to be loaded.

Reviewers: grokos

Subscribers: guansong, openmp-commits, kkwli0

Differential Revision: https://reviews.llvm.org/D49036

llvm-svn: 336569
2018-07-09 17:46:55 +00:00
Alexey Bataev 3994bafbc7 [OPENMP, NVPTX] Sync threads before start ordered loops.
Summary: Threads must be synchronized before starting ordered construct.

Reviewers: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D48732

llvm-svn: 335987
2018-06-29 16:16:00 +00:00
Alexey Bataev 0ac29350b5 [OPENMP, NVPTX] Fixes for NVPTX RTL
Summary:
Patch fixes several problems in the implementation of NVPTX RTL.
1. Detection of the last iteration for loops with static scheduling, no chunks.
2. Fixes reductions for the serialized parallel constructs.
3. Fixes handling of the barriers.

Reviewers: grokos

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D48480

llvm-svn: 335469
2018-06-25 13:43:35 +00:00
Guansong Zhang f9e56e5982 [OpenMP] [CUDA] Expose teamid to the debug path
Summary: Small bug fix for the debug build. A previous fix broke the debug build.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D48286

llvm-svn: 335046
2018-06-19 14:05:38 +00:00
Jonas Hahnfeld 17aabf83e9 [libomptarget-nvptx] loop: Determine if runtime uninitialized
The generic entry points for static loop scheduling previously
hardcoded that the runtime was initialized. This can be wrong if
the compiler analyzes that the runtime is not needed and calls
the init functions accordingly.

This didn't affect clang-ykt because they have entry points for
different combinations of SPMD x Runtime not needed. I didn't do
measurements yet but with inlining we might get away with always
calling the generic interface and letting compiler and runtime
figure out the rest.
In any case, a correct runtime is always better than having
functions that may only be called if previous calls passed in
a specific set of arguments!

Differential Revision: https://reviews.llvm.org/D47131

llvm-svn: 333285
2018-05-25 15:56:48 +00:00
Jonas Hahnfeld 65e0b8784c [CMake] Unify install path for libraries
Introduce OPENMP_INSTALL_LIBDIR and use in all install() commands.
This also fixes installation of libomptarget-nvptx that previously
didn't honor {OPENMP,LLVM}_LIBDIR_SUFFIX.

Differential Revision: https://reviews.llvm.org/D47130

llvm-svn: 333284
2018-05-25 15:56:41 +00:00
George Rokos 6da6f433a0 [CUDA]Fix dynamic|guided scheduling.
The existing implementation of the dynamic scheduling
breaks the contract introduced by the original openmp
runtime and, thus, is incorrect. Patch fixes it and
introduces correct dynamic scheduling model.

Thanks to Alexey Bataev for submitting this patch.

Differential Revision: https://reviews.llvm.org/D47333

llvm-svn: 333225
2018-05-24 21:12:41 +00:00
Jonas Hahnfeld 9228f9718c [libomptarget-nvptx-bc] Pass found CUDA installations
We already know where the CUDA SDK is, so there is no point in
letting Clang search for it again and possibly finding no or
a different installation.

--cuda-path is supported since the beginning of CUDA support in
Clang, so making this required doesn't impose additional restrictions.

Differential Revision: https://reviews.llvm.org/D46930

llvm-svn: 332495
2018-05-16 17:20:27 +00:00
Jonas Hahnfeld 37bbe1a698 [libomptarget-nvptx] Test bitcode compiler flags and enable by default
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.

As a result this change also enables building the library by default
if all prerequisites are met.

Differential Revision: https://reviews.llvm.org/D46901

llvm-svn: 332494
2018-05-16 17:20:21 +00:00
Gheorghe-Teodor Bercea 787a350021 [OpenMP][libomptarget] Add function for checking SPMD mode
Summary: Add function to the NVPTX libomptarget library that will return true if the current target region is being executed in SPMD mode.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D46840

llvm-svn: 332360
2018-05-15 15:16:43 +00:00
Guansong Zhang e1c7a46d5b [OpenMP] Use LIBOMPTARGET_DEVICE_RTL_DEBUG env var to control debug messages on the device side
Summary:
Enable the device-side debug messages at compile time and use an environment variable to control them at runtime.

To achieve this, an environment data block is passed to the device library when it is loaded.

By default, the messages are off; to enable them, a user needs to set LIBOMPTARGET_DEVICE_RTL_DEBUG=1.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D46210

llvm-svn: 331550
2018-05-04 19:29:28 +00:00
Guansong Zhang ad6c26516b [OpenMP] Remove compilation warning when using clang to compile bc files.
Summary: Minor printf format correction. NVCC ignores these mismatches, but Clang warns about them when debug is enabled.

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45528

llvm-svn: 330944
2018-04-26 14:06:53 +00:00
Guansong Zhang 334c379e32 [OpenMP] Make bc file compilation sensitive to LIBOMPTARGET_NVPTX_DEBUG flag
Summary: The LIBOMPTARGET_NVPTX_DEBUG flag is handled inconsistently between using nvcc to generate the .a file and using clang to generate the .bc file. Sync the two settings so we can get debug messages from the bc file path as well.

Reviewers: grokos

Subscribers: Hahnfeld, openmp-commits, mgorny

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45530

llvm-svn: 330477
2018-04-20 20:41:00 +00:00
Guansong Zhang f679431f91 [OpenMP] Remove extra warning when we build
Summary:
This one line change is to remove this warning message

"warning: integer conversion resulted in a change of sign"

Reviewers: grokos

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D45415

llvm-svn: 329713
2018-04-10 15:28:31 +00:00
Guansong Zhang f0029a7738 Revert "[OpenMP] enable bc file compilation using the latest clang"
This reverts commit 6849e31c36d712d97433bca9af39b7a09c8c1207.

llvm-svn: 329576
2018-04-09 14:45:41 +00:00
Guansong Zhang e47fbc9da8 [OpenMP] enable bc file compilation using the latest clang
Summary: adding cuda-rdc flag to allow extern global data

Reviewers: grokos

Reviewed By: grokos

Subscribers: gregrodgers, mgorny, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D44992

llvm-svn: 329072
2018-04-03 15:01:34 +00:00
Gheorghe-Teodor Bercea 4bc36a06e2 [OpenMP][libomptarget] Initialize global memory stack only once.
Summary: The global stack initialization function may be called multiple times. The initialization of the shared memory slots should only happen when the function is called for the first time for a given warp master thread.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44754

llvm-svn: 328148
2018-03-21 21:02:55 +00:00
Gheorghe-Teodor Bercea b4332ca3da [OpenMP][libomptarget] Fix master warp check
Summary: The check for the master warp must take into consideration the actual number of warps: the master warp is the last active warp, which is not necessarily warp WARPSIZE - 1.

Reviewers: grokos, carlo.bertolli, ABataev, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44537

llvm-svn: 328146
2018-03-21 20:51:16 +00:00
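
The master warp check described in b4332ca3da can be sketched on the host with plain integer arithmetic. This is an illustration, not the runtime code: the function names are hypothetical, a warp size of 32 is assumed, and `numThreads` is assumed to be at least 1.

```cpp
const unsigned kWarpSize = 32; // assumed warp size for this sketch

// Warp index of a thread within its block.
inline unsigned warpIdOf(unsigned threadId) { return threadId / kWarpSize; }

// Index of the last warp that contains at least one thread.
// Precondition (assumed): numThreads >= 1.
inline unsigned masterWarpId(unsigned numThreads) {
  return (numThreads - 1) / kWarpSize;
}

// True if the thread belongs to the last active warp of the block.
inline bool isInMasterWarp(unsigned threadId, unsigned numThreads) {
  return warpIdOf(threadId) == masterWarpId(numThreads);
}
```

With 96 threads the last active warp is warp 2, not warp 31. The two only coincide when the block has the full 1024 threads, which matches the case the commit message warns about.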
Gheorghe-Teodor Bercea c8d395a168 [OpenMP][libomptarget] Enable globalization for workers
Summary:
This patch allows workers to have a global memory stack managed by the runtime. This patch is needed for completeness and consistency with the globalization policy: if a worker-side variable escapes the current context, it then needs to be globalized.
Until now, only the master thread was allowed to have such a stack. These global values can now potentially be shared amongst workers if the semantics of the OpenMP program require it.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44487

llvm-svn: 328144
2018-03-21 20:34:19 +00:00
George Rokos 6b9bb5e1c2 Bugfix, extern declarations for libomp functions are `extern "C"` declarations
llvm-svn: 327763
2018-03-17 02:07:42 +00:00
George Rokos 2878c3957b Moved extern declarations to private header file, they are only used from within libomptarget, they don't need to be in omptarget.h.
llvm-svn: 327740
2018-03-16 20:40:09 +00:00
Gheorghe-Teodor Bercea 876c1ed2e5 [OpenMP][libomptarget] Enable usage of shared memory slots
Summary:
Allow the runtime to use the existing shared memory statically allocated slots.

When a variable is globalized, the underlying memory can be either shared or global memory (both have block-wide visibility). In this case, we allow the storage to use a limited amount of shared memory that has already been statically allocated. Only if shared memory proves insufficient do we invoke malloc() to create a new global memory slot.

Reviewers: ABataev, carlo.bertolli, grokos, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44486

llvm-svn: 327639
2018-03-15 16:05:34 +00:00
Gheorghe-Teodor Bercea f3de222b0d [OpenMP][libomptarget] Enable multiple frames per global memory slot
Summary: To save on calls to malloc, this patch enables the re-use of pre-allocated global memory slots.

Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44470

llvm-svn: 327637
2018-03-15 15:56:04 +00:00
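
The reuse of a pre-allocated slot for several frames, as described in f3de222b0d, resembles bump allocation inside a fixed buffer. The sketch below is a host-side illustration only: the `Slot` class and its methods are hypothetical, and a `std::vector` stands in for the pre-allocated global memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical pre-allocated slot that hands out consecutive frames.
class Slot {
public:
  explicit Slot(std::size_t capacity) : storage_(capacity), used_(0) {}

  // Return a pointer to a new frame, or nullptr when the slot is full.
  // A nullptr result is where a runtime would fall back to a fresh allocation.
  void *pushFrame(std::size_t bytes) {
    if (bytes > storage_.size() - used_)
      return nullptr;
    void *frame = storage_.data() + used_;
    used_ += bytes;
    return frame;
  }

  // Release the most recent frame; the caller passes its size.
  void popFrame(std::size_t bytes) {
    used_ -= (bytes <= used_ ? bytes : used_);
  }

  std::size_t used() const { return used_; }

private:
  std::vector<unsigned char> storage_; // stands in for pre-allocated memory
  std::size_t used_;                   // bytes currently occupied by frames
};
```

As long as frames fit in the remaining space, no new allocation is needed, which is the saving on malloc calls that the commit message describes.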
George Rokos 59be4b434f [libomptarget][nvptx] Bug fix: Correctly identify the warp master active thread.
llvm-svn: 327556
2018-03-14 19:11:36 +00:00
Gheorghe-Teodor Bercea 49b62649cf [OpenMP][libomptarget] Add global memory data sharing support for master-worker sharing.
Summary:
This patch adds support for the sharing of variables from the master thread of a team to the worker threads of the team.
The runtime uses a stack structure implemented as a doubly-linked list of slots with each slot having the exact same size as the size requested. This implementation leverages existing data structures. The runtime functions are added as separate functions to avoid interfering with the current interface. 

Limitations to be addressed in future patches:
- This patch only uses global memory. A future patch will enable the use of shared memory as an optimization.
- Each slot currently holds a single requested size. A future patch will allow the allocation of several requested sizes in the same slot.

Reviewers: ABataev, grokos, caomhin, carlo.bertolli

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44260

llvm-svn: 327440
2018-03-13 19:44:53 +00:00
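
The stack structure described in 49b62649cf, a doubly-linked list of slots where each slot has exactly the requested size, can be sketched as follows. This is a host-side illustration with hypothetical names (`SlotNode`, `SlotStack`), not the runtime's data structures, and it uses `std::malloc` in place of device-side allocation.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical slot: one node of a doubly-linked list, sized to the request.
struct SlotNode {
  SlotNode *prev;
  SlotNode *next;
  std::size_t size;
  void *data;
};

// Hypothetical stack of slots; the top of the stack is the tail of the list.
class SlotStack {
public:
  SlotStack() : top_(nullptr), depth_(0) {}
  SlotStack(const SlotStack &) = delete;
  SlotStack &operator=(const SlotStack &) = delete;
  ~SlotStack() {
    while (top_ != nullptr)
      pop();
  }

  // Allocate a slot of exactly `bytes` and link it after the current top.
  void *push(std::size_t bytes) {
    SlotNode *node = static_cast<SlotNode *>(std::malloc(sizeof(SlotNode)));
    if (node == nullptr)
      return nullptr;
    node->data = std::malloc(bytes);
    if (node->data == nullptr) {
      std::free(node);
      return nullptr;
    }
    node->size = bytes;
    node->prev = top_;
    node->next = nullptr;
    if (top_ != nullptr)
      top_->next = node;
    top_ = node;
    ++depth_;
    return node->data;
  }

  // Unlink and free the top slot; does nothing on an empty stack.
  void pop() {
    if (top_ == nullptr)
      return;
    SlotNode *node = top_;
    top_ = node->prev;
    if (top_ != nullptr)
      top_->next = nullptr;
    std::free(node->data);
    std::free(node);
    --depth_;
  }

  std::size_t depth() const { return depth_; }
  std::size_t topSize() const { return top_ != nullptr ? top_->size : 0; }

private:
  SlotNode *top_;
  std::size_t depth_;
};
```

One allocation per push is the cost that the later patches in this log reduce by reusing slots and by using shared memory first.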
Gheorghe-Teodor Bercea d5e5992f9a [OpenMP][libomptarget] Fix union.
Summary: To make the two parts of the union have the same size, the size of vect needs to be increased by 16 bits.

Reviewers: grokos, carlo.bertolli, caomhin, ABataev

Reviewed By: grokos, ABataev

Subscribers: fedor.sergeev, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D44254

llvm-svn: 327040
2018-03-08 18:44:02 +00:00
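
A mismatch like the one fixed in d5e5992f9a can be caught at compile time with `static_assert`. The sketch below is an assumption-laden illustration: the `Fields` layout and the `Descr` union are hypothetical, and only the member name `vect` is taken from the commit message.

```cpp
#include <cstdint>

// Hypothetical 64-bit field layout; not the layout from the patch.
struct Fields {
  std::uint32_t a;
  std::uint16_t b;
  std::uint16_t c;
};

// Two views of the same storage. `vect` has four 16-bit elements so that it
// covers all of `data`; with only three elements it would be 16 bits short.
union Descr {
  Fields data;
  std::uint16_t vect[4];
};

// Fail the build if the two parts of the union ever drift apart in size.
static_assert(sizeof(Descr::data) == sizeof(Descr::vect),
              "both parts of the union must have the same size");
```

A check of this kind documents the intent next to the type and turns a silent size mismatch into a build error.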
Gheorghe-Teodor Bercea 7a5fa21ae2 [OpenMP] Remove implicit data sharing using device shared memory from libomptarget
Summary:
This patch reverts the changes to libomptarget that were coupled with the changes to Clang code gen for data sharing using shared memory. A similar patch exists for Clang: D43625

Shared memory is meant to be used as an optimization on top of a more general scheme. So far we didn't have a global memory implementation ready so shared memory was a solution which applied to the current level of OpenMP complexity supported by trunk on GPU devices (due to the missing NVPTX backend patch this functionality has never been exercised). Now that we have a global memory solution this patch is "in the way" and needs to be removed (for now). This patch (or an equivalent version of it) will be put out for review once the global memory scheme is in place.


Reviewers: ABataev, grokos, carlo.bertolli, caomhin

Reviewed By: grokos

Subscribers: Hahnfeld, guansong, openmp-commits

Differential Revision: https://reviews.llvm.org/D43626

llvm-svn: 326950
2018-03-07 22:10:10 +00:00
Gheorghe-Teodor Bercea d5ae4e6501 [OpenMP][libomptarget] Enable the compilation of multiple bc libraries for runtime inlining
Summary:
Different NVIDIA GPUs support different compute capabilities. To enable the inlining of runtime functions and the best performance on different generations of NVIDIA GPUs, a bc library for each compute capability needs to be compiled. The same compiler build will then be usable in conjunction with multiple generations of NVIDIA GPUs.
To differentiate between versions of the same bc lib, the output file name will contain the compute capability ID.
Depends on D14254

Reviewers: Hahnfeld, hfinkel, carlo.bertolli, caomhin, ABataev, grokos

Reviewed By: Hahnfeld, grokos

Subscribers: guansong, mgorny, openmp-commits

Differential Revision: https://reviews.llvm.org/D41724

llvm-svn: 324904
2018-02-12 16:45:20 +00:00