15 Commits

Author SHA1 Message Date
Alexey Bataev 8c5555c39a [OPENMP][NVPTX]Mark more functions as always_inline for better
performance.

Internally generated functions must be marked as always_inline in most
cases. The patch marks some extra reduction functions and outlined parallel
functions as always_inline for better performance, but only if
optimization is requested.

llvm-svn: 361269
2019-05-21 15:11:58 +00:00
Alexey Bataev 8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev 6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

It seems that nvlink has a bug in its support for weakly linked
symbols. It does not allow defining several shared memory buffers
with different sizes, even with weak linkage. Instead, we always
use a 128-byte buffer to keep nvlink from emitting an error message.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed the previously committed code for reduction support in
teams/parallel constructs, taking into account the new design of the NVPTX
support in the compiler. Teams reductions are not fully functional yet;
this will be fixed in follow-up patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev 09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables declared in target|teams|distribute
regions is less than the maximum size of shared memory available, the
buffer is allocated in shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
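
The size check described in 09c9eea78f can be pictured with a small host-side sketch. This is not the Clang codegen: `Placement`, `choosePlacement`, and the fallback to global memory when the check fails are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type: where the buffer for the variables ends up.
enum class Placement { Shared, Global };

// Sums the sizes of the variables declared in a region and compares the
// total against the available shared memory. The strict "less than"
// follows the commit message; falling back to global memory otherwise
// is an assumption of this sketch.
Placement choosePlacement(const std::vector<std::size_t> &var_sizes,
                          std::size_t max_shared_bytes) {
  const std::size_t total =
      std::accumulate(var_sizes.begin(), var_sizes.end(), std::size_t{0});
  return total < max_shared_bytes ? Placement::Shared : Placement::Global;
}
```

For example, two variables of 16 and 32 bytes fit under a 64-byte limit, while a single 64-byte variable does not, because the comparison is strict.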
Alexey Bataev e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for the entire time the kernel
executes. Thus, if a variable is declared in their context and then
escapes it, we can allocate global memory statically instead of
allocating it dynamically.
The patch captures all the globalized variables in target/teams/distribute
contexts and merges them into records, one per target region.
Those records are then joined into a union, one per compilation unit
(to save global memory). These unions are organized into
two-dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00
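
The layout described in e40901806f (one record per target region, the records joined into a union per compilation unit, and the union stored in a two-dimensional array) might look roughly like the following host-side sketch. All type names, field names, and the two dimension constants are hypothetical; the real values depend on the device and on the code being compiled.

```cpp
#include <cstddef>

// Hypothetical globalized variables of two target regions in one
// compilation unit, one record per region.
struct TargetRegionARecord {
  int counter;
  double partial_sum;
};
struct TargetRegionBRecord {
  float scratch[4];
};

// One union per compilation unit: the records overlap in storage, which
// is how the commit message says global memory is saved.
union UnitRecord {
  TargetRegionARecord a;
  TargetRegionBRecord b;
};

// Assumed dimensions for illustration only.
constexpr std::size_t kBlocksPerSM = 16;
constexpr std::size_t kNumSMs = 80;

// Statically allocated storage: first dimension is the number of blocks
// per SM, second dimension is the number of SMs.
static UnitRecord GlobalizedStorage[kBlocksPerSM][kNumSMs];

// Hypothetical accessor standing in for the runtime functions that hand
// out a slot to an executing team.
UnitRecord &slotFor(std::size_t block_in_sm, std::size_t sm) {
  return GlobalizedStorage[block_in_sm][sm];
}
```

Because the storage is a union, its size is bounded by the largest record rather than by the sum of all records.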
Alexey Bataev 4ac58d1a4b [OPENMP][NVPTX]Reduce memory usage in target region.
Additional reduction of the global memory usage in the target regions
without parallel regions.

llvm-svn: 344413
2018-10-12 20:19:59 +00:00
Alexey Bataev 9ea3c38597 [OPENMP][NVPTX] Support memory coalescing for globalized variables.
Added support for memory coalescing for better performance for
globalized variables. From now on all the globalized variables are
represented as arrays of 32 elements and each thread accesses these
elements using `tid & 31` as index.

llvm-svn: 344049
2018-10-09 14:49:00 +00:00
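
The indexing scheme from 9ea3c38597 can be sketched in plain C++. `GlobalizedVar` and `forThread` are hypothetical names; only the 32-element array and the `tid & 31` index come from the commit message.

```cpp
#include <array>

// The commit message fixes the array length at 32 elements.
constexpr unsigned kLanes = 32;

// Hypothetical wrapper for one globalized variable of type T.
template <typename T>
struct GlobalizedVar {
  std::array<T, kLanes> slots{};

  // Each thread selects its element with `tid & 31`, so threads with
  // consecutive ids touch consecutive elements.
  T &forThread(unsigned tid) { return slots[tid & (kLanes - 1)]; }
};
```

In this sketch, threads 5 and 37 map to the same element, since 37 & 31 is 5.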
Alexey Bataev 6bc2732f71 [OPENMP][NVPTX] Fix emission of __kmpc_global_thread_num() for non-SPMD
mode.

__kmpc_global_thread_num() should be called before initialization of the
runtime.

llvm-svn: 343857
2018-10-05 15:27:47 +00:00
Alexey Bataev a4fa0b880a [OPENMP] General code improvements.
llvm-svn: 330140
2018-04-16 17:59:34 +00:00
Alexey Bataev b7f3cba84c [OPENMP, NVPTX] Emit correct thread id.
We emitted a fake thread id for the outlined function in NVPTX codegen.
The patch adds emission of the real thread id.

llvm-svn: 327867
2018-03-19 17:04:07 +00:00
Alexey Bataev c99042ba97 [OPENMP, NVPTX] Improve globalization of the variables captured by value.
If the variable is captured by value and the corresponding parameter in
the outlined function escapes its declaration context, this parameter
must be globalized. To globalize it we need to get the address of the
original parameter, load the value, store it to the global address and
use this global address instead of the original.

The patch improves globalization for parallel|teams regions and for
functions in declare target regions.

llvm-svn: 327654
2018-03-15 18:10:54 +00:00
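
The four steps listed in c99042ba97 (take the address of the original parameter, load the value, store it to the global address, use the global address from then on) can be mimicked on the host. `GlobalSlot` and `outlinedBody` are hypothetical stand-ins; the actual codegen obtains the global address from the runtime rather than from a static variable.

```cpp
// Stand-in for a slot in device global memory (hypothetical name).
static int GlobalSlot = 0;

// Hypothetical outlined function that receives `n` by value and whose
// parameter escapes, so it has to be globalized.
int outlinedBody(int n) {
  int *orig = &n;               // address of the original parameter
  int value = *orig;            // load the value
  int *global_n = &GlobalSlot;  // the global address
  *global_n = value;            // store the value to the global address
  *global_n += 1;               // later uses go through the global address
  return *global_n;
}
```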
Samuel Antao 1168d63cf9 [OpenMP] Use fopenmp prefix for all options introduced by the offloading implementation.
Summary: This patch changes the options used by offloading to start with -fopenmp instead of -fomp. This makes the option naming more consistent and materializes a suggestion by Richard Smith in http://reviews.llvm.org/D9888.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, ABataev

Subscribers: kkwli0, cfe-commits, caomhin

Differential Revision: http://reviews.llvm.org/D21841

llvm-svn: 274283
2016-06-30 21:22:08 +00:00
Alexey Bataev 7ace49dff1 [OPENMP] Pass scalar firstprivate vars by value.
For better performance, and to unify the code with the offloading part, we
pass scalar firstprivate variables by value instead of by reference. This
removes some extra copy operations.

llvm-svn: 269751
2016-05-17 08:55:33 +00:00
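
The difference described in 7ace49dff1 can be illustrated with two hand-written functions. Both are hypothetical stand-ins for an outlined region, not the signatures Clang actually generates.

```cpp
// By reference: the outlined function receives a pointer and has to make
// its own private copy before using the value.
int outlinedByRef(const int *n_ref) {
  int n = *n_ref;  // extra copy to create the private variable
  n *= 2;
  return n;
}

// By value: the parameter itself already is the private copy.
int outlinedByVal(int n) {
  n *= 2;
  return n;
}
```

In both cases the caller's variable stays unchanged, which is the firstprivate behaviour; the by-value form simply skips the explicit copy.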
Carlo Bertolli c687225b43 [OPENMP] Codegen for teams directive for NVPTX
This patch implements the teams directive for the NVPTX backend. It differs from the host code generation path in that it:

- Does not call kmpc_fork_teams. All necessary teams and threads are started upon touching the target region, when launching a CUDA kernel, and their execution is coordinated through sequential and parallel regions within the target region.
- Does not call kmpc_push_num_teams even if a num_teams or thread_limit clause is present. Setting the number of teams and the thread limit is implemented by the nvptx-related runtime.

Please note that I am now passing a Clang Expr * to emitPushNumTeams instead of the originally chosen llvm::Value * type. The reason for that is that I want to avoid emitting expressions for num_teams and thread_limit if they are not needed in the target region.

http://reviews.llvm.org/D17963

llvm-svn: 265304
2016-04-04 15:55:02 +00:00
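
For context on c687225b43, user code that carries the num_teams and thread_limit clauses might look like the sketch below. The function name and the clause values are made up for illustration. Without OpenMP enabled the pragma is ignored and the loop runs serially, producing the same result.

```cpp
// Sums 1..n inside a target teams region. The num_teams and thread_limit
// values are arbitrary examples.
long sumTo(int n) {
  long sum = 0;
#pragma omp target teams distribute parallel for num_teams(4) \
    thread_limit(64) reduction(+ : sum) map(tofrom : sum)
  for (int i = 1; i <= n; ++i)
    sum += i;
  return sum;
}
```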