llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexey Bataev	8c5555c39a	[OPENMP][NVPTX]Mark more functions as always_inline for better performance. Internally generated functions must be marked as always_inline in most cases. The patch marks some extra reduction functions and outlined parallel functions as always_inline for better performance, but only if optimization is requested. llvm-svn: 361269	2019-05-21 15:11:58 +00:00
Alexey Bataev	8e009036c9	[OPENMP][NVPTX]Use new functions from the runtime library. Updated codegen to use the new functions from the runtime library. llvm-svn: 350415	2019-01-04 17:25:09 +00:00
Alexey Bataev	a3924b517e	[OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of nvvm_barrier0. Use runtime functions instead of direct calls to the nvvm intrinsics. This prevents some dangerous LLVM optimizations that break the code for the NVPTX target. llvm-svn: 350328	2019-01-03 16:25:35 +00:00
Alexey Bataev	6a1b06bcd4	[OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes buffer. It seems that nvlink has a bug in its support for weakly linked symbols: it does not allow defining several shared memory buffers with different sizes, even with weak linkage. Instead, we always use a 128-byte buffer to prevent nvlink from emitting an error message. llvm-svn: 349540	2018-12-18 21:01:42 +00:00
Alexey Bataev	f2f39be9ed	[OPENMP][NVPTX]Emit correct reduction code for teams/parallel reductions. Fixed previously committed code for the reduction support in teams/parallel constructs, taking into account the new design of the NVPTX support in the compiler. Teams reductions are not fully functional yet; this will be fixed in the following patches. llvm-svn: 347081	2018-11-16 19:38:21 +00:00
Alexey Bataev	09c9eea78f	[OPENMP][NVPTX]Allow to use shared memory for the target|teams|distribute variables. If the total size of the variables, declared in target|teams|distribute regions, is less than the maximal size of shared memory available, the buffer is allocated in the shared memory. llvm-svn: 346507	2018-11-09 16:18:04 +00:00
Alexey Bataev	e40901806f	[OPENMP][NVPTX]Improve emission of the globalized variables for target/teams/distribute regions. Target/teams/distribute regions exist for the entire time the kernel is executed. Thus, if a variable is declared in their context and then escapes it, we can allocate global memory statically instead of allocating it dynamically. The patch captures all the globalized variables in target/teams/distribute contexts and merges them into records, one per target region. Those records are then joined into a union, one per compilation unit (to save global memory). Those units are organized into two-dimensional arrays, where the first dimension is the number of blocks per SM and the second one is the number of SMs. Runtime functions manage this global memory space between the executing teams. llvm-svn: 345978	2018-11-02 14:54:07 +00:00
Alexey Bataev	4ac58d1a4b	[OPENMP][NVPTX]Reduce memory usage in target region. Additional reduction of the global memory usage in the target regions without parallel regions. llvm-svn: 344413	2018-10-12 20:19:59 +00:00
Alexey Bataev	9ea3c38597	[OPENMP][NVPTX] Support memory coalescing for globalized variables. Added support for memory coalescing for better performance for globalized variables. From now on all the globalized variables are represented as arrays of 32 elements and each thread accesses these elements using `tid & 31` as index. llvm-svn: 344049	2018-10-09 14:49:00 +00:00
Gheorghe-Teodor Bercea	ad4e579407	[OpenMP] Initialize data sharing stack for SPMD case Summary: In the SPMD case, we need to initialize the data sharing and globalization infrastructure. This covers the case when an SPMD region calls a function in a different compilation unit. Reviewers: ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: Hahnfeld, jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D49188 llvm-svn: 337015	2018-07-13 16:18:24 +00:00
Alexey Bataev	b99dcb5f31	[OPENMP, NVPTX] Do not globalize local variables in parallel regions. In generic data-sharing mode we are allowed to not globalize local variables that escape their declaration context iff they are declared inside of the parallel region. We can do this because L2 parallel regions are executed sequentially and, thus, we do not need to put shared local variables in the global memory. llvm-svn: 336567	2018-07-09 17:43:58 +00:00
Alexey Bataev	9a70017537	[OPENMP, NVPTX] Fix linkage of the global entries. The linkage of the global entries must be weak to enable support of redefinition of the same target regions in multiple compilation units. llvm-svn: 331768	2018-05-08 14:16:57 +00:00
Gheorghe-Teodor Bercea	36cdfad062	[OpenMP][Clang] Add call to global data sharing stack initialization on the workers side Summary: The workers also need to initialize the global stack. The call to the initialization function needs to happen after the kernel_init() function is called by the master. This ensures that the per-team data structures of the runtime have been initialized. Reviewers: ABataev, grokos, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D44749 llvm-svn: 328219	2018-03-22 17:33:27 +00:00
Alexey Bataev	63cc8e96c3	[OPENMP, NVPTX] Globalization of the private redeclarations. If the generic codegen is enabled and a private copy of the original variable escapes the declaration context, this private copy should be globalized just as if it were the original variable. llvm-svn: 327985	2018-03-20 14:45:59 +00:00
Gheorghe-Teodor Bercea	d3dcf2f05d	[OpenMP] Add OpenMP data sharing infrastructure using global memory Summary: This patch handles the Clang code generation phase for the OpenMP data sharing infrastructure. TODO: add a more detailed description. Reviewers: ABataev, carlo.bertolli, caomhin, hfinkel, Hahnfeld Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43660 llvm-svn: 327513	2018-03-14 14:17:45 +00:00
Gheorghe-Teodor Bercea	7d80da15a0	[OpenMP] Remove implicit data sharing code gen that aims to use device shared memory Summary: Remove this scheme for now since it will be covered by another more generic scheme using global memory. This code will be worked into an optimization for the generic data sharing scheme. Removing this completely and then adding it via future patches will make all future data sharing patches cleaner. Reviewers: ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43625 llvm-svn: 326948	2018-03-07 21:59:50 +00:00
Jonas Hahnfeld	fa059ba59e	[OpenMP] Further adjustments of nvptx runtime functions Pass in default value of 1, similar to previous commit r318836. Differential Revision: https://reviews.llvm.org/D41012 llvm-svn: 321486	2017-12-27 10:39:56 +00:00
Gheorghe-Teodor Bercea	b4c74c6603	[OpenMP] Add function attribute for triggering data sharing. Summary: The backend should only emit data sharing code for the cases where it is needed. A new function attribute is used by Clang to enable data sharing only for the cases where OpenMP semantics require it and there are variables that need to be shared. Reviewers: hfinkel, Hahnfeld, ABataev, carlo.bertolli, caomhin Reviewed By: ABataev Subscribers: cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D41123 llvm-svn: 320527	2017-12-12 21:38:43 +00:00
Jonas Hahnfeld	cfd162d8e5	Fix test/OpenMP/nvptx_data_sharing.cpp This was an oversight that stayed in the test from development. llvm-svn: 318779	2017-11-21 16:49:11 +00:00
Gheorghe-Teodor Bercea	eb89b1d46f	[OpenMP] Add implicit data sharing support when offloading to NVIDIA GPUs using OpenMP device offloading Summary: This patch is part of the development effort to add support in the current OpenMP GPU offloading implementation for implicitly sharing variables between a target region executed by the team master thread and the worker threads within that team. This patch is the first of three required for successfully performing the implicit sharing of master thread variables with the worker threads within a team. The remaining two patches are: - Patch D38978 to the LLVM NVPTX backend which ensures the lowering of shared variables to a device memory which allows the sharing of references; - Patch (coming soon) is a patch to libomptarget runtime library which ensures that a list of references to shared variables is properly maintained. A simple code snippet which illustrates an implicit data sharing situation is as follows: ``` #pragma omp target { // master thread only int v; #pragma omp parallel { // worker threads // use v } } ``` Variable v is implicitly shared from the team master thread which executes the code in between the target and parallel directives. The worker threads must operate on the latest version of v, including any updates performed by the master. The code generated in this patch relies on the LLVM NVPTX patch (mentioned above) which prevents v from being lowered in the thread local memory of the master thread thus making the reference to this variable un-shareable with the workers. This ensures that the code generated by this patch is correct. Since the parallel region is outlined the passing of arguments to the outlined regions must preserve the original order of arguments. The runtime therefore maintains a list of references to shared variables thus ensuring their passing in the correct order. The passing of arguments to the outlined parallel function is performed in a separate function which the data sharing infrastructure constructs in this patch. The function is inlined when optimizations are enabled. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, Hahnfeld, ABataev, caomhin Reviewed By: ABataev Subscribers: cfe-commits, jholewinski Differential Revision: https://reviews.llvm.org/D38976 llvm-svn: 318773	2017-11-21 15:54:54 +00:00
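
The implicit data sharing situation described in the eb89b1d46f message can be laid out as a small function. This is only a sketch of the source-level pattern, not of the code Clang generates: the function name `observed_by_workers`, the `seen` variable, the `map(from: seen)` clause, and the concrete values are assumptions added for illustration. Built without OpenMP support, the pragmas are ignored and the body runs serially with the same result.

```cpp
#include <cassert>

// Hypothetical helper: returns the value of v that the parallel region sees.
int observed_by_workers() {
    int seen = 0;  // assumption: used only to report the result to the host

    #pragma omp target map(from: seen)
    {
        // Master thread only: v is declared and updated before the
        // parallel region starts.
        int v = 41;
        v += 1;

        #pragma omp parallel
        {
            // Worker threads: v is implicitly shared, so every worker
            // must observe the master's latest update.
            #pragma omp critical
            seen = v;
        }
    }

    assert(seen == 42);  // the update made by the master is visible
    return seen;
}
```

Calling `observed_by_workers()` should yield 42 whether the region runs on the host or is offloaded, since the message requires workers to operate on the latest version of `v`.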

20 Commits
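
The coalescing scheme mentioned in the 9ea3c38597 message (each globalized variable stored as an array of 32 elements, indexed by `tid & 31`) can be modeled on the host as follows. This is a minimal sketch under stated assumptions: `GlobalizedInt`, `slot_index`, and `kSlots` are hypothetical names, and the real change operates on generated IR rather than on a C++ struct.

```cpp
#include <array>
#include <cassert>

// 32 elements per globalized variable, as stated in the commit message.
constexpr unsigned kSlots = 32;

// Index a thread uses into the 32-element array: tid & 31.
unsigned slot_index(unsigned tid) { return tid & (kSlots - 1); }

// Hypothetical host-side model of one globalized `int` variable.
struct GlobalizedInt {
    std::array<int, kSlots> slots{};

    // Each thread accesses its own element, so threads with consecutive
    // ids touch consecutive elements of the array.
    int &for_thread(unsigned tid) { return slots[slot_index(tid)]; }
};
```

For instance, thread 35 maps to element 3 (`35 & 31`), the same element thread 3 would use, while threads 0 through 31 each get a distinct element.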