19 Commits

Author SHA1 Message Date
Philipp Schaad 50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes are now appended to the kernel launch parameter list only if the OpenCL runtime is selected, not if the CUDA runtime is chosen.
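
A minimal sketch of the resulting parameter layout. It assumes each kernel argument is modeled as a (pointer, size-in-bytes) pair; `Runtime`, `build_launch_params`, and this list-based representation are hypothetical and do not reflect Polly's actual LLVM-IR-level implementation:

```python
from enum import Enum


class Runtime(Enum):
    CUDA = "cuda"
    OPENCL = "opencl"


def build_launch_params(args, runtime):
    """Build the flat launch-parameter list for one kernel call.

    `args` is a list of (pointer, size_in_bytes) pairs, one per kernel
    argument (hypothetical representation).
    """
    # Argument pointers always come first, in declaration order.
    params = [ptr for ptr, _ in args]
    if runtime is Runtime.OPENCL:
        # Only the OpenCL runtime reads the sizes, which are appended
        # after all pointers, doubling the list length.
        params += [size for _, size in args]
    return params


args = [("A_ptr", 8), ("B_ptr", 8), ("n_ptr", 4)]
print(build_launch_params(args, Runtime.CUDA))    # ['A_ptr', 'B_ptr', 'n_ptr']
print(build_launch_params(args, Runtime.OPENCL))  # ['A_ptr', 'B_ptr', 'n_ptr', 8, 8, 4]
```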

Differential Revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Siddharth Bhat 9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This requires us to adjust how we index and how we construct array
   bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate the correct way to handle these.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests show a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Singapuram Sanjay Srivallabh 1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to its name, to differentiate it from other kernels produced from the same function but from different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with two Scops, each of which produces just one kernel: both kernels are named "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985 by differentiating kernels across Scops as well.
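
To illustrate the collision described above, here is a toy runtime cache keyed by kernel name only. `KernelCache`, `kernel_name`, `launch_both`, and in particular the `_SCOP_<n>_` name format are assumptions for illustration, not necessarily the exact names Polly emits:

```python
def kernel_name(func, scop_id, kernel_id, with_scop=True):
    """Hypothetical kernel-naming scheme; the exact format is an assumption."""
    if with_scop:
        return f"FUNC_{func}_SCOP_{scop_id}_KERNEL_{kernel_id}"
    return f"FUNC_{func}_KERNEL_{kernel_id}"


class KernelCache:
    """Toy runtime cache that looks kernels up by name alone."""

    def __init__(self):
        self._by_name = {}

    def get_or_compile(self, name, code):
        # The first code registered under a name is reused for every
        # later request with the same name.
        return self._by_name.setdefault(name, code)


def launch_both(with_scop):
    """Launch kernel 0 of Scop 0 and kernel 0 of Scop 1 in function `f`."""
    cache = KernelCache()
    first = cache.get_or_compile(kernel_name("f", 0, 0, with_scop), "code_scop0")
    second = cache.get_or_compile(kernel_name("f", 1, 0, with_scop), "code_scop1")
    return first, second


print(launch_both(with_scop=False))  # ('code_scop0', 'code_scop0'): wrong kernel
print(launch_both(with_scop=True))   # ('code_scop0', 'code_scop1')
```

Without the Scop number, the second Scop silently receives the first Scop's kernel; with it, the names differ and each Scop gets its own code.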

Previously (even before D33985), when profiling kernels generated through a JIT (e.g. Julia), kernels associated with different functions, and even with different SCoPs within a function, would be grouped together due to their common name (https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ). This patch prevents this grouping, so the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Singapuram Sanjay Srivallabh 79f13b9a80 Prefix the name of the calling host function in the name of callee GPU kernel
Summary:
Provide more context in the name of a GPU kernel by prefixing it with the name of the host function that calls it. For example, the first kernel called by `gemm` would be named `FUNC_gemm_KERNEL_0`.

Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map a host caller to its device callees, especially when Polly-ACC produces many kernels.
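
A small sketch of the naming scheme from the `gemm` example and of how a profiling script might map a kernel name back to its host caller. `prefixed_kernel_name` and `host_of` are hypothetical helpers, not part of Polly:

```python
import re

# Matches names of the form FUNC_<host>_KERNEL_<n>; the greedy group lets
# host names that themselves contain underscores round-trip.
_KERNEL_NAME = re.compile(r"^FUNC_(?P<host>.+)_KERNEL_(?P<id>\d+)$")


def prefixed_kernel_name(host_func, kernel_id):
    """Name the kernel after the host function that launches it."""
    return f"FUNC_{host_func}_KERNEL_{kernel_id}"


def host_of(kernel_name):
    """Recover (host function, kernel number), or None for unprefixed names."""
    m = _KERNEL_NAME.match(kernel_name)
    if m is None:
        return None
    return m.group("host"), int(m.group("id"))


print(prefixed_kernel_name("gemm", 0))   # FUNC_gemm_KERNEL_0
print(host_of("FUNC_my_func_KERNEL_3"))  # ('my_func', 3)
print(host_of("kernel_0"))               # None
```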

Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!

Reviewed By: grosser

Subscribers: nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33985

llvm-svn: 307173
2017-07-05 16:48:21 +00:00
Siddharth Bhat f16db04cd5 [FIX] Fix regression caused by c29f4ed, testcase matches output
- Commit changed codegen for induction variables
- Updated testcase

llvm-svn: 302891
2017-05-12 11:34:51 +00:00
Siddharth Bhat a90be207c6 [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
Summary: PPCGCodeGeneration now attaches the sizes of the kernel launch parameters at the end of the parameter list. The existing CUDA runtime ignores these sizes, but the OpenCL runtime knows to look for the kernel-argument sizes at the end of the parameter list. (The resulting parameter list is twice as long; this has been accounted for in the corresponding test cases.)

Reviewers: grosser, Meinersbur, bollu

Reviewed By: bollu

Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32961

llvm-svn: 302515
2017-05-09 10:45:52 +00:00
Siddharth Bhat d277feda91 [PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility
This makes a small change to the way pointer arguments are set in the kernel
code generation: the pointer is now retrieved such that it is explicitly
annotated with the global address space. This is necessary if the IR is to be
run through NVPTX to generate OpenCL-compatible PTX.

The changes do not affect the PTX strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).

Additionally, the data layout has been updated to what the NVPTX backend requests/recommends.

Contributed-by: Philipp Schaad

Reviewers: Meinersbur, grosser, bollu

Reviewed By: grosser, bollu

Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301299
2017-04-25 08:08:29 +00:00
Tobias Grosser cfdee6582b GPGPU: Make test cases independent of register numbering [NFC]
llvm-svn: 281847
2016-09-18 06:50:28 +00:00
Tobias Grosser c59b3ce044 [BlockGenerator] Also eliminate dead code not originating from BB
After having generated the code for a ScopStmt, we run a simple dead-code
elimination that drops all instructions that are known to be and remain unused.
Until this change, we only considered instructions for dead-code elimination if
they had a corresponding instruction in the original BB belonging to the
ScopStmt. However, when generating code, we not only copy code from the BB
belonging to a ScopStmt, but also generate code for operands referenced from BB.
With this change, we also consider code that does not have a corresponding
instruction in BB for dead-code elimination.

This fixes a bug in Polly-ACC where such dead code referenced CPU code from
within a GPU kernel. This was possible because we do not guarantee that all
variables used in known-dead code are moved to the GPU.

llvm-svn: 278103
2016-08-09 08:59:05 +00:00
Tobias Grosser 629109b633 GPGPU: Mark kernel functions as polly.skip
Otherwise, we would try to re-optimize them with Polly-ACC and possibly even
generate kernels that try to offload themselves. This does not work, as the
GPURuntime is not available on the accelerator, and it also does not make any
sense.

llvm-svn: 277589
2016-08-03 12:00:07 +00:00
Tobias Grosser a490147c90 GPGPU: Pass host iterators to kernel
llvm-svn: 276962
2016-07-28 06:47:56 +00:00
Tobias Grosser 79a947c233 GPGPU: Add basic support for kernel launches
llvm-svn: 276863
2016-07-27 13:20:16 +00:00
Tobias Grosser 5779359624 GPGPU: Load GPU kernels
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.

llvm-svn: 276645
2016-07-25 16:31:21 +00:00
Tobias Grosser edb885cb12 GPGPU: generate code for ScopStatements
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available,
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.
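
The value-collection step can be sketched as a set difference over the values each statement references. Representing values as strings and the helper `extra_kernel_args` are hypothetical simplifications of the actual llvm::Value handling:

```python
def extra_kernel_args(stmt_refs, iterators, params, base_ptrs):
    """Return referenced values not already passed to the kernel.

    `stmt_refs` holds, per statement, the names of the values it references.
    """
    covered = set(iterators) | set(params) | set(base_ptrs)
    extra, seen = [], set()
    for refs in stmt_refs:
        for value in refs:
            if value not in covered and value not in seen:
                seen.add(value)
                extra.append(value)  # keep first-use order for stable arguments
    return extra


stmts = [["i", "n", "A", "alpha"], ["j", "B", "alpha", "beta"]]
print(extra_kernel_args(stmts, iterators=["i", "j"], params=["n"], base_ptrs=["A", "B"]))
# ['alpha', 'beta']
```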

For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.

llvm-svn: 276270
2016-07-21 13:15:59 +00:00
Tobias Grosser 59ab070523 GPGPU: generate control flow within the kernel
llvm-svn: 275956
2016-07-19 07:33:11 +00:00
Tobias Grosser f6044bd0ef GPGPU: add host iterators to kernel arguments
llvm-svn: 275954
2016-07-19 07:32:55 +00:00
Tobias Grosser b9fc860a57 GPGPU: collect array references
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly. Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, all that is left
for us to do is to connect it to compute the array reference information.

llvm-svn: 275798
2016-07-18 15:44:32 +00:00
Tobias Grosser 05aad8dbcd test: Add missing 'REQUIRES' line
llvm-svn: 275784
2016-07-18 12:02:44 +00:00
Tobias Grosser 38fc0aed08 GPGPU: Create host control flow
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The
IslNodeBuilder will take care of generating all general-purpose AST nodes, but
we provide our own createUser implementation to handle the different GPU-specific
user statements. For now, we just skip any user statement and only generate a
host-code skeleton, but in subsequent commits we will add handling of normal
ScopStmts performing computations, kernel calls, and host-device data transfers.
We will also introduce run-time check generation and LICM in subsequent commits.

llvm-svn: 275783
2016-07-18 11:56:39 +00:00