Commit Graph

5 Commits

Author SHA1 Message Date
Singapuram Sanjay Srivallabh 1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.

Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Singapuram Sanjay Srivallabh 79f13b9a80 Prefix the name of the calling host function in the name of callee GPU kernel
Summary:
Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`.

Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC.

Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!

Reviewed By: grosser

Subscribers: nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33985

llvm-svn: 307173
2017-07-05 16:48:21 +00:00
Siddharth Bhat d277feda91 [PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is retrieved now, specifically requests
global address space to be annotated. This is necessary, if the IR should be
run through NVPTX to generate OpenCL compatible PTX.

The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).

Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.

Contributed-by: Philipp Schaad

Reviewers: Meinersbur, grosser, bollu

Reviewed By: grosser, bollu

Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301299
2017-04-25 08:08:29 +00:00
Tobias Grosser 470608e3e4 Add missing 'REQUIRES' line
llvm-svn: 277800
2016-08-05 07:08:45 +00:00
Tobias Grosser c1c6a2a61b GPGPU: Add cuda annotations to specify maximal number of threads per block
These annotations ensure that the NVIDIA PTX assembler limits the number of
registers used such that we can be certain the resulting kernel can be executed
for the number of threads in a thread block that we are planning to use.

llvm-svn: 277799
2016-08-05 06:47:43 +00:00