Requirements:

- automake, autoconf, libtool
  (not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
  (not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
  (only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
  Unless you have some other reasons for wanting to use the svn version,
  it is best to install the latest release (3.9).
  For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.

Preparing:

Grab the latest release and extract it or get the source from
the git repository as follows. This process requires autoconf,
automake, libtool and pkg-config.

git clone git://repo.or.cz/ppcg.git
cd ppcg
./get_submodules.sh
./autogen.sh

Compilation:

./configure
make
make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".

Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

#pragma scop

before the fragment and add a line containing

#pragma endscop

after the fragment. To generate CUDA code run

ppcg --target=cuda file.c

where file.c is the file containing the fragment. The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code run

ppcg --target=opencl file.c

where file.c is the file containing the fragment. The generated code
is stored in file_host.c and file_kernel.cl.

Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option. The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces. The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile. The elements of the single integer tuple
specify the tile sizes in each dimension.
In case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension. The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid. The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in each block. The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

{ kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel. If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.

Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU. Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler. We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13". This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.

Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c, which contains utility functions used by the generated
OpenCL host code. To compile the host code with gcc, run

gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some versions of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time. Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal. The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time. Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.

Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code. These files may then contain
the definitions of the functions being called from the
program fragment. If the pathnames of the included files
are relative to the current directory, then you may need
to additionally pass --opencl-compiler-options=-I.
to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels. By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files. The --no-opencl-print-kernel-types option will prevent
PPCG from generating type definitions.

GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.
Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line. Otherwise, the source
files are inconsistent, having fixed size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG generated code using nvcc since CUDA does not support VLAs.

CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C. Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments. For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called. Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.
Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version"). If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.

Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
              G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
              Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}