Summary:
Previously we compiled CUDA device code to PTX assembly and embedded
that asm as text in our host binary. Now we compile to PTX assembly and
then invoke ptxas to assemble the PTX into a cubin file. We gather the
ptx and cubin files for each of our --cuda-gpu-archs and combine them
using fatbinary, and then embed that into the host binary.
Adds two new command-line flags, -Xcuda_ptxas and -Xcuda_fatbinary,
which pass args down to the external tools.
Reviewers: tra, echristo
Subscribers: cfe-commits, jhen
Differential Revision: http://reviews.llvm.org/D16082
llvm-svn: 257809
- added detection of libdevice bitcode file and API to find one appropriate for the GPU we're compiling for.
- pass additional cc1 options for linking with detected libdevice bitcode
- added -nocudalib to prevent automatic linking with libdevice
- added test cases to verify new functionality
Differential Revision: http://reviews.llvm.org/D14556
llvm-svn: 253387
Added new option --cuda-path=<path> which allows
overriding default search paths.
If it's not specified we look for CUDA installation in
/usr/include/cuda and /usr/include/cuda-7.0.
Differential Revision: http://reviews.llvm.org/D12989
llvm-svn: 248433
Added new option --cuda-path=<path> which allows
overriding default search paths.
If it's not specified we look for CUDA installation in
/usr/include/cuda and /usr/include/cuda-7.0.
Differential Revision: http://reviews.llvm.org/D12989
llvm-svn: 248408