llvm-project

Commit Graph

Author	SHA1	Message	Date
Artem Belevich	578653a8fc	[CUDA] Fixed the list of GPUs supported by CUDA-9. Differential Revision: https://reviews.llvm.org/D47268 llvm-svn: 333098	2018-05-23 16:45:23 +00:00
Artem Belevich	679dafe69e	[CUDA] Added -f[no-]cuda-short-ptr option The option enables use of 32-bit pointers for accessing const/local/shared memory. The feature is disabled by default. Differential Revision: https://reviews.llvm.org/D46148 llvm-svn: 331938	2018-05-09 23:10:09 +00:00
Artem Belevich	3cce307799	[CUDA] Enable CUDA compilation with CUDA-9.2 Differential Revision: https://reviews.llvm.org/D45827 llvm-svn: 330753	2018-04-24 18:23:19 +00:00
Artem Belevich	0ae8590354	[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions. The new instructions were added for sm_70+ GPUs in CUDA-9.1. Differential Revision: https://reviews.llvm.org/D45068 llvm-svn: 330296	2018-04-18 21:51:48 +00:00
Alexey Bataev	e36c67b35c	[NVPTX] Emit debug info in DWARF-2 by default for Cuda devices. Summary: NVPTX target supports debug info in DWARF-2 format. Patch adds emission of debug info in DWARF-2 by default. Reviewers: tra, jlebar Subscribers: aprantl, JDevlieghere, cfe-commits Differential Revision: https://reviews.llvm.org/D42581 llvm-svn: 330272	2018-04-18 16:31:09 +00:00
Artem Belevich	dde3dc27ee	[CUDA] Added --[no-]cuda-include-ptx=sm_XX\|all option. Currently we always include PTX into the fatbin along with the GPU code. It about doubles the size of the GPU binary we need to carry in the executable. These options allow controlling inclusion of PTX into the GPU binary. This patch does not change the defaults, though we may consider making no-PTX the default in the future. Differential Revision: https://reviews.llvm.org/D45495 llvm-svn: 329737	2018-04-10 18:38:22 +00:00
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Gheorghe-Teodor Bercea	0d5aa84ad9	[OpenMP] Add flag for linking runtime bitcode library Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel Reviewed By: ABataev, grokos Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43197 llvm-svn: 327460	2018-03-13 23:19:52 +00:00
Gheorghe-Teodor Bercea	0805b80a73	Revert revision 327438. llvm-svn: 327447	2018-03-13 20:50:12 +00:00
Gheorghe-Teodor Bercea	148046c11b	[OpenMP] Add flag for linking runtime bitcode library Summary: This patch adds an additional flag to the OpenMP device offloading toolchain to link in the runtime library bitcode. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, grokos, hfinkel Reviewed By: ABataev, grokos Subscribers: jholewinski, guansong, cfe-commits Differential Revision: https://reviews.llvm.org/D43197 llvm-svn: 327438	2018-03-13 19:39:19 +00:00
Jonas Hahnfeld	5379c6d6fd	[CUDA] Add option to generate relocatable device code As a first step, pass '-c/--compile-only' to ptxas so that it doesn't complain about references to external functions. This will successfully generate object files, but they won't work at runtime because the registration routines need to be adapted. Differential Revision: https://reviews.llvm.org/D42921 llvm-svn: 324878	2018-02-12 10:46:45 +00:00
Jonas Hahnfeld	7f9c518423	[CUDA] Detect installation in PATH If the CUDA toolkit is not installed to its default locations in /usr/local/cuda, the user is forced to specify --cuda-path. This is tedious and the driver can be smarter if well-known tools (like ptxas) can already be found in the PATH environment variable. Add option --cuda-path-ignore-env if the user wants to ignore set environment variables. Also use it in the tests to make sure the driver always finds the same CUDA installation, regardless of the user's environment. Differential Revision: https://reviews.llvm.org/D42642 llvm-svn: 323848	2018-01-31 08:26:51 +00:00
Artem Belevich	fbc56a904f	[CUDA] Added partial support for CUDA-9.1 Clang can use CUDA-9.1 now, though new APIs are not implemented yet. The major change is that headers in CUDA-9.1 went through substantial changes that started in CUDA-9.0 which required substantial changes in the cuda compatibility headers provided by clang. There are two major issues: * CUDA SDK no longer provides declarations for libdevice functions. * A lot of device-side functions have become nvcc's builtins and CUDA headers no longer contain their implementations. This patch changes the way CUDA headers are handled if we compile with CUDA 9.x. Both 9.0 and 9.1 are affected. * Clang provides its own declarations of libdevice functions. * For CUDA-9.x clang now provides implementation of device-side 'standard library' functions using libdevice. This patch should not affect compilation with CUDA-8. There may be some observable differences for CUDA-9.0, though they are not expected to affect functionality. Tested: CUDA test-suite tests for all supported combinations of: CUDA: 7.0,7.5,8.0,9.0,9.1 GPU: sm_20, sm_35, sm_60, sm_70 Differential Revision: https://reviews.llvm.org/D42513 llvm-svn: 323713	2018-01-30 00:00:12 +00:00
Ismail Donmez	64f99dfa08	Fix function call to fix build ../tools/clang/lib/Driver/ToolChains/Cuda.cpp:80:18: error: reference to non-static member function must be called; did you mean to call it with no arguments? if (Distro(D.getVFS).IsDebian()) ~~^~~~~~ () llvm-svn: 319322	2017-11-29 15:18:02 +00:00
Sylvestre Ledru	02082c39f3	Follow up of r319317, add the missing header file llvm-svn: 319319	2017-11-29 15:11:53 +00:00
Sylvestre Ledru	0cfcdc3ffd	Add the nvidia-cuda-toolkit Debian package path to search path Summary: Reported here: http://bugs.debian.org/882505 Patch by Andreas Beckmann Reviewers: Hahnfeld, tra Reviewed By: tra Subscribers: jlebar, cfe-commits Differential Revision: https://reviews.llvm.org/D40453 llvm-svn: 319317	2017-11-29 15:03:28 +00:00
Jonas Hahnfeld	7c78cc5273	[OpenMP] Consistently use cubin extension for nvlink This was previously done in some places, but for example not for bundling so that single object compilation with -c failed. In addition cubin was used for all file types during unbundling which is incorrect for assembly files that are passed to ptxas. Tighten up the tests so that we can't regress in that area. Differential Revision: https://reviews.llvm.org/D40250 llvm-svn: 318763	2017-11-21 14:44:45 +00:00
Justin Lebar	066494d8c1	[CUDA] Print an error if you try to compile with < sm_30 on CUDA 9. Summary: CUDA 9's minimum sm is sm_30. Ideally we should also make sm_30 the default when compiling with CUDA 9, but that seems harder than it should be. Subscribers: sanjoy Differential Revision: https://reviews.llvm.org/D39109 llvm-svn: 316611	2017-10-25 21:32:06 +00:00
Jonas Hahnfeld	30b4418e5a	[CMake][OpenMP] Customize default offloading arch For the shuffle instructions in reductions we need at least sm_30 but the user may want to customize the default architecture. Differential Revision: https://reviews.llvm.org/D38883 llvm-svn: 315996	2017-10-17 13:37:36 +00:00
Jonas Hahnfeld	e2c342fc65	[CUDA] Require libdevice only if needed If the user passes -nocudalib, we can live without it being present. Simplify the code by just checking whether LibDeviceMap is empty. Differential Revision: https://reviews.llvm.org/D38901 llvm-svn: 315902	2017-10-16 13:31:30 +00:00
Gheorghe-Teodor Bercea	5a3608ccfa	[OpenMP] Don't throw cudalib not found error if only front-end is required. Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37914 llvm-svn: 314217	2017-09-26 15:36:20 +00:00
Gheorghe-Teodor Bercea	20789a5f09	[OpenMP] Enable the existing nocudalib flag for OpenMP offloading toolchain. Summary: Enable the -nocudalib flag for the OpenMP device offloading toolchain as well. Currently it can only be used for the CUDA toolchain. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, hfinkel, tra Reviewed By: tra Subscribers: hfinkel, cfe-commits Differential Revision: https://reviews.llvm.org/D37913 llvm-svn: 314164	2017-09-25 21:56:32 +00:00
Gheorghe-Teodor Bercea	5636f4b33a	[OpenMP] Bugfix: output file name drops the absolute path where full path is needed. Summary: When composing the output file name, the path to the file is being dropped. The full path is required. Reviewers: Hahnfeld, ABataev, caomhin, carlo.bertolli, hfinkel, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37912 llvm-svn: 314156	2017-09-25 21:25:38 +00:00
Gheorghe-Teodor Bercea	d45720b55a	Revert commit with wrong message. llvm-svn: 314154	2017-09-25 21:22:49 +00:00
Gheorghe-Teodor Bercea	8cf757ceda	[OpenMP] Don't throw cudalib not found error if only front-end is required. Summary: If we only use the compiler front-end, do not throw an error about the cuda device library not being found. This allows the front-end to be run on systems where no Cuda installation is found. Reviewers: Hahnfeld, ABataev, carlo.bertolli, caomhin, tra Reviewed By: tra Subscribers: hfinkel, tra, cfe-commits Differential Revision: https://reviews.llvm.org/D37914 llvm-svn: 314150	2017-09-25 21:07:16 +00:00
Artem Belevich	4654dc89be	[NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins. Differential Revision: https://reviews.llvm.org/D38090 llvm-svn: 313820	2017-09-20 21:23:07 +00:00
Artem Belevich	8af4e23d1e	[CUDA] Added rudimentary support for CUDA-9 and sm_70. For now CUDA-9 is not included in the list of CUDA versions clang searches for, so the path to CUDA-9 must be explicitly passed via --cuda-path=. On LLVM side NVPTX added sm_70 GPU type which bumps required PTX version to 6.0, but otherwise is equivalent to sm_62 at the moment. Differential Revision: https://reviews.llvm.org/D37576 llvm-svn: 312734	2017-09-07 18:14:32 +00:00
Gheorghe-Teodor Bercea	9c52574886	[OpenMP] Enable previously successful offloading tests. Create a separate test file to contain all tests for OpenMP offloading to GPUs. Make libdevice checking more robust by accounting for the case in which no libdevice is found. These changes are in connection with diff: D29660 llvm-svn: 310718	2017-08-11 15:46:22 +00:00
Gheorghe-Teodor Bercea	14528c60ba	[OpenMP] Delete tests in openmp-offload.c which cause failures until a better way to perform these tests is figured out. Change connected to diff: D29654 llvm-svn: 310625	2017-08-10 16:56:59 +00:00
Alex Lorenz	994f231792	Revert r310489 and follow-up commits r310505, r310519, r310537 and r310549 Commit r310489 caused 'openmp-offload.c' test failures on Darwin and other platforms: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/39230/testReport/junit/Clang/Driver/openmp_offload_c/ The follow-up commits tried to fix the test, but the test is still failing. llvm-svn: 310580	2017-08-10 10:34:46 +00:00
Gheorghe-Teodor Bercea	a659943306	[OpenMP] Provide a default GPU arch that is supported by the underlying hardware. This fixes a bug triggered by diff: D29660 llvm-svn: 310549	2017-08-10 05:01:42 +00:00
Gheorghe-Teodor Bercea	690f6f9b9f	[OpenMP] Enable executable lookup into driver directory. Summary: Invoking the compiler inside a script causes the clang-offload-bundler executable to not be found. This patch enables the lookup for executables in the driver directory where the clang-offload-bundler resides. Reviewers: hfinkel, carlo.bertolli, arpith-jacob, ABataev, caomhin Reviewed By: hfinkel Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D36537 llvm-svn: 310513	2017-08-09 19:52:28 +00:00
Gheorghe-Teodor Bercea	6b26dcb6d6	[OpenMP] Add flag for overwriting default PTX version for OpenMP targets Summary: This flag "--fopenmp-ptx=" enables the overwriting of the default PTX version used for GPU offloaded OpenMP target regions: "+ptx42". Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar Reviewed By: ABataev Subscribers: rengolin, cfe-commits Differential Revision: https://reviews.llvm.org/D29660 llvm-svn: 310489	2017-08-09 15:56:54 +00:00
Gheorghe-Teodor Bercea	0846582878	[OpenMP] Add flag for disabling the default generation of relocatable OpenMP target code for NVIDIA GPUs. Summary: Previously we have added the "-c" flag which gets passed to PTXAS by default to generate relocatable OpenMP target code by default. This set of flags exposes control over this behaviour. Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar Reviewed By: ABataev Subscribers: Hahnfeld, rengolin, cfe-commits Differential Revision: https://reviews.llvm.org/D29659 llvm-svn: 310484	2017-08-09 15:27:39 +00:00
Gheorghe-Teodor Bercea	b9d117233f	[OpenMP] Make OpenMP generated code for the NVIDIA device relocatable by default Original Diff: D29642 This patch was previously reverted due to an error with patch D29654 that this depends on. llvm-svn: 310479	2017-08-09 14:59:35 +00:00
Gheorghe-Teodor Bercea	2c92693280	[OpenMP] OpenMP device offloading code generation produces a cubin file which is then integrated in the host binary using the host linker. Diff: D29654 llvm-svn: 310362	2017-08-08 14:33:05 +00:00
Alex Lorenz	7e9c478cda	Revert r310291, r310300 and r310332 because of test failure on Darwin The commit r310291 introduced the failure. r310332 was a test fix commit and r310300 was a followup commit. I reverted these two to avoid merge conflicts when reverting. The 'openmp-offload.c' test is failing on Darwin because the following run lines: // RUN: touch %t1.o // RUN: touch %t2.o // RUN: %clang -### -no-canonical-prefixes -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -save-temps -no-canonical-prefixes %t1.o %t2.o 2>&1 \ // RUN: \| FileCheck -check-prefix=CHK-TWOCUBIN %s trigger the following assertion: Driver.cpp:3418: assert(CachedResults.find(ActionTC) != CachedResults.end() && "Result does not exist??"); llvm-svn: 310345	2017-08-08 11:20:17 +00:00
Gheorghe-Teodor Bercea	ceb422236a	[OpenMP] Make OpenMP generated code for the NVIDIA device relocatable by default Summary: When device offloading is enabled and the device is an NVIDIA GPU, OpenMP target regions must be compiled with relocation enabled by passing the "-c" flag to the PTXAS invocation. Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar Reviewed By: Hahnfeld Subscribers: Hahnfeld, rengolin, mkuron, cfe-commits Differential Revision: https://reviews.llvm.org/D29642 llvm-svn: 310300	2017-08-07 20:31:51 +00:00
Gheorghe-Teodor Bercea	53431bc046	[OpenMP] Pass -v to PTXAS if it was passed to the driver. Summary: When compiling code being offloaded by OpenMP to an NVIDIA GPU, pass the -v to PTXAS if it was passed to the CLANG driver. Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar Reviewed By: jlebar Subscribers: Hahnfeld, rengolin, cfe-commits Differential Revision: https://reviews.llvm.org/D29644 llvm-svn: 310295	2017-08-07 20:19:23 +00:00
Gheorghe-Teodor Bercea	4cdba82ee0	[OpenMP] Integrate OpenMP target region cubin into host binary Summary: OpenMP device offloading code generation produces a cubin file which is then integrated in the host binary using the host linker. Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, rnk, hfinkel, tstellar Reviewed By: hfinkel Subscribers: sfantao, rnk, rengolin, cfe-commits Differential Revision: https://reviews.llvm.org/D29654 llvm-svn: 310291	2017-08-07 20:01:48 +00:00
Gheorghe-Teodor Bercea	47e0cf378c	[OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading Summary: OpenMP has the ability to offload target regions to devices which may have different architectures. A new -fopenmp-target-arch flag is introduced to specify the device architecture. In this patch I use the new flag to specify the compute capability of the underlying NVIDIA architecture for the OpenMP offloading CUDA tool chain. Only a host-offloading test is provided since full device offloading capability will only be available when [[ https://reviews.llvm.org/D29654 \| D29654 ]] lands. Reviewers: hfinkel, Hahnfeld, carlo.bertolli, caomhin, ABataev Reviewed By: hfinkel Subscribers: guansong, cfe-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D34784 llvm-svn: 310263	2017-08-07 15:39:11 +00:00
Gheorghe-Teodor Bercea	f0f29608d0	[OpenMP] Extend CLANG target options with device offloading kind. Summary: Pass the type of the device offloading when building the tool chain for a particular target architecture. This is required when supporting multiple tool chains that target a single device type. In our particular use case, the OpenMP and CUDA tool chains will use the same ```addClangTargetOptions ``` method. This enables the reuse of common options and ensures control over options only supported by a particular tool chain. Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar, Hahnfeld Reviewed By: hfinkel Subscribers: jgravelle-google, aheejin, rengolin, jfb, dschuff, sbc100, cfe-commits Differential Revision: https://reviews.llvm.org/D29647 llvm-svn: 307272	2017-07-06 16:22:21 +00:00
David L. Jones	f561abab56	[Driver] Consolidate tools and toolchains by target platform. (NFC) Summary: (This is a move-only refactoring patch. There are no functionality changes.) This patch splits apart the Clang driver's tool and toolchain implementation files. Each target platform toolchain is moved to its own file, along with the closest-related tools. Each target platform toolchain has separate headers and implementation files, so the hierarchy of classes is unchanged. There are some remaining shared free functions, mostly from Tools.cpp. Several of these move to their own architecture-specific files, similar to r296056. Some of them are only used by a single target platform; since the tools and toolchains are now together, some helpers now live in a platform-specific file. The balance are helpers related to manipulating argument lists, so they are now in a new file pair, CommonArgs.h and .cpp. I've tried to cluster the code logically, which is fairly straightforward for most of the target platforms and shared architectures. I think I've made reasonable choices for these, as well as the various shared helpers; but of course, I'm happy to hear feedback in the review. There are some particular things I don't like about this patch, but haven't been able to find a better overall solution. The first is the proliferation of files: there are several files that are tiny because the toolchain is not very different from its base (usually the Gnu tools/toolchain). I think this is mostly a reflection of the true complexity, though, so it may not be "fixable" in any reasonable sense. The second thing I don't like are the includes like "../Something.h". I've avoided this largely by clustering into the current file structure. However, a few of these includes remain, and in those cases it doesn't make sense to me to sink an existing file any deeper. Reviewers: rsmith, mehdi_amini, compnerd, rnk, javed.absar Subscribers: emaste, jfb, danalbert, srhines, dschuff, jyknight, nemanjai, nhaehnle, mgorny, cfe-commits Differential Revision: https://reviews.llvm.org/D30372 llvm-svn: 297250	2017-03-08 01:02:16 +00:00

43 Commits