llvm-project

Commit Graph

Author	SHA1	Message	Date
Joseph Huber	3e8d46921f	[Libomptarget] Stop testing CPU offloading with LTO Summary: Some of the buildbots don't find the libraries because they don't build for the GPU. Although it should always be there it's unclear why these buildbots are having problems. LTO is only interesting on the GPU and these tests take extra time anyway so I'm just going to disable them for now.	2022-07-21 16:47:41 -04:00
John Ericson	07b749800c	[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as `CMAKE_INSTALL_BINDIR` becomes an absolute path, and then when downstream projects try to install there too this breaks because our builds always install to fresh directories for isolation's sake. Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the other specially crafted `LLVM_CONFIG_*` variables substituted in `llvm/cmake/modules/LLVMConfig.cmake.in`. @beanz added it in `d0e1c2a550` to fix a dangling reference in `AddLLVM`, but I am suspicious of how this variable doesn't follow the pattern. Those other ones are carefully made to be build-time vs install-time variables depending on which `LLVMConfig.cmake` is being generated, are carefully made relative as appropriate, etc. etc. For my NixOS use-case they are also fine because they are never used as downstream install variables, only for reading not writing. To avoid the problems I face, and restore symmetry, I deleted the exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s. `AddLLVM` now instead expects each project to define its own, and they do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports `LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in the usual way, matching the other remaining exported variables. For the `AddLLVM` changes, I tried to copy the existing pattern of internal vs non-internal or for LLVM vs for downstream function/macro names, but it would be good to confirm I did that correctly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117977	2022-07-21 19:04:00 +00:00
Johannes Doerfert	d150152615	[OpenMP] Introduce more fine-grained control over the thread state use We can help optimizations by making sure we use the team state whenever it is clear there is no thread state. To this end we introduce a new state flag (`state::HasThreadState`) and explicit control for the `state::ValueRAII` helpers, including a dedicated "assert equal". Differential Revision: https://reviews.llvm.org/D130113	2022-07-21 12:30:38 -05:00
Johannes Doerfert	7472b42b78	[OpenMP] Use Undef instead of null as pointer for inactive lanes Our conditional writes in the runtime look like this: ``` if (active) *ptr = value; ``` In the RAII we need to assign `ptr` which comes from a lookup call. If a thread that is not the main thread calls lookup with the intention to write the pointer, we'll create a new thread state. As such, we need to avoid calling lookup for inactive threads. We used to use `nullptr` as their `ptr` value but that can cause pessimistic reasoning. We now use `undef` instead. Differential Revision: https://reviews.llvm.org/D130114	2022-07-21 12:28:45 -05:00
Johannes Doerfert	a42361dc1c	[OpenMP] Expose the state in the header to allow non-lto optimizations We used to inline the `lookup` calls such that the runtime had "known" access offsets when it was shipped. With the new static library build it doesn't as the lookup is an indirection we cannot look through. This should help us optimize the code better until we can do LTO for the runtime again. Differential Revision: https://reviews.llvm.org/D130111	2022-07-21 12:28:44 -05:00
Joseph Huber	e01ce4e88a	[Libomptarget] Add checks for CUDA subarchitecture using new info This patch extends the `is_valid_binary` routine to also check if the binary's architecture string matches the one parsed from the runtime. This should allow us to only use the binary whose compute capability matches, allowing us to support basic multi-architecture binaries for CUDA. Depends on D127432 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D127505	2022-07-21 13:20:06 -04:00
Joseph Huber	fbcb1ee7f3	[Libomptarget] Add support for offloading binaries in libomptarget The previous patch changed the linker wrapper to embed the offloading binary format inside the target image instead. This will allow us to more generically bundle metadata with these images, such as requires clauses or the target architecture it was compiled for. I wasn't sure how to handle this best, so I introduced a new type that replaces the old `__tgt_device_image` struct that we can expand inside the runtime library. I made the new `__tgt_device_binary` struct pretty much the same for now. In the future we could change this struct to pretty much be the `OffloadBinary` class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127432	2022-07-21 13:20:04 -04:00
Joseph Huber	5d8a76feb0	[Libomptarget] Build the device library even if the sm list is empty We previously had some logic that stopped us from building the device runtime if there were no NVPTX architectures provided. This is incorrect because we could have AMDGPU libraries. Even if the lists are empty we should be able to attempt to build these and get dummy output. This will make it much easier for our tooling which expects certain libraries. If the user wishes to disable the library entirely they should use `-DLIBOMPTARGET_BUILD_DEVICERTL_BCLIB=OFF`. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D130266	2022-07-21 10:57:47 -04:00
Joseph Huber	dc52712a06	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason for this is because we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-20 15:58:06 -04:00
Joseph Huber	b5b20164d2	Revert "[Libomptarget] Make libomptarget an LLVM library" This reverts commit `643dfd97d5`. This patch still makes the AMDGPU buildbots unhappy. Reverting for now until the AMD folks figure it out.	2022-07-20 10:18:55 -04:00
Joseph Huber	6b0db92bbd	[Libomptarget] Fix LTO command line in test Summary: The test passed -offload-lto instead of -foffload-lto.	2022-07-20 10:18:55 -04:00
Joseph Huber	643dfd97d5	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason for this is because we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-20 09:52:09 -04:00
Jon Chesterfield	e46f727b38	Revert "[Libomptarget] Make libomptarget an LLVM library" This reverts commit `70039be627`.	2022-07-19 17:59:45 +01:00
Joseph Huber	70039be627	[Libomptarget] Make libomptarget an LLVM library This patch makes libomptarget depend on LLVM libraries to be built. The reason for this is because we already have an implicit dependency on LLVM headers for ELF identification and extraction as well as an optional dependency on the LLVMSupport library for time tracing information. Furthermore, there are changes in the future that require using more LLVM libraries, and will heavily simplify some future code as well as open up the large amount of useful LLVM libraries to libomptarget. This will make "standalone" builds of `libomptarget` more difficult for vendors wishing to ship their own. This will require a sufficiently new version of LLVM to be installed on the system that should be picked up by the existing handling for the implicit headers. The things this patch changes are as follows: - `libomptarget.so` links against LLVMSupport and LLVMObject - `libomptarget.so` is a symbolic link to `libomptarget.so.15` - If using a shared library build, user applications will depend on LLVM libraries as well - We can now use LLVM resources in Libomptarget. Note that this patch only changes this to apply to libomptarget itself, not the plugins. Additional patches will be necessary for that. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D129875	2022-07-19 12:33:31 -04:00
Joseph Huber	cdea437057	[Libomptarget] Fix warnings on address space attributes The device runtime uses the address space attribute to control the placement of important constants on the GPU. The changes made in D126061 caused these to start emitting errors as they were not applied to the type. This patch fixes the issues to make the warnings go away. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D129896	2022-07-15 17:21:30 -04:00
Joseph Huber	1f940b69c3	[Libomptarget][NFC] Fix signed comparison warnings Summary: Non-functional change, just fixing some sign comparison warnings by making both match.	2022-07-15 13:22:55 -04:00
Joseph Huber	b1d574867d	[Libomptarget] Allow static assert to work on 32-bit systems Summary: We use a static assert to make sure that someone doesn't change the size of an argument struct without properly updating all the other logic. This originally only checked the size on a 64-bit system with 8-byte pointers, causing builds on 32-bit systems to fail. This patch allows either pointer size to work. Fixes #56486	2022-07-12 08:05:01 -04:00
Shilei Tian	e7d998e51e	[NFC][OpenMP][Offloading] Fix compilation warning caused by misuse of `static_cast`	2022-07-08 20:59:37 -04:00
Joseph Huber	269d5c16bc	[Libomptarget][NFC] Move legacy functions to a separate file This patch moves the old legacy interfaces into `libomptarget` to a separate file. These do not need to be included anywhere and are simply provided for backwards compatibility with the ABI. This cleans up the interface greatly. Depends on D128817 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D128818	2022-07-08 14:44:21 -04:00
Joseph Huber	c9353eb4bc	[Libomptarget] Use new tripcount argument in the runtime. The previous patch added an argument to the `__tgt_target_kernel` runtime function which includes the tripcount used for the loop clause. This was originally passed in via the `__kmpc_push_target_tripcount` function. Now we move this logic to the kernel launch itself and remove the need for the push function. Depends on D128816 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D128817	2022-07-08 14:44:19 -04:00
Joseph Huber	ad23e4d85f	[Libomptarget] Implement a unified kernel entry function This patch implements a unified kernel entry function that will be targeted from both teams and non-teams clauses. We introduce a new interface and make the old functions call in using the new one. A following patch will include the necessary changes to Clang to call these new functions instead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D128549	2022-07-08 14:44:06 -04:00
Ye Luo	fca79b78c4	[libomptarget] compile DeviceRTL bc files with -O3 bc files of DeviceRTL are compiled with -O3, the same as the static library. Differential Revision: https://reviews.llvm.org/D129344	2022-07-08 10:00:26 -05:00
Joseph Huber	d27d0a673c	[Libomptarget][NFC] Make Libomptarget use the LLVM naming convention Libomptarget grew out of a project that was originally not in LLVM. As we develop libomptarget this has led to an increasingly large clash between the naming conventions used. This patch fixes most of the variable names that did not conform to the LLVM standard, that is `VariableName` for variables and `functionName` for functions. This patch was primarily done using my editor's linting messages, if there are any issues I missed arising from the automation let me know. Reviewed By: saiislam Differential Revision: https://reviews.llvm.org/D128997	2022-07-05 14:53:38 -04:00
Shilei Tian	696bca9bb2	[NFC][OpenMP][CUDA] Remove unnecessary default label	2022-07-01 09:50:29 -04:00
Jose M Monsalve Diaz	616dd9ae14	[OpenMP] Implementing omp_get_device_num() This patch implements omp_get_device_num() in the host and the device. It uses the already existing getDeviceNum in the device config for the device. And in the host it uses the omp_get_num_devices(). Two simple tests added Differential Revision: https://reviews.llvm.org/D128347	2022-06-29 02:18:21 -05:00
Shilei Tian	2695e23ad9	[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122764	2022-06-28 15:32:03 -04:00
Joseph Huber	3351ae61d9	[Libomptarget] Remove duplicate data environment exit Summary: This patch removes a duplicated exit from the OpenMP data environment. We already have an RAII method that guards this environment so it is unnecessary.	2022-06-21 22:35:32 -04:00
Ye Luo	4d9499e8cc	[libomptarget] Make libomptarget.devicertl.a built in all cases. Make libomptarget.device.a built when using -DLLVM_ENABLE_PROJECTS=openmp Use add_custom_command. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D128130	2022-06-20 08:29:16 -05:00
Ye Luo	54b45afb59	[libomptarget]Add a trap for external omptarget from LLVM Old LLVM installation may expose its internal omptarget CMake target when being used by find_package(LLVM) and caused issues in the CMake of libomptarget that is being built. Trap the issue early. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D128129	2022-06-18 21:08:53 -05:00
Joseph Huber	d87ca519c9	[Libomptarget] Use binutils archive executable to address failing tests Summary: The static linking test ensures that we can statically link offloading programs. To create the test we used `llvm-ar`. However, this may not exist in the user's environment. This patch changes it to use the binutils `ar` which should exist on every system running these tests currently. In the future we should set up the dependencies properly.	2022-06-14 22:14:17 -04:00
Joseph Huber	d5d836635c	[Libomptarget] Add test config for compiling in LTO-mode We are planning on making LTO the default compilation mode for offloading. In order to make sure it works we should run these tests on the test suite. AMDGPU already uses the LTO compilation path for its linking, but in LTO mode it also links the static library late. Performing LTO requires the static library to be built, if we make the change this will be a hard requirement and the old bitcode library will go away. This means users will need to use either a two-step build or a runtimes build for libomptarget. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127512	2022-06-14 10:16:03 -04:00
John Ericson	0bb317b7bf	Revert "[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore" This reverts commit `d5daa5c5b0`.	2022-06-10 19:26:12 +00:00
John Ericson	d5daa5c5b0	[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as `CMAKE_INSTALL_BINDIR` becomes an absolute path, and then when downstream projects try to install there too this breaks because our builds always install to fresh directories for isolation's sake. Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the other specially crafted `LLVM_CONFIG_*` variables substituted in `llvm/cmake/modules/LLVMConfig.cmake.in`. @beanz added it in `d0e1c2a550` to fix a dangling reference in `AddLLVM`, but I am suspicious of how this variable doesn't follow the pattern. Those other ones are carefully made to be build-time vs install-time variables depending on which `LLVMConfig.cmake` is being generated, are carefully made relative as appropriate, etc. etc. For my NixOS use-case they are also fine because they are never used as downstream install variables, only for reading not writing. To avoid the problems I face, and restore symmetry, I deleted the exported variable and arranged to have many `${project}_TOOLS_INSTALL_DIR`s. `AddLLVM` now instead expects each project to define its own, and they do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports `LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in the usual way, matching the other remaining exported variables. For the `AddLLVM` changes, I tried to copy the existing pattern of internal vs non-internal or for LLVM vs for downstream function/macro names, but it would be good to confirm I did that correctly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117977	2022-06-10 14:35:18 +00:00
Jose Manuel Monsalve Diaz	15ed5c0a07	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max 
Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP 
Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz	84e020a061	Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info" This reverts commit `d16a0877d8`.	2022-06-09 10:46:03 +00:00
Jose Manuel Monsalve Diaz	d16a0877d8	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max 
Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP 
Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-08 16:31:12 +00:00
Joseph Huber	86a4c78047	[Libomptarget] Add missing include to define `printf` Summary: This test was failing because of an implicit declaration of `printf` which isn't legal with newer C, causing it to fail. This patch just adds the necessary header.	2022-06-08 09:56:51 -04:00
Joseph Huber	421b1f55c6	[Libomptarget] Do not use retaining attributes for the static library When we build the libomptarget device runtime library targeting bitcode, we need special care to make sure that certain functions are not optimized out. This is because we manually internalize and optimize these definitions, ignoring their standard linkage semantics. When we build with the static library, we can maintain these semantics and we do not need these to be kept-alive. Furthermore, if they are kept-alive it prevents them from being removed during LTO. This prevents us from completely internalizing `IsSPMDMode` and removing several other functions. This patch removes these for the static library target by using a macro definition to enable them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D126701	2022-06-07 12:16:34 -04:00
Joseph Huber	f4f23de1a4	[Libomptarget] Add basic support for dynamic shared memory on AMDGPU This patch adds the arguments necessary to allocate the size of the dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable. This patch only allocates the memory; AMDGPU has a limitation that shared memory can only be accessed from the kernel directly. So this will currently only work with optimizations to inline the accessor function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D125252	2022-06-01 13:32:50 -04:00
Joseph Huber	ae76652677	Revert "[Libomptarget] Add `leaf` attribute to `vprintf` declaration" This is preventing users from calling `printf` on NVPTX code. Revert for now until there is a fix. This reverts commit `eda4ef3add`.	2022-05-31 10:24:04 -04:00
Joel E. Denny	d2e3cb7374	[OpenMP][Clang] Fix atomic compare for signed vs. unsigned Without this patch, arguments to the `llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed. Reviewed By: ABataev, tianshilei1992 Differential Revision: https://reviews.llvm.org/D126619	2022-05-30 11:02:20 -04:00
Joel E. Denny	48ca3a5ebb	[OpenMP] Extend omp teams to permit nested omp atomic OpenMP 5.2, sec. 10.2 "teams Construct", p. 232, L9-12 restricts what regions can be strictly nested within a `teams` construct. This patch relaxes Clang's enforcement of this restriction in the case of nested `atomic` constructs unless `-fno-openmp-extensions` is specified. Cases like the following then seem to work fine with no additional implementation changes: ``` #pragma omp target teams map(tofrom:x) #pragma omp atomic update x++; ``` Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D126323	2022-05-26 14:59:16 -04:00
Joseph Huber	20ec4161d7	[Libomptarget] Add branch prediction intrinsic to state check Summary: We usually used the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch prediction intrinsics to help the optimizer ignore unlikely loops. This wasn't applied to this one loop so add that in.	2022-05-20 15:38:54 -04:00
Joseph Huber	eda4ef3add	[Libomptarget] Add `leaf` attribute to `vprintf` declaration Summary: This patch adds the `leaf` attribute to the `vprintf` declaration in the OpenMP runtime. This attribute allows us to determine that the `vprintf` function will not call any functions within the translation unit, allowing us to deduce `norecurse` attributes on the caller.	2022-05-19 14:22:53 -04:00
Joseph Huber	5ffecd28c9	[Libomptarget] Don't build the device runtime without a new Clang The OpenMP device offloading library is a bitcode library and is thus only expected to be built and linked with the same version of clang that was used to create it. This somewhat complicates the building process as we require the Clang that was just built to be used to create the library. This is either done with a two-step build, where OpenMP is built with the Clang that was just installed, or through the `-DLLVM_ENABLE_RUNTIMES=openmp` option. This has always been the case, but recent changes have made it difficult to build the rest of OpenMP. This patch adds a check to not build the OpenMP device runtime if the current compiler is not Clang with the same version as the LLVM installation. This should allow users to build OpenMP as a project using any compiler without it erroring out due to the bitcode library, but if users require it they will need to use the above methods to compile it. Reviewed By: jdoerfert, tianshilei1992, ye-luo Differential Revision: https://reviews.llvm.org/D125698	2022-05-16 18:18:32 -04:00
Joseph Huber	54e02179b3	[Libomptarget] Build the static library without CUDA installed Summary: This patch allows users to compile the static library without CUDA installed on the system. This requires the new flag `--cuda-feature` to indicate that we need `+ptx61` in order to compile the runtime.	2022-05-13 16:30:58 -04:00
Joseph Huber	16b7a0b43b	[Libomptarget] Build the device runtime as a static library This patch adds the necessary CMake configuration to build a static library version of the device runtime, `libomptarget.devicertl.a`. Various improvements in how we handle static libraries and generating offloading code should allow us to treat the device library as a regular project without needing to invoke the clang front-end directly. Here we generate a job for each offloading architecture supported. Each offloading architecture will be embedded into the static library and used as-needed by the host. This library will primarily be used to replace the bitcode library when performing LTO. Currently, we need to manually pass in the bitcode library which requires foreknowledge of the offloading architecture. This approach lets us handle that in the linker wrapper instead. Furthermore this should improve our interface to the device runtime. We can now build it fully under a release build and have all the expected entry points, as well as supporting debug builds. Depends on D125265 D125256 D125260 D125314 D125563 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125315	2022-05-13 14:38:51 -04:00
Joseph Huber	9ffa945c40	[Libomptarget] Remove global include directory from libomptarget We used to globally include the libomptarget include directory for all projects. This caused some conflicts with the other files named "Debug.h". This patch changes the cmake to include these files via the target include instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125563	2022-05-13 14:38:47 -04:00
Joseph Huber	ce0caf41bd	[Libomptarget] Address existing warnings in the device runtime library This patch attempts to address the current warnings in the OpenMP offloading device runtime. Previously we did not see these because we compiled the runtime without the standard warning flags enabled. However, these warnings are used when we now build the static library version of this runtime. This became extremely noisy when coupled with the fact that we compile each file roughly 32 times when all the architectures are considered. So it would be ideal to not have all these warnings show up when building. Most of these warnings were simply implicit switch-case fallthroughs, which can be addressed using C++17's fallthrough attribute. Additionally there was a volatile variable that was being cast away. This is most likely safe to remove because we cast it away before it's even used and it didn't seem to affect anything in testing. Depends on D125260 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125339	2022-05-13 14:38:31 -04:00
Joseph Huber	b4f8443d97	[Libomptarget] Allow the device runtime to be compiled for the host Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D125260	2022-05-13 14:38:27 -04:00

957 Commits