llvm-project

Author	SHA1	Message	Date
Jose M Monsalve Diaz	616dd9ae14	[OpenMP] Implementing omp_get_device_num() This patch implements omp_get_device_num() in the host and the device. It uses the already existing getDeviceNum in the device config for the device. And in the host it uses the omp_get_num_devices(). Two simple tests added Differential Revision: https://reviews.llvm.org/D128347	2022-06-29 02:18:21 -05:00
Shilei Tian	2695e23ad9	[OpenMP][CUDA] Fix the issue that P2P memcpy doesn't work This patch fixes the issue that P2P memcpy doesn't work. The root cause is we didn't set current context when calling the API function. In addition, a matrix to track the states of each pair of devices is also added such that we only need to query and configure the device once. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D122764	2022-06-28 15:32:03 -04:00
Daniel Douglas	d4a7b8de52	[OpenMP][libomp] avoid spin wait and yield on arm64 macOS This patch changes the default behavior to avoid spin waiting and yielding. (See “Don’t Keep Threads Active And Idle” section here: https://developer.apple.com/documentation/apple-silicon/tuning-your-code-s-performance-for-apple-silicon) We verified using instruments traces that the changes improve scheduling behavior on macOS. We also collected results using EPCC schedbench (https://github.com/LangdalP/EPCC-OpenMP-micro-benchmarks) that are attached here that show a reduction in standard deviation and max test run time across all scheduling types. Static scheduling sees dramatic improvements with these changes, we see a 2-4x average runtime improvement in the benchmark. Differential Revision: https://reviews.llvm.org/D126510	2022-06-24 12:02:16 -05:00
Jonathan Peyton	b7b4986576	[OpenMP][libomp] Hold old __kmp_threads arrays until library shutdown When many nested teams are formed, __kmp_threads may be reallocated to accommodate new threads. This reallocation causes a data race when another existing team's thread simultaneously references __kmp_threads. This patch keeps the old thread arrays around until library shutdown so these lingering references can complete without issue and access to __kmp_threads remains a simple array reference. Fixes: https://github.com/llvm/llvm-project/issues/54708 Differential Revision: https://reviews.llvm.org/D125013	2022-06-22 10:30:35 -05:00
Joseph Huber	3351ae61d9	[Libomptarget] Remove duplicate data environment exit Summary: This patch removes a duplicated exit from the OpenMP data environment. We already have an RAII method that guards this environment so it is unnecessary.	2022-06-21 22:35:32 -04:00
Ye Luo	4d9499e8cc	[libomptarget] Make libomptarget.devicertl.a built in all cases. Make libomptarget.device.a built when using -DLLVM_ENABLE_PROJECTS=openmp Use add_custom_command. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D128130	2022-06-20 08:29:16 -05:00
Ye Luo	54b45afb59	[libomptarget] Add a trap for external omptarget from LLVM An old LLVM installation may expose its internal omptarget CMake target when being used by find_package(LLVM) and cause issues in the CMake of the libomptarget that is being built. Trap the issue early. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D128129	2022-06-18 21:08:53 -05:00
Joseph Huber	d87ca519c9	[Libomptarget] Use binutils archive executable to address failing tests Summary: The static linking test ensures that we can statically link offloading programs. To create the test we used `llvm-ar`. However, this may not exist in the user's environment. This patch changes it to use the binutils `ar` which should exist on every system running these tests currently. In the future we should set up the dependencies properly.	2022-06-14 22:14:17 -04:00
Joseph Huber	d5d836635c	[Libomptarget] Add test config for compiling in LTO-mode We are planning on making LTO the default compilation mode for offloading. In order to make sure it works we should run these tests on the test suite. AMDGPU already uses the LTO compilation path for its linking, but in LTO mode it also links the static library late. Performing LTO requires the static library to be built, if we make the change this will be a hard requirement and the old bitcode library will go away. This means users will need to use either a two-step build or a runtimes build for libomptarget. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D127512	2022-06-14 10:16:03 -04:00
John Ericson	0bb317b7bf	Revert "[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore" This reverts commit `d5daa5c5b0`.	2022-06-10 19:26:12 +00:00
John Ericson	d5daa5c5b0	[cmake] Don't export `LLVM_TOOLS_INSTALL_DIR` anymore First of all, `LLVM_TOOLS_INSTALL_DIR` put there breaks our NixOS builds, because `LLVM_TOOLS_INSTALL_DIR` defined the same as `CMAKE_INSTALL_BINDIR` becomes an absolute path, and then when downstream projects try to install there too this breaks because our builds always install to fresh directories for isolation's sake. Second of all, note that `LLVM_TOOLS_INSTALL_DIR` stands out against the other specially crafted `LLVM_CONFIG_*` variables substituted in `llvm/cmake/modules/LLVMConfig.cmake.in`. @beanz added it in `d0e1c2a550` to fix a dangling reference in `AddLLVM`, but I am suspicious of how this variable doesn't follow the pattern. Those other ones are carefully made to be build-time vs install-time variables depending on which `LLVMConfig.cmake` is being generated, are carefully made relative as appropriate, etc. etc. For my NixOS use-case they are also fine because they are never used as downstream install variables, only for reading not writing. To avoid the problems I face, and restore symmetry, I deleted the exported and arranged to have many `${project}_TOOLS_INSTALL_DIR`s. `AddLLVM` now instead expects each project to define its own, and they do so based on `CMAKE_INSTALL_BINDIR`. `LLVMConfig` still exports `LLVM_TOOLS_BINARY_DIR` which is the location for the tools defined in the usual way, matching the other remaining exported variables. For the `AddLLVM` changes, I tried to copy the existing pattern of internal vs non-internal or for LLVM vs for downstream function/macro names, but it would be good to confirm I did that correctly. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D117977	2022-06-10 14:35:18 +00:00
Yuki Okushi	074f12e467	[OpenMP] Fix the build on Windows The code expanded from kmp_barrier.h uses some `KMP_INTERNAL_*`s, so the definitions have to be placed before it. Fixes #55815 Differential Revision: https://reviews.llvm.org/D126873	2022-06-09 22:12:42 +09:00
Jose Manuel Monsalve Diaz	15ed5c0a07	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-09 11:58:39 +00:00
Jose Manuel Monsalve Diaz	84e020a061	Revert "[LIBOMPTARGET] Adding AMD to llvm-omp-device-info" This reverts commit `d16a0877d8`.	2022-06-09 10:46:03 +00:00
Jose Manuel Monsalve Diaz	d16a0877d8	[LIBOMPTARGET] Adding AMD to llvm-omp-device-info Adding device information print for AMD devices on the `llvm-omp-device-info` command line tool. The output is inspired by the rocminfo command line tool. This commit adds missing HSA functions, enums and structs needed to query additional information from the HSA agents. A generic message for the `generic-elf-64bit` plugin is also added Example of an output: ``` llvm-omp-device-info Device (0): This is a generic-elf-64bit device Device (1): This is a generic-elf-64bit device Device (2): This is a generic-elf-64bit device Device (3): This is a generic-elf-64bit device Device (4): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 0 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (5): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 1 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (6): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 2 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE Device (7): HSA Runtime Version: 1.1 HSA OpenMP Device Number: 3 Device Name: gfx906 Vendor Name: AMD Device Type: GPU Max Queues: 128 Queue Min Size: 64 Queue Max Size: 131072 Cache: L0: 16384 bytes L1: 8388608 bytes Cacheline Size: 64 Max Clock Freq(MHz): 1725 Compute Units: 60 SIMD per CU: 4 Fast F16 Operation: TRUE Wavefront Size: 64 Workgroup Max Size: 1024 Workgroup Max Size per Dimension: x: 1024 y: 1024 z: 1024 Max Waves Per CU: 40 Max Work-item Per CU: 2560 Grid Max Size: 4294967295 Grid Max Size per Dimension: x: 4294967295 y: 4294967295 z: 4294967295 Max fbarriers/Workgrp: 32 Memory Pools: Pool GLOBAL; FLAGS: COARSE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GLOBAL; FLAGS: FINE GRAINED, : Size: 34342961152 bytes Allocatable: TRUE Runtime Alloc Granule: 4096 bytes Runtime Alloc alignment: 4096 bytes Accessable by all: FALSE Pool GROUP: Size: 65536 bytes Allocatable: FALSE Runtime Alloc Granule: 0 bytes Runtime Alloc alignment: 0 bytes Accessable by all: FALSE ``` Differential Revision: https://reviews.llvm.org/D126836	2022-06-08 16:31:12 +00:00
Joseph Huber	86a4c78047	[Libomptarget] Add missing include to define `printf` Summary: This test was failing because of an implicit declaration of `printf` which isn't legal with newer C, causing it to fail. This patch just adds the necessary header.	2022-06-08 09:56:51 -04:00
Joseph Huber	421b1f55c6	[Libomptarget] Do not use retaining attributes for the static library When we build the libomptarget device runtime library targeting bitcode, we need special care to make sure that certain functions are not optimized out. This is because we manually internalize and optimize these definitions, ignoring their standard linkage semantics. When we build with the static library, we can maintain these semantics and we do not need these to be kept-alive. Furthermore, if they are kept-alive it prevents them from being removed during LTO. This prevents us from completely internalizing `IsSPMDMode` and removing several other functions. This patch removes these for the static library target by using a macro definition to enable them. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D126701	2022-06-07 12:16:34 -04:00
Vadim Paretsky	f58fe2e186	[OpenMP] allow loc to be NULL in __kmp_determine_reduction_method for MSVC MSVC may not supply source location information to kmpc_reduce passing NULL for the value. The patch adds a check for the loc value being NULL in kmp_determine_reduction_method. Differential Revision: https://reviews.llvm.org/D126564	2022-06-03 14:11:39 -05:00
Daniel Douglas	5d25dbff67	[OpenMP][libomp] do not try to dlopen libmemkind on macOS The memkind library is only available for linux. Calling dlopen here can also be problematic in a client app that fork'ed. Differential Revision: https://reviews.llvm.org/D126579	2022-06-02 14:28:09 -05:00
David CARLIER	2ba5d820e2	[OpenMP] omp_get_proc_id uses sched_getcpu fallback on FreeBSD 13.1 and above. Reviewers: jlpeyton, jdoerfert Reviewed-By: jlpeyton Differential-Revision: https://reviews.llvm.org/D126408	2022-06-02 17:10:29 +01:00
Mikael Simberg	e27ce28139	[OpenMP][libomp] Make LIBOMP_CONFIGURED_LIBFLAGS a list instead of string When configuring llvm with the openmp subproject, the build for the omp target fails if LIBOMP_CONFIGURED_LIBFLAGS contains more than one item. LIBOMP_CONFIGURED_LIBFLAGS should be a semicolon-separated list instead of a string with items separated by spaces. Differential Revision: https://reviews.llvm.org/D125370	2022-06-02 10:50:21 -05:00
Joseph Huber	f4f23de1a4	[Libomptarget] Add basic support for dynamic shared memory on AMDGPU This patch adds the arguments necessary to allocate the size of the dynamic shared memory via the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable. This patch only allocates the memory; AMDGPU has a limitation that shared memory can only be accessed from the kernel directly. So this will currently only work with optimizations to inline the accessor function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D125252	2022-06-01 13:32:50 -04:00
Joseph Huber	ae76652677	Revert "[Libomptarget] Add `leaf` attribute to `vprintf` declaration" This is preventing users from calling `printf` on NVPTX code. Revert for now until there is a fix. This reverts commit `eda4ef3add`.	2022-05-31 10:24:04 -04:00
Joel E. Denny	d2e3cb7374	[OpenMP][Clang] Fix atomic compare for signed vs. unsigned Without this patch, arguments to the `llvm::OpenMPIRBuilder::AtomicOpValue` initializer are reversed. Reviewed By: ABataev, tianshilei1992 Differential Revision: https://reviews.llvm.org/D126619	2022-05-30 11:02:20 -04:00
Joel E. Denny	4a36813669	[OpenACC][OpenMP] Document atomic-in-teams extension That is, put D126323 in the status doc and explain its relationship to OpenACC support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D126547	2022-05-27 18:53:19 -04:00
Joel E. Denny	48ca3a5ebb	[OpenMP] Extend omp teams to permit nested omp atomic OpenMP 5.2, sec. 10.2 "teams Construct", p. 232, L9-12 restricts what regions can be strictly nested within a `teams` construct. This patch relaxes Clang's enforcement of this restriction in the case of nested `atomic` constructs unless `-fno-openmp-extensions` is specified. Cases like the following then seem to work fine with no additional implementation changes: ``` #pragma omp target teams map(tofrom:x) #pragma omp atomic update x++; ``` Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D126323	2022-05-26 14:59:16 -04:00
Joseph Huber	20ec4161d7	[Libomptarget] Add branch prediction intrinsic to state check Summary: We usually used the `OMP_LIKELY` and `OMP_UNLIKELY` macros to add branch prediction intrinsics to help the optimizer ignore unlikely loops. This wasn't applied to this one loop so add that in.	2022-05-20 15:38:54 -04:00
Jonathan Peyton	f613e6d19d	[OpenMP][libomp] Fix accidental removal of else for core attributes	2022-05-19 14:00:27 -05:00
Joseph Huber	eda4ef3add	[Libomptarget] Add `leaf` attribute to `vprintf` declaration Summary: This patch adds the `leaf` attribute to the `vprintf` declaration in the OpenMP runtime. This attribute allows us to determine that the `vprintf` function will not call any functions within the translation unit, allowing us to deduce `norecurse` attributes on the caller.	2022-05-19 14:22:53 -04:00
AndreyChurbanov	c44ba01de7	[OpenMP] libomp: honor passive wait policy requested with tasking Currently the library ignores the requested wait policy in the presence of tasking. Threads always actively spin. The patch fixes this problem, making the wait policy passive if this is explicitly requested by the user. Differential Revision: https://reviews.llvm.org/D123044	2022-05-18 10:04:30 -05:00
Joseph Huber	5ffecd28c9	[Libomptarget] Don't build the device runtime without a new Clang The OpenMP device offloading library is a bitcode library and thus is only expected to be built and linked with the same version of clang that was used to create it. This somewhat complicates the building process as we require the Clang that was just built to be used to create the library. This is either done with a two-step build, where OpenMP is built with the Clang that was just installed, or through the `-DLLVM_ENABLE_RUNTIMES=openmp` option. This has always been the case, but recent changes have made it difficult to build the rest of OpenMP. This patch adds a check to not build the OpenMP device runtime if the current compiler is not Clang with the same version as the LLVM installation. This should allow users to build OpenMP as a project using any compiler without it erroring out due to the bitcode library, but if users require it they will need to use the above methods to compile it. Reviewed By: jdoerfert, tianshilei1992, ye-luo Differential Revision: https://reviews.llvm.org/D125698	2022-05-16 18:18:32 -04:00
Joseph Huber	54e02179b3	[Libomptarget] Build the static library without CUDA installed Summary: This patch allows users to compile the static library without CUDA installed on the system. This requires the new flag `--cuda-feature` to indicate that we need `+ptx61` in order to compile the runtime.	2022-05-13 16:30:58 -04:00
Joseph Huber	16b7a0b43b	[Libomptarget] Build the device runtime as a static library This patch adds the necessary CMake configuration to build a static library version of the device runtime, `libomptarget.devicertl.a`. Various improvements in how we handle static libraries and generating offloading code should allow us to treat the device library as a regular project without needing to invoke the clang front-end directly. Here we generate a job for each offloading architecture supported. Each offloading architecture will be embedded into the static library and used as-needed by the host. This library will primarily be used to replace the bitcode library when performing LTO. Currently, we need to manually pass in the bitcode library which requires foreknowledge of the offloading architecture. This approach lets us handle that in the linker wrapper instead. Furthermore this should improve our interface to the device runtime. We can now build it fully under a release build and have all the expected entry points, as well as supporting debug builds. Depends on D125265 D125256 D125260 D125314 D125563 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125315	2022-05-13 14:38:51 -04:00
Joseph Huber	9ffa945c40	[Libomptarget] Remove global include directory from libomptarget We used to globally include the libomptarget include directory for all projects. This caused some conflicts with the other files named "Debug.h". This patch changes the cmake to include these files via the target include instead. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D125563	2022-05-13 14:38:47 -04:00
Joseph Huber	ce0caf41bd	[Libomptarget] Address existing warnings in the device runtime library This patch attempts to address the current warnings in the OpenMP offloading device runtime. Previously we did not see these because we compiled the runtime without the standard warning flags enabled. However, these warnings are used when we now build the static library version of this runtime. This became extremely noisy when coupled with the fact that we compile each file roughly 32 times when all the architectures are considered. So it would be ideal to not have all these warnings show up when building. Most of these errors were simply implicit switch-case fallthroughs, which can be addressed using C++17's fallthrough attribute. Additionally there was a volatile variable that was being cast away. This is most likely safe to remove because we cast it away before it's even used and didn't seem to affect anything in testing. Depends on D125260 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D125339	2022-05-13 14:38:31 -04:00
Joseph Huber	b4f8443d97	[Libomptarget] Allow the device runtime to be compiled for the host Currently the OpenMP offloading device runtime is only expected to be compiled for the specific architecture it's targeting. This is problematic if we want to make compiling the device runtime more general via the standard `clang` driver rather than invoking the clang front-end directly. This patch addresses this by primarily changing the declare type to `nohost` so the host will not contain any of this code. Additionally we forward declare the functions that are defined via variants, otherwise these would cause problems on the host. Reviewed By: jdoerfert, tianshilei1992 Differential Revision: https://reviews.llvm.org/D125260	2022-05-13 14:38:27 -04:00
serge-sans-paille	40d3a0ba4d	[openmp] Fix strict aliasing issue in cmpxchg routine Avoid warning under -fstrict-aliasing by using a call to memcpy to perform type punning. Differential Revision: https://reviews.llvm.org/D125467	2022-05-12 16:14:48 +02:00
AndreyChurbanov	52d0ef3c00	[OpenMP] libomp: Add itt notifications to sync dependent tasks. Intel Inspector uses itt notifications to analyze code execution, and it reports race conditions in dependent tasks. This patch fixes the issue notifying Inspector on tasks dependency synchronizations. Differential Revision: https://reviews.llvm.org/D123042	2022-05-05 11:30:59 -05:00
AndreyChurbanov	4a64bed216	[OpenMP] libomp: cleanup - remove duplicate check The identical check remains 20 lines above in the code. Differential Revision: https://reviews.llvm.org/D123046	2022-05-05 11:01:20 -05:00
AndreyChurbanov	eed0d85152	[OpenMP] libomp: cleanup dead code Differential Revision: https://reviews.llvm.org/D123047	2022-05-05 10:56:49 -05:00
Hansang Bae	7e23b46ab8	[OpenMP] Possible fix for sporadic test failure from loop_dispatch.c This patch tries to fix sporadic test failure after the change https://reviews.llvm.org/D122107. Made the test wait until every thread has at least one loop iteration. Differential Revision: https://reviews.llvm.org/D124812	2022-05-03 14:46:32 -05:00
Joseph Huber	5ad07ac400	[Libomptarget] Use entry name for global info Currently, globals on the device will have an infinite reference count and an unknown name when using `LIBOMPTARGET_INFO` to print the mapping table. We already store the name of the global in the offloading entry so we should be able to use it, although there will be no source location. To do this we need to create a valid `ident_t` string from a name only. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D124381	2022-04-25 09:56:43 -04:00
Ye Luo	8a880db519	[libomptarget] Make omp_target_is_present checks storage instead of zero length array. Consider that checking whether a pointer has been mapped can be achieved via omp_get_mapped_ptr. omp_target_is_present is more needed to check whether the storage being pointed to is mapped. This restores the old behavior of omp_target_is_present before D123093. Fixes https://github.com/llvm/llvm-project/issues/54899 Reviewed By: jdenny Differential Revision: https://reviews.llvm.org/D123891	2022-04-22 17:37:06 -05:00
Ye Luo	91ccd8248c	[Clang][OpenMP] libompd: get libomp hwloc includedir by target_link_libraries When hwloc is used and is installed outside of the default paths, the omp CMake target needs to provide the needed include path through the CMake target by adding it with target_include_directories, so libompd gets it as well when it defines its CMake target using target_link_libraries. As suggested in D122667 Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D123888	2022-04-22 17:33:41 -05:00
Joseph Huber	f557bb8733	[OpenMP][Docs] Remove usage of deprecated flag in documentation Summary: This documentation used the `-fopenmp-target-new-runtime` flag which is deprecated and has no effect. Remove it.	2022-04-21 18:50:25 -04:00
Atmn Patel	c44420e90d	[Libomptarget][remote] Add OpenMP linker flag to the plugin The remote offloading server and plugin rely on OpenMP, so this needs to be added as a linker flag. Without this, applications segfault. Differential Revision: https://reviews.llvm.org/D124200	2022-04-21 15:45:29 -04:00
Atmn Patel	489894f363	[Libomptarget][remote] Fix compile-time error This fixes a compile-time error recently introduced within the remote offloading plugin. This patch also removes some extra linker flags that are unnecessary, and adds an explicit abseil linker flag without which we occasionally get problems. Differential Revision: https://reviews.llvm.org/D119984	2022-04-19 16:46:01 -04:00
Joseph Huber	80787213ea	[Libomptarget] Fix test using old unsupported lit string Summary: One test had an old "unsupported" string that used the old `newDriver` string which was removed. This test should be updated to use the `oldDriver` one instead.	2022-04-18 23:08:12 -04:00
Joseph Huber	ae23be84cb	[OpenMP] Make the new offloading driver the default Previously an opt-in flag `-fopenmp-new-driver` was used to enable the new offloading driver. After passing tests for a few months it should be sufficiently mature to flip the switch and make it the default. The new offloading driver is now enabled if there is OpenMP and OpenMP offloading present and the new `-fno-openmp-new-driver` is not present. The new offloading driver has three main benefits over the old method: - Static library support - Device-side LTO - Unified clang driver stages Depends on D122683 Differential Revision: https://reviews.llvm.org/D122831	2022-04-18 15:05:09 -04:00
Joseph Huber	ba01306009	[Libomptarget] Fix LIBOMPTARGET_INFO test Summary: A patch added a new line to one of the info outputs without updating this test. This patch adds the new text to the existing test.	2022-04-18 14:09:02 -04:00
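
Commit 40d3a0ba4d above mentions using a call to memcpy to perform type punning so that `-fstrict-aliasing` does not warn. The block below is a minimal sketch of that general technique, not the libomp cmpxchg routine itself. The helper names `float_bits` and `bits_to_float` are hypothetical, and the example assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; they are not part of libomp.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Read the object representation of a float as a 32-bit integer.
// Copying bytes with memcpy is well defined, whereas dereferencing a
// reinterpret_cast'ed pointer would violate strict aliasing.
inline std::uint32_t float_bits(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// The inverse conversion, again done through a byte copy.
inline float bits_to_float(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}
```

Compilers typically lower a fixed-size memcpy like this to a plain register move, so the pattern usually costs nothing at run time.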
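
Commit ce0caf41bd above says most of the warnings were implicit switch-case fallthroughs that can be addressed with C++17's fallthrough attribute. The following sketch shows how that attribute marks an intentional fallthrough. The function `stages_from` and its meaning are hypothetical and are not taken from the device runtime.

```cpp
// Hypothetical example: count how many stages run when starting at `level`.
// Each case deliberately continues into the next one, and [[fallthrough]]
// tells the compiler (and the reader) that this is intended, which
// silences -Wimplicit-fallthrough.
inline int stages_from(int level) {
  int stages = 0;
  switch (level) {
  case 2:
    ++stages;
    [[fallthrough]];
  case 1:
    ++stages;
    [[fallthrough]];
  case 0:
    ++stages;
    break;
  default:
    // Unknown levels run no stages.
    break;
  }
  return stages;
}
```

The attribute must appear as its own statement immediately before the next `case` label. It requires `-std=c++17` or later.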
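
Commit 20ec4161d7 above refers to the `OMP_LIKELY` and `OMP_UNLIKELY` macros as a way to add branch prediction intrinsics. The log does not show how those macros are defined, so the sketch below uses hypothetical `SKETCH_LIKELY` and `SKETCH_UNLIKELY` macros built on the GCC/Clang builtin `__builtin_expect`. The function `count_nonzero` is likewise invented for illustration.

```cpp
// Hypothetical macros; the real OMP_LIKELY / OMP_UNLIKELY definitions are
// assumed to be similar but are not shown in the commit log.
// __builtin_expect is a GCC/Clang builtin that returns its first argument
// and hints which value the compiler should expect.
#define SKETCH_LIKELY(expr) __builtin_expect(static_cast<bool>(expr), true)
#define SKETCH_UNLIKELY(expr) __builtin_expect(static_cast<bool>(expr), false)

// Count non-zero entries, hinting that invalid input is rare and that
// most entries are non-zero.
inline int count_nonzero(const int *data, int n) {
  if (SKETCH_UNLIKELY(data == nullptr || n <= 0))
    return 0; // cold path
  int count = 0;
  for (int i = 0; i < n; ++i) {
    if (SKETCH_LIKELY(data[i] != 0))
      ++count;
  }
  return count;
}
```

The hints only affect code layout and optimization decisions; the result of the function is the same with or without them.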

2268 Commits