llvm-project


Author	SHA1	Message	Date
Terry Wilmarth	d8e4cb9121	[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Patch fixed for ITTNotify disabled builds and non-x86 builds Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com> Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com> Differential Revision: https://reviews.llvm.org/D103121	2021-07-29 14:09:26 -05:00
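The all-or-nothing rule above can be expressed as a quick pre-flight check. This is a minimal C++ sketch: `distBarrierConfigured` is a hypothetical helper, not a libomp API. It only inspects the three variables named in the commit and does not reproduce any validation libomp itself performs.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical pre-flight check (not a libomp API). It returns true only
// when all three barrier patterns request the two-level distributed
// barrier, because the commit states "dist" cannot be mixed with others.
bool distBarrierConfigured() {
  const char *const Vars[] = {"KMP_FORKJOIN_BARRIER_PATTERN",
                              "KMP_PLAIN_BARRIER_PATTERN",
                              "KMP_REDUCTION_BARRIER_PATTERN"};
  for (const char *Name : Vars) {
    const char *Value = std::getenv(Name);
    // Unset, or set to any other pattern: not a usable dist configuration.
    if (Value == nullptr || std::string(Value) != "dist,dist")
      return false;
  }
  return true;
}
```

In practice the three variables are simply exported before launching the program; the helper only makes the rule explicit, for example in a test harness.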
Joachim Protze	4acc2f29a2	[OpenMP][Tools][Tests][NFC] Address flaky archer tests Adding more concurrent threads significantly increases the chance that the data race can be observed during testing.	2021-07-29 17:56:44 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Uses the fix suggested by @zsrkmyn. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`. Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and therefore the corresponding device ID, is not necessarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` into its own `initRTLonce` function. 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime. 3. Create the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and print its information.
Example Output:
```
Device (0): print_device_info not implemented
Device (1): print_device_info not implemented
Device (2): print_device_info not implemented
Device (3): print_device_info not implemented
Device (4):
    CUDA Driver Version: 11000
    CUDA Device Number: 0
    Device Name: Quadro P1000
    Global Memory Size: 4236312576 bytes
    Number of Multiprocessors: 5
    Concurrent Copy and Execution: Yes
    Total Constant Memory: 65536 bytes
    Max Shared Memory per Block: 49152 bytes
    Registers per Block: 65536
    Warp Size: 32 Threads
    Maximum Threads per Block: 1024
    Maximum Block Dimensions: 1024, 1024, 64
    Maximum Grid Dimensions: 2147483647 x 65535 x 65535
    Maximum Memory Pitch: 2147483647 bytes
    Texture Alignment: 512 bytes
    Clock Rate: 1480500 kHz
    Execution Timeout: Yes
    Integrated Device: No
    Can Map Host Memory: Yes
    Compute Mode: DEFAULT
    Concurrent Kernels: Yes
    ECC Enabled: No
    Memory Clock Rate: 2505000 kHz
    Memory Bus Width: 128 bits
    L2 Cache Size: 1048576 bytes
    Max Threads Per SMP: 2048
    Async Engines: Yes (2)
    Unified Addressing: Yes
    Managed Memory: Yes
    Concurrent Managed Memory: Yes
    Preemption Supported: Yes
    Cooperative Launch: Yes
    Multi-Device Boars: No
    Compute Capabilities: 61
```
Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the OpenMP library directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into the `omptarget` interface. It returns false if the RTL does not implement the function. 3. Add `bool printDeviceInfo(RTLDevID)` to the `DeviceTy`. 4. Implement `__tgt_rtl_print_device_info` for CUDA, adding additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
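The optional entry point from this commit, together with the per-device loop of the `llvm-omp-device-info` tool, can be sketched as follows. `RTLInfoTy`, `DeviceTy`, and `printAllDevices` here are hypothetical, simplified stand-ins, not the actual omptarget definitions. A plugin that does not provide the hook yields `false`, which the tool reports as "not implemented".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical model of a plugin (RTL) table entry; the real omptarget
// structures differ. Only the optional-entry-point idea is illustrated.
struct RTLInfoTy {
  // Null when the plugin does not export __tgt_rtl_print_device_info.
  void (*print_device_info)(int32_t RTLDevID) = nullptr;
};

struct DeviceTy {
  RTLInfoTy *RTL;
  int32_t RTLDeviceID;

  // Mirrors the described semantics: false if the plugin lacks the hook.
  bool printDeviceInfo() const {
    if (RTL->print_device_info == nullptr)
      return false;
    RTL->print_device_info(RTLDeviceID);
    return true;
  }
};

// Tool-style loop over every initialized device (hypothetical helper).
// Returns how many devices actually printed their information.
int printAllDevices(const std::vector<DeviceTy> &Devices) {
  int Printed = 0;
  for (std::size_t I = 0; I < Devices.size(); ++I) {
    std::printf("Device (%zu): ", I);
    if (Devices[I].printDeviceInfo())
      ++Printed;
    else
      std::printf("print_device_info not implemented\n");
  }
  return Printed;
}
```

Keeping the hook optional means existing plugins keep loading unchanged; only the tool's output degrades for them.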
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
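The folding condition described above (every kernel carries the attribute and all values agree) can be sketched in plain C++. `KernelAttrs` and `foldableNumThreads` are hypothetical names; the actual OpenMPOpt code works on LLVM IR attributes through `AAFoldRuntimeCall`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical kernel descriptor: the NumThreads attribute may be absent.
struct KernelAttrs {
  std::optional<int> NumThreads;
};

// Returns the constant that a call such as
// __kmpc_get_hardware_num_threads_in_block could be folded to, or
// std::nullopt when any kernel lacks the attribute or the kernels disagree.
std::optional<int> foldableNumThreads(const std::vector<KernelAttrs> &Kernels) {
  std::optional<int> Common;
  for (const KernelAttrs &K : Kernels) {
    if (!K.NumThreads)
      return std::nullopt; // unknown value: the call must stay
    if (Common && *Common != *K.NumThreads)
      return std::nullopt; // kernels disagree: the call must stay
    Common = K.NumThreads;
  }
  return Common; // also nullopt when there are no kernels at all
}
```

The same check applies unchanged to `NumTeams` and `__kmpc_get_hardware_num_blocks`.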
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00
Joachim Protze	e32e1dae61	[OpenMP][Tests] Fix test compatibility. gcc and clang disagree on how the event handle needs to be handled. According to the OpenMP LC, gcc is right. Will open a clang bug report.	2021-07-28 00:08:32 +02:00
Joachim Protze	3c76e99291	[OpenMP] Fix deadlock for detachable task with child tasks This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066. For detachable tasks, the assumption that the proxy task cannot have remaining child tasks when the proxy completes no longer holds. Instead of incrementing/decrementing the incomplete task count, a high-order bit is flipped to mark and wait for the incomplete proxy task. Differential Revision: https://reviews.llvm.org/D101082	2021-07-28 00:01:35 +02:00
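A sketch of the counter trick, assuming the flag occupies bit 30 of a 32-bit atomic counter (the constant libomp actually uses may differ, and `TaskCounter` is hypothetical). Because the proxy's state lives in its own bit, the proxy can complete while child tasks are still counted, and a waiter proceeds only once the whole word is zero.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed flag bit for illustration; not necessarily libomp's constant.
constexpr int32_t kProxyIncompleteFlag = 0x40000000;

struct TaskCounter {
  std::atomic<int32_t> Incomplete{0};

  // Ordinary child tasks still use plain increments and decrements.
  void childCreated() { Incomplete.fetch_add(1); }
  void childFinished() { Incomplete.fetch_sub(1); }

  // The detached (proxy) task flips a high-order bit instead of +1/-1,
  // so its completion cannot be confused with the child count.
  void proxyStarted() { Incomplete.fetch_or(kProxyIncompleteFlag); }
  void proxyCompleted() { Incomplete.fetch_and(~kProxyIncompleteFlag); }

  bool proxyPending() const {
    return (Incomplete.load() & kProxyIncompleteFlag) != 0;
  }
  int32_t pendingChildren() const {
    return Incomplete.load() & ~kProxyIncompleteFlag;
  }
  // Waiting may end only when no children remain and the proxy is done.
  bool done() const { return Incomplete.load() == 0; }
};
```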
Vignesh Balasubramanian	23eced9ead	Convert the error to a warning for enabling OMPD on non-Linux platforms OMPD is enabled by default on Linux machines and disabled on others. However, if it is explicitly enabled on another platform, configuration throws an error and exits. This is reported in Bug: https://bugs.llvm.org/show_bug.cgi?id=51121 Instead of throwing an error, this patch disables OMPD support with a warning message, so configuration can continue. Reviewed By: @protze.joachim Differential Revision: https://reviews.llvm.org/D106682	2021-07-27 17:25:27 +05:30
Johannes Doerfert	67ab875ff5	[OpenMP] Prototype opt-in new GPU device RTL The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only built for nvptx and has "-new" in its name. The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime also avoids making users pay for things they do not use, especially wrt. memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources. The worksharing and reduction implementations have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time. All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols. Differential Revision: https://reviews.llvm.org/D106803	2021-07-27 00:56:05 -05:00
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration: based on the current `deviceRTLs`, its return value can be 0, 1, 2, etc., rather than values such as 0, 129, 130 that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
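The values quoted in the message suggest an encoding where bit 7 (128) marks an active parallel region and the low bits hold the nesting depth. A sketch under that assumption, with hypothetical helper names: a folded `__kmpc_parallel_level` result should correspond to the depth alone, not to the raw encoded byte.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding inferred from the values 0, 129, 130 in the message:
// bit 7 flags an active parallel region, the low 7 bits hold the depth.
constexpr uint8_t kActiveParallelFlag = 128;

// Depth only, which is what __kmpc_parallel_level reports (0, 1, 2, ...).
constexpr uint8_t nestingDepth(uint8_t Raw) {
  return static_cast<uint8_t>(Raw & (kActiveParallelFlag - 1));
}

// Activeness, which the folded value deliberately does not carry.
constexpr bool isActive(uint8_t Raw) {
  return (Raw & kActiveParallelFlag) != 0;
}
```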
Joseph Huber	dead50d442	[OpenMP][NFC] Fix a few typos in OpenMP documentation Summary: Fixes some typos in the OpenMP documentation.	2021-07-26 16:03:47 -04:00
Jon Chesterfield	2a613a7790	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-26 09:54:51 +01:00
Jon Chesterfield	93fe84d32f	[libomptarget][nfc] Squash unused variable warning Suppress only current warning on openmp-clang-x86_64-linux-debian Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106777	2021-07-26 09:54:31 +01:00
Jon Chesterfield	dd0b463dd9	[libomptarget][amdgpu] More robust handling of failure to init HSA If hsa_init fails, subsequent calls into hsa are not safe. Except for hsa_init, but we don't retry on failure. This patch: - deletes a print that called into hsa to ask why it can't call into hsa - drops a merge conflict block next to that print - reliably initializes number of devices to zero - skips the plugin destructor contents if the constructor failed to init hsa Tested by making hsa_init return error, and by forcing the dynamic library use which was then deleted from disk. Before this patch, both segv. After it, friendly message about offloading being unavailable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106774	2021-07-25 23:15:58 +01:00
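The failure handling described above can be sketched as a plugin-wide object that remembers whether initialization succeeded. `fakeHsaInit`, `fakeHsaShutDown`, and `PluginState` are hypothetical stand-ins so that the pattern runs without ROCm; they are not the plugin's code.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stubs standing in for hsa_init / hsa_shut_down.
static bool FakeHsaInitShouldFail = false;
static int FakeHsaShutdownCalls = 0;
int fakeHsaInit() { return FakeHsaInitShouldFail ? 1 : 0; } // 0 == success
void fakeHsaShutDown() { ++FakeHsaShutdownCalls; }

// Plugin-wide state: remember whether init succeeded, keep the device
// count at zero on failure, and skip teardown if init never happened.
struct PluginState {
  bool HsaInitialized = false;
  int NumberOfDevices = 0; // reliably zero unless init succeeds

  PluginState() {
    if (fakeHsaInit() != 0) {
      // Do not call back into HSA to ask why it failed.
      std::fprintf(stderr, "AMDGPU offloading is unavailable\n");
      return;
    }
    HsaInitialized = true;
    NumberOfDevices = 1; // a real plugin would query HSA for this
  }

  ~PluginState() {
    if (!HsaInitialized)
      return; // constructor failed: nothing to tear down
    fakeHsaShutDown();
  }
};
```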
Jon Chesterfield	e3251f2ec4	Revert "[libomptarget] Build amdgpu plugin without hsa" Inaccurate error handling around hsa_init This reverts commit `e30b3b23a4`.	2021-07-25 21:03:51 +01:00
Jon Chesterfield	e30b3b23a4	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-25 19:33:36 +01:00
Joachim Protze	c46ccb8538	[OpenMP][tests][NFC] Update test status for gcc 11 and 12 gcc 11 introduced support for depend clause, but the gomp interface of libomp does not yet handle the information. Also remove -fopenmp-version=50, which is no longer needed for clang, but not supported by gcc.	2021-07-25 18:56:36 +02:00
Shilei Tian	f1b8fa55d0	[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710	2021-07-25 10:38:27 -04:00
Ye Luo	4079037a3e	[OpenMP] always compile with c++14 instead of gnu++14 Fixes PR 51174. c++14 should be a more portable option than gnu++14. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106632	2021-07-23 17:29:08 -04:00
Shilei Tian	c2c43132f6	[OpenMP] Fix bug 50022 Bug 50022 [0] reports that target nowait fails in a certain case, which is added as a test in this patch. The root cause of the failure is that, when the second task is created, its parent's `td_incomplete_child_tasks` is not incremented because there is no parallel region here, so its team is serialized. Therefore, when the initial thread is waiting for its unfinished child tasks, it thinks there is only one, the first task, which is tracked because it is a hidden helper task. The second task will only be pushed to the queue when the first task is finished. However, when the first task finishes, it first decrements the counter of its parent, and then releases dependences. Once the counter is decremented, the thread moves on because its counter has reached zero, but the second task has not been executed at all. As a result, since in this case the main function finishes, `libomp` starts to be destroyed. When the second task is then pushed somewhere, some of the structures might already have been destroyed, and anything could happen. This patch simply moves `__kmp_release_deps` ahead of the decrement of the counter. In this way, we can make sure that the initial thread is aware of the existence of another task(s) so it will not move on. In addition, in order to tackle dependence chains starting with a hidden helper thread, when a hidden helper task is encountered, we force the task to release its dependences. Reference: [0] https://bugs.llvm.org/show_bug.cgi?id=50022 Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106519	2021-07-23 16:54:11 -04:00
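A single-threaded toy model of the ordering fix, assuming that releasing the successor's dependences makes it visible in the parent's incomplete-children counter. `Parent`, `finishTaskOld`, and `finishTaskFixed` are hypothetical; the real libomp paths (`__kmp_release_deps` and the counter decrement) are more involved. The model records every value a waiting thread could observe.

```cpp
#include <cassert>
#include <vector>

// Toy model (not libomp code): the parent counts incomplete children, and
// a dependent successor is counted only once its dependences are released.
struct Parent {
  int IncompleteChildren = 0;
  std::vector<int> Observed; // values a waiting thread could see
  void set(int V) {
    IncompleteChildren = V;
    Observed.push_back(V);
  }
};

// Original order: decrement for the finished task, then release deps.
void finishTaskOld(Parent &P) {
  P.set(P.IncompleteChildren - 1);
  P.set(P.IncompleteChildren + 1); // successor becomes visible too late
}

// Fixed order: release deps (successor counted) before decrementing.
void finishTaskFixed(Parent &P) {
  P.set(P.IncompleteChildren + 1);
  P.set(P.IncompleteChildren - 1);
}

// True if a waiter could have seen zero while work was still pending.
bool waiterCouldLeaveEarly(const Parent &P) {
  for (int V : P.Observed)
    if (V == 0)
      return true;
  return false;
}
```

With the old order the counter passes through zero, which is the window in which the initial thread moved on; with the fixed order it never does.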
Joseph Huber	e1dedecaa6	[Libomptarget] Add unroll flag to shared variables loop Unrolling this loop provides better performance in practice because it is executed on the device and is likely to be very small. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106692	2021-07-23 16:45:27 -04:00
Shilei Tian	18ce3d3f2c	[OpenMP][Offloading] Fix data race in data mapping by using two locks This patch tries to partially fix one of the two data race issues reported in [1] by introducing a per-entry mutex. Additional discussion can also be found in D104418, which will also be refined to fix another data race problem. Here is how it works. Like before, `DataMapMtx` is still being used for mapping table lookup and update. In any case, we will get a table entry. If we need to make a data transfer (update the data on the device), we need to lock the entry right before releasing `DataMapMtx`; the data transfer is issued after releasing `DataMapMtx`, and the entry is unlocked afterwards. This guarantees that: 1) issuing data movement is not in a critical region, so it will not affect performance too much, and will not affect other threads that don't touch the same entry; 2) if another thread accesses the same entry, the state of data movement is consistent (which requires that a thread must first get the update lock before getting data movement information). For a target that doesn't support async data transfer, issuing data movement is the data transfer itself. This two-lock design can potentially improve concurrency compared with the design that guards data movement with `DataMapMtx` as well. For a target that supports async data movement, we could simply attach the event between issuing the data movement and unlocking the entry. A thread that wants to get the event must first get the lock. This can also get rid of the busy wait until the event pointer is valid. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104555	2021-07-23 16:10:51 -04:00
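The lock hand-off described above can be sketched with standard C++ mutexes. `MapEntry`, `MappingTable`, and `updateDevice` are hypothetical; only `DataMapMtx` comes from the commit. The key step is acquiring the entry lock before releasing the table lock.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical mapping-table entry; the real libomptarget entry has more state.
struct MapEntry {
  std::mutex UpdateMtx;  // per-entry lock guarding data-movement state
  int TransferCount = 0; // stands in for "device copy is up to date"
};

struct MappingTable {
  std::mutex DataMapMtx; // guards lookup and insertion only
  std::map<std::uintptr_t, MapEntry> Entries;

  void updateDevice(std::uintptr_t HostPtr) {
    std::unique_lock<std::mutex> TableLock(DataMapMtx);
    MapEntry &Entry = Entries[HostPtr]; // std::map references stay valid
    // Take the entry lock *before* dropping the table lock, so no other
    // thread can observe the entry between lookup and transfer.
    std::unique_lock<std::mutex> EntryLock(Entry.UpdateMtx);
    TableLock.unlock(); // threads touching other entries may proceed now

    ++Entry.TransferCount; // placeholder for issuing the data transfer
  } // EntryLock is released here, after the transfer has been issued
};
```

This sketch never erases entries; a real mapping table must keep an entry alive while another thread holds or waits on its lock.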
Abhinav Gaba	f7c92995c0	[OpenMP] Fix CUDA plugin build after `3817ba13ae`. The build was broken on machines that don't have the CUDA SDK installed. See https://reviews.llvm.org/D106627 for the original discussion.	2021-07-23 16:50:00 +08:00
Johannes Doerfert	d12ee28e2e	[OpenMP] Simplify the ThreadStackTy for globalization fallback With D106496 we can make the globalization fallback stack much simpler and this version doesn't seem to experience the spurious failures and deadlocks we have seen before. Differential Revision: https://reviews.llvm.org/D106576	2021-07-22 23:57:46 -05:00
Joseph Huber	76c0c0ca86	[OpenMP][NFC] Fix formatting in CUDA plugin	2021-07-22 21:50:40 -04:00
Joseph Huber	3817ba13ae	[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin This patch adds support for two environment variables to configure the device. ``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory that can be allocated using malloc / free on the device. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106627	2021-07-22 21:40:02 -04:00
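A sketch of reading the two variables, assuming plain decimal byte counts (the commit does not describe the accepted syntax) and a hypothetical helper `readByteSizeEnv`. How the plugin applies the values is not stated here; with the CUDA driver API the natural call would be `cuCtxSetLimit` with `CU_LIMIT_STACK_SIZE` or `CU_LIMIT_MALLOC_HEAP_SIZE`, but that is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>

// Reads a byte count such as LIBOMPTARGET_STACK_SIZE or
// LIBOMPTARGET_HEAP_SIZE (hypothetical helper). Returns std::nullopt when
// the variable is unset or malformed, so the caller keeps the default.
std::optional<uint64_t> readByteSizeEnv(const char *Name) {
  const char *Value = std::getenv(Name);
  if (Value == nullptr || *Value == '\0')
    return std::nullopt;
  char *End = nullptr;
  unsigned long long Parsed = std::strtoull(Value, &End, 10);
  if (*End != '\0')
    return std::nullopt; // e.g. "64K" is rejected by this sketch
  return static_cast<uint64_t>(Parsed);
}
```

Note that `strtoull` also accepts leading whitespace and a sign, so a production parser may want to be stricter.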
Shilei Tian	ea452353c0	[OpenMP] Refined the logic to give a regular task from a hidden helper task In current implementation, if a regular task depends on a hidden helper task, and when the hidden helper task is releasing its dependences, it directly calls `__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns `TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden helper threads are assumed to only execute hidden helper tasks. This could cause problems because when calling `__kmp_omp_task`, the encountering gtid, which is not the real one of the thread, is passed. This patch uses `__kmp_give_task`, but because it is a static function, a new wrapper `__kmpc_give_task` is added. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106572	2021-07-22 19:21:29 -04:00
Jose M Monsalve Diaz	68d6278a6e	[OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock` These functions should follow the camel case convention. These are really easy to change and are needed for D106033. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106390	2021-07-22 18:17:49 -04:00
Jon Chesterfield	9e05c084e5	[libomptarget][amdgpu][nfc] Normalise license headers Reviewed By: gregrodgers, jdoerfert Differential Revision: https://reviews.llvm.org/D106581	2021-07-22 20:23:41 +01:00
Jon Chesterfield	14e34a83b0	[libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h AMDGPU can assume Elf64 so doesn't need to abstract over Elf32 Drop a few other unused headers at the same time. Now only llvm elf and libelf are used by the plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106579	2021-07-22 20:04:13 +01:00
Jon Chesterfield	1a96570621	[libomptarget][amdgpu] Implement dlopen of libhsa AMDGPU plugin equivalent of D95155, build without HSA installed locally Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used. This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode. Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget. Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106559	2021-07-22 16:54:10 +01:00
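The "dlopen of hsa and cached dlsym calls" approach can be sketched generically. `DynamicLib` and `CachedSymbol` are hypothetical names, and the usage check looks up `strlen` in the running program rather than in libhsa so that it works without ROCm installed. On older glibc versions, linking may require `-ldl`.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// Owns a dlopen handle (hypothetical wrapper, not the plugin's code).
class DynamicLib {
public:
  explicit DynamicLib(const char *Path) : Handle(dlopen(Path, RTLD_NOW)) {}
  ~DynamicLib() {
    if (Handle)
      dlclose(Handle);
  }
  DynamicLib(const DynamicLib &) = delete;
  DynamicLib &operator=(const DynamicLib &) = delete;

  bool loaded() const { return Handle != nullptr; }
  void *handle() const { return Handle; }

private:
  void *Handle;
};

// One cache slot per symbol: dlsym runs at most once, later calls reuse it.
template <typename Fn> struct CachedSymbol {
  const char *Name;
  Fn Ptr = nullptr;
  bool Tried = false;

  Fn get(const DynamicLib &Lib) {
    if (!Tried) {
      Tried = true;
      if (Lib.loaded())
        Ptr = reinterpret_cast<Fn>(dlsym(Lib.handle(), Name));
    }
    return Ptr; // null if the library or the symbol is missing
  }
};
```

A missing library simply yields null pointers, which lets the caller report "offloading unavailable" instead of failing at link time.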
Jon Chesterfield	6e9cd3e9f1	[libomptarget][nfc] Improve static assert message in dlwrap Revision of D102858. Raise the dlwrap arity argument to a template argument so the correct value is given in the error message, e.g. '2 == 1' instead of '2 == trait<>::nargs'.
Arity higher than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error: static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs' "Arity Error"
DLWRAP_INTERNAL(cuInit, 2);
^~~~~~~~~~~~~~~~~~~~~~~~~~
...
$/include/dlwrap.h:166:3: note: expanded from macro 'DLWRAP_COMMON'
  static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error"); \
```
After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to requirement '2UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested here
DLWRAP_INTERNAL(cuInit, 2);
```
Arity lower than it should be:
Before diff
```
$/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no matching function for call to 'dlwrap_cuInit'
  return dlwrap_cuInit(X);
         ^~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate function not viable: requires 0 arguments, but 1 was provided
DLWRAP_INTERNAL(cuInit, 0);
```
After diff
In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16:
```
$/include/dlwrap.h:131:3: error: static_assert failed due to requirement '0UL == 1UL' "Arity Error"
  static_assert(Requested == Required, "Arity Error");
  ^             ~~~~~~~~~~~~~~~~~~~~~
$/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested here
DLWRAP_INTERNAL(cuInit, 0);
```
Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106543	2021-07-22 15:24:20 +01:00
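The technique, lifting both arities into template arguments so the compiler's diagnostic prints the numbers, can be shown in a self-contained sketch. `trait` and `verboseAssert` follow the names in the message, but their bodies here, along with `CHECK_ARITY` and `fakeCuInit`, are illustrative assumptions rather than the contents of `dlwrap.h`.

```cpp
#include <cassert>
#include <cstddef>

// Counts the parameters of a function-pointer type.
template <typename F> struct trait;
template <typename R, typename... Args> struct trait<R (*)(Args...)> {
  static constexpr std::size_t nargs = sizeof...(Args);
};

// Because both values are template arguments, a failure is reported as
// an instantiation like verboseAssert<2, 1>, i.e. with concrete numbers.
template <std::size_t Requested, std::size_t Required>
constexpr bool verboseAssert() {
  static_assert(Requested == Required, "Arity Error");
  return true;
}

// Hypothetical wrapper check: declared arity versus the real signature.
#define CHECK_ARITY(SYMBOL, ARITY)                                             \
  static_assert(verboseAssert<ARITY, trait<decltype(&SYMBOL)>::nargs>(),      \
                "Arity Error")

// Hypothetical stand-in with the same shape as cuInit(unsigned int).
int fakeCuInit(unsigned int Flags) { return Flags == 0 ? 0 : 1; }
CHECK_ARITY(fakeCuInit, 1);
```

Changing the check to `CHECK_ARITY(fakeCuInit, 2)` should fail to compile, with the diagnostic naming the specialization `verboseAssert<2, 1>` (exact wording depends on the compiler).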
Joseph Huber	a158d3663f	[OpenMP] Fix warnings for uninitialized block counts Summary: Fixes some warnings about uninitialized block counts if the execution mode is not recognized. This shouldn't happen in practice because the execution mode is checked when it's read from the device.	2021-07-22 09:24:07 -04:00
Jon Chesterfield	dc1f6f8b92	[libomptarget][amdgpu][nfc] Drop dead signal pool setup This class is instantiated once in rtl.cpp before hsa_init is called. The hsa_signal_create call therefore fails leaving the pool empty. This signal pool is a legacy from ATMI where it was constructed after hsa_init. Moving the state into the rtl.cpp global class disabled the initial populating of the pool without noticeably changing performance. Just rechecked with a fix that allocates the signals after hsa_init and that also doesn't noticeably change performance. This patch therefore drops the initialisation. Only change from main is to drop a DEBUG_PRINT statement that would say the pool initial size is zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106515	2021-07-22 10:29:32 +01:00
Joseph Huber	4a66860424	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	1684012a47	[Libomptarget] Introduce new main thread ID runtime function This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old comparison into its own runtime function. The purpose of this is so we can fold this part independently, so when both this and `is_spmd_mode` are folded the final function will be folded as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106437	2021-07-21 21:18:14 -04:00
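Splitting the comparison helps folding, as this sketch shows. The formula for the generic-mode main thread (lane 0 of the last warp) is an assumption based on the older NVPTX device runtime, not something this commit states, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: in generic mode the main thread is lane 0 of the last warp.
constexpr uint32_t genericMainThreadId(uint32_t BlockSize, uint32_t WarpSize) {
  return (BlockSize - 1) & ~(WarpSize - 1);
}

// The split-out comparison, analogous to __kmpc_is_generic_main_thread_id.
constexpr bool isGenericMainThreadId(uint32_t Tid, uint32_t BlockSize,
                                     uint32_t WarpSize) {
  return Tid == genericMainThreadId(BlockSize, WarpSize);
}

// The combined query: once both the SPMD check and the id comparison fold
// to constants, this function folds as well.
constexpr bool isGenericMainThread(bool IsSPMD, uint32_t Tid,
                                   uint32_t BlockSize, uint32_t WarpSize) {
  return !IsSPMD && isGenericMainThreadId(Tid, BlockSize, WarpSize);
}
```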
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this, far too few blocks will be scheduled for a generic region, as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
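Passing the size to the free call lets the runtime use a simple LIFO (stack) allocator without storing a header per allocation, which is what the simplified globalization fallback stack in D106576 builds on. A minimal sketch with a hypothetical `SharedStack` class; the device runtime's real allocator differs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size LIFO arena. Because free receives the size of
// the paired allocation, no per-allocation header is needed: freeing just
// moves the top of the stack back by that (rounded) amount.
class SharedStack {
public:
  void *allocShared(std::size_t Bytes) {
    Bytes = roundUp(Bytes);
    if (Top + Bytes > sizeof(Storage))
      return nullptr; // a real runtime would fall back to another allocator
    void *Ptr = Storage + Top;
    Top += Bytes;
    return Ptr;
  }

  void freeShared(void *Ptr, std::size_t Bytes) {
    Bytes = roundUp(Bytes);
    // LIFO discipline: the freed block must be the most recent allocation.
    assert(static_cast<unsigned char *>(Ptr) + Bytes == Storage + Top);
    Top -= Bytes;
  }

  std::size_t used() const { return Top; }

private:
  static constexpr std::size_t Align = 8;
  static std::size_t roundUp(std::size_t N) {
    return (N + Align - 1) & ~(Align - 1);
  }

  alignas(Align) unsigned char Storage[1024];
  std::size_t Top = 0;
};
```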
Giorgis Georgakoudis	5a682d9b91	[OpenMP] Expose libomptarget function to get HW thread id The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106323	2021-07-21 10:26:04 -07:00
Jon Chesterfield	a733bbbd17	[libomptarget][amdgpu][nfc] Refactor #includes Create a hsa_api.h header that includes the ROCr headers in use Drop some unused headers and _cplusplus macros Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106455	2021-07-21 17:28:07 +01:00
Shilei Tian	55c65884a4	[OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`. `__kmpc_parallel_level` currently returns a 16-bit integer. This patch first changes the return type of the function to `uint8_t`, the same as the shared variable, and then corrects the function type that was updated in D105955. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106384	2021-07-20 15:45:43 -04:00
Shilei Tian	02dff78983	[NFC][OpenMP] Fix an issue that no CHECK in test cases This fixes the complaint from FileCheck. Reviewed By: abhinavgaba, jdoerfert Differential Revision: https://reviews.llvm.org/D106387	2021-07-20 15:39:18 -04:00
Joseph Huber	b917a1d713	[OpenMP] Change AMDGCN to AMDGPU in the CMake Module Summary: Change the name for targeting AMD offloading.	2021-07-20 12:52:53 -04:00
Joseph Huber	6242f9b966	[OpenMP][Documentation] Fix hyperlink location Fixes the documentation hyperlinks not showing the header. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106374	2021-07-20 12:42:32 -04:00
Tony Tye	038602139d	[NFC] Correct documentation error in OpenMP release ReleaseNotes Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106330	2021-07-20 02:04:43 +00:00
Shilei Tian	996baa58a4	[OpenMP] Fixed a segmentation fault when using taskloop and target nowait The synchronization of task loop misses hidden helper tasks, causing segmentation fault reported in https://bugs.llvm.org/show_bug.cgi?id=50002. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106220	2021-07-19 21:09:05 -04:00


1829 Commits