llvm-project

Commit Graph

Author	SHA1	Message	Date
George Rokos	3819aae6dd	[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.	2021-08-26 18:01:18 -07:00
Jon Chesterfield	3d85342982	[libomptarget][amdgpu][nfc] Rename variables, delete dead code	2021-08-26 19:58:38 +01:00
Jon Chesterfield	68ab93f4d7	[libomptarget][amdgpu][nfc] Rename source files	2021-08-26 18:29:44 +01:00
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind DP() macro Adjust the print output for gpu arch mismatch when the architectures match Convert an assert into graceful failure Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Peyton, Jonathan L	d39d3a327b	[OpenMP][test] fix omp_get_wtime.c test to be more accommodating The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel. Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst. Differential Revision: https://reviews.llvm.org/D108488	2021-08-23 08:13:42 -05:00
Vignesh Balasubramanian	589519b9ab	[OpenMP][OMPD]Code movement required for OMPD These changes don't come under OMPD guard as it is a movement of existing code to capture parallel behavior correctly. "Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_begin" should be placed at the correct execution point for the debugging tool to access proper handles/data. Without the below changes, in certain cases, debugging tool will pick the wrong parallel and task handle. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100366	2021-08-20 14:36:22 +05:30
Shilei Tian	1d8d43ae61	[OpenMP] Use `__kmpc_give_task` in `__kmp_push_task` when encountering a hidden helper task This patch replaces the current implementation, overwrites `gtid` and `thread`, with `__kmpc_give_task`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106977	2021-08-19 20:49:29 -04:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00
Jennifer Yu	c274b19866	Add implicit map for a list item appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently map clauses for all capture variables are added implicitly, but they are missing for list items that are array elements or array sections. The change is to add an implicit map clause for array elements used in a reduction clause. Skip adding the map clause if the expression is not mappable. Note: For linear and lastprivate, since only a variable name is accepted, the map has been added through capture variables. To do so: During the mappable checking, on error, ignore the diagnostic and skip adding the implicit map clause. The changes: 1> Add code to generate implicit map in ActOnOpenMPExecutableDirective, for omp 5.0 and up. 2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause: Use that to skip error as well as skip adding implicit map during the mappable checking. Note: there are only two places that need to check NoDiagnose. For the rest, either the check is for < omp 5.0 or the error is already generated for the reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	ad0f6e1d98	[openmp] Disable the tests that block CI for amdgpu and host offloading.	2021-08-19 20:43:30 +01:00
Jon Chesterfield	6c75ce1b8b	[libomptarget][nfc] Move lanemask_t type into target_impl.h	2021-08-19 18:50:03 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Jon Chesterfield	f420939b82	[libomptarget] Apply D106710 to amdgcn devicertl	2021-08-19 01:34:33 +01:00
Jon Chesterfield	c480792b6a	[libomptarget][nfc][devicertl] Delete unused enums	2021-08-19 00:14:34 +01:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependant typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	edb8acdc6e	[Libomptarget] Correctly default to Generic if exec_mode is not present Currently, the runtime returns an error when the `exec_mode` global is not present. The expected behaviour is that the region will default to Generic. Returning an error prevents global constructors from being called, because they do not contain execution mode globals. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108255	2021-08-18 11:24:28 -04:00
Martin Storsjö	f5616a981c	[OpenMP] Fix the usage of sscanf on MinGW KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT so we need to pass the sscanf_s specific parameters within a similar condition. Differential Revision: https://reviews.llvm.org/D108196	2021-08-17 21:36:09 +03:00
Peyton, Jonathan L	b4a1f441d9	[OpenMP] Add a few small fixes * Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange parsing Differential Revision: https://reviews.llvm.org/D105929	2021-08-16 10:02:49 -05:00
Peyton, Jonathan L	6eeb4c1f32	[OpenMP] Fix incorrect parameters to sscanf_s call On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have additional size parameter. This patch adds the size parameter in the one place where %c is used. Differential Revision: https://reviews.llvm.org/D105931	2021-08-16 09:59:21 -05:00
AndreyChurbanov	52cac541d4	[OpenMP] libomp: cleanup: minor fixes to silence static analyzer. Added couple more checks to silence KlocWork static code analyzer. Differential Revision: https://reviews.llvm.org/D107348	2021-08-16 13:39:23 +03:00
AndreyChurbanov	f94da67f49	[OpenMP][NFC] libomp: reduced timeouts in the test from 50 to 2 sec.	2021-08-11 17:58:52 +03:00
George Rokos	df06ec3057	[libomptarget][NFC] Fix compilation issue with GCC Removed redundant assignment from condition which causes gcc to emit the following error: error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]	2021-08-10 09:43:43 -07:00
Joel E. Denny	2ced1f338a	[OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember targetDataEnd and targetDataBegin compute CopyMember/copy differently, and I don't see why they should. This patch eliminates one of those differences by making a simplifying NFC change to targetDataEnd. The change is NFC as follows. The change only affects the case when `!UNIFIED_SHARED_MEMORY || HasCloseModifier`. In that case, the following points are always true: * The value of CopyMember is relevant later only if DelEntry = false. * DelEntry = false only if one of the following is true: * IsLast = false. In this case, it's always true that CopyMember = false = IsLast. * `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember = IsLast. * Thus, if CopyMember is relevant, CopyMember = IsLast. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D105990	2021-08-10 12:29:55 -04:00
Pirama Arumuga Nainar	49fabd9d76	[openmp] Do not use shared memory on Android Android provides ashmem/ASharedMemory support on newer releases, which we can use if requested by openmp users on Android. Also refactor the preprocessor check for using shared memory to kmp_config.h.cmake. Differential Revision: https://reviews.llvm.org/D107181	2021-08-09 09:41:32 -07:00
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Ye Luo	262289c103	[OpenMP] mark target task untied OpenMP specification Tasking Terminology target task :A mergeable and untied task that ... Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107686	2021-08-07 12:31:20 -04:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Shilei Tian	680c71b127	[OpenMP] Clean up for hidden helper task This patch makes some clean up for code of hidden helper task. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107008	2021-08-04 12:36:44 -04:00
Shilei Tian	9f5d6ea52e	[OpenMP] Fix performance regression reported in bug #51235 This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107121	2021-08-04 12:34:49 -04:00
Lechen Yu	3bc8ce5dd7	[openmp] Add OMPT initialization in libomptarget When loading libomptarget, the init function in libomptarget/src/rtl.cpp will search for the libomptarget_start_tool function using libdl. libomptarget_start_tool will pass those OMPT callbacks related to target constructs to libomptarget Differential Revision: https://reviews.llvm.org/D99803	2021-08-04 18:00:11 +02:00
AndreyChurbanov	8e29b4b323	[OpenMP] libomp: taskwait depend implementation fixed. Fix for https://bugs.llvm.org/show_bug.cgi?id=49723. Eliminated references from the task dependency hash to a node allocated on the stack, thus eliminating accesses to stale memory. So the node is now never freed. Uncommented the assertion which triggered when stale memory was accessed. Removed the unneeded ref count increment for the stack-allocated node. Differential Revision: https://reviews.llvm.org/D106705	2021-08-03 15:45:20 +03:00
Jon Chesterfield	567c8c7bfd	[libomptarget][nfc] Only set cuda-path for nvptx tests Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND from the invocation of non-nvptx test cases. Better signal to noise ratio on other architectures. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107074	2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz	5424ceeda0	[OpenMP] Fixing llvm-omp-device-info compilation with runtimes When using `-DLLVM_ENABLE_RUNTIMES` instead of `-DLLVM_ENABLE_PROJECTS` the `llvm-omp-device-info` tool is not compiled or installed. In general, no llvm tool would be built with runtimes, because the -DLLVM_BUILD_TOOLS flag is dropped when the runtimes build invokes cmake again. This patch is simple: it forwards the value of this flag to the runtime cmake command. I'm also removing an unnecessary comment in the compilation of the tool. Differential Revision: https://reviews.llvm.org/D107177	2021-07-30 13:09:08 -05:00
Shilei Tian	36d53af4a9	[OpenMP][Offloading] Remove task wait in nowait interfaces All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void depList, int32_t noAliasDepNum, void noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, that these four arguments are not emitted at all. As a consequence, the `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107164	2021-07-30 11:39:46 -04:00
AndreyChurbanov	8b81524c6d	[OpenMP][NFC] libomp: silence warnings on unused variables. Put declarations/definitions of unused variables under corresponding macros to silence clang build warnings. Differential Revision: https://reviews.llvm.org/D106608	2021-07-30 17:04:42 +03:00
Joachim Protze	4ffa1478fd	[libomptarget][amdcgn] Add build dependency for opt This patch should fix the build we observe when building LLVM from scratch. Differential Revision: https://reviews.llvm.org/D107156	2021-07-30 15:45:13 +02:00
Terry Wilmarth	d8e4cb9121	[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Patch fixed for ITTNotify disabled builds and non-x86 builds Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com> Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com> Differential Revision: https://reviews.llvm.org/D103121	2021-07-29 14:09:26 -05:00
Joachim Protze	4acc2f29a2	[OpenMP][Tools][Tests][NFC] Address flaky archer tests Adding more concurrent threads significantly increases the chance that the data race can be observed during testing.	2021-07-29 17:56:44 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Uses the fix suggested by @zsrkmyn. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo`. Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID, is not necessarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information.
Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00
Joachim Protze	e32e1dae61	[OpenMP][Tests] Fix test compatibility gcc and clang disagree in how the event handle needs to be handled. According to OpenMP LC, gcc is right. Will open clang bug report	2021-07-28 00:08:32 +02:00
Joachim Protze	3c76e99291	[OpenMP] Fix deadlock for detachable task with child tasks This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066. For detachable tasks, the assumption breaks that the proxy task cannot have remaining child tasks when the proxy completes. Instead of incrementing/decrementing the incomplete task count, a high-order bit is flipped to mark and wait for the incomplete proxy task. Differential Revision: https://reviews.llvm.org/D101082	2021-07-28 00:01:35 +02:00
Vignesh Balasubramanian	23eced9ead	Convert the error to warning for enabling OMPD in non-Linux platform OMPD is enabled by default on Linux machines and disabled on others. However, if explicitly enabled it throws an error and exit while configuring. It is mentioned in Bug: https://bugs.llvm.org/show_bug.cgi?id=51121 This patch, instead of throwing error, disables OMPD support with a warning message, so configuration can continue. Reviewed By: @protze.joachim Differential Revision: https://reviews.llvm.org/D106682	2021-07-27 17:25:27 +05:30
Johannes Doerfert	67ab875ff5	[OpenMP] Prototype opt-in new GPU device RTL The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only built for nvptx and has "-new" in its name. The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime also avoids making users pay for things they do not use, especially wrt. memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources. The worksharing and reduction implementations have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time. All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols. Differential Revision: https://reviews.llvm.org/D106803	2021-07-27 00:56:05 -05:00
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration, based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead of 0, 129, 130, etc. that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Joseph Huber	dead50d442	[OpenMP][NFC] Fix a few typos in OpenMP documentation Summary: Fixes some typos in the OpenMP documentation.	2021-07-26 16:03:47 -04:00
Jon Chesterfield	2a613a7790	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-26 09:54:51 +01:00
Jon Chesterfield	93fe84d32f	[libomptarget][nfc] Squash unused variable warning Suppress only current warning on openmp-clang-x86_64-linux-debian Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106777	2021-07-26 09:54:31 +01:00
Jon Chesterfield	dd0b463dd9	[libomptarget][amdgpu] More robust handling of failure to init HSA If hsa_init fails, subsequent calls into hsa are not safe. Except for hsa_init, but we don't retry on failure. This patch: - deletes a print that called into hsa to ask why it can't call into hsa - drops a merge conflict block next to that print - reliably initializes number of devices to zero - skips the plugin destructor contents if the constructor failed to init hsa Tested by making hsa_init return error, and by forcing the dynamic library use which was then deleted from disk. Before this patch, both segv. After it, friendly message about offloading being unavailable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106774	2021-07-25 23:15:58 +01:00
Jon Chesterfield	e3251f2ec4	Revert "[libomptarget] Build amdgpu plugin without hsa" Inaccurate error handling around hsa_init This reverts commit `e30b3b23a4`.	2021-07-25 21:03:51 +01:00
Jon Chesterfield	e30b3b23a4	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-25 19:33:36 +01:00
Joachim Protze	c46ccb8538	[OpenMP][tests][NFC] Update test status for gcc 11 and 12 gcc 11 introduced support for depend clause, but the gomp interface of libomp does not yet handle the information. Also remove -fopenmp-version=50, which is no longer needed for clang, but not supported by gcc.	2021-07-25 18:56:36 +02:00
Shilei Tian	f1b8fa55d0	[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710	2021-07-25 10:38:27 -04:00
Ye Luo	4079037a3e	[OpenMP] always compile with c++14 instead of gnu++14 Fixes PR 51174. c++14 should be a more portable option than gnu++14. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106632	2021-07-23 17:29:08 -04:00
Shilei Tian	c2c43132f6	[OpenMP] Fix bug 50022 Bug 50022 [0] reports target nowait fails in a certain case, which is added as a test in this patch. The root cause of the failure is, when the second task is created, its parent's `td_incomplete_child_tasks` will not be incremented because there is no parallel region here thus its team is serialized. Therefore, when the initial thread is waiting for its unfinished children tasks, it thinks there is only one, the first task, which is tracked because it is a hidden helper task. The second task will only be pushed to the queue when the first task is finished. However, when the first task finishes, it first decrements the counter of its parent, and then releases dependences. Once the counter is decremented, the thread will move on because its counter is reset, but actually, the second task has not been executed at all. As a result, since in this case the main function finishes, `libomp` starts to destroy itself. When the second task is pushed somewhere, some of the structures might already have been destroyed, and then anything could happen. This patch simply moves `__kmp_release_deps` ahead of the decrement of the counter. In this way, we can make sure that the initial thread is aware of the existence of another task(s) so it will not move on. In addition, in order to tackle a dependence chain starting with a hidden helper thread, when a hidden helper task is encountered, we force the task to release dependences. Reference: [0] https://bugs.llvm.org/show_bug.cgi?id=50022 Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106519	2021-07-23 16:54:11 -04:00
Joseph Huber	e1dedecaa6	[Libomptarget] Add unroll flag to shared variables loop Unrolling this loop provides better performance in practice because it is executed on the device and is likely to be very small. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106692	2021-07-23 16:45:27 -04:00
Shilei Tian	18ce3d3f2c	[OpenMP][Offloading] Fix data race in data mapping by using two locks This patch tries to partially fix one of the two data race issues reported in [1] by introducing a per-entry mutex. Additional discussion can also be found in D104418, which will also be refined to fix another data race problem. Here is how it works. Like before, `DataMapMtx` is still being used for mapping table lookup and update. In any case, we will get a table entry. If we need to make a data transfer (update the data on the device), we need to lock the entry right before releasing `DataMapMtx`, and the issue of data transfer should be after releasing `DataMapMtx`, and the entry is unlocked afterwards. This can guarantee that: 1) issue of data movement is not in critical region, which will not affect performance too much, and also will not affect other threads that don't touch the same entry; 2) if another thread accesses the same entry, the state of data movement is consistent (which requires that a thread must first get the update lock before getting data movement information). For a target that doesn't support async data transfer, issue of data movement is data transfer. This two-lock design can potentially improve concurrency compared with the design that guards data movement with `DataMapMtx` as well. For a target that supports async data movement, we could simply attach the event between the issue of data movement and unlock the entry. For a thread that wants to get the event, it must first get the lock. This can also get rid of the busy wait until the event pointer is valid. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104555	2021-07-23 16:10:51 -04:00
Abhinav Gaba	f7c92995c0	[OpenMP] Fix CUDA plugin build after `3817ba13ae`. The build was broken on machines that don't have Cuda SDK installed. See https://reviews.llvm.org/D106627 for the original discussion.	2021-07-23 16:50:00 +08:00
Johannes Doerfert	d12ee28e2e	[OpenMP] Simplify the ThreadStackTy for globalization fallback With D106496 we can make the globalization fallback stack much simpler and this version doesn't seem to experience the spurious failures and deadlocks we have seen before. Differential Revision: https://reviews.llvm.org/D106576	2021-07-22 23:57:46 -05:00
Joseph Huber	76c0c0ca86	[OpenMP][NFC] Fix formatting in CUDA plugin	2021-07-22 21:50:40 -04:00
Joseph Huber	3817ba13ae	[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin This patch adds support for two environment variables to configure the device. ``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory that can be allocated using malloc / free on the device. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106627	2021-07-22 21:40:02 -04:00
Shilei Tian	ea452353c0	[OpenMP] Refined the logic to give a regular task from a hidden helper task In current implementation, if a regular task depends on a hidden helper task, and when the hidden helper task is releasing its dependences, it directly calls `__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns `TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden helper threads are assumed to only execute hidden helper tasks. This could cause problems because when calling `__kmp_omp_task`, the encountering gtid, which is not the real one of the thread, is passed. This patch uses `__kmp_give_task`, but because it is a static function, a new wrapper `__kmpc_give_task` is added. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106572	2021-07-22 19:21:29 -04:00
Jose M Monsalve Diaz	68d6278a6e	[OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock` These functions should follow the camel case convention. These are really easy to change and are needed for D106033. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106390	2021-07-22 18:17:49 -04:00
Jon Chesterfield	9e05c084e5	[libomptarget][amdgpu][nfc] Normalise license headers Reviewed By: gregrodgers, jdoerfert Differential Revision: https://reviews.llvm.org/D106581	2021-07-22 20:23:41 +01:00
Jon Chesterfield	14e34a83b0	[libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h AMDGPU can assume Elf64 so doesn't need to abstract over Elf32 Drop a few other unused headers at the same time. Now only llvm elf and libelf are used by the plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106579	2021-07-22 20:04:13 +01:00
Jon Chesterfield	1a96570621	[libomptarget][amdgpu] Implement dlopen of libhsa AMDGPU plugin equivalent of D95155, build without HSA installed locally Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used. This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode. Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget. Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106559	2021-07-22 16:54:10 +01:00
Jon Chesterfield	6e9cd3e9f1	[libomptarget][nfc] Improve static assert message in dlwrap Revision of D102858. Raise dlwrap arity argument to template argument so the correct value is given in the error message. E.g. '2 == 1' instead of '2 == trait<>::nargs'. Arity higher than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error: static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs' "Arity Error" DLWRAP_INTERNAL(cuInit, 2); ^~~~~~~~~~~~~~~~~~~~~~~~~~ ... $/include/dlwrap.h:166:3: note: expanded from macro 'DLWRAP_COMMON' static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error"); \ ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '2UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 2); ``` Arity lower than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no matching function for call to 'dlwrap_cuInit' return dlwrap_cuInit(X); ^~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate function not viable: requires 0 arguments, but 1 was provided DLWRAP_INTERNAL(cuInit, 0); ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '0UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 0); ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106543	2021-07-22 15:24:20 +01:00
Joseph Huber	a158d3663f	[OpenMP] Fix warnings for uninitialized block counts Summary: Fixes some warnings given for uninitialized block counts if the execution mode is not recognized. This shouldn't happen in practice because the execution mode is checked when it's read from the device.	2021-07-22 09:24:07 -04:00
Jon Chesterfield	dc1f6f8b92	[libomptarget][amdgpu][nfc] Drop dead signal pool setup This class is instantiated once in rtl.cpp before hsa_init is called. The hsa_signal_create call therefore fails leaving the pool empty. This signal pool is a legacy from ATMI where it was constructed after hsa_init. Moving the state into the rtl.cpp global class disabled the initial populating of the pool without noticeably changing performance. Just rechecked with a fix that allocates the signals after hsa_init and that also doesn't noticeably change performance. This patch therefore drops the initialisation. Only change from main is to drop a DEBUG_PRINT statement that would say the pool initial size is zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106515	2021-07-22 10:29:32 +01:00
Joseph Huber	4a66860424	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	1684012a47	[Libomptarget] Introduce new main thread ID runtime function This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old comparison into its own runtime function. The purpose of this is so we can fold this part independently, so when both this and `is_spmd_mode` are folded the final function will be folded as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106437	2021-07-21 21:18:14 -04:00
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	5a682d9b91	[OpenMP] Expose libomptarget function to get HW thread id The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106323	2021-07-21 10:26:04 -07:00
Jon Chesterfield	a733bbbd17	[libomptarget][amdgpu][nfc] Refactor #includes Create a hsa_api.h header that includes the ROCr headers in use Drop some unused headers and _cplusplus macros Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106455	2021-07-21 17:28:07 +01:00
Shilei Tian	55c65884a4	[OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`. `__kmpc_parallel_level` currently returns a 16-bit integer. This patch first changes the return type of the function to `uint8_t`, same as the shared variable, and then corrects the function type which was updated in D105955. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106384	2021-07-20 15:45:43 -04:00
Shilei Tian	02dff78983	[NFC][OpenMP] Fix an issue that no CHECK in test cases This fixes the complaint from FileCheck. Reviewed By: abhinavgaba, jdoerfert Differential Revision: https://reviews.llvm.org/D106387	2021-07-20 15:39:18 -04:00
Joseph Huber	b917a1d713	[OpenMP] Change AMDGCN to AMDGPU in the Cmake Module Summary: Change the name for targeting AMD offloading.	2021-07-20 12:52:53 -04:00
Joseph Huber	6242f9b966	[OpenMP][Documentation] Fix hyperlink location Fixes the documentation hyperlinks not showing the header. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106374	2021-07-20 12:42:32 -04:00
Tony Tye	038602139d	[NFC] Correct documentation error in OpenMP release ReleaseNotes Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106330	2021-07-20 02:04:43 +00:00
Shilei Tian	996baa58a4	[OpenMP] Fixed a segmentation fault when using taskloop and target nowait The synchronization of task loop misses hidden helper tasks, causing segmentation fault reported in https://bugs.llvm.org/show_bug.cgi?id=50002. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D106220	2021-07-19 21:09:05 -04:00
Joseph Huber	762badb0ab	[Libomptarget] Remove volatile from NVPTX work function Currently the NVPTX work function is marked volatile. This prevents some optimizations from using this value. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106310	2021-07-19 20:03:25 -04:00
Giorgis Georgakoudis	fb0cf01795	Revert "[OpenMP] Codegen aggregate for outlined function captures" This reverts commit `e9c7291cb2`. Fix failing tests	2021-07-19 07:54:26 -07:00
Shilei Tian	4504e1134c	[OpenMP][CMake] Fix an issue when there is space in the argument LIBOMPTARGET_LIT_ARGS D106236 added a new CMake argument for `libomptarget` test, but when user's input contains white spaces, CMake will add escape char to the final lit command, which leads to an error. This patch converts the user's input `LIBOMPTARGET_LIT_ARGS` into a local array, and then passes the array to the function. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106247	2021-07-18 21:54:14 -04:00
Shilei Tian	954711ed8f	[OpenMP][Offloading] Add a CMake argument LIBOMPTARGET_LIT_ARGS to control behavior of libomptarget lit test By default, `lit` uses all threads to invoke tests, which can easily cause out-of-memory errors on GPUs because most OpenMP offloading tests take about 1GB of GPU memory, but a typical GPU only has 4-8GB of memory. This patch introduces a CMake argument `LIBOMPTARGET_LIT_ARGS` to allow users to control the behavior of `libomptarget` tests, similar to `LLVM_LIT_ARGS`. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106236	2021-07-18 13:16:10 -04:00
Shilei Tian	4357cfc792	[OpenMP][Offloading] Add -g when compiling deviceRTLs in debug mode Currently, when we compile the project in debug mode, `-g` is not added to the compilation flags. The bc files generated in different modes are of different sizes. When using GPU debuggers like `cuda-gdb`, a debug version of the bc lib is expected to provide more info. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106229	2021-07-18 09:34:54 -04:00
Giorgis Georgakoudis	e9c7291cb2	[OpenMP] Codegen aggregate for outlined function captures Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102107	2021-07-16 23:27:44 -07:00
Joseph Huber	1616407921	[OpenMP] Add remark documentation to the OpenMP webpage This patch begins adding documentation for each remark emitted by `openmp-opt`. This builds on the IDs introduced in D105939 so that users can more easily identify each remark in the webpage. Depends on D105939. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106018	2021-07-16 14:09:43 -04:00
Shilei Tian	97c8f60bba	[NFC][OpenMP][Offloading] Replaced explicit parallel level computation with function `__kmpc_parallel_level` There are two places in the current deviceRTLs that compute the parallel level explicitly, which is basically the functionality of `__kmpc_parallel_level`. Starting from D105787, we plan to introduce a series of function call foldings based on information that can be deduced at compile time. Computation of the parallel level is the next target. This patch prepares for that optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105955	2021-07-15 22:21:06 -04:00
George Rokos	0c7a4870c5	[libomptarget] Keep the Shadow Pointer Map up-to-date D105812 introduced a regression where if a PTR_AND_OBJ entry was mapped on the device, then the OBJ was deallocated and then reallocated at a different address, the Shadow Pointer Map would still contain an entry for the PTR but pointing to the old address. This caused test `env/base_ptr_ref_count.c` to fail. Differential Revision: https://reviews.llvm.org/D105947	2021-07-14 15:19:58 -07:00
Peyton, Jonathan L	424f14f0d2	[OpenMP] Fix one sign-compare warning from GCC	2021-07-13 12:36:12 -05:00
Peyton, Jonathan L	405eefe464	[OpenMP][NFC] Change comment style to eliminate warnings from GCC Standalone build for OpenMP runtime using GCC is giving -Wcomment warnings where a backslash newline is encountered in the // style comment. This switches the // style for /* style to silence the warnings.	2021-07-13 12:27:08 -05:00
Hansang Bae	db635a28e6	[OpenMP] Minor improvement in task allocation This patch includes a few changes to improve task allocation performance slightly. These changes are enough to restore performance drop observed after introducing hidden helper. Differential Revision: https://reviews.llvm.org/D105715	2021-07-13 09:07:14 -05:00
Roman Lebedev	4709d9d5be	[libomp] ompd_init(): fix heap-buffer-overflow when constructing libompd.so path There is no guarantee that the space allocated in `libname` is enough to accommodate the whole `dl_info.dli_fname`, because it could e.g. have a suffix - `.5`, and that highlights another problem - what should it do about suffixes, and should it do anything to resolve the symlinks before changing the filename? ``` $ LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib" ./src/utilities/rstest/rstest -c /tmp/f49137920.NEF dl_info.dli_fname "/usr/local/lib/libomp.so.5" strlen(dl_info.dli_fname) 26 lib_path_length 14 lib_path_length + 12 26 ================================================================= ==30949==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000002a at pc 0x000000548648 bp 0x7ffdfa0aa780 sp 0x7ffdfa0a9f40 WRITE of size 27 at 0x60300000002a thread T0 #0 0x548647 in strcpy (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) #1 0x7fb9e3e3d234 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:102:5 #2 0x7fb9e3dcb446 in __kmp_do_serial_initialize() /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:6742:3 #3 0x7fb9e3dcb40b in __kmp_get_global_thread_id_reg /repositories/llvm-project/openmp/runtime/src/kmp_runtime.cpp:251:7 #4 0x59e035 in main /home/lebedevri/rawspeed/build-Clang-SANITIZE/../src/utilities/rstest/rstest.cpp:491 #5 0x7fb9e3762d09 in __libc_start_main csu/../csu/libc-start.c:308:16 #6 0x4df449 in _start (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x4df449) 0x60300000002a is located 0 bytes to the right of 26-byte region [0x603000000010,0x60300000002a) allocated by thread T0 here: #0 0x55cc5d in malloc (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x55cc5d) #1 0x7fb9e3e3d224 in ompd_init() /repositories/llvm-project/openmp/runtime/src/ompd-specific.cpp:101:17 #2 0x7fb9e3762d09 in 
__libc_start_main csu/../csu/libc-start.c:308:16 SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/lebedevri/rawspeed/build-Clang-SANITIZE/src/utilities/rstest/rstest+0x548647) in strcpy Shadow bytes around the buggy address: 0x0c067fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0x0c067fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>0x0c067fff8000: fa fa 00 00 00[02]fa fa fa fa fa fa fa fa fa fa 0x0c067fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c067fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb ==30949==ABORTING Aborted ```	2021-07-13 15:36:46 +03:00
George Rokos	bb0166dc72	[libomptarget] Update device pointer only if needed Currently, libomptarget will always perform a host-to-device memory transfer in order to update the device pointer of a PTR_AND_OBJ entry. This is not always necessary because the device pointer may have been set to the correct pointee address already, so we can eliminate the redundant memory transfer.	2021-07-13 04:18:55 -07:00
Jon Chesterfield	b6b53ffef4	[libomptarget][devicertl] Remove branches around setting parallelLevel Simplifies control flow to allow store/load forwarding This change folds two basic blocks into one, leaving a single store to parallelLevel. This is a step towards spmd kernels with sufficiently aggressive inlining folding the loads from parallelLevel and thus discarding the nested parallel handling when it is unused. Transform: ``` int threadId = GetThreadIdInBlock(); if (threadId == 0) { parallelLevel[0] = expr; } else if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // because unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);} // so whenever threadId == 0, GetLaneId() is also 0. ``` That replaces a store in two distinct basic blocks with a single store. A more aggressive follow up is possible if the threads in the warp/wave race to write the same value to the same address. This is not done as part of this change. ``` if (GetLaneId() == 0) { parallelLevel[GetWarpId()] = expr; } // => parallelLevel[GetWarpId()] = expr; // because unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; } // so GetWarpId will index the same element for every thread in the warp // and, because expr is lane-invariant in this case, every lane stores the // same value to this unique address ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D105699	2021-07-13 12:06:57 +01:00
Joachim Protze	681055ea69	[OpenMP] Remove TSAN annotations from libomp The annotations in libomp were never built by default. The annotations are also superseded by the annotations which the OMPT tool libarcher.so provides. With respect to libarcher, libomp behaves as if libarcher were the last element of OMP_TOOL_LIBRARIES. I.e., if no other OMPT tool becomes active, libarcher will check if an OpenMP application is built with TSan. Since libarcher gets loaded by default, enabling LIBOMP_TSAN_SUPPORT would result in redundant annotations for TSan, which slightly differ in details and coverage (e.g. task dependencies are not handled well by the annotations in libomp). This patch removes all TSan annotations from the OpenMP runtime code. Differential Revision: https://reviews.llvm.org/D103767	2021-07-12 18:49:11 +02:00
Joachim Protze	fedbff75f4	[OpenMP][OMPT] Fix compile-time assertion in ompt-multiplex.h The compile-time assertion is supposed to prevent double-free caused by unexpected combination of preprocessor defines passed by an OMPT tool. The current defines are not used, so this patch replaces the check with macros actually used in ompt-multiplex.h Reported by: Semih Burak Differential Revision: https://reviews.llvm.org/D104633	2021-07-12 12:12:09 +02:00
Johannes Doerfert	a7b7b5dfe5	[OpenMP] Create and use `__kmpc_is_generic_main_thread` In order to fold calls based on high-level knowledge and control flow tracking it helps to expose the information as a runtime call. The logic: `!SPMD && getTID() == getMasterTID()` was used in various places and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of this rewrite we replaced eager computation of arguments with on-demand computation, especially helpful if the calls can be folded and arguments don't need to be computed consequently. Differential Revision: https://reviews.llvm.org/D105768	2021-07-11 19:18:03 -05:00
Johannes Doerfert	1ab1f04a2b	[OpenMP] Simplify variable sharing and increase shared memory size In order to avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM (=64) variables are communicated in dedicated shared memory instead. The simplification does avoid the need for an "init" and requires "deinit" only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM variables. Differential Revision: https://reviews.llvm.org/D105767	2021-07-11 19:18:03 -05:00
Johannes Doerfert	0a223827de	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 18:20:40 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	e603ca0306	[OpenMP] Remove checkXXXX device runtime functions We had multiple functions to determine the execution mode (SPMD/Generic) and runtime status (initialized/uninitialized) but that just increased complexity without a real benefit. Especially with D102307 in mind it is helpful to reduce the dependence on the `ident_t` flags. Differential Revision: https://reviews.llvm.org/D105586	2021-07-10 12:32:51 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Joel E. Denny	d99f65de2a	[OpenMP] Avoid checking parent reference count in targetDataBegin This patch is an attempt to do for `targetDataBegin` what D104924 does for `targetDataEnd`: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's host-to-device transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Additionally, it eliminates the `DeviceTy::getMapEntryRefCnt`, which is no longer used after this patch. While D104924 does not change the computation of `IsLast`, I found I needed to change the computation of `IsNew` for this patch. As far as I can tell, the change is correct, and this patch does not cause any additional `openmp` tests to fail. However, I'm not sure I've thought of all use cases. Please advise. Reviewed By: jdoerfert, jhuber6, protze.joachim, tianshilei1992, grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D105121	2021-07-10 12:15:04 -04:00
Joel E. Denny	1d0456361a	[OpenMP] Avoid checking parent reference count in targetDataEnd The patch has the following benefits: * Eliminates a lock/unlock of the data mapping table. * Clarifies the logic that determines whether a struct member's device-to-host transfer occurs. The old logic, which checks the parent struct's reference count, is a leftover from back when we had a different map interface (as pointed out at <https://reviews.llvm.org/D104924#2846972>). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104924	2021-07-10 12:15:04 -04:00
Alexey Bataev	ab8989ab87	[OPENMP]Fix overlapped mapping for dereferenced pointer members. If the base is used in a map clause and later we have a memberexpr with this base, and the member is a pointer, and this pointer is dereferenced anyhow (subscript, array section, dereference, etc.), such components should be considered as overlapped, otherwise it may lead to incorrect size computations, since we try to map a pointee as a part of the whole struct, which is not true for the pointer members. Differential Revision: https://reviews.llvm.org/D105562	2021-07-09 12:51:26 -07:00
Michał Górny	2b0d95fb58	[openmp] [test] Add missing <limits> include to capacity_nthreads Differential Revision: https://reviews.llvm.org/D105474	2021-07-06 20:39:53 +02:00
Jon Chesterfield	ddfb074a80	[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global Folds some duplicate logic into a helper function, passes the new environment struct into getLaunchVals which no longer reads the DeviceInfo global. Implemented on top of D105237 Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D105239	2021-07-06 17:06:38 +01:00
Atmn Patel	21e92612c0	[Libomptarget] Experimental Remote Plugin Fixes D97883 introduced a compile-time error in the experimental remote offloading libomptarget plugin, this patch fixes it and resolves a number of inconsistencies in the plugin as well: 1. Non-functional Asynchronous API 2. Unnecessarily verbose debug printing 3. Misc. code clean ups This is not intended to make any functional changes to the plugin. Differential Revision: https://reviews.llvm.org/D105325	2021-07-02 12:38:34 -04:00
Hansang Bae	f1b9ce2736	[OpenMP] Fix a few issues with hidden helper task This patch includes the following changes to address a few issues when using hidden helper task. - Assertion is triggered when there are inadvertent calls to hidden helper functions on non-Linux OS - Added deinit code in __kmp_internal_end_library function to fix random shutdown crashes - Moved task data access into the lock-guarded region in __kmp_push_task Differential Revision: https://reviews.llvm.org/D105308	2021-07-01 17:10:32 -05:00
Shilei Tian	369216ab31	[OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr` `DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition, two bool values (`IsNew` and `IsHostPtr`) are passed by reference to make changes made in the function visible to the caller. In this patch, a struct, which contains the target pointer, two flags, and an iterator to the map table entry corresponding to the queried host pointer, will be returned. In addition to making the logic clearer regarding the two bool values, this paves the way for the next patch to fix the data race in `bug49334.cpp` by attaching an event to the map table entry (and that's why we need the iterator). Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104382	2021-07-01 12:32:03 -04:00
Jon Chesterfield	db89414da4	[libomptarget][nfc] Move grid size computation Change getLaunchVals to return the integers used for launch Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D105237	2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti	98c36f0079	Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" This reverts commit `2240b41ee4`. A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be set to the default. The above change was made without this assumption. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105250	2021-06-30 17:15:00 -07:00
Jon Chesterfield	4b0926b044	[libomptarget][nfc] Replace out arguments with struct return A step towards making this function adequately self contained that it can be tested easily. No functional change intended here, left variable names unchanged. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D105229	2021-06-30 22:40:07 +01:00
Jon Chesterfield	d86b0073cf	[libomptarget][amdgpu][nfc] Fix build warnings, drop some headers Removes stdarg header, drops uses of iostream, fix some format string errors. Also changes a C style struct to C++ style to avoid a warning from clang/ Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104923	2021-06-30 22:23:36 +01:00
Shilei Tian	24a36ce58b	[OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` In our ongoing work, we are using `AbstractAttributor` to deduce the execution model of device functions, and potentially remove unnecessary function calls to `__kmpc_is_spmd_exec_mode`. In the current device runtime, we have mixed use of `isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact `__kmpc_is_spmd_exec_mode` simply calls `isSPMDMode`. Since all functions starting with `__kmpc` are C functions, which are not subject to name mangling, they are more optimization friendly. In this patch, we simply replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode` to pave the way for the optimization. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105211	2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti	e0b713a035	[libomptarget] [amdgpu] Change default number of teams per computation unit This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99003	2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti	2240b41ee4	[libomptarget] [amdgpu] Fix default setting of max flat workgroup size When max flat workgroup size is not specified, it is set to the default workgroup size. This prevents kernel launch with a workgroup size larger than the default. The fix is to ignore a size of 0 and treat it as unspecified. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105073	2021-06-29 13:47:24 -07:00
Johannes Doerfert	4eb90e893f	Revert "[OpenMP] Add Two-level Distributed Barrier" This reverts commit `25073a4ecf`. This breaks non-x86 OpenMP builds for a while now. Until a solution is ready to be upstreamed we revert the feature and unblock those builds. See: https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 and https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821 The currently proposed fix (D104788) seems not to be ready yet: https://reviews.llvm.org/D104788#2841928	2021-06-29 09:38:27 -05:00
Johannes Doerfert	bc8bb3df35	Revert "[omp] Fix build without ITT after D103121 changes" This reverts commit `eab1fd389b`. This commit fixed a problem with `25073a4ecf` (D103121) which is the one we actually need to revert to unblock non-X86 builds of OpenMP. Can be reapplied, or merged into, D103121 as it goes in again.	2021-06-29 09:38:27 -05:00
Joseph Huber	2190c48fde	[OpenMP][Documentation] Add FAQ entry for CMake module This patch adds documentation for using the CMake find module for OpenMP target offloading provided by LLVM. It also removes the requirement for AMD's architecture to be set as this isn't necessary for upstream LLVM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105051	2021-06-28 17:05:07 -04:00
Joseph Huber	c9f3240c9d	[OpenMP][Documentation] Add OpenMPOpt optimization section Add some information about the optimizations currently provided by OpenMPOpt. Every optimization performed should eventually be listed here. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105050	2021-06-28 17:05:03 -04:00
Pushpinder Singh	20df2c7052	[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA The logic is similar to that of system.cpp, with one change: instead of adding all the memory pools to a device struct, it only keeps a single pool. The existing approach also always allocated memory on the first HSA pool found for a GPU. This depends on D104691. The goal of this series of patches is to remove the g_atl_machine global. The next patch will drop g_atl_machine entirely. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104695	2021-06-28 11:28:04 +00:00
Jon Chesterfield	f66b8fdc0a	[libomptarget][amdgpu] Build openmp for two more targets The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908. Both numbers are listed in ROCT topology.c Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104922	2021-06-25 19:02:03 +01:00
Jon Chesterfield	96f6873dff	[OpenMP][NFC] Drop unused headers from amdgpu plugin	2021-06-25 12:08:56 +01:00
AndreyChurbanov	b2787945f9	[OpenMP][NFC] libomp: fix wrong debug assertion. Normalized bounds of chunk of iterations to steal from are inclusive, so upper bound should not be decremented in expression to check. Problem was in attempt to steal iterations 0:0, that caused assertion after wrong decrement. Reported in comment to https://reviews.llvm.org/D103648. Differential Revision: https://reviews.llvm.org/D104880	2021-06-25 02:02:14 +03:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joel E. Denny	9fa5e3280d	[OpenMP] Fix delete map type in ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(delete: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented) Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last ``` `RefCount` is reported as decremented to 2, but it ought to be reset because of the `delete` map type, and `is not last` is incorrect. This patch migrates the reset of reference counts from `DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then correctly reports the reset. Based on the `IsLast` result from `DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is last` for any deletion. `DeviceTy::deallocTgtPtr` is responsible only for the final reference count decrement and mapping removal. An obscure side effect of this patch is that a `delete` map type when the reference count is infinite yields `DelEntry=IsLast=false` in `targetDataEnd` and so no longer results in a `DeviceTy::deallocTgtPtr` call. Without this patch, that call is a no-op anyway besides some unnecessary locking and mapping table lookups. 
Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104560	2021-06-23 09:57:19 -04:00
Joel E. Denny	48421ac441	[OpenMP] Improve ref count debug messages For example, without this patch: ``` $ cat test.c int main() { int x; #pragma omp target enter data map(alloc: x) #pragma omp target exit data map(release: x) ; return 0; } $ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c $ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists' Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1 ``` There are two problems in this example: * `RefCount` is not reported when a mapping is created, but it might be 1 or infinite. In this case, because it's created by `omp target enter data`, it's 1. Seeing that would make later `RefCount` messages easier to understand. * `RefCount` is still 1 at the `omp target exit data`, but it's reported as `updated`. The reason it's still 1 is that, upon deletions, the reference count is generally not updated in `DeviceTy::getTgtPtrBegin`, where the report is produced. Instead, it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually removed from the mapping table. This patch makes the following changes: * Report the reference count when creating a mapping. * Where an existing mapping is reported, always report a reference count action: * `update suppressed` when `UpdateRefCount=false` * `incremented` * `decremented` * `deferred final decrement`, which replaces the misleading `updated` in the above example * Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does not zero the reference count. (Please advise if these comments miss the point.) * For unified shared memory, don't report confusing messages like `RefCount=` or `RefCount= updated` given that reference counts are irrelevant in this case. Instead, just report `for unified shared memory`. 
* Use `INFO` not `DP` consistently for `Mapping exists` messages. * Fix device table dumps to print `INF` instead of `-1` for an infinite reference count. Reviewed By: jhuber6, grokos Differential Revision: https://reviews.llvm.org/D104559	2021-06-23 09:57:19 -04:00
Joseph Huber	72d4cd627c	[OpenMP] Introduce a CMake find module for OpenMP Target support This introduces a CMake find module for detecting target offloading support in a compiler. The goal is to make it easier to incorporate target offloading into a CMake project. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D104710	2021-06-22 23:01:38 -04:00
Joseph Huber	422adaa879	[OpenMP] Add thread limit environment variable support to plugins The OpenMP 5.1 standard defines the environment variable `OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a single block. This patch adds support for this into the AMDGPU and CUDA plugins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103923	2021-06-22 16:25:40 -04:00
Shilei Tian	0029059074	[NFC][OpenMP][Offloading] Unified the construction of mapping table entry This patch unifies construction of mapping table entry to use `emplace`. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104580	2021-06-22 12:38:47 -04:00
Joseph Huber	244e98ff48	[Libomptarget] Improve device runtime implementation for globalized variables. Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocates memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will be allocated per-warp rather than per-thread. Depends on D97680 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104666	2021-06-22 11:52:49 -04:00
Joseph Huber	952a0f2385	[Libomptarget] Introduce new globalization runtime calls Summary: This patch introduces the new globalization runtime to be used by D97680. These runtime calls will replace the __kmpc_data_sharing_push_stack and __kmpc_data_sharing_pop_stack functions. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102532	2021-06-22 10:05:42 -04:00
AndreyChurbanov	5dd4d0d46f	[OpenMP] libomp: fix dynamic loop dispatcher Restructured dynamic loop dispatcher code. Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule: - eliminated possibility of stealing iterations of the wrong loop when victim thread changed its buffer to work on another loop; - fixed race when victim thread changed its buffer to work in nested parallel; - eliminated "static" property of the schedule, that is now a single thread can execute whole loop. Differential Revision: https://reviews.llvm.org/D103648	2021-06-22 16:29:01 +03:00
Pushpinder Singh	9d110f9159	[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp Moving this method helps eliminate a use of g_atl_machine. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104691	2021-06-22 11:44:52 +00:00
Vladislav Vinogradov	eab1fd389b	[omp] Fix build without ITT after D103121 changes Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D104638	2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Shilei Tian	ec97866454	[OpenMP] Make bug49334.cpp more reproducible `bug49334.cpp` cannot detect the data race in `libomptarget` efficiently. It is reported that with `N = 256` and `BS = 16`, the data race can be reproduced more reliably. Upcoming patches will fix the race, so this test is expected to fail for now. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104552	2021-06-18 18:35:41 -04:00
Asher Mancinelli	5c189d30e6	[OpenMP] Update FAQ for enabling cuda offloading Add an FAQ entry and add a few lines to an existing one. Document the use of `GCC_INSTALL_PREFIX` for pointing clang to correct GCC installation for two-stage build. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D104474	2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Terry Wilmarth	25073a4ecf	[OpenMP] Add Two-level Distributed Barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Differential Revision: https://reviews.llvm.org/D103121	2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
AndreyChurbanov	610fea65e2	[OpenMP] libomp: fixed implementation of OMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. Compiler can set dependence flag to 0x8 when calling __kmpc_omp_task_with_deps. All dependence flags the library receives so far and the corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-16 14:47:29 +03:00
Joachim Protze	d2a7871b5e	[OpenMP][NFC] Add back suppression of warning Commit `cff215565e` did not fix all unused variables in different builds, so adding back the suppression for now.	2021-06-16 10:14:59 +02:00
Joachim Protze	cff215565e	[OpenMP] Remove unused variables from libomp code Several variables were left unused as a result of different patches removing their use. Two variables have some use: `poll_count` is used by the KMP_BLOCKING macro only under certain conditions. Adding (void) to tell the compiler to ignore the unused variable. `padding` is a dummy stack allocation with no intent to be used. Also adding (void) to make the compiler ignore the unused variable. Differential Revision: https://reviews.llvm.org/D104303	2021-06-16 09:33:46 +02:00
Peyton, Jonathan L	56da28240f	[OpenMP] Add GOMP 5.0 version symbols to API * Add GOMP versioned pause functions * Add GOMP versioned affinity format functions To do the affinity format functions, only attach versioned symbols to the APPEND Fortran entries (e.g., omp_set_affinity_format_) since GOMP only exports two symbols (one for Fortran, one for C). Our affinity format functions have three symbols. e.g., with omp_set_affinity_format: 1) omp_set_affinity_format (Fortran interface) 2) omp_set_affinity_format_ (Fortran interface) 3) ompc_set_affinity_format (C interface) Have the GOMP version of the C symbol alias the ompc_* 3) version instead of the Fortran unappended version 1). Differential Revision: https://reviews.llvm.org/D103647	2021-06-15 16:25:00 -05:00
Peyton, Jonathan L	92baf414db	[OpenMP] Fix affinity determine capable algorithm on Linux Remove strange checks for syscall() arguments where mask is NULL. Valgrind reports these as error usages for the syscall. Instead, just check if CACHE_LINE bytes is long enough. If not, then search for the size. Also, by limiting the first size detection attempt to CACHE_LINE bytes, instead of 1MB, we don't use more than one cache line for the mask size. Before this patch, sometimes the returned mask size was 640 bytes (10 cache lines) because the initial call to getaffinity() was limited only by the internal kernel mask size which can be very large. Differential Revision: https://reviews.llvm.org/D103637	2021-06-15 16:21:30 -05:00
Peyton, Jonathan L	0ddde4d865	[OpenMP] Lazily assign root affinity Lazily set affinity for root threads. Previously, the root thread executing middle initialization would attempt to assign affinity to other existing root threads. This was not working properly as the set_system_affinity() function wasn't setting the affinity for the target thread. Instead, the middle init thread was resetting its own affinity using the target thread's affinity mask. Differential Revision: https://reviews.llvm.org/D103625	2021-06-15 16:21:06 -05:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes changes that delete the code accessing the g_atl_machine global. Some accesses related to memory_pools still remain. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
AndreyChurbanov	9ce2e5e700	Revert "[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type" This reverts commit `a1f550e052`. Revert in order to fix backwards compatibility breakage caused by type size change for task dependence flag.	2021-06-09 17:38:38 +03:00
Joachim Protze	639b397931	[OpenMP][Tools] Fix Archer handling of task dependencies The current handling of dependencies in Archer has two flaws: - annotation of dependency synchronization is not limited to sibling tasks - annotation of in/out dependencies is based on the assumption that dependency variables will rarely be byte-sized variables. This patch introduces a map in the generating task to manage the dependency variables for the child tasks. The map is only accessed from the generating task, so no locking is necessary. This also limits the dependency-based synchronization to sibling tasks. This patch also introduces proper handling for new dependency types such as mutexinoutset and inoutset. Differential Revision: https://reviews.llvm.org/D103608	2021-06-09 13:36:20 +02:00
Joachim Protze	08d8f1a958	[OpenMP][Tools] Cleanup memory pool used in Archer The main motivation for reusing objects is that it helps to avoid creating and leaking synchronization clocks in TSan. The reused object will reuse the synchronization clock in TSan. Before, new and delete operators were overloaded to get and return memory for the object from/to the object pool. This patch replaces the operator overloading with explicit static New/Delete functions. Objects for parallel regions and implicit tasks will always be recruited from and returned to the thread-local object pool. Only for explicit tasks is there a chance that another thread completes the task and frees the object. This patch optimizes the thread-local New/Delete calls by avoiding locks, locking only if the pool is empty. Remote threads return the object into a separate queue. The chunk size for allocations is now decided based on page size. The objects will also be aligned to cache lines, avoiding false sharing. This is the first patch in a series to provide better tasking support. Differential Revision: https://reviews.llvm.org/D103606	2021-06-09 13:36:19 +02:00
Joachim Protze	82e4e50531	[OpenMP][Tools] Fix Archer for MACOS Archer uses weak symbol overloads of TSan functions to enable loading the tool even if the application is not built with TSan. For MACOS the tool collects the function pointer at runtime. When adding the function entry/exit markers, we neglected to add the functions in the MACOS codepath. This patch also replaces the repeated function lookup with a single initial function lookup and fixes the disabling logic in RunningOnValgrind. Differential Revision: https://reviews.llvm.org/D103607	2021-06-09 13:36:19 +02:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Joseph Huber	df965513a9	[OpenMP] Add an information flag for device data transfers This patch adds an information flag that indicates when data is being copied to and from the device. This will be helpful for finding redundant or unnecessary data transfers in applications. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D103927	2021-06-08 20:23:27 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Vignesh Balasubramanian	f61602b0d3	[OpenMP][OMPD] Implementation of OMPD debugging library - libompd. This is the first of seven patches that implements OMPD, a debugging interface to support debugging of OpenMP programs. It contains support code required in "openmp/runtime" for OMPD implementation. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100181	2021-06-08 16:44:22 +05:30
Peyton, Jonathan L	d70e1f1276	[OpenMP][runtime] add .clang-tidy file Use same checks as compiler-rt which removes checks for readability-* and llvm-header style. Differential Revision: https://reviews.llvm.org/D103711	2021-06-07 13:56:39 -05:00
AndreyChurbanov	a1f550e052	[OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type Refactored code of dependence processing and added new inoutset dependence type. Compiler can set dependence flag to 0x8 when calling __kmpc_omp_task_with_deps. Size of type of the dependence flag changed from 1 to 4 bytes in clang. All dependence flags the library receives so far and the corresponding dependence types: 1 - IN, 2 - OUT, 3 - INOUT, 4 - MUTEXINOUTSET, 8 - INOUTSET. Differential Revision: https://reviews.llvm.org/D97085	2021-06-07 21:42:51 +03:00
Bryan Chan	54f059c900	[OpenMP] Check loc for NULL before dereferencing it The ident_t * argument in __kmp_get_monotonicity was being used without a customary NULL check, causing the function to crash in a Debug build. Release builds were not affected thanks to dead store elimination.	2021-06-07 10:45:48 -04:00
Pushpinder Singh	4f8bc7caf4	[AMDGPU][Libomptarget] Remove atlc global This global struct used to hold various flags for monitoring the initialization of hsa. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103795	2021-06-07 11:09:01 +00:00
Pushpinder Singh	f5f329a371	[AMDGPU][Libomptarget] Rework logic for locating kernarg pools Previous logic was to always use the first kernarg pool found to allocate kernel args. This patch changes this to use only the kernarg pool which has non-zero size. This logic is also reworked to not use any globals. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103600	2021-06-07 06:41:37 +00:00
Terry Wilmarth	8ec9aa236e	[OpenMP] Add experimental nesting mode feature Nesting mode is a new experimental feature in the OpenMP runtime. It allows a user to set up nesting for an application in a way that corresponds to the hardware topology levels on the machine an application is being run on. For example, if a machine has 2 sockets, each with 12 cores, then use of nesting mode could set up an outer level of nesting that uses 2 threads per parallel region, and an inner level of nesting that uses 12 threads per parallel region. Nesting mode is controlled with the KMP_NESTING_MODE environment variable as follows: 1) KMP_NESTING_MODE = 0: Nesting mode is off (default); max-active-levels-var is set to 1 (the default -- nesting is off, nested parallel regions are serialized). 2) KMP_NESTING_MODE = 1: Nesting mode is on, and a number of threads will be assigned for each level discovered in the machine topology; max-active-levels-var is set to the number of levels discovered. 3) KMP_NESTING_MODE = n, n>1: [Note: this option is experimental and may change or be removed in the future.] Nesting mode is on, and a number of threads will be assigned for each topology level discovered on the machine, up to k<=n levels (since there may be fewer than n levels discovered in the topology), and beyond the kth level, nested parallel regions will be serialized; NOTE: max-active-levels-var is 1 (the default -- nesting is off, and nested parallel regions are serialized) until the user changes max-active-levels-var. If the user sets OMP_NUM_THREADS or OMP_MAX_ACTIVE_LEVELS, they will override KMP_NESTING_MODE settings for the associated environment variables. The detected topology may be limited by an affinity mask setting on the initial thread, or if the user sets KMP_HW_SUBSET. See also: KMP_HOT_TEAMS_MAX_LEVEL for controlling use of hot teams for nested parallel regions. Note that this feature only sets numbers of threads used at nesting levels. 
The user should make use of OMP_PLACES and OMP_PROC_BIND or KMP_AFFINITY for affinitizing those threads, if desired. Differential Revision: https://reviews.llvm.org/D102188	2021-06-04 16:01:11 -05:00
Peyton, Jonathan L	56dd158c32	[OpenMP] fix spelling error in message-converter.pl	2021-06-04 11:20:32 -05:00
Peyton, Jonathan L	f7655f3df3	[OpenMP] Fix improper printf format specifier	2021-06-02 11:04:48 -05:00
Hansang Bae	7ba4e96ede	[OpenMP] Use new task type/flag for taskwait depend events. Differential Revision: https://reviews.llvm.org/D103464	2021-06-02 10:16:38 -05:00
Pushpinder Singh	b25546a4b4	[AMDGPU][Libomptarget][NFC] Remove bunch of dead structs Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103509	2021-06-02 10:40:51 +00:00
Pushpinder Singh	2368170a8d	[AMDGPU][Libomptarget][NFC] Remove atmi_place_t atmi_place_t has been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103508	2021-06-02 10:35:28 +00:00
Peyton, Jonathan L	2020c981fa	[OpenMP] Add L2-Tile equivalence for KNL When on KNL and L2 or Tile layer is detected, manually add the corresponding layer which is equivalent. Differential Revision: https://reviews.llvm.org/D102865	2021-06-01 14:17:13 -05:00
Hansang Bae	cf5c94ef08	[OpenMP] Define named constants for interop's foreign runtime ID Also added missing Fortran definitions for interop support. Differential Revision: https://reviews.llvm.org/D102883	2021-06-01 13:06:59 -05:00
Pushpinder Singh	fb113264a8	[AMDGPU][Libomptarget] Remove g_atmi_machine global Turns out the only purpose of this class was to verify whether the device ID was in range, which could be done easily by using g_atl_machine. Getting rid of g_atl_machine is still pending and will be done in a later patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103443	2021-06-01 12:34:24 +00:00
Pushpinder Singh	4fc3286951	[AMDGPU][Libomptarget][NFC] Split host and device malloc This patch splits the code path for host and device malloc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103389	2021-05-31 12:09:18 +00:00
Pushpinder Singh	8b79dfb302	[AMDGPU][Libomptarget][NFC] Remove atmi_mem_place_t This struct was used to specify the device on which memory was being allocated/free in atmi_malloc/free. It has now been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103239	2021-05-27 11:53:18 +00:00
Jon Chesterfield	2fdf8bbd19	[libomptarget][nfc][amdgpu] Factor out setting upper bounds Refactor suggested in D103037 to help avoid similar copy-paste errors. Change is mechanical. Some parts of this would be more robust with unsigned. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103090	2021-05-26 19:57:49 +01:00
Jon Chesterfield	c5c1ec7945	[libomptarget][nfc][amdgpu] Refactor uses of KernelInfoTable Suggested in D103059. Use a single lookup instead of two, more const, less mutation. Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D103093	2021-05-26 19:25:25 +01:00
Jon Chesterfield	07f59baad6	[libomptarget][nfc][amdgpu] Remove atmi_status_t type ATMI_STATUS_UNKNOWN was unused, deleted references to it. Replaced ATMI_STATUS_{SUCCESS,ERROR} with HSA_STATUS_{SUCCESS,ERROR} Replaced atmi_status_t with hsa_status_t Otherwise no change. In particular, conversions between atmi_status_t and hsa_status_t will now be conversions between hsa_status_t and itself. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103115	2021-05-26 17:02:19 +01:00
Pushpinder Singh	a2d6ef5876	[AMDGPU][Libomptarget] Inline atmi_init/atmi_finalize After D102847, these functions can be inlined. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103075	2021-05-26 10:50:08 +00:00
Pushpinder Singh	cc8661ac4a	[AMDGPU][Libomptarget] Delete g_atmi_initialized This patch drops g_atmi_initialized and inlines the Initialize & Finalize methods from Runtime class. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102847	2021-05-26 10:46:54 +00:00
Pushpinder Singh	7648b6978e	[AMDGPU][Libomptarget] Move Kernel/Symbol info tables to RTLDeviceInfoTy Two globals KernelInfoTable & SymbolInfoTable are moved into RTLDeviceInfoTy class. This builds on the top of D102691. [2/2] Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102692	2021-05-26 10:02:28 +00:00
Jon Chesterfield	df005fa364	[libomptarget][nfc] Move hostcall required test to rtl Remove a global, fix minor race. First of N patches to bring up hostcall. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103058	2021-05-25 22:43:17 +01:00
Pushpinder Singh	b0d68c7141	[AMDGPU][Libomptarget] Mark lambda_by_value test as XFAIL Reason: Missing printf definition Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103078	2021-05-25 12:16:54 +00:00
Jon Chesterfield	75492e20fb	[libomptarget][nfc] Accept callable for hsa iterate_symbols Candidate refactor to simplify D102692 Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D103030	2021-05-25 09:29:11 +01:00
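The "accept a callable" refactor named in the entry above is a common pattern for C-style iteration APIs that take a function pointer plus a `void *` user-data argument. The sketch below shows the general technique only. `cIterate` is a hypothetical stand-in for such an API, not the HSA function, and `iterateItems` is an illustrative wrapper name.

```cpp
#include <cassert>
#include <vector>

// Hypothetical C-style API: calls cb(item, data) for each item and stops
// early, returning the callback's result, if the callback returns non-zero.
using CCallback = int (*)(int item, void *data);

int cIterate(const std::vector<int> &items, CCallback cb, void *data) {
  assert(cb != nullptr);
  for (int item : items) {
    int rc = cb(item, data);
    if (rc != 0)
      return rc;
  }
  return 0;
}

// Wrapper that accepts any callable (lambda, functor, function pointer).
// The callable is smuggled through the void* slot and invoked by a
// captureless lambda, which converts to a plain function pointer.
template <typename Callable>
int iterateItems(const std::vector<int> &items, Callable fn) {
  auto trampoline = [](int item, void *data) -> int {
    return (*static_cast<Callable *>(data))(item);
  };
  return cIterate(items, trampoline, &fn);
}
```

A caller can then write `iterateItems(items, [&](int x) { total += x; return 0; })` without declaring a separate static callback and context struct.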
Dhruva Chakrabarti	96d70f4d28	[libomptarget] [amdgpu] Added LDS usage to the kernel trace Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103059	2021-05-24 19:33:48 -07:00
Hansang Bae	95cefacfe1	[OpenMP] Fix crashing critical section with hint clause Runtime was using the default lock type without using the hint. Differential Revision: https://reviews.llvm.org/D102955	2021-05-24 17:25:01 -05:00
Dhruva Chakrabarti	ca17b26d4d	[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case. Fix the case where NumTeams was set incorrectly instead of NumThreads Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103037	2021-05-24 15:23:15 -07:00
Pushpinder Singh	486110eb41	[AMDGPU][Libomptarget] Remove global KernelNameMap KernelNameMap contains entries like "key.kd" => key which clearly could be replaced by simple logic of removing suffix from the key. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102691	2021-05-24 08:46:08 +00:00
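The "simple logic of removing suffix from the key" mentioned in the entry above can be sketched as follows. The function name `kernelNameFromDescriptor` is hypothetical; only the `.kd` suffix comes from the message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: maps a kernel descriptor symbol such as "key.kd"
// to the kernel name "key". Symbols without the suffix are returned as-is.
std::string kernelNameFromDescriptor(const std::string &symbol) {
  const std::string suffix = ".kd";
  const bool hasSuffix =
      symbol.size() >= suffix.size() &&
      symbol.compare(symbol.size() - suffix.size(), suffix.size(), suffix) == 0;
  return hasSuffix ? symbol.substr(0, symbol.size() - suffix.size()) : symbol;
}
```

Computing the name on demand like this removes the need for a global map that has to be kept in sync with the loaded images.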
AndreyChurbanov	aa6e7e8da8	[OpenMP] libomp: move warnings to after library initialization Warnings on deprecated api cannot be suppressed if the library is not initialized. With this change it is possible to set KMP_WARNINGS=false to suppress the warnings. Differential Revision: https://reviews.llvm.org/D102676	2021-05-21 23:47:23 +03:00
George Rokos	d0bc04d6b9	[libomptarget] Fix a bug whereby firstprivates are not copied over to the device The check for the TO flag when processing firstprivates is missing. As a result, sometimes the device copy of a firstprivate never gets initialized. Currently we try to force lambda structs to be allocated immediately by marking them as a non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for them immediately. However, calling addArg with IsFirstPrivate=false makes the function skip initializing the device copy. Whether an argument is firstprivate and whether we need to allocate memory immediately are not synonyms, so this patch introduces one more control variable for immediate allocation and sets it apart from initialization. Differential Revision: https://reviews.llvm.org/D102890	2021-05-21 10:52:08 -07:00
Jon Chesterfield	d54712ab4d	[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation There are a lot of different ways we might implement the devicertl local alloc and free functions. Via host, local buffers (stack or arena), specialising per kernel etc. It is not yet clear what the right design is. This change makes the alloc and free functions weak, so one can override them from local tests while comparing options. Not strictly necessary, as a comparable patch can be applied locally each time, but would be convenient for out of tree dev. Plan would be to drop the weak attribute at the same time as introducing a working allocator to trunk. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102499	2021-05-21 16:09:22 +01:00
Jon Chesterfield	68b88ae670	[libomptarget] Improve dlwrap compile time error diagnostic The dlwrap interface takes an explicit arity, e.g. DLWRAP(cuAlloc, 2); This probably can't be eliminated as it controls the argument list of an external symbol, not an inline header function. If the arity given is too big, the error from clang referring to the line is in the middle of implementation details. /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range" static_assert(__i < tuple_size<tuple<>>::value, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ... /home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in instantiation of template class 'dlwrap::trait<cudaError_enum ()(unsigned long , unsigned long)>::arg<2>' requested here DLWRAP(cuMemAlloc, 3); ^ /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ... 
If the arity is too small, the diagnostic is better: cuda/dynamic_cuda/cuda.cpp:34:1: error: too few arguments to function call, expected 2, have 1 DLWRAP(cuMemAlloc, 1); This patch changes the diagnostic to: cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '1 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 1); or cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '3 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 3); Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102858	2021-05-20 20:33:36 +01:00
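The "Arity Error" diagnostic quoted in the entry above can be reproduced with a small trait plus a `static_assert`. This is a simplified sketch of the idea, not the contents of dlwrap.h: `trait`, `CHECK_ARITY`, and `fakeMemAlloc` are illustrative names, and the real macro also generates the forwarding wrapper.

```cpp
#include <cassert>
#include <cstddef>

// Extracts the number of parameters from a function type.
template <typename F> struct trait;
template <typename R, typename... Ts> struct trait<R(Ts...)> {
  static constexpr std::size_t nargs = sizeof...(Ts);
};

// Fails at the macro's own line with a short message when the declared
// arity does not match the function's real parameter count.
#define CHECK_ARITY(SYMBOL, ARITY)                                             \
  static_assert((ARITY) == trait<decltype(SYMBOL)>::nargs, "Arity Error")

// Hypothetical stand-in for an external two-argument allocation symbol.
int fakeMemAlloc(unsigned long *devicePtr, unsigned long byteCount) {
  assert(devicePtr != nullptr);
  *devicePtr = byteCount; // record the request; no real allocation happens
  return 0;               // report success
}

CHECK_ARITY(fakeMemAlloc, 2); // compiles; using 1 or 3 would not
```

Checking the arity up front means a wrong count is reported where the macro is used, instead of deep inside `std::tuple` instantiations.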
Jon Chesterfield	d18fb09c69	[libomptarget][amdgpu] Remove majority of fatal errors Replaces most calls to exit() with returning an error to the library entry point. Minor changes to error handling for clear bugs, remove some dead code. Each exit() call site replaced is either in a library entry point or a function that already returns error codes on some paths. The existing handling is not well tested but replacing exit() with a fallback path should be a strict improvement. Remaining two early exit points are an abort() from a callback and exit() from within msgpack. Fixes for those are less obvious and left for a later patch. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102346	2021-05-20 16:26:43 +01:00
Jon Chesterfield	ea68ad6e26	[libomptarget] Disable test bug49334 on amdgpu Hangs on amdgpu, do not know why. Disable to unblock build. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D102017	2021-05-20 15:46:56 +01:00
Pushpinder Singh	d7503c3bce	[AMDGPU][Libomptarget] Rename & move g_executables to private This patch moves g_executables to private member of Runtime class and is renamed to HSAExecutables following LLVM naming convention. This movement required making Runtime::Initialize and Runtime::Finalize non-static. Verified the correctness of this change by running libomptarget tests on gfx906. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102600	2021-05-18 05:43:23 +00:00
Pushpinder Singh	3bc2b97b34	[AMDGPU][libomptarget] Remove unused global variables This initial patch removes some unused variables from global namespace. There will be more incoming patches for moving global variables to classes or static members. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102598	2021-05-18 05:40:49 +00:00
Shilei Tian	af6511d730	[OpenMP] Fixed Bug 49356 Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports crash in the test case `tasking/bug_taskwait_detach.cpp`, which is caused by the wrong function declaration. `gtid` in `__kmpc_omp_task` should be `kmp_int32`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D102584	2021-05-17 12:14:54 -04:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Jon Chesterfield	10de217209	[libomptarget][amdgpu] Fix truncation error for partial wavefront The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future proofs against launching with a number of threads that is not a multiple of the wavefront size. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102407	2021-05-13 17:31:57 +01:00
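The message above does not show the fix itself. One plausible reading of the truncation error is that dividing the thread count by the wavefront size with integer division drops a partially filled wavefront, so a barrier waiting on N wavefronts would count one too few. The sketch below illustrates that arithmetic under this assumption; `wavefrontsFor` is a hypothetical name.

```cpp
#include <cassert>

// Number of wavefronts needed to hold `threads` threads, counting a
// partially filled final wavefront as a whole one (rounding up).
unsigned wavefrontsFor(unsigned threads, unsigned waveSize) {
  assert(waveSize > 0);
  return (threads + waveSize - 1) / waveSize;
}
```

With a wavefront size of 64, plain division gives 100 / 64 = 1, while `wavefrontsFor(100, 64)` gives 2, which accounts for the partial wavefront.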
Jon Chesterfield	b049870d3b	[libomptarget][amdgpu] Convert an assert to print and offload_fail The kernel launched is supposed to be present in the binary, but a not yet diagnosed bug means it is missing for some of the qmcpack test cases. Changing from assert to print and offload_fail should help diagnose that and similar bugs. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102378	2021-05-13 17:31:36 +01:00
Michael Kruse	34ed3e6337	[OpenMP] Test unified shared memory tests only on systems that support it. Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`. For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands \| cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag. In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available. For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case. This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 \| OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D101498	2021-05-13 11:08:04 -05:00
Jon Chesterfield	9934571eab	[libomptarget][amdgpu][nfc] Expand errorcheck macros These macros expand to continue, which is confusing, or exit, which is incompatible with continuing execution on offloading fail. Expanding the macros in place makes the code look untidy but the control flow obvious and amenable to improving. In particular, exit becomes easier to eliminate. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102230	2021-05-12 17:30:41 +01:00
Christopher Pulido	4fb0aaf033	[OpenMP] Changes to enable MSVC ARM64 build of libomp This is the first in a series of changes to the OpenMP runtime that have been done internally by Microsoft. This patch makes the necessary changes to enable libomp.dll to build with the MSVC compiler targeting ARM64. Differential Revision: https://reviews.llvm.org/D101173	2021-05-11 23:03:12 +03:00
Jon Chesterfield	72995a4bdf	[libomptarget][nfc] Add hook to easily disable building amdgcn bclib This is useful when building LLVM with a toolchain that can't emit code for amdgcn, e.g. because it overrides the include search path with headers from another architecture, or the clang compiler is missing builtins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102229	2021-05-11 17:23:09 +01:00
Peyton, Jonathan L	c765d140fe	[OpenMP] Fix hidden helper + affinity When KMP_AFFINITY is set, each worker thread's gtid value is used as an index into the place list to determine the thread's placement. With hidden helpers enabled, this gtid value is shifted down leading to unexpected shifted thread placement. This patch restores the previous behavior by adjusting the mask index to take the number of hidden helper threads into account. Hidden helper threads are given the full initial mask and do not participate in any of the other affinity mechanisms (place partitioning, balanced affinity). Their affinity is only printed for debug builds. Differential Revision: https://reviews.llvm.org/D101882	2021-05-11 08:54:22 -05:00
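The index adjustment described in the entry above can be sketched as follows. This is an illustration, not the libomp implementation: it assumes that hidden helper threads occupy gtids 1 through H directly after the initial thread (gtid 0), so regular workers start at gtid H + 1. `placeIndexFor` and `kFullMask` are hypothetical names.

```cpp
#include <cassert>

// Sentinel meaning "use the full initial mask" (hidden helper threads).
constexpr int kFullMask = -1;

// Returns the index into the place list for a thread, or kFullMask for a
// hidden helper thread. Assumes helpers hold gtids 1..numHiddenHelpers.
int placeIndexFor(int gtid, int numHiddenHelpers, int numPlaces) {
  assert(gtid >= 0 && numHiddenHelpers >= 0 && numPlaces > 0);
  const bool isHelper = gtid >= 1 && gtid <= numHiddenHelpers;
  if (isHelper)
    return kFullMask;
  // Undo the shift that the helper threads introduce for regular workers.
  const int adjusted = gtid > numHiddenHelpers ? gtid - numHiddenHelpers : gtid;
  return adjusted % numPlaces;
}
```

With 8 hidden helpers and 4 places, the first regular worker (gtid 9) lands on place 1, the same place it would get as gtid 1 when hidden helpers are disabled.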
Jon Chesterfield	dedca78d48	[libomptarget][nfc] Drop stringify in macro A step towards deleting the macros entirely. Differential Revision: https://reviews.llvm.org/D102228	2021-05-11 12:19:55 +01:00
Jon Chesterfield	6da348569c	[libomptarget] Add support for target allocators to dynamic cuda RTL Follow on to D102000 which introduced new calls into libcuda. This patch adds the corresponding entry points to dynamic_cuda, fixing the build for systems that do not have the cuda toolkit installed. Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102169	2021-05-10 15:27:50 +01:00
Pushpinder Singh	9586937ef5	[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails This patch prevents runtime tests running on systems without amdgpu. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D102054	2021-05-10 07:37:27 +00:00
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds one has to provide fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this, if this affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Jon Chesterfield	25fe17d3c1	[libomptarget] Initial documentation on amdgpu offload Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101927	2021-05-05 19:58:52 +01:00
Peyton, Jonathan L	9982f33e2c	[OpenMP] Refactor/Rework topology discovery code This patch does the following: 1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) to determine the exact machine topology which can vary widely among current and future architectures. The current design is not easy to expand beyond the assumed three layer topology: sockets, cores, and threads so a rework capable of using the existing KMP_AFFINITY mechanisms is required. This new topology structure has: * The depth and types of the topology * Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core) * Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads) * Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core) * Whether it is uniform or not The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information grabbed from the previous Address structure. The kmp_topology_t structure contains an array of these. 2) Generalize the KMP_HW_SUBSET envirable for the new kmp_topology_t structure. The algorithm doesn't assume any order with tiles,numa domains,sockets,cores,threads. Instead it just parses the envirable, makes sure it is consistent with the detected topology (including taking into account equivalent layers) and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced which contains a vector of items (hardware type, number user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can be shortened as well. e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine. 3) Simplify topology detection functions so they only do the singular task of detecting the machine's topology. 
Printing and all canonicalizing functionality are now done afterwards, which eliminates many lines of duplicated code. 4) Add new ll_caches and numa_domains to OMP_PLACES, and consequently, KMP_AFFINITY's granularity setting. All the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting. 5) Simplify and future-proof code where explicit lists of allowed affinity settings keywords appear inside if() conditions. 6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method so equivalent caches could be detected (in particular for the ll_caches place). Differential Revision: https://reviews.llvm.org/D100997	2021-05-03 18:00:24 -05:00
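The relationship between the "ratio count" and "absolute count" fields listed in the entry above can be illustrated with a running product. This sketch assumes a uniform topology and is not the kmp_topology_t code; `absoluteCounts` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// ratios: units per parent for each level, outermost first,
//         e.g. {2, 8, 2} = 2 sockets, 8 cores per socket, 2 threads per core.
// Returns the machine-wide count for each level, assuming every parent has
// the same number of children (a uniform topology).
std::vector<int> absoluteCounts(const std::vector<int> &ratios) {
  std::vector<int> counts;
  counts.reserve(ratios.size());
  int running = 1;
  for (int ratio : ratios) {
    assert(ratio > 0);
    running *= ratio; // total units at this level across the machine
    counts.push_back(running);
  }
  return counts;
}
```

For ratios `{2, 8, 2}` this gives 2 sockets, 16 cores, and 32 threads, matching the example counts in the message. A non-uniform machine would need the counts gathered from the hardware thread list instead.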
Pushpinder Singh	ae845d6426	[AMDGPU][OpenMP] Enable Libomptarget runtime tests This enables the runtime tests on amdgpu targets. 10 tests have been marked as XFAIL on amdgcn currently mostly due to missing printf. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D99656	2021-05-03 05:56:42 +00:00
Martin Storsjö	01d27fc408	[OpenMP] Fix warnings due to redundant semicolons. NFC.	2021-05-02 21:51:06 +03:00
Kevin Athey	bc9120047b	Correct tiny misspelling (readlef -> readelf). Getting my feet wet here as a new committer. Correct misspelling in check-depends.pl. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101552	2021-04-30 17:20:35 -07:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installation at a non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing --cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set LD_LIBRARY_PATH such that it can find the cuda runtime when executing. This will ensure that the regression tests do not depend on the current environment, but use the environment they were configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Joachim Protze	24f836e8fd	[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) This patch fuses the RUN lines for most libomptarget tests. The previous patch D101315 created separate test targets for each supported offloading triple. This patch updates the RUN lines in libomptarget tests to use a generic run line independent of the offloading target selected for the lit instance. In cases, where no RUN line was defined for a specific offloading target, the corresponding target is declared as XFAIL. If it turns out that a test actually supports the target, the XFAIL line can be removed. Differential Revision: https://reviews.llvm.org/D101326	2021-04-27 15:54:32 +02:00
Joachim Protze	b845217b1d	[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) This patch creates a separate test directory for each offloading target to be tested. This allows testing multiple architectures in one configuration, while still seeing all failing tests separately. The lit test names include the target triple, so that it will be easier to spot the failing target. This patch also allows marking expected failing tests based on the target triple, as the currently used triple is added to the lit "features": ``` // XFAIL: nvptx64-nvidia-cuda ``` Differential Revision: https://reviews.llvm.org/D101315	2021-04-27 12:30:01 +02:00
Joseph Huber	077fe0f739	[OpenMP][Documentation] Add FAQ entry for dynamically linked libraries Summary: Add an FAQ entry detailing the support for using dynamically linked libraries with OpenMP Offloading	2021-04-26 14:21:17 -04:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
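Replacing a global with "a function that returns a reference to an internal one", as the entry above describes, is typically done with a function-local static. The sketch below shows the pattern; `infoLevelRef` is a hypothetical name, and reading the initial value from LIBOMPTARGET_INFO with `strtoul` is an assumption about how the level might be seeded, not a description of the actual runtime code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Returns a reference to a single process-wide info level. The static is
// initialised once, on first use, in a thread-safe way (C++11 and later),
// so no translation unit has to define a global for it.
std::atomic<std::uint32_t> &infoLevelRef() {
  static std::atomic<std::uint32_t> level{[] {
    // Assumption: seed from the environment; base 0 accepts "8" and "0x8".
    const char *env = std::getenv("LIBOMPTARGET_INFO");
    return static_cast<std::uint32_t>(env ? std::strtoul(env, nullptr, 0)
                                          : 0ul);
  }()};
  return level;
}
```

Every caller sees the same object, so `infoLevelRef().store(0x8)` in one place is visible through `infoLevelRef().load()` elsewhere in the same library image.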
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but will eventually be added into an auxiliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Alexey Bataev	ca70512099	[OPENMP]Mark test as unsupported to avoid possible unexpected passes, NFC.	2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Alexey Bataev	079884225a	[OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault. The implicitly generated mappings for allocation/deallocation in mappers runtime should be mapped as implicit, also no need to clear member_of flag to avoid ref counter increment. Also, the ref counter should not be incremented for the very first element that comes from the mapper function. Differential Revision: https://reviews.llvm.org/D100673	2021-04-21 10:38:31 -07:00
Peyton, Jonathan L	4457565757	[OpenMP] Implement GOMP task reductions Implement the remaining GOMP_* functions to support task reductions in taskgroup, parallel, loop, and taskloop constructs. The unused mem argument to many of the work-sharing constructs has to do with the scan() directive/ inscan() modifier. If mem is set, each function will call KMP_FATAL() and tell the user scan/inscan is unsupported. The GOMP reduction implementation is kept separate from our implementation because of how GOMP presents reduction data and computes the reductions. GOMP expects the privatized copies to be present even after a #pragma omp parallel reduction(task:...) region has ended so the data is stored inside GOMP's uintptr_t* data pseudo-structure. This style is tightly coupled with GCC compiler codegen. There also aren't any init(), combiner(), or fini() functions in GOMP's codegen, so the two implementations were too disparate to try to wrap GOMP's around our own. Differential Revision: https://reviews.llvm.org/D98806	2021-04-16 16:36:31 -05:00
Peyton, Jonathan L	5ebbb366c4	[OpenMP] Allow affinity to re-detect for child processes Current atfork() handler for child processes does not reset the affinity masks array which prevents users from setting their own affinity in child processes. Differential Revision: https://reviews.llvm.org/D99218	2021-04-16 16:34:02 -05:00
Hansang Bae	9b98497b44	[OpenMP] Add omp_target_is_accessible() to header files -- Added omp_target_is_accessible to the header files -- Added missing const qualifier to device memory routines Differential Revision: https://reviews.llvm.org/D100420	2021-04-16 07:54:15 -05:00
Joseph Huber	83d4b2e2e0	[OpenMP] Add info for device table changes Summary: This patch adds a feature to print information whenever the host-device pointer mapping table is changed by inserting or removing an entry. This introduces a new bit field for LIBOMPTARGET_INFO at position 0x8. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100600	2021-04-15 18:39:48 -04:00
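Since the entry above describes LIBOMPTARGET_INFO as a bit field, the new category is enabled by setting bit 0x8 and tested with a bitwise AND. The sketch below shows that check; the constant and function names are hypothetical, and only the value 0x8 comes from the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name for the bit described in the message: print when the
// host-device pointer mapping table gains or loses an entry.
constexpr std::uint32_t kInfoMappingTableChanged = 0x8;

// True when the given category bit is set in the configured info level.
bool infoEnabled(std::uint32_t level, std::uint32_t flag) {
  return (level & flag) != 0;
}
```

A level of 0x8 or 0xF enables the new output, while 0x7 leaves it off even though the three lower bits are set.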
Hansang Bae	77dc7b4653	[OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT Also fixed typo in the verbose message. Differential Revision: https://reviews.llvm.org/D100414	2021-04-14 07:55:26 -05:00
Hansang Bae	3da61ddae7	[OpenMP] Define omp_is_initial_device() variants in omp.h omp_is_initial_device() is marked as a built-in function in the current compiler, and user code guarded by this call may be optimized away, resulting in undesired behavior in some cases. This patch provides a possible fix for such cases by defining the routine as a variant function and removing it from builtin list. Differential Revision: https://reviews.llvm.org/D99447	2021-04-06 16:58:01 -05:00
Peyton, Jonathan L	2aebb7cb3c	[OpenMP] Fix incorrect KMP_STRLEN() macro The second argument to the strnlen_s(str, size) function should be sizeof(str) when str is a true array of characters with known size (instead of just a char*). Use type traits to determine if first parameter is a character array and use the correct size based on that trait. Differential Revision: https://reviews.llvm.org/D98209	2021-04-05 09:03:09 -05:00
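The idea behind the `KMP_STRLEN()` fix, choosing the bound from a type trait, can be sketched as below. This is an illustration only: `boundedLen`, `safeStrlen`, and the `fallbackMax` parameter are hypothetical names, and a hand-written bounded scan stands in for `strnlen_s`, which is an optional library function and not available everywhere.

```cpp
#include <cstddef>
#include <type_traits>

// Bounded length: scans at most maxLen characters.
constexpr std::size_t boundedLen(const char *s, std::size_t maxLen) {
  std::size_t n = 0;
  while (n < maxLen && s[n] != '\0')
    ++n;
  return n;
}

// If the argument is a true character array, bound the scan by the array
// size; otherwise (a plain pointer) fall back to a caller-supplied limit.
template <typename T>
constexpr std::size_t safeStrlen(const T &str, std::size_t fallbackMax) {
  return boundedLen(str, std::is_array<T>::value ? sizeof(T) : fallbackMax);
}

// An array without a terminator: the scan stops at the array bound.
constexpr char kUnterminated[4] = {'a', 'b', 'c', 'd'};
static_assert(safeStrlen(kUnterminated, 100) == 4, "bounded by array size");
static_assert(safeStrlen("abc", 100) == 3, "terminated literal");
```

A single template with a trait is used here rather than an array overload next to a pointer overload, because overload resolution would prefer a non-template `const char *` overload even for array arguments.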
Joseph Huber	0af4e74aef	[OpenMP][NFC] Fix typo in libomptarget error message Summary: There was a typo suggesting the user to use `LIBOMPTARGET_DEBUG` instead of `LIBOMPTARGET_INFO`	2021-04-01 12:45:28 -04:00
Joseph Huber	29338459fb	[OpenMP] Trim error messages in CUDA plugin Summary: Remove some of the error messages printed when the CUDA plugin fails. The current error messages can be confusing because they are the first error messages printed after the async stream finds an error. This means that the printed values aren't related to what caused the issue, but are simply the last asynchronous operation that succeeded on the device. Remove these as they can be misleading. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99510	2021-03-29 12:20:19 -04:00
Alexey Bataev	0411b23319	[OPENMP]Map data field with l-value reference types. Added initial support for the mapping of the data members with l-value reference types. Differential Revision: https://reviews.llvm.org/D98812	2021-03-29 07:07:09 -07:00
Joseph Huber	16064e71e9	[OpenMP] Reset async stream properly upon failure Summary: If the call to `synchronize` fails, it will currently block the stream indefinitely if execution is continued from this point. Additionally, if the program exits it will trigger an assertion on the non-null value of the async queue and prevent the runtime from printing debugging information. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D99443	2021-03-26 19:05:06 -04:00
Hansang Bae	467f39249d	[OpenMP] Misc. changes that add or remove pointer/bound checks -- Added or moved checks to appropriate places. -- Removed ineffective null check where the pointer is already being dereferenced around the code. -- Initialized variables that can be used without definitions. -- Added call to dlclose/FreeLibrary in OMPT tool activation. -- Added a new build compiler definition. Differential Revision: https://reviews.llvm.org/D98584	2021-03-23 18:55:08 -05:00
Shilei Tian	2df65f87c1	[OpenMP] Fixed a crash in hidden helper thread It is reported that after enabling hidden helper thread, the program can hit the assertion `new_gtid < __kmp_threads_capacity` sometimes. The root cause is explained as follows. Let's say the default `__kmp_threads_capacity` is `N`. If hidden helper thread is enabled, `__kmp_threads_capacity` will be offset to `N+8` by default. If the number of threads we need exceeds `N+8`, e.g. via `num_threads` clause, we need to expand `__kmp_threads`. In `__kmp_expand_threads`, the expansion starts from `__kmp_threads_capacity` and repeatedly doubles it until the new capacity meets the requirement. Let's assume the new requirement is `Y`. If `Y` happens to meet the constraint `(N+8)*2^X=Y` where `X` is the number of iterations, the new capacity is not enough because we have 8 slots for hidden helper threads. Here is an example. ``` #include <vector> int main(int argc, char *argv[]) { constexpr const size_t N = 1344; std::vector<int> data(N); #pragma omp parallel for for (unsigned i = 0; i < N; ++i) { data[i] = i; } #pragma omp parallel for num_threads(N) for (unsigned i = 0; i < N; ++i) { data[i] += i; } return 0; } ``` My CPU is 20C40T, then `__kmp_threads_capacity` is 160. After offset, `__kmp_threads_capacity` becomes 168. `1344 = (160+8)*2^3`, so the assertion is hit. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D98838	2021-03-18 18:25:36 -04:00
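The arithmetic in the commit message above can be checked with a simplified model of the doubling step. This sketch is not the code of `__kmp_expand_threads` and does not show the fix; `expandByDoubling` and the constant names are hypothetical, and the value 8 for the hidden helper slots is taken from the message.

```cpp
#include <cstddef>

// From the commit message: 8 slots are reserved for hidden helper threads.
constexpr std::size_t kHiddenHelperSlots = 8;

// Simplified model: start from the current capacity and double it until
// it is at least the required value.
constexpr std::size_t expandByDoubling(std::size_t capacity,
                                       std::size_t required) {
  while (capacity < required)
    capacity *= 2;
  return capacity;
}

// Numbers from the message: N = 160, offset capacity = 168, request = 1344.
constexpr std::size_t kNewCapacity =
    expandByDoubling(160 + kHiddenHelperSlots, 1344);

// 168 * 2^3 lands exactly on the request, so the loop stops there.
static_assert(kNewCapacity == 1344, "capacity equals the request");
// Eight of those slots are reserved, leaving fewer than 1344 regular slots.
static_assert(kNewCapacity - kHiddenHelperSlots < 1344, "too few slots");
```

With a request of 1345 the same loop would overshoot to 2688, which is why the problem only appears when the request is exactly `(N+8)*2^X`.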
Jon Chesterfield	626a31de15	[libomptarget] Add register usage info to kernel metadata Add register usage information to the runtime metadata so that it can be used during kernel launch (that change will be in a different commit). Add this information to the kernel trace. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D98829	2021-03-18 17:00:42 +00:00
Jon Chesterfield	dbf8f2b089	Revert "[libomptarget] Build amdgcn devicertl by default" This reverts commit `e23f3502d9`. It broke the build of openmp for clang built without amdgcn support. D98746, under review, would allow this to reland.	2021-03-17 11:34:44 +00:00
Hansang Bae	a6f9cb6adc	[OpenMP] Add runtime interface for OpenMP 5.1 error directive The proposed new interface is for supporting `at(execution)` clause in the error directive. Differential Revision: https://reviews.llvm.org/D98448	2021-03-16 08:55:25 -05:00
Johannes Doerfert	0a954a528b	[OpenMP][FIX] Repair accidental replacement of _shfl_sync with _shfl This was broken accidentally in D95752. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D98677	2021-03-15 22:46:00 -05:00
Jon Chesterfield	e23f3502d9	[libomptarget] Build amdgcn devicertl by default The cmake for this looks for an llvm install and does the right thing when building as part of enable_runtimes. It will probably do the right thing in other settings - at least, it won't try to build this with gcc. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98658	2021-03-15 23:17:50 +00:00
Peyton, Jonathan L	7085f04573	[OpenMP] Remove unused cpu_stackoffset member	2021-03-15 16:52:04 -05:00
Jon Chesterfield	bb38d7ff05	[libomptarget][nfc][amdgcn] Use precise triple for devicertl build	2021-03-15 20:24:13 +00:00
Jon Chesterfield	d0bc85f04a	[libomptarget][nfc] Drop unused DEVICE macro Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98655	2021-03-15 20:12:50 +00:00
Jon Chesterfield	7da76aaaf4	[libomptarget] Build amdgpu plugin by default This will build the amdgpu plugin if cmake is able to find the hsa runtime library, which will be the case if rocm is installed or if the hsa library has been installed somewhere cmake looks. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98654	2021-03-15 20:12:01 +00:00
Jon Chesterfield	bcb3f0f867	[libomptarget] Fix devicertl build The target specific functions in target_interface are extern C, but the implementations for nvptx were mostly C++ mangling. That worked out as a quirk of DEVICE macro expanding to nothing, except for shuffle.h which only forward declared the functions with C++ linkage. Also implements GetWarpSize, as used by shuffle, and includes target_interface in nvptx target_impl.cu to help catch future divergence between interface and implementation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98651	2021-03-15 19:50:22 +00:00
Jon Chesterfield	f675b3df48	[libomptarget] Drop assert.h, use freestanding for amdgcn devicertl Promotes the runtime assert to a link time error for the unimplemented fallback functions. Enables amdgcn to build with only clang provided headers, which makes it less likely to break other builds when enabled. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D98649	2021-03-15 18:50:09 +00:00
Jon Chesterfield	156842937f	[libomptarget][amdgcn] Drop use of inttypes.h, moving closer to freestanding The glibc headers are a periodic source of problems compiling the devicertl. This patch resolves the following error run into while building llvm on a slightly different linux system. ``` In file included from .../lib/clang/13.0.0/include/inttypes.h:21: In file included from /usr/include/inttypes.h:25: /usr/include/features.h:461:12: fatal error: 'sys/cdefs.h' file not found # include <sys/cdefs.h> ^~~~~~~~~~~~~ ``` As a second patch, removing assert.h from shuffle will let amdgcn build as -ffreestanding, at which point only the headers that clang itself provides are used and interactions with the host glibc are eliminated. Doing the same for nvptx is complicated by printf handling but also seems worthwhile. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D98565	2021-03-15 16:54:58 +00:00
George Rokos	2468fdd9af	[libomptarget] Add allocator support for target memory This patch adds the infrastructure for allocator support for target memory. Three allocators are introduced for device, host and shared memory. The corresponding API functions have the llvm_ prefix temporarily, until they become part of the OpenMP standard. Differential Revision: https://reviews.llvm.org/D97883	2021-03-13 03:47:07 -08:00
Johannes Doerfert	5449fbb5d4	[OpenMP][NFC] Use `AsyncInfo` as the variable name for a `__tgt_async_info` Reviewed By: grokos, tianshilei1992 Differential Revision: https://reviews.llvm.org/D96444	2021-03-11 23:31:34 -06:00
Johannes Doerfert	66ba494b49	[OpenMP][DeviceRTL] Extract shuffle idiom and port it to declare variant The shuffle idiom is implemented differently in our supported targets. To reduce the "target_impl" file we now move the shuffle idiom into its own self-contained header that provides the implementation for AMDGPU and NVPTX. A fallback can be added later on. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D95752	2021-03-11 23:31:30 -06:00
Joseph Huber	807466ef28	[OpenMP] Restore backwards compatibility for libomptarget Summary: The changes introduced in D87946 changed the API for libomptarget functions. `__kmpc_push_target_tripcount` was a function in Clang 11.x but was not given a backward-compatible interface. This change will require people using Clang 13.x or 12.x to recompile their offloading programs. Reviewed By: jdoerfert cchen Differential Revision: https://reviews.llvm.org/D98358	2021-03-11 09:52:11 -05:00
Leonard Chan	baf637dcde	Rename top-level LICENSE.txt files to LICENSE.TXT This makes all the license filenames uniform across subprojects. Differential Revision: https://reviews.llvm.org/D98380	2021-03-10 21:26:24 -08:00
AndreyChurbanov	aaf16b80dd	[OpenMP] libomp: eliminate pause from atomic CAS loops For clang this change is NFC cleanup, because clang never calls atomic functions from runtime library. Basically, pause is good in spin-loops waiting for something. Atomic CAS loops do not wait for anything, each CAS failure means some other thread progressed. Performance experiments show that the pause only causes unnecessary slowdown on CPUs with slow pause instruction, no difference on CPUs with fast pause instruction, removal of the pause gives lesser binary size which is good. Differential Revision: https://reviews.llvm.org/D97079	2021-03-09 18:30:08 +03:00
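The distinction drawn above, between spin-waiting and a CAS retry loop, can be illustrated with standard C++ atomics. This is a generic sketch, not libomp code; `atomicMultiply` and `checkAtomicMultiply` are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cassert>

// Atomically multiplies `target` by `factor` with a compare-and-swap loop.
// The loop contains no pause: a failed CAS means another thread has already
// changed the value, so the loop retries at once with the refreshed value.
inline unsigned atomicMultiply(std::atomic<unsigned> &target, unsigned factor) {
  unsigned expected = target.load(std::memory_order_relaxed);
  while (!target.compare_exchange_weak(expected, expected * factor)) {
    // On failure, `expected` has been updated to the current value; retry.
  }
  // On success, `expected` still holds the old value.
  return expected * factor;
}

// Small single-threaded check of the helper.
inline void checkAtomicMultiply() {
  std::atomic<unsigned> value{6};
  assert(atomicMultiply(value, 7) == 42);
  assert(value.load() == 42);
}
```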
AndreyChurbanov	e4492b6f31	[OpenMP] NFC: temporarily disable assertion until the bug with dependences is fixed	2021-03-08 22:18:30 +03:00
Shilei Tian	c41ae246ac	[OpenMP][Clang][NVPTX] Only build one bitcode library for each SM In D97003, CUDA 9.2 is the minimum requirement for OpenMP offloading on NVPTX target. We don't need to have macros in source code to select the right functions based on CUDA version. We don't need to compile multiple bitcode libraries of different CUDA versions for each SM. We don't need to worry about future compatibility with newer CUDA version. `-target-feature +ptx61` is used in this patch, which corresponds to the highest PTX version that CUDA 9.2 can support. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97198	2021-03-08 12:03:04 -05:00
Peyton, Jonathan L	e2738b3758	[OpenMP] Fix potential integer overflow in dynamic schedule code Restrict the chunk_size * chunk_num to only occur for valid chunk_nums and reimplement calculating the limit to avoid overflow. Differential Revision: https://reviews.llvm.org/D96747	2021-03-08 09:43:05 -06:00
tlwilmar	97d000cfc6	Added API for "masked" construct via two entrypoints: __kmpc_masked, and __kmpc_end_masked. The "master" construct is deprecated. Changed proc-bind keyword from "master" to "primary". Use of both master construct and master as proc-bind keyword is still allowed, but deprecated. Remove references to "master" in comments and strings, and replace with "primary" or "primary thread". Function names and variables were not touched, nor were references to deprecated master construct. These can be updated over time. No new code should refer to master.	2021-03-05 09:29:57 -06:00
Joel E. Denny	d0eb25a643	[OpenMP] Encapsulate more in checkDeviceAndCtors This patch just encapsulates some repeated code. To do so, it relocates some functions from interface.cpp to omptarget.cpp. It also adjusts them to the LLVM coding style. This patch is almost NFC except some `DP` messages are a bit different. For example, messages like "Entering target region" are now emitted even if offload is disabled, but a subsequent "Offload is disabled" is then emitted. Reviewed By: jdoerfert, grokos Differential Revision: https://reviews.llvm.org/D97908	2021-03-04 12:03:42 -05:00
Joel E. Denny	bfe5452b93	[OpenMP] Fix lone target exit data Without this patch, an `omp target exit data` before the runtime is initialized produces a runtime error. This patch fixes that by changing `__tgt_target_data_end_mapper` to call `CheckDeviceAndCtors` like many other runtime routines. Discussed at <https://lists.llvm.org/pipermail/openmp-dev/2021-March/003920.html>. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D97907	2021-03-04 12:03:42 -05:00
Joel E. Denny	10c18c69f2	[OpenMP] Fix support for device as host Without this patch, when the offload device is set to `omp_get_initial_device()`, the runtime fails with an error diagnostic when entering target regions or target data regions. However, OpenMP 5.1, sec. 2.14.5 "target Construct", "Restrictions", p. 203, L3-5 states: > The device clause expression must evaluate to a non-negative integer > value that is less than or equal to the value of > omp_get_num_devices(). Sec. 3.7.7 "omp_get_initial_device", p. 412, L2-3 states: > The value of the device number is the value returned by the > omp_get_num_devices routine. Similarly, OpenMP 5.0, sec. 2.12.5 "target Construct", "Restrictions", p. 174 L30-32 states: > The device clause expression must evaluate to a non-negative integer > value less than the value of omp_get_num_devices() or to the value > of omp_get_initial_device(). This patch fixes this behavior by changing the runtime to behave as if offloading is disabled whenever it finds the offload device (either from a `device` clause or the default device) is set to the host device. In the case of mandatory offloading when `omp_get_num_devices() == 0`, it incorporates the behavior proposed for OpenMP 5.2 in OpenMP spec github issue 2669. Reviewed By: grokos, RaviNarayanaswamy Differential Revision: https://reviews.llvm.org/D97616	2021-03-04 12:03:42 -05:00
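The OpenMP 5.1 wording quoted above implies a simple classification of device numbers, sketched below. The enum and function names are hypothetical, this is not libomptarget code, and it does not model the mandatory-offloading case mentioned at the end of the message.

```cpp
// Possible outcomes for a device number under the quoted OpenMP 5.1 rules.
enum class DeviceKind { Invalid, Offload, Host };

// Valid device numbers run from 0 to numDevices inclusive, and numDevices
// itself denotes the initial (host) device, per the two quoted passages.
constexpr DeviceKind classifyDevice(int deviceNum, int numDevices) {
  return (deviceNum < 0 || deviceNum > numDevices) ? DeviceKind::Invalid
         : deviceNum == numDevices                 ? DeviceKind::Host
                                                   : DeviceKind::Offload;
}

static_assert(classifyDevice(0, 2) == DeviceKind::Offload, "real device");
static_assert(classifyDevice(2, 2) == DeviceKind::Host, "initial device");
static_assert(classifyDevice(3, 2) == DeviceKind::Invalid, "out of range");
```

In this model a `Host` result corresponds to the case where the patch makes the runtime behave as if offloading were disabled.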
Hansang Bae	b6c2f538b2	[OpenMP] Add allocator support for target memory This is a preview of allocator support for target memory that depends on the offload runtime API which allocates memory as described below. llvm_omp_target_alloc_host(size_t size, int device_num); -- Returns non-migratable memory owned by host. -- Memory is accessible by host and device(s). llvm_omp_target_alloc_shared(size_t size, int device_num); -- Returns migratable memory owned by host and device. -- Memory is accessible by host and device. llvm_omp_target_alloc_device(size_t size, int device_num); -- Returns memory owned by device. -- Memory is only accessible by device. New memory space and predefined allocator names are -- llvm_omp_target_host_mem_space -- llvm_omp_target_shared_mem_space -- llvm_omp_target_device_mem_space -- llvm_omp_target_host_mem_alloc -- llvm_omp_target_shared_mem_alloc -- llvm_omp_target_device_mem_alloc Differential Revision: https://reviews.llvm.org/D96669	2021-03-02 16:45:12 -06:00
Alexey Bataev	0caf736d7e	[OPENMP50]Mapping of the subcomponents with the 'default' mappers. If the mapped structure has data members, which have 'default' mappers, need to map these members individually using their 'default' mappers. Differential Revision: https://reviews.llvm.org/D92195	2021-03-02 07:11:06 -08:00
Peyton, Jonathan L	e83380fccc	[OpenMP] Fix clang-cl build error regarding TSX intrinsics Fix for https://bugs.llvm.org/show_bug.cgi?id=49339 The CMake check for the RTM intrinsics needs the -mrtm flag to be set during the test. This way clang-cl correctly detects it has the _xbegin() intrinsic. Otherwise, the CMake check fails. Differential Revision: https://reviews.llvm.org/D97413	2021-03-02 07:47:42 -06:00
AndreyChurbanov	1df6e58e55	[OpenMP] libomp minor cleanup Cleanup changes: - check value read from file; - remove dead code; - make unsigned variable to read hexadecimal number to; - add debug assertion to check ref count. Differential Revision: https://reviews.llvm.org/D96893	2021-02-26 00:44:51 +03:00
AndreyChurbanov	4932101177	[OpenMP] libomp: fix ittnotify stack stitching for teams construct Stitching id could be overridden causing reference of destroyed object when number of teams is 1. The patch separates stitching id store location for teams and parallel nested in teams. Differential Revision: https://reviews.llvm.org/D96562	2021-02-26 00:23:24 +03:00
Peyton, Jonathan L	d12ae7db99	[OpenMP] Fix accidental addition of use omp_lib_kinds Fortran header accidentally had use omp_lib_kinds added inside a subroutine and function. This patch removes the lines.	2021-02-25 12:49:56 -06:00
Harmen Stoppels	a54f160b3a	Prefer /usr/bin/env xxx over /usr/bin/xxx where xxx = perl, python, awk Allow users to use a non-system version of perl, python and awk, which is useful in certain package managers. Reviewed By: JDevlieghere, MaskRay Differential Revision: https://reviews.llvm.org/D95119	2021-02-25 11:32:27 +01:00
Vyacheslav Zakharin	6baeeb9efa	[libomptarget] Fixed MSVC build fail caused by __attribute__((used)). Differential Revision: https://reviews.llvm.org/D97348	2021-02-24 09:59:39 -08:00
Joachim Protze	2fbce374c8	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro use missed in `e0f3acc5d3`	2021-02-24 18:46:56 +01:00
Shilei Tian	e5da63d5a9	[OpenMP] Fixed a crash when offloading to x86_64 with target nowait PR#49334 reports a crash when offloading to x86_64 with `target nowait`, which is caused by referencing a nullptr. The root cause of the issue is, when pushing a hidden helper task in `__kmp_push_task`, it also maps the gtid to its shadow gtid, which is wrong. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97329	2021-02-24 12:37:30 -05:00
Joachim Protze	f3a72509a7	[OpenMP][Tests][NFC] lit might also be known as llvm-lit.py	2021-02-24 18:32:24 +01:00
Manoel Roemmer	542d9c2154	[libomptarget] Load images in order of registration This makes sure that images are loaded in the order in which they are registered with libomptarget. If a target can load multiple images and these images depend on each other (for example if one image contains the program's target regions and one image contains library code), then the order in which images are loaded can be important for symbol resolution (for example, in the VE plugin). In this case: because the same code exists in the host binaries, the order in which the host linker loads them (which is also the order in which images are registered with libomptarget) is the order in which the images have to be loaded onto the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D95530	2021-02-24 18:15:41 +01:00
Joachim Protze	e0f3acc5d3	[OpenMP][Tests][NFC] rename macro to avoid naming clash Rename a macro and macro use missed in `35ab6d6390`	2021-02-24 18:13:28 +01:00
Joachim Protze	35ab6d6390	[OpenMP][Tests][NFC] rename macro to avoid naming clash When including <ostream>, the register_callback macro of the OMPT callback.h clashes with a function defined in ostream. This patch renames the macro and includes ompt into the macro name.	2021-02-24 18:03:54 +01:00
Shilei Tian	f6c2984a09	[OpenMP][NVPTX] Fixed a compilation error in deviceRTLs caused by unsupported feature in release version of LLVM `ptx71` is not supported in release version of LLVM yet. As a result, the support of CUDA 11.2 and CUDA 11.1 caused a compilation error as mentioned in D97004. Since the support in D97004 is just a workaround for release, and we'll not use it in the near future, using `ptx70` for CUDA 11 is feasible. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D97195	2021-02-23 13:20:21 -05:00


2123 Commits