llvm-project

Commit Graph

Author	SHA1	Message	Date
Joel E. Denny	1688b4cf8e	[OpenMP][AMDGPU] XFAIL test where kernels call printf	2021-08-31 22:11:28 -04:00
Joel E. Denny	ec1ebcd302	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in runtime (2/2) This patch implements OpenMP runtime support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The previous patch in this series, D106509, implements Clang support and documents the new functionality in detail. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D106510	2021-08-31 16:13:49 -04:00
Joel E. Denny	83ddfa0d22	[OpenMP][OpenACC] Implement `ompx_hold` map type modifier extension in Clang (1/2) This patch implements Clang support for an original OpenMP extension we have developed to support OpenACC: the `ompx_hold` map type modifier. The next patch in this series, D106510, implements OpenMP runtime support. Consider the following example: ``` #pragma omp target data map(ompx_hold, tofrom: x) // holds onto mapping of x { foo(); // might have map(delete: x) #pragma omp target map(present, alloc: x) // x is guaranteed to be present printf("%d\n", x); } ``` The `ompx_hold` map type modifier above specifies that the `target data` directive holds onto the mapping for `x` throughout the associated region regardless of any `target exit data` directives executed during the call to `foo`. Thus, the presence assertion for `x` at the enclosed `target` construct cannot fail. (As usual, the standard OpenMP reference count for `x` must also reach zero before the data is unmapped.) Justification for inclusion in Clang and LLVM's OpenMP runtime: * The `ompx_hold` modifier supports OpenACC functionality (structured reference count) that cannot be achieved in standard OpenMP, as of 5.1. * The runtime implementation for `ompx_hold` (next patch) will thus be used by Flang's OpenACC support. * The Clang implementation for `ompx_hold` (this patch) as well as the runtime implementation are required for the Clang OpenACC support being developed as part of the ECP Clacc project, which translates OpenACC to OpenMP at the directive AST level. These patches are the first step in upstreaming OpenACC functionality from Clacc. * The Clang implementation for `ompx_hold` is also used by the tests in the runtime implementation. That syntactic support makes the tests more readable than low-level runtime calls can. Moreover, upstream Flang and Clang do not yet support OpenACC syntax sufficiently for writing the tests. * More generally, the Clang implementation enables a clean separation of concerns between OpenACC and OpenMP development in LLVM. That is, LLVM's OpenMP developers can discuss, modify, and debug LLVM's extended OpenMP implementation and test suite without directly considering OpenACC's language and execution model, which can be handled by LLVM's OpenACC developers. * OpenMP users might find the `ompx_hold` modifier useful, as in the above example. See new documentation introduced by this patch in `openmp/docs` for more detail on the functionality of this extension and its relationship with OpenACC. For example, it explains how the runtime must support two reference counts, as specified by OpenACC. Clang recognizes `ompx_hold` unless `-fno-openmp-extensions`, a new command-line option introduced by this patch, is specified. Reviewed By: ABataev, jdoerfert, protze.joachim, grokos Differential Revision: https://reviews.llvm.org/D106509	2021-08-31 16:13:49 -04:00
Shilei Tian	8442967fe3	[OpenMP] Fix task wait doesn't work as expected in serialized team As discussed in D107121, task wait doesn't work when a regular task T depends on a detached task or a hidden helper task T' in a serialized team. The root cause is, since the team is serialized, the last task will not be tracked by `td_incomplete_child_tasks`. When T' is finished, it first releases its dependences, and then decrements its parent counter. So far so good. For the thread that is running task wait, if at the moment it is still spinning and trying to execute tasks, it is fine because it can detect the new task and execute it. However, if it happens to finish the function `flag.execute_tasks(...)`, it will be broken because `td_incomplete_child_tasks` is 0 now. In this patch, we update the rule to track children tasks a little bit. If the task team encounters a proxy task or a hidden helper task, all following tasks will be tracked. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D107496	2021-08-31 12:15:46 -04:00
Joachim Protze	5ea1c37118	[libomptarget][amdcgn] Only add opt/llvm-link dependency if TARGET is available In some build configurations, the target we depend on is not available for declaring the build dependency. We only need to declare the build dependency, if the build target is available in the same build. Fixes the issue raised in https://reviews.llvm.org/D107156#2969862 This patch should go into release/13 together with D108404 Differential Revision: https://reviews.llvm.org/D108868	2021-08-30 17:32:11 +02:00
Shilei Tian	e8fdacfd81	[OpenMP][NVPTX] Fixed missing variables for CUDA free compilation in NVPTX plugin `CU_EVENT_DEFAULT` is defined in CUDA header. It should be added to `openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.h` for CUDA free build. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108878	2021-08-28 18:08:10 -04:00
Shilei Tian	29df4ab3f3	[OpenMP][Offloading] Add support for event related interfaces This patch adds support for event related interfaces, which will be used later to fix a data race. See D104418 for more details. Reviewed By: jdoerfert, ye-luo Differential Revision: https://reviews.llvm.org/D108528	2021-08-28 16:24:14 -04:00
George Rokos	a2bd44089e	[libomptarget][NFC] Fixed tests which checked for obsolete string "getOrAllocTgtPtr"	2021-08-28 07:35:42 -07:00
Jon Chesterfield	78f92c3810	[openmp][amdgpu] Initial gfx10 offloading implementation Lets wavefront size be 32 for amdgpu openmp, as well as 64. Fixes up as little as possible to pass that through the libraries. This change is end to end, as opposed to updating clang/devicertl/plugin separately. It can be broken up for review/commit if preferred. Posting as-is so that others with a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are probably bugs remaining as well as the todo: for letting grid values vary more. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108708	2021-08-27 12:34:03 +01:00
George Rokos	3819aae6dd	[libomptarget][NFC] Replaced obsolete name "getOrAllocTgtPtr" with new "getTargetPointer" in debug messages.	2021-08-26 18:01:18 -07:00
Jon Chesterfield	3d85342982	[libomptarget][amdgpu][nfc] Rename variables, delete dead code	2021-08-26 19:58:38 +01:00
Jon Chesterfield	68ab93f4d7	[libomptarget][amdgpu][nfc] Rename source files	2021-08-26 18:29:44 +01:00
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind DP() macro Adjust the print output for gpu arch mismatch when the architectures match Convert an assert into graceful failure Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Peyton, Jonathan L	d39d3a327b	[OpenMP][test] fix omp_get_wtime.c test to be more accommodating The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel. Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst. Differential Revision: https://reviews.llvm.org/D108488	2021-08-23 08:13:42 -05:00
Vignesh Balasubramanian	589519b9ab	[OpenMP][OMPD]Code movement required for OMPD These changes don't come under the OMPD guard as they are a movement of existing code to capture parallel behavior correctly. "Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_end" should be placed at the correct execution point for the debugging tool to access proper handles/data. Without the below changes, in certain cases, the debugging tool will pick the wrong parallel and task handle. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100366	2021-08-20 14:36:22 +05:30
Shilei Tian	1d8d43ae61	[OpenMP] Use `__kmpc_give_task` in `__kmp_push_task` when encountering a hidden helper task This patch replaces the current implementation, overwrites `gtid` and `thread`, with `__kmpc_give_task`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106977	2021-08-19 20:49:29 -04:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00
Jennifer Yu	c274b19866	Add implicit map for a list item that appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently, map clauses for all captured variables are added implicitly, but they are missing for list items that are array element or array section expressions. The change is to add an implicit map clause for array elements used in a reduction clause, and to skip adding the map clause if the expression is not mappable. Note: for linear and lastprivate, since only a variable name is accepted, the map has already been added through captured variables. To do so: during the mappable checking, on error, suppress the diagnostic and skip adding the implicit map clause. The changes: 1> Add code to generate the implicit map in ActOnOpenMPExecutableDirective, for OpenMP 5.0 and up. 2> Add an extra default parameter NoDiagnose to ActOnOpenMPMapClause: use it to skip the error as well as skip adding the implicit map during the mappable checking. Note: there are only two places that need to check NoDiagnose. For the rest, either the check is for < OpenMP 5.0 or the error has already been generated for the reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	ad0f6e1d98	[openmp] Disable the tests that block CI for amdgpu and host offloading.	2021-08-19 20:43:30 +01:00
Jon Chesterfield	6c75ce1b8b	[libomptarget][nfc] Move lanemask_t type into target_impl.h	2021-08-19 18:50:03 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Jon Chesterfield	f420939b82	[libomptarget] Apply D106710 to amdgcn devicertl	2021-08-19 01:34:33 +01:00
Jon Chesterfield	c480792b6a	[libomptarget][nfc][devicertl] Delete unused enums	2021-08-19 00:14:34 +01:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependant typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	edb8acdc6e	[Libomptarget] Correctly default to Generic if exec_mode is not present Currently, the runtime returns an error when the `exec_mode` global is not present. The expected behaviour is that the region will default to Generic. This prevents global constructors from being called because they do not contain execution mode globals. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108255	2021-08-18 11:24:28 -04:00
Martin Storsjö	f5616a981c	[OpenMP] Fix the usage of sscanf on MinGW KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT so we need to pass the sscanf_s specific parameters within a similar condition. Differential Revision: https://reviews.llvm.org/D108196	2021-08-17 21:36:09 +03:00
Peyton, Jonathan L	b4a1f441d9	[OpenMP] Add a few small fixes * Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange parsing Differential Revision: https://reviews.llvm.org/D105929	2021-08-16 10:02:49 -05:00
Peyton, Jonathan L	6eeb4c1f32	[OpenMP] Fix incorrect parameters to sscanf_s call On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have additional size parameter. This patch adds the size parameter in the one place where %c is used. Differential Revision: https://reviews.llvm.org/D105931	2021-08-16 09:59:21 -05:00
AndreyChurbanov	52cac541d4	[OpenMP] libomp: cleanup: minor fixes to silence static analyzer. Added couple more checks to silence KlocWork static code analyzer. Differential Revision: https://reviews.llvm.org/D107348	2021-08-16 13:39:23 +03:00
AndreyChurbanov	f94da67f49	[OpenMP][NFC] libomp: reduced timeouts in the test from 50 to 2 sec.	2021-08-11 17:58:52 +03:00
George Rokos	df06ec3057	[libomptarget][NFC] Fix compilation issue with GCC Removed redundant assignment from condition which causes gcc to emit the following error: error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]	2021-08-10 09:43:43 -07:00
Joel E. Denny	2ced1f338a	[OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember targetDataEnd and targetDataBegin compute CopyMember/copy differently, and I don't see why they should. This patch eliminates one of those differences by making a simplifying NFC change to targetDataEnd. The change is NFC as follows. The change only affects the case when `!UNIFIED_SHARED_MEMORY || HasCloseModifier`. In that case, the following points are always true: * The value of CopyMember is relevant later only if DelEntry = false. * DelEntry = false only if one of the following is true: * IsLast = false. In this case, it's always true that CopyMember = false = IsLast. * `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember = IsLast. * Thus, if CopyMember is relevant, CopyMember = IsLast. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D105990	2021-08-10 12:29:55 -04:00
Pirama Arumuga Nainar	49fabd9d76	[openmp] Do not use shared memory on Android Android provides ashmem/ASharedMemory support on newer releases, which we can use if requested by openmp users on Android. Also refactor the preprocessor check for using shared memory to kmp_config.h.cmake. Differential Revision: https://reviews.llvm.org/D107181	2021-08-09 09:41:32 -07:00
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Ye Luo	262289c103	[OpenMP] mark target task untied OpenMP specification Tasking Terminology target task :A mergeable and untied task that ... Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107686	2021-08-07 12:31:20 -04:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Shilei Tian	680c71b127	[OpenMP] Clean up for hidden helper task This patch makes some clean up for code of hidden helper task. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107008	2021-08-04 12:36:44 -04:00
Shilei Tian	9f5d6ea52e	[OpenMP] Fix performance regression reported in bug #51235 This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107121	2021-08-04 12:34:49 -04:00
Lechen Yu	3bc8ce5dd7	[openmp] Add OMPT initialization in libomptarget When loading libomptarget, the init function in libomptarget/src/rtl.cpp will search for the libomptarget_start_tool function using libdl. libomptarget_start_tool will pass those OMPT callbacks related to target constructs to libomptarget Differential Revision: https://reviews.llvm.org/D99803	2021-08-04 18:00:11 +02:00
AndreyChurbanov	8e29b4b323	[OpenMP] libomp: taskwait depend implementation fixed. Fix for https://bugs.llvm.org/show_bug.cgi?id=49723. Eliminated references from the task dependency hash to a node allocated on the stack, thus eliminating accesses to stale memory. So the node is now never freed. Uncommented the assertion which triggered when stale memory was accessed. Removed the unneeded ref count increment for the stack allocated node. Differential Revision: https://reviews.llvm.org/D106705	2021-08-03 15:45:20 +03:00
Jon Chesterfield	567c8c7bfd	[libomptarget][nfc] Only set cuda-path for nvptx tests Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND from the invocation of non-nvptx test cases. Better signal to noise ratio on other architectures. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107074	2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz	5424ceeda0	[OpenMP] Fixing llvm-omp-device-info compilation with runtimes When using `-DLLVM_ENABLE_RUNTIMES` instead of `-DLLVM_ENABLE_PROJECTS` the `llvm-omp-device-info` tool is not compiled or installed. In general, no llvm tool would be built in a runtimes build, because the -DLLVM_BUILD_TOOLS flag is removed by the way the runtimes compilation calls cmake again. This patch is simple. Just forward the value of this flag to the runtime cmake command. I'm also removing an unnecessary comment in the compilation of the tool Differential Revision: https://reviews.llvm.org/D107177	2021-07-30 13:09:08 -05:00
Shilei Tian	36d53af4a9	[OpenMP][Offloading] Remove task wait in nowait interfaces All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void *depList, int32_t noAliasDepNum, void *noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them: these four arguments are not emitted at all. As a consequence, `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107164	2021-07-30 11:39:46 -04:00
AndreyChurbanov	8b81524c6d	[OpenMP][NFC] libomp: silence warnings on unused variables. Put declarations/definitions of unused variables under corresponding macros to silence clang build warnings. Differential Revision: https://reviews.llvm.org/D106608	2021-07-30 17:04:42 +03:00
Joachim Protze	4ffa1478fd	[libomptarget][amdcgn] Add build dependency for opt This patch should fix the build we observe when building LLVM from scratch. Differential Revision: https://reviews.llvm.org/D107156	2021-07-30 15:45:13 +02:00
Terry Wilmarth	d8e4cb9121	[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Patch fixed for ITTNotify disabled builds and non-x86 builds Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com> Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com> Differential Revision: https://reviews.llvm.org/D103121	2021-07-29 14:09:26 -05:00
Joachim Protze	4acc2f29a2	[OpenMP][Tools][Tests][NFC] Address flaky archer tests Adding more concurrent threads significantly increases the chance that the data race can be observed during testing.	2021-07-29 17:56:44 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Using the fix suggested by @zsrkmyn Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo` Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID, is not necessarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information. Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00
Joachim Protze	e32e1dae61	[OpenMP][Tests] Fix test compatibility gcc and clang disagree in how the event handle needs to be handled. According to OpenMP LC, gcc is right. Will open clang bug report	2021-07-28 00:08:32 +02:00
Joachim Protze	3c76e99291	[OpenMP] Fix deadlock for detachable task with child tasks This patch fixes https://bugs.llvm.org/show_bug.cgi?id=49066. For detachable tasks, the assumption breaks that the proxy task cannot have remaining child tasks when the proxy completes. Instead of incrementing/decrementing the incomplete task count, a high-order bit is flipped to mark and wait for the incomplete proxy task. Differential Revision: https://reviews.llvm.org/D101082	2021-07-28 00:01:35 +02:00
Vignesh Balasubramanian	23eced9ead	Convert the error to warning for enabling OMPD in non-Linux platform OMPD is enabled by default on Linux machines and disabled on others. However, if explicitly enabled, it throws an error and exits while configuring. It is mentioned in Bug: https://bugs.llvm.org/show_bug.cgi?id=51121 This patch, instead of throwing an error, disables OMPD support with a warning message, so configuration can continue. Reviewed By: @protze.joachim Differential Revision: https://reviews.llvm.org/D106682	2021-07-27 17:25:27 +05:30
Johannes Doerfert	67ab875ff5	[OpenMP] Prototype opt-in new GPU device RTL The "old" OpenMP GPU device runtime (D14254) has served us well for many years but modernizing it has caused some pain recently. This patch introduces an alternative which is mostly written from scratch embracing OpenMP 5.X, C++, LLVM coding style (where applicable), and conceptual interfaces. This new runtime is opt-in through a clang flag (D106793). The new runtime is currently only built for nvptx and has "-new" in its name. The design is tailored towards middle-end optimizations rather than front-end code generation choices, a trend we already started in the old runtime a while back. In contrast to the old one, state is organized in a simple manner rather than a "smart" one. While this can induce costs it helps optimizations. Our expectation is that the majority of codes can be optimized and a "simple" design is therefore preferable. The new runtime also avoids making users pay for things they do not use, especially with respect to memory. The unlikely case of nested parallelism is supported but costly, so that the more likely case uses fewer resources. The worksharing and reduction implementation have been taken from the old runtime and will be rewritten in the future if necessary. Documentation and debug features are still mostly missing and will be added over time. All external symbols start with `__kmpc` for legacy reasons but should be renamed once we switch over to a single runtime. All internal symbols are placed in appropriate namespaces (anonymous or `_OMP`) to avoid name clashes with user symbols. Differential Revision: https://reviews.llvm.org/D106803	2021-07-27 00:56:05 -05:00
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration: based on the current `deviceRTLs`, its return value can be 0, 1, 2, etc., rather than values such as 0, 129, 130 that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Joseph Huber	dead50d442	[OpenMP][NFC] Fix a few typos in OpenMP documentation Summary: Fixes some typos in the OpenMP documentation.	2021-07-26 16:03:47 -04:00
Jon Chesterfield	2a613a7790	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-26 09:54:51 +01:00
Jon Chesterfield	93fe84d32f	[libomptarget][nfc] Squash unused variable warning Suppress only current warning on openmp-clang-x86_64-linux-debian Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106777	2021-07-26 09:54:31 +01:00
Jon Chesterfield	dd0b463dd9	[libomptarget][amdgpu] More robust handling of failure to init HSA If hsa_init fails, subsequent calls into hsa are not safe. Except for hsa_init, but we don't retry on failure. This patch: - deletes a print that called into hsa to ask why it can't call into hsa - drops a merge conflict block next to that print - reliably initializes number of devices to zero - skips the plugin destructor contents if the constructor failed to init hsa Tested by making hsa_init return error, and by forcing the dynamic library use which was then deleted from disk. Before this patch, both segv. After it, friendly message about offloading being unavailable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106774	2021-07-25 23:15:58 +01:00
Jon Chesterfield	e3251f2ec4	Revert "[libomptarget] Build amdgpu plugin without hsa" Inaccurate error handling around hsa_init This reverts commit `e30b3b23a4`.	2021-07-25 21:03:51 +01:00
Jon Chesterfield	e30b3b23a4	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-25 19:33:36 +01:00
Joachim Protze	c46ccb8538	[OpenMP][tests][NFC] Update test status for gcc 11 and 12 gcc 11 introduced support for depend clause, but the gomp interface of libomp does not yet handle the information. Also remove -fopenmp-version=50, which is no longer needed for clang, but not supported by gcc.	2021-07-25 18:56:36 +02:00
Shilei Tian	f1b8fa55d0	[OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When the info cache is created, some attributes are removed. As a result, although we mark a few functions `noinline`, they are still inlined when the bitcode library is generated. This can cause an issue in middle end optimization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106710	2021-07-25 10:38:27 -04:00
Ye Luo	4079037a3e	[OpenMP] always compile with c++14 instead of gnu++14 Fixes PR 51174. c++14 should be a more portable option than gnu++14. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106632	2021-07-23 17:29:08 -04:00
Shilei Tian	c2c43132f6	[OpenMP] Fix bug 50022 Bug 50022 [0] reports that target nowait fails in a certain case, which is added in this patch. The root cause of the failure is that, when the second task is created, its parent's `td_incomplete_child_tasks` is not incremented because there is no parallel region here and thus its team is serialized. Therefore, when the initial thread is waiting for its unfinished child tasks, it thinks there is only one, the first task, which is tracked because it is a hidden helper task. The second task will only be pushed to the queue when the first task is finished. However, when the first task finishes, it first decrements the counter of its parent, and then releases its dependences. Once the counter is decremented, the thread will move on because its counter is reset, but actually, the second task has not been executed at all. As a result, since the main function finishes in this case, `libomp` starts to shut down. When the second task is then pushed somewhere, some or all of the structures might already have been destroyed, and anything could happen. This patch simply moves `__kmp_release_deps` ahead of the decrement of the counter. In this way, we can make sure that the initial thread is aware of the existence of another task(s) so it will not move on. In addition, in order to tackle a dependence chain starting with a hidden helper thread, when a hidden helper task is encountered, we force the task to release dependences. Reference: [0] https://bugs.llvm.org/show_bug.cgi?id=50022 Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106519	2021-07-23 16:54:11 -04:00
Joseph Huber	e1dedecaa6	[Libomptarget] Add unroll flag to shared variables loop Unrolling this loop provides better performance in practice because it is executed on the device and is likely to be very small. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106692	2021-07-23 16:45:27 -04:00
Shilei Tian	18ce3d3f2c	[OpenMP][Offloading] Fix data race in data mapping by using two locks This patch tries to partially fix one of the two data race issues reported in [1] by introducing a per-entry mutex. Additional discussion can also be found in D104418, which will also be refined to fix another data race problem. Here is how it works. Like before, `DataMapMtx` is still being used for mapping table lookup and update. In any case, we will get a table entry. If we need to make a data transfer (update the data on the device), we need to lock the entry right before releasing `DataMapMtx`, and the issue of data transfer should be after releasing `DataMapMtx`, and the entry is unlocked afterwards. This can guarantee that: 1) issue of data movement is not in critical region, which will not affect performance too much, and also will not affect other threads that don't touch the same entry; 2) if another thread accesses the same entry, the state of data movement is consistent (which requires that a thread must first get the update lock before getting data movement information). For a target that doesn't support async data transfer, issue of data movement is data transfer. This two-lock design can potentially improve concurrency compared with the design that guards data movement with `DataMapMtx` as well. For a target that supports async data movement, we could simply attach the event between the issue of data movement and unlock the entry. For a thread that wants to get the event, it must first get the lock. This can also get rid of the busy wait until the event pointer is valid. Reference: [1] https://bugs.llvm.org/show_bug.cgi?id=49940 Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D104555	2021-07-23 16:10:51 -04:00
Abhinav Gaba	f7c92995c0	[OpenMP] Fix CUDA plugin build after `3817ba13ae`. The build was broken on machines that don't have Cuda SDK installed. See https://reviews.llvm.org/D106627 for the original discussion.	2021-07-23 16:50:00 +08:00
Johannes Doerfert	d12ee28e2e	[OpenMP] Simplify the ThreadStackTy for globalization fallback With D106496 we can make the globalization fallback stack much simpler and this version doesn't seem to experience the spurious failures and deadlocks we have seen before. Differential Revision: https://reviews.llvm.org/D106576	2021-07-22 23:57:46 -05:00
Joseph Huber	76c0c0ca86	[OpenMP][NFC] Fix formatting in CUDA plugin	2021-07-22 21:50:40 -04:00
Joseph Huber	3817ba13ae	[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin This patch adds support for two environment variables to configure the device. ``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory that can be allocated using malloc / free on the device. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106627	2021-07-22 21:40:02 -04:00
Shilei Tian	ea452353c0	[OpenMP] Refined the logic to give a regular task from a hidden helper task In current implementation, if a regular task depends on a hidden helper task, and when the hidden helper task is releasing its dependences, it directly calls `__kmp_omp_task`. This could cause a problem that if `__kmp_push_task` returns `TASK_NOT_PUSHED`, the task will be executed immediately. However, the hidden helper threads are assumed to only execute hidden helper tasks. This could cause problems because when calling `__kmp_omp_task`, the encountering gtid, which is not the real one of the thread, is passed. This patch uses `__kmp_give_task`, but because it is a static function, a new wrapper `__kmpc_give_task` is added. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106572	2021-07-22 19:21:29 -04:00
Jose M Monsalve Diaz	68d6278a6e	[OpenMP] Renaming RT functions `GetNumberOfBlocksInKernel` and `GetNumberOfThreadsInBlock` These functions should follow the camel case convention. These are really easy to change and are needed for D106033. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D106390	2021-07-22 18:17:49 -04:00
Jon Chesterfield	9e05c084e5	[libomptarget][amdgpu][nfc] Normalise license headers Reviewed By: gregrodgers, jdoerfert Differential Revision: https://reviews.llvm.org/D106581	2021-07-22 20:23:41 +01:00
Jon Chesterfield	14e34a83b0	[libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h AMDGPU can assume Elf64 so doesn't need to abstract over Elf32 Drop a few other unused headers at the same time. Now only llvm elf and libelf are used by the plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106579	2021-07-22 20:04:13 +01:00
Jon Chesterfield	1a96570621	[libomptarget][amdgpu] Implement dlopen of libhsa AMDGPU plugin equivalent of D95155, build without HSA installed locally Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used. This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode. Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget. Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106559	2021-07-22 16:54:10 +01:00
Jon Chesterfield	6e9cd3e9f1	[libomptarget][nfc] Improve static assert message in dlwrap Revision of D102858. Raise dlwrap arity argument to template argument so the correct value is given in the error message. E.g. '2 == 1' instead of '2 == trait<>::nargs'. Arity higher than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: error: static_assert failed due to requirement '2 == trait<cudaError_enum (*)(unsigned int)>::nargs' "Arity Error" DLWRAP_INTERNAL(cuInit, 2); ^~~~~~~~~~~~~~~~~~~~~~~~~~ ... $/include/dlwrap.h:166:3: note: expanded from macro 'DLWRAP_COMMON' static_assert(ARITY == trait<decltype(&SYMBOL)>::nargs, "Arity Error"); \ ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '2UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<2UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 2); ``` Arity lower than it should be: Before diff ``` $/plugins/cuda/dynamic_cuda/cuda.cpp:131:10: error: no matching function for call to 'dlwrap_cuInit' return dlwrap_cuInit(X); ^~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: candidate function not viable: requires 0 arguments, but 1 was provided DLWRAP_INTERNAL(cuInit, 0); ``` After diff In file included from $/plugins/cuda/dynamic_cuda/cuda.cpp:16: ``` $/include/dlwrap.h:131:3: error: static_assert failed due to requirement '0UL == 1UL' "Arity Error" static_assert(Requested == Required, "Arity Error"); ^ ~~~~~~~~~~~~~~~~~~~~~ $/plugins/cuda/dynamic_cuda/cuda.cpp:23:1: note: in instantiation of function template specialization 'dlwrap::verboseAssert<0UL, 1UL>' requested here DLWRAP_INTERNAL(cuInit, 0); ``` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106543	2021-07-22 15:24:20 +01:00
Joseph Huber	a158d3663f	[OpenMP] Fix warnings for uninitialized block counts Summary: Fixes some warnings given for uninitialized block counts if the execution mode is not recognized. This shouldn't happen in practice because the execution mode is checked when it's read from the device.	2021-07-22 09:24:07 -04:00
Jon Chesterfield	dc1f6f8b92	[libomptarget][amdgpu][nfc] Drop dead signal pool setup This class is instantiated once in rtl.cpp before hsa_init is called. The hsa_signal_create call therefore fails leaving the pool empty. This signal pool is a legacy from ATMI where it was constructed after hsa_init. Moving the state into the rtl.cpp global class disabled the initial populating of the pool without noticeably changing performance. Just rechecked with a fix that allocates the signals after hsa_init and that also doesn't noticeably change performance. This patch therefore drops the initialisation. Only change from main is to drop a DEBUG_PRINT statement that would say the pool initial size is zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106515	2021-07-22 10:29:32 +01:00
Joseph Huber	4a66860424	[OpenMP] Add an option to disable function internalization Function internalization can sometimes occur in situations where we want to keep the call sites intact. This patch adds an option to disable function internalization and prevents the device runtime from being internalized while creating the bitcode library. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106438	2021-07-21 21:18:18 -04:00
Joseph Huber	1684012a47	[Libomptarget] Introduce new main thread ID runtime function This patch introduces `__kmpc_is_generic_main_thread_id` which splits the old comparison into its own runtime function. The purpose of this is so we can fold this part independently, so when both this and `is_spmd_mode` are folded the final function will be folded as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106437	2021-07-21 21:18:14 -04:00
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	5a682d9b91	[OpenMP] Expose libomptarget function to get HW thread id The patch exposes the libomptarget runtime function that gets the hardware thread id through the kmpc API. This is to be used in SPMDization for checking the thread id to execute regions by a single thread in a block. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106323	2021-07-21 10:26:04 -07:00
Jon Chesterfield	a733bbbd17	[libomptarget][amdgpu][nfc] Refactor #includes Create a hsa_api.h header that includes the ROCr headers in use Drop some unused headers and _cplusplus macros Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106455	2021-07-21 17:28:07 +01:00
Shilei Tian	55c65884a4	[OpenMP][deviceRTLs] Update return type of function __kmpc_parallel_level In `deviceRTLs`, the parallel level is stored in a shared variable of type `uint8_t`. `__kmpc_parallel_level` currently returns a 16-bit integer. This patch first changes the return type of the function to `uint8_t`, same as the shared variable, and then corrects the function type which was updated in D105955. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106384	2021-07-20 15:45:43 -04:00
Shilei Tian	02dff78983	[NFC][OpenMP] Fix an issue that no CHECK in test cases This fixes the complaint from FileCheck. Reviewed By: abhinavgaba, jdoerfert Differential Revision: https://reviews.llvm.org/D106387	2021-07-20 15:39:18 -04:00
Joseph Huber	b917a1d713	[OpenMP] Change AMDGCN to AMDGPU in the Cmake Module Summary: Change the name for targeting AMD offloading.	2021-07-20 12:52:53 -04:00

1932 Commits