llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	a5f4074d85	[libomptarget][amdgpu] Macro for accessing GPU variables from plugin Lets the amdgpu plugin write to omptarget_device_environment to enable debugging. Intend to use in the near future to record the wavesize that a given deviceRTL was compiled with for running on hardware that supports 32 or 64. Patch sets all the attributes that are useful. Notably .data means the variable is set by writing to host memory before copying to the GPU instead of launching a kernel to update the image. Can simplify the plugin slightly to drop the code for patching after load if this is used consistently. NFC on nvptx, cuda plugin seems to work fine without any annotations. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108698	2021-08-26 17:28:18 +01:00
Jon Chesterfield	ba0af885e7	[libomptarget][amdgpu][nfc] Make grid value access match devicertl	2021-08-25 15:11:19 +01:00
Jon Chesterfield	9b2c6c07b5	[libomptarget][amdgpu] Refactor debug printing Move most debug printing in rtl.cpp behind DP() macro Adjust the print output for gpu arch mismatch when the architectures match Convert an assert into graceful failure Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108562	2021-08-25 14:57:51 +01:00
Jon Chesterfield	ba8547775b	[libomptarget][amdgpu] Fix debug build from D104696	2021-08-25 01:27:51 +01:00
Michael Kruse	1275ee3041	[OpenMP][amdgcn] Don't use in-tree clang if not available. The use of `$<TARGET_FILE:clang>` was adapted too broadly from D101265. Fixes llvm.org/PR51579 Also see discussion in D108534. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D108640	2021-08-24 12:50:49 -05:00
Pushpinder Singh	9b8b7c1180	[AMDGPU][Libomptarget] Delete g_atl_machine global With uses of g_atl_machine gone, a significant portion of dead code has been removed. This patch depends on D104691 and D104695. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104696	2021-08-24 07:59:40 +00:00
Jon Chesterfield	d26000e4cc	[openmp][devicertl] Freestanding nvptx via stub printf Compiled nvptx devicertl as freestanding, breaking the dependency on host glibc and gcc-multilibs. Thus build it by default. Comes at the cost of #defining out printf. Tried mapping it onto __builtin_printf but that gets transformed back to printf instead of hitting the cuda/openmp lowering transform. Printf could be preserved by one of: - dropping all the standard headers and ffreestanding - providing a header only printf implementation - changing the compiler handling of printf Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D108349	2021-08-23 23:07:47 +01:00
Jon Chesterfield	842f875c8b	[openmp] Use llvm GridValues from devicertl Add include path to the cmakefiles and set the target_impl enums from the llvm constants instead of copying the values. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108391	2021-08-23 20:25:24 +01:00
Peyton, Jonathan L	d39d3a327b	[OpenMP][test] fix omp_get_wtime.c test to be more accommodating The omp_get_wtime.c test fails intermittently if the recorded times are off by too much which can happen when many tests are run in parallel. Instead of failing if one timing is a little off, take average of 100 timings minus the 10 worst. Differential Revision: https://reviews.llvm.org/D108488	2021-08-23 08:13:42 -05:00
Vignesh Balasubramanian	589519b9ab	[OpenMP][OMPD]Code movement required for OMPD These changes don't come under OMPD guard as it is a movement of existing code to capture parallel behavior correctly. "Runtime Entry Points for OMPD" like "ompd_bp_parallel_begin" and "ompd_bp_parallel_begin" should be placed at the correct execution point for the debugging tool to access proper handles/data. Without the below changes, in certain cases, debugging tool will pick the wrong parallel and task handle. Reviewed By: @hbae Differential Revision: https://reviews.llvm.org/D100366	2021-08-20 14:36:22 +05:30
Shilei Tian	1d8d43ae61	[OpenMP] Use `__kmpc_give_task` in `__kmp_push_task` when encountering a hidden helper task This patch replaces the current implementation, overwrites `gtid` and `thread`, with `__kmpc_give_task`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D106977	2021-08-19 20:49:29 -04:00
Joachim Protze	4bb36df144	[libomptarget][amdcgn] Add build dependency for llvm-link and opt D107156 and D107320 are not sufficient when OpenMP is built as llvm runtime (LLVM_ENABLE_RUNTIMES=openmp) because dependencies only work within the same cmake instance. We could limit the dependency to cases where libomptarget/plugins are really built. But compared to the whole llvm project, building openmp runtime is negligible and postponing the build of OpenMP runtime after the dependencies are ready seems reasonable. The direct dependency introduced in D107156 and D107320 is necessary for the case where OpenMP is built as llvm project (LLVM_ENABLE_PROJECTS=openmp). Differential Revision: https://reviews.llvm.org/D108404	2021-08-20 01:57:58 +02:00
Jennifer Yu	c274b19866	Add implicit map for a list item appears in a reduction clause. A new rule is added in 5.0: If a list item appears in a reduction, lastprivate or linear clause on a combined target construct then it is treated as if it also appears in a map clause with a map-type of tofrom. Currently map clauses for all capture variables are added implicitly. But missing for list item of expression for array elements or array sections. The change is to add implicit map clause for array of elements used in reduction clause. Skip adding map clause if the expression is not mappable. Noted: For linear and lastprivate, since only variable name is accepted, the map has been added though capture variables. To do so: During the mappable checking, if error, ignore diagnose and skip adding implicit map clause. The changes: 1> Add code to generate implicit map in ActOnOpenMPExecutableDirective, for omp 5.0 and up. 2> Add extra default parameter NoDiagnose in ActOnOpenMPMapClause: Use that to skip error as well as skip adding implicit map during the mappable checking. Note: there are only tow places need to be check for NoDiagnose. Rest of them either the check is for < omp 5.0 or the error already generated for reduction clause. Differential Revision: https://reviews.llvm.org/D108132	2021-08-19 12:53:47 -07:00
Jon Chesterfield	ad0f6e1d98	[openmp] Disable the tests that block CI for amdgpu and host offloading.	2021-08-19 20:43:30 +01:00
Jon Chesterfield	6c75ce1b8b	[libomptarget][nfc] Move lanemask_t type into target_impl.h	2021-08-19 18:50:03 +01:00
Jon Chesterfield	77579b99e9	[openmp][nfc] Replace OMPGridValues array with struct [nfc] Replaces enum indices into an array with a struct. Named the fields to match the enum, leaves memory layout and initialization unchanged. Motivation is to later safely remove dead fields and replace redundant ones with (compile time) computation. It should also be possible to factor some common fields into a base and introduce a gfx10 amdgpu instance with less duplication than the arrays of integers require. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D108339	2021-08-19 13:25:42 +01:00
Jon Chesterfield	f420939b82	[libomptarget] Apply D106710 to amdgcn devicertl	2021-08-19 01:34:33 +01:00
Jon Chesterfield	c480792b6a	[libomptarget][nfc][devicertl] Delete unused enums	2021-08-19 00:14:34 +01:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependant typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	edb8acdc6e	[Libomptarget] Correctly default to Generic if exec_mode is not present Currently, the runtime returns an error when the `exec_mode` global is not present. The expected behvaiour is that the region will default to Generic. This prevents global constructors from being called because they do not contain execution mode globals. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108255	2021-08-18 11:24:28 -04:00
Martin Storsjö	f5616a981c	[OpenMP] Fix the usage of sscanf on MinGW KMP_SSCANF only evaluates to sscanf_s within #if KMP_OS_WINDOWS && KMP_MSVC_COMPAT so we need to pass the sscanf_s specific parameters within a similar condition. Differential Revision: https://reviews.llvm.org/D108196	2021-08-17 21:36:09 +03:00
Peyton, Jonathan L	b4a1f441d9	[OpenMP] Add a few small fixes * Add comment to help ensure new construct data are added in two places * Check for division by zero in the loop worksharing code * Check for syntax errors in parrange parsing Differential Revision: https://reviews.llvm.org/D105929	2021-08-16 10:02:49 -05:00
Peyton, Jonathan L	6eeb4c1f32	[OpenMP] Fix incorrect parameters to sscanf_s call On Windows, the documentation states that when using sscanf_s, each %c and %s specifier must also have additional size parameter. This patch adds the size parameter in the one place where %c is used. Differential Revision: https://reviews.llvm.org/D105931	2021-08-16 09:59:21 -05:00
AndreyChurbanov	52cac541d4	[OpenMP] libomp: cleanup: minor fixes to silence static analyzer. Added couple more checks to silence KlocWork static code analyzer. Differential Revision: https://reviews.llvm.org/D107348	2021-08-16 13:39:23 +03:00
AndreyChurbanov	f94da67f49	[OpenMP][NFC] libomp: reduced timeouts in the test from 50 to 2 sec.	2021-08-11 17:58:52 +03:00
George Rokos	df06ec3057	[libomptarget][NFC] Fix compilation issue with GCC Removed redundant assignment from condition which causes gcc to emit the following error: error: operation on ‘MoveData’ may be undefined [-Werror=sequence-point]	2021-08-10 09:43:43 -07:00
Joel E. Denny	2ced1f338a	[OpenMP][NFC] Simplify targetDataEnd conditions for CopyMember targetDataEnd and targetDataBegin compute CopyMember/copy differently, and I don't see why they should. This patch eliminates one of those differences by making a simplifying NFC change to targetDataEnd. The change is NFC as follows. The change only affects the case when `!UNIFIED_SHARED_MEMORY \|\| HasCloseModifier`. In that case, the following points are always true: * The value of CopyMember is relevant later only if DelEntry = false. * DelEntry = false only if one of the following is true: * IsLast = false. In this case, it's always true that CopyMember = false = IsLast. * `MEMBER_OF && !PTR_AND_OBJ` is true. In this case, CopyMember = IsLast. * Thus, if CopyMember is relevant, CopyMember = IsLast. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D105990	2021-08-10 12:29:55 -04:00
Pirama Arumuga Nainar	49fabd9d76	[openmp] Do not use shared memory on Android Android provides ashmem/ASharedMemory support on newer releases, which we can use if requested by openmp users on Android. Also refactor the preprocessor check for using shared memory to kmp_config.h.cmake. Differential Revision: https://reviews.llvm.org/D107181	2021-08-09 09:41:32 -07:00
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Ye Luo	262289c103	[OpenMP] mark target task untied OpenMP specification Tasking Terminology target task :A mergeable and untied task that ... Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107686	2021-08-07 12:31:20 -04:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Shilei Tian	680c71b127	[OpenMP] Clean up for hidden helper task This patch makes some clean up for code of hidden helper task. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107008	2021-08-04 12:36:44 -04:00
Shilei Tian	9f5d6ea52e	[OpenMP] Fix performance regression reported in bug #51235 This patch fixes the "performance regression" reported in https://bugs.llvm.org/show_bug.cgi?id=51235. In fact it has nothing to do with performance. The root cause is, the stolen task is not allowed to execute by another thread because by default it is tied task. Since hidden helper task will always be executed by hidden helper threads, it should be untied. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D107121	2021-08-04 12:34:49 -04:00
Lechen Yu	3bc8ce5dd7	[openmp] Add OMPT initialization in libomptarget When loading libomptarget, the init function in libomptarget/src/rtl.cpp will search for the libomptarget_start_tool function using libdl. libomptarget_start_tool will pass those OMPT callbacks related to target constructs to libomptarget Differential Revision: https://reviews.llvm.org/D99803	2021-08-04 18:00:11 +02:00
AndreyChurbanov	8e29b4b323	[OpenMP] libomp: taskwait depend implementation fixed. Fix for https://bugs.llvm.org/show_bug.cgi?id=49723. Eliminated references from task dependency hash to node allocated on stack, thus eliminated accesses to stale memory. So the node now never freed. Uncommented assertion which triggered when stale memory accessed. Removed unneeded ref count increment for stack allocated node. Differential Revision: https://reviews.llvm.org/D106705	2021-08-03 15:45:20 +03:00
Jon Chesterfield	567c8c7bfd	[libomptarget][nfc] Only set cuda-path for nvptx tests Remove --cuda-path=CUDA_TOOLKIT_ROOT_DIR-NOTFOUND from the invocation of non-nvptx test cases. Better signal to noise ratio on other architectures. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D107074	2021-07-30 23:01:09 +01:00
Jose M Monsalve Diaz	5424ceeda0	[OpenMP] Fixing llvm-omp-device-info compilation with runtimes When using `-DLLVM_ENABLED_RUNTIMES` instead of `-DLLVM_ENABLED_PROJECTS` the `llvm-omp-device-info` tool is not compiled or installed. In general, no llvm tool would be build on runtimes, because the -DLLVM_BUILD_TOOLS flag is removed by the way runtimes compilation calls cmake again. This patch is simple. Just forward the value of this flag to the runtime cmake command. I'm also removing an unnecessary comment in the compilation of the tool Differential Revision: https://reviews.llvm.org/D107177	2021-07-30 13:09:08 -05:00
Shilei Tian	36d53af4a9	[OpenMP][Offloading] Remove task wait in nowait interfaces All `nowait` series of interfaces in `libomptarget` accept four more arguments (`int32_t depNum, void depList, int32_t noAliasDepNum, void noAliasDepList`) compared with their counterparts w/o `nowait`. These extra arguments were expected for dependence resolution, potentially lowered to device side. Current implementation calls `libomp` function `__kmpc_omp_taskwait`. However, the front end simply ignores them, that these four arguments are not emitted at all. As a consequence, the `depNum` and `noAliasDepNum` are garbage, which could lead to unnecessary task wait. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D107164	2021-07-30 11:39:46 -04:00
AndreyChurbanov	8b81524c6d	[OpenMP][NFC] libomp: silence warnings on unused variables. Put declarations/definitions of unused variables under corresponding macros to silence clang build warnings. Differential Revision: https://reviews.llvm.org/D106608	2021-07-30 17:04:42 +03:00
Joachim Protze	4ffa1478fd	[libomptarget][amdcgn] Add build dependency for opt This patch should fix the build we observe when building LLVM from scratch. Differential Revision: https://reviews.llvm.org/D107156	2021-07-30 15:45:13 +02:00
Terry Wilmarth	d8e4cb9121	[OpenMP] libomp: Add new experimental barrier: two-level distributed barrier Two-level distributed barrier is a new experimental barrier designed for Intel hardware that has better performance in some cases than the default hyper barrier. This barrier is designed to handle fine granularity parallelism where barriers are used frequently with little compute and memory access between barriers. There is no need to use it for codes with few barriers and large granularity compute, or memory intensive applications, as little difference will be seen between this barrier and the default hyper barrier. This barrier is designed to work optimally with a fixed number of threads, and has a significant setup time, so should NOT be used in situations where the number of threads in a team is varied frequently. The two-level distributed barrier is off by default -- hyper barrier is used by default. To use this barrier, you must set all barrier patterns to use this type, because it will not work with other barrier patterns. Thus, to turn it on, the following settings are required: KMP_FORKJOIN_BARRIER_PATTERN=dist,dist KMP_PLAIN_BARRIER_PATTERN=dist,dist KMP_REDUCTION_BARRIER_PATTERN=dist,dist Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER, and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed barrier. Patch fixed for ITTNotify disabled builds and non-x86 builds Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com> Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com> Differential Revision: https://reviews.llvm.org/D103121	2021-07-29 14:09:26 -05:00
Joachim Protze	4acc2f29a2	[OpenMP][Tools][Tests][NFC] Address flaky archer tests Adding more concurrent threads significantly increases the chance that the data race can be observed during testing.	2021-07-29 17:56:44 +02:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Using @zsrkmyn sugested fix Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo` Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID is not necesarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information. Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	ed7ec860f0	[OpenMP] Improve alignment handling in the new device runtime	2021-07-27 17:50:27 -05:00
Joseph Huber	e3ee76245e	[Libomptarget] Revert new variable sharing to use the old method The new method of sharing variables introduces a `__kmpc_alloc_shared` call that cannot be removed in the middle end because of its non-constant argument and unconnected free. This patch reverts this to the old method that used a static amount of shared memory for sharing variables. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106905	2021-07-27 18:14:01 -04:00

1 2 3 4 5 ...

1870 Commits