llvm-project

Commit Graph

Author	SHA1	Message	Date
Dimitry Andric	400cd6d2f0	[libomptarget][amdgpu] use --allow-shlib-undefined to link on FreeBSD On FreeBSD, the `environ` symbol is undefined at link time for shared libraries, but resolved by the dynamic linker at runtime. Therefore, allow the symbol to be undefined when creating a shared library, by using the `--allow-shlib-undefined` linker flag, instead of `-z defs` (a.k.a `--no-undefined`). Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107698	2021-08-08 13:52:44 +02:00
Dimitry Andric	71ae2e0221	[libomptarget][amdgpu] don't declare Elf_Note on FreeBSD On FreeBSD, the system `<libelf.h>` already declares `struct Elf_Note` indirectly (via `<sys/elf_common.h>`). This results in compile errors when building the libomptarget amdgpu plugin. Avoid redeclaring `struct Elf_Note` on FreeBSD to fix the errors. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D107661	2021-08-06 21:45:26 +02:00
Shilei Tian	28939b6ae5	[NFC] Clean up and clang-format openmp/libomptarget/plugins/cuda/src/rtl.cpp	2021-08-05 22:32:28 -04:00
Jon Chesterfield	a90da62adb	[libomptarget][amdgpu] Update printed plugin name	2021-07-29 14:46:42 +01:00
Jose M Monsalve Diaz	88e66fa60a	[OpenMP] Fixing missing variables when CUDA SDK not in system This patch fixes the error reported in D106751. When there is no CUDA SDK installed in the system, the build fails due to missing `CU_DEVICE_ATTRIBUTE` variables. Using @zsrkmyn sugested fix Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106933	2021-07-27 23:46:15 -05:00
Jose M Monsalve Diaz	313c523995	[OpenMP][Tool] Introducing the `llvm-omp-device-info` tool This patch introduces the `llvm-omp-device-info` tool, which uses the omptarget library and interface to query the device info from all the available devices as seen by OpenMP. This is inspired by PGI's `pgaccelinfo` Since omptarget usually requires a description structure with executable kernels, I split the initialization of the RTLs and Devices to be able to initialize all possible devices and query each of them. This revision relies on the patch that introduces the print device info. A limitation is that the order in which the devices are initialized, and the corresponding device ID is not necesarily the one seen by OpenMP. The changes are as follows: 1. Separate the RTL initialization that was performed in `RegisterLib` to its own `initRTLonce` function 2. Create an `initAllRTLs` method that initializes all available RTLs at runtime 3. Created the `llvm-deviceinfo.cpp` tool that uses `omptarget` to query each device and prints its information. Example Output: ``` Device (0): print_device_info not implemented Device (1): print_device_info not implemented Device (2): print_device_info not implemented Device (3): print_device_info not implemented Device (4): CUDA Driver Version: 11000 CUDA Device Number: 0 Device Name: Quadro P1000 Global Memory Size: 4236312576 bytes Number of Multiprocessors: 5 Concurrent Copy and Execution: Yes Total Constant Memory: 65536 bytes Max Shared Memory per Block: 49152 bytes Registers per Block: 65536 Warp Size: 32 Threads Maximum Threads per Block: 1024 Maximum Block Dimensions: 1024, 1024, 64 Maximum Grid Dimensions: 2147483647 x 65535 x 65535 Maximum Memory Pitch: 2147483647 bytes Texture Alignment: 512 bytes Clock Rate: 1480500 kHz Execution Timeout: Yes Integrated Device: No Can Map Host Memory: Yes Compute Mode: DEFAULT Concurrent Kernels: Yes ECC Enabled: No Memory Clock Rate: 2505000 kHz Memory Bus Width: 128 bits L2 Cache Size: 1048576 bytes Max Threads Per SMP: 2048 Async Engines: Yes (2) Unified Addressing: Yes Managed Memory: Yes Concurrent Managed Memory: Yes Preemption Supported: Yes Cooperative Launch: Yes Multi-Device Boars: No Compute Capabilities: 61 ``` Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D106752	2021-07-27 22:38:35 -04:00
Jose M Monsalve Diaz	d2f85d0910	[OpenMP][Libomptarget] Adding `print_device_info` to RTL and `omptarget` This patch introduces a function in the device's plugin to print the device information. This patch relates to another patch that introduces a CLI tool to obtain the device information from the omplibrary directly. It is inspired by PGI's pgaccelinfo. The modifications are as follows: 1. Introduce the optional `void __tgt_rtl_print_device_info(RTLdevID)` function into the RTL. 2. Introduce the `bool __tgt_print_device_info(devID)` function into `omptarget` interface. Returns false if the RTL is not implemented 3. Added `bool printDeviceInfo(RTLDevID)` to the `DeviceTy` 4. Implement the `__tgt_rtl_print_device_info` for CUDA. Added additional CUDA Runtime calls. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106751	2021-07-27 21:47:57 -04:00
Jon Chesterfield	2a613a7790	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-26 09:54:51 +01:00
Jon Chesterfield	dd0b463dd9	[libomptarget][amdgpu] More robust handling of failure to init HSA If hsa_init fails, subsequent calls into hsa are not safe. Except for hsa_init, but we don't retry on failure. This patch: - deletes a print that called into hsa to ask why it can't call into hsa - drops a merge conflict block next to that print - reliably initializes number of devices to zero - skips the plugin destructor contents if the constructor failed to init hsa Tested by making hsa_init return error, and by forcing the dynamic library use which was then deleted from disk. Before this patch, both segv. After it, friendly message about offloading being unavailable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106774	2021-07-25 23:15:58 +01:00
Jon Chesterfield	e3251f2ec4	Revert "[libomptarget] Build amdgpu plugin without hsa" Inaccurate error handling around hsa_init This reverts commit `e30b3b23a4`.	2021-07-25 21:03:51 +01:00
Jon Chesterfield	e30b3b23a4	[libomptarget] Build amdgpu plugin without hsa Default to building the amdgpu plugin to use dlopen when hsa is not found instead of disabling it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106600	2021-07-25 19:33:36 +01:00
Abhinav Gaba	f7c92995c0	[OpenMP] Fix CUDA plugin build after `3817ba13ae`. The build was broken on machines that don't have Cuda SDK installed. See https://reviews.llvm.org/D106627 for the original discussion.	2021-07-23 16:50:00 +08:00
Joseph Huber	76c0c0ca86	[OpenMP][NFC] Fix formatting in CUDA plugin	2021-07-22 21:50:40 -04:00
Joseph Huber	3817ba13ae	[OpenMP] Add environment variables to change stack / heap size in the CUDA plugin This patch adds support for two environment variables to configure the device. ``LIBOMPTARGET_STACK_SIZE`` sets the amount of memory in bytes that each thread has for its stack. ``LIBOMPTARGET_HEAP_SIZE`` sets the amount of heap memory that can be allocated using malloc / free on the device. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106627	2021-07-22 21:40:02 -04:00
Jon Chesterfield	9e05c084e5	[libomptarget][amdgpu][nfc] Normalise license headers Reviewed By: gregrodgers, jdoerfert Differential Revision: https://reviews.llvm.org/D106581	2021-07-22 20:23:41 +01:00
Jon Chesterfield	14e34a83b0	[libomptarget][amdgpu][nfc] Replace use of gelf.h with libelf.h AMDGPU can assume Elf64 so doesn't need to abstract over Elf32 Drop a few other unused headers at the same time. Now only llvm elf and libelf are used by the plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106579	2021-07-22 20:04:13 +01:00
Jon Chesterfield	1a96570621	[libomptarget][amdgpu] Implement dlopen of libhsa AMDGPU plugin equivalent of D95155, build without HSA installed locally Compiles a new file, plugins/amdgpu/dynamic_hsa/hsa.cpp, to an object file that exposes the same symbols that the plugin presently uses from hsa. The object file contains dlopen of hsa and cached dlsym calls. Also provides header files corresponding to the subset that is used. This is behind a feature flag, LIBOMPTARGET_FORCE_DLOPEN_LIBHSA, default off. That allows developers to build against the dlopen/dlsym implementation, e.g. while testing this mode. Enabling by default will cause this plugin to build on a wider variety of machines than it does at present so may break some CI builds. That risk can be minimised by reviewing the header dependencies of the library and ensuring it doesn't use any libraries that are not already used by libomptarget. Separating the implementation from enabling by default in case the latter needs to be rolled back after wider CI results. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106559	2021-07-22 16:54:10 +01:00
Joseph Huber	a158d3663f	[OpenMP] Fix warnings for uninitialized block counts Summary: Fixes some warning given for uninitialized block counts if the exection mode is not recognized. This shouldn't happen in practice because the execution mode is checked when it's read from the device.	2021-07-22 09:24:07 -04:00
Jon Chesterfield	dc1f6f8b92	[libomptarget][amdgpu][nfc] Drop dead signal pool setup This class is instantiated once in rtl.cpp before hsa_init is called. The hsa_signal_create call therefore fails leaving the pool empty. This signal pool is a legacy from ATMI where it was constructed after hsa_init. Moving the state into the rtl.cpp global class disabled the initial populating of the pool without noticeably changing performance. Just rechecked with a fix that allocates the signals after hsa_init and that also doesn't noticeably change performance. This patch therefore drops the initialisation. Only change from main is to drop a DEBUG_PRINT statement that would say the pool initial size is zero. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106515	2021-07-22 10:29:32 +01:00
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Jon Chesterfield	a733bbbd17	[libomptarget][amdgpu][nfc] Refactor #includes Create a hsa_api.h header that includes the ROCr headers in use Drop some unused headers and _cplusplus macros Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106455	2021-07-21 17:28:07 +01:00
Jon Chesterfield	ddfb074a80	[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global Folds some duplicates logic into a helper function, passes the new environment struct into getLaunchVals which no longer reads the DeviceInfo global. Implemented on top of D105237 Reviewed By: dhruvachak Differential Revision: https://reviews.llvm.org/D105239	2021-07-06 17:06:38 +01:00
Atmn Patel	21e92612c0	[Libomptarget] Experimental Remote Plugin Fixes D97883 introduced a compile-time error in the experimental remote offloading libomptarget plugin, this patch fixes it and resolves a number of inconsistencies in the plugin as well: 1. Non-functional Asynchronous API 2. Unnecessarily verbose debug printing 3. Misc. code clean ups This is not intended to make any functional changes to the plugin. Differential Revision: https://reviews.llvm.org/D105325	2021-07-02 12:38:34 -04:00
Jon Chesterfield	db89414da4	[libomptarget][nfc] Move grid size computation Change getLaunchVals to return the integers used for launch Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D105237	2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti	98c36f0079	Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size" This reverts commit `2240b41ee4`. A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be set to the default. The above change was made without this assumption. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105250	2021-06-30 17:15:00 -07:00
Jon Chesterfield	4b0926b044	[libomptarget][nfc] Replace out arguments with struct return A step towards making this function adequately self contained that it can be tested easily. No functional change intended here, left variable names unchanged. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D105229	2021-06-30 22:40:07 +01:00
Jon Chesterfield	d86b0073cf	[libomptarget][amdgpu][nfc] Fix build warnings, drop some headers Removes stdarg header, drops uses of iostream, fix some format string errors. Also changes a C style struct to C++ style to avoid a warning from clang/ Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104923	2021-06-30 22:23:36 +01:00
Dhruva Chakrabarti	e0b713a035	[libomptarget] [amdgpu] Change default number of teams per computation unit This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D99003	2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti	2240b41ee4	[libomptarget] [amdgpu] Fix default setting of max flat workgroup size When max flat workgroup size is not specified, it is set to the default workgroup size. This prevents kernel launch with a workgroup size larger than the default. The fix is to ignore a size of 0 and treat it as unspecified. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D105073	2021-06-29 13:47:24 -07:00
Pushpinder Singh	20df2c7052	[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA The logic is almost similar to that of system.cpp with one change that instead of adding all the memory pools to a device struct it only keeps a single pool. The existing approach also always allocated memory on the first HSA pool found for a GPU. This depends on D104691. The goal of this series of patches is to remove _atl_machine global. The next patch will drop g_atl_machine entirely. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104695	2021-06-28 11:28:04 +00:00
Jon Chesterfield	96f6873dff	[OpenMP][NFC] Drop unused headers from amdgpu plugin	2021-06-25 12:08:56 +01:00
Aakanksha Patil	3453f3dd46	[AMDGPU] Add gfx1035 target Differential Revision: https://reviews.llvm.org/D104804	2021-06-24 14:32:41 -04:00
Joseph Huber	422adaa879	[OpenMP] Add thread limit environment variable support to plugins The OpenMP 5.1 standard defines the environment variable `OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a single block. This patch adds support for this into the AMDGPU and CUDA plugins. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D103923	2021-06-22 16:25:40 -04:00
Pushpinder Singh	9d110f9159	[AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp Moving this method helps eliminate a use of g_atl_machine. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104691	2021-06-22 11:44:52 +00:00
Vyacheslav Zakharin	aad9e48c5f	[NFC][libomptarget] Remove redundant libelf dependency for elf_common. Differential Revision: https://reviews.llvm.org/D104549	2021-06-21 07:19:55 -07:00
Pushpinder Singh	7a97cd9da7	[AMDGPU][Libomptarget] Remove redundant functions There does not seem to be any use of these functions. They just put the value to a local which is never used again. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D104512	2021-06-21 06:13:24 +00:00
Vyacheslav Zakharin	836992ab9a	[NFC][libomptarget] Build elf_common with PIC. Differential Revision: https://reviews.llvm.org/D104545	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	c5b7c7c8f7	[NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build. Differential Revision: https://reviews.llvm.org/D104535	2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin	b5c4fc0f23	[NFC][libomptarget] Reduce the dependency on libelf This change-set removes libelf usage from elf_common part of the plugins. libelf is still used in x86_64 generic plugin code and in some plugins (e.g. amdgpu) - these will have to be cleaned up in separate checkins. Differential Revision: https://reviews.llvm.org/D103545	2021-06-16 08:34:23 -07:00
Pushpinder Singh	cadcaf3f46	[AMDGPU][Libomptarget] Drop dead code related to g_atl_machine This patch includes some changes which deletes the code accessing g_atl_machine global. Some accesses related to memory_pools are still remaining. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103813	2021-06-15 05:21:35 +00:00
Ron Lieberman	91f147792e	[libomptarget][amdgpu] Remove stray fprintf in rtl.cpp remove unintended fprintf in rtl.cpp Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D104003	2021-06-10 01:57:30 +00:00
Brendon Cahoon	294efbbd3e	Reland "[AMDGPU] Add gfx1013 target" This reverts commit `211e584fa2`. Fixed a use-after-free error that caused the sanitizers to fail.	2021-06-08 21:15:35 -04:00
Brendon Cahoon	211e584fa2	Revert "[AMDGPU] Add gfx1013 target" This reverts commit `ea10a86984`. A sanitizer buildbot reports an error.	2021-06-08 16:29:41 -04:00
Brendon Cahoon	ea10a86984	[AMDGPU] Add gfx1013 target Differential Revision: https://reviews.llvm.org/D103663	2021-06-08 12:49:49 -04:00
Pushpinder Singh	4f8bc7caf4	[AMDGPU][Libomptarget] Remove atlc global This global struct used to hold various flags for monitoring the initialization of hsa. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103795	2021-06-07 11:09:01 +00:00
Pushpinder Singh	f5f329a371	[AMDGPU][Libomptarget] Rework logic for locating kernarg pools Previous logic was to always use the first kernarg pool found to allocate kernel args. This patch changes this to use only the kernarg pool which has non-zero size. This logic is also reworked to not use any globals. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103600	2021-06-07 06:41:37 +00:00
Pushpinder Singh	b25546a4b4	[AMDGPU][Libomptarget][NFC] Remove bunch of dead structs Dropped structs are atmi_machine_t, atmi_device_t and atmi_memory_t Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103509	2021-06-02 10:40:51 +00:00
Pushpinder Singh	2368170a8d	[AMDGPU][Libomptarget][NFC] Remove atmi_place_t atmi_place_t has been replaced with int DeviceId. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103508	2021-06-02 10:35:28 +00:00
Pushpinder Singh	fb113264a8	[AMDGPU][Libomptarget] Remove g_atmi_machine global Turns out the only purpose of this class was verify if device ID was in range or not which could be done easily by using g_atl_machine. Still getting rid of g_atl_machine is pending which would be done in a later patch. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103443	2021-06-01 12:34:24 +00:00
Pushpinder Singh	4fc3286951	[AMDGPU][Libomptarget][NFC] Split host and device malloc This patch splits the code path for host and device malloc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103389	2021-05-31 12:09:18 +00:00

1 2 3 4

173 Commits