llvm-project

Commit Graph

Author	SHA1	Message	Date
Jon Chesterfield	dee7704829	[AMDGPU] Add __builtin_amdgcn_grid_size Similar to D76772, loads the data from the dispatch pointer. Marked invariant. Patch also updates the openmp devicertl to use this builtin. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D90251	2020-10-29 16:25:13 +00:00
Benjamin Kramer	207cf71fa9	Revert "[OpenMP] Add Passing in Original Declaration Names To Mapper API" This reverts commit `d981c7b758` and `a87d7b3d44`. Test fails under msan.	2020-10-28 13:58:14 +01:00
Joseph Huber	d981c7b758	[OpenMP] Add Support for Mapping Names in Libomptarget RTL Summary: This patch adds basic support for printing the source location and names for the mapped variables. This patch does not support names for custom mappers. This is based on D89802. The names information currently will be printed out only in debug mode or using env LIBOMPTARGET_INFO during execution. But the information is added when available to the Device and Private data structures. To get the information out the code must be built with debug symbols on using -g or -Rpass=openmp-opt. Reviewers: jdoerfert Differential Revision: https://reviews.llvm.org/D90172	2020-10-27 16:53:05 -04:00
Joseph Huber	a87d7b3d44	[OpenMP] Add Passing in Original Declaration Names To Mapper API Summary: This patch adds support for passing in the original declaration name in the source file to the libomptarget runtime. This will allow the runtime to provide more intelligent debugging messages. This patch takes the original expression parsed from the OpenMP map / update clause and provides a textual representation if it was explicitly mapped, otherwise it takes the name of the variable declaration as a fallback. The information is passed to the runtime in a global array of strings that matches the existing ident_t source location strings using ";name;filename;column;row;;". See clang/test/OpenMP/target_map_names.cpp for an example of the generated output for a given map clause. Reviewers: jdoervert Differential Revision: https://reviews.llvm.org/D89802	2020-10-27 16:09:19 -04:00
Shilei Tian	e20d64c3d9	[Clang][OpenMP] Fixed an issue of segment fault when using target nowait The implementation of target nowait just wraps the target region into a task. The essential four parameters (base ptr, ptr, size, mapper) are taken as firstprivate such that they will be copied to the private location. When there is no user-defined mapper, the mapper variable will be nullptr. However, it will still be copied to the corresponding place. Therefore, a memcpy will be generated and the source pointer will be nullptr, causing a segmentation fault. The root cause is that when calling `emitOffloadingArraysArgument`, the last argument `Options` has a field about whether it requires a task. It only takes the depend clause into account. In this patch, the nowait clause is also included. There are two things that will be done in other patches: 1. target data nowait has not been supported yet. D90099 added the support. 2. When there is no mapper, the mapper array can be nullptr no matter whether it requires outer task or not. It can avoid an unnecessary data copy. This is an optimization that is covered in D90101. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89844	2020-10-26 22:33:22 -04:00
AndreyChurbanov	d6a0957467	[OpenMP] changing OMP rtl to use shared memory instead of env variable Patch by Erdner, Todd <todd.erdner@intel.com> Differential Revision: https://reviews.llvm.org/D89898	2020-10-26 19:02:21 +03:00
Shilei Tian	3091ed099f	[OpenMP] Fixed a potential integer overflow `size_t` has a different width on 32- and 64-bit architectures, but the computation to floor to a power of two assumed it is 64-bit, which can cause an integer overflow. In this patch, architecture detection is added so that the 64-bit part of the operation is applied only when `size_t` is 64 bits wide. Thanks to Luke for reporting the issue. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89878	2020-10-22 21:22:19 -04:00
Jon Chesterfield	26790ed248	[libomptarget] Require LLVM source tree to build libomptarget This is to permit reliably #including files from the LLVM tree in libomptarget, as an improvement on the copy and paste that is currently in use. See D87841 for the first example of removing duplication given this new requirement. The weekly openmp dev call reached consensus on this approach. See also D87841 for some alternatives that were considered. In the future, we may want to introduce a new top level repo for shared constants, or start using the ADT library within openmp. This will break sufficiently exotic build systems, trivial fixes as below. Building libomptarget as part of the monorepo will continue to work. If openmp is built separately, it now requires a cmake macro indicating where to find the LLVM source tree. If openmp is built separately, without the llvm source tree already on disk, the build machine will need a copy of a subset of the llvm source tree and the cmake macro indicating where it is. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D89426	2020-10-21 18:53:00 +01:00
JonChesterfield	55dc123555	[libomptarget][amdgcn] Refactor memcpy to eliminate maps Builds on D89776 to remove now dead code. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D89888	2020-10-21 16:59:33 +01:00
Pushpinder Singh	aa616efbb3	[libomptarget][AMDGPU][NFC] Split atmi_memcpy for h2d and d2h The calls to atmi_memcpy presently determine the direction of copy (host to device or device to host) by storing pointers in a map during malloc and looking up the pointers during memcpy. As each call site already knows the direction, this stash+lookup can be eliminated. This NFC will be followed by a functional one that deletes those map lookups. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D89776 Change-Id: I1d9089bc1e56b3a9a30e334735fa07dee1f84990	2020-10-20 06:29:32 -04:00
Jon Chesterfield	d27b39ce11	[libomptarget][amdgcn] Implement missing symbols in deviceRTL Malloc, wtime are stubs. Malloc needs a hostrpc implementation which is a work in progress, wtime needs some experimentation to find out the multiplier to get a time in seconds as documentation is scarce. Reviewed By: ronlieb Differential Revision: https://reviews.llvm.org/D89725	2020-10-20 00:24:15 +01:00
George Rokos	5adb3a6d86	[libomptarget] Fix copy-to motion for PTR_AND_OBJ entries where PTR is a struct member. This patch fixes a problem whereby the pointee object of a PTR_AND_OBJ entry with a `map(to)` motion clause can be overwritten on the device even if its reference counter is >=1. Currently, we check the reference counter of the parent struct in order to determine whether the motion clause should be respected, but since the pointee object is not part of the struct, it's got its own reference counter which should be used to enqueue the copy or discard it. The same behavior has already been implemented in targetDataEnd (omptarget.cpp:539-540), but we somehow missed doing the same in targetDataBegin. Differential Revision: https://reviews.llvm.org/D89597	2020-10-16 16:14:01 -07:00
JonChesterfield	7d2ecef5ed	[openmp][libomptarget] Include header from LLVM source tree The change is to the amdgpu plugin so is unlikely to break anything. The point of contention is whether libomptarget can depend on LLVM. A community discussion was cautiously not opposed yesterday. This introduces a compile time dependency on the LLVM source tree, in this case expressed as skipping the building of the plugin if LLVM_MAIN_INCLUDE_DIR is not set. One of the source files will #include llvm/Frontend/OpenMP/OMPGridValues.h, instead of copy&pasting the numbers across. For users that download the monorepo, the llvm tree is already on disk. This will inconvenience users who download only the openmp source as a tar, as they would now also have to download (at least a file or two) from the llvm source, if they want to build the parts of the openmp project that (post this patch) depend on llvm. There was interest expressed in going further - using llvm tools as part of building libomp, or linking against llvm libraries. That seems less clear cut an improvement and worthy of further discussion. This patch seeks only to change policy to support openmp depending on the llvm source tree. Including in the other direction, or using libraries / tools etc, are purposefully out of scope. Reviewers are a best guess at interested parties, please feel free to add others. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D87841	2020-10-15 15:46:19 +01:00
JonChesterfield	8b6cd15242	[libomptarget][amdgcn] Implement partial barrier named_sync is used to coordinate non-spmd kernels. This uses bar.sync on nvptx. There is no corresponding ISA support on amdgcn, so this is implemented using shared memory, one word initialized to zero. Each wave increments the variable by one. Whichever wave is last is responsible for resetting the variable to zero, at which point it and the others continue. The race condition on a wave reaching the barrier before another wave has noticed that it has been released is handled with a generation counter, packed into the same word. Uses a shared variable that is not needed on nvptx. Introduces a new hook, kmpc_impl_target_init, to allow different targets to do extra initialization. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88602	2020-10-12 21:27:32 +01:00
Joseph Huber	d564409946	[OpenMP] Change CMake Configuration to Build for Highest CUDA Architecture by Default Summary: This patch changes the CMake files for Clang and Libomptarget to query the system for its supported CUDA architecture. This makes it much easier for the user to build optimal code without needing to set the flags manually. This relies on the now deprecated FindCUDA method in CMake, but full support for architecture detection is only available in CMake >3.18 Reviewers: jdoerfert ye-luo Subscribers: cfe-commits guansong mgorny openmp-commits sstefan1 yaxunl Tags: #clang #OpenMP Differential Revision: https://reviews.llvm.org/D87946	2020-10-08 12:09:34 -04:00
Pushpinder Singh	3a12ff0dac	[OpenMP][RTL] Remove dead code RequiresDataSharing was always 0, resulting dead code in device runtime library. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D88829	2020-10-06 05:43:47 -04:00
Joachim Protze	69f87400a8	[OpenMP][Archer][Tests] NFC: fix spurious test failure The test disables suppression and therefore sometimes triggers a known false positive in the openmp runtime. The test should only verify that the env var is handled as expected.	2020-10-06 00:26:08 +02:00
Joachim Protze	34b34e90fc	[OpenMP][Tests] NFC: fix flaky test failure caused by rare scheduling The worker thread can start execution of the task before creation of the second task Fixes the spurious failure reported in https://reviews.llvm.org/D61657	2020-10-05 16:55:32 +02:00
Joachim Protze	23419bfd1c	[OpenMP][libarcher] Allow all possible argument separators in TSAN_OPTIONS Currently, the parser used to tokenize the TSAN_OPTIONS in libomp uses only spaces as separators, even though TSAN in compiler-rt supports other separators like ':' or ','. CTest uses ':' to separate sanitizer options by default. The documentation for other sanitizers mentions ':' as separator, but TSAN only lists spaces, which is probably where this mismatch originated. Patch provided by upsj Differential Revision: https://reviews.llvm.org/D87144	2020-10-01 01:10:13 +02:00
Joachim Protze	6104b30446	[OpenMP][OMPT] Update OMPT tests for newly added GOMP interface patches This patch updates the expected results for the GOMP interface patches: D87267, D87269, and D87271. The taskwait-depend test is changed to really use taskwait-depend and copied to an task_if0-depend test. To pass the tests, the handling of the return address was fixed. Differential Revision: https://reviews.llvm.org/D87680	2020-10-01 00:53:41 +02:00
Joachim Protze	55cff5b288	[OpenMP][libomptarget] make omp_get_initial_device 5.1 compliant OpenMP 5.1 defines omp_get_initial_device to return the same value as omp_get_num_devices. Since this change is also 5.0 compliant, no versioning is needed. Differential Revision: https://reviews.llvm.org/D88149	2020-10-01 00:51:11 +02:00
JonChesterfield	d256797c90	[nfc][libomptarget] Drop parameter to named_sync named_sync has one call site (in sync.cu) where it always passed L1_BARRIER. Folding this into the call site and dropping the macro is a simplification. amdgpu doesn't have ptx' bar.sync instruction. A correct implementation of __kmpc_impl_named_sync in terms of shared memory is much easier if it can assume that the barrier argument is this constant. Said implementation is left for a second patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D88474	2020-09-29 23:12:21 +01:00
Manoel Roemmer	c816ee13ad	[OpenMP][VE plugin] Fixing failure to build VE plugin with consolidated error handling in libomptarget The libomptarget VE plugin [[ http://lab.llvm.org:8014/builders/clang-ve-ninja/builds/8937/steps/build-unified-tree/logs/stdio \| fails to build ]] after `ae95ceeb8f`. Differential Revision: https://reviews.llvm.org/D88476	2020-09-29 17:38:01 +02:00
Joseph Huber	0103df7903	[OpenMP] Add Missing _static Directory for OpenMP Documentation Summary: Adding a missing directory needed for generating Sphinx documentation without errors. The directory currently contains a placeholder image just to populate it.	2020-09-27 15:35:47 -04:00
Ye Luo	ffd159d8e9	[OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL It allows customizing MAX_SM for non-flagship GPU and reduces graphic memory usage. In addition, so far the size is hard-coded up to __CUDA_ARCH__ 700 and is already a hassle for 800. Introduce MAX_SM for 800 and protect future arch Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D88185	2020-09-24 12:39:59 -04:00
Peyton, Jonathan L	ee1c04a926	[OpenMP] Fix if0 task with dependencies in the runtime The current GOMP interface for serialized tasks does not take into account task dependencies. Add the check and wait for dependencies. Fixes: https://bugs.llvm.org/show_bug.cgi?id=46573 Differential Revision: https://reviews.llvm.org/D87271	2020-09-24 09:47:53 -05:00
Peyton, Jonathan L	9089b4a5c5	[OpenMP] Introduce GOMP taskwait depend in the runtime This change introduces the GOMP_taskwait_depend() function. It implements the OpenMP 5.0 feature of #pragma omp taskwait with depend() clause by wrapping around __kmpc_omp_wait_deps(). Differential Revision: https://reviews.llvm.org/D87269	2020-09-24 09:45:14 -05:00
Peyton, Jonathan L	72ada5ae6c	[OpenMP] Introduce GOMP mutexinoutset in the runtime Encapsulate GOMP task dependencies in separate class and introduce the new mutexinoutset dependency type. This separate class allows future GOMP task APIs easier access to the task dependency functionality and better ability to propagate new dependency types to all existing GOMP task APIs which use task dependencies. Differential Revision: https://reviews.llvm.org/D87267	2020-09-24 09:45:13 -05:00
Peyton, Jonathan L	ea34d95e0a	[OpenMP] Introduce GOMP teams support in runtime Implement GOMP_teams_reg() function which enables GOMP support of the standalone teams construct. The GOMP_parallel* functions were modified to call __kmp_fork_call() unconditionally so that the teams-specific code could be reused within __kmp_fork_call() instead of reproduced inside the GOMP_* functions. Differential Revision: https://reviews.llvm.org/D87167	2020-09-24 09:45:13 -05:00
Ye Luo	03111e5e7a	[OpenMP] Protect unrecognized CUDA error code If an error code cannot be recognized by cuGetErrorString, errStr remains null and causes a crash when it is printed by DP(). Protect this case. Reviewed By: jhuber6, tianshilei1992 Differential Revision: https://reviews.llvm.org/D87980	2020-09-21 13:43:08 -04:00
Joseph Huber	1c4c21489f	[OpenMP] Initial Support for OpenMP Webpage Documentation Summary: Adding support for generated html documentation for OpenMP. Changing Cmake files to build the documentation and adding the base templates for future documentation to be added. Reviewers: jdoerfert Subscribers: aaron.ballman arphaman guansong mgorny openmp-commits sstefan1 yaxunl Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87797	2020-09-18 16:32:22 -04:00
JonChesterfield	a9be2b5cb2	[libomptarget] Disable build of amdgpu plugin as it doesn't build with rocm.	2020-09-18 18:10:27 +01:00
Joseph Huber	c3e6054b07	[OpenMP] Additional Information for Libomptarget Mappings Summary: This patch adds additional support for printing information from Libomptarget for already existing maps and printing the final data mapped on the device at device destruction. Reviewers: jdoerfort gkistanova Subscribers: guansong openmp-commits sstefan1 yaxunl Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87722	2020-09-15 18:12:57 -04:00
Raul Tambre	c42f96cb23	[CMake][OpenMP] Simplify getting CUDA library directory LLVM now requires CMake 3.13.4 so we can simplify this. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D87195	2020-09-11 21:19:11 +03:00
Joseph Huber	ae209397b1	[OpenMP] Begin Printing Information Dumps In Libomptarget and Plugins Summary: This patch starts adding support for adding information dumps to libomptarget and rtl plugins. The information printing is controlled by the LIBOMPTARGET_INFO environment variable introduced in D86483. The goal of this patch is to provide the user with additional information about the device during kernel execution and providing the user with information dumps in the case of failure. This patch added the ability to dump the pointer mapping table as well as printing the number of blocks and threads in the cuda RTL. Reviewers: jdoerfort gkistanova ye-luo Subscribers: guansong openmp-commits sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D87165	2020-09-09 12:03:56 -04:00
Pushpinder Singh	7634c64b61	[OpenMP][AMDGPU] Use DS_Max_Warp_Number instead of WARPSIZE The size of worker_rootS should have been DS_Max_Warp_Number. This reduces memory usage by deviceRTL on AMDGPU from around 2.3GB to around 770MB. Reviewed By: JonChesterfield, jdoerfert Differential Revision: https://reviews.llvm.org/D87084	2020-09-07 05:15:21 -04:00
Raul Tambre	21c0e74c9e	[CMake][OpenMP] Remove old dead CMake code LLVM requires CMake 3.13.4 so remove code behind checks for an older version. Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D87191	2020-09-07 10:56:56 +03:00
Joseph Huber	ae95ceeb8f	[OpenMP] Consolidate error handling and debug messages in Libomptarget Summary: This patch consolidates the error handling and messaging routines to a single file omptargetmessage. The goal is to simplify the error handling interface prior to adding more error handling support Reviewers: jdoerfert grokos ABataev AndreyChurbanov ronlieb JonChesterfield ye-luo tianshilei1992 Subscribers: danielkiss guansong jvesely kerbowa nhaehnle openmp-commits sstefan1 yaxunl	2020-09-01 15:28:19 -04:00
Alexey Bataev	6aa7228a62	[LIBOMPTARGET]Do not try to optimize bases for the next parameters. PrivateArgumentManager shall immediately allocate firstprivates if they are bases for the next parameters and the next parameters rely on the fact that the base must be allocated already. Differential Revision: https://reviews.llvm.org/D86781	2020-08-28 15:46:31 -04:00
Shilei Tian	46e0ced762	[OpenMP] Fixed wrong test command in the test private_mapping.c The test command in `private_mapping.c` was set to expect failure by mistake. It is fixed in this patch. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D86758	2020-08-28 12:19:46 -04:00
Joseph Huber	7a5a74ea96	[OpenMP] Always emit debug messages that indicate offloading failure Summary: This patch changes the libomptarget runtime to always emit debug messages that occur before offloading failure. The goal is to provide users with information about why their application failed in the target region rather than a single failure message. This is only done in regions that precede offloading failure so this should not impact runtime performance. If the debug environment variable is set then the message is forwarded to the debug output as usual. A new environment variable was added for future use but does nothing in this current patch. LIBOMPTARGET_INFO will be used to report runtime information to the user if requested, such as grid size, SPMD usage, or data mapping. It will take an integer indicating the level of information verbosity and a value of 0 will disable it. Reviewers: jdoerfort Subscribers: guansong sstefan1 yaxunl ye-luo Tags: #OpenMP Differential Revision: https://reviews.llvm.org/D86483	2020-08-26 19:30:41 -04:00
JonChesterfield	5d989fb37d	[libomptarget][amdgpu] Improve thread safety, remove dead code	2020-08-26 22:04:03 +01:00
Jon Chesterfield	28fbf422f2	[libomptarget][amdgpu] Update plugin CMake to work with latest rocr library	2020-08-26 20:01:42 +01:00
AndreyChurbanov	1596ea80fd	[OpenMP] Fix import library installation with MinGW Patch by mati865@gmail.com Differential Revision: https://reviews.llvm.org/D86552	2020-08-26 21:56:01 +03:00
AndreyChurbanov	09af378f49	[OpenMP] Fix build on macOS sdk 10.12 and newer Patch by nihui (Ni Hui) Differential Revision: https://reviews.llvm.org/D76755	2020-08-26 16:52:46 +03:00
Shilei Tian	0775c1dfbc	[OpenMP] Pack first-private arguments to improve efficiency of data transfer In this patch, we pack all small first-private arguments, allocate and transfer them all at once to reduce the number of data transfer which is very expensive. Let's take the test case as example. ``` int main() { int data1[3] = {1}, data2[3] = {2}, data3[3] = {3}; int sum[16] = {0}; #pragma omp target teams distribute parallel for map(tofrom: sum) firstprivate(data1, data2, data3) for (int i = 0; i < 16; ++i) { for (int j = 0; j < 3; ++j) { sum[i] += data1[j]; sum[i] += data2[j]; sum[i] += data3[j]; } } } ``` Here `data1`, `data2`, and `data3` are three first-private arguments of the target region. In the previous `libomptarget`, it called data allocation and data transfer three times, each of which allocated and transferred 12 bytes. With this patch, it only calls allocation and transfer once. The size is `(12+4)*3=48` where 12 is the size of each array and 4 is the padding to keep the address aligned with 8. It is implemented in this way: 1. First collect all information for those _first_-private arguments. _private_ arguments are not the case because private arguments don't need to be mapped to target device. It just needs a data allocation. With the patch for memory manager, the data allocation could be very cheap, especially for the small size. For each qualified argument, push a place holder pointer `nullptr` to the `vector` for kernel arguments, and we will update them later. 2. After we have all information, create a buffer that can accommodate all arguments plus their paddings. Copy the arguments to the buffer at the right place, i.e. aligned address. 3. Allocate a target memory with the same size as the host buffer, transfer the host buffer to target device, and finally update all place holder pointers in the arguments `vector`. The reason we only consider small arguments is, the data transfer is asynchronous. Therefore, for the large argument, we could continue to do things on the host side meanwhile, hopefully, the data is also being transferred. The "small" is defined by that the argument size is less than a predefined value. Currently it is 1024. I'm not sure whether it is a good one, and that is an open question. Another question is, do we need to make it configurable via an environment variable? Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86307	2020-08-25 16:06:29 -04:00
Dimitry Andric	47b0262d3f	Add <stdarg.h> include to kmp_os.h, to get the va_list type, required after `cde8f4c164`. Sort system includes, while here.	2020-08-24 22:45:02 +02:00
Dimitry Andric	cde8f4c164	Move special va_list handling to kmp_os.h Instead of copying and pasting the same `#ifdef` expressions in multiple places, define a type and a pair of macros in `kmp_os.h`, to handle whether `va_list` is pointer-like or not: * `kmp_va_list` is the type to use for `__kmp_fork_call()` * `kmp_va_deref()` dereferences a `va_list`, if necessary * `kmp_va_addr_of()` takes the address of a `va_list`, if necessary Also add FreeBSD to the list of OSes that has a non pointer-like va_list. This can now be easily extended to other OSes too. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D86397	2020-08-24 22:31:56 +02:00
AndreyChurbanov	d0f4f5a182	[OpenMP] Check if _MSC_VER is defined before using it Patch by mati865@gmail.com Differential Revision: https://reviews.llvm.org/D86448	2020-08-24 17:50:38 +03:00
Shilei Tian	f93b42a629	[NFC][OpenMP] Remove outdated comments about potential issues The issue mentioned has been fixed in D84996	2020-08-24 01:21:06 +00:00
Shilei Tian	0289696751	[OpenMP] Introduce target memory manager Target memory manager is introduced in this patch which aims to manage target memory such that they will not be freed immediately when they are not used because the overhead of memory allocation and free is very large. For CUDA device, cuMemFree even blocks the context switch on device which affects concurrent kernel execution. The memory manager can be taken as a memory pool. It divides the pool into multiple buckets according to the size such that memory allocation/free distributed to different buckets will not affect each other. In this version, we use the exact-equality policy to find a free buffer. This is an open question: will best-fit work better here? IMO, best-fit is not good for target memory management because computation on GPU usually requires GBs of data. Best-fit might lead to a serious waste. For example, there is a free buffer of size 1960MB, and now we need a buffer of size 1200MB. If best-fit, the free buffer will be returned, leading to a 760MB waste. The allocation will happen when there is no free memory left, and the memory free on device will take place in the following two cases: 1. The program ends. Obviously. However, there is a little problem that plugin library is destroyed before the memory manager is destroyed, leading to a fact that the call to target plugin will not succeed. 2. Device is out of memory when we request a new memory. The manager will walk through all free buffers from the bucket with largest base size, pick up one buffer, free it, and try to allocate immediately. If it succeeds, it will return right away rather than freeing all buffers in free list. Update: A threshold (8KB by default) is set such that users could control what size of memory will be managed by the manager. It can also be configured by an environment variable `LIBOMPTARGET_MEMORY_MANAGER_THRESHOLD`. Reviewed By: jdoerfert, ye-luo, JonChesterfield Differential Revision: https://reviews.llvm.org/D81054	2020-08-19 23:12:23 -04:00
Shilei Tian	83c3d07994	[OpenMP] Refactored the function `DeviceTy::data_exchange` This patch contains the following changes: 1. Renamed the function `DeviceTy::data_exchange` to `DeviceTy::dataExchange`; 2. Changed the second argument `DeviceTy DstDev` to `DeviceTy &DstDev`; 3. Renamed the last argument. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D86238	2020-08-19 16:08:14 -04:00
Jon Chesterfield	6e1b11087f	[libomptarget][amdgpu] Support building with static rocm libraries	2020-08-19 15:44:30 +01:00
George Rokos	32ebdc70f3	[libomptarget][NFC] Sort list of plugins in chronological order Differential Revision: https://reviews.llvm.org/D86082	2020-08-17 08:33:36 -07:00
Johannes Doerfert	5272d29e2c	[OpenMP][CUDA] Keep one kernel list per device, not globally. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86039	2020-08-16 14:38:35 -05:00
Johannes Doerfert	aa27cfc1e7	[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel) Instead of calling `cuFuncGetAttribute` with `CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK` for every kernel invocation, we can do it for the first one and cache the result as part of the `KernelInfo` struct. The only functional change is that we now expect `cuFuncGetAttribute` to succeed and otherwise propagate the error. Ignoring any error seems like a slippery slope... Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D86038	2020-08-16 14:38:33 -05:00
Jon Chesterfield	d0b312955f	[libomptarget] Implement host plugin for amdgpu Replacement for D71384. Primary difference is inlining the dependency on atmi followed by extensive simplification and bugfixes. This is the latest version from https://github.com/ROCm-Developer-Tools/amd-llvm-project/tree/aomp12 with minor patches and a rename from hsa to amdgpu, on the basis that this can't be used by other implementations of hsa without additional work. This will not build unless the ROCM_DIR variable is passed so won't break other builds. That variable is used to locate two amdgpu specific libraries that ship as part of rocm: libhsakmt at https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface libhsa-runtime64 at https://github.com/RadeonOpenCompute/ROCR-Runtime These libraries build from source. The build scripts in those repos are for shared libraries, but can be adapted to statically link both into this plugin. There are caveats. - This works well enough to run various tests and benchmarks, and will be used to support the current clang bring up - It is adequately thread safe for the above but there will be races remaining - It is not stylistically correct for llvm, though has had clang-format run - It has suboptimal memory management and locking strategies - The debug printing / error handling is inconsistent I would like to contribute this pretty much as-is and then improve it in-tree. This would be advantageous because the aomp12 branch that was in use for fixing this codebase has just been joined with the amd internal rocm dev process. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85742	2020-08-15 23:58:28 +01:00
Joachim Protze	66a3575c28	[OpenMP] Fix releasing of stack memory Starting with `787eb0c637` I got spurious segmentation faults for some testcases. I could nail it down to `brel` trying to release the "memory" of the node allocated on the stack of __kmpc_omp_wait_deps. With this patch, you will see the assertion triggering for some of the tests in the test suite. My proposed solution for the issue is to just patch __kmpc_omp_wait_deps: ``` __kmp_init_node(&node); - node.dn.on_stack = 1; + // the stack owns the node + __kmp_node_ref(&node); ``` What do you think? Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D84472	2020-08-14 10:32:53 +02:00
Joel E. Denny	518a27e559	[OpenMP] Fix ref count dec for implicit map of partial data D85342 broke this case. The new test case presents an example. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85369	2020-08-06 11:39:29 -04:00
Joel E. Denny	8c8bb128df	[OpenMP] Fix `target data` exit for array extension For example: ``` #pragma omp target data map(tofrom:arr[0:100]) { #pragma omp target exit data map(delete:arr[0:100]) #pragma omp target enter data map(alloc:arr[98:2]) } ``` Without this patch, the transfer at the end of the target data region is broken and fails depending on the target device. According to my read of the spec, the transfer shouldn't even be attempted because `arr[0:100]` isn't (fully) present there. To fix that, this patch makes `DeviceTy::getTgtPtrBegin` return null for this case. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D85342	2020-08-05 16:51:25 -04:00
Joel E. Denny	41b1aefecb	[OpenMP] Fix `present` diagnostic for array extension For example, without this patch, the following fails as expected with or without the `present` modifier, but the `present` modifier doesn't produce its usual diagnostic: ``` #pragma omp target data map(alloc: arr[0:2]) { #pragma omp target map(present, tofrom: arr[0:100]) // not fully present ; } ``` Reviewed By: grokos, vzakhari Differential Revision: https://reviews.llvm.org/D85320	2020-08-05 16:51:24 -04:00
George Rokos	40470eb27a	[libomptarget][NFC] Replace `%ld` with PRId64 for data of type int64_t. The standard way of printing `int64_t` data is via the PRId64 macro, `ld` is for `long int` and int64_t is not guaranteed to be typedef'ed as `long int` on all platforms. E.g. on Windows we get mismatch warnings. Differential Revision: https://reviews.llvm.org/D85353	2020-08-05 13:28:35 -07:00
Alexey Bataev	6780d5675b	[LIBOMPTARGET]Fix order of mapper data for targetDataEnd function. targetDataMapper function fills arrays with the mapping data in the direct order. When this function is called by targetDataBegin or tgt_target_update functions, it works as expected. But targetDataEnd function processes mapped data in reverse order. In this case, the base pointer might be deleted before the associated data is deleted. Need to reverse data, mapped by mapper, too, since it always adds data that must be deleted at the end of the buffer. Fixes the test declare_mapper_target_update.cpp. Also, reduces the memory fragmentation by preallocating the memory buffers. Differential Revision: https://reviews.llvm.org/D85216	2020-08-05 13:42:24 -04:00
Joel E. Denny	5ab43989c3	[OpenMP] Fix `omp target update` for array extension OpenMP TR8 sec. 2.15.6 "target update Construct", p. 183, L3-4 states: > If the corresponding list item is not present in the device data > environment and there is no present modifier in the clause, then no > assignment occurs to or from the original list item. L10-11 states: > If a present modifier appears in the clause and the corresponding > list item is not present in the device data environment then an > error occurs and the program terminates. (OpenMP 5.0 also has the first passage but without mention of the present modifier of course.) In both passages, I assume "is not present" includes the case of partially but not entirely present. However, without this patch, the target update directive misbehaves in this case both with and without the present modifier. For example: ``` #pragma omp target enter data map(to:arr[0:3]) #pragma omp target update to(arr[0:5]) // might fail on data transfer #pragma omp target update to(present:arr[0:5]) // might fail on data transfer ``` The problem is that `DeviceTy::getTgtPtrBegin` does not return a null pointer in that case, so `target_data_update` sees the data as fully present, and the data transfer then might fail depending on the target device. However, without the present modifier, there should never be a failure. Moreover, with the present modifier, there should always be a failure, and the diagnostic should mention the present modifier. This patch fixes `DeviceTy::getTgtPtrBegin` to return null when `target_data_update` is the caller. I'm wondering if it should do the same for more callers. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D85246	2020-08-05 10:03:31 -04:00
Joel E. Denny	002d61db2b	[OpenMP] Fix `present` for exit from `omp target data` Without this patch, the following example fails but shouldn't according to OpenMP TR8: ``` #pragma omp target enter data map(alloc:i) #pragma omp target data map(present, alloc: i) { #pragma omp target exit data map(delete:i) } // fails presence check here ``` OpenMP TR8 sec. 2.22.7.1 "map Clause", p. 321, L23-26 states: > If the map clause appears on a target, target data, target enter > data or target exit data construct with a present map-type-modifier > then on entry to the region if the corresponding list item does not > appear in the device data environment an error occurs and the > program terminates. There is no corresponding statement about the exit from a region. Thus, the `present` modifier should: 1. Check for presence upon entry into any region, including a `target exit data` region. This behavior is already implemented correctly. 2. Should not check for presence upon exit from any region, including a `target` or `target data` region. Without this patch, this behavior is not implemented correctly, breaking the above example. In the case of `target data`, this patch fixes the latter behavior by removing the `present` modifier from the map types Clang generates for the runtime call at the end of the region. In the case of `target`, we have not found a valid OpenMP program for which such a fix would matter. It appears that, if a program can guarantee that data is present at the beginning of a `target` region so that there's no error there, that data is also guaranteed to be present at the end. This patch adds a comment to the runtime to document this case. Reviewed By: grokos, RaviNarayanaswamy, ABataev Differential Revision: https://reviews.llvm.org/D84422	2020-08-05 10:03:31 -04:00
Adrian Pop	bf2aa74e51	[OpenMP] support build on msys2/mingw with clang or gcc RTM Adaptive Locks are supported on msys2/mingw for clang and gcc. Differential Revision: https://reviews.llvm.org/D81776	2020-08-04 23:15:36 +03:00
AndreyChurbanov	4a04bc8995	[OpenMP] Don't use MSVC workaround with MinGW Patch by mati865@gmail.com Differential Revision: https://reviews.llvm.org/D85210	2020-08-04 18:48:25 +03:00
David Blaikie	0c938a8dd8	OpenMP: Fix typo variabls -> variables	2020-08-03 17:00:15 -07:00
Shilei Tian	f2400f024d	[OpenMP] Fixed the issue that target memory deallocation might be called while the memory is being used This patch fixed the issue that target memory might be deallocated while it is still being used or before it is used. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84996	2020-07-31 18:54:18 -04:00
Joachim Protze	03116a9f8c	[OpenMP] Use weak attribute in interface only for static library This is to address the issue reported at: https://bugs.llvm.org/show_bug.cgi?id=46863 Since weak is meaningless for a shared library interface function, this patch disables the attribute when the OpenMP library is built as a shared library. ompt_start_tool is not an interface function, but an internally called function possibly implemented by an OMPT tool. This function needs to be weak if possible to allow overwriting ompt_start_tool with a function implementation built into the application. Differential Revision: https://reviews.llvm.org/D84871	2020-07-31 12:29:05 +02:00
Shilei Tian	0f10165626	[OpenMP] Refactored the function `targetDataEnd` Refactored the function `targetDataEnd` to prepare for fixing the issue of ahead-of-time target memory deallocation. This patch only renamed `targetDataEnd` related variables and functions to conform with LLVM code standard. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84991	2020-07-30 21:39:26 -04:00
Shilei Tian	8218eee269	[OpenMP] Refactored the function `target` Refactored the function `target` to make preparation for fixing the issue of ahead-of-time device memory deallocation. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84816	2020-07-30 21:05:55 -04:00
Alexey Bataev	622e46156d	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 11:18:33 -04:00
Alexey Bataev	b69357c2f4	Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region." This reverts commit `142d0d3ed8` to investigate undefined behavior revealed by buildbots.	2020-07-30 10:57:56 -04:00
Alexey Bataev	142d0d3ed8	[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region. Need to map the base pointer for all directives, not only target data-based ones. The base pointer is mapped for array sections, array subscript, array shaping and other array-like constructs with the base pointer. Also, codegen for use_device_ptr clause was modified to correctly handle mapping combination of array like constructs + use_device_ptr clause. The data for use_device_ptr clause is emitted as the last records in the data mapping array. It applies only for global pointers. Differential Revision: https://reviews.llvm.org/D84767	2020-07-30 09:40:05 -04:00
Joel E. Denny	cee52dd026	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-29 12:18:50 -04:00
Shilei Tian	30440924d4	[OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard` Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84799	2020-07-28 20:31:40 -04:00
Joel E. Denny	65564e5eaf	Revert "[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)" This reverts commit `2cb926a447`. It depends on `3c3faae497`, which is being reverted.	2020-07-28 20:30:05 -04:00
Shilei Tian	3ce69d4d50	[NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard This patch only touched variables and functions in `target`. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84797	2020-07-28 20:11:09 -04:00
Joel E. Denny	2cb926a447	[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2) This patch implements OpenMP runtime support for the OpenMP TR8 `present` motion modifier for `omp target update` directives. The previous patch in this series implements Clang front end support. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D84712	2020-07-28 19:15:18 -04:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Joel E. Denny	9b4826d18b	[OpenMP] Fix libomptarget negative tests to expect abort On runtime failures, D83963 causes the runtime to abort instead of merely exiting with a non-zero value, but many tests in the libomptarget test suite still expect the former behavior. This patch updates the test suite and was discussed in post-commit comments on D83963 and D84557.	2020-07-28 09:02:16 -04:00
Joachim Protze	e2f5444c9c	[OpenMP][Tests] Enable nvptx64 testing for most libomptarget tests Also add $BUILD/lib to the LIBRARY_PATH to fix https://bugs.llvm.org/show_bug.cgi?id=46836. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D84557	2020-07-28 11:08:24 +02:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, revert while investigating.	2020-07-27 21:07:00 +00:00
Ye Luo	9323166601	[OpenMP] Add more pass-through functions in DeviceTy Summary: 1. Add DeviceTy::data_alloc, DeviceTy::data_delete, DeviceTy::data_alloc, DeviceTy::synchronize pass-through functions. Avoid directly accessing Device.RTL 2. Fix the type of the first argument of synchronize_ty in rth.h, device id is int32_t which is consistent with other functions. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84487	2020-07-27 16:08:30 -04:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch removes support for QPX/A2Q in llvm, BGQ/BGP in clang, and CNK in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Johannes Doerfert	9c87466c39	[OpenMP] Use `abort` not `error` for fatal runtime exceptions See PR46515 for the rationale but generally, we really want to abort, not shut down gracefully. Reviewed By: grokos, ABataev Differential Revision: https://reviews.llvm.org/D83963	2020-07-24 15:15:38 -05:00
David Truby	bb099c87ab	[openmp] Don't copy exports into the source folder by default. Additionally fix the copy if enabled on multi-config targets. Summary: This changes the copy command for libomp.so to use the output of the target as the source of the copy, rather than trying to find it based on ${LIBOMP_LIBRARY_DIR}, which appears to be incorrect in multi-config generator builds. Reviewers: jdoerfert Subscribers: mgorny, yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84148	2020-07-24 14:34:50 +01:00
Shilei Tian	c0185dc7df	Revert "[OpenMP] Wait for kernel prior to memory deallocation" This reverts commit `9b2832c089`.	2020-07-22 23:03:36 -04:00
Shilei Tian	9b2832c089	[OpenMP] Wait for kernel prior to memory deallocation Summary: In the function `target`, memory deallocation and `target_data_end` are called immediately after returning from launching the kernel. This might cause a race condition in which the corresponding memory is still being used by the kernel, and a potential issue in which the kernel's required data has already been deallocated by the time it starts to execute, especially when multiple kernels run concurrently. Since we block the thread issuing the target offloading at the end of the target anyway, we just move the synchronization ahead a little bit to ensure correctness. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D84381	2020-07-22 22:55:34 -04:00
Louis Dionne	afa1afd410	[CMake] Bump CMake minimum version to 3.13.4 This upgrade should be friction-less because we've already been ensuring that CMake >= 3.13.4 is used. This is part of the effort discussed on llvm-dev here: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140578.html Differential Revision: https://reviews.llvm.org/D78648	2020-07-22 14:25:07 -04:00
Joel E. Denny	708752b2f6	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 14:04:58 -04:00
Joel E. Denny	fc247c8f3c	Revert "[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)" This reverts commit `45b8f7ec35`. It attempts to use debug macros `DPxMOD` and `DPxPTR` in release builds. Will fix and reapply later.	2020-07-22 11:22:08 -04:00
Joel E. Denny	45b8f7ec35	[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2) This implements OpenMP runtime support for the OpenMP TR8 `present` map type modifier. The previous patch in this series implements Clang front end support. See that patch summary for behaviors that are not yet supported. Reviewed By: grokos, jdoerfert Differential Revision: https://reviews.llvm.org/D83062	2020-07-22 10:15:32 -04:00
Joachim Protze	ae31d7838c	[OpenMP][NFC] pass on env variables to libomptarget tests	2020-07-22 12:14:45 +02:00
Saiyedul Islam	741e55aeed	[OpenMP] Temporarily disable failing runtime tests for clang-12 Following tests were disabled for clang-11 after upgrading to version 5.0 in D82963: 1. openmp/runtime/test/env/kmp_set_dispatch_buf.c 2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c They are also failing for clang-12. Thus this temporary disabling until they are fixed. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D84241	2020-07-21 15:32:46 +00:00
AndreyChurbanov	617787ea77	[OpenMP] add missed REQUIRES:ompt for 2 OMPT tests	2020-07-21 16:31:17 +03:00
AndreyChurbanov	5a8779169e	[OpenMP] libomp build fix without OMPT_SUPPORT	2020-07-21 16:03:17 +03:00
AndreyChurbanov	917f842159	[OpenMP] libomp cleanup: add checks of bad memory access Add check of frm to prevent array out-of-bound access; add check of new_nproc to prevent access of unallocated hot_teams array; add check of location info pointer to prevent NULL dereference; add check of d_tn pointer to prevent NULL dereference in release build. These checks make static analyzers happier. This is second part of the patch from https://reviews.llvm.org/D84062.	2020-07-21 00:12:46 +03:00
AndreyChurbanov	787eb0c637	[OpenMP] libomp cleanup: add check of input global tid parameter Add check of negative gtid before indexing __kmp_threads. This makes static analyzers happier. This is the first part of the patch split in two parts. Differential Revision: https://reviews.llvm.org/D84062	2020-07-20 23:49:58 +03:00
Joachim Protze	f226171429	[OpenMP][Tests][NFC] Mark compatibility with older versions of clang	2020-07-20 13:53:29 +02:00
AndreyChurbanov	86fb2db49b	[OpenMP] libomp cleanup: check presence of hwloc objects CORE, PACKAGE hwloc documentation guarantees the only object that is always present in the topology is PU. We can check the presence of other objects in the topology, just in case. Differential Revision: https://reviews.llvm.org/D84065	2020-07-18 01:15:37 +03:00
AndreyChurbanov	62d88a1c79	[OpenMP] libomp: add itt notifications for teams construct on host Add barrier/region notification for parallel inside teams construct when number of teams is 1, as VTune only shows outer level regions for simplicity. Differential Revision: https://reviews.llvm.org/D84024	2020-07-17 21:10:25 +03:00
serge-sans-paille	515bc8c155	Harmonize Python shebang Differential Revision: https://reviews.llvm.org/D83857	2020-07-16 21:53:45 +02:00
AndreyChurbanov	ffd8f00931	[openmp] libomp: added itt notifications for task, taskwait, taskgroup Add releasing->acquire edges for child task->taskwait and child task->end of taskgroup. Differential Revision: https://reviews.llvm.org/D83804	2020-07-16 14:28:46 +03:00
George Rokos	140ab574a1	[OpenMP][Offload] Declare mapper runtime implementation Libomptarget patch adding runtime support for "declare mapper". Patch co-developed by Lingda Li and George Rokos. Differential revision: https://reviews.llvm.org/D68100	2020-07-15 18:11:43 -07:00
Johannes Doerfert	5937434677	[OpenMP] Silence unused symbol warning with proper ifdefs	2020-07-11 11:57:42 -05:00
Johannes Doerfert	c98699582a	[OpenMP][NFC] Remove unused (always fixed) arguments There are various runtime calls in the device runtime with unused, or always fixed, arguments. This is bad for all sorts of reasons. Clean up two before as we match them in OpenMPOpt now. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83268	2020-07-11 00:51:51 -05:00
Johannes Doerfert	cd0ea03e6f	[OpenMP][NFC] Remove unused and untested code from the device runtime Summary: We carried a lot of unused and untested code in the device runtime. Among other reasons, we are planning major rewrites for which reduced size is going to help a lot. The number of code lines reduced by 14%! Before: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 489 841 2454 C/C++ Header 14 322 493 1377 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 998 1528 4691 ------------------------------------------------------------------------------- After: ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- CUDA 13 366 733 1879 C/C++ Header 14 317 484 1293 C 12 117 124 559 CMake 4 64 64 262 C++ 1 6 6 39 ------------------------------------------------------------------------------- SUM: 44 870 1411 4032 ------------------------------------------------------------------------------- Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D83349	2020-07-10 19:09:41 -05:00
Joachim Protze	0fa0cf8638	[OpenMP][Tests] Update compatibility with GCC (NFC) Commit `95a28df5c` provided implementation for GOMP_nonmonotonicruntime* functions. Now the tests succeed with gcc 9 and 10	2020-07-08 00:27:19 +02:00
Ye Luo	c5348aecd7	[OpenMP] Use primary context in CUDA plugin Summary: Retaining per device primary context is preferred to creating a context owned by the plugin. From CUDA documentation 1. Note that the use of multiple CUcontext s per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." from https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html 2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf 3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX Two issues are addressed by this patch: 1. Not using the primary context caused interoperability issue with libraries like cublas, cusolver. CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle 2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal" Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags. Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld Reviewed By: jdoerfert Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82718	2020-07-07 10:14:51 -04:00
Saiyedul Islam	38d6640ba5	[libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins This function uses __builtin_amdgcn_atomic_inc32(): uint32_t atomicInc(uint32_t *address, uint32_t max); These functions use __builtin_amdgcn_fence(): __kmpc_impl_threadfence() __kmpc_impl_threadfence_block() __kmpc_impl_threadfence_system() They will take place of current mechanism of directly calling IR functions. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83132	2020-07-07 06:36:25 +00:00
Peyton, Jonathan L	95a28df5c4	[OpenMP] Add GOMP 5.0 loop entry points This patch adds missing GOMP_5.0 loop entry points which incorporate new non-monotonic default into entry point name. Since monotonic schedules are a subset of nonmonotonic, it is acceptable to use monotonic as the implementation. This patch simply has the nonmonotonic (and possibly non-monontonic) versions of the loop entry points as wrappers around the monotonic ones. Differential Revision: https://reviews.llvm.org/D73922	2020-07-06 17:22:26 -05:00
Joachim Protze	6d9626d2da	[OpenMP][Tests] Fix/Mark compatibility for GCC Reviewed by: Hahnfeld, saiislam Differential Revision: https://reviews.llvm.org/D82267	2020-07-06 23:56:09 +02:00
Saiyedul Islam	4c4bda1630	[OpenMP] Temporarily disable failing runtime tests for OpenMP 5.0 Following tests are failing after upgrading to version 5.0 but are passing for version 4.5: 1. openmp/runtime/test/env/kmp_set_dispatch_buf.c 2. openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c To be enabled as soon as these tests are fixed. Reviewed By: ABataev Differential Revision: https://reviews.llvm.org/D82963	2020-07-06 14:04:43 +00:00
Joachim Protze	8289f2891e	[OpenMP][Tests] Flag compatibility of OpenMP runtime tests with GCC versions If the compilation fails, the test is marked as unsupported. -> This will never change for a specific version of gcc If the linking fails, the test is marked as expected to fail. -> This might change as LLVM/OpenMP implements the missing GOMP interface function Reviewed by: Hahnfeld Differential Revision: https://reviews.llvm.org/D83077	2020-07-05 22:49:54 +02:00
Joachim Protze	30205865d9	[OpenMP][OMPT] Fix ifdefs for OMPT code Fixes build with LIBOMP_OMPT_SUPPORT=off Reported by: Jason Edson Reviewed by: Hahnfeld Differential Revision: https://reviews.llvm.org/D83171	2020-07-05 22:39:25 +02:00
Fangrui Song	6ba4380ed6	[libomptarget][test] Fix text relocations by adding -fPIC	2020-07-05 12:51:28 -07:00
Joachim Protze	3fc97f9636	[OpenMP][Tests] NFC use type macro in printf	2020-07-05 09:17:18 +02:00
Joachim Protze	47cb8a0f0b	[OpenMP][OMPT]Add event callbacks for taskwait with depend This adds the missing event callbacks to express dependencies on included tasks and taskwait with depend clause. The test fails for GCC, see bug report: https://bugs.llvm.org/show_bug.cgi?id=46573 Reviewed by: hbae Differential Revision: https://reviews.llvm.org/D81891	2020-07-03 09:58:31 +02:00
Jonas Hahnfeld	0e0483bf5c	[OpenMP][CMake] Fix version detection of testing compiler When configuring in-tree, the correct names are LLVM_VERSION_MAJOR and LLVM_VERSION_MINOR. This has been wrong since the code was added in commits `fc473dee98` and `821649229e`.	2020-07-02 19:39:30 +02:00
Ye Luo	45bb073da8	[OpenMP] fix clang warning about printf format in CUDA plugin Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to an incorrect format string. Reviewers: tianshilei1992, jdoerfert Reviewed By: tianshilei1992 Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D82789	2020-06-29 22:35:39 -04:00
AndreyChurbanov	7f3d9cc1c0	[openmp][NFC] Cleanup: guard __kmp_mic_type by KMP_MIC_SUPPORTED macro. Differential Revision: https://reviews.llvm.org/D82301	2020-06-29 14:14:56 +03:00
Joachim Protze	d4230c67bf	[OpenMP][Tool] Fix buffer overflow in ompt-multiplex.h Reviewed by: runlieb Differential Revision: https://reviews.llvm.org/D82452	2020-06-29 12:44:33 +02:00
Han Zhu	1eaebe192f	[openmp] Use config.test_extra_flags in archer and multiplex tests Summary: `config.test_extra_flags` is passed in from `lit.site.cfg.in` files, but it is not used in the LIT configs. This variable can be useful for distros which don't have the standard c/c++ headers in the default search paths. Since the tests run clang on c/c++ source code, we rely on `test_extra_flags` to pass in the necessary header files. A similar setup is also used in libomptarget https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/test/lit.cfg#L42 and openmp/runtime. Reviewers: jdoerfert, jdenny, protze.joachim Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D82516	2020-06-25 11:58:52 -07:00
Ye Luo	6e5f64c44f	[OpenMP] Adopt std::set in HostDataToTargetMap Summary: lookupMapping took significant time due to linear complexity searching. This is bad for offloading from multiple host threads because lookupMapping is protected by mutex. Use std::set for logarithmic complexity searching. Before my change. libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec. After the change libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec. Most of the overhead of libomptarget (exclusive time) is gone. Reviewers: jdoerfert, grokos Reviewed By: grokos Subscribers: tianshilei1992, yaxunl, guansong, sstefan1 Tags: #openmp Differential Revision: https://reviews.llvm.org/D82264	2020-06-24 12:22:45 -04:00
Joachim Protze	73b7ff4e16	[OpenMP] NFC: Create OpenMP release notes file	2020-06-24 13:42:32 +02:00
Joachim Protze	63a3c5925d	[OpenMP][OMPT] Pass mutexinoutset to the tool Adds OMPT support for the mutexinoutset dependency Reviewed by: hbae Differential Revision: https://reviews.llvm.org/D81890	2020-06-19 12:51:18 +02:00
Shilei Tian	aaf50adb53	Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info" This reverts commit `ee1bf45e1d`.	2020-06-17 15:01:16 -04:00
Shilei Tian	ee1bf45e1d	[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info DeviceID is added for some cases that we only have the __tgt_async_info but do not know its corresponding device id. However, to communicate with target plugins, we need that information. Event is added for another way to synchronize.	2020-06-17 14:29:09 -04:00
Alexey Bataev	08029595ca	[OPENMP]Fix overflow during counting the number of iterations. Summary: The OpenMP loops are normalized and transformed into the loops from 0 to max number of iterations. In some cases, original scheme may lead to overflow during calculation of number of iterations. If it is unknown, if we can end up with overflow or not (the bounds are not constant and we cannot define if there is an overflow), cast original type to the unsigned. Reviewers: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits, cfe-commits, caomhin Tags: #clang, #openmp Differential Revision: https://reviews.llvm.org/D81881	2020-06-17 08:47:01 -04:00
Joachim Protze	8580af3f7d	subdirectories should not use cmake project command	2020-06-17 09:38:56 +02:00
Joachim Protze	e9b8ed1fd7	[OpenMP][Tool] Header-only multiplexing of OMPT tools Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D76012	2020-06-17 09:16:46 +02:00
Joachim Protze	cbea36903e	[OpenMP][OMPT] Add callbacks for doacross loops Adds the callbacks for ordered with source/sink dependencies. The test for task dependencies changed, because callback.h now actually prints the passed dependencies and the test also checks for the address. Reviewed by: hbae Differential Revision: https://reviews.llvm.org/D81807	2020-06-16 16:53:40 +02:00
Joachim Protze	9e5aefc5f9	[OpenMP][Tests] fix data race in an OpenMP runtime test Reviewed by: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D81804	2020-06-15 18:48:35 +02:00
Joachim Protze	d056d7592a	[OpenMP][Tool] Extend reuse of OMPT testing This patch allows specifying a prefix (default: empty) to be included in the print-out written by callback.h. It also adds a cmake target to find the header file from other tests. Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D76008	2020-06-14 15:55:32 +02:00
Joachim Protze	add8d90cb3	[OpenMP] support alloc of serialized tasks Reviewed by: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D81497	2020-06-14 15:55:32 +02:00
Joachim Protze	e7577d1d76	Remove mention of counter from Archer readme The feature was removed before upstreaming Archer, so the documentation is wrong	2020-06-05 14:31:03 +02:00
Shilei Tian	a014fbbc21	[OpenMP] Improve D2D memcpy to use more efficient driver API Summary: In the current implementation, D2D memcpy first copies data back to the host and then copies it from the host to the device. This is very inefficient if the device supports D2D memcpy, like CUDA. In this patch, D2D memcpy will first try to use the natively supported driver API. If it fails, it falls back to the original way. It is worth noting that D2D memcpy in this scenario covers two cases: - Same devices: this is the D2D memcpy in the CUDA context. - Different devices: this is the PeerToPeer memcpy in the CUDA context. My implementation merges these two cases. It chooses the best API according to the source device and destination device. Reviewers: jdoerfert, AndreyChurbanov, grokos Reviewed By: jdoerfert Subscribers: yaxunl, guansong, sstefan1, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D80649	2020-06-04 16:59:06 -04:00
AndreyChurbanov	abe64360ae	[openmp] Fixed nonmonotonic schedule implementation. Differential Revision: https://reviews.llvm.org/D80942	2020-06-04 15:39:45 +03:00
Joachim Protze	10995c77b4	[OpenMP][OMPT] Fix and add event callbacks for detached tasks The OpenMP spec has the task-fulfill event for a call to omp_fulfill_event. If the task did not yet finish execution, ompt_task_early_fulfill is used, otherwise ompt_task_late_fulfill. If a task does not complete, when the execution finishes (i.e., the task goes in detached mode), ompt_task_detach instead of ompt_task_complete must be used, when the next task is scheduled. A test for both cases is included, which only work with clang-11+ Reviewed By: hbae Differential revision: https://reviews.llvm.org/D80843	2020-06-02 09:52:40 +02:00
AndreyChurbanov	5e111c5df8	[openmp] Fixed taskloop recursive splitting so that taskloop tasks have same parent tasks. Differential Revision: https://reviews.llvm.org/D80577	2020-06-01 17:51:02 +03:00
Joachim Protze	3895148d7c	[OpenMP] Fix a race in task queue reallocation __kmp_realloc_task_deque implicitly assumes, that the task queue is full (ntasks == size), therefore tail = size in line 319. An assertion is added to document this assumption. The first check for a full queue is before the locking and might not hold when the lock is taken. So, we need to check again for this condition when we have the lock. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D80480	2020-05-25 10:23:22 +02:00
AndreyChurbanov	57d8b8d6f0	[openmp] Fixed hang if detached task was serialized. The patch fixes https://bugs.llvm.org/show_bug.cgi?id=45904. Differential Revision: https://reviews.llvm.org/D79944	2020-05-18 15:32:13 +03:00
Joachim Protze	d23131a3c0	[OpenMP] Fix race condition in the completion/freeing of detached tasks Spurious assertion failures are symptoms of a race condition for the handling of detached tasks: Assertion failure at kmp_tasking.cpp(3744): taskdata->td_flags.complete == 1. Assertion failure at kmp_tasking.cpp(710): taskdata->td_flags.executing == 0. in the case of detach=true, all accesses to taskdata in __kmp_task_finish need to happen before (~line 873): taskdata->td_flags.proxy = TASK_PROXY; This assignment signals to __kmp_fulfill_event, that the task will need to be freed there. So, conceptually the ownership of taskdata is moved. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D79702	2020-05-17 12:28:38 +02:00
Manoel Roemmer	6b9e43c67e	[Openmp][VE] Libomptarget plugin for NEC SX-Aurora This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector Engine (VE target). The code is largely based on the existing generic-elf plugin and uses the NEC VEO and VEOSINFO libraries for offloading. Differential Revision: https://reviews.llvm.org/D76843	2020-05-12 10:47:30 +02:00
Joel E. Denny	dd5ba4b585	[OpenMP][NFC] Fix `not` substitution in tests D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites. However, that would corrupt a command like `FileCheck -implicit-check-not` or any file name like `%t.not`. We could use lookbehind/lookahead assertions to avoid such cases, but this patch switches to `%not` (suggested during the D78566 review) as a safer option. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D79529	2020-05-11 14:53:48 -04:00
Shilei Tian	cb038927ef	[OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices Summary: There is a typo in DeviceRTLTy::getNumOfDevices that the type of its return value is bool. It will lead to a problem of wrong device number returned from omp_get_num_devices. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D79255	2020-05-03 15:59:06 -04:00
Ron Lieberman	ee9c53d271	[libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by reference to Device.getOrAllocTgtPtr which in turn did not assign on all paths within the function. This resulted in occasional runtime failures in one application. Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function. Differential Revision: https://reviews.llvm.org/D78744	2020-04-24 15:33:37 -05:00
Joel E. Denny	5f6aa9680c	[OpenMP] target_data_begin: fail on device alloc fail Without this patch, target_data_begin continues after an illegal mapping or an out-of-memory error on the device. With this patch, it terminates the runtime with an error instead. The new test exercises only illegal mappings. I didn't think of a good way to exercise out-of-memory errors from the test suite. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78170	2020-04-21 17:10:50 -04:00
Joel E. Denny	ba942610f6	[OpenMP] Add scaffolding for negative runtime tests Without this patch, the openmp project's test suites do not appear to have support for negative tests. However, D78170 needs to add a test that an expected runtime failure occurs. This patch makes `not` visible in all of the openmp project's test suites. In all but `libomptarget/test`, it should be possible for a test author to insert `not` before a use of the lit substitution for running a test program. In `libomptarget/test`, that substitution is target-specific, and its value is `echo` when the target is not available. In that case, inserting `not` before a lit substitution would expect an `echo` fail, so this patch instead defines a separate lit substitution for expected runtime fails. Reviewed By: jdoerfert, Hahnfeld Differential Revision: https://reviews.llvm.org/D78566	2020-04-21 17:10:50 -04:00
Bryan Chan	b86ff5f6ef	[OpenMP] Sync writes to child thread's data before reduction On systems with weak memory consistency, this patch fixes an intermittent crash in the reduction function called by __kmp_hyper_barrier_gather, which suffers from a race on a child thread's data. Reviewed-By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D77603	2020-04-14 14:31:06 -04:00
Shilei Tian	4031bb982b	[OpenMP] Refined CUDA plugin to put all CUDA operations into class Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: jfb, yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77951	2020-04-13 13:32:46 -04:00
Shilei Tian	feed674dec	[OpenMP] Introduce stream pool to make sure the correctness of device synchronization Summary: In previous patch, in order to optimize performance, we only synchronize once for each target region. The synchronization is via stream synchronization. However, in the extreme situation, the performance might be bad. Consider the following case: There is a task that requires transferring huge amount of data (call many times of data transferring function). It is scheduled to the first stream. And then we have 255 very light tasks scheduled to the remaining 255 streams (by default we have 256 streams). They can be finished before we do synchronization at the end of the first task. Next, we get another very huge task. It will be scheduled again to the first stream. Now the first task finishes its kernel launch and call stream synchronization. Right now, the stream already contains two kernels, and the synchronization will wait until the two kernels finish instead of just the first one for the first task. In this patch, we introduce stream pool. After each synchronization, the stream will be returned back to the pool to make sure that for each synchronization, only expected operations are waited. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77412	2020-04-11 07:08:56 -04:00
Shilei Tian	03ff643d2e	[OpenMP] Put old APIs back and added new _async series for backward compatibility Summary: According to comments on bi-weekly meeting, this patch put back old APIs and added new `_async` series Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77822	2020-04-09 22:40:58 -04:00
Shilei Tian	32ed29271f	[OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream Summary: This patch introduces two things for offloading: 1. Asynchronous data transferring: those functions are suffixed with `_async`. They have one more argument compared with their synchronous counterparts: `__tgt_async_info`, which is a new struct that only has one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, like in this case, or operation synchronization, which is also used. We may expect more usages in the future. 2. Optimization of stream selection for data mapping. Previous implementation was using asynchronous device memory transfer but synchronizing after each memory transfer. Actually, if we say kernel A needs four memory copy to device and two memory copy back to host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into a same stream and just need synchronization after memory copy from device to host. In this way, we can save a huge overhead compared with synchronization after each operation. Reviewers: jdoerfert, ye-luo Reviewed By: jdoerfert Subscribers: yaxunl, lildmh, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77005	2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki	4201679110	[OpenMP] NFC: Fix trivial typo Differential Revision: https://reviews.llvm.org/D77430	2020-04-04 12:06:54 +09:00
Vitaly Buka	c9ae3c5e10	[openmp] Disable tests flaky on Debian https://bugs.llvm.org/show_bug.cgi?id=45397	2020-04-01 21:58:05 -07:00
JonChesterfield	09834f9761	[libomptarget][nfc] Move non-freestanding headers out of common Summary: [libomptarget][nfc] Move non-freestanding headers out of common Lowers the bar for building deviceRTL. Drops math.h entirely as it wasn't used and libm is a big dependency. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D77071	2020-03-31 23:43:18 +01:00
Alexey Bataev	0fca766458	[OPENMP50]Fix PR45117: Orphaned task reduction should be allowed. Add support for orphaned task reductions.	2020-03-27 17:47:30 -04:00
Henry Kao	236ac68fa5	[OpenMP] Add memory barrier to solve data race Data race occurs when acquiring lock for critical section triggering assertion failure. Added barrier to ensure all memory is committed before checking assertion. Reviewed By: Hahnfeld Differential Revision: https://reviews.llvm.org/D76780	2020-03-27 16:32:28 -04:00
Jon Chesterfield	856c995436	[libomptarget] Add missing elf_end call in elf_common.c Summary: [libomptarget] Add missing elf_end call in elf_common.c Noticed when reviewing D76843. Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76874	2020-03-26 19:07:33 +00:00
JonChesterfield	0813f41005	[libomptarget][nfc] Explicitly static function scope shared variables Summary: [libomptarget][nfc] Explicitly static function scope shared variables `__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1 in CUDA_C_Programming_Guide.pdf, http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ This is surprising for non-cuda developers, see e.g. D73239 where I thought local variables would be thread local. Tested by IR diff of libomptarget.bc (no change), running in tree tests, and binary diff of the nvcc static archives (no significant change). Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D76713	2020-03-24 18:51:50 +00:00
AndreyChurbanov	ae044467ed	[openmp][runtime] Fixed hang for explicit task inside a taskloop. Added missed initialization of td_last_tied field for taskloop tasks. Differential Revision: https://reviews.llvm.org/D75673	2020-03-23 20:07:30 +03:00
Sylvestre Ledru	72fd1033ea	Doc: Links should use https	2020-03-22 22:49:33 +01:00
JonChesterfield	298527587c	[libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving.	2020-03-21 00:01:03 +00:00
George Rokos	0a42c9bfe4	Enable CUDA offloading on aarch64 host Differential Revision: https://reviews.llvm.org/D76469	2020-03-20 15:38:47 -07:00
Tom Scogland	a23d7282ca	openmp: fix memcpy memory leak Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72637	2020-03-12 23:24:16 -05:00
Alexey Bataev	c422d69b1a	[LIBOMPTARGET]Fix PR45139: Bug in mixing Python and OpenMP target offload. Summary: Explicitly initialize data members of RTLsTy class upon construction. Reviewers: grokos Subscribers: guansong, openmp-commits, caomhin, kkwli0 Tags: #openmp Differential Revision: https://reviews.llvm.org/D75946	2020-03-11 09:12:02 -04:00
Jonas Hahnfeld	f0689d2e62	archer: Remove superfluous dot from warning message	2020-03-06 15:19:30 +01:00
Jon Chesterfield	221ada654b	[libomptarget] Implement locks for amdgcn Summary: [libomptarget] Implement locks for amdgcn The nvptx implementation deadlocks on amdgcn. atomic_cas with multiple active lanes can deadlock - if one lane succeeds, all the others are locked out. The set_lock implementation therefore runs on a single lane. Also uses a sleep intrinsic instead of the system clock for a probably minor performance improvement. The unset/test implementations may be revised later, based on code size / performance or similar concerns. This implements the lock at a per-wavefront scope. That's not strictly as specified, since openmp describes locks in terms of threads. I think the nvptx implementation provides true per-thread locking on volta and the same per-warp locking on other architectures. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75546	2020-03-05 20:25:31 +00:00
Jon Chesterfield	918a1065be	[libomptarget][nfc] Move GetWarp/LaneId functions into per arch code Summary: [libomptarget][nfc] Move GetWarp/LaneId functions into per arch code No code change for nvptx. Amdgcn currently has two implementations of GetLaneId, this patch keeps the one a colleague considered to be superior for our ISA. GetWarpId is currently the same function for amdgcn and nvptx, but I think it's cleaner to keep it grouped with all the others than to keep it in support.cu. Reviewers: jdoerfert, grokos, ABataev Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75587	2020-03-05 17:05:58 +00:00
Jon Chesterfield	84ac0dffd4	[libomptarget][nfc][amdgcn] Replace magic number with named intrinsic	2020-03-05 11:50:30 +00:00
Jon Chesterfield	133db44996	[libomptarget] Implement most hip atomic functions in terms of intrinsics Summary: [libomptarget] Implement hip atomic functions in terms of intrinsics All but atomicInc can be implemented using type generic clang intrinsics. There is not yet a corresponding intrinsic for atomicInc in clang, only one in LLVM. This patch leaves atomicInc as an unresolved symbol. Reviewers: jdoerfert, ABataev, hfinkel, grokos, arsenm Reviewed By: arsenm Subscribers: sri, saiislam, wdng, jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73076	2020-03-04 17:56:40 +00:00
AndreyChurbanov	95df6747cf	[openmp] OpenMP 5.1 omp_display_env function implementation. Patch by Michael Klemm. Differential Revision: https://reviews.llvm.org/D74956	2020-03-04 18:15:05 +03:00
Jon Chesterfield	ad3d021b9e	[libomptarget][nfc][amdgcn] Simplify assert_fail implementation	2020-03-03 18:24:51 +00:00
Alexey Bataev	c4a9d976c1	[LIBOMPTARGET]Lower priority of global constructor/destructor to silence the warning from gcc. Summary: fixed the warning from gcc since prios 0-100 are reserved for the internal use. Reviewers: grokos Subscribers: kkwli0, caomhin, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D75458	2020-03-02 15:15:11 -05:00
Alexey Bataev	63cef621f9	[LIBOMPTARGET]Fix PR44933: fix crash because of the too early deinitialization of libomptarget. Summary: Instead of using global variables with unpredicted time of deinitialization, use dynamically allocated variables with functions explicitly marked as global constructor/destructor and priority. This allows to prevent the crash because of the incorrect order of dynamic libraries deinitialization. Reviewers: grokos, hfinkel Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74837	2020-02-25 15:54:37 -05:00
Kelvin Li	e16e267bb6	[OpenMP][cmake] ignore warning on unknown CUDA version Differential Revision: https://reviews.llvm.org/D75001	2020-02-25 09:29:07 -05:00
Shoaib Meenai	e34ddc09f4	[arcconfig] Delete subproject arcconfigs From https://secure.phabricator.com/book/phabricator/article/arcanist_new_project/: > An .arcconfig file is a JSON file which you check into your project's root. I've done some experimentation, and it looks like the subproject .arcconfigs just get ignored, as the documentation says. Given that we're fully on the monorepo now, it's safe to remove them. Differential Revision: https://reviews.llvm.org/D74996	2020-02-24 16:20:36 -08:00
serge-sans-paille	99b03c1c18	Detect and disable openmp tests that require multiple hardware processor to run Team tests seem to require at least two physical cores, and using the same trick as in https://reviews.llvm.org/D55598 doesn't work (why?) . Using lit configuration instead. Differential Revision: https://reviews.llvm.org/D74921	2020-02-21 14:02:12 +01:00
Yuanfang Chen	c2c4f1c120	[openmp][cmake] passing option argument correctly From the context, it looks like the test should not be run with `check-all`, but it does. It turns out option argument resolving to True/False which could not be passed down as is. There is one such example in AddLLVM.cmake.	2020-02-13 09:33:58 -08:00
Alexey Bataev	578c13d13c	[OPENMP]Fix the test, NFC.	2020-02-13 10:40:06 -05:00
Ethan Stewart	190a11148b	Changed omp_get_max_threads() implementation to more closely match spec description. Summary: The 5.0 spec states, "The omp_get_max_threads routine returns an upper bound on the number of threads that could be used to form a new team if a parallel construct without a num_threads clause were encountered after execution returns from this routine." The attached test shows Max Threads: 96, Num Threads: 128 without the proposed change. The number of threads should not exceed the (max) nthreads ICV, hence we should return the higher SPMD thread number even when omp_get_max_threads() is called in a generic kernel. This change does fail the api test, max_threads.c, because now it would return 64 instead of 32. Reviewers: jdoerfert, ABataev, grokos, JonChesterfield Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74092	2020-02-12 23:29:34 +00:00
JonChesterfield	c2ce9ea4e3	[libomptarget][nfc] Change enum values to match those in cuda/rtl Summary: [libomptarget][nfc] Change enum values to match those in cuda/rtl support.h and cuda/rtl.cpp (and downstream hsa/rtl.cpp) have enums for execution mode. These are actually independent - the numbers that are used within support, or within the plugin, are never passed across the boundary. Nevertheless, trying to work out why the values are different between the two has generated a reasonable amount of confusion. This patch changes support to match the values in plugin, on the basis that the plugin also has some comments which I'd have to update if I changed that one instead. Credit to Ron for working through this in our own fork. See rocm-developer-tools/aomp/issues/7 for that earlier diagnostic write up. Also happy with generic = 0, spmd = 1 - provided it's the same in both places. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: grokos Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D74503	2020-02-12 23:27:08 +00:00
Kelvin Li	4f1f2b7a5b	[OpenMP] update strings output of libomp.so [NFC] Change the string from "Intel(R) OMP" to "LLVM OMP" in libomp.so Differential Revision: https://reviews.llvm.org/D74462	2020-02-12 15:45:55 -05:00
Johannes Doerfert	a5153dbc36	[OpenMP][Offloading] Added support for multiple streams so that multiple kernels can be executed concurrently Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D74145	2020-02-11 22:07:14 -06:00
Johannes Doerfert	3ff4e2eee8	[OpenMP] Switch default C++ standard to C++ 14 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74258	2020-02-11 17:11:54 -06:00
Jonas Devlieghere	4fe839ef3a	[CMake] Rename EXCLUDE_FROM_ALL and make it an argument to add_lit_testsuite EXCLUDE_FROM_ALL means something else for add_lit_testsuite as it does for something like add_executable. Distinguish between the two by renaming the variable and making it an argument to add_lit_testsuite. Differential revision: https://reviews.llvm.org/D74168	2020-02-06 15:33:18 -08:00
Jon Chesterfield	6a82f0f0b9	[libomptarget] Implement wavefront functions for amdgcn Summary: [libomptarget] Implement wavefront functions for amdgcn Reviewers: jdoerfert, ABataev, grokos, arsenm Reviewed By: arsenm Subscribers: saiislam, wdng, arsenm, jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73077	2020-02-04 21:55:29 +00:00
protze@itc.rwth-aachen.de	90e4ebdce5	[OpenMP][OMPT] fix reduction test for 32-bit x86 Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=44733 \| TEST 'libomp :: ompt/synchronization/reduction/tree_reduce.c' FAILED on 32-bit x86 ]] For 32-bit we need at least 3 variables to avoid atomic reduction to be chosen by runtime function `__kmp_determine_reduction_method`. This patch adds reduction variables to the testcase. Reviewers: mgorny, Hahnfeld Differential Revision: https://reviews.llvm.org/D73850	2020-02-04 12:19:10 +01:00
Jon Chesterfield	ab9762a9f5	Revert "[nfc][libomptarget] Remove SHARED annotation from local variables" This reverts commit `0e9374e374`. Revert D73239. It fails some local testing, cause presently unknown	2020-01-27 20:05:17 +00:00
Michał Górny	3c545e4b73	[openmp] Disable archer if LIBOMP_OMPT_SUPPORT is off This fixed build failures due to missing ompt headers. See https://bugs.gentoo.org/700762. Differential Revision: https://reviews.llvm.org/D73249	2020-01-23 19:26:18 +01:00
Kelvin Li	ad24cf2a94	[OpenMP] change omp_atk_* and omp_atv_* enumerators to lowercase [NFC] The OpenMP spec defines the OMP_ATK_* and OMP_ATV_* to be lowercase. Differential Revision: https://reviews.llvm.org/D73248	2020-01-23 11:15:44 -05:00
Jon Chesterfield	0e9374e374	[nfc][libomptarget] Remove SHARED annotation from local variables Summary: [nfc][libomptarget] Remove SHARED annotation from local variables A few local variables in reduction.cu were marked SHARED. This patch leaves all per-kernel global state localised in omp_data.cu. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D73239	2020-01-23 00:00:23 +00:00
Alexey Bataev	9148b8b734	[OpenMP][Offloading] Fix the issue that omp_get_num_devices returns wrong number of devices, by Shiley Tian. Summary: This patch is to fix issue in the following simple case: #include <omp.h> #include <stdio.h> int main(int argc, char *argv[]) { int num = omp_get_num_devices(); printf("%d\n", num); return 0; } Currently it returns 0 even devices exist. Since this file doesn't contain any target region, the host entry is empty so further actions like initialization will not be proceeded, leading to wrong device number returned by runtime function call. Reviewers: jdoerfert, ABataev, protze.joachim Reviewed By: ABataev Subscribers: protze.joachim Tags: #openmp Differential Revision: https://reviews.llvm.org/D72576	2020-01-21 13:25:18 -05:00
David Carlier	ea99c09963	[OpenMP] affinity little fix for FreeBSD - pthread affinity np has different semantic than sched affinity counterpart. On success returns strictly 0. Reviewers: chandlerc, AndreyChurbanov, jdoerfert Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72132	2020-01-20 18:52:10 +00:00
Jon Chesterfield	03c2a59cd6	[libomptarget] Implement smid for amdgcn Summary: [libomptarget] Implement smid for amdgcn Implementation is in a new file as it uses an intrinsic with complicated encoding that warranted substantial comments. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72956	2020-01-20 14:52:17 +00:00
Joachim Protze	39f746d8de	[OpenMP][Tool] Fix memory leak and double-allocation Fix the memory leak pointed out in https://reviews.llvm.org/D70412. And a second one due to double-allocation. Reviewed by: Hahnfeld Differential revision: https://reviews.llvm.org/D72779	2020-01-16 10:05:06 -10:00
George Rokos	e244145ab0	[LIBOMPTARGET] Do not increment/decrement the refcount for "declare target" objects The reference counter for global objects marked with declare target is INF. This patch prevents the runtime from incrementing /decrementing INF refcounts. Without it, the map(delete: global_object) directive actually deallocates the global on the device. With this patch, such a directive becomes a no-op. Differential Revision: https://reviews.llvm.org/D72525	2020-01-14 16:30:38 -08:00
Joachim Protze	2d4571bf30	[OpenMP][Tool] Runtime warning for missing TSan-option TSan spuriously reports for any OpenMP application a race on the initialization of a runtime internal mutex: ``` Atomic read of size 1 at 0x7b6800005940 by thread T4: #0 pthread_mutex_lock <null> (a.out+0x43f39e) #1 __kmp_resume_64 <null> (libomp.so.5+0x84db4) Previous write of size 1 at 0x7b6800005940 by thread T7: #0 pthread_mutex_init <null> (a.out+0x424793) #1 __kmp_suspend_initialize_thread <null> (libomp.so.5+0x8422e) ``` According to @AndreyChurbanov this is a false positive report, as the control flow of the runtime guarantees the ordering of the mutex initialization and the lock: https://software.intel.com/en-us/forums/intel-open-source-openmp-runtime-library/topic/530363 To suppress this report, I suggest the use of TSAN_OPTIONS='ignore_uninstrumented_modules=1'. With this patch, a runtime warning is provided in case an OpenMP application is built with Tsan and executed without this Tsan-option. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D70412	2020-01-14 09:58:05 -10:00
Jon Chesterfield	2a43688a0a	[nfc][libomptarget] Refactor nvptx/target_impl.cu Summary: [nfc][libomptarget] Refactor nvptx/target_impl.cu Use __kmpc_impl_atomic_add instead of atomicAdd to match the rest of the file. Alternatively, target_impl.cu could use the cuda functions directly. Using a mixture in this file was an oversight, happy to resolve in either direction. Removed some comments that look outdated. Call __kmpc_impl_unset_lock directly to avoid a redundant diagnostic and remove an implicit dependency on interface.h. Reviewers: ABataev, grokos, jdoerfert Reviewed By: jdoerfert Subscribers: jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72719	2020-01-14 19:27:45 +00:00
Jon Chesterfield	2d287bec3c	[nfc][libomptarget] Refactor amdgcn target_impl Summary: [nfc][libomptarget] Refactor amdgcn target_impl Removes references to internal libraries from the header Standardises on C++ mangling for all the target_impl functions Update comment block clang-format Move some functions into a new target_impl.hip source file This lays the groundwork for implementing the remaining unresolved symbols in the target_impl.hip source. Reviewers: jdoerfert, grokos, ABataev, ronlieb Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72712	2020-01-14 19:27:07 +00:00
Joachim Protze	ed810da732	[OpenMP][Tool] Improving stack trace for Archer The OpenMP runtime is not instrumented, so entering the runtime leaves no hint on the source line of the pragma on ThreadSanitizer's function stack. This patch adds function entry/exit annotations for OpenMP parallel regions, and synchronization regions (barrier, taskwait, taskgroup). Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D70408	2020-01-13 22:14:06 -10:00
Joachim Protze	84637408f2	[OpenMP][Tool] Make tests for archer dependent on TSan If the openmp project is built standalone, the test compiler is feature tested for an available -fsanitize=thread flag. If the openmp project is built as part of llvm, the target tsan is needed to test archer. An additional line (requires tsan) was introduced to the tests, this patch updates the line numbers for the race. Follow-up for `77ad98c` Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D71914	2020-01-13 21:47:58 -10:00
Alexey Bataev	b19c0810e5	[LIBOMPTARGET]Ignore empty target descriptors. Summary: If the dynamically loaded module has been compiled with -fopenmp-targets and has no target regions, it has empty target descriptor. It leads to a crash at the runtime if another module has at least one target region and at least one entry in its descriptor. The runtime library is unable to load the empty binary descriptor and terminates the execution. Caused by a clang-offload-wrapper. Reviewers: grokos, jdoerfert Subscribers: caomhin, kkwli0, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72472	2020-01-10 09:45:27 -05:00
Kazuaki Ishizaki	4c6a098ad5	[OpenMP] NFC: Fix trivial typos in comments Reviewers: jdoerfert, Jim Reviewed By: Jim Subscribers: Jim, mgorny, guansong, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D72285	2020-01-07 14:05:03 +08:00
Kelvin Li	19433b199d	[OpenMP] Fix incorrect property of __has_attribute() macro __has_attribute(fallthough) -> __has_attribute(fallthrough) Submitted by: kiszk (Kazuaki Ishizaki <ishizaki@jp.ibm.com>) Differential Revision: https://reviews.llvm.org/D72287	2020-01-06 15:00:10 -05:00
Kelvin Li	ed5fe64581	[OpenMP] NFC: Fix trivial typos in comments Submitted by: kiszk Differential Revision: https://reviews.llvm.org/D72171	2020-01-03 22:03:42 -05:00
Jon Chesterfield	bc48af8c57	[libomptarget][nfc] Change unintentional target_impl prefix to kmpc_impl	2019-12-30 20:50:23 +00:00
protze@itc.rwth-aachen.de	3356e268f6	[OpenMP] Implementation of OMPT reduction callbacks Including two tests These callbacks were added late to the 5.0 specification, an implementation is missing. Reviewed By: jdoerfert Differential Review: https://reviews.llvm.org/D70395	2019-12-27 15:30:51 +01:00
Jon Chesterfield	63e2aa5658	[libomptarget][nfc] Provide target_impl malloc/free Summary: [libomptarget][nfc] Provide target_impl malloc/free Sufficient to build support.cu for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71685	2019-12-19 16:54:28 +00:00
JonChesterfield	b40822fc14	[libomptarget][nvptx] Fix build, second symbol reordering	2019-12-19 02:02:44 +00:00
Jon Chesterfield	89a2bef27a	[libomptarget][nvptx] Fix build, symbol ordering in target_impl.h	2019-12-19 01:50:06 +00:00
JonChesterfield	9aefe5f65e	[libomptarget][amdgcn] Correct return type of extern __clock64 to unsigned	2019-12-19 00:11:21 +00:00
Jon Chesterfield	2caeaf2f45	[libomptarget][nfc] Introduce atomic wrapper function Summary: [libomptarget][nfc] Introduce atomic wrapper function Wraps atomic functions in a template prefixed __kmpc_atomic that dispatches to cuda or hip atomic functions. Intended to be easily extended to dispatch to OpenCL or C++ atomics for a third target. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: Anastasia, jvesely, mgrang, dexonsmith, llvm-commits, mgorny, jfb, openmp-commits Tags: #openmp, #llvm Differential Revision: https://reviews.llvm.org/D71404	2019-12-18 20:06:17 +00:00
JonChesterfield	8adae6027c	[libomptarget][nfc] Extract function from data_sharing, move to common Summary: [libomptarget][nfc] Extract function from data_sharing, move to common Finding the first active thread in the warp is different on nvptx and amdgcn, mostly due to warp size and the desire for efficiency. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71643	2019-12-18 19:39:35 +00:00
Alexey Bataev	15d47deedd	[LIBOPENMP][NVPTX]Fix the build error in the runtime.	2019-12-17 14:46:04 -05:00
JonChesterfield	0c83f8ccc7	[libomptarget][nfc] Move three files under common, build them for amdgcn Summary: [libomptarget][nfc] Move three files under common, build them for amdgcn Change to reduction.cu to remove two dead includes, otherwise no code change. Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71601	2019-12-17 18:02:49 +00:00
JonChesterfield	3d3e4076cd	[libomptarget][nfc] Move omp locks under target_impl Summary: [libomptarget][nfc] Move omp locks under target_impl These are likely to be target specific, even down to the lock_t which is correspondingly moved out of interface.h. The alternative is to include interface.h in target_impl which substantially increases the scope of those symbols. The current nvptx implementation deadlocks on amdgcn. The preferred implementation for that arch is still under discussion - this change leaves declarations in target_impl. The functions could be inline for nvptx. I'd prefer to keep the internals hidden in the target_impl translation unit, but will add the (possibly renamed) macros to target_impl.h if preferred. Reviewers: ABataev, jdoerfert, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71574	2019-12-17 12:18:57 +00:00
Jon Chesterfield	ce12a523b0	[libomptarget][nfc] Move timer functions behind target_impl Summary: [libomptarget][nfc] Move timer functions behind target_impl Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71584	2019-12-17 02:22:29 +00:00
Jon Chesterfield	53bcd1e141	[libomptarget][nfc] Wrap cuda min() in target_impl Summary: [libomptarget][nfc] Wrap cuda min() in target_impl nvptx forwards to cuda min, amdgcn implements directly. Sufficient to build parallel.cu for amdgcn, added to CMakeLists. All call sites are homogenous except one that passes a uint32_t and an int32_t. This could be smoothed over by taking two type parameters and some care over the return type, but overall I think the inline <uint32_t> calling attention to what was an implicit sign conversion is cleaner. Reviewers: ABataev, jdoerfert Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71580	2019-12-17 01:30:04 +00:00
JonChesterfield	69fcc6ecc1	Revert "Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn"" Summary: This reverts commit `dd8a7fcdd7`. Alexey reports undefined symbols for the new inline functions defined in target_impl.h This does not reproduce for me for nvptx, or amdgcn, under release or debug builds. I believe the patch is fine, based on: - the semantics of an inline function in C++ (the cuda INLINE functions end up as linkonce_odr in IR), which are only legal to drop if they have no uses - the code generated from a debug build of clang 9 does not show these undef symbols - the tests pass - the code is trivial To progress from here I either need: - A tie break - someone to play the role of CI in determining whether the patch works - Alexey to provide sufficient information about his build for me to reproduce the failure - Alexey to debug why the symbols are disappearing for him and report back Reviewers: ABataev, jdoerfert, grokos Subscribers: jvesely, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71502	2019-12-16 16:16:14 +00:00
Alexey Bataev	dd8a7fcdd7	Revert "[libomptarget] Move resource id functions into target specific code, implement for amdgcn" This reverts commit `dbb3fec8ad` since it breaks the NVPTX tests.	2019-12-13 16:36:06 -05:00
Jon Chesterfield	40d72134fd	[libomptarget] Build most of common/src for amdgcn Summary: [libomptarget] Build most of common/src for amdgcn Excluding parallel.cu, which uses an integer min() from cuda, Excluding support.cu, which calls malloc that is not yet available for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: gregrodgers, ronlieb, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71446	2019-12-13 17:48:19 +00:00
Jon Chesterfield	56adcebfda	[libomptarget][nfc] Add nop syncwarp function for amdgcn	2019-12-13 14:27:52 +00:00
Jon Chesterfield	479868646a	[libomptarget][nfc] Add declarations of atomic functions for amdgcn Summary: [libomptarget][nfc] Add declarations of atomic functions for amdgcn This enables building more source for amdgcn. The functions are usually available in a hip runtime header, but are duplicated here to decouple the implementation Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71412	2019-12-12 22:56:14 +00:00
Jon Chesterfield	dbb3fec8ad	[libomptarget] Move resource id functions into target specific code, implement for amdgcn Summary: [libomptarget] Move resource id functions into target specific code, implement for amdgcn Reviewers: jdoerfert, ABataev, grokos Reviewed By: jdoerfert Subscribers: jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71382	2019-12-12 22:49:02 +00:00
Jon Chesterfield	b399252028	[libomptarget][nfc] Add missing header for amdgcn/target_impl	2019-12-12 09:36:57 +00:00
David Carlier	27535a1449	[OpenMP] Fix linkage issue on FreeBSD needs kmp_set_thread_affinity_mask_initial implementation.	2019-12-06 15:47:50 +00:00
JonChesterfield	0dd62c5c2e	[libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Summary: [libomptarget][nfc] Move cuda threadfence functions behind kmpc_impl Part of building code under common/ without requiring a cuda compiler Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: jvesely, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71102	2019-12-06 15:41:18 +00:00
Jon Chesterfield	cd90f49d70	[libomptarget][nfc] Move three more files to common Summary: [libomptarget][nfc] Move three more files to common Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71103	2019-12-06 15:29:50 +00:00
Jon Chesterfield	4af84d2686	[libomptarget][nfc] Introduce SHARED, ALIGN macros Summary: [libomptarget][nfc] Introduce SHARED, ALIGN macros Move remaining cuda attributes behind such macros Reviewers: ABataev, jdoerfert, grokos Reviewed By: ABataev Subscribers: openmp-commits, jvesely Tags: #openmp Differential Revision: https://reviews.llvm.org/D71076	2019-12-05 21:57:58 +00:00
Jon Chesterfield	d0b9ed5c49	[libomptarget][nfc] Move omptarget-nvptx under common Summary: [libomptarget][nfc] Move omptarget-nvptx under common Almost all files depend on omptarget-nvptx, which no longer contains any obviously architecture dependent code. Moving it under common unblocks task/loop for amdgcn, and allows moving other code. At some point there should probably be a widespread symbol renaming to replace the nvptx string. I'd prefer to get things working first. Building this (and task.cu, loop.cu) without a cuda library requires some more refactoring, e.g. wrap threadfence(), use DEVICE macro more consistently. Patches for that are orthogonal and will be posted shortly. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: mgorny, fedor.sergeev, jfb, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D71073	2019-12-05 20:34:15 +00:00
JonChesterfield	3ada8d2a87	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn Repeat of D70414, with an include path fixed. Diff for sanity checking. The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: ABataev, jdoerfert Reviewed By: ABataev Subscribers: jvesely, mgorny, jfb, openmp-commits, jdoerfert Tags: #openmp Differential Revision: https://reviews.llvm.org/D70971	2019-12-04 16:43:37 +00:00
Alexey Bataev	02b9c5d963	Revert "[libomptarget] Build a minimal deviceRTL for amdgcn" This reverts commit `877ffa716f` because it breaks the build.	2019-12-03 12:35:08 -05:00
Jon Chesterfield	877ffa716f	[libomptarget] Build a minimal deviceRTL for amdgcn Summary: [libomptarget] Build a minimal deviceRTL for amdgcn The CMakeLists.txt file is functionally identical to the one used in the aomp fork. Whitespace changes were made based on nvptx/CMakeLists.txt, plus the copyright notice updated to match (Greg was the original author so would like his sign off on that here). This change will build a small subset of the deviceRTL if an appropriate toolchain is available, e.g. a local install of rocm. Support.h is moved from nvptx as a dependency of debug.h. Reviewers: jdoerfert, ABataev, grokos, ronlieb, gregrodgers Reviewed By: jdoerfert Subscribers: jfb, Hahnfeld, jvesely, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70414	2019-12-03 15:18:41 +00:00
Bryan Chan	4d3198e243	[OpenMP] build offload plugins before testing them Summary: "make check-all" or "make check-libomptarget" would attempt to run offloading tests before the offload plugins are built. This patch corrects that by adding dependencies to the libomptarget CMake rules. Reviewers: jdoerfert Subscribers: mgorny, guansong, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70803	2019-11-28 17:43:56 -05:00
AndreyChurbanov	bd2fb41c2d	[openmp] Fixed nonmonotonic schedule when #threads > #chunks in a loop. Differential Revision: https://reviews.llvm.org/D70713	2019-11-27 15:26:51 +03:00
AndreyChurbanov	5f8b8d2820	[openmp] Recognise ARMv7ve machine arch. Patch by raj.khem (Khem Raj) Differential Revision: https://reviews.llvm.org/D68543	2019-11-26 14:37:24 +03:00
protze@itc.rwth-aachen.de	77ad98c808	[OpenMP][Tool] archer tests require tsan Testing for tsan capability in the test-compiler in follow-up review	2019-11-22 17:11:16 +01:00
protze@itc.rwth-aachen.de	6b2431e0c2	[OpenMP][Tool] disable archer tests in standalone build Will be enabled after Build-Bots are fixed	2019-11-22 15:25:43 +01:00
protze@itc.rwth-aachen.de	ac21de0d7e	[OpenMP][Tool] Fix cmake variable in lit.site.cfg.in As noted in D45890	2019-11-22 14:31:54 +01:00
JonChesterfield	a84b48d01e	[nfc][libomptarget] Remove casts of string literals to char*	2019-11-19 19:41:59 +00:00
JonChesterfield	4681e2e434	[nfc][libomptarget] Write amdgcn macros in terms of compiler intrinsics	2019-11-19 17:23:46 +00:00
AndreyChurbanov	3a76b8a538	Fix openmp on PowerPC64-BE-ELFv2 ABI on FreeBSD. Patch by adalava (Alfredo Dal'Ava Júnior) Differential Revision: https://reviews.llvm.org/D67190	2019-11-19 19:45:06 +03:00
Aaron Puchert	b29c7fdb61	[OpenMP] Remove -Wl,-fini=__kmp_internal_end_fini Summary: The termination function duplicated the functionality of the __attribute((destructor))-annotated function __kmp_internal_end_fini, and we have no indication that this doesn't work. The function might cause issues with link-time optimization turned on: until very recently, none of the usual linkers was reporting functions named in -Wl,-fini as used to the LTO plugin, so it might be dropped. If the function is dropped, -Wl,-fini=__kmp_internal_end_fini doesn't do what we want: with ld.bfd and lld it drops the FINI attribute from .dynamic and with gold we get FINI = 0x0, which leads to a crash on cleanup. This can be reproduced by building with -DLLVM_ENABLE_PROJECTS="clang;openmp" \ -DLLVM_ENABLE_LTO=Thin \ -DLLVM_USE_LINKER=gold The issue in lld has been fixed in `f95273f75a`, but gold remains without fix so far. Fixes PR43927. Reviewers: JonChesterfield, jdoerfert, AndreyChurbanov Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D69927	2019-11-19 00:54:58 +01:00
Jon Chesterfield	5a4a05d776	[libomptarget][nfc] Move some source into common from nvptx Summary: [libomptarget][nfc] Move some source into common from nvptx Moves some source that compiles cleanly under amdgcn into a common subdirectory Includes some non-trivial files and some headers. Keeps the cuda file extension. The build systems for different architectures seem unlikely to have much in common. The idea is therefore to set include paths such that files under common/src compile as if they were under arch/src as the mechanism for sharing. In particular, files under common/src need to be able to include target_impl.h. The corresponding -Icommon is left out in favour of explicit includes on the basis that it makes it clearer which files under common are used by a given architecture. Reviewers: jdoerfert, ABataev, grokos Reviewed By: ABataev Subscribers: jfb, mgorny, openmp-commits Tags: #openmp Differential Revision: https://reviews.llvm.org/D70328	2019-11-18 18:17:36 +00:00
protze@itc.rwth-aachen.de	2b8115b10b	[OpenMP] Add implementation and tests of Archer tool The tool provides TSAN annotations for OpenMP synchronization. The tool is activated if no other OMPT tool is loaded. The tool detects whether the application was built with TSan and rejects activation according to the OMPT protocol if there is no TSan-rt. Differential Revision: https://reviews.llvm.org/D45890	2019-11-18 14:45:34 +01:00
Sylvestre Ledru	9b40a7f3bf	Remove +x permission on some files	2019-11-16 14:47:20 +01:00

1543 Commits