llvm-project

Commit Graph

Author	SHA1	Message	Date
Hansang Bae	95cefacfe1	[OpenMP] Fix crashing critical section with hint clause Runtime was using the default lock type without using the hint. Differential Revision: https://reviews.llvm.org/D102955	2021-05-24 17:25:01 -05:00
Dhruva Chakrabarti	ca17b26d4d	[libomptarget] [amdgpu] Fix copy-paste error setting NumThreads for a corner case. Fix the case where NumTeams was set incorrectly instead of NumThreads Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D103037	2021-05-24 15:23:15 -07:00
Pushpinder Singh	486110eb41	[AMDGPU][Libomptarget] Remove global KernelNameMap KernelNameMap contains entries like "key.kd" => key which clearly could be replaced by simple logic of removing suffix from the key. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102691	2021-05-24 08:46:08 +00:00
AndreyChurbanov	aa6e7e8da8	[OpenMP] libomp: move warnings to after library initialization Warnings on deprecated api cannot be suppressed if the library is not initialized. With this change it is possible to set KMP_WARNINGS=false to suppress the warnings. Differential Revision: https://reviews.llvm.org/D102676	2021-05-21 23:47:23 +03:00
George Rokos	d0bc04d6b9	[libomptarget] Fix a bug whereby firstprivates are not copied over to the device The check for the TO flag when processing firstprivates is missing. As a result, sometimes the device copy of a firstprivate never gets initialized. Currently we try to force lambda structs to be allocated immediately by marking them as a non-firstprivate, so that PrivateArgumentManagerTy::addArg allocates memory for them immediately. However, calling addArg with IsFirstPrivate=false makes the function skip initializing the device copy. Whether an argument is firstprivate and whether we need to allocate memory immediately are not synonyms, so this patch introduces one more control variable for immediate allocation and sets it apart from initialization. Differential Revision: https://reviews.llvm.org/D102890	2021-05-21 10:52:08 -07:00
Jon Chesterfield	d54712ab4d	[libomptarget][amdgpu] Mark alloc, free weak to facilitate local experimentation There are a lot of different ways we might implement the devicertl local alloc and free functions. Via host, local buffers (stack or arena), specialising per kernel etc. It is not yet clear what the right design is. This change makes the alloc and free functions weak, so one can override them from local tests while comparing options. Not strictly necessary, as a comparable patch can be applied locally each time, but would be convenient for out of tree dev. Plan would be to drop the weak attribute at the same time as introducing a working allocator to trunk. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102499	2021-05-21 16:09:22 +01:00
Jon Chesterfield	68b88ae670	[libomptarget] Improve dlwrap compile time error diagnostic The dlwrap interface takes an explicit arity, e.g. DLWRAP(cuAlloc, 2); This probably can't be eliminated as it controls the argument list of an external symbol, not an inline header function. If the arity given is too big, the error from clang referring to the line is in the middle of implementation details. /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1277:7: error: static_assert failed due to requirement '0UL < tuple_size<std::tuple<>>::value' "tuple index is in range" static_assert(__i < tuple_size<tuple<>>::value, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/tuple:1260:7: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:93:27 ... /home/amd/llvm-project/openmp/libomptarget/plugins/cuda/dynamic_cuda/cuda.cpp:34:1: note: in instantiation of template class 'dlwrap::trait<cudaError_enum ()(unsigned long , unsigned long)>::arg<2>' requested here DLWRAP(cuMemAlloc, 3); ^ /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:51:31: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:166:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:133:3: ... /home/amd/llvm-project/openmp/libomptarget/include/dlwrap.h:186:37: ... 
If the arity is too small, the diagnostic is better: cuda/dynamic_cuda/cuda.cpp:34:1: error: too few arguments to function call, expected 2, have 1 DLWRAP(cuMemAlloc, 1); This patch changes the diagnostic to: cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '1 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 1); or cuda/dynamic_cuda/cuda.cpp:34:1: error: static_assert failed due to requirement '3 == trait<cudaError_enum ()(unsigned long , unsigned long)>::nargs' "Arity Error" DLWRAP(cuMemAlloc, 3); Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102858	2021-05-20 20:33:36 +01:00
Jon Chesterfield	d18fb09c69	[libomptarget][amdgpu] Remove majority of fatal errors Replaces most calls to exit() with returning an error to the library entry point. Minor changes to error handling for clear bugs, remove some dead code. Each exit() call site replaced is either in a library entry point or a function that already returns error codes on some paths. The existing handling is not well tested but replacing exit() with a fallback path should be a strict improvement. Remaining two early exit points are an abort() from a callback and exit() from within msgpack. Fixes for those are less obvious and left for a later patch. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102346	2021-05-20 16:26:43 +01:00
Jon Chesterfield	ea68ad6e26	[libomptarget] Disable test bug49334 on amdgpu Hangs on amdgpu, do not know why. Disable to unblock build. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D102017	2021-05-20 15:46:56 +01:00
Pushpinder Singh	d7503c3bce	[AMDGPU][Libomptarget] Rename & move g_executables to private This patch moves g_executables to private member of Runtime class and is renamed to HSAExecutables following LLVM naming convention. This movement required making Runtime::Initialize and Runtime::Finalize non-static. Verified the correctness of this change by running libomptarget tests on gfx906. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102600	2021-05-18 05:43:23 +00:00
Pushpinder Singh	3bc2b97b34	[AMDGPU][libomptarget] Remove unused global variables This initial patch removes some unused variables from global namespace. There will be more incoming patches for moving global variables to classes or static members. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D102598	2021-05-18 05:40:49 +00:00
Shilei Tian	af6511d730	[OpenMP] Fixed Bug 49356 Bug 49356 (https://bugs.llvm.org/show_bug.cgi?id=49356) reports crash in the test case `tasking/bug_taskwait_detach.cpp`, which is caused by the wrong function declaration. `gtid` in `__kmpc_omp_task` should be `kmp_int32`. Reviewed By: AndreyChurbanov Differential Revision: https://reviews.llvm.org/D102584	2021-05-17 12:14:54 -04:00
Aakanksha Patil	464e4dc50f	[AMDGPU] Add gfx1034 target Differential Revision: https://reviews.llvm.org/D102306	2021-05-13 14:25:18 -04:00
Jon Chesterfield	10de217209	[libomptarget][amdgpu] Fix truncation error for partial wavefront The partial barrier implementation involves one wavefront resetting and N-1 waiting. This change future proofs against launching with a number of threads that is not a multiple of the wavefront size. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102407	2021-05-13 17:31:57 +01:00
Jon Chesterfield	b049870d3b	[libomptarget][amdgpu] Convert an assert to print and offload_fail The kernel launched is supposed to be present in the binary, but a not yet diagnosed bug means it is missing for some of the qmcpack test cases. Changing from assert to print and offload_fail should help diagnose that and similar bugs. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102378	2021-05-13 17:31:36 +01:00
Michael Kruse	34ed3e6337	[OpenMP] Test unified shared memory tests only on systems that support it. Add a `REQUIRES: unified_shared_memory` option to tests that use `#pragma omp requires unified_shared_memory`. For CUDA, the feature tag is derived from LIBOMPTARGET_DEP_CUDA_ARCH which itself is derived using [[ https://cmake.org/cmake/help/latest/module/FindCUDA.html#commands \| cuda_select_nvcc_arch_flags ]]. The latter determines which compute capability the GPU in the system supports. To ensure that this is the CUDA arch being used, we could also set the `-Xopenmp-target -march=` flag. In the absence of an NVIDIA GPU, LIBOMPTARGET_DEP_CUDA_ARCH will be 35. That is, in that case we are assuming unified_shared_memory is not available. CUDA plugin testing could be disabled entirely in this case, but this currently depends on `LIBOMPTARGET_CAN_LINK_LIBCUDA OR LIBOMPTARGET_FORCE_DLOPEN_LIBCUDA`, not on whether the hardware is actually available. For all other targets, nothing changes and we are assuming unified shared memory is available. This might need refinement if not the case. This tries to fix the [[ http://meinersbur.de:8011/#/builders/143 \| OpenMP Offloading Buildbot ]] that, although brand-new, only has a Pascal-generation (sm_61) GPU installed. Hence, tests that require unified shared memory are currently failing. I wish I had known in advance. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D101498	2021-05-13 11:08:04 -05:00
Jon Chesterfield	9934571eab	[libomptarget][amdgpu][nfc] Expand errorcheck macros These macros expand to continue, which is confusing, or exit, which is incompatible with continuing execution on offloading fail. Expanding the macros in place makes the code look untidy but the control flow obvious and amenable to improving. In particular, exit becomes easier to eliminate. Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102230	2021-05-12 17:30:41 +01:00
Christopher Pulido	4fb0aaf033	[OpenMP] Changes to enable MSVC ARM64 build of libomp This is the first in a series of changes to the OpenMP runtime that have been done internally by Microsoft. This patch makes the necessary changes to enable libomp.dll to build with the MSVC compiler targeting ARM64. Differential Revision: https://reviews.llvm.org/D101173	2021-05-11 23:03:12 +03:00
Jon Chesterfield	72995a4bdf	[libomptarget][nfc] Add hook to easily disable building amdgcn bclib This is useful when building LLVM with a toolchain that can't emit code for amdgcn, e.g. because it overrides the include search path with headers from another architecture, or the clang compiler is missing builtins. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D102229	2021-05-11 17:23:09 +01:00
Peyton, Jonathan L	c765d140fe	[OpenMP] Fix hidden helper + affinity When KMP_AFFINITY is set, each worker thread's gtid value is used as an index into the place list to determine the thread's placement. With hidden helpers enabled, this gtid value is shifted down leading to unexpected shifted thread placement. This patch restores the previous behavior by adjusting the mask index to take the number of hidden helper threads into account. Hidden helper threads are given the full initial mask and do not participate in any of the other affinity mechanisms (place partitioning, balanced affinity). Their affinity is only printed for debug builds. Differential Revision: https://reviews.llvm.org/D101882	2021-05-11 08:54:22 -05:00
Jon Chesterfield	dedca78d48	[libomptarget][nfc] Drop stringify in macro A step towards deleting the macros entirely. Differential Revision: https://reviews.llvm.org/D102228	2021-05-11 12:19:55 +01:00
Jon Chesterfield	6da348569c	[libomptarget] Add support for target allocators to dynamic cuda RTL Follow on to D102000 which introduced new calls into libcuda. This patch adds the corresponding entry points to dynamic_cuda, fixing the build for systems that do not have the cuda toolkit installed. Function types and enum from https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html Reviewed By: pdhaliwal Differential Revision: https://reviews.llvm.org/D102169	2021-05-10 15:27:50 +01:00
Pushpinder Singh	9586937ef5	[AMDGPU][OpenMP] Disable tests when amdgpu-arch fails This patch prevents runtime tests running on systems without amdgpu. Reviewed By: protze.joachim, tianshilei1992 Differential Revision: https://reviews.llvm.org/D102054	2021-05-10 07:37:27 +00:00
Vyacheslav Zakharin	f2f88f3e7a	An attempt to abandon omptarget out-of-tree builds. I want to start using LLVM component libraries in libomptarget to stop duplicating implementations already available in LLVM (e.g. LLVMObject, LLVMSupport, etc.). Without relying on LLVM in all libomptarget builds one has to provide fallback implementation for each used LLVM feature. This is an attempt to stop supporting out-of-llvm-tree builds of libomptarget. I understand that I may need to revert this, if this affects downstream projects in a bad way. Differential Revision: https://reviews.llvm.org/D101509	2021-05-07 12:43:50 -07:00
Joseph Huber	a15f8589f4	[libomptarget] Add support for target memory allocators to cuda RTL Summary: The allocator interface added in D97883 allows the RTL to allocate shared and host-pinned memory from the cuda plugin. This patch adds support for these to the runtime. Reviewed By: grokos Differential Revision: https://reviews.llvm.org/D102000	2021-05-07 10:27:02 -04:00
Jon Chesterfield	44ee974e2f	[libomptarget][nfc] Refactor amdgpu partial barrier to simplify adding a second one D101976 would require a second barrier instance. This NFC to amdgpu makes it simpler to add one (an extra global, one more line in init). Also renames the current barrier to L0. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102016	2021-05-06 23:52:19 +01:00
Jon Chesterfield	7e9351b9de	[libomptarget][amdgpu][nfc] Remove dead code from amdgpu plugin Drops an enum that was identical to a HSA one, localises some functions where they were only called from one TU. Covers everything internalize + adce can identify as dead, except for msgpack::dump which is useful when debugging. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102014	2021-05-06 23:16:32 +01:00
Jon Chesterfield	25fe17d3c1	[libomptarget] Initial documentation on amdgpu offload Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D101927	2021-05-05 19:58:52 +01:00
Peyton, Jonathan L	9982f33e2c	[OpenMP] Refactor/Rework topology discovery code This patch does the following: 1) Introduce kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) to determine the exact machine topology which can vary widely among current and future architectures. The current design is not easy to expand beyond the assumed three layer topology: sockets, cores, and threads so a rework capable of using the existing KMP_AFFINITY mechanisms is required. This new topology structure has: * The depth and types of the topology * Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core) * Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads) * Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core) * Whether it is uniform or not The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information grabbed from the previous Address structure. The kmp_topology_t structure contains an array of these. 2) Generalize the KMP_HW_SUBSET envirable for the new kmp_topology_t structure. The algorithm doesn't assume any order with tiles,numa domains,sockets,cores,threads. Instead it just parses the envirable, makes sure it is consistent with the detected topology (including taking into account equivalent layers) and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced which contains a vector of items (hardware type, number user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can be shortened as well. e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine. 3) Simplify topology detection functions so they only do the singular task of detecting the machine's topology. 
Printing, and all canonicalizing functionality is now done afterwards. So many lines of duplicated code are eliminated. 4) Add new ll_caches and numa_domains to OMP_PLACES, and consequently, KMP_AFFINITY's granularity setting. All the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting. 5) Simplify and future-proof code where explicit lists of allowed affinity settings keywords appear inside if() conditions. 6) Add x86 CPUID leaf 4 cache detection to existing x2apic id method so equivalent caches could be detected (in particular for the ll_caches place). Differential Revision: https://reviews.llvm.org/D100997	2021-05-03 18:00:24 -05:00
Pushpinder Singh	ae845d6426	[AMDGPU][OpenMP] Enable Libomptarget runtime tests This enables the runtime tests on amdgpu targets. 10 tests have been marked as XFAIL on amdgcn currently mostly due to missing printf. Reviewed By: protze.joachim Differential Revision: https://reviews.llvm.org/D99656	2021-05-03 05:56:42 +00:00
Martin Storsjö	01d27fc408	[OpenMP] Fix warnings due to redundant semicolons. NFC.	2021-05-02 21:51:06 +03:00
Kevin Athey	bc9120047b	Correct tiny misspelling (readlef -> readelf). Getting my feet wet here as a new committer. Correct misspelling in check-depends.pl. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D101552	2021-04-30 17:20:35 -07:00
Michael Kruse	7308862ff5	[OpenMP][CMake] Use in-project clang as CUDA->IR compiler. If available, use the clang that is already built in the same project as CUDA compiler unless another executable is explicitly defined. This also ensures the generated deviceRTL IR will be consistent with the version of Clang. This patch is required to reliably test OpenMP offloading in a buildbot without either a two-stage build (e.g. with LLVM_ENABLE_RUNTIMES) or a separately installed clang on the worker that will eventually become outdated. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101265	2021-04-30 12:45:52 -05:00
Michael Kruse	3244a8b536	[OpenMP][CMake] Pass --cuda-path to regression tests. The OpenMP runtime can be compiled using a CUDA installed at non-default location with the -DCUDA_TOOLKIT_ROOT_DIR setting. However, check-openmp will fail afterwards because Clang needs to know where to find the CUDA headers. Fix by passing --cuda-path to Clang using the value of CUDA_TOOLKIT_ROOT_DIR which has been determined by CMake. Also set LD_LIBRARY_PATH such that it can find the cuda runtime when executing. This will ensure that the regression tests do not depend on the current environment, but use the environment they were configured for. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101266	2021-04-27 16:27:40 -05:00
Joachim Protze	24f836e8fd	[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) This patch fuses the RUN lines for most libomptarget tests. The previous patch D101315 created separate test targets for each supported offloading triple. This patch updates the RUN lines in libomptarget tests to use a generic run line independent of the offloading target selected for the lit instance. In cases, where no RUN line was defined for a specific offloading target, the corresponding target is declared as XFAIL. If it turns out that a test actually supports the target, the XFAIL line can be removed. Differential Revision: https://reviews.llvm.org/D101326	2021-04-27 15:54:32 +02:00
Joachim Protze	b845217b1d	[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) This patch creates a separate test directory for each offloading target to be tested. This allows testing multiple architectures in one configuration while still seeing all failing tests separately. The lit test names include the target triple, so that it will be easier to spot the failing target. This patch also allows marking expected failing tests based on the target-triple, as the currently used triple is added to the lit "features": ``` // XFAIL: nvptx64-nvidia-cuda ``` Differential Revision: https://reviews.llvm.org/D101315	2021-04-27 12:30:01 +02:00
Joseph Huber	077fe0f739	[OpenMP][Documentation] Add FAQ entry for dynamically linked libraries Summary: Add an FAQ entry detailing the support for using dynamically linked libraries with OpenMP Offloading	2021-04-26 14:21:17 -04:00
Jon Chesterfield	58f125493d	[libomptarget] Enable AMDGPU devicertl The amdgpu devicertl is written in freestanding openmp and compiles to a bitcode library (per listed gfx arch) with no unresolved symbols. It requires a recent clang, preferably the one from the same monorepo checkout. This is D98658, with printf explicitly stubbed out, after patching clang to no longer require an llvm with the amdgpu target enabled. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D101213	2021-04-24 02:24:44 +01:00
Johannes Doerfert	17330a3cb1	[OpenMP] Avoid reading uninitialized parallel level values In a last minute change request for `a2dbfb6b72` we introduced a read of the uninitialized parallel level value in SPMD-mode. We go back to initializing the array early and checking for an adjusted level. Found by the miniqmc unit tests: https://cdash.qmcpack.org/CDash/viewTest.php?onlyfailed&buildid=203434 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101123	2021-04-23 11:21:58 -05:00
Joseph Huber	59b6849012	[OpenMP] Replace global InfoLevel with a reference to an internal one. Summary: This patch improves the implementation of D100774 by replacing the global variable introduced with a function that returns a reference to an internal one. This removes the need to define the variable in every plugin that uses it. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D101102	2021-04-23 09:43:46 -04:00
Joseph Huber	2b6f20082e	[OpenMP] Add function for setting LIBOMPTARGET_INFO at runtime Summary: This patch adds a new runtime function __tgt_set_info_flag that allows the user to set the information level at runtime without using the environment variable. Using this will require an extern function, but will eventually be added into an auxiliary library for OpenMP support functions. This patch required moving the current InfoLevel to a global variable which must be instantiated by each plugin. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100774	2021-04-22 12:48:11 -04:00
Alexey Bataev	ca70512099	[OPENMP]Mark test as unsupported to avoid possible unexpected passes, NFC.	2021-04-22 08:06:25 -07:00
Giorgis Georgakoudis	a2dbfb6b72	[OpenMP] Simplify offloading parallel call codegen This revision simplifies Clang codegen for parallel regions in OpenMP GPU target offloading and corresponding changes in libomptarget: SPMD/non-SPMD parallel calls are unified under a single `kmpc_parallel_51` runtime entry point for parallel regions (which will be commonized between target, host-side parallel regions), data sharing is internalized to the runtime. Tests have been auto-generated using `update_cc_test_checks.py`. Also, the revision contains changes to OpenMPOpt for remark creation on target offloading regions. Reviewed By: jdoerfert, Meinersbur Differential Revision: https://reviews.llvm.org/D95976	2021-04-21 18:46:07 -07:00
Alexey Bataev	079884225a	[OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault. The implicitly generated mappings for allocation/deallocation in mappers runtime should be mapped as implicit, also no need to clear member_of flag to avoid ref counter increment. Also, the ref counter should not be incremented for the very first element that comes from the mapper function. Differential Revision: https://reviews.llvm.org/D100673	2021-04-21 10:38:31 -07:00
Peyton, Jonathan L	4457565757	[OpenMP] Implement GOMP task reductions Implement the remaining GOMP_* functions to support task reductions in taskgroup, parallel, loop, and taskloop constructs. The unused mem argument to many of the work-sharing constructs has to do with the scan() directive/ inscan() modifier. If mem is set, each function will call KMP_FATAL() and tell the user scan/inscan is unsupported. The GOMP reduction implementation is kept separate from our implementation because of how GOMP presents reduction data and computes the reductions. GOMP expects the privatized copies to be present even after a #pragma omp parallel reduction(task:...) region has ended so the data is stored inside GOMP's uintptr_t* data pseudo-structure. This style is tightly coupled with GCC compiler codegen. There also aren't any init(), combiner(), fini() functions in GOMP's codegen so the two implementations were too disparate to try to wrap GOMP's around our own. Differential Revision: https://reviews.llvm.org/D98806	2021-04-16 16:36:31 -05:00
Peyton, Jonathan L	5ebbb366c4	[OpenMP] Allow affinity to re-detect for child processes Current atfork() handler for child processes does not reset the affinity masks array which prevents users from setting their own affinity in child processes. Differential Revision: https://reviews.llvm.org/D99218	2021-04-16 16:34:02 -05:00
Hansang Bae	9b98497b44	[OpenMP] Add omp_target_is_accessible() to header files -- Added omp_target_is_accessible to the header files -- Added missing const qualifier to device memory routines Differential Revision: https://reviews.llvm.org/D100420	2021-04-16 07:54:15 -05:00
Joseph Huber	83d4b2e2e0	[OpenMP] Add info for device table changes Summary: This patch adds a feature to print information whenever the host-device pointer mapping table is changed by inserting or removing an entry. This introduces a new bit field for LIBOMPTARGET_INFO at position 0x8. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D100600	2021-04-15 18:39:48 -04:00
Hansang Bae	77dc7b4653	[OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT Also fixed typo in the verbose message. Differential Revision: https://reviews.llvm.org/D100414	2021-04-14 07:55:26 -05:00
Hansang Bae	3da61ddae7	[OpenMP] Define omp_is_initial_device() variants in omp.h omp_is_initial_device() is marked as a built-in function in the current compiler, and user code guarded by this call may be optimized away, resulting in undesired behavior in some cases. This patch provides a possible fix for such cases by defining the routine as a variant function and removing it from builtin list. Differential Revision: https://reviews.llvm.org/D99447	2021-04-06 16:58:01 -05:00


1670 Commits
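
Note on 486110eb41: the message says that KernelNameMap entries of the form "key.kd" => key can be replaced by logic that removes the suffix from the key. A minimal sketch of that idea follows. It is not the code from the patch; the helper name `stripKdSuffix` is hypothetical, and the choice to return names without the suffix unchanged is an assumption.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not taken from the patch: returns Name with a
// trailing ".kd" removed. Names that do not end in ".kd" are returned
// unchanged (an assumption made for this sketch).
inline std::string stripKdSuffix(std::string_view Name) {
  constexpr std::string_view Suffix = ".kd";
  if (Name.size() >= Suffix.size() &&
      Name.substr(Name.size() - Suffix.size()) == Suffix)
    Name.remove_suffix(Suffix.size());
  return std::string(Name);
}
```

Computing the name on demand this way avoids keeping a global table in sync with the set of kernel symbols, which appears to be the motivation given in the commit message.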