Commit Graph

2091 Commits

Author SHA1 Message Date
Jonathan Peyton 6a556ecaf4 [OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET
This patch allows the user to request all resources of a particular
layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified
so the number of units requested is optional or can be replaced with an
'*' character.

e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per
      each socket and 1 thread per core.

Differential Revision: https://reviews.llvm.org/D115826
2021-12-20 13:45:21 -06:00
Jon Chesterfield 38af5b4fd1 [libomptarget][nfc] Refactor dlwrap.h for easier reuse in D115966 and upcoming patches 2021-12-17 22:28:31 +00:00
Jon Chesterfield 91dfb32f2f [openmp][amdgpu][nfc] Mark all external functions extern C to get type checking 2021-12-17 18:46:43 +00:00
Carlo Bertolli d3abb04e14 [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter
I missed the async info parameter in the first version of this API.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115887
2021-12-17 15:58:18 +00:00
Carlo Bertolli d83dc4c648 [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin
This patch extends the AMDGPU plugin for OpenMP target offloading from using a single HSA queue to multiple queues (four in this patch) per device. This enables concurrent threads to concurrently submit kernel launches to the same GPU.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115771
2021-12-15 15:33:17 +00:00
Jonathan Peyton 9769340905 [OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes
Add missing guards around x86-specific code.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D115664
2021-12-14 08:33:05 +01:00
John Ericson ddcc02dbcc Quote some more destination paths with variables
Just defensive CMake-ing. I pulled this from D115544 and D99484 which
are blocked on some lldb CI failures I don't yet understand. Hoping to land
something smaller in the meantime.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D115566
2021-12-13 17:29:08 +00:00
Michael Kruse 77e019c233 [OpenMP] Add "not" to test dependencies.
The `not` program is used to test executions prefixed with `%libomptarget-run-fail-`. Currently `not` is not used for libomp tests, but might be used in the future and its dependency does not add any additional burden over the already established `FileCheck` dependency.

Required to add libomptarget testing to the Phabricator pre-merge check (see https://github.com/google/llvm-premerge-checks/issues/368)

Reviewed By: jdenny, JonChesterfield

Differential Revision: https://reviews.llvm.org/D115454
2021-12-12 10:52:53 -06:00
Med Ismail Bennani 30fc88bf1d Revert "Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM"""
This reverts commit 492de35df4.

I tried to apply John's changes in 8d897ec915 that were expected to
fix his patch but that didn't work unfortunately.

Reverting this again to fix the macOS bots and leave him more time to
investigate the issue.
2021-12-10 17:33:54 -08:00
John Ericson 492de35df4 Revert "Revert "Use `GNUInstallDirs` to support custom installation dirs. -- LLVM""
This reverts commit 797b50d4be.

See the original D99484. @mib who noticed the original problem could not longer
reproduce it, after I tried and also failed. We are threfore hoping it went
away on its own!

Reviewed By: mib

Differential Revision: https://reviews.llvm.org/D115544
2021-12-10 20:59:43 +00:00
Joseph Huber 8425bde82d Revert "[OpenMP] Avoid costly shadow map traversals whenever possible"
This reverts commit 7c8f4e7b85.
Fails a few OpenMP tests, causes a few updates to segfault.
2021-12-10 15:57:58 -05:00
Jonathan Peyton df20599597 [OpenMP][libomp] Add core attributes to KMP_HW_SUBSET
Allow filtering of resources based on core attributes. There are two new
attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more
   performant the core
On hybrid architectures , e.g., Alder Lake, users can specify
KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom
and first four Big cores. The can also use the efficiency syntax. e.g.,
KMP_HW_SUBSET=2c:eff0,2c:eff1

Differential Revision: https://reviews.llvm.org/D114901
2021-12-10 14:34:33 -06:00
Joseph Huber 7c8f4e7b85 [OpenMP] Avoid costly shadow map traversals whenever possible
In the OpenMC app we saw `omp target update` spending an awful lot of
time in the shadow map traversal without ever doing any update there.
There are two cases that allow us to avoid the traversal completely.
The simplest thing is that small updates cannot (reasonably) contain
an attached pointer part. The other case requires to track in the
mapping table if an entry might contain an attached pointer as part.
Given that we have a single location shadow map entries are created,
the latter is actually fairly easy as well.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D113124
2021-12-10 14:33:18 -05:00
Carlo Bertolli 28309c5436 [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
and synchronous kernel launch implementations into a single
synchronous version.  This patch prepares the plugin for asynchronous
implementation by:

    Privatizing actual kernel launch code (valid in both cases) into
    an anonymous namespace base function (submitted at D115267)

    - Separating the control flow path of asynchronous and synchronous
      kernel launch functions** (this diff)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115273
2021-12-10 19:21:05 +00:00
Joel E. Denny 51168ce8d5 [OpenMP] Add test for custom state machine if have reduction
D113602 broke the custom state machine when a reduction is present, as
revealed by the reproducer this patch adds to the test suite.  In that
case, openmp-opts changes the return value to undef in
`__kmpc_get_warp_size` (which the custom state machine calls as of
D113602).  Later optimizations then optimize away the custom state
machine code as if all threads are outside the thread block, so the
target region does not execute.  D114802 fixed that but didn't add a
reproducer.

This patch also adds a `__OMP_RTL_ATTRS` entry for
`__kmpc_get_warp_size` to OMPKinds.def, which D113602 missed.  This
change does not seem to have any impact on the reduction problem.

Reviewed By: JonChesterfield, jdoerfert

Differential Revision: https://reviews.llvm.org/D113824
2021-12-10 12:53:54 -05:00
AndreyChurbanov 1031e43052 [OpenMP] libomp: fix Fortran header: lines exceeded 72-char length
Added line continuation to two long lines in Fortran header.

Differential Revision: https://reviews.llvm.org/D114537
2021-12-10 16:23:21 +03:00
Joseph Huber bc9c4d7216 [OpenMP][FIX] Pass the num_threads value directly to parallel_51
The problem with the old scheme is that we would need to keep track of
the "next region" and reset the num_threads value after it. The new RT
doesn't do it and an assertion is triggered. The old RT doesn't do it
either, I haven't tested it but I assume a num_threads clause might
impact multiple parallel regions "accidentally". Further, in SPMD mode
num_threads was simply ignored, for some reason beyond me.

In any case, parallel_51 is designed to take the clause value directly,
so let's do that instead.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D113623
2021-12-09 16:30:29 -05:00
Carlo Bertolli cc8dc5e28b [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-08 23:02:39 +00:00
AndreyChurbanov 4dd8fccb71 [OpenMP] libomp: Fix crash if application send us negative thread_limit value
Regardless that specification requires thread_limit to be positive,
it is better to warn user instead of crash in case the value is negative.

Differential Revision: https://reviews.llvm.org/D115340
2021-12-08 19:02:57 +03:00
Jon Chesterfield 14ff611fe1 Revert "[OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version"
This reverts commit 6de698bf10.
It didn't build in the dynamic_hsa configuration
2021-12-08 08:23:12 +00:00
Carlo Bertolli 6de698bf10 [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version
Prepare amdgpu plugin for asynchronous implementation. This patch switches to using HSA API for asynchronous memory copy.
Moving away from hsa_memory_copy means that plugin is responsible for locking/unlocking host memory pointers.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115279
2021-12-07 23:05:23 +00:00
Carlo Bertolli d9b1d827d2 [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch
At present, amdgpu plugin merges both asynchronous and synchronous kernel launch implementations into a single synchronous version.
This patch prepares the plugin for asynchronous implementation by:
- Privatizing actual kernel launch code (valid in both cases) into an anonymous namespace base function

Actual separation of kernel launch code (async vs sync) is a following patch.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115267
2021-12-07 21:02:45 +00:00
Martin Storsjö db32c4f456 [OpenMP] Disable libomptarget profiling by default if built via the "runtimes" setup
In the "runtimes" setup, the runtime (e.g. OpenMP) can be built for
a target entirely different from the current host build (where LLVM
and Clang are built). If profiling is enabled, libomptarget links
against LLVMSupport (which only has been built for the host).

Thus, don't enable profiling by default in this setup.

This should allow relanding D113253.

Differential Revision: https://reviews.llvm.org/D114083
2021-12-07 22:23:50 +02:00
Ye Luo 21a51cebf1 [OpenMP][libomptarget] amdgpu plugin adds runpath for dependencies
amdgpu plugin depends on libhsa-runtime64 library. Add runpath in case it is not on the LD_LIBRARY_PATH.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D115198
2021-12-06 18:19:18 -06:00
Jon Chesterfield a05a0c3c2f [libomptarget] Add cmake variables to disable building the amdgpu or cuda plugins
Analogous to the controls on building device runtimes

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115148
2021-12-06 16:42:26 +00:00
Jon Chesterfield a2b3b4dadc [openmp] Run tests on both runtimes, independent of the default
Minor fix to the lit.cfg. Currently, nvptx runs the tests twice on the new runtime.
Soon, amdgpu will run them on the new runtime as well as the old.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115150
2021-12-06 16:41:23 +00:00
Jon Chesterfield 9e08c2054a [openmp] Enable tests on new devicertl on amdgpu
Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D114891
2021-12-06 15:26:18 +00:00
Jon Chesterfield 1a87a18955 [openmp][amdgpu] Disable tests requiring USM on amdgcn
These tests tend to hang or crash on hardware that doesn't
support USM. Disabling them helps diagnose other issues. To safely
enable we require a means of testing whether USM is expected to work.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115144
2021-12-06 13:25:23 +00:00
Matt Arsenault 90f914c870 OpenMP: Un-xfail tests that pass now
729bf9b26b should have fixed these
2021-12-04 11:25:22 -05:00
Ron Lieberman 8f4013ad46 Restric xfail on openmp/libomptarget/test/mapping/reduction_implicit_map.cpp to amdgcn-amd-amdhsa 2021-12-02 20:58:26 +00:00
Ron Lieberman f87c2c637e xfail: libomptarget reduction_implicit_map.cpp after reapply of Start calling setTargetAttributes 2021-12-02 20:38:25 +00:00
Jon Chesterfield fb9fc3c951 [openmp][amdgpu] Disable three tests in preparation for new runtime 2021-12-02 07:57:14 +00:00
Kazushi (Jam) Marukawa 5e2358c781 [runtimes][openmp] Change to not treat ARCH-unknown-linux-gnu as errors
When OpenMP is compiled as a part runtimes for multiple targets, openmp
is compiled under build/runtimes/runtimes-arch-unknown-linux-gnu-bins
directory.  Old implementation treats this directory name as errors.
This patch adds a guard like "[Uu]known[^-]".

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D114346
2021-12-01 08:33:37 +09:00
Jonathan Peyton 618f8dc5e5 [OpenMP][libomp][doc] Add environment variables documentation
Add documentation for the environment variables for libomp

Differential Revision: https://reviews.llvm.org/D114269
2021-11-30 16:29:31 -06:00
Jon Chesterfield 3ab150f6e4 [openmp][devicertl] Add a missing loader_uninitialized attribute 2021-11-29 23:54:37 +00:00
Matt Arsenault 935abeaace OpenMP: Correctly query location for amdgpu-arch
This was trying to figure out the build path for amdgpu-arch, and
making assumptions about where it is which were not working on my
system. Whether a standalone build or not, we should have a proper
imported target to get the location from.
2021-11-29 16:31:32 -05:00
Jon Chesterfield ae5348a38e [openmp][amdgpu] Make plugin robust to presence of explicit implicit arguments
OpenMP (compiler) does not currently request any implicit kernel
arguments. OpenMP (runtime) allocates and initialises a reasonable guess at
the implicit kernel arguments anyway.

This change makes the plugin check the number of explicit arguments, instead
of all arguments, and puts the pointer to hostcall buffer in both the current
location and at the offset expected when implicit arguments are added to the
metadata by D113538.

This is intended to keep things running while fixing the oversight in the
compiler (in D113538). Once that patch lands, and a following one marks
openmp kernels that use printf such that the backend emits an args element
with the right type (instead of hidden_node), the over-allocation can be
removed and the hardcoded 8*e+3 offset replaced with one read from the
.offset of the corresponding metadata element.

Reviewed By: estewart08

Differential Revision: https://reviews.llvm.org/D114274
2021-11-22 23:00:20 +00:00
Joseph Huber fbfe8fcbc3 [Libomptarget] Remove undefined symbol in old runtime
A function with no definition was left in the old runtime, causing
linker errors when trying to compile.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D114264
2021-11-20 08:26:57 -05:00
Jon Chesterfield 04954824ee [openmp][amdgpu][nfc] Simplify implicit args handling
Removes a +x/-x pair on the only store/load of a variable
and deletes some nearby dead code. Also reduces the size of the implicit
struct to reflect the code currently emitted by clang.

Differential Revision: https://reviews.llvm.org/D114270
2021-11-19 20:18:23 +00:00
Jon Chesterfield 9cdaf0b01b [openmp][amdgpu][nfc] Inline interop_hsa_get_kernel_info into only caller 2021-11-19 18:45:17 +00:00
Alexey Bataev 80256605f8 [OpenMP] support depend clause for taskwait directive, by Deepak
Eachempati.

This patch adds clang (parsing, sema, serialization, codegen) support for the 'depend' clause on the 'taskwait' directive.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D113540
2021-11-19 06:30:17 -08:00
Peyton, Jonathan L a733b18bdb [OpenMP][libomp] Enable HWLOC topology detection of multiple CPU kinds
Teach the HWLOC topology method how to detect Atom and Core
types so hybrid CPUs are properly detected and represented when using
the HWLOC topology method.

Differential Revision: https://reviews.llvm.org/D112270
2021-11-17 16:30:18 -06:00
Peyton, Jonathan L 286094af9b [OpenMP][libomp] Improve Windows Processor Group handling within topology
The current implementation of Windows Processor Groups has
a separate topology method to handle them. This patch deprecates
that specific method and uses the regular CPUID topology
method by default and inserts the Windows Processor Group objects
in the topology manually.

Notes:
* The preference for processor groups is lowered to a value less than
  socket so that the user will see sockets in the KMP_AFFINITY=verbose
  output instead of processor groups when sockets=processor groups.
* The topology's capacity is modified to handle additional topology layers
  without the need for reallocation.
* If a user asks for a granularity setting that is "above" the processor
  group layer, then the granularity is adjusted "down" to the processor
  group since this is the coarsest layer available for threads.

Differential Revision: https://reviews.llvm.org/D112273
2021-11-17 16:29:01 -06:00
Peyton, Jonathan L 1dd797168e [OpenMP][libomp] Add support for offline CPUs in Linux
If some CPUs are offline, then make sure they are not included in the
fullMask even if norespect is given to KMP_AFFINITY.

Differential Revision: https://reviews.llvm.org/D112274
2021-11-17 16:28:01 -06:00
Peyton, Jonathan L a0afb9d0fc [OpenMP][libomp] Allow users to specify KMP_HW_SUBSET in any order
Remove restriction forcing users to specify the KMP_HW_SUBSET value in
topology order. This patch sorts the user KMP_HW_SUBSET value before
trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c

Differential Revision: https://reviews.llvm.org/D112027
2021-11-17 15:27:37 -06:00
Jonathan Peyton c46becf500 [OpenMP][libomp][NFC] Remove non-ASCII apostrophe in comment 2021-11-17 14:46:40 -06:00
Martin Storsjö 9b2b549837 [OpenMP] Silence build warnings when built with MinGW
There's an attempt to upstream this change in
https://github.com/intel/ittapi/pull/25 too.

Differential Revision: https://reviews.llvm.org/D114069
2021-11-17 18:51:18 +02:00
Joseph Huber 374cd0fb61 [OpenMP] Fix initializer not working on AMDGPU
The RAII class used for debugging RTL entry used a shared variable to
keep track of the current depth. This used a global initializer, which
isn't supported on AMDGPU. This patch removes the initializer and
instead sets it to zero when the state is initialized in the runtime.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D113963
2021-11-16 08:17:15 -05:00
Shao-Ce SUN 0c660256eb [NFC] Trim trailing whitespace in *.rst 2021-11-15 09:17:08 +08:00
Nawrin Sultana 7a5680233e [OpenMP] Set default blocktime to 0 for hybrid cpu
Differential Revision:https://reviews.llvm.org/D113012
2021-11-12 12:05:35 -06:00