Commit Graph

1763 Commits

Author SHA1 Message Date
Joachim Protze fedbff75f4 [OpenMP][OMPT] Fix compile-time assertion in ompt-multiplex.h
The compile-time assertion is supposed to prevent double-free caused by
unexpected combination of preprocessor defines passed by an OMPT tool.
The current defines are not used, so this patch replaces the check with
macros actually used in ompt-multiplex.h

Reported by: Semih Burak

Differential Revision: https://reviews.llvm.org/D104633
2021-07-12 12:12:09 +02:00
Johannes Doerfert a7b7b5dfe5 [OpenMP] Create and use `__kmpc_is_generic_main_thread`
In order to fold calls based on high-level knowledge and control flow
tracking it helps to expose the information as a runtime call. The
logic: `!SPMD && getTID() == getMasterTID()` was used in various places
and is now encapsulated in `__kmpc_is_generic_main_thread`. As part of
this rewrite we replaced eager computation of arguments with on-demand
computation, especially helpful if the calls can be folded and arguments
don't need to be computed consequently.

Differential Revision: https://reviews.llvm.org/D105768
2021-07-11 19:18:03 -05:00
Johannes Doerfert 1ab1f04a2b [OpenMP] Simplify variable sharing and increase shared memory size
In order to avoid malloc/free, up to NUM_SHARED_VARIABLES_IN_SHARED_MEM
(=64) variables are communicated in dedicated shared memory instead. The
simplification does avoid the need for an "init" and requires "deinit"
only if we ever communicate more than NUM_SHARED_VARIABLES_IN_SHARED_MEM
variables.

Differential Revision: https://reviews.llvm.org/D105767
2021-07-11 19:18:03 -05:00
Johannes Doerfert 0a223827de [OpenMP] Remove checkXXXX device runtime functions
We had multiple functions to determine the execution mode (SPMD/Generic)
and runtime status (initialized/uninitialized) but that just increased
complexity without a real benefit. Especially with D102307 in mind it
is helpful to reduce the dependence on the `ident_t` flags.

Differential Revision: https://reviews.llvm.org/D105586
2021-07-10 18:20:40 -05:00
Johannes Doerfert e2cfbfcc0c [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 17:53:56 -05:00
Nico Weber d3e7491333 Revert Attributor patch series
Broke check-clang, see https://reviews.llvm.org/D102307#2869065
Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`
2021-07-10 16:15:55 -04:00
Johannes Doerfert e603ca0306 [OpenMP] Remove checkXXXX device runtime functions
We had multiple functions to determine the execution mode (SPMD/Generic)
and runtime status (initialized/uninitialized) but that just increased
complexity without a real benefit. Especially with D102307 in mind it
is helpful to reduce the dependence on the `ident_t` flags.

Differential Revision: https://reviews.llvm.org/D105586
2021-07-10 12:32:51 -05:00
Johannes Doerfert 1d5711c3ee [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL
In the spirit of TRegions [0], this patch provides a simpler and uniform
interface for a kernel to set up the device runtime. The OMPIRBuilder is
used for reuse in Flang. A custom state machine will be generated in the
follow up patch.

The "surplus" threads of the "master warp" will not exit early anymore
so we need to use non-aligned barriers. The new runtime will not have an
extra warp but also require these non-aligned barriers.

[0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11

This was in parts extracted from D59319.

Reviewed By: ABataev, JonChesterfield

Differential Revision: https://reviews.llvm.org/D101976
2021-07-10 12:32:50 -05:00
Joel E. Denny d99f65de2a [OpenMP] Avoid checking parent reference count in targetDataBegin
This patch is an attempt to do for `targetDataBegin` what D104924 does
for `targetDataEnd`:

* Eliminates a lock/unlock of the data mapping table.
* Clarifies the logic that determines whether a struct member's
  host-to-device transfer occurs.  The old logic, which checks the
  parent struct's reference count, is a leftover from back when we had
  a different map interface (as pointed out at
  <https://reviews.llvm.org/D104924#2846972>).

Additionally, it eliminates the `DeviceTy::getMapEntryRefCnt`, which
is no longer used after this patch.

While D104924 does not change the computation of `IsLast`, I found I
needed to change the computation of `IsNew` for this patch.  As far as
I can tell, the change is correct, and this patch does not cause any
additional `openmp` tests to fail.  However, I'm not sure I've thought
of all use cases.  Please advise.

Reviewed By: jdoerfert, jhuber6, protze.joachim, tianshilei1992, grokos, RaviNarayanaswamy

Differential Revision: https://reviews.llvm.org/D105121
2021-07-10 12:15:04 -04:00
Joel E. Denny 1d0456361a [OpenMP] Avoid checking parent reference count in targetDataEnd
The patch has the following benefits:

* Eliminates a lock/unlock of the data mapping table.
* Clarifies the logic that determines whether a struct member's
  device-to-host transfer occurs.  The old logic, which checks the
  parent struct's reference count, is a leftover from back when we had
  a different map interface (as pointed out at
  <https://reviews.llvm.org/D104924#2846972>).

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104924
2021-07-10 12:15:04 -04:00
Alexey Bataev ab8989ab87 [OPENMP]Fix overlapped mapping for dereferenced pointer members.
If the base is used in a map clause and later we have a memberexpr with
this base, and the member is a pointer, and this pointer is dereferenced
anyhow (subscript, array section, dereference, etc.), such components
should be considered as overlapped, otherwise it may lead to incorrect
size computations, since we try to map a pointee as a part of the whole
struct, which is not true for the pointer members.

Differential Revision: https://reviews.llvm.org/D105562
2021-07-09 12:51:26 -07:00
Michał Górny 2b0d95fb58 [openmp] [test] Add missing <limits> include to capacity_nthreads
Differential Revision: https://reviews.llvm.org/D105474
2021-07-06 20:39:53 +02:00
Jon Chesterfield ddfb074a80 [libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global
[libomptarget][nfc] Group environment variables, drop accesses to DeviceInfo global

Folds some duplicates logic into a helper function, passes the new environment
struct into getLaunchVals which no longer reads the DeviceInfo global.

Implemented on top of D105237

Reviewed By: dhruvachak

Differential Revision: https://reviews.llvm.org/D105239
2021-07-06 17:06:38 +01:00
Atmn Patel 21e92612c0 [Libomptarget] Experimental Remote Plugin Fixes
D97883 introduced a compile-time error in the experimental remote offloading
libomptarget plugin, this patch fixes it and resolves a number of
inconsistencies in the plugin as well:

1. Non-functional Asynchronous API
2. Unnecessarily verbose debug printing
3. Misc. code clean ups

This is not intended to make any functional changes to the plugin.

Differential Revision: https://reviews.llvm.org/D105325
2021-07-02 12:38:34 -04:00
Hansang Bae f1b9ce2736 [OpenMP] Fix a few issues with hidden helper task
This patch includes the following changes to address a few issues when
using hidden helper task.

- Assertion is triggered when there are inadvertent calls to hidden
  helper functions on non-Linux OS
- Added deinit code in __kmp_internal_end_library function to fix random
  shutdown crashes
- Moved task data access into the lock-guarded region in __kmp_push_task

Differential Revision: https://reviews.llvm.org/D105308
2021-07-01 17:10:32 -05:00
Shilei Tian 369216ab31 [OpenMP][Offloading] Refined return value of `DeviceTy::getOrAllocTgtPtr`
`DeviceTy::getOrAllocTgtPtr` just returns a target pointer. In addition,
two bool values (`IsNew` and `IsHostPtr`) are passed by reference to make the
change in the function available in callee.

In this patch, a struct, which contains the target pointer, two flags, and an
iterator to the map table entry corresponding to the queried host pointer, will
be returned. In addition to make the logic clearer regarding the two bool values,
this paves the way for the next patch to fix the data race in `bug49334.cpp` by
attaching an event to the map table entry (and that's why we need the iterator).

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104382
2021-07-01 12:32:03 -04:00
Jon Chesterfield db89414da4 [libomptarget][nfc] Move grid size computation
Change getLaunchVals to return the integers used for launch

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D105237
2021-07-01 12:53:04 +01:00
Dhruva Chakrabarti 98c36f0079 Revert "[libomptarget] [amdgpu] Fix default setting of max flat workgroup size"
This reverts commit 2240b41ee4.
A value of 0 for KernDescVal WG_Size implies it is unknown, so it should be
set to the default. The above change was made without this assumption.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105250
2021-06-30 17:15:00 -07:00
Jon Chesterfield 4b0926b044 [libomptarget][nfc] Replace out arguments with struct return
A step towards making this function adequately self contained that it
can be tested easily. No functional change intended here, left variable
names unchanged.

Reviewed By: ronlieb

Differential Revision: https://reviews.llvm.org/D105229
2021-06-30 22:40:07 +01:00
Jon Chesterfield d86b0073cf [libomptarget][amdgpu][nfc] Fix build warnings, drop some headers
Removes stdarg header, drops uses of iostream, fix some format string errors.
Also changes a C style struct to C++ style to avoid a warning from clang/

Reviewed By: pdhaliwal

Differential Revision: https://reviews.llvm.org/D104923
2021-06-30 22:23:36 +01:00
Shilei Tian 24a36ce58b [OpenMP][Offloading] Replace all calls to `isSPMDMode` with `__kmpc_is_spmd_exec_mode`
In our ongoing work, we are using `AbstractAttributor` to deduct execution model
of device functions, and potententially remove unnecessary function calls to
`__kmpc_is_spmd_exec_mode`. In current device runtime, we have mixed use of
`isSPMDMode` and `__kmpc_is_spmd_exec_mode`, but in fact in `__kmpc_is_spmd_exec_mode`
it simply calls `isSPMDMode`. Since all functions starting with `__kmpc` is C
function, which doesn't have things like name mangling. It is more optimization
friendly. In this patch, we simply replaced all calls to `isSPMDMode` with
`__kmpc_is_spmd_exec_mode` to pave the way for the optimization.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105211
2021-06-30 15:39:57 -04:00
Dhruva Chakrabarti e0b713a035 [libomptarget] [amdgpu] Change default number of teams per computation unit
This patch is related to https://reviews.llvm.org/D98832. Based on discussions there, I decided to separate out the teams default as this patch. This change is to increase the number of teams per computation unit so as to provide more wavefronts for hiding latency. This change improves performance for some programs, including 20-50% for some Stream benchmarks.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D99003
2021-06-29 15:34:35 -07:00
Dhruva Chakrabarti 2240b41ee4 [libomptarget] [amdgpu] Fix default setting of max flat workgroup size
When max flat workgroup size is not specified, it is set to the default
workgroup size. This prevents kernel launch with a workgroup size larger
than the default. The fix is to ignore a size of 0 and treat it as
unspecified.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D105073
2021-06-29 13:47:24 -07:00
Johannes Doerfert 4eb90e893f Revert "[OpenMP] Add Two-level Distributed Barrier"
This reverts commit 25073a4ecf.

This breaks non-x86 OpenMP builds for a while now. Until a solution is
ready to be upstreamed we revert the feature and unblock those builds.
See:
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821
and
  https://reviews.llvm.org/rG25073a4ecfc9b2e3cb76776185e63bfdb094cd98#1005821

The currently proposed fix (D104788) seems not to be ready yet:
  https://reviews.llvm.org/D104788#2841928
2021-06-29 09:38:27 -05:00
Johannes Doerfert bc8bb3df35 Revert "[omp] Fix build without ITT after D103121 changes"
This reverts commit eab1fd389b.

This commit fixed a problem with 25073a4ecf (D103121) which is the one
we actually need to revert to unblock non-X86 builds of OpenMP. Can be
reapplied, or merged into, D103121 as it goes in again.
2021-06-29 09:38:27 -05:00
Joseph Huber 2190c48fde [OpenMP][Documentation] Add FAQ entry for CMake module
This patch adds documentation for using the CMake find module for OpenMP
target offloading provided by LLVM. It also removes the requirement for
AMD's architecture to be set as this isn't necessary for upstream LLVM.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105051
2021-06-28 17:05:07 -04:00
Joseph Huber c9f3240c9d [OpenMP][Documentation] Add OpenMPOpt optimization section
Add some information about the optimizations currently provided by
OpenMPOpt. Every optimization performed should eventually be listed
here.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105050
2021-06-28 17:05:03 -04:00
Pushpinder Singh 20df2c7052 [AMDGPU][Libomptarget] Collect allocatable memory pools using HSA
The logic is almost similar to that of system.cpp with one change that
instead of adding all the memory pools to a device struct it only
keeps a single pool. The existing approach also always allocated memory on
the first HSA pool found for a GPU.

This depends on D104691. The goal of this series of patches is to remove
_atl_machine global. The next patch will drop g_atl_machine entirely.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104695
2021-06-28 11:28:04 +00:00
Jon Chesterfield f66b8fdc0a [libomptarget][amdgpu] Build openmp for two more targets
[libomptarget][amdgpu] Build openmp for two more targets

The 4800U APU is a gfx902 and the MI100 accelerator is a gfx908.
Both numbers are listed in ROCT topology.c

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D104922
2021-06-25 19:02:03 +01:00
Jon Chesterfield 96f6873dff [OpenMP][NFC] Drop unused headers from amdgpu plugin 2021-06-25 12:08:56 +01:00
AndreyChurbanov b2787945f9 [OpenMP][NFC] libomp: fix wrong debug assertion.
Normalized bounds of chunk of iterations to steal from are inclusive,
so upper bound should not be decremented in expression to check.
Problem was in attempt to steal iterations 0:0, that caused assertion after
wrong decrement. Reported in comment to https://reviews.llvm.org/D103648.

Differential Revision: https://reviews.llvm.org/D104880
2021-06-25 02:02:14 +03:00
Aakanksha Patil 3453f3dd46 [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Joel E. Denny 9fa5e3280d [OpenMP] Fix delete map type in ref count debug messages
For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(delete: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists\|last'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=1, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=3 (incremented), Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffddf1eaea8, TgtPtrBegin=0x00000000013bb040, Size=4, RefCount=2 (decremented)
Libomptarget --> There are 4 bytes allocated at target address 0x00000000013bb040 - is not last
```

`RefCount` is reported as decremented to 2, but it ought to be reset
because of the `delete` map type, and `is not last` is incorrect.

This patch migrates the reset of reference counts from
`DeviceTy::deallocTgtPtr` to `DeviceTy::getTgtPtrBegin`, which then
correctly reports the reset.  Based on the `IsLast` result from
`DeviceTy::getTgtPtrBegin`, `targetDataEnd` then correctly reports `is
last` for any deletion.  `DeviceTy::deallocTgtPtr` is responsible only
for the final reference count decrement and mapping removal.

An obscure side effect of this patch is that a `delete` map type when
the reference count is infinite yields `DelEntry=IsLast=false` in
`targetDataEnd` and so no longer results in a
`DeviceTy::deallocTgtPtr` call.  Without this patch, that call is a
no-op anyway besides some unnecessary locking and mapping table
lookups.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104560
2021-06-23 09:57:19 -04:00
Joel E. Denny 48421ac441 [OpenMP] Improve ref count debug messages
For example, without this patch:

```
$ cat test.c
int main() {
  int x;
  #pragma omp target enter data map(alloc: x)
  #pragma omp target exit data map(release: x)
  ;
  return 0;
}
$ clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda test.c
$ LIBOMPTARGET_DEBUG=1 ./a.out |& grep 'Creating\|Mapping exists'
Libomptarget --> Creating new map entry with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, Name=unknown
Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffcace8e448, TgtPtrBegin=0x00007f12ef600000, Size=4, updated RefCount=1
```

There are two problems in this example:

* `RefCount` is not reported when a mapping is created, but it might
  be 1 or infinite.  In this case, because it's created by `omp target
  enter data`, it's 1.  Seeing that would make later `RefCount`
  messages easier to understand.
* `RefCount` is still 1 at the `omp target exit data`, but it's
  reported as `updated`.  The reason it's still 1 is that, upon
  deletions, the reference count is generally not updated in
  `DeviceTy::getTgtPtrBegin`, where the report is produced.  Instead,
  it's zeroed later in `DeviceTy::deallocTgtPtr`, where it's actually
  removed from the mapping table.

This patch makes the following changes:

* Report the reference count when creating a mapping.
* Where an existing mapping is reported, always report a reference
  count action:
    * `update suppressed` when `UpdateRefCount=false`
    * `incremented`
    * `decremented`
    * `deferred final decrement`, which replaces the misleading
      `updated` in the above example
* Add comments to `DeviceTy::getTgtPtrBegin` to explain why it does
  not zero the reference count.  (Please advise if these comments miss
  the point.)
* For unified shared memory, don't report confusing messages like
  `RefCount=` or `RefCount= updated` given that reference counts are
  irrelevant in this case.  Instead, just report `for unified shared
  memory`.
* Use `INFO` not `DP` consistently for `Mapping exists` messages.
* Fix device table dumps to print `INF` instead of `-1` for an
  infinite reference count.

Reviewed By: jhuber6, grokos

Differential Revision: https://reviews.llvm.org/D104559
2021-06-23 09:57:19 -04:00
Joseph Huber 72d4cd627c [OpenMP] Introduce an CMake find module for OpenMP Target support
This introduces a CMake find module for detecting target offloading support in
a compiler. The goal is to make it easier to incorporate target offloading into
a cmake project.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D104710
2021-06-22 23:01:38 -04:00
Joseph Huber 422adaa879 [OpenMP] Add thread limit environment variable support to plugins
The OpenMP 5.1 standard defines the environment variable
`OMP_TEAMS_THREAD_LIMIT` to limit the number of threads that will be run in a
single block. This patch adds support for this into the AMDGPU and CUDA
plugins.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103923
2021-06-22 16:25:40 -04:00
Shilei Tian 0029059074 [NFC][OpenMP][Offloading] Unified the construction of mapping table entry
This patch unifies construction of mapping table entry to use `emplace`.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D104580
2021-06-22 12:38:47 -04:00
Joseph Huber 244e98ff48 [Libomptarget] Improve device runtime implementation for globalized variables.
Currently the runtime implementation of `__kmpc_alloc_shared` is extremely slow because it allocated memory for each thread individually. This patch adds a small buffer for the threads to share data and will greatly improve performance for builds where all globalization could not be optimized out. If the shared buffer is full, then memory will not only be allocated per-warp rather than per-thread.

Depends on D97680

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104666
2021-06-22 11:52:49 -04:00
Joseph Huber 952a0f2385 [Libomptarget] Introduce new globalization runtime calls
Summary:
This patch introduces the new globalization runtime to be used by D97680. These
runtime calls will replace the __kmpc_data_sharing_push_stack and
__kmpc_data_sharing_pop_stack functions.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102532
2021-06-22 10:05:42 -04:00
AndreyChurbanov 5dd4d0d46f [OpenMP] libomp: fix dynamic loop dispatcher
Restructured dynamic loop dispatcher code.
Fixed use of dispatch buffers for nonmonotonic dynamic (static_steal) schedule:
- eliminated possibility of stealing iterations of the wrong loop when victim
  thread changed its buffer to work on another loop;
- fixed race when victim thread changed its buffer to work in nested parallel;
- eliminated "static" property of the schedule, that is now a single thread can
  execute whole loop.

Differential Revision: https://reviews.llvm.org/D103648
2021-06-22 16:29:01 +03:00
Pushpinder Singh 9d110f9159 [AMDGPU][Libomptarget] Move allow_access_to_all_gpu_agents to rtl.cpp
Moving this method helps eliminate a use of g_atl_machine.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104691
2021-06-22 11:44:52 +00:00
Vladislav Vinogradov eab1fd389b [omp] Fix build without ITT after D103121 changes
Reviewed By: AndreyChurbanov

Differential Revision: https://reviews.llvm.org/D104638
2021-06-21 18:17:52 +03:00
Vyacheslav Zakharin aad9e48c5f [NFC][libomptarget] Remove redundant libelf dependency for elf_common.
Differential Revision: https://reviews.llvm.org/D104549
2021-06-21 07:19:55 -07:00
Pushpinder Singh 7a97cd9da7 [AMDGPU][Libomptarget] Remove redundant functions
There does not seem to be any use of these functions. They just
put the value to a local which is never used again.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D104512
2021-06-21 06:13:24 +00:00
Shilei Tian ec97866454 [OpenMP] Make bug49334.cpp more reproducible
`bug49334.cpp` cannot detect data race in `libomptarget` efficiently. It
is reported that with `N = 256` and `BS = 16`, the data race can be reproduced
more steadily. The next coming pathces will fix it so this patch is expected to
fail now.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104552
2021-06-18 18:35:41 -04:00
Asher Mancinelli 5c189d30e6 [OpenMP] Update FAQ for enabling cuda offloading
Add an FAQ entry and add a few lines to an existing one. Document
the use of `GCC_INSTALL_PREFIX` for pointing clang to correct
GCC installation for two-stage build.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D104474
2021-06-18 11:55:45 -06:00
Vyacheslav Zakharin 836992ab9a [NFC][libomptarget] Build elf_common with PIC.
Differential Revision: https://reviews.llvm.org/D104545
2021-06-18 09:20:10 -07:00
Vyacheslav Zakharin c5b7c7c8f7 [NFC][libomptarget] Fixed -DLLVM_ENABLE_RUNTIMES="openmp" build.
Differential Revision: https://reviews.llvm.org/D104535
2021-06-18 09:20:10 -07:00
Terry Wilmarth 25073a4ecf [OpenMP] Add Two-level Distributed Barrier
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.

This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers.  There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.

The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns.  Thus, to turn it on, the following settings are required:

KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist

Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.

Differential Revision: https://reviews.llvm.org/D103121
2021-06-16 15:34:55 -05:00
Vyacheslav Zakharin b5c4fc0f23 [NFC][libomptarget] Reduce the dependency on libelf
This change-set removes libelf usage from elf_common part of the plugins.
libelf is still used in x86_64 generic plugin code and in some plugins
(e.g. amdgpu) - these will have to be cleaned up in separate checkins.

Differential Revision: https://reviews.llvm.org/D103545
2021-06-16 08:34:23 -07:00