305 Commits

Author SHA1 Message Date
Shilei Tian f2400f024d [OpenMP] Fixed the issue that target memory deallocation might be called when they're being used
This patch fixed the issue that target memory might be deallocated when
they're still being used or before they're used.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84996
2020-07-31 18:54:18 -04:00
Shilei Tian 0f10165626 [OpenMP] Refactored the function `targetDataEnd`
Refactored the function `targetDataEnd` in preparation for fixing
the issue of ahead-of-time target memory deallocation. This patch only
renames `targetDataEnd`-related variables and functions to conform
to the LLVM coding standard.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84991
2020-07-30 21:39:26 -04:00
Shilei Tian 8218eee269 [OpenMP] Refactored the function `target`
Refactored the function `target` in preparation for fixing the
issue of ahead-of-time device memory deallocation.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84816
2020-07-30 21:05:55 -04:00
Alexey Bataev 622e46156d [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 11:18:33 -04:00
Alexey Bataev b69357c2f4 Revert "[OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region."
This reverts commit 142d0d3ed8 to
investigate undefined behavior revealed by buildbots.
2020-07-30 10:57:56 -04:00
Alexey Bataev 142d0d3ed8 [OPENMP]Fix PR46824: Global declare target pointer cannot be accessed in target region.
Need to map the base pointer for all directives, not only target
data-based ones.
The base pointer is mapped for array sections, array subscript, array
shaping and other array-like constructs with the base pointer. Also,
codegen for use_device_ptr clause was modified to correctly handle
mapping combination of array like constructs + use_device_ptr clause.
The data for use_device_ptr clause is emitted as the last records in the
data mapping array.
It applies only for global pointers.

Differential Revision: https://reviews.llvm.org/D84767
2020-07-30 09:40:05 -04:00
Joel E. Denny cee52dd026 [OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)
This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
previous patch in this series implements Clang front end support.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D84712
2020-07-29 12:18:50 -04:00
Shilei Tian 30440924d4 [OpenMP] Replaced mutex lock/unlock in `target` with `std::lock_guard`
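
A minimal sketch of the kind of change the title describes, assuming a mutex that protects some shared table. The names `TableMtx`, `EntryCount`, `addEntryManual`, and `addEntryGuarded` are hypothetical and are not taken from libomptarget.

```cpp
#include <mutex>

// Hypothetical shared state; not the actual libomptarget data structures.
static std::mutex TableMtx;
static int EntryCount = 0;

// Manual locking: every exit path has to remember to unlock.
int addEntryManual(bool Fail) {
  TableMtx.lock();
  if (Fail) {
    TableMtx.unlock(); // easy to forget on an early return
    return -1;
  }
  ++EntryCount;
  int Result = EntryCount;
  TableMtx.unlock();
  return Result;
}

// RAII locking: the guard releases the mutex on every exit path,
// including early returns and exceptions.
int addEntryGuarded(bool Fail) {
  std::lock_guard<std::mutex> Guard(TableMtx);
  if (Fail)
    return -1;
  return ++EntryCount;
}
```

Both functions behave the same for callers; the guarded version simply cannot leave the mutex locked by accident.
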
Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84799
2020-07-28 20:31:40 -04:00
Joel E. Denny 65564e5eaf Revert "[OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)"
This reverts commit 2cb926a447.

It depends on 3c3faae497, which is being
reverted.
2020-07-28 20:30:05 -04:00
Shilei Tian 3ce69d4d50 [NFC][OpenMP] Renamed all variable and function names in `target` to conform with LLVM code standard
This patch only touched variables and functions in `target`.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84797
2020-07-28 20:11:09 -04:00
Joel E. Denny 2cb926a447 [OpenMP] Implement TR8 `present` motion modifier in runtime (2/2)
This patch implements OpenMP runtime support for the OpenMP TR8
`present` motion modifier for `omp target update` directives.  The
previous patch in this series implements Clang front end support.

Reviewed By: grokos

Differential Revision: https://reviews.llvm.org/D84712
2020-07-28 19:15:18 -04:00
Joel E. Denny 9b4826d18b [OpenMP] Fix libomptarget negative tests to expect abort
On runtime failures, D83963 causes the runtime to abort instead of
merely exiting with a non-zero value, but many tests in the
libomptarget test suite still expect the former behavior.  This patch
updates the test suite and was discussed in post-commit comments on
D83963 and D84557.
2020-07-28 09:02:16 -04:00
Joachim Protze e2f5444c9c [OpenMP][Tests] Enable nvptx64 testing for most libomptarget tests
Also add $BUILD/lib to the LIBRARY_PATH to fix
https://bugs.llvm.org/show_bug.cgi?id=46836.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D84557
2020-07-28 11:08:24 +02:00
Ye Luo 9323166601 [OpenMP] Add more pass-through functions in DeviceTy
Summary:
1. Add DeviceTy::data_alloc, DeviceTy::data_delete, DeviceTy::data_alloc, DeviceTy::synchronize pass-through functions. Avoid directly accessing Device.RTL
2. Fix the type of the first argument of synchronize_ty in rtl.h: the device id is int32_t, which is consistent with other functions.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D84487
2020-07-27 16:08:30 -04:00
Johannes Doerfert 9c87466c39 [OpenMP] Use `abort` not `error` for fatal runtime exceptions
See PR46515 for the rationale, but generally we want to *really* abort,
not gracefully shut down.

Reviewed By: grokos, ABataev

Differential Revision: https://reviews.llvm.org/D83963
2020-07-24 15:15:38 -05:00
Shilei Tian c0185dc7df Revert "[OpenMP] Wait for kernel prior to memory deallocation"
This reverts commit 9b2832c089.
2020-07-22 23:03:36 -04:00
Shilei Tian 9b2832c089 [OpenMP] Wait for kernel prior to memory deallocation
Summary:
In the function `target`, memory deallocation and `target_data_end` are called
immediately after returning from the kernel launch. This can cause a race
condition in which the memory is freed while the kernel is still using it, or
even before the kernel starts to execute, especially when multiple kernels run
concurrently. Since the thread issuing the target offloading is blocked at the
end of the target region anyway, we move the synchronization slightly earlier
to ensure correctness.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D84381
2020-07-22 22:55:34 -04:00
Joel E. Denny 708752b2f6 [OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)
This implements OpenMP runtime support for the OpenMP TR8 `present`
map type modifier.  The previous patch in this series implements Clang
front end support.  See that patch summary for behaviors that are not
yet supported.

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D83062
2020-07-22 14:04:58 -04:00
Joel E. Denny fc247c8f3c Revert "[OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)"
This reverts commit 45b8f7ec35.

It attempts to use debug macros `DPxMOD` and `DPxPTR` in release
builds.  Will fix and reapply later.
2020-07-22 11:22:08 -04:00
Joel E. Denny 45b8f7ec35 [OpenMP] Implement TR8 `present` map type modifier in runtime (2/2)
This implements OpenMP runtime support for the OpenMP TR8 `present`
map type modifier.  The previous patch in this series implements Clang
front end support.  See that patch summary for behaviors that are not
yet supported.

Reviewed By: grokos, jdoerfert

Differential Revision: https://reviews.llvm.org/D83062
2020-07-22 10:15:32 -04:00
Joachim Protze ae31d7838c [OpenMP][NFC] pass on env variables to libomptarget tests 2020-07-22 12:14:45 +02:00
George Rokos 140ab574a1 [OpenMP][Offload] Declare mapper runtime implementation
Libomptarget patch adding runtime support for "declare mapper".
Patch co-developed by Lingda Li and George Rokos.

Differential revision: https://reviews.llvm.org/D68100
2020-07-15 18:11:43 -07:00
Johannes Doerfert 5937434677 [OpenMP] Silence unused symbol warning with proper ifdefs 2020-07-11 11:57:42 -05:00
Johannes Doerfert c98699582a [OpenMP][NFC] Remove unused (always fixed) arguments
There are various runtime calls in the device runtime with unused, or
always fixed, arguments. This is bad for all sorts of reasons. Clean up
two before as we match them in OpenMPOpt now.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83268
2020-07-11 00:51:51 -05:00
Johannes Doerfert cd0ea03e6f [OpenMP][NFC] Remove unused and untested code from the device runtime
Summary:
We carried a lot of unused and untested code in the device runtime.
Among other reasons, we are planning major rewrites for which reduced
size is going to help a lot.

The number of code lines reduced by 14%!

Before:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            489            841           2454
C/C++ Header                    14            322            493           1377
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            998           1528           4691
-------------------------------------------------------------------------------

After:
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CUDA                            13            366            733           1879
C/C++ Header                    14            317            484           1293
C                               12            117            124            559
CMake                            4             64             64            262
C++                              1              6              6             39
-------------------------------------------------------------------------------
SUM:                            44            870           1411           4032
-------------------------------------------------------------------------------

Reviewers: hfinkel, jhuber6, fghanim, JonChesterfield, grokos, AndreyChurbanov, ye-luo, tianshilei1992, ggeorgakoudis, Hahnfeld, ABataev, hbae, ronlieb, gregrodgers

Subscribers: jvesely, yaxunl, bollu, guansong, jfb, sstefan1, aaron.ballman, openmp-commits, cfe-commits

Tags: #clang, #openmp

Differential Revision: https://reviews.llvm.org/D83349
2020-07-10 19:09:41 -05:00
Ye Luo c5348aecd7 [OpenMP] Use primary context in CUDA plugin
Summary:
Retaining per device primary context is preferred to creating a context owned by the plugin.

From CUDA documentation
1. "Note that the use of multiple CUcontexts per device within a single process will substantially degrade performance and is strongly discouraged. Instead, it is highly recommended that the implicit one-to-one device-to-context mapping for the process provided by the CUDA Runtime API be used." From https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html
2. Right under cuCtxCreate. In most cases it is recommended to use cuDevicePrimaryCtxRetain. https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
3. The primary context is unique per device and shared with the CUDA runtime API. These functions allow integration with other libraries using CUDA.  https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX

Two issues are addressed by this patch:
1. Not using the primary context caused interoperability issues with libraries like cublas and cusolver (CUBLAS_STATUS_EXECUTION_FAILED and cudaErrorInvalidResourceHandle).
2. On OLCF summit, "Error returned from cuCtxCreate" and "CUDA error is: invalid device ordinal"

Regarding the flags of the primary context. If it is inactive, we set CU_CTX_SCHED_BLOCKING_SYNC. If it is already active, we respect the current flags.

Reviewers: grokos, ABataev, jdoerfert, protze.joachim, AndreyChurbanov, Hahnfeld

Reviewed By: jdoerfert

Subscribers: openmp-commits, yaxunl, guansong, sstefan1, tianshilei1992

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82718
2020-07-07 10:14:51 -04:00
Saiyedul Islam 38d6640ba5 [libomptarget] Implement atomic inc and fence functions for AMDGCN using clang builtins
This function uses __builtin_amdgcn_atomic_inc32():
  uint32_t atomicInc(uint32_t *address, uint32_t max);

These functions use __builtin_amdgcn_fence():
__kmpc_impl_threadfence()
__kmpc_impl_threadfence_block()
__kmpc_impl_threadfence_system()

They will take place of current mechanism of directly calling IR functions.
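
For readers unfamiliar with the `atomicInc` signature above, here is a host-side model of its wrap-around behavior, assuming the CUDA-documented semantics (store `old >= max ? 0 : old + 1`, return `old`). The name `atomicIncModel` is hypothetical; the device code in this patch calls a clang builtin rather than anything like this loop.

```cpp
#include <atomic>
#include <cstdint>

// Host-side model only (assumption: CUDA-style wrap-around semantics).
// Returns the old value and stores 0 once the old value has reached Max.
uint32_t atomicIncModel(std::atomic<uint32_t> &Address, uint32_t Max) {
  uint32_t Old = Address.load();
  uint32_t New;
  do {
    New = (Old >= Max) ? 0 : Old + 1;
    // On failure, compare_exchange_weak reloads Old and the loop retries.
  } while (!Address.compare_exchange_weak(Old, New));
  return Old;
}
```

With `Max` set to 2, successive calls on a counter starting at 0 return 0, 1, 2 and then start over at 0.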

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D83132
2020-07-07 06:36:25 +00:00
Fangrui Song 6ba4380ed6 [libomptarget][test] Fix text relocations by adding -fPIC 2020-07-05 12:51:28 -07:00
Ye Luo 45bb073da8 [OpenMP] fix clang warning about printf format in CUDA plugin
Summary: Warnings are printed by clang when building with LIBOMPTARGET_ENABLE_DEBUG=ON due to an incorrect format string.

Reviewers: tianshilei1992, jdoerfert

Reviewed By: tianshilei1992

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82789
2020-06-29 22:35:39 -04:00
Ye Luo 6e5f64c44f [OpenMP] Adopt std::set in HostDataToTargetMap
Summary:
lookupMapping took significant time due to linear complexity searching.
This is bad for offloading from multiple host threads because lookupMapping is protected by mutex.
Use std::set for logarithmic complexity searching.

Before my change.
libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec.
After the change
libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec.

Most of the overhead of libomptarget (exclusive time) is gone.
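
A minimal sketch of a logarithmic range lookup with `std::set`, assuming non-overlapping host ranges keyed by their begin address. The type `MapEntry`, its fields, the comparator `ByHstBegin`, and this `lookup` function are hypothetical and only illustrate the idea; they are not the actual `HostDataToTargetMap` code.

```cpp
#include <cstdint>
#include <set>

// Hypothetical entry type; field names are illustrative.
struct MapEntry {
  uintptr_t HstBegin; // first host address of the mapped range
  uintptr_t HstEnd;   // one past the last host address
  uintptr_t TgtBegin; // corresponding device address
};

// Order entries by host begin address. is_transparent (C++14) lets the set
// be searched with a raw address instead of a full MapEntry.
struct ByHstBegin {
  using is_transparent = void;
  bool operator()(const MapEntry &L, const MapEntry &R) const {
    return L.HstBegin < R.HstBegin;
  }
  bool operator()(uintptr_t L, const MapEntry &R) const {
    return L < R.HstBegin;
  }
  bool operator()(const MapEntry &L, uintptr_t R) const {
    return L.HstBegin < R;
  }
};

using HostToTargetSet = std::set<MapEntry, ByHstBegin>;

// Returns the entry whose host range contains Ptr, or nullptr if none does.
// Costs O(log n) instead of a linear scan over all entries.
const MapEntry *lookup(const HostToTargetSet &Map, uintptr_t Ptr) {
  auto It = Map.upper_bound(Ptr); // first entry that begins after Ptr
  if (It == Map.begin())
    return nullptr;
  --It; // last entry that begins at or before Ptr
  return Ptr < It->HstEnd ? &*It : nullptr;
}
```

Because the lookup runs while a mutex is held, shortening it also shortens the time other host threads spend waiting.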

Reviewers: jdoerfert, grokos

Reviewed By: grokos

Subscribers: tianshilei1992, yaxunl, guansong, sstefan1

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D82264
2020-06-24 12:22:45 -04:00
Shilei Tian aaf50adb53 Revert "[OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info"
This reverts commit ee1bf45e1d.
2020-06-17 15:01:16 -04:00
Shilei Tian ee1bf45e1d [OpenMP][NFC] Added DeviceID and Event pointer to __tgt_async_info
DeviceID is added for some cases that we only have the __tgt_async_info but do
not know its corresponding device id. However, to communicate with target
plugins, we need that information.

Event is added for another way to synchronize.
2020-06-17 14:29:09 -04:00
Shilei Tian a014fbbc21 [OpenMP] Improve D2D memcpy to use more efficient driver API
Summary:
In the current implementation, D2D memcpy first copies data back to the host and
then copies it from the host to the device. This is very inefficient if the device
supports D2D memcpy, like CUDA.

In this patch, D2D memcpy first tries to use the natively supported driver API. If
that fails, it falls back to the original way. It is worth noting that D2D memcpy
in this scenario covers two cases:
- Same device: this is the D2D memcpy in the CUDA context.
- Different devices: this is the PeerToPeer memcpy in the CUDA context.
My implementation merges these two cases. It chooses the best API according to
the source device and destination device.
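
The try-native-then-fall-back flow can be sketched as follows. This is a host-only illustration under stated assumptions: `Device`, `CopyPath`, `nativeCopy`, and `deviceToDevice` are hypothetical names, and plain memory copies stand in for the driver calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical device description; not the libomptarget plugin interface.
struct Device {
  int Id;
  bool SupportsNativeD2D;
};

enum class CopyPath { SameDevice, PeerToPeer, ViaHost };

// Stand-in for a native driver copy. Returns false when unsupported.
bool nativeCopy(const Device &Src, const Device &Dst, char *DstPtr,
                const char *SrcPtr, size_t Size) {
  if (!Src.SupportsNativeD2D || !Dst.SupportsNativeD2D)
    return false;
  std::copy_n(SrcPtr, Size, DstPtr);
  return true;
}

// Try the native path first and fall back to staging through the host.
CopyPath deviceToDevice(const Device &Src, const Device &Dst, char *DstPtr,
                        const char *SrcPtr, size_t Size) {
  assert(DstPtr && SrcPtr && "both buffers must be valid");
  if (nativeCopy(Src, Dst, DstPtr, SrcPtr, Size))
    return Src.Id == Dst.Id ? CopyPath::SameDevice : CopyPath::PeerToPeer;
  std::vector<char> Staging(SrcPtr, SrcPtr + Size);    // device to host
  std::copy(Staging.begin(), Staging.end(), DstPtr);   // host to device
  return CopyPath::ViaHost;
}
```

The return value only reports which path was taken; the destination buffer ends up with the same contents on all three paths.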

Reviewers: jdoerfert, AndreyChurbanov, grokos

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, sstefan1, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D80649
2020-06-04 16:59:06 -04:00
Manoel Roemmer 6b9e43c67e [Openmp][VE] Libomptarget plugin for NEC SX-Aurora
This patch adds a libomptarget plugin for the NEC SX-Aurora TSUBASA Vector
Engine (VE target).  The code is largely based on the existing generic-elf
plugin and uses the NEC VEO and VEOSINFO libraries for offloading.

Differential Revision: https://reviews.llvm.org/D76843
2020-05-12 10:47:30 +02:00
Joel E. Denny dd5ba4b585 [OpenMP][NFC] Fix `not` substitution in tests
D78566 introduced a `\bnot\b` lit substitution in OpenMP test suites.
However, that would corrupt a command like
`FileCheck -implicit-check-not` or any file name like `%t.not`.  We
could use lookbehind/lookahead assertions to avoid such cases, but
this patch switches to `%not` (suggested during the D78566 review) as
a safer option.
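
The corruption described above can be reproduced with any regex engine that supports word boundaries. The sketch below uses `std::regex` purely for illustration (lit itself is written in Python); the function names and the replacement path `/path/to/not` are hypothetical.

```cpp
#include <regex>
#include <string>

// Word-boundary substitution: also rewrites `not` inside option names and
// file names, because `-`, `=`, and `.` all count as word boundaries.
std::string substituteWordNot(const std::string &Cmd) {
  static const std::regex WordNot(R"(\bnot\b)");
  return std::regex_replace(Cmd, WordNot, "/path/to/not");
}

// `%not` substitution: only an explicit `%not` token is rewritten.
std::string substitutePercentNot(const std::string &Cmd) {
  static const std::regex PercentNot("%not");
  return std::regex_replace(Cmd, PercentNot, "/path/to/not");
}
```

For example, `substituteWordNot` turns `%t.not` into `%t./path/to/not`, while `substitutePercentNot` leaves it unchanged.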

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D79529
2020-05-11 14:53:48 -04:00
Shilei Tian cb038927ef [OpenMP] Fix an issue of wrong return type of DeviceRTLTy::getNumOfDevices
Summary: There is a typo in DeviceRTLTy::getNumOfDevices: its return type is bool. As a result, omp_get_num_devices returns the wrong number of devices.
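
A small illustration of why a `bool` return type loses the count. `BuggyRTL` and `FixedRTL` are hypothetical stand-ins, not the plugin class itself.

```cpp
// Hypothetical stand-ins for the plugin class.
struct BuggyRTL {
  int NumberOfDevices = 4;
  // Any non-zero count converts to true, so callers only ever see 0 or 1.
  bool getNumOfDevices() const { return NumberOfDevices; }
};

struct FixedRTL {
  int NumberOfDevices = 4;
  // Returning int preserves the actual device count.
  int getNumOfDevices() const { return NumberOfDevices; }
};
```

With four devices, the buggy version yields 1 when its result is stored in an `int`, while the fixed version yields 4.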

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D79255
2020-05-03 15:59:06 -04:00
Ron Lieberman ee9c53d271 [libomptarget] Initialize reference parameter IsNew within Device::getOrAllocTgtPtr
The two locals IsNew and Pointer_IsNew were uninitialized at declaration, and then passed by
reference to Device.getOrAllocTgtPtr which in turn did not assign on all
paths within the function. This resulted in occasional runtime failures in one application.
Device::getOrAllocTgtPtr will now initialize IsNew to false on entry to function.
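
The pattern of the fix, assigning the out-parameter on entry, can be sketched as follows. The function `getOrAlloc`, its parameters, and the `std::map` table are hypothetical simplifications; only the handling of `IsNew` mirrors what the message describes.

```cpp
#include <cassert>
#include <map>

// Hypothetical simplified lookup-or-insert; only the IsNew handling matters.
void *getOrAlloc(std::map<const void *, void *> &Table, const void *HstPtr,
                 void *NewTgtPtr, bool &IsNew) {
  IsNew = false; // assign on entry so every return path leaves a defined value
  if (!HstPtr)
    return nullptr; // early exit: IsNew is already well defined
  auto It = Table.find(HstPtr);
  if (It != Table.end())
    return It->second; // existing mapping: IsNew stays false
  assert(NewTgtPtr && "caller must supply storage for a new mapping");
  Table[HstPtr] = NewTgtPtr;
  IsNew = true;
  return NewTgtPtr;
}
```

Without the first assignment, a caller passing an uninitialized local would read an indeterminate value after either early return.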

Differential Revision: https://reviews.llvm.org/D78744
2020-04-24 15:33:37 -05:00
Joel E. Denny 5f6aa9680c [OpenMP] target_data_begin: fail on device alloc fail
Without this patch, target_data_begin continues after an illegal
mapping or an out-of-memory error on the device.  With this patch, it
terminates the runtime with an error instead.

The new test exercises only illegal mappings.  I didn't think of a
good way to exercise out-of-memory errors from the test suite.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D78170
2020-04-21 17:10:50 -04:00
Joel E. Denny ba942610f6 [OpenMP] Add scaffolding for negative runtime tests
Without this patch, the openmp project's test suites do not appear to
have support for negative tests.  However, D78170 needs to add a test
that an expected runtime failure occurs.

This patch makes `not` visible in all of the openmp project's test
suites.  In all but `libomptarget/test`, it should be possible for a
test author to insert `not` before a use of the lit substitution for
running a test program.  In `libomptarget/test`, that substitution is
target-specific, and its value is `echo` when the target is not
available.  In that case, inserting `not` before a lit substitution
would expect an `echo` fail, so this patch instead defines a separate
lit substitution for expected runtime fails.

Reviewed By: jdoerfert, Hahnfeld

Differential Revision: https://reviews.llvm.org/D78566
2020-04-21 17:10:50 -04:00
Shilei Tian 4031bb982b [OpenMP] Refined CUDA plugin to put all CUDA operations into class
Summary: Current implementation mixed everything up so that there is almost no encapsulation. In this patch, all CUDA related operations are put into a new class DeviceRTLTy and only necessary functions are exposed. In addition, all C++ code now conforms with LLVM code standard, keeping those API functions following C style.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: jfb, yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77951
2020-04-13 13:32:46 -04:00
Shilei Tian feed674dec [OpenMP] Introduce stream pool to make sure the correctness of device synchronization

Summary: In the previous patch, in order to optimize performance, we only synchronize once
for each target region. The synchronization is via stream synchronization.
However, in an extreme situation, the performance might be bad. Consider the
following case: There is a task that requires transferring a huge amount of data
(calling the data transfer function many times). It is scheduled to the first
stream. And then we have 255 very light tasks scheduled to the remaining 255
streams (by default we have 256 streams). They can be finished before we do
synchronization at the end of the first task. Next, we get another very huge
task. It will be scheduled again to the first stream. Now the first task
finishes its kernel launch and calls stream synchronization. At this point, the
stream already contains two kernels, and the synchronization will wait until
both kernels finish instead of just the first one for the first task.

In this patch, we introduce a stream pool. After each synchronization, the stream
is returned to the pool to make sure that each synchronization waits only for
the expected operations.
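
A minimal sketch of the pool idea, assuming streams are identified by plain integers. The class `StreamPool`, its members, and the grow-on-demand policy are hypothetical; the actual plugin manages driver stream handles.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stream handle; a real plugin would store a driver stream.
using StreamId = int;

class StreamPool {
  std::mutex Mtx;
  std::vector<StreamId> Free; // streams with no pending work
  StreamId NextId = 0;        // next id to create when the pool is empty

public:
  // Hand out an idle stream, creating a new one if none is available.
  StreamId acquire() {
    std::lock_guard<std::mutex> Guard(Mtx);
    if (Free.empty())
      return NextId++;
    StreamId S = Free.back();
    Free.pop_back();
    return S;
  }

  // Call only after synchronizing on S, so the stream is known to be idle.
  void release(StreamId S) {
    std::lock_guard<std::mutex> Guard(Mtx);
    assert(S >= 0 && S < NextId && "stream was not created by this pool");
    Free.push_back(S);
  }
};
```

Since a stream is owned by one task between `acquire` and `release`, synchronizing on it never waits for another task's kernels.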

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: gregrodgers, yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77412
2020-04-11 07:08:56 -04:00
Shilei Tian 03ff643d2e [OpenMP] Put old APIs back and added new _async series for backward compatibility
Summary: Following comments at the bi-weekly meeting, this patch puts back the old APIs and adds a new `_async` series.

Reviewers: jdoerfert

Reviewed By: jdoerfert

Subscribers: yaxunl, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77822
2020-04-09 22:40:58 -04:00
Shilei Tian 32ed29271f [OpenMP] Optimized stream selection by scheduling data mapping for the same target region into a same stream
Summary:
This patch introduces two things for offloading:
1. Asynchronous data transfer: these functions are suffixed with `_async`. They have one more argument than their synchronous counterparts: `__tgt_async_info*`, a new struct that has only one field, `void *Identifier`. This struct is for information exchange between different asynchronous operations. It can be used for stream selection, as in this case, or for operation synchronization, which is also used. We may expect more uses in the future.
2. Optimization of stream selection for data mapping. The previous implementation used asynchronous device memory transfer but synchronized after each memory transfer. If kernel A needs four memory copies to the device and two memory copies back to the host, then we can schedule these seven operations (four H2D, two D2H, and one kernel launch) into the same stream and only need to synchronize after the memory copies from device to host. In this way, we save a huge overhead compared with synchronizing after each operation.
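
The two points above can be sketched together on the host. `AsyncInfo` mirrors the single-field struct described in point 1; `FakeStream`, `enqueue`, `synchronize`, and `runRegion` are hypothetical helpers that only count operations and synchronizations.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the single-field struct described above.
struct AsyncInfo {
  void *Identifier = nullptr;
};

// Hypothetical stream that records operations and counts synchronizations.
struct FakeStream {
  std::vector<std::string> Ops;
  int SyncCount = 0;
};

// Bind the async info to a stream on first use, then enqueue the operation.
void enqueue(AsyncInfo &Info, FakeStream &Stream, const std::string &Op) {
  if (!Info.Identifier)
    Info.Identifier = &Stream;
  assert(Info.Identifier == &Stream && "one target region uses one stream");
  Stream.Ops.push_back(Op);
}

// Wait for everything enqueued through Info, then drop the binding.
void synchronize(AsyncInfo &Info) {
  auto *Stream = static_cast<FakeStream *>(Info.Identifier);
  if (!Stream)
    return;
  ++Stream->SyncCount;
  Info.Identifier = nullptr;
}

// Four H2D copies, one kernel launch, two D2H copies, one synchronization.
int runRegion(FakeStream &Stream) {
  AsyncInfo Info;
  for (int I = 0; I < 4; ++I)
    enqueue(Info, Stream, "H2D");
  enqueue(Info, Stream, "kernel");
  for (int I = 0; I < 2; ++I)
    enqueue(Info, Stream, "D2H");
  synchronize(Info);
  return Stream.SyncCount;
}
```

The region enqueues seven operations but synchronizes once, which is the saving the summary describes.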

Reviewers: jdoerfert, ye-luo

Reviewed By: jdoerfert

Subscribers: yaxunl, lildmh, guansong, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77005
2020-04-07 14:55:47 -04:00
Kazuaki Ishizaki 4201679110 [OpenMP] NFC: Fix trivial typo
Differential Revision: https://reviews.llvm.org/D77430
2020-04-04 12:06:54 +09:00
JonChesterfield 09834f9761 [libomptarget][nfc] Move non-freestanding headers out of common
Summary:
[libomptarget][nfc] Move non-freestanding headers out of common

Lowers the bar for building deviceRTL.
Drops math.h entirely as it wasn't used and libm is a big dependency.

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: jvesely, openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D77071
2020-03-31 23:43:18 +01:00
Jon Chesterfield 856c995436 [libomptarget] Add missing elf_end call in elf_common.c
Summary:
[libomptarget] Add missing elf_end call in elf_common.c
Noticed when reviewing D76843.

Reviewers: simoll, jdoerfert, efocht, AndreyChurbanov, grokos, manorom

Reviewed By: grokos

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D76874
2020-03-26 19:07:33 +00:00
JonChesterfield 0813f41005 [libomptarget][nfc] Explicitly static function scope shared variables
Summary:
[libomptarget][nfc] Explicitly static function scope shared variables

`__shared__` in CUDA implies static in function scope. See e.g. D.2.1.1
in CUDA_C_Programming_Guide.pdf,
http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/

This is surprising for non-cuda developers, see e.g. D73239 where I thought
local variables would be thread local.

Tested by IR diff of libomptarget.bc (no change), running in tree tests,
and binary diff of the nvcc static archives (no significant change).

Reviewers: jdoerfert, ABataev, grokos

Reviewed By: jdoerfert

Subscribers: openmp-commits

Tags: #openmp

Differential Revision: https://reviews.llvm.org/D76713
2020-03-24 18:51:50 +00:00
JonChesterfield 298527587c [libomptarget][nfc] Disable amdgcn rtl build. The cmake logic for finding llvm is misbehaving. 2020-03-21 00:01:03 +00:00
George Rokos 0a42c9bfe4 Enable CUDA offloading on aarch64 host
Differential Revision: https://reviews.llvm.org/D76469
2020-03-20 15:38:47 -07:00
Tom Scogland a23d7282ca openmp: fix memcpy memory leak
Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72637
2020-03-12 23:24:16 -05:00