Overview
The indirect clause enables indirect device invocation for a procedure:
An indirect call to the device version of a procedure on a device other than the host device, through a function pointer (C/C++), a pointer to a member function (C++) or a procedure pointer (Fortran) that refers to the host version of the procedure.
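As a minimal sketch of what the clause is for, the following C++ fragment marks two functions with the OpenMP 5.1 begin declare target indirect form and calls one of them inside a target region through a host function pointer. The names add_one, times_two and apply_on_device are made up for illustration. When the code is built without OpenMP offloading, the pragmas are ignored and the call simply runs on the host.

```cpp
#include <cassert>

// Hypothetical example functions. 'indirect' asks the compiler to make their
// device versions reachable through pointers to the host versions.
#pragma omp begin declare target indirect
int add_one(int x) { return x + 1; }
int times_two(int x) { return x * 2; }
#pragma omp end declare target

using unary_fn = int (*)(int);

// Calls 'fn' inside a target region. 'fn' holds the address of the host
// version of the function; on a non-host device the indirect call has to
// reach the device version instead.
int apply_on_device(unary_fn fn, int value) {
  assert(fn != nullptr && "calling through a null function pointer");
  int result = 0;
#pragma omp target firstprivate(fn, value) map(tofrom : result)
  { result = fn(value); }
  return result;
}
```

The caller never names the function inside the target region, which is why the compiler cannot resolve the call statically and a run-time translation step is needed.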
Compiler support
Offload entry metadata (C++ FE)
For each function declared as declare target indirect, the C++ FE generates the following offload metadata:
// Entry 0 -> Kind of this type of metadata (2)
// Entry 1 -> Mangled name of the function.
// Entry 2 -> Order the entry was created.
The offloading metadata uses a new metadata kind, OffloadEntriesInfoManagerTy::OffloadingEntryInfoKinds::OffloadingEntryInfoDeviceIndirectFunc.
Offload entries table
The offload entries table that is created for the host and for each of the device images currently has entries for declare target global variables, omp target outlined functions, and constructor/destructor thunks for declare target global variables.
The compiler will also produce an entry for each procedure listed in the indirect clause of a declare target construct:
struct __tgt_offload_entry {
  void *addr;       // Pointer to the function
  char *name;       // Name of the function
  size_t size;      // 0 for function
  int32_t flags;    // OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR
  int32_t reserved; // Reserved
};
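To make the entry layout concrete, here is a small sketch that fills in such an entry and tests whether an entry describes an indirect function. The helper names make_indirect_entry and is_indirect_function_entry are hypothetical, and the numeric value of kDeclareTargetFptr is a placeholder: this document names the OMP_DECLARE_TARGET_FPTR flag but does not state its value, nor whether it is tested as a bit mask.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Layout as described in this document.
struct __tgt_offload_entry {
  void *addr;       // Pointer to the function
  char *name;       // Name of the function
  size_t size;      // 0 for function
  int32_t flags;    // OMP_DECLARE_TARGET_FPTR for indirect functions
  int32_t reserved; // Reserved
};

// ASSUMPTION: placeholder value standing in for
// OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR.
constexpr int32_t kDeclareTargetFptr = 0x8;

// Hypothetical helper: builds the entry the compiler would emit for one
// procedure listed in the indirect clause.
__tgt_offload_entry make_indirect_entry(void *addr, const char *mangled_name) {
  assert(mangled_name != nullptr && "an entry needs a name");
  return {addr, const_cast<char *>(mangled_name), 0, kDeclareTargetFptr, 0};
}

// Hypothetical helper: size 0 marks a function entry, the flag marks it as
// a function pointer entry (treated here as a bit mask, also an assumption).
bool is_indirect_function_entry(const __tgt_offload_entry &entry) {
  return entry.size == 0 && (entry.flags & kDeclareTargetFptr) != 0;
}
```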
Run-time dispatch in device code
When the FE generates an indirect function call in device code, it first translates the original function pointer (which may be the address of a host function) into a device function pointer using a translation API, and then uses the resulting pointer for the call.
Original call code:
%0 = load void ()*, void ()** %fptr.addr
call void %0()
Becomes this:
%0 = load void ()*, void ()** %fptr.addr
%1 = bitcast void ()* %0 to i8*
%call = call i8* @__kmpc_target_translate_fptr(i8* %1)
%fptr_device = bitcast i8* %call to void ()*
call void %fptr_device()
Device RTLs must provide the translation API:
// Translate \p FnPtr identifying a host function into a function pointer
// identifying its device counterpart.
// If \p FnPtr matches an address of any host function
// declared as 'declare target indirect', then the API
// must return an address of the same function compiled
// for the device. If \p FnPtr does not match an address
// of any host function, then the API returns \p FnPtr
// unchanged.
EXTERN void *__kmpc_target_translate_fptr(void *FnPtr);
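The call-site rewrite can also be pictured at the C++ level. The sketch below runs on the host only: translate_fptr_stub is a hypothetical stand-in for __kmpc_target_translate_fptr that knows a single host/device pair, and foo_host and foo_device are ordinary functions that merely play the roles of the two versions of one procedure.

```cpp
#include <cassert>

using void_fn = void (*)();

int host_calls = 0;
int device_calls = 0;
void foo_host() { ++host_calls; }     // plays the host version
void foo_device() { ++device_calls; } // plays the device version
void unrelated() {}                   // not declared 'indirect'

// Hypothetical stand-in for the translation API: returns the device
// counterpart for a known host address and the input unchanged otherwise.
void *translate_fptr_stub(void *FnPtr) {
  if (FnPtr == reinterpret_cast<void *>(&foo_host))
    return reinterpret_cast<void *>(&foo_device);
  return FnPtr;
}

// C++ picture of the rewritten call site: cast the pointer, translate it,
// cast it back, then call through the translated pointer.
void call_indirect(void_fn fptr) {
  assert(fptr != nullptr && "calling through a null function pointer");
  void *translated = translate_fptr_stub(reinterpret_cast<void *>(fptr));
  reinterpret_cast<void_fn>(translated)();
}
```

Because an unmatched pointer is returned unchanged, pointers that already hold a device address keep working through the same call path.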
Runtime handling of function pointers
OpenMPOffloadingDeclareTargetFlags::OMP_DECLARE_TARGET_FPTR is a new flag that distinguishes offload entries for function pointers from other function entries. Unlike other function entries (with size equal to 0), omptarget::InitLibrary() will establish a mapping for function pointer entries in Device.HostDataToTargetMap.
For each OMP_DECLARE_TARGET_FPTR entry in the offload entries table, libomptarget creates an entry of the following type:
struct __omp_offloading_fptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
};
Here host_ptr is __tgt_offload_entry::addr in a host offload entry, and tgt_ptr is __tgt_offload_entry::addr in the corresponding device offload entry (which may be found using the populated Device.HostDataToTargetMap).
When all __omp_offloading_fptr_map_ty entries have been collected in a single host array, libomptarget sorts the table by host_ptr values and passes it to the device plugin for registration, provided the plugin supports the optional __tgt_rtl_set_function_ptr_map API.
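The collection and sorting step might look roughly like the sketch below. It is not libomptarget code: build_fptr_map is a hypothetical helper, host and device entries are paired by name as a simple stand-in for the Device.HostDataToTargetMap lookup, and the value of kDeclareTargetFptr is a placeholder for OMP_DECLARE_TARGET_FPTR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <stddef.h>
#include <stdint.h>
#include <vector>

struct __tgt_offload_entry {
  void *addr; char *name; size_t size; int32_t flags; int32_t reserved;
};
struct __omp_offloading_fptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
};

// ASSUMPTION: placeholder value for OMP_DECLARE_TARGET_FPTR.
constexpr int32_t kDeclareTargetFptr = 0x8;

// Hypothetical helper: collects one map entry per flagged host entry and
// sorts the result by host_ptr so the device can binary-search it.
std::vector<__omp_offloading_fptr_map_ty>
build_fptr_map(const std::vector<__tgt_offload_entry> &host_entries,
               const std::vector<__tgt_offload_entry> &device_entries) {
  std::vector<__omp_offloading_fptr_map_ty> table;
  for (const __tgt_offload_entry &h : host_entries) {
    if ((h.flags & kDeclareTargetFptr) == 0)
      continue; // not a function pointer entry
    for (const __tgt_offload_entry &d : device_entries) {
      assert(h.name != nullptr && d.name != nullptr);
      if (std::strcmp(h.name, d.name) == 0) { // ASSUMPTION: pair by name
        table.push_back(
            {static_cast<int64_t>(reinterpret_cast<intptr_t>(h.addr)),
             static_cast<int64_t>(reinterpret_cast<intptr_t>(d.addr))});
        break;
      }
    }
  }
  std::sort(table.begin(), table.end(),
            [](const __omp_offloading_fptr_map_ty &a,
               const __omp_offloading_fptr_map_ty &b) {
              return a.host_ptr < b.host_ptr;
            });
  return table;
}
```

Sorting once on the host keeps the device-side lookup logarithmic in the number of indirect functions.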
Plugins may provide the following API, if they want to support declare target indirect functionality:
// Register, in a target-implementation-defined way, a table
// of __omp_offloading_fptr_map_ty entries providing the
// mapping between host and device addresses of 'declare target indirect'
// functions. \p table_size is the number of elements in the \p table_host_ptr
// array.
EXTERN void __tgt_rtl_set_function_ptr_map(
int32_t device_id, uint64_t table_size, __omp_offloading_fptr_map_ty *table_host_ptr);
Sample implementation
This section describes one possible implementation.
The FE may define the following global symbols for each translation module containing declare target indirect, when compiling this module for a device:
// Mapping between host and device functions declared as
// 'declare target indirect'.
__attribute__((weak)) struct __omp_offloading_fptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
} *__omp_offloading_fptr_map_p = 0;
// Number of elements in __omp_offloading_fptr_map_p table.
__attribute__((weak)) uint64_t __omp_offloading_fptr_map_size = 0;
__tgt_rtl_set_function_ptr_map(int32_t device_id, uint64_t table_size, __omp_offloading_fptr_map_ty *table_host_ptr) allocates device memory of size sizeof(__omp_offloading_fptr_map_ty) * table_size and transfers the contents of the table_host_ptr array into this device memory. The address of the allocated device memory is then assigned to the __omp_offloading_fptr_map_p global variable on the device. For example, in CUDA, the device address of __omp_offloading_fptr_map_p may be obtained by calling cuModuleGetGlobal, and a pointer-sized data transfer then initializes __omp_offloading_fptr_map_p to point to the device copy of the table_host_ptr array. __omp_offloading_fptr_map_size is set to table_size the same way.
An alternative implementation of __tgt_rtl_set_function_ptr_map may invoke a device kernel that will do the assignments.
The __kmpc_target_translate_fptr(void *FnPtr) API uses binary search to match FnPtr against host_ptr inside the device table pointed to by __omp_offloading_fptr_map_p. If a matching key is found, it returns the corresponding tgt_ptr; otherwise, it returns FnPtr.
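A host-only sketch of this lookup is shown below. The names fptr_map_p, fptr_map_size, set_function_ptr_map_stub and translate_fptr_sketch are hypothetical stand-ins for the device globals, the plugin registration call and the translation API. The "device copy" of the table is simply the caller's array here, so no device memory allocation or transfer is modeled.

```cpp
#include <cassert>
#include <stdint.h>

struct __omp_offloading_fptr_map_ty {
  int64_t host_ptr; // key
  int64_t tgt_ptr;  // value
};

// Stand-ins for __omp_offloading_fptr_map_p and
// __omp_offloading_fptr_map_size on the device.
__omp_offloading_fptr_map_ty *fptr_map_p = nullptr;
uint64_t fptr_map_size = 0;

// Emulates the effect of registration: afterwards the globals describe a
// table that is sorted by host_ptr.
void set_function_ptr_map_stub(uint64_t table_size,
                               __omp_offloading_fptr_map_ty *table) {
  fptr_map_p = table;
  fptr_map_size = table_size;
}

// Binary search over the sorted table. Returns the matching tgt_ptr, or
// FnPtr unchanged when no key matches.
void *translate_fptr_sketch(void *FnPtr) {
  assert(fptr_map_p != nullptr || fptr_map_size == 0);
  const int64_t key = static_cast<int64_t>(reinterpret_cast<intptr_t>(FnPtr));
  uint64_t lo = 0, hi = fptr_map_size;
  while (lo < hi) { // find the first entry with host_ptr >= key
    const uint64_t mid = lo + (hi - lo) / 2;
    if (fptr_map_p[mid].host_ptr < key)
      lo = mid + 1;
    else
      hi = mid;
  }
  if (lo < fptr_map_size && fptr_map_p[lo].host_ptr == key)
    return reinterpret_cast<void *>(
        static_cast<intptr_t>(fptr_map_p[lo].tgt_ptr));
  return FnPtr;
}
```

With an empty or unregistered table the search loop never runs, so every pointer is returned unchanged.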
TODO: Optimization for non-unified_shared_memory
If a program does not use requires unified_shared_memory, and all function pointers are mapped (which the OpenMP specification does not require), then an implementation may avoid the run-time dispatch code for indirect function calls. In that case neither __kmpc_target_translate_fptr nor __tgt_rtl_set_function_ptr_map is needed: libomptarget will just map the function pointers as regular data pointers via Device.HostDataToTargetMap.