109 lines
4.1 KiB
ReStructuredText
109 lines
4.1 KiB
ReStructuredText
|
========================================
|
||
|
Symmetric Communication Interface (SCIF)
|
||
|
========================================
|
||
|
|
||
|
The Symmetric Communication Interface (SCIF (pronounced as skiff)) is a low
|
||
|
level communications API across PCIe currently implemented for MIC. Currently
|
||
|
SCIF provides inter-node communication within a single host platform, where a
|
||
|
node is a MIC Coprocessor or Xeon based host. SCIF abstracts the details of
|
||
|
communicating over the PCIe bus while providing an API that is symmetric
|
||
|
across all the nodes in the PCIe network. An important design objective for SCIF
|
||
|
is to deliver the maximum possible performance given the communication
|
||
|
abilities of the hardware. SCIF has been used to implement an offload compiler
|
||
|
runtime and OFED support for MPI implementations for MIC coprocessors.
|
||
|
|
||
|
SCIF API Components
|
||
|
===================
|
||
|
|
||
|
The SCIF API has the following parts:
|
||
|
|
||
|
1. Connection establishment using a client server model
|
||
|
2. Byte stream messaging intended for short messages
|
||
|
3. Node enumeration to determine online nodes
|
||
|
4. Poll semantics for detection of incoming connections and messages
|
||
|
5. Memory registration to pin down pages
|
||
|
6. Remote memory mapping for low latency CPU accesses via mmap
|
||
|
7. Remote DMA (RDMA) for high bandwidth DMA transfers
|
||
|
8. Fence APIs for RDMA synchronization
|
||
|
|
||
|
SCIF exposes the notion of a connection which can be used by peer processes on
|
||
|
nodes in a SCIF PCIe "network" to share memory "windows" and to communicate. A
|
||
|
process in a SCIF node initiates a SCIF connection to a peer process on a
|
||
|
different node via a SCIF "endpoint". SCIF endpoints support messaging APIs
|
||
|
which are similar to connection oriented socket APIs. Connected SCIF endpoints
|
||
|
can also register local memory which is followed by data transfer using either
|
||
|
DMA, CPU copies or remote memory mapping via mmap. SCIF supports both user and
|
||
|
kernel mode clients which are functionally equivalent.
|
||
|
|
||
|
SCIF Performance for MIC
|
||
|
========================
|
||
|
|
||
|
DMA bandwidth comparison between the TCP (over ethernet over PCIe) stack versus
|
||
|
SCIF shows the performance advantages of SCIF for HPC applications and
|
||
|
runtimes::
|
||
|
|
||
|
Comparison of TCP and SCIF based BW
|
||
|
|
||
|
Throughput (GB/sec)
|
||
|
8 + PCIe Bandwidth ******
|
||
|
+ TCP ######
|
||
|
7 + ************************************** SCIF %%%%%%
|
||
|
| %%%%%%%%%%%%%%%%%%%
|
||
|
6 + %%%%
|
||
|
| %%
|
||
|
| %%%
|
||
|
5 + %%
|
||
|
| %%
|
||
|
4 + %%
|
||
|
| %%
|
||
|
3 + %%
|
||
|
| %
|
||
|
2 + %%
|
||
|
| %%
|
||
|
| %
|
||
|
1 +
|
||
|
+ ######################################
|
||
|
0 +++---+++--+--+-+--+--+-++-+--+-++-+--+-++-+-
|
||
|
1 10 100 1000 10000 100000
|
||
|
Transfer Size (KBytes)
|
||
|
|
||
|
SCIF allows memory sharing via mmap(..) between processes on different PCIe
|
||
|
nodes and thus provides bare-metal PCIe latency. The round trip SCIF mmap
|
||
|
latency from the host to an x100 MIC for an 8 byte message is 0.44 usecs.
|
||
|
|
||
|
SCIF has a user space library which is a thin IOCTL wrapper providing a user
|
||
|
space API similar to the kernel API in scif.h. The SCIF user space library
|
||
|
is distributed @ https://software.intel.com/en-us/mic-developer
|
||
|
|
||
|
Here is some pseudo code for an example of how two applications on two PCIe
|
||
|
nodes would typically use the SCIF API::
|
||
|
|
||
|
Process A (on node A) Process B (on node B)
|
||
|
|
||
|
/* get online node information */
|
||
|
scif_get_node_ids(..) scif_get_node_ids(..)
|
||
|
scif_open(..) scif_open(..)
|
||
|
scif_bind(..) scif_bind(..)
|
||
|
scif_listen(..)
|
||
|
scif_accept(..) scif_connect(..)
|
||
|
/* SCIF connection established */
|
||
|
|
||
|
/* Send and receive short messages */
|
||
|
scif_send(..)/scif_recv(..) scif_send(..)/scif_recv(..)
|
||
|
|
||
|
/* Register memory */
|
||
|
scif_register(..) scif_register(..)
|
||
|
|
||
|
/* RDMA */
|
||
|
scif_readfrom(..)/scif_writeto(..) scif_readfrom(..)/scif_writeto(..)
|
||
|
|
||
|
/* Fence DMAs */
|
||
|
scif_fence_signal(..) scif_fence_signal(..)
|
||
|
|
||
|
mmap(..) mmap(..)
|
||
|
|
||
|
/* Access remote registered memory */
|
||
|
|
||
|
/* Close the endpoints */
|
||
|
scif_close(..) scif_close(..)
|