forked from lijiext/lammps
73 lines
3.2 KiB
Plaintext
73 lines
3.2 KiB
Plaintext
/*!
|
|
\mainpage Trilinos/Kokkos: Shared-memory programming interface and computational kernels
|
|
|
|
\section Kokkos_Intro Introduction
|
|
|
|
The %Kokkos package has two main components. The first, sometimes
|
|
called "%Kokkos Array" or just "%Kokkos," implements a
|
|
performance-portable shared-memory parallel programming model and data
|
|
containers. The second, called "%Kokkos Classic," consists of
|
|
computational kernels that support the %Tpetra package.
|
|
|
|
\section Kokkos_Kokkos The %Kokkos programming model
|
|
|
|
%Kokkos implements a performance-portable shared-memory parallel
|
|
programming model and data containers. It lets you write an algorithm
|
|
once, and just change a template parameter to get the optimal data
|
|
layout for your hardware. %Kokkos has back-ends for the following
|
|
parallel programming models:
|
|
|
|
- Kokkos::Threads: POSIX Threads (Pthreads)
|
|
- Kokkos::OpenMP: OpenMP
|
|
- Kokkos::Cuda: NVIDIA's CUDA programming model for graphics
|
|
processing units (GPUs)
|
|
- Kokkos::Serial: No thread parallelism
|
|
|
|
%Kokkos also has optimizations for shared-memory parallel systems with
|
|
nonuniform memory access (NUMA). Its containers can hold data of any
|
|
primitive ("plain old") data type (and some aggregate types). %Kokkos
|
|
Array may be used as a stand-alone programming model.
|
|
|
|
%Kokkos' parallel operations include the following:
|
|
|
|
- parallel_for: a thread-parallel "for loop"
|
|
- parallel_reduce: a thread-parallel reduction
|
|
- parallel_scan: a thread-parallel prefix scan operation
|
|
|
|
as well as expert-level platform-independent interfaces to thread
|
|
"teams," per-team "shared memory," synchronization, and atomic update
|
|
operations.
|
|
|
|
%Kokkos' data containers include the following:
|
|
|
|
- Kokkos::View: A multidimensional array suitable for thread-parallel
|
|
operations. Its layout (e.g., row-major or column-major) is
|
|
optimized by default for the particular thread-parallel device.
|
|
- Kokkos::Vector: A drop-in replacement for std::vector that eases
|
|
porting from standard sequential C++ data structures to %Kokkos'
|
|
parallel data structures.
|
|
- Kokkos::UnorderedMap: A parallel lookup table comparable in
|
|
functionality to std::unordered_map.
|
|
|
|
%Kokkos also uses the above basic containers to implement higher-level
|
|
data structures, like sparse graphs and matrices.
|
|
|
|
A good place to start learning about %Kokkos would be <a href="http://trilinos.sandia.gov/events/trilinos_user_group_2013/presentations/2013-11-TUG-Kokkos-Tutorial.pdf">these tutorial slides</a> from the 2013 Trilinos Users' Group meeting.
|
|
|
|
\section Kokkos_Classic %Kokkos Classic
|
|
|
|
"%Kokkos Classic" consists of computational kernels that support the
|
|
%Tpetra package. These kernels include sparse matrix-vector multiply,
|
|
sparse triangular solve, Gauss-Seidel, and dense vector operations.
|
|
They are templated on the type of objects (\c Scalar) on which they
|
|
operate. This component was not meant to be visible to users; it is
|
|
an implementation detail of the %Tpetra distributed linear algebra
|
|
package.
|
|
|
|
%Kokkos Classic also implements a shared-memory parallel programming
|
|
model. This inspired and preceded the %Kokkos programming model
|
|
described in the previous section. Users should consider the %Kokkos
|
|
Classic programming model deprecated, and prefer the new %Kokkos
|
|
programming model.
|
|
*/
|