git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13582 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2015-07-14 19:55:42 +00:00
parent 26b23a47cd
commit 47e13d72a8
2 changed files with 123 additions and 121 deletions
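
The variable renames in this commit can be summarized as a small migration sketch. The function name `kokkos_translate` is hypothetical and is not part of LAMMPS or Kokkos; the mapping is only read off the changed lines below. It covers just the settings that appear in this diff, and AVX is deliberately left unmapped because the diff shows no one-to-one replacement for it.

```shell
# Hypothetical helper (not part of LAMMPS or Kokkos): translate one
# pre-commit KOKKOS make variable into the post-commit form seen in this diff.
kokkos_translate() {
  case "$1" in
    OMP=yes)   echo "KOKKOS_DEVICES=OpenMP" ;;
    OMP=no)    echo "KOKKOS_DEVICES=Pthreads" ;;
    CUDA=yes)  echo "KOKKOS_DEVICES=Cuda" ;;
    MIC=yes)   echo "KOKKOS_ARCH=KNC" ;;
    HWLOC=yes) echo "KOKKOS_USE_TPLS=hwloc" ;;
    LIBRT=yes) echo "KOKKOS_USE_TPLS=librt" ;;
    DEBUG=yes) echo "KOKKOS_DEBUG=yes" ;;
    DEBUG=no)  echo "KOKKOS_DEBUG=no" ;;
    # Anything else (including AVX=yes) has no mapping shown in this diff.
    *)         echo "unmapped: $1" >&2; return 1 ;;
  esac
}
```

For example, an old command line such as `make g++ OMP=yes MIC=yes` would be rebuilt by translating each setting in turn, giving `KOKKOS_DEVICES=OpenMP` and `KOKKOS_ARCH=KNC`.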


@ -16,7 +16,8 @@
</H4>
<P>The KOKKOS package was developed primarily by Christian Trott
(Sandia) with contributions of various styles by others, including
Sikandar Mashayak (UIUC). The underlying Kokkos library was written
Sikandar Mashayak (UIUC), Stan Moore (Sandia), and Ray Shan (Sandia).
The underlying Kokkos library was written
primarily by Carter Edwards, Christian Trott, and Dan Sunderland (all
Sandia).
</P>
@ -25,7 +26,8 @@ that use data structures and macros provided by the Kokkos library,
which is included with LAMMPS in lib/kokkos.
</P>
<P>The Kokkos library is part of
<A HREF = "http://trilinos.sandia.gov/packages/kokkos">Trilinos</A> and is a
<A HREF = "http://trilinos.sandia.gov/packages/kokkos">Trilinos</A> and can also
be downloaded from <A HREF = "https://github.com/kokkos/kokkos">Github</A>. Kokkos is a
templated C++ library that provides two key abstractions for an
application like LAMMPS. First, it allows a single implementation of
an application kernel (e.g. a pair style) to run efficiently on
@ -71,10 +73,10 @@ mode.
</P>
<P>Here is a quick overview of how to use the KOKKOS package:
</P>
<UL><LI>specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
<LI>include the KOKKOS package and build LAMMPS
<LI>enable the KOKKOS package and its hardware options via the "-k on" command-line switch
<LI>use KOKKOS styles in your input script
<UL><LI>specify variables and settings in your Makefile.machine that enable
<LI>OpenMP, GPU, or Phi support include the KOKKOS package and build
<LI>LAMMPS enable the KOKKOS package and its hardware options via the "-k
<LI>on" command-line switch use KOKKOS styles in your input script
</UL>
<P>The latter two steps can be done using the "-k on", "-pk kokkos" and
"-sf kk" <A HREF = "Section_start.html#start_7">command-line switches</A>
@ -87,10 +89,11 @@ kk</A> commands respectively to your input script.
<P>The KOKKOS package can be used to build and run LAMMPS on the
following kinds of hardware:
</P>
<UL><LI>CPU-only: one MPI task per CPU core (MPI-only, but using KOKKOS styles)
<LI>CPU-only: one or a few MPI tasks per node with additional threading via OpenMP
<LI>Phi: on one or more Intel Phi coprocessors (per node)
<LI>GPU: on the GPUs of a node with additional OpenMP threading on the CPUs
<UL><LI>CPU-only: one MPI task per CPU core (MPI-only, but using KOKKOS
<LI>styles) CPU-only: one or a few MPI tasks per node with additional
<LI>threading via OpenMP Phi: on one or more Intel Phi coprocessors (per
<LI>node) GPU: on the GPUs of a node with additional OpenMP threading on
<LI>the CPUs
</UL>
<P>Note that Intel Xeon Phi coprocessors are supported in "native" mode,
not "offload" mode like the USER-INTEL package supports.
@ -130,19 +133,19 @@ Make.py -p kokkos -kokkos phi -o kokkos_phi file mpi
</P>
<PRE>cd lammps/src
make yes-kokkos
make g++ OMP=yes
make g++ KOKKOS_DEVICES=OpenMP
</PRE>
<P>Intel Xeon Phi:
</P>
<PRE>cd lammps/src
make yes-kokkos
make g++ OMP=yes MIC=yes
make g++ KOKKOS_DEVICES=OpenMP KOKKOS_ARCH=KNC
</PRE>
<P>CPUs and GPUs:
</P>
<PRE>cd lammps/src
make yes-kokkos
make cuda CUDA=yes
make cuda KOKKOS_DEVICES=Cuda
</PRE>
<P>These examples set the KOKKOS-specific KOKKOS_DEVICES and KOKKOS_ARCH variables on the
make command line which requires a GNU-compatible make command. Try
@ -159,7 +162,7 @@ options.
makefile, e.g. src/MAKE/Makefile.g++ in the first two examples above,
with a line like:
</P>
<PRE>MIC = yes
<PRE>KOKKOS_ARCH = KNC
</PRE>
<P>Note that if you build LAMMPS multiple times in this manner, using
different KOKKOS options (defined in different machine makefiles), you
@ -170,9 +173,9 @@ because the targets will be different.
machine makefile, in this case src/MAKE/Makefile.cuda, which is
included in the LAMMPS distribution. To build the KOKKOS package for
a GPU, this makefile must use the NVIDIA "nvcc" compiler. And it must
have a CCFLAGS -arch setting that is appropriate for your NVIDIA
hardware and installed software. Typical values for -arch are given
in <A HREF = "Section_start.html#start_3_4">Section 2.3.4</A> of the manual, as well
have a KOKKOS_ARCH setting that is appropriate for your NVIDIA
hardware and installed software. Typical values for KOKKOS_ARCH are given
below, as well
as other settings that must be included in the machine makefile, if
you create your own.
</P>
@ -183,36 +186,32 @@ double precision.
<P>There are other allowed options when building with the KOKKOS package.
As above, they can be set either as variables on the make command line
or in Makefile.machine. This is the full list of options, including
those discussed above. Each takes a value of <I>yes</I> or <I>no</I>. The
those discussed above. Each takes a value shown below. The
default value is listed, which is set in the
lib/kokkos/Makefile.lammps file.
lib/kokkos/Makefile.kokkos file.
</P>
<UL><LI>OMP, default = <I>yes</I>
<LI>CUDA, default = <I>no</I>
<LI>HWLOC, default = <I>no</I>
<LI>AVX, default = <I>no</I>
<LI>MIC, default = <I>no</I>
<LI>LIBRT, default = <I>no</I>
<LI>DEBUG, default = <I>no</I>
<P>#Default settings specific options
#Options: force_uvm,use_ldg,rdc
</P>
<UL><LI>KOKKOS_DEVICES, values = <I>OpenMP</I>, <I>Serial</I>, <I>Pthreads</I>, <I>Cuda</I>, default = <I>OpenMP</I>
<LI>KOKKOS_ARCH, values = <I>KNC</I>, <I>SNB</I>, <I>HSW</I>, <I>Kepler</I>, <I>Kepler30</I>, <I>Kepler32</I>, <I>Kepler35</I>,
<I>Kepler37</I>, <I>Maxwell</I>, <I>Maxwell50</I>, <I>Maxwell52</I>, <I>Maxwell53</I>, <I>ARMv8</I>, <I>BGQ</I>, <I>Power7</I>, <I>Power8</I>,
default = <I>none</I>
<LI>KOKKOS_DEBUG, values = <I>yes</I>, <I>no</I>, default = <I>no</I>
<LI>KOKKOS_USE_TPLS, values = <I>hwloc</I>, <I>librt</I>, default = <I>none</I>
<LI>KOKKOS_CUDA_OPTIONS, values = <I>force_uvm</I>, <I>use_ldg</I>, <I>rdc</I>
</UL>
<P>OMP sets the parallelization method used for Kokkos code (within
LAMMPS) that runs on the host. OMP=yes means that OpenMP will be
used. OMP=no means that pthreads will be used.
<P>KOKKOS_DEVICES sets the parallelization method used for Kokkos code (within
LAMMPS). KOKKOS_DEVICES=OpenMP means that OpenMP will be
used. KOKKOS_DEVICES=Pthreads means that pthreads will be used.
KOKKOS_DEVICES=Cuda means an NVIDIA GPU running
CUDA will be used.
</P>
<P>CUDA sets the parallelization method used for Kokkos code (within
LAMMPS) that runs on the device. CUDA=yes means an NVIDIA GPU running
CUDA will be used. CUDA=no means that the OMP=yes or OMP=no setting
will be used for the device as well as the host.
</P>
<P>If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
use "nvcc" as its compiler, via its CC setting. For best performance
its CCFLAGS setting should use -O3 and have an -arch setting that
matches the compute capability of your NVIDIA hardware and software
installation, e.g. -arch=sm_20. Generally Fermi Generation GPUs are
sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
cards are sm_50. A complete list can be found on
<A HREF = "http://en.wikipedia.org/wiki/CUDA#Supported_GPUs">wikipedia</A>. You can
also use the deviceQuery tool that comes with the CUDA samples. Note
<P>If KOKKOS_DEVICES=Cuda, then the lo-level Makefile in the src/MAKE
directory must use "nvcc" as its compiler, via its CC setting. For
best performance its CCFLAGS setting should use -O3 and have a
KOKKOS_ARCH setting that matches the compute capability of your NVIDIA
hardware and software installation, e.g. KOKKOS_ARCH=Kepler30. Note
the minimal required compute capability is 2.0, but this will give
significantly reduced performance compared to Kepler generation GPUs
with compute capability 3.x. For the LINK setting, "nvcc" should not
@ -223,28 +222,30 @@ also have a "Compilation rule" for creating *.o files from *.cu files.
See src/Makefile.cuda for an example of a lo-level Makefile with all
of these settings.
</P>
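
As a rough illustration of how the GPU values of KOKKOS_ARCH relate to the nvcc -arch values that the earlier wording described (sm_30, sm_35, sm_50), the sketch below assumes that the numeric suffix of the name is the compute capability, so Kepler35 would correspond to sm_35. The function name `kokkos_arch_to_sm` is hypothetical, and the bare names Kepler and Maxwell are left unmapped because this text does not say which capability they select.

```shell
# Hypothetical helper (an assumption, not a LAMMPS or Kokkos tool): derive the
# nvcc -arch value from a GPU KOKKOS_ARCH name by reusing its numeric suffix.
kokkos_arch_to_sm() {
  case "$1" in
    Kepler30|Kepler32|Kepler35|Kepler37)
      echo "sm_${1#Kepler}" ;;    # strip the "Kepler" prefix, keep the digits
    Maxwell50|Maxwell52|Maxwell53)
      echo "sm_${1#Maxwell}" ;;   # strip the "Maxwell" prefix, keep the digits
    *)
      echo "no GPU -arch mapping for: $1" >&2
      return 1 ;;
  esac
}
```

Running the deviceQuery tool from the CUDA samples, as the earlier wording suggested, remains the direct way to confirm the compute capability of the installed hardware before picking a value.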
<P>HWLOC binds threads to hardware cores, so they do not migrate during a
simulation. HWLOC=yes should always be used if running with OMP=no
for pthreads. It is not necessary for OMP=yes for OpenMP, because
OpenMP provides alternative methods via environment variables for
binding threads to hardware cores. More info on binding threads to
cores is given in <A HREF = "Section_accelerate.html#acc_8">this section</A>.
<P>KOKKOS_USE_TPLS=hwloc binds threads to hardware cores, so they do not
migrate during a simulation. KOKKOS_USE_TPLS=hwloc should always be
used if running with KOKKOS_DEVICES=Pthreads for pthreads. It is not
necessary for KOKKOS_DEVICES=OpenMP for OpenMP, because OpenMP
provides alternative methods via environment variables for binding
threads to hardware cores. More info on binding threads to cores is
given in <A HREF = "Section_accelerate.html#acc_8">this section</A>.
</P>
<P>AVX enables Intel advanced vector extensions when compiling for an
Intel-compatible chip. AVX=yes should only be set if your host
hardware supports AVX. If it does not support it, this will cause a
run-time crash.
<P>KOKKOS_ARCH=KNC enables compiler switches needed when compiling for an
Intel Phi processor.
</P>
<P>MIC enables compiler switches needed when compiling for an Intel Phi
processor.
<P>KOKKOS_USE_TPLS=librt enables use of a more accurate timer mechanism
on most Unix platforms. This library is not available on all
platforms.
</P>
<P>LIBRT enables use of a more accurate timer mechanism on most Unix
platforms. This library is not available on all platforms.
<P>KOKKOS_DEBUG is only useful when developing a Kokkos-enabled style
within LAMMPS. KOKKOS_DEBUG=yes enables printing of run-time
debugging information that can be useful. It also enables runtime
bounds checking on Kokkos data structures.
</P>
<P>DEBUG is only useful when developing a Kokkos-enabled style within
LAMMPS. DEBUG=yes enables printing of run-time debugging information
that can be useful. It also enables runtime bounds checking on Kokkos
data structures.
<P>KOKKOS_CUDA_OPTIONS are additional options for CUDA.
</P>
<P>For more information on Kokkos see the Kokkos programmers' guide here:
/lib/kokkos/doc/Kokkos_PG.pdf.
</P>
<P><B>Run with the KOKKOS package from the command line:</B>
</P>


@ -13,7 +13,8 @@
The KOKKOS package was developed primarily by Christian Trott
(Sandia) with contributions of various styles by others, including
Sikandar Mashayak (UIUC). The underlying Kokkos library was written
Sikandar Mashayak (UIUC), Stan Moore (Sandia), and Ray Shan (Sandia).
The underlying Kokkos library was written
primarily by Carter Edwards, Christian Trott, and Dan Sunderland (all
Sandia).
@ -22,7 +23,8 @@ that use data structures and macros provided by the Kokkos library,
which is included with LAMMPS in lib/kokkos.
The Kokkos library is part of
"Trilinos"_http://trilinos.sandia.gov/packages/kokkos and is a
"Trilinos"_http://trilinos.sandia.gov/packages/kokkos and can also
be downloaded from "Github"_https://github.com/kokkos/kokkos. Kokkos is a
templated C++ library that provides two key abstractions for an
application like LAMMPS. First, it allows a single implementation of
an application kernel (e.g. a pair style) to run efficiently on
@ -68,10 +70,10 @@ mode.
Here is a quick overview of how to use the KOKKOS package:
specify variables and settings in your Makefile.machine that enable OpenMP, GPU, or Phi support
include the KOKKOS package and build LAMMPS
enable the KOKKOS package and its hardware options via the "-k on" command-line switch
use KOKKOS styles in your input script :ul
specify variables and settings in your Makefile.machine that enable
OpenMP, GPU, or Phi support include the KOKKOS package and build
LAMMPS enable the KOKKOS package and its hardware options via the "-k
on" command-line switch use KOKKOS styles in your input script :ul
The latter two steps can be done using the "-k on", "-pk kokkos" and
"-sf kk" "command-line switches"_Section_start.html#start_7
@ -84,10 +86,11 @@ kk"_suffix.html commands respectively to your input script.
The KOKKOS package can be used to build and run LAMMPS on the
following kinds of hardware:
CPU-only: one MPI task per CPU core (MPI-only, but using KOKKOS styles)
CPU-only: one or a few MPI tasks per node with additional threading via OpenMP
Phi: on one or more Intel Phi coprocessors (per node)
GPU: on the GPUs of a node with additional OpenMP threading on the CPUs :ul
CPU-only: one MPI task per CPU core (MPI-only, but using KOKKOS
styles) CPU-only: one or a few MPI tasks per node with additional
threading via OpenMP Phi: on one or more Intel Phi coprocessors (per
node) GPU: on the GPUs of a node with additional OpenMP threading on
the CPUs :ul
Note that Intel Xeon Phi coprocessors are supported in "native" mode,
not "offload" mode like the USER-INTEL package supports.
@ -127,19 +130,19 @@ CPU-only (run all-MPI or with OpenMP threading):
cd lammps/src
make yes-kokkos
make g++ OMP=yes :pre
make g++ KOKKOS_DEVICES=OpenMP :pre
Intel Xeon Phi:
cd lammps/src
make yes-kokkos
make g++ OMP=yes MIC=yes :pre
make g++ KOKKOS_DEVICES=OpenMP KOKKOS_ARCH=KNC :pre
CPUs and GPUs:
cd lammps/src
make yes-kokkos
make cuda CUDA=yes :pre
make cuda KOKKOS_DEVICES=Cuda :pre
These examples set the KOKKOS-specific KOKKOS_DEVICES and KOKKOS_ARCH variables on the
make command line which requires a GNU-compatible make command. Try
@ -156,7 +159,7 @@ You can also hardwire these make variables in the specified machine
makefile, e.g. src/MAKE/Makefile.g++ in the first two examples above,
with a line like:
MIC = yes :pre
KOKKOS_ARCH = KNC :pre
Note that if you build LAMMPS multiple times in this manner, using
different KOKKOS options (defined in different machine makefiles), you
@ -167,9 +170,9 @@ IMPORTANT NOTE: The 3rd example above for a GPU, uses a different
machine makefile, in this case src/MAKE/Makefile.cuda, which is
included in the LAMMPS distribution. To build the KOKKOS package for
a GPU, this makefile must use the NVIDIA "nvcc" compiler. And it must
have a CCFLAGS -arch setting that is appropriate for your NVIDIA
hardware and installed software. Typical values for -arch are given
in "Section 2.3.4"_Section_start.html#start_3_4 of the manual, as well
have a KOKKOS_ARCH setting that is appropriate for your NVIDIA
hardware and installed software. Typical values for KOKKOS_ARCH are given
below, as well
as other settings that must be included in the machine makefile, if
you create your own.
@ -180,36 +183,32 @@ double precision.
There are other allowed options when building with the KOKKOS package.
As above, they can be set either as variables on the make command line
or in Makefile.machine. This is the full list of options, including
those discussed above. Each takes a value of {yes} or {no}. The
those discussed above. Each takes a value shown below. The
default value is listed, which is set in the
lib/kokkos/Makefile.lammps file.
lib/kokkos/Makefile.kokkos file.
OMP, default = {yes}
CUDA, default = {no}
HWLOC, default = {no}
AVX, default = {no}
MIC, default = {no}
LIBRT, default = {no}
DEBUG, default = {no} :ul
#Default settings specific options
#Options: force_uvm,use_ldg,rdc
OMP sets the parallelization method used for Kokkos code (within
LAMMPS) that runs on the host. OMP=yes means that OpenMP will be
used. OMP=no means that pthreads will be used.
KOKKOS_DEVICES, values = {OpenMP}, {Serial}, {Pthreads}, {Cuda}, default = {OpenMP}
KOKKOS_ARCH, values = {KNC}, {SNB}, {HSW}, {Kepler}, {Kepler30}, {Kepler32}, {Kepler35},
{Kepler37}, {Maxwell}, {Maxwell50}, {Maxwell52}, {Maxwell53}, {ARMv8}, {BGQ}, {Power7}, {Power8},
default = {none}
KOKKOS_DEBUG, values = {yes}, {no}, default = {no}
KOKKOS_USE_TPLS, values = {hwloc}, {librt}, default = {none}
KOKKOS_CUDA_OPTIONS, values = {force_uvm}, {use_ldg}, {rdc} :ul
CUDA sets the parallelization method used for Kokkos code (within
LAMMPS) that runs on the device. CUDA=yes means an NVIDIA GPU running
CUDA will be used. CUDA=no means that the OMP=yes or OMP=no setting
will be used for the device as well as the host.
KOKKOS_DEVICES sets the parallelization method used for Kokkos code (within
LAMMPS). KOKKOS_DEVICES=OpenMP means that OpenMP will be
used. KOKKOS_DEVICES=Pthreads means that pthreads will be used.
KOKKOS_DEVICES=Cuda means an NVIDIA GPU running
CUDA will be used.
If CUDA=yes, then the lo-level Makefile in the src/MAKE directory must
use "nvcc" as its compiler, via its CC setting. For best performance
its CCFLAGS setting should use -O3 and have an -arch setting that
matches the compute capability of your NVIDIA hardware and software
installation, e.g. -arch=sm_20. Generally Fermi Generation GPUs are
sm_20, while Kepler generation GPUs are sm_30 or sm_35 and Maxwell
cards are sm_50. A complete list can be found on
"wikipedia"_http://en.wikipedia.org/wiki/CUDA#Supported_GPUs. You can
also use the deviceQuery tool that comes with the CUDA samples. Note
If KOKKOS_DEVICES=Cuda, then the lo-level Makefile in the src/MAKE
directory must use "nvcc" as its compiler, via its CC setting. For
best performance its CCFLAGS setting should use -O3 and have a
KOKKOS_ARCH setting that matches the compute capability of your NVIDIA
hardware and software installation, e.g. KOKKOS_ARCH=Kepler30. Note
the minimal required compute capability is 2.0, but this will give
significantly reduced performance compared to Kepler generation GPUs
with compute capability 3.x. For the LINK setting, "nvcc" should not
@ -220,28 +219,30 @@ also have a "Compilation rule" for creating *.o files from *.cu files.
See src/Makefile.cuda for an example of a lo-level Makefile with all
of these settings.
HWLOC binds threads to hardware cores, so they do not migrate during a
simulation. HWLOC=yes should always be used if running with OMP=no
for pthreads. It is not necessary for OMP=yes for OpenMP, because
OpenMP provides alternative methods via environment variables for
binding threads to hardware cores. More info on binding threads to
cores is given in "this section"_Section_accelerate.html#acc_8.
KOKKOS_USE_TPLS=hwloc binds threads to hardware cores, so they do not
migrate during a simulation. KOKKOS_USE_TPLS=hwloc should always be
used if running with KOKKOS_DEVICES=Pthreads for pthreads. It is not
necessary for KOKKOS_DEVICES=OpenMP for OpenMP, because OpenMP
provides alternative methods via environment variables for binding
threads to hardware cores. More info on binding threads to cores is
given in "this section"_Section_accelerate.html#acc_8.
AVX enables Intel advanced vector extensions when compiling for an
Intel-compatible chip. AVX=yes should only be set if your host
hardware supports AVX. If it does not support it, this will cause a
run-time crash.
KOKKOS_ARCH=KNC enables compiler switches needed when compiling for an
Intel Phi processor.
MIC enables compiler switches needed when compiling for an Intel Phi
processor.
KOKKOS_USE_TPLS=librt enables use of a more accurate timer mechanism
on most Unix platforms. This library is not available on all
platforms.
LIBRT enables use of a more accurate timer mechanism on most Unix
platforms. This library is not available on all platforms.
KOKKOS_DEBUG is only useful when developing a Kokkos-enabled style
within LAMMPS. KOKKOS_DEBUG=yes enables printing of run-time
debugging information that can be useful. It also enables runtime
bounds checking on Kokkos data structures.
DEBUG is only useful when developing a Kokkos-enabled style within
LAMMPS. DEBUG=yes enables printing of run-time debugging information
that can be useful. It also enables runtime bounds checking on Kokkos
data structures.
KOKKOS_CUDA_OPTIONS are additional options for CUDA.
For more information on Kokkos see the Kokkos programmers' guide here:
/lib/kokkos/doc/Kokkos_PG.pdf.
[Run with the KOKKOS package from the command line:]