Doc tweak
parent 618547b72e
commit 073f003470
@@ -46,7 +46,7 @@ software version 7.5 or later must be installed on your system. See
 the discussion for the "GPU package"_Speed_gpu.html for details of how
 to check and do this.

-NOTE: Kokkos with CUDA currently implicitly assumes, that the MPI
+NOTE: Kokkos with CUDA currently implicitly assumes that the MPI
 library is CUDA-aware and has support for GPU-direct. This is not
 always the case, especially when using pre-compiled MPI libraries
 provided by a Linux distribution. This is not a problem when using
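Whether an MPI library is CUDA-aware can be checked before running. For example, assuming OpenMPI (other MPI libraries provide their own mechanisms), the build configuration can be queried with:

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value :pre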
@@ -207,19 +207,21 @@ supports.

 [Running on GPUs:]

-Use the "-k" "command-line switch"_Run_options.html to
-specify the number of GPUs per node. Typically the -np setting of the
-mpirun command should set the number of MPI tasks/node to be equal to
-the number of physical GPUs on the node. You can assign multiple MPI
-tasks to the same GPU with the KOKKOS package, but this is usually
-only faster if significant portions of the input script have not
-been ported to use Kokkos. Using CUDA MPS is recommended in this
-scenario. Using a CUDA-aware MPI library with support for GPU-direct
-is highly recommended. GPU-direct use can be avoided by using
-"-pk kokkos gpu/direct no"_package.html.
+Use the "-k" "command-line switch"_Run_options.html to specify the
+number of GPUs per node. Typically the -np setting of the mpirun command
+should set the number of MPI tasks/node to be equal to the number of
+physical GPUs on the node. You can assign multiple MPI tasks to the same
+GPU with the KOKKOS package, but this is usually only faster if some
+portions of the input script have not been ported to use Kokkos. In this
+case, also packing/unpacking communication buffers on the host may give
+speedup (see the KOKKOS "package"_package.html command). Using CUDA MPS
+is recommended in this scenario.

-As above for multi-core CPUs (and no GPU), if N is the number of
-physical cores/node, then the number of MPI tasks/node should not
-exceed N.
+Using a CUDA-aware MPI library with
+support for GPU-direct is highly recommended. GPU-direct use can be
+avoided by using "-pk kokkos gpu/direct no"_package.html. As above for
+multi-core CPUs (and no GPU), if N is the number of physical cores/node,
+then the number of MPI tasks/node should not exceed N.

 -k on g Ng :pre
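As an illustration, a sketch of a two-GPU run (the executable name lmp_kokkos_cuda_mpi and the input file in.lj are placeholders that depend on your build and simulation):

mpirun -np 2 lmp_kokkos_cuda_mpi -k on g 2 -sf kk -in in.lj :pre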
@@ -490,10 +490,10 @@ are rebuilt. The data is only for atoms that migrate to new processors.
 "Forward" communication happens every timestep. "Reverse" communication
 happens every timestep if the {newton} option is on. The data is for
 atom coordinates and any other atom properties that need to be updated
 for ghost atoms owned by each processor.

 The {comm} keyword is simply a short-cut to set the same value for both
 the {comm/exchange} and {comm/forward} and {comm/reverse} keywords.

 The value options for all 3 keywords are {no} or {host} or {device}. A
 value of {no} means to use the standard non-KOKKOS method of
@@ -501,26 +501,26 @@ packing/unpacking data for the communication. A value of {host} means to
 use the host, typically a multi-core CPU, and perform the
 packing/unpacking in parallel with threads. A value of {device} means to
 use the device, typically a GPU, to perform the packing/unpacking
 operation.

 The optimal choice for these keywords depends on the input script and
 the hardware used. The {no} value is useful for verifying that the
 Kokkos-based {host} and {device} values are working correctly. It is the
 default when running on CPUs since it is usually the fastest.

 When running on CPUs or Xeon Phi, the {host} and {device} values work
 identically. When using GPUs, the {device} value is the default since it
 will typically be optimal if all of your styles used in your input
 script are supported by the KOKKOS package. In this case data can stay
 on the GPU for many timesteps without being moved between the host and
-GPU, if you use the {device} value. This requires that your MPI is able
-to access GPU memory directly. Currently that is true for OpenMPI 1.8
-(or later versions), Mvapich2 1.9 (or later), and CrayMPI. If your
-script uses styles (e.g. fixes) which are not yet supported by the
-KOKKOS package, then data has to be move between the host and device
-anyway, so it is typically faster to let the host handle communication,
-by using the {host} value. Using {host} instead of {no} will enable use
-of multiple threads to pack/unpack communicated data.
+GPU, if you use the {device} value. If your script uses styles (e.g.
+fixes) which are not yet supported by the KOKKOS package, then data has
+to be moved between the host and device anyway, so it is typically faster
+to let the host handle communication, by using the {host} value. Using
+{host} instead of {no} will enable use of multiple threads to
+pack/unpack communicated data. When running small systems on a GPU,
+performing the exchange pack/unpack on the host CPU can give speedup
+since it reduces the number of CUDA kernel launches.

 The {gpu/direct} keyword chooses whether GPU-direct will be used. When
 this keyword is set to {on}, buffers in GPU memory are passed directly
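For example, a sketch of forcing host packing/unpacking for all three communication phases via the {comm} short-cut on the command line (same placeholder executable and input names as above):

mpirun -np 2 lmp_kokkos_cuda_mpi -k on g 2 -sf kk -pk kokkos comm host -in in.lj :pre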
@@ -533,7 +533,8 @@ the {gpu/direct} keyword is automatically set to {off} by default. When
 the {gpu/direct} keyword is set to {off} while any of the {comm}
 keywords are set to {device}, the value for these {comm} keywords will
 be automatically changed to {host}. This setting has no effect if not
-running on GPUs.
+running on GPUs. GPU-direct is available for OpenMPI 1.8 (or later
+versions), Mvapich2 1.9 (or later), and CrayMPI.

 :line
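Likewise, a sketch of explicitly disabling GPU-direct when the MPI library is not CUDA-aware (placeholder executable and input names):

mpirun -np 4 lmp_kokkos_cuda_mpi -k on g 4 -sf kk -pk kokkos gpu/direct no -in in.lj :pre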