git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12466 f3b2605a-c512-4ea7-a41b-209d697bcdaa
parent
1025e266b1
commit
9d11e531e7
187
doc/package.html
|
@ -68,11 +68,21 @@
|
||||||
<I>tptask</I> value = Ntptask
|
<I>tptask</I> value = Ntptask
|
||||||
Ntptask = max number of threads to use on coprocessor for each MPI task
|
Ntptask = max number of threads to use on coprocessor for each MPI task
|
||||||
<I>kokkos</I> args = keyword value ...
|
<I>kokkos</I> args = keyword value ...
|
||||||
one or more keyword/value pairs may be appended
|
zero or more keyword/value pairs may be appended
|
||||||
keywords = <I>neigh</I> or <I>comm/exchange</I> or <I>comm/forward</I>
|
keywords = <I>neigh</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
|
||||||
<I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
|
<I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
|
||||||
|
full = full neighbor list
|
||||||
|
half/thread = half neighbor list built in thread-safe manner
|
||||||
|
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
|
||||||
|
n2 = non-binning neighbor list build, O(N^2) algorithm
|
||||||
|
full/cluster = full neighbor list with clustered groups of atoms
|
||||||
|
<I>comm</I> value = <I>no</I> or <I>host</I> or <I>device</I>
|
||||||
|
use value for both comm/exchange and comm/forward
|
||||||
<I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
|
<I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
|
||||||
<I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
|
<I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
|
||||||
|
no = perform communication pack/unpack in non-KOKKOS mode
|
||||||
|
host = perform pack/unpack on host (e.g. with OpenMP threading)
|
||||||
|
device = perform pack/unpack on device (e.g. on GPU)
|
||||||
<I>omp</I> args = Nthreads keyword value ...
|
<I>omp</I> args = Nthreads keyword value ...
|
||||||
Nthreads = # of OpenMP threads to associate with each MPI process
|
Nthreads = # of OpenMP threads to associate with each MPI process
|
||||||
zero or more keyword/value pairs may be appended
|
zero or more keyword/value pairs may be appended
|
||||||
|
@ -88,47 +98,59 @@
|
||||||
<PRE>package gpu 1
|
<PRE>package gpu 1
|
||||||
package gpu 1 split 0.75
|
package gpu 1 split 0.75
|
||||||
package gpu 2 split -1.0
|
package gpu 2 split -1.0
|
||||||
package cuda gpu/node/special 2 0 2
|
package cuda 2 gpuID 0 2
|
||||||
package cuda test 3948
|
package cuda 1 test 3948
|
||||||
package kokkos neigh half/thread comm/forward device
|
package kokkos neigh half/thread comm device
|
||||||
package omp 0 neigh yes
|
package omp 0 neigh no
|
||||||
package omp 4
|
package omp 4
|
||||||
package intel * mixed balance -1
|
package intel * mixed balance -1
|
||||||
</PRE>
|
</PRE>
|
||||||
<P><B>Description:</B>
|
<P><B>Description:</B>
|
||||||
</P>
|
</P>
|
||||||
<P>This command invokes package-specific settings. Currently the
|
<P>This command invokes package-specific settings for the various
|
||||||
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
|
accelerator packages available in LAMMPS. Currently the following
|
||||||
USER-OMP.
|
packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
|
||||||
|
KOKKOS, and USER-OMP.
|
||||||
</P>
|
</P>
|
||||||
<P>If allows calling multiple times, all options set to their
|
<P>If this command is specified in an input script, it must be near the
|
||||||
defaults, whether specified or not.
|
top of the script, before the simulation box has been defined. This
|
||||||
|
is because it specifies settings that the accelerator packages use in
|
||||||
|
their initialization, before a simulation is defined.
|
||||||
</P>
|
</P>
|
||||||
<P>Talk about command line switch -pk as alternate option.
|
<P>This command can also be specified from the command-line when
|
||||||
|
launching LAMMPS, using the "-pk" <A HREF = "Section_start.html#start_7">command-line
|
||||||
|
switch</A>. The syntax is exactly the same as
|
||||||
|
when used in an input script.
|
||||||
</P>
|
</P>
|
||||||
<P>Which packages require it to be invoked, only CUDA
|
<P>Note that all of the accelerator packages require the package command
|
||||||
this is b/c can only be invoked once
|
to be specified (except the OPT package), if the package is to be used
|
||||||
vs optional: all others? and allow multiple invokes
|
in a simulation (LAMMPS can be built with an accelerator package
|
||||||
|
without using it in a particular simulation). However, in all cases,
|
||||||
|
a default version of the command is typically invoked by other
|
||||||
|
accelerator settings.
|
||||||
</P>
|
</P>
|
||||||
<P>Must be invoked early in script, before simulation box is defined.
|
<P>The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
|
||||||
|
<A HREF = "Section_start.html#start_7">command-line switch</A> respectively, which
|
||||||
|
invokes a "package cuda" or "package kokkos" command with default
|
||||||
|
settings.
|
||||||
</P>
|
</P>
|
||||||
<P>To use the accelerated GPU and USER-OMP styles, the use of the package
|
<P>For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
|
||||||
command is required. However, as described in the "Defaults" section
|
intel" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line switch</A>
|
||||||
below, if you use the "-sf gpu" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line
|
is used to auto-append accelerator suffixes to various styles in the
|
||||||
options</A> to enable use of these styles,
|
input script, then those switches also invoke a "package gpu",
|
||||||
then default package settings are enabled. In that case you only need
|
"package intel", or "package omp" command with default settings.
|
||||||
to use the package command if you want to change the defaults.
|
|
||||||
</P>
|
</P>
|
||||||
<P>To use the accelerated USER-CUDA and KOKKOS styles, the package
|
<P>IMPORTANT NOTE: A package command for a particular style can be
|
||||||
command is not required as defaults are assigned internally. You only
|
invoked multiple times when a simulation is set up, e.g. by the "-c
|
||||||
need to use the package command if you want to change the defaults.
|
on", "-k on", "-sf", and "-pk" <A HREF = "Section_start.html#start_7">command-line
|
||||||
|
switches</A>, and by using this command in an
|
||||||
|
input script. Each time it is used all of the style options are set,
|
||||||
|
either to default values or to specified settings. I.e. settings from
|
||||||
|
previous invocations do not persist across multiple invocations.
|
||||||
</P>
|
</P>
|
||||||
<P>See <A HREF = "Section_accelerate.html">Section_accelerate</A> of the manual for
|
<P>See the <A HREF = "Section_accelerate.html">Section Accelerate</A> section of the
|
||||||
more details about using these various packages for accelerating
|
manual for more details about using the various accelerator packages
|
||||||
LAMMPS calculations.
|
for speeding up LAMMPS simulations.
|
||||||
</P>
|
|
||||||
<P>Package GPU always sets newton pair off. Not so for USER-CUDA
|
|
||||||
add newton options to GPU, CUDA, KOKKOS.
|
|
||||||
</P>
|
</P>
|
||||||
<HR>
|
<HR>
|
||||||
|
|
||||||
|
@ -335,32 +357,44 @@ generation Xeon Phi chip.
|
||||||
<P>The <I>kokkos</I> style invokes settings associated with the use of the
|
<P>The <I>kokkos</I> style invokes settings associated with the use of the
|
||||||
KOKKOS package.
|
KOKKOS package.
|
||||||
</P>
|
</P>
|
||||||
<P>The <I>neigh</I> keyword determines what kinds of neighbor lists are built.
|
<P>All of the settings are optional keyword/value pairs. Each has a
|
||||||
A value of <I>half</I> uses half-neighbor lists, the same as used by most
|
default value as listed below.
|
||||||
pair styles in LAMMPS. A value of <I>half/thread</I> uses a threadsafe
|
|
||||||
variant of the half-neighbor list. It should be used instead of
|
|
||||||
<I>half</I> when running with threads on a CPU. A value of <I>full</I> uses a
|
|
||||||
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
|
|
||||||
performs twice as much computation as the <I>half</I> option, however that
|
|
||||||
can be a win because it is threadsafe and doesn't require atomic
|
|
||||||
operations. A value of <I>full/cluster</I> is an experimental neighbor
|
|
||||||
style, where particles interact with all particles within a small
|
|
||||||
cluster, if at least one of the clusters particles is within the
|
|
||||||
neighbor cutoff range. This potentially allows for better
|
|
||||||
vectorization on architectures such as the Intel Phi. If also reduces
|
|
||||||
the size of the neighbor list by roughly a factor of the cluster size,
|
|
||||||
thus reducing the total memory footprint considerably.
|
|
||||||
</P>
|
</P>
|
||||||
<P>The <I>comm/exchange</I> and <I>comm/forward</I> keywords determine whether the
|
<P>The <I>neigh</I> keyword determines how neighbor lists are built. A value
|
||||||
host or device performs the packing and unpacking of data when
|
of <I>half</I> uses half-neighbor lists, the same as used by most pair
|
||||||
communicating information between processors. "Exchange"
|
styles in LAMMPS. A value of <I>half/thread</I> uses a thread-safe variant
|
||||||
|
of the half-neighbor list. It should be used instead of <I>half</I> when
|
||||||
|
running with more than 1 thread per MPI task on a CPU. A value of
|
||||||
|
<I>n2</I> uses an O(N^2) algorithm to build the neighbor list without
|
||||||
|
binning, where N = # of atoms on a processor. It is typically slower
|
||||||
|
than the other methods, which use binning.
|
||||||
|
</P>
|
||||||
|
<P>A value of <I>full</I> uses a full neighbor list and is the default. This
|
||||||
|
performs twice as much computation as the <I>half</I> option, however that
|
||||||
|
is often a win because it is thread-safe and doesn't require atomic
|
||||||
|
operations in the calculation of pair forces.
|
||||||
|
</P>
|
||||||
|
<P>A value of <I>full/cluster</I> is an experimental neighbor style, where
|
||||||
|
particles interact with all particles within a small cluster, if at
|
||||||
|
least one of the cluster's particles is within the neighbor cutoff
|
||||||
|
range. This potentially allows for better vectorization on
|
||||||
|
architectures such as the Intel Phi. It also reduces the size of the
|
||||||
|
neighbor list by roughly a factor of the cluster size, thus reducing
|
||||||
|
the total memory footprint considerably.
|
||||||
|
</P>
|
||||||
|
<P>The <I>comm</I> and <I>comm/exchange</I> and <I>comm/forward</I> keywords determine
|
||||||
|
whether the host or device performs the packing and unpacking of data
|
||||||
|
when communicating per-atom data between processors. "Exchange"
|
||||||
communication happens only on timesteps that neighbor lists are
|
communication happens only on timesteps that neighbor lists are
|
||||||
rebuilt. The data is only for atoms that migrate to new processors.
|
rebuilt. The data is only for atoms that migrate to new processors.
|
||||||
"Forward" communication happens every timestep. The data is for atom
|
"Forward" communication happens every timestep. The data is for atom
|
||||||
coordinates and any other atom properties that need to be updated for
|
coordinates and any other atom properties that need to be updated for
|
||||||
ghost atoms owned by each processor.
|
ghost atoms owned by each processor.
|
||||||
</P>
|
</P>
|
||||||
<P>The value options for these keywords are <I>no</I> or <I>host</I> or <I>device</I>.
|
<P>The <I>comm</I> keyword is simply a short-cut to set the same value
|
||||||
|
for both the <I>comm/exchange</I> and <I>comm/forward</I> keywords.
|
||||||
|
</P>
|
||||||
|
<P>The value options for all 3 keywords are <I>no</I> or <I>host</I> or <I>device</I>.
|
||||||
A value of <I>no</I> means to use the standard non-KOKKOS method of
|
A value of <I>no</I> means to use the standard non-KOKKOS method of
|
||||||
packing/unpacking data for the communication. A value of <I>host</I> means
|
packing/unpacking data for the communication. A value of <I>host</I> means
|
||||||
to use the host, typically a multi-core CPU, and perform the
|
to use the host, typically a multi-core CPU, and perform the
|
||||||
|
@ -369,10 +403,12 @@ to use the device, typically a GPU, to perform the packing/unpacking
|
||||||
operation.
|
operation.
|
||||||
</P>
|
</P>
|
||||||
<P>The optimal choice for these keywords depends on the input script and
|
<P>The optimal choice for these keywords depends on the input script and
|
||||||
the hardware used. The <I>no</I> value is useful for verifying that Kokkos
|
the hardware used. The <I>no</I> value is useful for verifying that the
|
||||||
code is working correctly. It may also be the fastest choice when
|
Kokkos-based <I>host</I> and <I>device</I> values are working correctly. It may
|
||||||
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
|
also be the fastest choice when using Kokkos styles in MPI-only mode
|
||||||
When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
|
(i.e. with a thread count of 1).
|
||||||
|
</P>
|
||||||
|
<P>When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
|
||||||
identically. When using GPUs, the <I>device</I> value will typically be
|
identically. When using GPUs, the <I>device</I> value will typically be
|
||||||
optimal if all of your styles used in your input script are supported
|
optimal if all of your styles used in your input script are supported
|
||||||
by the KOKKOS package. In this case data can stay on the GPU for many
|
by the KOKKOS package. In this case data can stay on the GPU for many
|
||||||
|
@ -476,11 +512,13 @@ setting</A>
|
||||||
</P>
|
</P>
|
||||||
<P><B>Default:</B>
|
<P><B>Default:</B>
|
||||||
</P>
|
</P>
|
||||||
<P>To use the USER-CUDA package, the package cuda command must be invoked
|
<P>For the USER-CUDA package, the default is Ngpu = 1 and the option
|
||||||
explicitly in your input script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
|
defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
|
||||||
switch</A>. This will set the # of GPUs/node.
|
enabled, and thread = auto. These settings are made automatically by
|
||||||
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
|
the required "-c on" <A HREF = "Section_start.html#start_7">command-line switch</A>.
|
||||||
test = not enabled, and thread = auto.
|
You can change them by using the package cuda command in your input
|
||||||
|
script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
|
||||||
|
switch</A>.
|
||||||
</P>
|
</P>
|
||||||
<P>For the GPU package, the default is Ngpu = 1 and the option defaults
|
<P>For the GPU package, the default is Ngpu = 1 and the option defaults
|
||||||
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
|
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
|
||||||
|
@ -491,24 +529,21 @@ must invoke the package gpu command in your input script or via the
|
||||||
"-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>.
|
"-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>.
|
||||||
</P>
|
</P>
|
||||||
<P>For the USER-INTEL package, the default is Nphi = 1 and the option
|
<P>For the USER-INTEL package, the default is Nphi = 1 and the option
|
||||||
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The
|
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
|
||||||
default ghost option is determined by the pair style being used. This
|
that all of these settings, except "prec", are ignored if LAMMPS was
|
||||||
value used is output to the screen in the offload report at the end of
|
not built with Xeon Phi coprocessor support. The default ghost option
|
||||||
each run. These settings are made automatically if the "-sf intel"
|
is determined by the pair style being used. This value is output to
|
||||||
<A HREF = "Section_start.html#start_7">command-line switch</A> is used. If it is
|
the screen in the offload report at the end of each run. These
|
||||||
not used, you must invoke the package intel command in your input
|
settings are made automatically if the "-sf intel" <A HREF = "Section_start.html#start_7">command-line
|
||||||
script or or via the "-pk intel" <A HREF = "Section_start.html#start_7">command-line
|
switch</A> is used. If it is not used, you
|
||||||
switch</A>.
|
must invoke the package intel command in your input script or via
|
||||||
|
the "-pk intel" <A HREF = "Section_start.html#start_7">command-line switch</A>.
|
||||||
</P>
|
</P>
|
||||||
<P>The default settings for the KOKKOS package are "package kokkos neigh
|
<P>For the KOKKOS package, the option defaults are neigh = full and comm =
|
||||||
full comm/exchange host comm/forward host". This is the case whether
|
host. These settings are made automatically by the required "-k on"
|
||||||
the "-sf kk" <A HREF = "Section_start.html#start_7">command-line switch</A> is used
|
<A HREF = "Section_start.html#start_7">command-line switch</A>. You can change them
|
||||||
or not.
|
by using the package kokkos command in your input script or via the
|
||||||
To use the KOKKOS package, the package kokkos command must be invoked
|
"-pk kokkos" <A HREF = "Section_start.html#start_7">command-line switch</A>.
|
||||||
explicitly in your input script or via the "-pk kokkos" <A HREF = "Section_start.html#start_7">command-line
|
|
||||||
switch</A>. This will set the # of GPUs/node.
|
|
||||||
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
|
|
||||||
test = not enabled, and thread = auto.
|
|
||||||
</P>
|
</P>
|
||||||
<P>For the OMP package, the default is Nthreads = 0 and the option
|
<P>For the OMP package, the default is Nthreads = 0 and the option
|
||||||
defaults are neigh = yes. These settings are made automatically if
|
defaults are neigh = yes. These settings are made automatically if
|
||||||
|
|
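The difference between the <I>full</I>, <I>half</I>, and <I>n2</I> neighbor list values described in the kokkos hunk above can be illustrated with a small sketch. This is not LAMMPS or Kokkos code; the function name `build_n2_neighbors` and its parameters are hypothetical, and the example only assumes a plain list of 3d coordinates on a single processor. It builds the list without binning by testing every pair of atoms, which is the O(N^2) approach the <I>n2</I> value refers to.

```python
import itertools
import math


def build_n2_neighbors(positions, cutoff, full=True):
    """Build per-atom neighbor lists by testing every pair (no binning).

    Hypothetical illustration only, not the LAMMPS implementation.
    full=True  stores each pair twice: j in list of i, and i in list of j.
    full=False stores each pair once, in the list of the lower index.
    """
    neighbors = [[] for _ in positions]
    # All N*(N-1)/2 unordered pairs are visited, hence O(N^2) cost.
    for i, j in itertools.combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) <= cutoff:
            neighbors[i].append(j)
            if full:
                neighbors[j].append(i)
    return neighbors


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
full_list = build_n2_neighbors(positions, cutoff=1.5, full=True)
half_list = build_n2_neighbors(positions, cutoff=1.5, full=False)
```

In the full list every atom owns all of its own neighbors, so a thread that processes atom i only ever writes results for atom i. In the half list a pair is stored once, so the result for a pair has to be applied to both atoms, which is the reason the text mentions thread safety and atomic operations.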
186
doc/package.txt
|
@ -63,11 +63,21 @@ args = arguments specific to the style :l
|
||||||
{tptask} value = Ntptask
|
{tptask} value = Ntptask
|
||||||
Ntptask = max number of threads to use on coprocessor for each MPI task
|
Ntptask = max number of threads to use on coprocessor for each MPI task
|
||||||
{kokkos} args = keyword value ...
|
{kokkos} args = keyword value ...
|
||||||
one or more keyword/value pairs may be appended
|
zero or more keyword/value pairs may be appended
|
||||||
keywords = {neigh} or {comm/exchange} or {comm/forward}
|
keywords = {neigh} or {comm} or {comm/exchange} or {comm/forward}
|
||||||
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
|
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
|
||||||
|
full = full neighbor list
|
||||||
|
half/thread = half neighbor list built in thread-safe manner
|
||||||
|
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
|
||||||
|
n2 = non-binning neighbor list build, O(N^2) algorithm
|
||||||
|
full/cluster = full neighbor list with clustered groups of atoms
|
||||||
|
{comm} value = {no} or {host} or {device}
|
||||||
|
use value for both comm/exchange and comm/forward
|
||||||
{comm/exchange} value = {no} or {host} or {device}
|
{comm/exchange} value = {no} or {host} or {device}
|
||||||
{comm/forward} value = {no} or {host} or {device}
|
{comm/forward} value = {no} or {host} or {device}
|
||||||
|
no = perform communication pack/unpack in non-KOKKOS mode
|
||||||
|
host = perform pack/unpack on host (e.g. with OpenMP threading)
|
||||||
|
device = perform pack/unpack on device (e.g. on GPU)
|
||||||
{omp} args = Nthreads keyword value ...
|
{omp} args = Nthreads keyword value ...
|
||||||
Nthreads = # of OpenMP threads to associate with each MPI process
|
Nthreads = # of OpenMP threads to associate with each MPI process
|
||||||
zero or more keyword/value pairs may be appended
|
zero or more keyword/value pairs may be appended
|
||||||
|
@ -82,47 +92,59 @@ args = arguments specific to the style :l
|
||||||
package gpu 1
|
package gpu 1
|
||||||
package gpu 1 split 0.75
|
package gpu 1 split 0.75
|
||||||
package gpu 2 split -1.0
|
package gpu 2 split -1.0
|
||||||
package cuda gpu/node/special 2 0 2
|
package cuda 2 gpuID 0 2
|
||||||
package cuda test 3948
|
package cuda 1 test 3948
|
||||||
package kokkos neigh half/thread comm/forward device
|
package kokkos neigh half/thread comm device
|
||||||
package omp 0 neigh yes
|
package omp 0 neigh no
|
||||||
package omp 4
|
package omp 4
|
||||||
package intel * mixed balance -1 :pre
|
package intel * mixed balance -1 :pre
|
||||||
|
|
||||||
[Description:]
|
[Description:]
|
||||||
|
|
||||||
This command invokes package-specific settings. Currently the
|
This command invokes package-specific settings for the various
|
||||||
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
|
accelerator packages available in LAMMPS. Currently the following
|
||||||
USER-OMP.
|
packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
|
||||||
|
KOKKOS, and USER-OMP.
|
||||||
|
|
||||||
If allows calling multiple times, all options set to their
|
If this command is specified in an input script, it must be near the
|
||||||
defaults, whether specified or not.
|
top of the script, before the simulation box has been defined. This
|
||||||
|
is because it specifies settings that the accelerator packages use in
|
||||||
|
their initialization, before a simulation is defined.
|
||||||
|
|
||||||
Talk about command line switch -pk as alternate option.
|
This command can also be specified from the command-line when
|
||||||
|
launching LAMMPS, using the "-pk" "command-line
|
||||||
|
switch"_Section_start.html#start_7. The syntax is exactly the same as
|
||||||
|
when used in an input script.
|
||||||
|
|
||||||
Which packages require it to be invoked, only CUDA
|
Note that all of the accelerator packages require the package command
|
||||||
this is b/c can only be invoked once
|
to be specified (except the OPT package), if the package is to be used
|
||||||
vs optional: all others? and allow multiple invokes
|
in a simulation (LAMMPS can be built with an accelerator package
|
||||||
|
without using it in a particular simulation). However, in all cases,
|
||||||
|
a default version of the command is typically invoked by other
|
||||||
|
accelerator settings.
|
||||||
|
|
||||||
Must be invoked early in script, before simulation box is defined.
|
The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
|
||||||
|
"command-line switch"_Section_start.html#start_7 respectively, which
|
||||||
|
invokes a "package cuda" or "package kokkos" command with default
|
||||||
|
settings.
|
||||||
|
|
||||||
To use the accelerated GPU and USER-OMP styles, the use of the package
|
For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
|
||||||
command is required. However, as described in the "Defaults" section
|
intel" or "-sf omp" "command-line switch"_Section_start.html#start_7
|
||||||
below, if you use the "-sf gpu" or "-sf omp" "command-line
|
is used to auto-append accelerator suffixes to various styles in the
|
||||||
options"_Section_start.html#start_7 to enable use of these styles,
|
input script, then those switches also invoke a "package gpu",
|
||||||
then default package settings are enabled. In that case you only need
|
"package intel", or "package omp" command with default settings.
|
||||||
to use the package command if you want to change the defaults.
|
|
||||||
|
|
||||||
To use the accelerated USER-CUDA and KOKKOS styles, the package
|
IMPORTANT NOTE: A package command for a particular style can be
|
||||||
command is not required as defaults are assigned internally. You only
|
invoked multiple times when a simulation is set up, e.g. by the "-c
|
||||||
need to use the package command if you want to change the defaults.
|
on", "-k on", "-sf", and "-pk" "command-line
|
||||||
|
switches"_Section_start.html#start_7, and by using this command in an
|
||||||
|
input script. Each time it is used all of the style options are set,
|
||||||
|
either to default values or to specified settings. I.e. settings from
|
||||||
|
previous invocations do not persist across multiple invocations.
|
||||||
|
|
||||||
See "Section_accelerate"_Section_accelerate.html of the manual for
|
See the "Section Accelerate"_Section_accelerate.html section of the
|
||||||
more details about using these various packages for accelerating
|
manual for more details about using the various accelerator packages
|
||||||
LAMMPS calculations.
|
for speeding up LAMMPS simulations.
|
||||||
|
|
||||||
Package GPU always sets newton pair off. Not so for USER-CUDA
|
|
||||||
add newton options to GPU, CUDA, KOKKOS.
|
|
||||||
|
|
||||||
:line
|
:line
|
||||||
|
|
||||||
|
@ -329,32 +351,44 @@ generation Xeon Phi chip.
|
||||||
The {kokkos} style invokes settings associated with the use of the
|
The {kokkos} style invokes settings associated with the use of the
|
||||||
KOKKOS package.
|
KOKKOS package.
|
||||||
|
|
||||||
The {neigh} keyword determines what kinds of neighbor lists are built.
|
All of the settings are optional keyword/value pairs. Each has a
|
||||||
A value of {half} uses half-neighbor lists, the same as used by most
|
default value as listed below.
|
||||||
pair styles in LAMMPS. A value of {half/thread} uses a threadsafe
|
|
||||||
variant of the half-neighbor list. It should be used instead of
|
|
||||||
{half} when running with threads on a CPU. A value of {full} uses a
|
|
||||||
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
|
|
||||||
performs twice as much computation as the {half} option, however that
|
|
||||||
can be a win because it is threadsafe and doesn't require atomic
|
|
||||||
operations. A value of {full/cluster} is an experimental neighbor
|
|
||||||
style, where particles interact with all particles within a small
|
|
||||||
cluster, if at least one of the clusters particles is within the
|
|
||||||
neighbor cutoff range. This potentially allows for better
|
|
||||||
vectorization on architectures such as the Intel Phi. If also reduces
|
|
||||||
the size of the neighbor list by roughly a factor of the cluster size,
|
|
||||||
thus reducing the total memory footprint considerably.
|
|
||||||
|
|
||||||
The {comm/exchange} and {comm/forward} keywords determine whether the
|
The {neigh} keyword determines how neighbor lists are built. A value
|
||||||
host or device performs the packing and unpacking of data when
|
of {half} uses half-neighbor lists, the same as used by most pair
|
||||||
communicating information between processors. "Exchange"
|
styles in LAMMPS. A value of {half/thread} uses a thread-safe variant
|
||||||
|
of the half-neighbor list. It should be used instead of {half} when
|
||||||
|
running with more than 1 thread per MPI task on a CPU. A value of
|
||||||
|
{n2} uses an O(N^2) algorithm to build the neighbor list without
|
||||||
|
binning, where N = # of atoms on a processor. It is typically slower
|
||||||
|
than the other methods, which use binning.
|
||||||
|
|
||||||
|
A value of {full} uses a full neighbor list and is the default. This
|
||||||
|
performs twice as much computation as the {half} option, however that
|
||||||
|
is often a win because it is thread-safe and doesn't require atomic
|
||||||
|
operations in the calculation of pair forces.
|
||||||
|
|
||||||
|
A value of {full/cluster} is an experimental neighbor style, where
|
||||||
|
particles interact with all particles within a small cluster, if at
|
||||||
|
least one of the cluster's particles is within the neighbor cutoff
|
||||||
|
range. This potentially allows for better vectorization on
|
||||||
|
architectures such as the Intel Phi. It also reduces the size of the
|
||||||
|
neighbor list by roughly a factor of the cluster size, thus reducing
|
||||||
|
the total memory footprint considerably.
|
||||||
|
|
||||||
|
The {comm} and {comm/exchange} and {comm/forward} keywords determine
|
||||||
|
whether the host or device performs the packing and unpacking of data
|
||||||
|
when communicating per-atom data between processors. "Exchange"
|
||||||
communication happens only on timesteps that neighbor lists are
|
communication happens only on timesteps that neighbor lists are
|
||||||
rebuilt. The data is only for atoms that migrate to new processors.
|
rebuilt. The data is only for atoms that migrate to new processors.
|
||||||
"Forward" communication happens every timestep. The data is for atom
|
"Forward" communication happens every timestep. The data is for atom
|
||||||
coordinates and any other atom properties that need to be updated for
|
coordinates and any other atom properties that need to be updated for
|
||||||
ghost atoms owned by each processor.
|
ghost atoms owned by each processor.
|
||||||
|
|
||||||
The value options for these keywords are {no} or {host} or {device}.
|
The {comm} keyword is simply a short-cut to set the same value
|
||||||
|
for both the {comm/exchange} and {comm/forward} keywords.
|
||||||
|
|
||||||
|
The value options for all 3 keywords are {no} or {host} or {device}.
|
||||||
A value of {no} means to use the standard non-KOKKOS method of
|
A value of {no} means to use the standard non-KOKKOS method of
|
||||||
packing/unpacking data for the communication. A value of {host} means
|
packing/unpacking data for the communication. A value of {host} means
|
||||||
to use the host, typically a multi-core CPU, and perform the
|
to use the host, typically a multi-core CPU, and perform the
|
||||||
|
@ -363,9 +397,11 @@ to use the device, typically a GPU, to perform the packing/unpacking
|
||||||
operation.
|
operation.
|
||||||
|
|
||||||
The optimal choice for these keywords depends on the input script and
|
The optimal choice for these keywords depends on the input script and
|
||||||
the hardware used. The {no} value is useful for verifying that Kokkos
|
the hardware used. The {no} value is useful for verifying that the
|
||||||
code is working correctly. It may also be the fastest choice when
|
Kokkos-based {host} and {device} values are working correctly. It may
|
||||||
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
|
also be the fastest choice when using Kokkos styles in MPI-only mode
|
||||||
|
(i.e. with a thread count of 1).
|
||||||
|
|
||||||
When running on CPUs or Xeon Phi, the {host} and {device} values work
|
When running on CPUs or Xeon Phi, the {host} and {device} values work
|
||||||
identically. When using GPUs, the {device} value will typically be
|
identically. When using GPUs, the {device} value will typically be
|
||||||
optimal if all of your styles used in your input script are supported
|
optimal if all of your styles used in your input script are supported
|
||||||
|
@ -470,11 +506,13 @@ setting"_Section_start.html#start_7
|
||||||
|
|
||||||
[Default:]
|
[Default:]
|
||||||
|
|
||||||
To use the USER-CUDA package, the package cuda command must be invoked
|
For the USER-CUDA package, the default is Ngpu = 1 and the option
|
||||||
explicitly in your input script or via the "-pk cuda" "command-line
|
defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
|
||||||
switch"_Section_start.html#start_7. This will set the # of GPUs/node.
|
enabled, and thread = auto. These settings are made automatically by
|
||||||
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
|
the required "-c on" "command-line switch"_Section_start.html#start_7.
|
||||||
test = not enabled, and thread = auto.
|
You can change them by using the package cuda command in your input
|
||||||
|
script or via the "-pk cuda" "command-line
|
||||||
|
switch"_Section_start.html#start_7.
|
||||||
|
|
||||||
For the GPU package, the default is Ngpu = 1 and the option defaults
|
For the GPU package, the default is Ngpu = 1 and the option defaults
|
||||||
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
|
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
|
||||||
|
@ -485,24 +523,21 @@ must invoke the package gpu command in your input script or via the
|
||||||
"-pk gpu" "command-line switch"_Section_start.html#start_7.
|
"-pk gpu" "command-line switch"_Section_start.html#start_7.
|
||||||
|
|
||||||
For the USER-INTEL package, the default is Nphi = 1 and the option
|
For the USER-INTEL package, the default is Nphi = 1 and the option
|
||||||
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The
|
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
|
||||||
default ghost option is determined by the pair style being used. This
|
that all of these settings, except "prec", are ignored if LAMMPS was
|
||||||
value used is output to the screen in the offload report at the end of
|
not built with Xeon Phi coprocessor support. The default ghost option
|
||||||
each run. These settings are made automatically if the "-sf intel"
|
is determined by the pair style being used. This value is output to
|
||||||
"command-line switch"_Section_start.html#start_7 is used. If it is
|
the screen in the offload report at the end of each run. These
|
||||||
not used, you must invoke the package intel command in your input
|
settings are made automatically if the "-sf intel" "command-line
|
||||||
script or or via the "-pk intel" "command-line
|
switch"_Section_start.html#start_7 is used. If it is not used, you
|
||||||
switch"_Section_start.html#start_7.
|
must invoke the package intel command in your input script or via
|
||||||
|
the "-pk intel" "command-line switch"_Section_start.html#start_7.
|
||||||
|
|
||||||
The default settings for the KOKKOS package are "package kokkos neigh
|
For the KOKKOS package, the option defaults are neigh = full and comm =
|
||||||
full comm/exchange host comm/forward host". This is the case whether
|
host. These settings are made automatically by the required "-k on"
|
||||||
the "-sf kk" "command-line switch"_Section_start.html#start_7 is used
|
"command-line switch"_Section_start.html#start_7. You can change them
|
||||||
or not.
|
by using the package kokkos command in your input script or via the
|
||||||
To use the KOKKOS package, the package kokkos command must be invoked
|
"-pk kokkos" "command-line switch"_Section_start.html#start_7.
|
||||||
explicitly in your input script or via the "-pk kokkos" "command-line
|
|
||||||
switch"_Section_start.html#start_7. This will set the # of GPUs/node.
|
|
||||||
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
|
|
||||||
test = not enabled, and thread = auto.
|
|
||||||
|
|
||||||
For the OMP package, the default is Nthreads = 0 and the option
|
For the OMP package, the default is Nthreads = 0 and the option
|
||||||
defaults are neigh = yes. These settings are made automatically if
|
defaults are neigh = yes. These settings are made automatically if
|
||||||
|
@ -510,4 +545,3 @@ the "-sf omp" "command-line switch"_Section_start.html#start_7 is
|
||||||
used. If it is not used, you must invoke the package omp command in
|
used. If it is not used, you must invoke the package omp command in
|
||||||
your input script or via the "-pk omp" "command-line
|
your input script or via the "-pk omp" "command-line
|
||||||
switch"_Section_start.html#start_7.
|
switch"_Section_start.html#start_7.
|
||||||
|
|
||||||
|
|
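The rules this commit documents for "package kokkos" (zero or more keyword/value pairs, {comm} as a short-cut for both {comm/exchange} and {comm/forward}, defaults of neigh = full and comm = host, and no persistence of settings between invocations) can be summarized in a short sketch. It is a hypothetical model written for illustration, not the parser used by LAMMPS; the function name `resolve_kokkos_args` and the dictionary layout are assumptions.

```python
# Hypothetical model of "package kokkos" keyword handling; not LAMMPS source.
NEIGH_VALUES = {"full", "half/thread", "half", "n2", "full/cluster"}
COMM_VALUES = {"no", "host", "device"}
DEFAULTS = {"neigh": "full", "comm/exchange": "host", "comm/forward": "host"}


def resolve_kokkos_args(args):
    """Return the effective settings for a single invocation."""
    # Each invocation starts again from the defaults, so nothing set by an
    # earlier invocation carries over.
    settings = dict(DEFAULTS)
    if len(args) % 2 != 0:
        raise ValueError("expected keyword/value pairs")
    for keyword, value in zip(args[0::2], args[1::2]):
        if keyword == "neigh":
            if value not in NEIGH_VALUES:
                raise ValueError(f"invalid neigh value: {value}")
            settings["neigh"] = value
        elif keyword in ("comm", "comm/exchange", "comm/forward"):
            if value not in COMM_VALUES:
                raise ValueError(f"invalid {keyword} value: {value}")
            # "comm" is the short-cut that sets both communication keywords.
            if keyword == "comm":
                targets = ("comm/exchange", "comm/forward")
            else:
                targets = (keyword,)
            for target in targets:
                settings[target] = value
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return settings


# Arguments taken from the example "package kokkos neigh half/thread comm device".
first = resolve_kokkos_args("neigh half/thread comm device".split())
# A later invocation that only sets comm/forward; neigh falls back to its default.
second = resolve_kokkos_args("comm/forward no".split())
```

The second call shows the point of the IMPORTANT NOTE in the description: because every invocation resets all options, a later command that omits {neigh} ends up with neigh = full again rather than the half/thread value chosen earlier.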