git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12466 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2014-09-10 16:25:52 +00:00
parent 1025e266b1
commit 9d11e531e7
2 changed files with 221 additions and 152 deletions


@ -68,11 +68,21 @@
<I>tptask</I> value = Ntptask
Ntptask = max number of threads to use on coprocessor for each MPI task
<I>kokkos</I> args = keyword value ...
one or more keyword/value pairs may be appended
keywords = <I>neigh</I> or <I>comm/exchange</I> or <I>comm/forward</I>
zero or more keyword/value pairs may be appended
keywords = <I>neigh</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
<I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
full = full neighbor list
half/thread = half neighbor list built in thread-safe manner
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
n2 = non-binning neighbor list build, O(N^2) algorithm
full/cluster = full neighbor list with clustered groups of atoms
<I>comm</I> value = <I>no</I> or <I>host</I> or <I>device</I>
use value for both comm/exchange and comm/forward
<I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
<I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
<I>omp</I> args = Nthreads keyword value ...
Nthreads = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended
@ -88,47 +98,59 @@
<PRE>package gpu 1
package gpu 1 split 0.75
package gpu 2 split -1.0
package cuda gpu/node/special 2 0 2
package cuda test 3948
package kokkos neigh half/thread comm/forward device
package omp 0 neigh yes
package cuda 2 gpuID 0 2
package cuda 1 test 3948
package kokkos neigh half/thread comm device
package omp 0 neigh no
package omp 4
package intel * mixed balance -1
</PRE>
<P><B>Description:</B>
</P>
<P>This command invokes package-specific settings. Currently the
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
USER-OMP.
<P>This command invokes package-specific settings for the various
accelerator packages available in LAMMPS. Currently the following
packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
KOKKOS, and USER-OMP.
</P>
<P>If allows calling multiple times, all options set to their
defaults, whether specified or not.
<P>If this command is specified in an input script, it must be near the
top of the script, before the simulation box has been defined. This
is because it specifies settings that the accelerator packages use in
their initialization, before a simulation is defined.
</P>
<P>Talk about command line switch -pk as alternate option.
<P>This command can also be specified from the command-line when
launching LAMMPS, using the "-pk" <A HREF = "Section_start.html#start_7">command-line
switch</A>. The syntax is exactly the same as
when used in an input script.
</P>
<P>Which packages require it to be invoked, only CUDA
this is b/c can only be invoked once
vs optional: all others? and allow multiple invokes
<P>Note that all of the accelerator packages require the package command
to be specified (except the OPT package), if the package is to be used
in a simulation (LAMMPS can be built with an accelerator package
without using it in a particular simulation). However, in all cases,
a default version of the command is typically invoked by other
accelerator settings.
</P>
<P>Must be invoked early in script, before simulation box is defined.
<P>The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
<A HREF = "Section_start.html#start_7">command-line switch</A> respectively, which
invokes a "package cuda" or "package kokkos" command with default
settings.
</P>
<P>To use the accelerated GPU and USER-OMP styles, the use of the package
command is required. However, as described in the "Defaults" section
below, if you use the "-sf gpu" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line
options</A> to enable use of these styles,
then default package settings are enabled. In that case you only need
to use the package command if you want to change the defaults.
<P>For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
intel" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line switch</A>
is used to auto-append accelerator suffixes to various styles in the
input script, then those switches also invoke a "package gpu",
"package intel", or "package omp" command with default settings.
</P>
<P>To use the accelerated USER-CUDA and KOKKOS styles, the package
command is not required as defaults are assigned internally. You only
need to use the package command if you want to change the defaults.
<P>IMPORTANT NOTE: A package command for a particular style can be
invoked multiple times when a simulation is set up, e.g. by the "-c
on", "-k on", "-sf", and "-pk" <A HREF = "Section_start.html#start_7">command-line
switches</A>, and by using this command in an
input script. Each time it is used all of the style options are set,
either to default values or to specified settings. I.e. settings from
previous invocations do not persist across multiple invocations.
</P>
<P>See <A HREF = "Section_accelerate.html">Section_accelerate</A> of the manual for
more details about using these various packages for accelerating
LAMMPS calculations.
</P>
<P>Package GPU always sets newton pair off. Not so for USER-CUDA
add newton options to GPU, CUDA, KOKKOS.
<P>See the <A HREF = "Section_accelerate.html">Section Accelerate</A> section of the
manual for more details about using the various accelerator packages
for speeding up LAMMPS simulations.
</P>
<HR>
@ -335,32 +357,44 @@ generation Xeon Phi chip.
<P>The <I>kokkos</I> style invokes settings associated with the use of the
KOKKOS package.
</P>
<P>The <I>neigh</I> keyword determines what kinds of neighbor lists are built.
A value of <I>half</I> uses half-neighbor lists, the same as used by most
pair styles in LAMMPS. A value of <I>half/thread</I> uses a threadsafe
variant of the half-neighbor list. It should be used instead of
<I>half</I> when running with threads on a CPU. A value of <I>full</I> uses a
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
performs twice as much computation as the <I>half</I> option, however that
can be a win because it is threadsafe and doesn't require atomic
operations. A value of <I>full/cluster</I> is an experimental neighbor
style, where particles interact with all particles within a small
cluster, if at least one of the clusters particles is within the
neighbor cutoff range. This potentially allows for better
vectorization on architectures such as the Intel Phi. If also reduces
the size of the neighbor list by roughly a factor of the cluster size,
thus reducing the total memory footprint considerably.
<P>All of the settings are optional keyword/value pairs. Each has a
default value as listed below.
</P>
<P>The <I>comm/exchange</I> and <I>comm/forward</I> keywords determine whether the
host or device performs the packing and unpacking of data when
communicating information between processors. "Exchange"
<P>The <I>neigh</I> keyword determines how neighbor lists are built. A value
of <I>half</I> uses half-neighbor lists, the same as used by most pair
styles in LAMMPS. A value of <I>half/thread</I> uses a thread-safe variant
of the half-neighbor list. It should be used instead of <I>half</I> when
running with more than 1 thread per MPI task on a CPU. A value of
<I>n2</I> uses an O(N^2) algorithm to build the neighbor list without
binning, where N = # of atoms on a processor. It is typically slower
than the other methods, which use binning.
</P>
<P>A value of <I>full</I> uses a full neighbor list and is the default. This
performs twice as much computation as the <I>half</I> option, however that
is often a win because it is thread-safe and doesn't require atomic
operations in the calculation of pair forces.
</P>
<P>A value of <I>full/cluster</I> is an experimental neighbor style, where
particles interact with all particles within a small cluster, if at
least one of the cluster's particles is within the neighbor cutoff
range. This potentially allows for better vectorization on
architectures such as the Intel Phi. It also reduces the size of the
neighbor list by roughly a factor of the cluster size, thus reducing
the total memory footprint considerably.
</P>
<P>The <I>comm</I> and <I>comm/exchange</I> and <I>comm/forward</I> keywords determine
whether the host or device performs the packing and unpacking of data
when communicating per-atom data between processors. "Exchange"
communication happens only on timesteps that neighbor lists are
rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. The data is for atom
coordinates and any other atom properties that need to be updated for
ghost atoms owned by each processor.
</P>
<P>The value options for these keywords are <I>no</I> or <I>host</I> or <I>device</I>.
<P>The <I>comm</I> keyword is simply a short-cut to set the same value
for both the <I>comm/exchange</I> and <I>comm/forward</I> keywords.
</P>
<P>The value options for all 3 keywords are <I>no</I> or <I>host</I> or <I>device</I>.
A value of <I>no</I> means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of <I>host</I> means
to use the host, typically a multi-core CPU, and perform the
@ -369,10 +403,12 @@ to use the device, typically a GPU, to perform the packing/unpacking
operation.
</P>
<P>The optimal choice for these keywords depends on the input script and
the hardware used. The <I>no</I> value is useful for verifying that Kokkos
code is working correctly. It may also be the fastest choice when
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
the hardware used. The <I>no</I> value is useful for verifying that the
Kokkos-based <I>host</I> and <I>device</I> values are working correctly. It may
also be the fastest choice when using Kokkos styles in MPI-only mode
(i.e. with a thread count of 1).
</P>
<P>When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
identically. When using GPUs, the <I>device</I> value will typically be
optimal if all of your styles used in your input script are supported
by the KOKKOS package. In this case data can stay on the GPU for many
@ -476,11 +512,13 @@ setting</A>
</P>
<P><B>Default:</B>
</P>
<P>To use the USER-CUDA package, the package cuda command must be invoked
explicitly in your input script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
switch</A>. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
<P>For the USER-CUDA package, the default is Ngpu = 1 and the option
defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
enabled, and thread = auto. These settings are made automatically by
the required "-c on" <A HREF = "Section_start.html#start_7">command-line switch</A>.
You can change them by using the package cuda command in your input
script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
switch</A>.
</P>
<P>For the GPU package, the default is Ngpu = 1 and the option defaults
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -491,24 +529,21 @@ must invoke the package gpu command in your input script or via the
"-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>.
</P>
<P>For the USER-INTEL package, the default is Nphi = 1 and the option
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The
default ghost option is determined by the pair style being used. This
value used is output to the screen in the offload report at the end of
each run. These settings are made automatically if the "-sf intel"
<A HREF = "Section_start.html#start_7">command-line switch</A> is used. If it is
not used, you must invoke the package intel command in your input
script or or via the "-pk intel" <A HREF = "Section_start.html#start_7">command-line
switch</A>.
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
that all of these settings, except "prec", are ignored if LAMMPS was
not built with Xeon Phi coprocessor support. The default ghost option
is determined by the pair style being used. This value is output to
the screen in the offload report at the end of each run. These
settings are made automatically if the "-sf intel" <A HREF = "Section_start.html#start_7">command-line
switch</A> is used. If it is not used, you
must invoke the package intel command in your input script or via
the "-pk intel" <A HREF = "Section_start.html#start_7">command-line switch</A>.
</P>
<P>The default settings for the KOKKOS package are "package kokkos neigh
full comm/exchange host comm/forward host". This is the case whether
the "-sf kk" <A HREF = "Section_start.html#start_7">command-line switch</A> is used
or not.
To use the KOKKOS package, the package kokkos command must be invoked
explicitly in your input script or via the "-pk kokkos" <A HREF = "Section_start.html#start_7">command-line
switch</A>. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
<P>For the KOKKOS package, the option defaults are neigh = full and comm =
host. These settings are made automatically by the required "-k on"
<A HREF = "Section_start.html#start_7">command-line switch</A>. You can change them
by using the package kokkos command in your input script or via the
"-pk kokkos" <A HREF = "Section_start.html#start_7">command-line switch</A>.
</P>
<P>For the OMP package, the default is Nthreads = 0 and the option
defaults are neigh = yes. These settings are made automatically if


@ -63,11 +63,21 @@ args = arguments specific to the style :l
{tptask} value = Ntptask
Ntptask = max number of threads to use on coprocessor for each MPI task
{kokkos} args = keyword value ...
one or more keyword/value pairs may be appended
keywords = {neigh} or {comm/exchange} or {comm/forward}
zero or more keyword/value pairs may be appended
keywords = {neigh} or {comm} or {comm/exchange} or {comm/forward}
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
full = full neighbor list
half/thread = half neighbor list built in thread-safe manner
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
n2 = non-binning neighbor list build, O(N^2) algorithm
full/cluster = full neighbor list with clustered groups of atoms
{comm} value = {no} or {host} or {device}
use value for both comm/exchange and comm/forward
{comm/exchange} value = {no} or {host} or {device}
{comm/forward} value = {no} or {host} or {device}
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
{omp} args = Nthreads keyword value ...
Nthreads = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended
@ -82,47 +92,59 @@ args = arguments specific to the style :l
package gpu 1
package gpu 1 split 0.75
package gpu 2 split -1.0
package cuda gpu/node/special 2 0 2
package cuda test 3948
package kokkos neigh half/thread comm/forward device
package omp 0 neigh yes
package cuda 2 gpuID 0 2
package cuda 1 test 3948
package kokkos neigh half/thread comm device
package omp 0 neigh no
package omp 4
package intel * mixed balance -1 :pre
[Description:]
This command invokes package-specific settings. Currently the
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
USER-OMP.
This command invokes package-specific settings for the various
accelerator packages available in LAMMPS. Currently the following
packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
KOKKOS, and USER-OMP.
If allows calling multiple times, all options set to their
defaults, whether specified or not.
If this command is specified in an input script, it must be near the
top of the script, before the simulation box has been defined. This
is because it specifies settings that the accelerator packages use in
their initialization, before a simulation is defined.
Talk about command line switch -pk as alternate option.
This command can also be specified from the command-line when
launching LAMMPS, using the "-pk" "command-line
switch"_Section_start.html#start_7. The syntax is exactly the same as
when used in an input script.
Which packages require it to be invoked, only CUDA
this is b/c can only be invoked once
vs optional: all others? and allow multiple invokes
Note that all of the accelerator packages require the package command
to be specified (except the OPT package), if the package is to be used
in a simulation (LAMMPS can be built with an accelerator package
without using it in a particular simulation). However, in all cases,
a default version of the command is typically invoked by other
accelerator settings.
Must be invoked early in script, before simulation box is defined.
The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
"command-line switch"_Section_start.html#start_7 respectively, which
invokes a "package cuda" or "package kokkos" command with default
settings.
To use the accelerated GPU and USER-OMP styles, the use of the package
command is required. However, as described in the "Defaults" section
below, if you use the "-sf gpu" or "-sf omp" "command-line
options"_Section_start.html#start_7 to enable use of these styles,
then default package settings are enabled. In that case you only need
to use the package command if you want to change the defaults.
For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
intel" or "-sf omp" "command-line switch"_Section_start.html#start_7
is used to auto-append accelerator suffixes to various styles in the
input script, then those switches also invoke a "package gpu",
"package intel", or "package omp" command with default settings.
To use the accelerated USER-CUDA and KOKKOS styles, the package
command is not required as defaults are assigned internally. You only
need to use the package command if you want to change the defaults.
IMPORTANT NOTE: A package command for a particular style can be
invoked multiple times when a simulation is set up, e.g. by the "-c
on", "-k on", "-sf", and "-pk" "command-line
switches"_Section_start.html#start_7, and by using this command in an
input script. Each time it is used all of the style options are set,
either to default values or to specified settings. I.e. settings from
previous invocations do not persist across multiple invocations.
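One way to picture the note above is a function that rebuilds the whole option set from defaults on every call. The Python sketch below is only an illustrative model, not LAMMPS source code: the function name package_kokkos and the dictionary layout are hypothetical, and the default values are the KOKKOS defaults (neigh = full, comm = host) given in the Default section.

```python
# Illustrative model only; this is not how LAMMPS is implemented.
# Defaults assumed from the KOKKOS entry in the Default section.
KOKKOS_DEFAULTS = {
    "neigh": "full",
    "comm/exchange": "host",
    "comm/forward": "host",
}


def package_kokkos(args):
    """Return the settings in effect after one 'package kokkos' invocation.

    'args' is the list of keyword/value tokens that follow 'package kokkos'.
    Nothing is carried over from earlier calls: each call starts from defaults.
    """
    if len(args) % 2 != 0:
        raise ValueError("keywords and values must come in pairs")
    settings = dict(KOKKOS_DEFAULTS)
    for keyword, value in zip(args[0::2], args[1::2]):
        if keyword not in settings:
            raise ValueError("unknown keyword: " + keyword)
        settings[keyword] = value
    return settings


# First invocation, e.g. via the "-pk kokkos" command-line switch.
first = package_kokkos(["neigh", "half/thread"])
# Second invocation, e.g. from the input script.
second = package_kokkos(["comm/forward", "device"])

print(first["neigh"])   # half/thread
print(second["neigh"])  # full, because the earlier setting did not persist
```

In this model, keeping both changes requires giving them together in the last invocation, e.g. "neigh half/thread comm/forward device".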
See "Section_accelerate"_Section_accelerate.html of the manual for
more details about using these various packages for accelerating
LAMMPS calculations.
Package GPU always sets newton pair off. Not so for USER-CUDA
add newton options to GPU, CUDA, KOKKOS.
See the "Section Accelerate"_Section_accelerate.html section of the
manual for more details about using the various accelerator packages
for speeding up LAMMPS simulations.
:line
@ -329,32 +351,44 @@ generation Xeon Phi chip.
The {kokkos} style invokes settings associated with the use of the
KOKKOS package.
The {neigh} keyword determines what kinds of neighbor lists are built.
A value of {half} uses half-neighbor lists, the same as used by most
pair styles in LAMMPS. A value of {half/thread} uses a threadsafe
variant of the half-neighbor list. It should be used instead of
{half} when running with threads on a CPU. A value of {full} uses a
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
performs twice as much computation as the {half} option, however that
can be a win because it is threadsafe and doesn't require atomic
operations. A value of {full/cluster} is an experimental neighbor
style, where particles interact with all particles within a small
cluster, if at least one of the clusters particles is within the
neighbor cutoff range. This potentially allows for better
vectorization on architectures such as the Intel Phi. If also reduces
the size of the neighbor list by roughly a factor of the cluster size,
thus reducing the total memory footprint considerably.
All of the settings are optional keyword/value pairs. Each has a
default value as listed below.
The {comm/exchange} and {comm/forward} keywords determine whether the
host or device performs the packing and unpacking of data when
communicating information between processors. "Exchange"
The {neigh} keyword determines how neighbor lists are built. A value
of {half} uses half-neighbor lists, the same as used by most pair
styles in LAMMPS. A value of {half/thread} uses a thread-safe variant
of the half-neighbor list. It should be used instead of {half} when
running with more than 1 thread per MPI task on a CPU. A value of
{n2} uses an O(N^2) algorithm to build the neighbor list without
binning, where N = # of atoms on a processor. It is typically slower
than the other methods, which use binning.
A value of {full} uses a full neighbor list and is the default. This
performs twice as much computation as the {half} option, however that
is often a win because it is thread-safe and doesn't require atomic
operations in the calculation of pair forces.
A value of {full/cluster} is an experimental neighbor style, where
particles interact with all particles within a small cluster, if at
least one of the cluster's particles is within the neighbor cutoff
range. This potentially allows for better vectorization on
architectures such as the Intel Phi. It also reduces the size of the
neighbor list by roughly a factor of the cluster size, thus reducing
the total memory footprint considerably.
The {comm} and {comm/exchange} and {comm/forward} keywords determine
whether the host or device performs the packing and unpacking of data
when communicating per-atom data between processors. "Exchange"
communication happens only on timesteps that neighbor lists are
rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. The data is for atom
coordinates and any other atom properties that need to be updated for
ghost atoms owned by each processor.
The value options for these keywords are {no} or {host} or {device}.
The {comm} keyword is simply a short-cut to set the same value
for both the {comm/exchange} and {comm/forward} keywords.
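The short-cut can be thought of as a simple rewrite of the argument list. The following Python sketch is a hypothetical illustration of that rewrite, not the parser LAMMPS actually uses; the helper name expand_comm is made up for this example.

```python
# Illustrative model only; the helper name and structure are hypothetical.
ALLOWED_COMM_VALUES = ("no", "host", "device")


def expand_comm(args):
    """Rewrite each 'comm VALUE' pair as 'comm/exchange VALUE comm/forward VALUE'.

    All other keyword/value pairs are passed through unchanged.
    """
    if len(args) % 2 != 0:
        raise ValueError("keywords and values must come in pairs")
    expanded = []
    for keyword, value in zip(args[0::2], args[1::2]):
        if keyword == "comm":
            if value not in ALLOWED_COMM_VALUES:
                raise ValueError("comm value must be no, host, or device")
            expanded += ["comm/exchange", value, "comm/forward", value]
        else:
            expanded += [keyword, value]
    return expanded


tokens = expand_comm(["neigh", "half/thread", "comm", "device"])
print(" ".join(tokens))
# neigh half/thread comm/exchange device comm/forward device
```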
The value options for all 3 keywords are {no} or {host} or {device}.
A value of {no} means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of {host} means
to use the host, typically a multi-core CPU, and perform the
@ -363,9 +397,11 @@ to use the device, typically a GPU, to perform the packing/unpacking
operation.
The optimal choice for these keywords depends on the input script and
the hardware used. The {no} value is useful for verifying that Kokkos
code is working correctly. It may also be the fastest choice when
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
the hardware used. The {no} value is useful for verifying that the
Kokkos-based {host} and {device} values are working correctly. It may
also be the fastest choice when using Kokkos styles in MPI-only mode
(i.e. with a thread count of 1).
When running on CPUs or Xeon Phi, the {host} and {device} values work
identically. When using GPUs, the {device} value will typically be
optimal if all of your styles used in your input script are supported
@ -470,11 +506,13 @@ setting"_Section_start.html#start_7
[Default:]
To use the USER-CUDA package, the package cuda command must be invoked
explicitly in your input script or via the "-pk cuda" "command-line
switch"_Section_start.html#start_7. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
For the USER-CUDA package, the default is Ngpu = 1 and the option
defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
enabled, and thread = auto. These settings are made automatically by
the required "-c on" "command-line switch"_Section_start.html#start_7.
You can change them by using the package cuda command in your input
script or via the "-pk cuda" "command-line
switch"_Section_start.html#start_7.
For the GPU package, the default is Ngpu = 1 and the option defaults
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -485,24 +523,21 @@ must invoke the package gpu command in your input script or via the
"-pk gpu" "command-line switch"_Section_start.html#start_7.
For the USER-INTEL package, the default is Nphi = 1 and the option
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The
default ghost option is determined by the pair style being used. This
value used is output to the screen in the offload report at the end of
each run. These settings are made automatically if the "-sf intel"
"command-line switch"_Section_start.html#start_7 is used. If it is
not used, you must invoke the package intel command in your input
script or or via the "-pk intel" "command-line
switch"_Section_start.html#start_7.
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
that all of these settings, except "prec", are ignored if LAMMPS was
not built with Xeon Phi coprocessor support. The default ghost option
is determined by the pair style being used. This value is output to
the screen in the offload report at the end of each run. These
settings are made automatically if the "-sf intel" "command-line
switch"_Section_start.html#start_7 is used. If it is not used, you
must invoke the package intel command in your input script or via
the "-pk intel" "command-line switch"_Section_start.html#start_7.
The default settings for the KOKKOS package are "package kokkos neigh
full comm/exchange host comm/forward host". This is the case whether
the "-sf kk" "command-line switch"_Section_start.html#start_7 is used
or not.
To use the KOKKOS package, the package kokkos command must be invoked
explicitly in your input script or via the "-pk kokkos" "command-line
switch"_Section_start.html#start_7. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
For the KOKKOS package, the option defaults are neigh = full and comm =
host. These settings are made automatically by the required "-k on"
"command-line switch"_Section_start.html#start_7. You can change them
by using the package kokkos command in your input script or via the
"-pk kokkos" "command-line switch"_Section_start.html#start_7.
For the OMP package, the default is Nthreads = 0 and the option
defaults are neigh = yes. These settings are made automatically if
@ -510,4 +545,3 @@ the "-sf omp" "command-line switch"_Section_start.html#start_7 is
used. If it is not used, you must invoke the package omp command in
your input script or via the "-pk omp" "command-line
switch"_Section_start.html#start_7.