git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12466 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2014-09-10 16:25:52 +00:00
parent 1025e266b1
commit 9d11e531e7
2 changed files with 221 additions and 152 deletions

@ -68,11 +68,21 @@
<I>tptask</I> value = Ntptask <I>tptask</I> value = Ntptask
Ntptask = max number of threads to use on coprocessor for each MPI task Ntptask = max number of threads to use on coprocessor for each MPI task
<I>kokkos</I> args = keyword value ... <I>kokkos</I> args = keyword value ...
one or more keyword/value pairs may be appended zero or more keyword/value pairs may be appended
keywords = <I>neigh</I> or <I>comm/exchange</I> or <I>comm/forward</I> keywords = <I>neigh</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
<I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I> <I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
full = full neighbor list
half/thread = half neighbor list built in thread-safe manner
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
n2 = non-binning neighbor list build, O(N^2) algorithm
full/cluster = full neighbor list with clustered groups of atoms
<I>comm</I> value = <I>no</I> or <I>host</I> or <I>device</I>
use value for both comm/exchange and comm/forward
<I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I> <I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
<I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I> <I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
<I>omp</I> args = Nthreads keyword value ... <I>omp</I> args = Nthreads keyword value ...
Nthread = # of OpenMP threads to associate with each MPI process Nthread = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended zero or more keyword/value pairs may be appended
@ -88,47 +98,59 @@
<PRE>package gpu 1 <PRE>package gpu 1
package gpu 1 split 0.75 package gpu 1 split 0.75
package gpu 2 split -1.0 package gpu 2 split -1.0
package cuda gpu/node/special 2 0 2 package cuda 2 gpuID 0 2
package cuda test 3948 package cuda 1 test 3948
package kokkos neigh half/thread comm/forward device package kokkos neigh half/thread comm device
package omp 0 neigh yes package omp 0 neigh no
package omp 4 package omp 4
package intel * mixed balance -1 package intel * mixed balance -1
</PRE> </PRE>
<P><B>Description:</B> <P><B>Description:</B>
</P> </P>
<P>This command invokes package-specific settings. Currently the <P>This command invokes package-specific settings for the various
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and accelerator packages available in LAMMPS. Currently the following
USER-OMP. packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
KOKKOS, and USER-OMP.
</P> </P>
<P>If allows calling multiple times, all options set to their <P>If this command is specified in an input script, it must be near the
defaults, whether specified or not. top of the script, before the simulation box has been defined. This
is because it specifies settings that the accelerator packages use in
their initialization, before a simulation is defined.
</P> </P>
<P>Talk about command line switch -pk as alternate option. <P>This command can also be specified from the command-line when
launching LAMMPS, using the "-pk" <A HREF = "Section_start.html#start_7">command-line
switch</A>. The syntax is exactly the same as
when used in an input script.
</P> </P>
<P>Which packages require it to be invoked, only CUDA <P>Note that all of the accelerator packages require the package command
this is b/c can only be invoked once to be specified (except the OPT package), if the package is to be used
vs optional: all others? and allow multiple invokes in a simulation (LAMMPS can be built with an accelerator package
without using it in a particular simulation). However, in all cases,
a default version of the command is typically invoked by other
accelerator settings.
</P> </P>
<P>Must be invoked early in script, before simulation box is defined. <P>The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
<A HREF = "Section_start.html#start_7">command-line switch</A> respectively, which
invokes a "package cuda" or "package kokkos" command with default
settings.
</P> </P>
<P>To use the accelerated GPU and USER-OMP styles, the use of the package <P>For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
command is required. However, as described in the "Defaults" section intel" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line switch</A>
below, if you use the "-sf gpu" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line is used to auto-append accelerator suffixes to various styles in the
options</A> to enable use of these styles, input script, then those switches also invoke a "package gpu",
then default package settings are enabled. In that case you only need "package intel", or "package omp" command with default settings.
to use the package command if you want to change the defaults.
</P> </P>
<P>To use the accelerated USER-CUDA and KOKKOS styles, the package <P>IMPORTANT NOTE: A package command for a particular style can be
command is not required as defaults are assigned internally. You only invoked multiple times when a simulation is set up, e.g. by the "-c
need to use the package command if you want to change the defaults. on", "-k on", "-sf", and "-pk" <A HREF = "Section_start.html#start_7">command-line
switches</A>, and by using this command in an
input script. Each time it is used all of the style options are set,
either to default values or to specified settings. I.e. settings from
previous invocations do not persist across multiple invocations.
</P> </P>
<P>See <A HREF = "Section_accelerate.html">Section_accelerate</A> of the manual for <P>See the <A HREF = "Section_accelerate.html">Section Accelerate</A> section of the
more details about using these various packages for accelerating manual for more details about using the various accelerator packages
LAMMPS calculations. for speeding up LAMMPS simulations.
</P>
<P>Package GPU always sets newton pair off. Not so for USER-CUDA
add newton options to GPU, CUDA, KOKKOS.
</P> </P>
<HR> <HR>
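The IMPORTANT NOTE above says that every invocation of a package command sets all options, either to defaults or to the specified values, so nothing carries over from an earlier invocation. A minimal Python sketch of that rule follows. It is a model for illustration only, not LAMMPS code: the function name `invoke_package` is hypothetical, and the dictionary holds just a subset of the GPU package defaults listed in the Default section.

```python
# Hypothetical model of the "settings do not persist" rule; not LAMMPS code.
# Subset of the GPU package option defaults quoted in the Default section.
GPU_DEFAULTS = {"neigh": "yes", "split": 1.0, "tpa": 1}


def invoke_package(defaults, specified):
    """Return the settings in effect after a single invocation.

    Every option starts from its default, and only the options named in
    this invocation are overridden. Earlier invocations are never consulted.
    """
    settings = dict(defaults)   # start from defaults every time
    settings.update(specified)  # apply only what this invocation names
    return settings


# First invocation, e.g. "package gpu 1 split 0.75"
first = invoke_package(GPU_DEFAULTS, {"split": 0.75})

# Second invocation, e.g. "package gpu 1 neigh no": split is NOT remembered
second = invoke_package(GPU_DEFAULTS, {"neigh": "no"})
```

In this model `second["split"]` is back at its default of 1.0, which is why any option that should stay changed has to be repeated in the last invocation.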
@ -335,32 +357,44 @@ generation Xeon Phi chip.
<P>The <I>kokkos</I> style invokes settings associated with the use of the <P>The <I>kokkos</I> style invokes settings associated with the use of the
KOKKOS package. KOKKOS package.
</P> </P>
<P>The <I>neigh</I> keyword determines what kinds of neighbor lists are built. <P>All of the settings are optional keyword/value pairs. Each has a
A value of <I>half</I> uses half-neighbor lists, the same as used by most default value as listed below.
pair styles in LAMMPS. A value of <I>half/thread</I> uses a threadsafe
variant of the half-neighbor list. It should be used instead of
<I>half</I> when running with threads on a CPU. A value of <I>full</I> uses a
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
performs twice as much computation as the <I>half</I> option, however that
can be a win because it is threadsafe and doesn't require atomic
operations. A value of <I>full/cluster</I> is an experimental neighbor
style, where particles interact with all particles within a small
cluster, if at least one of the clusters particles is within the
neighbor cutoff range. This potentially allows for better
vectorization on architectures such as the Intel Phi. If also reduces
the size of the neighbor list by roughly a factor of the cluster size,
thus reducing the total memory footprint considerably.
</P> </P>
<P>The <I>comm/exchange</I> and <I>comm/forward</I> keywords determine whether the <P>The <I>neigh</I> keyword determines how neighbor lists are built. A value
host or device performs the packing and unpacking of data when of <I>half</I> uses half-neighbor lists, the same as used by most pair
communicating information between processors. "Exchange" styles in LAMMPS. A value of <I>half/thread</I> uses a thread-safe variant
of the half-neighbor list. It should be used instead of <I>half</I> when
running with more than 1 threads per MPI task on a CPU. A value of
<I>n2</I> uses an O(N^2) algorithm to build the neighbor list without
binning, where N = # of atoms on a processor. It is typically slower
than the other methods, which use binning.
</P>
<P>A value of <I>full</I> uses a full neighbor list and is the default. This
performs twice as much computation as the <I>half</I> option, however that
is often a win because it is thread-safe and doesn't require atomic
operations in the calculation of pair forces.
</P>
<P>A value of <I>full/cluster</I> is an experimental neighbor style, where
particles interact with all particles within a small cluster, if at
least one of the cluster's particles is within the neighbor cutoff
range. This potentially allows for better vectorization on
architectures such as the Intel Phi. It also reduces the size of the
neighbor list by roughly a factor of the cluster size, thus reducing
the total memory footprint considerably.
</P>
<P>The <I>comm</I> and <I>comm/exchange</I> and <I>comm/forward</I> keywords determine
whether the host or device performs the packing and unpacking of data
when communicating per-atom data between processors. "Exchange"
communication happens only on timesteps that neighbor lists are communication happens only on timesteps that neighbor lists are
rebuilt. The data is only for atoms that migrate to new processors. rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. The data is for atom "Forward" communication happens every timestep. The data is for atom
coordinates and any other atom properties that need to be updated for coordinates and any other atom properties that need to be updated for
ghost atoms owned by each processor. ghost atoms owned by each processor.
</P> </P>
<P>The value options for these keywords are <I>no</I> or <I>host</I> or <I>device</I>. <P>The <I>comm</I> keyword is simply a short-cut to set the same value
for both the <I>comm/exchange</I> and <I>comm/forward</I> keywords.
</P>
<P>The value options for all 3 keywords are <I>no</I> or <I>host</I> or <I>device</I>.
A value of <I>no</I> means to use the standard non-KOKKOS method of A value of <I>no</I> means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of <I>host</I> means packing/unpacking data for the communication. A value of <I>host</I> means
to use the host, typically a multi-core CPU, and perform the to use the host, typically a multi-core CPU, and perform the
@ -369,10 +403,12 @@ to use the device, typically a GPU, to perform the packing/unpacking
operation. operation.
</P> </P>
<P>The optimal choice for these keywords depends on the input script and <P>The optimal choice for these keywords depends on the input script and
the hardware used. The <I>no</I> value is useful for verifying that Kokkos the hardware used. The <I>no</I> value is useful for verifying that the
code is working correctly. It may also be the fastest choice when Kokkos-based <I>host</I> and <I>device</I> values are working correctly. It may
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1). also be the fastest choice when using Kokkos styles in MPI-only mode
When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work (i.e. with a thread count of 1).
</P>
<P>When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
identically. When using GPUs, the <I>device</I> value will typically be identically. When using GPUs, the <I>device</I> value will typically be
optimal if all of your styles used in your input script are supported optimal if all of your styles used in your input script are supported
by the KOKKOS package. In this case data can stay on the GPU for many by the KOKKOS package. In this case data can stay on the GPU for many
@ -476,11 +512,13 @@ setting</A>
</P> </P>
<P><B>Default:</B> <P><B>Default:</B>
</P> </P>
<P>To use the USER-CUDA package, the package cuda command must be invoked <P>For the USER-CUDA package, the default is Ngpu = 1 and the option
explicitly in your input script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
switch</A>. This will set the # of GPUs/node. enabled, and thread = auto. These settings are made automatically by
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled, the required "-c on" <A HREF = "Section_start.html#start_7">command-line switch</A>.
test = not enabled, and thread = auto. You can change them by using the package cuda command in your input
script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
switch</A>.
</P> </P>
<P>For the GPU package, the default is Ngpu = 1 and the option defaults <P>For the GPU package, the default is Ngpu = 1 and the option defaults
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize = are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -491,24 +529,21 @@ must invoke the package gpu command in your input script or via the
"-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>. "-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>.
</P> </P>
<P>For the USER-INTEL package, the default is Nphi = 1 and the option <P>For the USER-INTEL package, the default is Nphi = 1 and the option
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
default ghost option is determined by the pair style being used. This that all of these settings, except "prec", are ignored if LAMMPS was
value used is output to the screen in the offload report at the end of not built with Xeon Phi coprocessor support. The default ghost option
each run. These settings are made automatically if the "-sf intel" is determined by the pair style being used. This value is output to
<A HREF = "Section_start.html#start_7">command-line switch</A> is used. If it is the screen in the offload report at the end of each run. These
not used, you must invoke the package intel command in your input settings are made automatically if the "-sf intel" <A HREF = "Section_start.html#start_7">command-line
script or or via the "-pk intel" <A HREF = "Section_start.html#start_7">command-line switch</A> is used. If it is not used, you
switch</A>. must invoke the package intel command in your input script or via
the "-pk intel" <A HREF = "Section_start.html#start_7">command-line switch</A>.
</P> </P>
<P>The default settings for the KOKKOS package are "package kokkos neigh <P>For the KOKKOS package, the option defaults are neigh = full and comm =
full comm/exchange host comm/forward host". This is the case whether host. These settings are made automatically by the required "-k on"
the "-sf kk" <A HREF = "Section_start.html#start_7">command-line switch</A> is used <A HREF = "Section_start.html#start_7">command-line switch</A>. You can change them
or not. by using the package kokkos command in your input script or via the
To use the KOKKOS package, the package kokkos command must be invoked "-pk kokkos" <A HREF = "Section_start.html#start_7">command-line switch</A>.
explicitly in your input script or via the "-pk kokkos" <A HREF = "Section_start.html#start_7">command-line
switch</A>. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
</P> </P>
<P>For the OMP package, the default is Nthreads = 0 and the option <P>For the OMP package, the default is Nthreads = 0 and the option
defaults are neigh = yes. These settings are made automatically if defaults are neigh = yes. These settings are made automatically if

@ -63,11 +63,21 @@ args = arguments specific to the style :l
{tptask} value = Ntptask {tptask} value = Ntptask
Ntptask = max number of threads to use on coprocessor for each MPI task Ntptask = max number of threads to use on coprocessor for each MPI task
{kokkos} args = keyword value ... {kokkos} args = keyword value ...
one or more keyword/value pairs may be appended zero or more keyword/value pairs may be appended
keywords = {neigh} or {comm/exchange} or {comm/forward} keywords = {neigh} or {comm} or {comm/exchange} or {comm/forward}
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster} {neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
full = full neighbor list
half/thread = half neighbor list built in thread-safe manner
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
n2 = non-binning neighbor list build, O(N^2) algorithm
full/cluster = full neighbor list with clustered groups of atoms
{comm} value = {no} or {host} or {device}
use value for both comm/exchange and comm/forward
{comm/exchange} value = {no} or {host} or {device} {comm/exchange} value = {no} or {host} or {device}
{comm/forward} value = {no} or {host} or {device} {comm/forward} value = {no} or {host} or {device}
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
{omp} args = Nthreads keyword value ... {omp} args = Nthreads keyword value ...
Nthread = # of OpenMP threads to associate with each MPI process Nthread = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended zero or more keyword/value pairs may be appended
@ -82,47 +92,59 @@ args = arguments specific to the style :l
package gpu 1 package gpu 1
package gpu 1 split 0.75 package gpu 1 split 0.75
package gpu 2 split -1.0 package gpu 2 split -1.0
package cuda gpu/node/special 2 0 2 package cuda 2 gpuID 0 2
package cuda test 3948 package cuda 1 test 3948
package kokkos neigh half/thread comm/forward device package kokkos neigh half/thread comm device
package omp 0 neigh yes package omp 0 neigh no
package omp 4 package omp 4
package intel * mixed balance -1 :pre package intel * mixed balance -1 :pre
[Description:] [Description:]
This command invokes package-specific settings. Currently the This command invokes package-specific settings for the various
following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and accelerator packages available in LAMMPS. Currently the following
USER-OMP. packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
KOKKOS, and USER-OMP.
If allows calling multiple times, all options set to their If this command is specified in an input script, it must be near the
defaults, whether specified or not. top of the script, before the simulation box has been defined. This
is because it specifies settings that the accelerator packages use in
their initialization, before a simulation is defined.
Talk about command line switch -pk as alternate option. This command can also be specified from the command-line when
launching LAMMPS, using the "-pk" "command-line
switch"_Section_start.html#start_7. The syntax is exactly the same as
when used in an input script.
Which packages require it to be invoked, only CUDA Note that all of the accelerator packages require the package command
this is b/c can only be invoked once to be specified (except the OPT package), if the package is to be used
vs optional: all others? and allow multiple invokes in a simulation (LAMMPS can be built with an accelerator package
without using it in a particular simulation). However, in all cases,
a default version of the command is typically invoked by other
accelerator settings.
Must be invoked early in script, before simulation box is defined. The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
"command-line switch"_Section_start.html#start_7 respectively, which
invokes a "package cuda" or "package kokkos" command with default
settings.
To use the accelerated GPU and USER-OMP styles, the use of the package For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
command is required. However, as described in the "Defaults" section intel" or "-sf omp" "command-line switch"_Section_start.html#start_7
below, if you use the "-sf gpu" or "-sf omp" "command-line is used to auto-append accelerator suffixes to various styles in the
options"_Section_start.html#start_7 to enable use of these styles, input script, then those switches also invoke a "package gpu",
then default package settings are enabled. In that case you only need "package intel", or "package omp" command with default settings.
to use the package command if you want to change the defaults.
To use the accelerated USER-CUDA and KOKKOS styles, the package IMPORTANT NOTE: A package command for a particular style can be
command is not required as defaults are assigned internally. You only invoked multiple times when a simulation is set up, e.g. by the "-c
need to use the package command if you want to change the defaults. on", "-k on", "-sf", and "-pk" "command-line
switches"_Section_start.html#start_7, and by using this command in an
input script. Each time it is used all of the style options are set,
either to default values or to specified settings. I.e. settings from
previous invocations do not persist across multiple invocations.
See "Section_accelerate"_Section_accelerate.html of the manual for See the "Section Accelerate"_Section_accelerate.html section of the
more details about using these various packages for accelerating manual for more details about using the various accelerator packages
LAMMPS calculations. for speeding up LAMMPS simulations.
Package GPU always sets newton pair off. Not so for USER-CUDA
add newton options to GPU, CUDA, KOKKOS.
:line :line
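The description above states that the "-pk" command-line switch takes exactly the same syntax as the package command in an input script. As a small illustration, the Python sketch below turns a package command line into the equivalent switch tokens. The helper name `package_to_pk_switch` is hypothetical, and the sketch only rearranges tokens; it does not validate the style or its arguments and does not add the "-c on" or "-k on" switches that USER-CUDA and KOKKOS also require.

```python
# Hypothetical helper, for illustration only; not part of LAMMPS.
def package_to_pk_switch(command):
    """Convert an input-script 'package ...' line to '-pk ...' tokens.

    The arguments are passed through unchanged, since the doc text says
    the syntax is the same in both places.
    """
    tokens = command.split()
    if len(tokens) < 2 or tokens[0] != "package":
        raise ValueError("expected a line of the form 'package style args'")
    return ["-pk"] + tokens[1:]


# Example taken from the Examples list in this doc page
pk_args = package_to_pk_switch("package omp 4")
```

The resulting token list would be appended to whatever command line launches LAMMPS on a given system.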
@ -329,32 +351,44 @@ generation Xeon Phi chip.
The {kokkos} style invokes settings associated with the use of the The {kokkos} style invokes settings associated with the use of the
KOKKOS package. KOKKOS package.
The {neigh} keyword determines what kinds of neighbor lists are built. All of the settings are optional keyword/value pairs. Each has a
A value of {half} uses half-neighbor lists, the same as used by most default value as listed below.
pair styles in LAMMPS. A value of {half/thread} uses a threadsafe
variant of the half-neighbor list. It should be used instead of
{half} when running with threads on a CPU. A value of {full} uses a
full-neighborlist, i.e. f_ij and f_ji are both calculated. This
performs twice as much computation as the {half} option, however that
can be a win because it is threadsafe and doesn't require atomic
operations. A value of {full/cluster} is an experimental neighbor
style, where particles interact with all particles within a small
cluster, if at least one of the clusters particles is within the
neighbor cutoff range. This potentially allows for better
vectorization on architectures such as the Intel Phi. If also reduces
the size of the neighbor list by roughly a factor of the cluster size,
thus reducing the total memory footprint considerably.
The {comm/exchange} and {comm/forward} keywords determine whether the The {neigh} keyword determines how neighbor lists are built. A value
host or device performs the packing and unpacking of data when of {half} uses half-neighbor lists, the same as used by most pair
communicating information between processors. "Exchange" styles in LAMMPS. A value of {half/thread} uses a thread-safe variant
of the half-neighbor list. It should be used instead of {half} when
running with more than 1 thread per MPI task on a CPU. A value of
{n2} uses an O(N^2) algorithm to build the neighbor list without
binning, where N = # of atoms on a processor. It is typically slower
than the other methods, which use binning.
A value of {full} uses a full neighbor list and is the default. This
performs twice as much computation as the {half} option, however that
is often a win because it is thread-safe and doesn't require atomic
operations in the calculation of pair forces.
A value of {full/cluster} is an experimental neighbor style, where
particles interact with all particles within a small cluster, if at
least one of the cluster's particles is within the neighbor cutoff
range. This potentially allows for better vectorization on
architectures such as the Intel Phi. It also reduces the size of the
neighbor list by roughly a factor of the cluster size, thus reducing
the total memory footprint considerably.
The {comm} and {comm/exchange} and {comm/forward} keywords determine
whether the host or device performs the packing and unpacking of data
when communicating per-atom data between processors. "Exchange"
communication happens only on timesteps that neighbor lists are communication happens only on timesteps that neighbor lists are
rebuilt. The data is only for atoms that migrate to new processors. rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. The data is for atom "Forward" communication happens every timestep. The data is for atom
coordinates and any other atom properties that need to be updated for coordinates and any other atom properties that need to be updated for
ghost atoms owned by each processor. ghost atoms owned by each processor.
The value options for these keywords are {no} or {host} or {device}. The {comm} keyword is simply a short-cut to set the same value
for both the {comm/exchange} and {comm/forward} keywords.
The value options for all 3 keywords are {no} or {host} or {device}.
A value of {no} means to use the standard non-KOKKOS method of A value of {no} means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of {host} means packing/unpacking data for the communication. A value of {host} means
to use the host, typically a multi-core CPU, and perform the to use the host, typically a multi-core CPU, and perform the
@ -363,9 +397,11 @@ to use the device, typically a GPU, to perform the packing/unpacking
operation. operation.
The optimal choice for these keywords depends on the input script and The optimal choice for these keywords depends on the input script and
the hardware used. The {no} value is useful for verifying that Kokkos the hardware used. The {no} value is useful for verifying that the
code is working correctly. It may also be the fastest choice when Kokkos-based {host} and {device} values are working correctly. It may
using Kokkos styles in MPI-only mode (i.e. with a thread count of 1). also be the fastest choice when using Kokkos styles in MPI-only mode
(i.e. with a thread count of 1).
When running on CPUs or Xeon Phi, the {host} and {device} values work When running on CPUs or Xeon Phi, the {host} and {device} values work
identically. When using GPUs, the {device} value will typically be identically. When using GPUs, the {device} value will typically be
optimal if all of your styles used in your input script are supported optimal if all of your styles used in your input script are supported
@ -470,11 +506,13 @@ setting"_Section_start.html#start_7
[Default:] [Default:]
To use the USER-CUDA package, the package cuda command must be invoked For the USER-CUDA package, the default is Ngpu = 1 and the option
explicitly in your input script or via the "-pk cuda" "command-line defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
switch"_Section_start.html#start_7. This will set the # of GPUs/node. enabled, and thread = auto. These settings are made automatically by
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled, the required "-c on" "command-line switch"_Section_start.html#start_7.
test = not enabled, and thread = auto. You can change them by using the package cuda command in your input
script or via the "-pk cuda" "command-line
switch"_Section_start.html#start_7.
For the GPU package, the default is Ngpu = 1 and the option defaults For the GPU package, the default is Ngpu = 1 and the option defaults
are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize = are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -485,24 +523,21 @@ must invoke the package gpu command in your input script or via the
"-pk gpu" "command-line switch"_Section_start.html#start_7. "-pk gpu" "command-line switch"_Section_start.html#start_7.
For the USER-INTEL package, the default is Nphi = 1 and the option For the USER-INTEL package, the default is Nphi = 1 and the option
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. The defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
default ghost option is determined by the pair style being used. This that all of these settings, except "prec", are ignored if LAMMPS was
value used is output to the screen in the offload report at the end of not built with Xeon Phi coprocessor support. The default ghost option
each run. These settings are made automatically if the "-sf intel" is determined by the pair style being used. This value is output to
"command-line switch"_Section_start.html#start_7 is used. If it is the screen in the offload report at the end of each run. These
not used, you must invoke the package intel command in your input settings are made automatically if the "-sf intel" "command-line
script or or via the "-pk intel" "command-line switch"_Section_start.html#start_7 is used. If it is not used, you
switch"_Section_start.html#start_7. must invoke the package intel command in your input script or via
the "-pk intel" "command-line switch"_Section_start.html#start_7.
The default settings for the KOKKOS package are "package kokkos neigh For the KOKKOS package, the option defaults are neigh = full and comm =
full comm/exchange host comm/forward host". This is the case whether host. These settings are made automatically by the required "-k on"
the "-sf kk" "command-line switch"_Section_start.html#start_7 is used "command-line switch"_Section_start.html#start_7. You can change them
or not. by using the package kokkos command in your input script or via the
To use the KOKKOS package, the package kokkos command must be invoked "-pk kokkos" "command-line switch"_Section_start.html#start_7.
explicitly in your input script or via the "-pk kokkos" "command-line
switch"_Section_start.html#start_7. This will set the # of GPUs/node.
The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto.
For the OMP package, the default is Nthreads = 0 and the option For the OMP package, the default is Nthreads = 0 and the option
defaults are neigh = yes. These settings are made automatically if defaults are neigh = yes. These settings are made automatically if
@ -510,4 +545,3 @@ the "-sf omp" "command-line switch"_Section_start.html#start_7 is
used. If it is not used, you must invoke the package omp command in used. If it is not used, you must invoke the package omp command in
your input script or via the "-pk omp" "command-line your input script or via the "-pk omp" "command-line
switch"_Section_start.html#start_7. switch"_Section_start.html#start_7.
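
The kokkos keywords documented in this change can be summarized in a short Python sketch: zero or more keyword/value pairs, defaults of neigh = full and comm = host, and {comm} acting as a short-cut that sets both {comm/exchange} and {comm/forward}. This is a model of the documented behavior under those assumptions, not the LAMMPS parser; the function name `parse_kokkos_args` and the error messages are hypothetical.

```python
# Hypothetical model of the documented kokkos keywords; not LAMMPS code.
NEIGH_VALUES = {"full", "half/thread", "half", "n2", "full/cluster"}
COMM_VALUES = {"no", "host", "device"}


def parse_kokkos_args(args):
    """Map a list of keyword/value tokens to the resulting settings."""
    # Documented defaults: neigh = full, comm = host (both comm keywords)
    settings = {"neigh": "full", "comm/exchange": "host", "comm/forward": "host"}
    if len(args) % 2 != 0:
        raise ValueError("arguments must be keyword/value pairs")
    for keyword, value in zip(args[0::2], args[1::2]):
        if keyword == "neigh":
            if value not in NEIGH_VALUES:
                raise ValueError("unknown neigh value: " + value)
            settings["neigh"] = value
        elif keyword in ("comm", "comm/exchange", "comm/forward"):
            if value not in COMM_VALUES:
                raise ValueError("unknown comm value: " + value)
            if keyword == "comm":
                # short-cut: same value for both communication keywords
                settings["comm/exchange"] = value
                settings["comm/forward"] = value
            else:
                settings[keyword] = value
        else:
            raise ValueError("unknown keyword: " + keyword)
    return settings


# Arguments from the example "package kokkos neigh half/thread comm device"
example = parse_kokkos_args("neigh half/thread comm device".split())
```

Because pairs are applied in order in this sketch, a later {comm/forward} would override an earlier {comm}; whether LAMMPS itself resolves such combinations the same way is not stated in the text above.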