lammps/doc/package.txt

606 lines
29 KiB
Plaintext

"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)
:line
package command :h3
[Syntax:]
package style args :pre
style = {cuda} or {gpu} or {intel} or {kokkos} or {omp} :ulb,l
args = arguments specific to the style :l
{cuda} args = Ngpu keyword value ...
Ngpu = # of GPUs per node
zero or more keyword/value pairs may be appended
keywords = {newton} or {gpuID} or {timing} or {test} or {thread}
{newton} = {off} or {on}
off = set Newton pairwise and bonded flags off (default)
on = set Newton pairwise and bonded flags on
{gpuID} values = gpu1 .. gpuN
gpu1 .. gpuN = IDs of the Ngpu GPUs to use
{timing} values = none
{test} values = id
id = atom-ID of a test particle
{thread} = auto or tpa or bpa
auto = test whether tpa or bpa is faster
tpa = one thread per atom
bpa = one block per atom
{gpu} args = Ngpu keyword value ...
Ngpu = # of GPUs per node
zero or more keyword/value pairs may be appended
keywords = {neigh} or {newton} or {binsize} or {split} or {gpuID} or {tpa} or {device}
{neigh} value = {yes} or {no}
yes = neighbor list build on GPU (default)
no = neighbor list build on CPU
{newton} = {off} or {on}
off = set Newton pairwise flag off (default and required)
on = set Newton pairwise flag on (currently not allowed)
{binsize} value = size
size = bin size for neighbor list construction (distance units)
{split} = fraction
fraction = fraction of atoms assigned to GPU (default = 1.0)
{gpuID} values = first last
first = ID of first GPU to be used on each node
last = ID of last GPU to be used on each node
{tpa} value = Nthreads
Nthreads = # of GPU threads used per atom
{device} value = device_type
device_type = {kepler} or {fermi} or {cypress} or {generic}
{intel} args = NPhi keyword value ...
Nphi = # of coprocessors per node
zero or more keyword/value pairs may be appended
keywords = {prec} or {balance} or {ghost} or {tpc} or {tptask}
{prec} value = {single} or {mixed} or {double}
single = perform force calculations in single precision
mixed = perform force calculations in mixed precision
double = perform force calculations in double precision
{balance} value = split
split = fraction of work to offload to coprocessor, -1 for dynamic
{ghost} value = {yes} or {no}
yes = include ghost atoms for offload
no = do not include ghost atoms for offload
{tpc} value = Ntpc
Ntpc = number of threads to use on each physical core of coprocessor
{tptask} value = Ntptask
Ntptask = max number of threads to use on coprocessor for each MPI task
{kokkos} args = keyword value ...
zero or more keyword/value pairs may be appended
keywords = {neigh} or {newton} or {binsize} or {comm} or {comm/exchange} or {comm/forward}
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
full = full neighbor list
half/thread = half neighbor list built in thread-safe manner
half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
n2 = non-binning neighbor list build, O(N^2) algorithm
full/cluster = full neighbor list with clustered groups of atoms
{newton} = {off} or {on}
off = set Newton pairwise and bonded flags off (default)
on = set Newton pairwise and bonded flags on
{binsize} value = size
size = bin size for neighbor list construction (distance units)
{comm} value = {no} or {host} or {device}
use value for both comm/exchange and comm/forward
{comm/exchange} value = {no} or {host} or {device}
{comm/forward} value = {no} or {host} or {device}
no = perform communication pack/unpack in non-KOKKOS mode
host = perform pack/unpack on host (e.g. with OpenMP threading)
device = perform pack/unpack on device (e.g. on GPU)
{omp} args = Nthreads keyword value ...
Nthread = # of OpenMP threads to associate with each MPI process
zero or more keyword/value pairs may be appended
keywords = {neigh}
{neigh} value = {yes} or {no}
yes = threaded neighbor list build (default)
no = non-threaded neighbor list build :pre
:ule
[Examples:]
package gpu 1
package gpu 1 split 0.75
package gpu 2 split -1.0
package cuda 2 gpuID 0 2
package cuda 1 test 3948
package kokkos neigh half/thread comm device
package omp 0 neigh no
package omp 4
package intel * mixed balance -1 :pre
[Description:]
This command invokes package-specific settings for the various
accelerator packages available in LAMMPS. Currently the following
packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
KOKKOS, and USER-OMP.
If this command is specified in an input script, it must be near the
top of the script, before the simulation box has been defined. This
is because it specifies settings that the accelerator packages use in
their intialization, before a simultion is defined.
This command can also be specified from the command-line when
launching LAMMPS, using the "-pk" "command-line
switch"_Section_start.html#start_7. The syntax is exactly the same as
when used in an input script.
Note that all of the accelerator packages require the package command
to be specified (except the OPT package), if the package is to be used
in a simulation (LAMMPS can be built with an accelerator package
without using it in a particular simulation). However, in all cases,
a default version of the command is typically invoked by other
accelerator settings.
The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
"command-line switch"_Section_start.html#start_7 respectively, which
invokes a "package cuda" or "package kokkos" command with default
settings.
For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
intel" or "-sf omp" "command-line switch"_Section_start.html#start_7
is used to auto-append accelerator suffixes to various styles in the
input script, then those switches also invoke a "package gpu",
"package intel", or "package omp" command with default settings.
IMPORTANT NOTE: A package command for a particular style can be
invoked multiple times when a simulation is setup, e.g. by the "-c
on", "-k on", "-sf", and "-pk" "command-line
switches"_Section_start.html#start_7, and by using this command in an
input script. Each time it is used all of the style options are set,
either to default values or to specified settings. I.e. settings from
previous invocations do not persist across multiple invocations.
See the "Section Accelerate"_Section_accelerate.html section of the
manual for more details about using the various accelerator packages
for speeding up LAMMPS simulations.
:line
The {cuda} style invokes settings associated with the use of the
USER-CUDA package.
The {Ngpus} argument sets the number of GPUs per node. There must be
exactly one MPI task per GPU, as set by the mpirun or mpiexec command.
Optional keyword/value pairs can also be specified. Each has a
default value as listed below.
The {newton} keyword sets the Newton flags for pairwise and bonded
interactions to {off} or {on}, the same as the "newton"_newton.html
command allows. The default is {off} because this will almost always
give better performance for the USER-CUDA package. This means
more computation is done, but less communication.
The {gpuID} keyword allows selection of which GPUs on each node will
be used for a simulation. GPU IDs range from 0 to N-1 where N is the
physical number of GPUs/node. An ID is specified for each of the
Ngpus being used. For example if you have three GPUs on a machine,
one of which is used for the X-Server (the GPU with the ID 1) while
the others (with IDs 0 and 2) are used for computations you would
specify:
package cuda 2 gpuID 0 2 :pre
The purpose of the {gpuID} keyword is to allow two (or more)
simulations to be run on one workstation. In that case one could set
the first simulation to use GPU 0 and the second to use GPU 1. This is
not necessary however, if the GPUs are in what is called {compute
exclusive} mode. Using that setting, every process will get its own
GPU automatically. This {compute exclusive} mode can be set as root
using the {nvidia-smi} tool which is part of the CUDA installation.
Also note that if the {gpuID} keyword is not used, the USER-CUDA
package sorts existing GPUs on each node according to their number of
multiprocessors. This way, compute GPUs will be priorized over
X-Server GPUs.
If the {timing} keyword is specified, detailed timing information for
various subroutines will be output.
If the {test} keyword is specified, information for the specified atom
with atom-ID will be output at several points during each timestep.
This is mainly usefull for debugging purposes. Note that the
simulation slow down dramatically if this option is used.
The {thread} keyword can be used to specify how GPU threads are
assigned work during pair style force evaluation. If the value =
{tpa}, one thread per atom is used. If the value = {bpa}, one block
per atom is used. If the value = {auto}, a short test is performed at
the beginning of each run to determing where {tpa} or {bpa} mode is
faster. The result of this test is output. Since {auto} is the
default value, it is usually not necessary to use this keyword.
:line
The {gpu} style invokes settings associated with the use of the GPU
package.
The {Ngpu} argument sets the number of GPUs per node. There must be
at least as many MPI tasks per node as GPUs, as set by the mpirun or
mpiexec command. If there are more MPI tasks (per node)
than GPUs, multiple MPI tasks will share each GPU.
Optional keyword/value pairs can also be specified. Each has a
default value as listed below.
The {neigh} keyword specifies where neighbor lists for pair style
computation will be built. If {neigh} is {yes}, which is the default,
neighbor list building is performed on the GPU. If {neigh} is {no},
neighbor list building is performed on the CPU. GPU neighbor list
building currently cannot be used with a triclinic box. GPU neighbor
list calculation currently cannot be used with
"hybrid"_pair_hybrid.html pair styles. GPU neighbor lists are not
compatible with comannds that are not GPU-enabled. When a non-GPU
enabled command requires a neighbor list, it will also be built on the
CPU. In these cases, it will typically be more efficient to only use
CPU neighbor list builds.
The {newton} keyword sets the Newton flags for pairwise (not bonded)
interactions to {off} or {on}, the same as the "newton"_newton.html
command allows. Currently, only an {off} value is allowed, since all
the GPU package pair styles require this setting. This means more
computation is done, but less communication. In the future a value of
{on} may be allowed, so the {newton} keyword is included as an option
for compatibility with the package command for other accelerator
styles. Note that the newton setting for bonded interactions is not
affected by this keyword.
The {binsize} keyword sets the size of bins used to bin atoms in
neighbor list builds performed on the GPU, if {neigh} = {yes} is set.
If {binsize} is set to 0.0 (the default), then bins = the size of the
pairwise cutoff + neighbor skin distance. This is 2x larger than the
LAMMPS default used for neighbor list building on the CPU. This will
be close to optimal for the GPU, so you do not normally need to use
this keyword. Note that if you use a longer-than-usual pairwise
cutoff, e.g. to allow for a smaller fraction of KSpace work with a
"long-range Coulombic solver"_kspace_style.html because the GPU is
faster at performing pairwise interactions, then it may be optimal to
make the {binsize} smaller than the default. For example, with a
cutoff of 20*sigma in LJ "units"_units.html and a neighbor skin
distance of sigma, a {binsize} = 5.25*sigma can be more efficient than
the default.
The {split} keyword can be used for load balancing force calculations
between CPU and GPU cores in GPU-enabled pair styles. If 0 < {split} <
1.0, a fixed fraction of particles is offloaded to the GPU while force
calculation for the other particles occurs simulataneously on the
CPU. If {split} < 0.0, the optimal fraction (based on CPU and GPU
timings) is calculated every 25 timesteps. If {split} = 1.0, all
force calculations for GPU accelerated pair styles are performed on
the GPU. In this case, other "hybrid"_pair_hybrid.html pair
interactions, "bond"_bond_style.html, "angle"_angle_style.html,
"dihedral"_dihedral_style.html, "improper"_improper_style.html, and
"long-range"_kspace_style.html calculations can be performed on the
CPU while the GPU is performing force calculations for the GPU-enabled
pair style. If all CPU force computations complete before the GPU
completes, LAMMPS will block until the GPU has finished before
continuing the timestep.
As an example, if you have two GPUs per node and 8 CPU cores per node,
and would like to run on 4 nodes (32 cores) with dynamic balancing of
force calculation across CPU and GPU cores, you could specify
mpirun -np 32 -sf gpu -in in.script # launch command
package gpu 2 split -1 # input script command :pre
In this case, all CPU cores and GPU devices on the nodes would be
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
cores would perform force calculations for some fraction of the
particles at the same time the GPUs performed force calculation for
the other particles.
The {gpuID} keyword allows selection of which GPUs on each node will
be used for a simulation. The {first} and {last} values specify the
GPU IDs to use (from 0 to Ngpu-1). By default, first = 0 and last =
Ngpu-1, so that all GPUs are used, assuming Ngpu is set to the number
of physical GPUs. If you only wish to use a subset, set Ngpu to a
smaller number and first/last to a sub-range of the available GPUs.
The {tpa} keyword sets the number of GPU thread per atom used to
perform force calculations. With a default value of 1, the number of
threads will be chosen based on the pair style, however, the value can
be set explicitly with this keyword to fine-tune performance. For
large cutoffs or with a small number of particles per GPU, increasing
the value can improve performance. The number of threads per atom must
be a power of 2 and currently cannot be greater than 32.
The {device} keyword can be used to tune parameters optimized for a
specific accelerator, when using OpenCL. For CUDA, the {device}
keyword is ignored. Currently, the device type is limited to NVIDIA
Kepler, NVIDIA Fermi, AMD Cypress, or a generic device. More devices
may be added later. The default device type can be specified when
building LAMMPS with the GPU library, via settings in the
lib/gpu/Makefile that is used.
:line
The {intel} style invokes settings associated with the use of the
USER-INTEL package. All of its settings, except the {prec} keyword,
are ignored if LAMMPS was not built with Xeon Phi coprocessor support,
when building with the USER-INTEL package. All of its settings,
including the {prec} keyword are applicable if LAMMPS was built with
coprocessor support.
The {Nphi} argument sets the number of coprocessors per node.
Optional keyword/value pairs can also be specified. Each has a
default value as listed below.
The {prec} keyword argument determines the precision mode to use for
computing pair style forces, either on the CPU or on the coprocessor,
when using a USER-INTEL supported "pair style"_pair_style.html. It
can take a value of {single}, {mixed} which is the default, or
{double}. {Single} means single precision is used for the entire
force calculation. {Mixed} means forces between a pair of atoms are
computed in single precision, but accumulated and stored in double
precision, including storage of forces, torques, energies, and virial
quantities. {Double} means double precision is used for the entire
force calculation.
The {balance} keyword sets the fraction of "pair
style"_pair_style.html work offloaded to the coprocessor style for
split values between 0.0 and 1.0 inclusive. While this fraction of
work is running on the coprocessor, other calculations will run on the
host, including neighbor and pair calculations that are not offloaded,
angle, bond, dihedral, kspace, and some MPI communications. If
{split} is set to -1, the fraction of work is dynamically adjusted
automatically throughout the run. This typically give performance
within 5 to 10 percent of the optimal fixed fraction.
The {ghost} keyword determines whether or not ghost atoms, i.e. atoms
at the boundaries of proessor sub-domains, are offloaded for neighbor
and force calculations. When the value = "no", ghost atoms are not
offloaded. This option can reduce the amount of data transfer with
the coprocessor and can also overlap MPI communication of forces with
computation on the coprocessor when the "newton pair"_newton.html
setting is "on". When the value = "ues", ghost atoms are offloaded.
In some cases this can provide better performance, especially if the
{balance} fraction is high.
The {tpc} keyword sets the maximum # of threads {Ntpc} that will
run on each physical core of the coprocessor. The default value is
set to 4, which is the number of hardware threads per core supported
by the current generation Xeon Phi chips.
The {tptask} keyword sets the maximum # of threads (Ntptask} that will
be used on the coprocessor for each MPI task. This, along with the
{tpc} keyword setting, are the only methods for changing the number of
threads used on the coprocessor. The default value is set to 240 =
60*4, which is the maximum # of threads supported by an entire current
generation Xeon Phi chip.
:line
The {kokkos} style invokes settings associated with the use of the
KOKKOS package.
All of the settings are optional keyword/value pairs. Each has a
default value as listed below.
The {neigh} keyword determines how neighbor lists are built. A value
of {half} uses half-neighbor lists, the same as used by most pair
styles in LAMMPS. A value of {half/thread} uses a thread-safe variant
of the half-neighbor list. It should be used instead of {half} when
running with more than 1 threads per MPI task on a CPU. A value of
{n2} uses an O(N^2) algorithm to build the neighbor list without
binning, where N = # of atoms on a processor. It is typically slower
than the other methods, which use binning.
A value of {full} uses a full neighbor lists and is the default. This
performs twice as much computation as the {half} option, however that
is often a win because it is thread-safe and doesn't require atomic
operations in the calculation of pair forces. For that reason, {full}
is the default setting. However, when running in MPI-only mode with 1
thread per MPI task, {half} neighbor lists will typically be faster,
just as it is for non-accelerated pair styles.
A value of {full/cluster} is an experimental neighbor style, where
particles interact with all particles within a small cluster, if at
least one of the clusters particles is within the neighbor cutoff
range. This potentially allows for better vectorization on
architectures such as the Intel Phi. If also reduces the size of the
neighbor list by roughly a factor of the cluster size, thus reducing
the total memory footprint considerably.
The {newton} keyword sets the Newton flags for pairwise and bonded
interactions to {off} or {on}, the same as the "newton"_newton.html
command allows. The default is {off} because this will almost always
give better performance for the KOKKOS package. This means more
computation is done, but less communication. However, when running in
MPI-only mode with 1 thread per MPI task, a value of {on} will
typically be faster, just as it is for non-accelerated pair styles.
The {binsize} keyword sets the size of bins used to bin atoms in
neighbor list builds. The same value can be set by the "neigh_modify
binsize"_neigh_modify.html command. Making it an option in the
package kokkos command allows it to be set from the command line. The
default value is 0.0, which means the LAMMPS default will be used,
which is bins = 1/2 the size of the pairwise cutoff + neighbor skin
distance. This is fine when neighbor lists are built on the CPU. For
GPU builds, a 2x larger binsize equal to the pairwise cutoff +
neighbor skin, is often faster, which can be set by this keyword.
Note that if you use a longer-than-usual pairwise cutoff, e.g. to
allow for a smaller fraction of KSpace work with a "long-range
Coulombic solver"_kspace_style.html because the GPU is faster at
performing pairwise interactions, then this rule of thumb may give too
large a binsize.
The {comm} and {comm/exchange} and {comm/forward} keywords determine
whether the host or device performs the packing and unpacking of data
when communicating per-atom data between processors. "Exchange"
communication happens only on timesteps that neighbor lists are
rebuilt. The data is only for atoms that migrate to new processors.
"Forward" communication happens every timestep. The data is for atom
coordinates and any other atom properties that needs to be updated for
ghost atoms owned by each processor.
The {comm} keyword is simply a short-cut to set the same value
for both the {comm/exchange} and {comm/forward} keywords.
The value options for all 3 keywords are {no} or {host} or {device}.
A value of {no} means to use the standard non-KOKKOS method of
packing/unpacking data for the communication. A value of {host} means
to use the host, typically a multi-core CPU, and perform the
packing/unpacking in parallel with threads. A value of {device} means
to use the device, typically a GPU, to perform the packing/unpacking
operation.
The optimal choice for these keywords depends on the input script and
the hardware used. The {no} value is useful for verifying that the
Kokkos-based {host} and {device} values are working correctly. It may
also be the fastest choice when using Kokkos styles in MPI-only mode
(i.e. with a thread count of 1).
When running on CPUs or Xeon Phi, the {host} and {device} values work
identically. When using GPUs, the {device} value will typically be
optimal if all of your styles used in your input script are supported
by the KOKKOS package. In this case data can stay on the GPU for many
timesteps without being moved between the host and GPU, if you use the
{device} value. This requires that your MPI is able to access GPU
memory directly. Currently that is true for OpenMPI 1.8 (or later
versions), Mvapich2 1.9 (or later), and CrayMPI. If your script uses
styles (e.g. fixes) which are not yet supported by the KOKKOS package,
then data has to be move between the host and device anyway, so it is
typically faster to let the host handle communication, by using the
{host} value. Using {host} instead of {no} will enable use of
multiple threads to pack/unpack communicated data.
:line
The {omp} style invokes settings associated with the use of the
USER-OMP package.
The {Nthread} argument sets the number of OpenMP threads allocated for
each MPI task. For example, if your system has nodes with dual
quad-core processors, it has a total of 8 cores per node. You could
use two MPI tasks per node (e.g. using the -ppn option of the mpirun
command in MPICH or -npernode in OpenMPI), and set {Nthreads} = 4.
This would use all 8 cores on each node. Note that the product of MPI
tasks * threads/task should not exceed the physical number of cores
(on a node), otherwise performance will suffer.
Setting {Nthread} = 0 instructs LAMMPS to use whatever value is the
default for the given OpenMP environment. This is usually determined
via the {OMP_NUM_THREADS} environment variable or the compiler
runtime. Note that in most cases the default for OpenMP capable
compilers is to use one thread for each available CPU core when
{OMP_NUM_THREADS} is not explicitly set, which can lead to poor
performance.
Here are examples of how to set the environment variable when
launching LAMMPS:
env OMP_NUM_THREADS=4 lmp_machine -sf omp -in in.script
env OMP_NUM_THREADS=2 mpirun -np 2 lmp_machine -sf omp -in in.script
mpirun -x OMP_NUM_THREADS=2 -np 2 lmp_machine -sf omp -in in.script :pre
or you can set it permanently in your shell's start-up script.
All three of these examples use a total of 4 CPU cores.
Note that different MPI implementations have different ways of passing
the OMP_NUM_THREADS environment variable to all MPI processes. The
2nd example line above is for MPICH; the 3rd example line with -x is
for OpenMPI. Check your MPI documentation for additional details.
What combination of threads and MPI tasks gives the best performance
is difficult to predict and can depend on many components of your
input. Not all features of LAMMPS support OpenMP threading via the
USER-OMP packaage and the parallel efficiency can be very different,
too.
Optional keyword/value pairs can also be specified. Each has a
default value as listed below.
The {neigh} keyword specifies whether neighbor list building will be
multi-threaded in addition to force calculations. If {neigh} is set
to {no} then neighbor list calculation is performed only by MPI tasks
with no OpenMP threading. If {mode} is {yes} (the default), a
multi-threaded neighbor list build is used. Using {neigh} = {yes} is
almost always faster and should produce idential neighbor lists at the
expense of using more memory. Specifically, neighbor list pages are
allocated for all threads at the same time and each thread works
within its own pages.
:line
[Restrictions:]
This command cannot be used after the simulation box is defined by a
"read_data"_read_data.html or "create_box"_create_box.html command.
The cuda style of this command can only be invoked if LAMMPS was built
with the USER-CUDA package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The gpu style of this command can only be invoked if LAMMPS was built
with the GPU package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The intel style of this command can only be invoked if LAMMPS was
built with the USER-INTEL package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The kk style of this command can only be invoked if LAMMPS was built
with the KOKKOS package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
The omp style of this command can only be invoked if LAMMPS was built
with the USER-OMP package. See the "Making
LAMMPS"_Section_start.html#start_3 section for more info.
[Related commands:]
"suffix"_suffix.html, "-pk" "command-line
setting"_Section_start.html#start_7
[Default:]
For the USER-CUDA package, the default is Ngpu = 1 and the option
defaults are newton = off, gpuID = 0 to Ngpu-1, timing = not enabled,
test = not enabled, and thread = auto. These settings are made
automatically by the required "-c on" "command-line
switch"_Section_start.html#start_7. You can change them bu using the
package cuda command in your input script or via the "-pk cuda"
"command-line switch"_Section_start.html#start_7.
For the GPU package, the default is Ngpu = 1 and the option defaults
are neigh = yes, newton = off, binsize = 0.0, split = 1.0, gpuID = 0
to Ngpu-1, tpa = 1, and device = not used. These settings are made
automatically if the "-sf gpu" "command-line
switch"_Section_start.html#start_7 is used. If it is not used, you
must invoke the package gpu command in your input script or via the
"-pk gpu" "command-line switch"_Section_start.html#start_7.
For the USER-INTEL package, the default is Nphi = 1 and the option
defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240. Note
that all of these settings, except "prec", are ignored if LAMMPS was
not built with Xeon Phi coprocessor support. The default ghost option
is determined by the pair style being used. This value is output to
the screen in the offload report at the end of each run. These
settings are made automatically if the "-sf intel" "command-line
switch"_Section_start.html#start_7 is used. If it is not used, you
must invoke the package intel command in your input script or or via
the "-pk intel" "command-line switch"_Section_start.html#start_7.
For the KOKKOS package, the option defaults neigh = full, newton =
off, binsize = 0.0, and comm = host. These settings are made
automatically by the required "-k on" "command-line
switch"_Section_start.html#start_7. You can change them bu using the
package kokkos command in your input script or via the "-pk kokkos"
"command-line switch"_Section_start.html#start_7.
For the OMP package, the default is Nthreads = 0 and the option
defaults are neigh = yes. These settings are made automatically if
the "-sf omp" "command-line switch"_Section_start.html#start_7 is
used. If it is not used, you must invoke the package omp command in
your input script or via the "-pk omp" "command-line
switch"_Section_start.html#start_7.