forked from lijiext/lammps
179 lines
7.0 KiB
Plaintext
179 lines
7.0 KiB
Plaintext
"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c
|
|
|
|
:link(lws,http://lammps.sandia.gov)
|
|
:link(ld,Manual.html)
|
|
:link(lc,Section_commands.html#comm)
|
|
|
|
:line
|
|
|
|
package command :h3
|
|
|
|
[Syntax:]
|
|
|
|
package style args :pre
|
|
|
|
style = {gpu} or {cuda} or {omp} :ulb,l
|
|
args = arguments specific to the style :l
|
|
{gpu} args = mode first last split
|
|
mode = force or force/neigh
|
|
first = ID of first GPU to be used on each node
|
|
last = ID of last GPU to be used on each node
|
|
split = fraction of particles assigned to the GPU
|
|
{cuda} args = to be determined
|
|
{omp} args = Nthreads mode
|
|
Nthreads = # of OpenMP threads to associate with each MPI process
|
|
mode = force or force/neigh (optional) :pre
|
|
:ule
|
|
|
|
[Examples:]
|
|
|
|
package gpu force 0 0 1.0
|
|
package gpu force 0 0 0.75
|
|
package gpu force/neigh 0 0 1.0
|
|
package gpu force/neigh 0 1 -1.0
|
|
package omp * force/neigh
|
|
package omp 4 force :pre
|
|
|
|
[Description:]
|
|
|
|
This command invokes package-specific settings. Currently the
|
|
following packages use it: GPU, USER-CUDA, and USER-OMP.
|
|
|
|
See "this section"_Section_accelerate.html of the manual for more
|
|
details about using these various packages for accelerating
|
|
a LAMMPS calculation.
|
|
|
|
:line
|
|
|
|
The {gpu} style invokes options associated with the use of the GPU
|
|
package. It allows you to select and initialize GPUs to be used for
|
|
acceleration via this package and configure how the GPU acceleration
|
|
is performed. These settings are required in order to use any style
|
|
with GPU acceleration.
|
|
|
|
The {mode} setting specifies where neighbor list calculations will be
|
|
performed. If {mode} is force, neighbor list calculation is performed
|
|
on the CPU. If {mode} is force/neigh, neighbor list calculation is
|
|
performed on the GPU. GPU neighbor list calculation currently cannot
|
|
be used with a triclinic box. GPU neighbor list calculation currently
|
|
cannot be used with "hybrid"_pair_hybrid.html pair styles. GPU
|
|
neighbor lists are not compatible with styles that are not
|
|
GPU-enabled. When a non-GPU enabled style requires a neighbor list,
|
|
it will also be built using CPU routines. In these cases, it will
|
|
typically be more efficient to only use CPU neighbor list builds.
|
|
|
|
The {first} and {last} settings specify the GPUs that will be used for
|
|
simulation. On each node, the GPU IDs in the inclusive range from
|
|
{first} to {last} will be used.
|
|
|
|
The {split} setting can be used for load balancing force calculation
|
|
work between CPU and GPU cores in GPU-enabled pair styles. If 0 <
|
|
{split} < 1.0, a fixed fraction of particles is offloaded to the GPU
|
|
while force calculation for the other particles occurs simulataneously
|
|
on the CPU. If {split}<0, the optimal fraction (based on CPU and GPU
|
|
timings) is calculated every 25 timesteps. If {split} = 1.0, all force
|
|
calculations for GPU accelerated pair styles are performed on the
|
|
GPU. In this case, "hybrid"_pair_hybrid.html, "bond"_bond_style.html,
|
|
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
|
|
"improper"_improper_style.html, and "long-range"_kspace_style.html
|
|
calculations can be performed on the CPU while the GPU is performing
|
|
force calculations for the GPU-enabled pair style. If all CPU force
|
|
computations complete before the GPU, LAMMPS will block until the GPU
|
|
has finished before continuing the timestep.
|
|
|
|
As an example, if you have two GPUs per node and 8 CPU cores per node,
|
|
and would like to run on 4 nodes (32 cores) with dynamic balancing of
|
|
force calculation across CPU and GPU cores, you could specify
|
|
|
|
package gpu force/neigh 0 1 -1 :pre
|
|
|
|
In this case, all CPU cores and GPU devices on the nodes would be
|
|
utilized. Each GPU device would be shared by 4 CPU cores. The CPU
|
|
cores would perform force calculations for some fraction of the
|
|
particles at the same time the GPUs performed force calculation for
|
|
the other particles.
|
|
|
|
:line
|
|
|
|
The {cuda} style invokes options associated with the use of the
|
|
USER-CUDA package. These still need to be documented.
|
|
|
|
:line
|
|
|
|
The {omp} style invokes options associated with the use of the
|
|
USER-OMP package.
|
|
|
|
The first argument allows to explicitly set the number of OpenMP
|
|
threads to be allocated for each MPI process. For example, if your
|
|
system has nodes with dual quad-core processors, it has a total of 8
|
|
cores per node. You could run MPI on 2 cores on each node (e.g. using
|
|
options for the mpirun command), and set the {Nthreads} setting to 4.
|
|
This would effectively use all 8 cores on each node. Since each MPI
|
|
process would spawn 4 threads (one of which runs as part of the MPI
|
|
process itself).
|
|
|
|
For performance reasons, you should not set {Nthreads} to more threads
|
|
than there are physical cores (per MPI task), but LAMMPS cannot check
|
|
for this.
|
|
|
|
An {Nthreads} value of '*' instructs LAMMPS to use whatever is the
|
|
default for the given OpenMP environment. This is usually determined
|
|
via the {OMP_NUM_THREADS} environment variable or the compiler
|
|
runtime. Please note that in most cases the default for OpenMP
|
|
capable compilers is to use one thread for each available CPU core
|
|
when {OMP_NUM_THREADS} is not set, which can lead to extremely bad
|
|
performance.
|
|
|
|
Which combination of threads and MPI tasks gives the best performance
|
|
is difficult to predict and can depend on many components of your input.
|
|
Not all features of LAMMPS support OpenMP and the parallel efficiency
|
|
can be very different, too.
|
|
|
|
The {mode} setting specifies where neighbor list calculations will be
|
|
multi-threaded as well. If {mode} is force, neighbor list calculation
|
|
is performed in serial. If {mode} is force/neigh, a multi-threaded
|
|
neighbor list build is used. Using the force/neigh setting is almost
|
|
always faster and should produce idential neighbor lists at the
|
|
expense of using some more memory (neighbor list pages are always
|
|
allocated for all threads at the same time and each thread works on
|
|
its own pages).
|
|
|
|
:line
|
|
|
|
[Restrictions:]
|
|
|
|
This command cannot be used after the simulation box is defined by a
|
|
"read_data"_read_data.html or "create_box"_create_box.html command.
|
|
|
|
The cuda style of this command can only be invoked if LAMMPS was built
|
|
with the USER-CUDA package. See the "Making
|
|
LAMMPS"_Section_start.html#start_3 section for more info. When using
|
|
styles in the USER-CUDA package, use of the "package cuda" command in
|
|
your input script is not required.
|
|
|
|
The gpu style of this command can only be invoked if LAMMPS was built
|
|
with the GPU package. See the "Making
|
|
LAMMPS"_Section_start.html#start_3 section for more info. When using
|
|
styles in the GPU package, use of the "package gpu" command in your
|
|
input script is currently required.
|
|
|
|
The omp style of this command can only be invoked if LAMMPS was built
|
|
with the USER-OMP package. See the "Making
|
|
LAMMPS"_Section_start.html#start_3 section for more info. When using
|
|
styles in the USER-OMP package, use of the "package omp" command in
|
|
your input script is not required. See the information on default
|
|
settings below.
|
|
|
|
[Related commands:]
|
|
|
|
"suffix"_suffix.html
|
|
|
|
[Default:]
|
|
|
|
If the "-sf omp" "command-line switch"_Section_start.html#start_6 is
|
|
used then "package omp *" is also auto-invoked to specify default OMP
|
|
settings.
|
|
|
|
The other styles have no defaults.
|
|
|