lammps/doc/balance.txt

"LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc :c

:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)

:line

balance command :h3

[Syntax:]

balance thresh style args keyword value ... :pre

thresh = imbalance threshhold that must be exceeded to perform a re-balance :ulb,l
style = {x} or {y} or {z} or {shift} or {rcb} :l
  {x} args = {uniform} or Px-1 numbers between 0 and 1
    {uniform} = evenly spaced cuts between processors in x dimension
    numbers = Px-1 ascending values between 0 and 1, Px - # of processors in x dimension
    {x} can be specified together with {y} or {z}
  {y} args = {uniform} or Py-1 numbers between 0 and 1
    {uniform} = evenly spaced cuts between processors in y dimension
    numbers = Py-1 ascending values between 0 and 1, Py - # of processors in y dimension
    {y} can be specified together with {x} or {z}
  {z} args = {uniform} or Pz-1 numbers between 0 and 1
    {uniform} = evenly spaced cuts between processors in z dimension
    numbers = Pz-1 ascending values between 0 and 1, Pz - # of processors in z dimension
    {z} can be specified together with {x} or {y}
  {shift} args = dimstr Niter stopthresh
    dimstr = sequence of letters containing "x" or "y" or "z", each not more than once
    Niter = # of times to iterate within each dimension of dimstr sequence
    stopthresh = stop balancing when this imbalance threshhold is reached
  {rcb} args = none :pre
zero or more keyword/value pairs may be appended :l
keyword = {out} :l
  {out} value = filename
    filename = write each processor's sub-domain to a file :pre
:ule

[Examples:]

balance 0.9 x uniform y 0.4 0.5 0.6
balance 1.2 shift xz 5 1.1
balance 1.0 shift xz 5 1.1
balance 1.1 rcb
balance 1.0 shift x 20 1.0 out tmp.balance :pre

[Description:]

IMPORTANT NOTE: The {rcb} style is not yet implemented.

This command adjusts the size and shape of processor sub-domains
within the simulation box, to attempt to balance the number of
particles and thus the computational cost (load) evenly across
processors.  The load balancing is "static" in the sense that this
command performs the balancing once, before or between simulations.
The processor sub-domains will then remain static during the
subsequent run.  To perform "dynamic" balancing, see the "fix
balance"_fix_balance.html command, which can adjust processor
sub-domain sizes and shapes on-the-fly during a "run"_run.html.

Load-balancing is typically only useful if the particles in the
simulation box have a spatially-varying density distribution.  E.g. a
model of a vapor/liquid interface, or a solid with an irregular-shaped
geometry containing void regions.  In this case, the LAMMPS default of
dividing the simulation box volume into a regular-spaced grid of 3d
bricks, with one equal-volume sub-domain per procesor, may assign very
different numbers of particles per processor.  This can lead to poor
performance in a scalability sense, when the simulation is run in
parallel.

Note that the "processors"_processors.html command allows some control
over how the box volume is split across processors.  Specifically, for
a Px by Py by Pz grid of processors, it allows choice of Px, Py, and
Pz, subject to the constraint that Px * Py * Pz = P, the total number
of processors.  This is sufficient to achieve good load-balance for
many models on many processor counts.  However, all the processor
sub-domains will still have the same shape and same volume.

The requested load-balancing operation is only performed if the
current "imbalance factor" in particles owned by each processor
exceeds the specified {thresh} parameter.  This factor is defined as
the maximum number of particles owned by any processor, divided by the
average number of particles per processor.  Thus an imbalance factor
of 1.0 is perfect balance.  For 10000 particles running on 10
processors, if the most heavily loaded processor has 1200 particles,
then the factor is 1.2, meaning there is a 20% imbalance.  Note that a
re-balance can be forced even if the current balance is perfect (1.0)
be specifying a {thresh} < 1.0.

When the balance command completes, it prints statistics about its
results, including the change in the imbalance factor and the change
in the maximum number of particles (on any processor).  For "grid"
methods (defined below) that create a logical 3d grid of processors,
the positions of all cutting planes in each of the 3 dimensions (as
fractions of the box length) are also printed.

IMPORTANT NOTE: This command attempts to minimize the imbalance
factor, as defined above.  But depending on the method a perfect
balance (1.0) may not be achieved.  For example, "grid" methods
(defined below) that create a logical 3d grid cannot achieve perfect
balance for many irregular distributions of particles.  Likewise, if a
portion of the system is a perfect lattice, e.g. the intiial system is
generated by the "create_atoms"_create_atoms.html command, then "grid"
methods may be unable to achieve exact balance.  This is because
entire lattice planes will be owned or not owned by a single
processor.

IMPORTANT NOTE: Computational cost is not strictly proportional to
particle count, and changing the relative size and shape of processor
sub-domains may lead to additional computational and communication
overheads, e.g. in the PPPM solver used via the
"kspace_style"_kspace_style.html command.  Thus you should benchmark
the run times of a simulation before and after balancing.

:line

The method used to perform a load balance is specified by one of the
listed styles, which are described in detail below.  There are 2 kinds
of styles.

The {x}, {y}, {z}, and {shift} styles are "grid" methods which produce
a logical 3d grid of processors.  They operate by changing the cutting
planes (or lines) between processors in 3d (or 2d), to adjust the
volume (area in 2d) assigned to each processor, as in the following 2d
diagram.  The left diagram is the default partitioning of the
simulation box across processors (one sub-box for each of 16
processors); the right diagram is after balancing.

:c,image(JPG/balance.jpg)

The {rcb} style is a "tiling" method which does not produce a logical
3d grid of processors.  Rather it tiles the simulation domain with
rectangular sub-boxes of varying size and shape in an irregular
fashion so as to have equal numbers of particles in each sub-box, as
in the following 2d diagram.  Again the left diagram is the default
partitioning of the simulation box across processors (one sub-box for
each of 16 processors); the right diagram is after balancing.

NOTE: Need a diagram of RCB partitioning.

The "grid" methods can be used with either of the
"comm_style"_comm_style.html command options, {brick} or {tiled}.  The
"tiling" methods can only be used with "comm_style
tiled"_comm_style.html.  Note that it can be useful to use a "grid"
method with "comm_style tiled"_comm_style.html to return the domain
partitioning to a logical 3d grid of processors so that "comm_style
brick" can be used for subsequent "run"_run.html commands.

When a "grid" method is specified, the current domain partitioning can
be either a logical 3d grid or a tiled partitioning.  In the former
case, the current logical 3d grid is used as a starting point and
changes are made to improve the imbalance factor.  In the latter case,
the tiled partitioning is discarded and a logical 3d grid is created
with uniform spacing in all dimensions.  This becomes the starting
point for the balancing operation.

When a "tiling" method is specified, the current domain partitioning
("grid" or "tiled") is ignored, and a new partitioning is computed
from scratch.

:line

The {x}, {y}, and {z} styles invoke a "grid" method for balancing, as
described above.  Note that any or all of these 3 styles can be
specified together, one after the other.  This style adjusts the
position of cutting planes between processor sub-domains in specific
dimensions.  Only the specified dimensions are altered.

The {uniform} argument spaces the planes evenly, as in the left
diagrams above.  The {numeric} argument requires listing Ps-1 numbers
that specify the position of the cutting planes.  This requires
knowing Ps = Px or Py or Pz = the number of processors assigned by
LAMMPS to the relevant dimension.  This assignment is made (and the
Px, Py, Pz values printed out) when the simulation box is created by
the "create_box" or "read_data" or "read_restart" command and is
influenced by the settings of the "processors"_processors.html
command.

Each of the numeric values must be between 0 and 1, and they must be
listed in ascending order.  They represent the fractional position of
the cutting place.  The left (or lower) edge of the box is 0.0, and
the right (or upper) edge is 1.0.  Neither of these values is
specified.  Only the interior Ps-1 positions are specified.  Thus is
there are 2 procesors in the x dimension, you specify a single value
such as 0.75, which would make the left processor's sub-domain 3x
larger than the right processor's sub-domain.

:line

The {shift} style invokes a "grid" method for balancing, as
described above.  It changes the positions of cutting planes between
processors in an iterative fashion, seeking to reduce the imbalance
factor, similar to how the "fix balance shift"_fix_balance.html
command operates.

The {dimstr} argument is a string of characters, each of which must be
an "x" or "y" or "z".  Eacn character can appear zero or one time,
since there is no advantage to balancing on a dimension more than
once.  You should normally only list dimensions where you expect there
to be a density variation in the particles.

Balancing proceeds by adjusting the cutting planes in each of the
dimensions listed in {dimstr}, one dimension at a time.  For a single
dimension, the balancing operation (described below) is iterated on up
to {Niter} times.  After each dimension finishes, the imbalance factor
is re-computed, and the balancing operation halts if the {stopthresh}
criterion is met.

A rebalance operation in a single dimension is performed using a
recursive multisectioning algorithm, where the position of each
cutting plane (line in 2d) in the dimension is adjusted independently.
This is similar to a recursive bisectioning for a single value, except
that the bounds used for each bisectioning take advantage of
information from neighboring cuts if possible.  At each iteration, the
count of particles on either side of each plane is tallied.  If the
counts do not match the target value for the plane, the position of
the cut is adjusted to be halfway between a low and high bound.  The
low and high bounds are adjusted on each iteration, using new count
information, so that they become closer together over time.  Thus as
the recustion progresses, the count of particles on either side of the
plane gets closer to the target value.

Once the rebalancing is complete and final processor sub-domains
assigned, particles are migrated to their new owning processor, and
the balance procedure ends.

IMPORTANT NOTE: At each rebalance operation, the bisectioning for each
cutting plane (line in 2d) typcially starts with low and high bounds
separated by the extent of a processor's sub-domain in one dimension.
The size of this bracketing region shrinks by 1/2 every iteration.
Thus if {Niter} is specified as 10, the cutting plane will typically
be positioned to 1 part in 1000 accuracy (relative to the perfect
target position).  For {Niter} = 20, it will be accurate to 1 part in
a million.  Thus there is no need ot set {Niter} to a large value.
LAMMPS will check if the threshold accuracy is reached (in a
dimension) is less iterations than {Niter} and exit early.  However,
{Niter} should also not be set too small, since it will take roughly
the same number of iterations to converge even if the cutting plane is
initially close to the target value.

:line

The {rcb} style invokes a "tiled" method for balancing, as described
above.  It performs a recursive coordinate bisectioning (RCB) of the
simulation domain.

Need further description of RCB.

:line

The {out} keyword writes a text file to the specified {filename} with
the results of the balancing operation.  The file contains the bounds
of the sub-domain for each processor after the balancing operation
completes.  The format of the file is compatible with the
"Pizza.py"_pizza {mdump} tool which has support for manipulating and
visualizing mesh files.  An example is shown here for a balancing by 4
processors for a 2d problem:

ITEM: TIMESTEP
0
ITEM: NUMBER OF SQUARES
4
ITEM: SQUARES
1 1 1 2 7 6
2 2 2 3 8 7
3 3 3 4 9 8
4 4 4 5 10 9
ITEM: TIMESTEP
0
ITEM: NUMBER OF NODES
10
ITEM: BOX BOUNDS
-153.919 184.703
0 15.3919
-0.769595 0.769595
ITEM: NODES
1 1 -153.919 0 0
2 1 7.45545 0 0
3 1 14.7305 0 0
4 1 22.667 0 0
5 1 184.703 0 0
6 1 -153.919 15.3919 0
7 1 7.45545 15.3919 0
8 1 14.7305 15.3919 0
9 1 22.667 15.3919 0
10 1 184.703 15.3919 0 :pre

The "SQUARES" lists the node IDs of the 4 vertices in a rectangle for
each processor (1 to 4).  The first SQUARE 1 (for processor 0) is a
rectangle of type 1 (equal to SQUARE ID) and contains vertices
1,2,7,6.  The coordinates of all the vertices are listed in the NODES
section.  Note that the 4 sub-domains share vertices, so there are
only 10 unique vertices in total.

For a 3d problem, the syntax is similar with "SQUARES" replaced by
"CUBES", and 8 vertices listed for each processor, instead of 4.

:line

[Restrictions:]

For 2d simulations, the {z} style cannot be used.  Nor can a "z"
appear in {dimstr} for the {shift} style.

[Related commands:]

"processors"_processors.html, "fix balance"_fix_balance.html

[Default:] none