diff --git a/doc/balance.html b/doc/balance.html
index 4392ad32ca..3f1244d9ab 100644
--- a/doc/balance.html
+++ b/doc/balance.html
@@ -20,7 +20,7 @@ balancing has not yet been released.
 x args = uniform or Px-1 numbers between 0 and 1
   uniform = evenly spaced cuts between processors in x dimension
@@ -31,19 +31,20 @@ balancing has not yet been released.
 z args = uniform or Pz-1 numbers between 0 and 1
   uniform = evenly spaced cuts between processors in z dimension
   numbers = Pz-1 ascending values between 0 and 1, Pz - # of processors in z dimension
-dynamic args = Nrepeat Niter dimstr thresh
-  Nrepeat = # of times to repeat dimstr sequence
+dynamic args = dimstr Niter thresh
+  dimstr = sequence of letters containing "x" or "y" or "z", each appearing at most once
   Niter = # of times to iterate within each dimension of dimstr sequence
-  dimstr = sequence of letters containing "x" or "y" or "z"
-  thresh = stop balancing when this imbalance threshhold is reached
+  thresh = stop balancing when this imbalance threshold is reached
+out arg = filename
+  filename = output file to write each processor's sub-domain to
Examples:
 balance x uniform y 0.4 0.5 0.6
-balance dynamic 1 5 xzx 1.1
-balance dynamic 5 10 x 1.0
+balance dynamic xz 5 1.1
+balance dynamic x 20 1.0 out tmp.balance
Description:
@@ -67,14 +68,13 @@
 very different numbers of particles per processor.  This can lead to
 poor performance in a scalability sense, when the simulation is run in
 parallel.
 
-Note that the processors command gives you some
-control over how the box volume is split across processors.
-Specifically, for a Px by Py by Pz grid of processors, it lets you
-choose Px, Py, and Pz, subject to the constraint that Px * Py * Pz =
-P, the total number of processors.  This can be sufficient to achieve
-good load-balance for some models on some processor counts.  However,
-all the processor sub-domains will still be the same shape and have
-the same volume.
+
+Note that the processors command gives you control
+over how the box volume is split across processors.  Specifically, for
+a Px by Py by Pz grid of processors, it chooses or lets you choose Px,
+Py, and Pz, subject to the constraint that Px * Py * Pz = P, the total
+number of processors.  This is sufficient to achieve good load-balance
+for many models on many processor counts.  However, all the processor
+sub-domains will still be the same shape and have the same volume.
 This command does not alter the topology of the Px by Py by Pz grid
 of processors.  But it shifts the cutting planes between processors (in
@@ -141,48 +141,77 @@
 partitioning, which could be uniform or the result of a previous
 balance command.
 The dimstr argument is a string of characters, each of which must be
-an "x" or "y" or "z".  The characters can appear in any order, and can
-be repeated as many times as desired.  These are all valid dimstr
-arguments: "x" or "xyzyx" or "yyyzzz".
+an "x" or "y" or "z".  Each character can appear at most once,
+since there is no advantage to balancing on a dimension more than
+once.  You should normally only list dimensions where you expect a
+density variation in the particles.
 Balancing proceeds by adjusting the cutting planes in each of the
-dimensions listed in dimstr, one dimension at a time.  The entire
-sequence of dimensions is repeated Nrepeat times.  For a single
-dimension, the balancing operation (described below) is iterated on
-Niter times.  After each dimension finishes, the imbalance factor is
-re-computed, and the balancing operation halts if the thresh
+dimensions listed in dimstr, one dimension at a time.  For a single
+dimension, the balancing operation (described below) is iterated up
+to Niter times.  After each dimension finishes, the imbalance factor
+is re-computed, and the balancing operation halts if the thresh
 criterion is met.
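The control flow just described can be sketched as follows.  This is an illustrative Python sketch, not LAMMPS source; dynamic_balance(), rebalance_dim(), and imbalance() are hypothetical stand-ins for the per-dimension balancing operation and the imbalance-factor computation.

```python
# Illustrative sketch of the dynamic balancing loop (not LAMMPS source).
# For simplicity this sketch always runs Niter iterations per dimension,
# whereas the real operation may converge sooner.
def dynamic_balance(dimstr, niter, thresh, rebalance_dim, imbalance):
    for dim in dimstr:              # e.g. "xz": one pass per listed dimension
        for _ in range(niter):      # iterate up to Niter times in this dimension
            rebalance_dim(dim)
        if imbalance() <= thresh:   # re-compute after each dimension finishes
            return                  # halt once the thresh criterion is met

# Example: if thresh = 1.1 is already satisfied after the "x" pass,
# the "z" pass is skipped entirely.
calls = []
dynamic_balance("xz", 3, 1.1, lambda d: calls.append(d), lambda: 1.05)
```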
-The interplay between Nrepeat, Niter, and dimstr means that
-these commands do essentially the same thing, the only difference
-being how often the imbalance factor is computed and checked against
-the threshhold:
+
+A rebalance operation in a single dimension is performed using a
+recursive multisectioning algorithm, where the position of each
+cutting plane (line in 2d) in the dimension is adjusted independently.
+This is similar to a recursive bisectioning (RCB) for a single value,
+except that the bounds used for each bisectioning take advantage of
+information from neighboring cuts if possible.  At each iteration, the
+count of particles on either side of each plane is tallied.  If the
+counts do not match the target value for the plane, the position of
+the cut is adjusted.  As the recursion progresses, the count of
+particles on either side of the plane gets closer to the target value.
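The bisectioning step for a single cut can be illustrated roughly as below.  This is a simplified serial sketch under stated assumptions, not the LAMMPS implementation, which adjusts all the cuts in a dimension and tallies particle counts across processors; adjust_cut() is a hypothetical name.

```python
def adjust_cut(coords, lo, hi, target, niter=20):
    """Bisect within (lo, hi) so that roughly `target` particles fall
    below the returned cut.  coords are particle positions along the
    balancing dimension; lo/hi are the initial bounds for this cut
    (in the real algorithm these can come from neighboring cuts)."""
    cut = 0.5 * (lo + hi)
    for _ in range(niter):
        below = sum(1 for x in coords if x < cut)  # tally one side of the plane
        if below == target:
            break                                  # count matches the target
        if below < target:
            lo = cut                               # too few below: move cut up
        else:
            hi = cut                               # too many below: move cut down
        cut = 0.5 * (lo + hi)
    return cut

# Example: 8 particles clustered at both ends of the box, 2 processors
# in this dimension, so the single cut should leave 4 on each side.
coords = [0.1, 0.2, 0.25, 0.3, 0.8, 0.85, 0.9, 0.95]
cut = adjust_cut(coords, lo=0.0, hi=1.0, target=4)   # cut = 0.5 here
```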
-balance y dynamic 5 10 x 1.2
-balance y dynamic 1 10 xxxxx 1.2
-balance y dynamic 50 1 x 1.2
+
+
+The out keyword writes a text file to the specified filename with
+the results of the balancing operation.  The file contains the bounds
+of the sub-domain for each processor after the balancing operation
+completes.  The format of the file is compatible with the
+Pizza.py mdump tool, which has support for manipulating and
+visualizing mesh files.  An example is shown here for a balancing by 4
+processors for a 2d problem:
+
+ITEM: TIMESTEP
+0
+ITEM: NUMBER OF SQUARES
+4
+ITEM: SQUARES
+1 1 1 2 7 6
+2 2 2 3 8 7
+3 3 3 4 9 8
+4 4 4 5 10 9
+ITEM: TIMESTEP
+0
+ITEM: NUMBER OF NODES
+10
+ITEM: BOX BOUNDS
+-153.919 184.703
+0 15.3919
+-0.769595 0.769595
+ITEM: NODES
+1 1 -153.919 0 0
+2 1 7.45545 0 0
+3 1 14.7305 0 0
+4 1 22.667 0 0
+5 1 184.703 0 0
+6 1 -153.919 15.3919 0
+7 1 7.45545 15.3919 0
+8 1 14.7305 15.3919 0
+9 1 22.667 15.3919 0
+10 1 184.703 15.3919 0
-A rebalance operation in a single dimension is performed using an
-iterative "diffusive" load-balancing algorithm (Cybenko).
-One iteration on a dimension (which is repeated Niter times), works
-as follows.  Assume there are Px processors in the x dimension.  This
-defines Px slices of the simulation, each of which contains Py*Pz
-processors.  The task is to adjust the position of the Px-1 cuts
-between slices, leaving the end cuts unchanged (left and right edges
-of the simulation box).
+
+The "SQUARES" section lists the node IDs of the 4 vertices in a
+rectangle for each processor (1 to 4).  The first entry, SQUARE 1 (for
+processor 0), is a rectangle of type 1 (equal to the SQUARE ID) and
+contains vertices 1,2,7,6.  The coordinates of all the vertices are
+listed in the NODES section.  Note that the 4 sub-domains share
+vertices, so there are only 10 unique vertices in total.
-The iteration beings by calculating the number of atoms within each of
-the Px slices.  Then for each slice, its atom count is compared to its
-neighbors.  If a slice has more atoms than its left (or right)
-neighbor, the cut is moved towards the center of the slice,
-effectively shrinking the width of the slice and migrating atoms to
-the other slice.  The distance to move the cut is a function of the
-"density" of atoms in the donor slice and the difference in counts
-between the 2 slices.  A damping factor is also applied to avoid
-oscillations in the position of the cutting plane as iterations
-proceed.  Hence the "diffusive" nature of the algorithm as work
-(atoms) effectively diffuses from highly loaded processors to
-less-loaded processors.
+
+For a 3d problem, the syntax is similar, with "SQUARES" replaced by
+"CUBES", and 8 vertices listed for each processor instead of 4.
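The file format above can be read with a few lines of Python.  The helpers below are a hypothetical sketch (not part of LAMMPS or Pizza.py) that parses the 2d SQUARES and NODES sections and recovers each processor's sub-domain bounds; only the first square of the example is used here for brevity.

```python
def read_balance_file(text):
    """Parse the SQUARES and NODES sections of a 2d balance out file."""
    squares, nodes, mode = {}, {}, None
    for line in text.strip().splitlines():
        if line.startswith("ITEM:"):
            if line == "ITEM: SQUARES":
                mode = "squares"
            elif line == "ITEM: NODES":
                mode = "nodes"
            else:
                mode = None          # TIMESTEP, counts, BOX BOUNDS: skip
        elif mode == "squares":
            sid, _stype, *verts = line.split()
            squares[int(sid)] = [int(v) for v in verts]
        elif mode == "nodes":
            nid, _ntype, x, y, _z = line.split()
            nodes[int(nid)] = (float(x), float(y))
    return squares, nodes

def subdomain_bounds(squares, nodes):
    """Axis-aligned (xlo, xhi, ylo, yhi) for each processor's square."""
    bounds = {}
    for sid, verts in squares.items():
        xs = [nodes[v][0] for v in verts]
        ys = [nodes[v][1] for v in verts]
        bounds[sid] = (min(xs), max(xs), min(ys), max(ys))
    return bounds

sample = """ITEM: TIMESTEP
0
ITEM: NUMBER OF SQUARES
1
ITEM: SQUARES
1 1 1 2 7 6
ITEM: NODES
1 1 -153.919 0 0
2 1 7.45545 0 0
6 1 -153.919 15.3919 0
7 1 7.45545 15.3919 0
"""
squares, nodes = read_balance_file(sample)
bounds = subdomain_bounds(squares, nodes)  # {1: (-153.919, 7.45545, 0.0, 15.3919)}
```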
@@ -200,10 +229,4 @@ appear in dimstr for the dynamic keyword.
 
 Default: none
-
-
-
-
-(Cybenko) Cybenko, J Par Dist Comp, 7, 279-301 (1989).
-