change to RCB cuts in load-balancing commands, also a new option for fix halt

This commit is contained in:
Steve Plimpton 2017-03-10 15:55:07 -07:00
parent 470353e320
commit f871ecdc67
15 changed files with 686 additions and 51 deletions

View File

@ -1,7 +1,7 @@
<!-- HTML_ONLY -->
<HEAD>
<TITLE>LAMMPS Users Manual</TITLE>
<META NAME="docnumber" CONTENT="7 Mar 2017 version">
<META NAME="docnumber" CONTENT="10 Mar 2017 version">
<META NAME="author" CONTENT="http://lammps.sandia.gov - Sandia National Laboratories">
<META NAME="copyright" CONTENT="Copyright (2003) Sandia Corporation. This software and manual is distributed under the GNU General Public License.">
</HEAD>
@ -21,7 +21,7 @@
<H1></H1>
LAMMPS Documentation :c,h3
7 Mar 2017 version :c,h4
10 Mar 2017 version :c,h4
Version info: :h4

View File

@ -286,24 +286,32 @@ above. It performs a recursive coordinate bisectioning (RCB) of the
simulation domain. The basic idea is as follows.
The simulation domain is cut into 2 boxes by an axis-aligned cut in
the longest dimension, leaving one new box on either side of the cut.
All the processors are also partitioned into 2 groups, half assigned
to the box on the lower side of the cut, and half to the box on the
upper side. (If the processor count is odd, one side gets an extra
processor.) The cut is positioned so that the number of particles in
the lower box is exactly the number that the processors assigned to
that box should own for load balance to be perfect. This also makes
load balance for the upper box perfect. The positioning is done
iteratively, by a bisectioning method. Note that counting particles
on either side of the cut requires communication between all
processors at each iteration.
one of the dimensions, leaving one new sub-box on either side of the
cut. Which dimension is chosen for the cut depends on the particle
(weight) distribution within the parent box. Normally the longest
dimension of the box is cut, but if all (or most) of the particles are
at one end of the box, a cut may be performed in another dimension to
induce sub-boxes that are more cube-ish (3d) or square-ish (2d) in
shape.
After the cut is made, all the processors are also partitioned into 2
groups, half assigned to the box on the lower side of the cut, and
half to the box on the upper side. (If the processor count is odd,
one side gets an extra processor.) The cut is positioned so that the
number of (weighted) particles in the lower box is exactly the number
that the processors assigned to that box should own for load balance
to be perfect. This also makes load balance for the upper box
perfect. The positioning of the cut is done iteratively, by a
bisectioning method (median search). Note that counting particles on
either side of the cut requires communication between all processors
at each iteration.
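As a rough illustration of the processor split and the weight targets described above, the following C++ sketch mirrors the arithmetic used in RCB::compute() in this commit. The Split struct and the split_partition() name are hypothetical, introduced only for this example; in the source the same quantities are the local variables procmid, targetlo, and targethi.

```cpp
#include <cassert>

// Hypothetical container for the result of one split (not in the source).
struct Split {
  int procmid;       // first processor in the upper half of the partition
  double targetlo;   // desired (weighted) particle count in the lower box
  double targethi;   // desired (weighted) particle count in the upper box
};

// Split the processor range [proclower,procupper] in two and compute the
// weight each half should own so that load balance is perfect.
// wttot = total weight of all particles in the partition.
Split split_partition(int proclower, int procupper, double wttot)
{
  assert(procupper > proclower);
  Split s;
  // if the processor count is odd, the lower half gets the extra processor
  s.procmid = proclower + (procupper - proclower) / 2 + 1;
  // target weight is proportional to the number of processors in each half
  s.targetlo = wttot * (s.procmid - proclower) / (procupper + 1 - proclower);
  s.targethi = wttot - s.targetlo;
  return s;
}
```

For example, with 5 processors (ranks 0 to 4) and a total weight of 100, the lower half holds ranks 0 to 2 and should own a weight of 60, while the upper half holds ranks 3 and 4 and should own 40.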
That is the procedure for the first cut. Subsequent cuts are made
recursively, in exactly the same manner. The subset of processors
assigned to each box make a new cut in the longest dimension of that
box, splitting the box, the subset of processors, and the particles
in the box in two. The recursion continues until every processor is
assigned a sub-box of the entire simulation domain, and owns the
assigned to each box make a new cut in one dimension of that box,
splitting the box, the subset of processors, and the particles in the
box in two. The recursion continues until every processor is assigned
a sub-box of the entire simulation domain, and owns the (weighted)
particles in that sub-box.
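The choice of cut dimension can be sketched in C++ as follows. This is a simplified illustration, not the code in rcb.cpp: the select_cut_dim() name and its argument list are hypothetical, and the candidate cut position in each dimension is assumed to have already been found by the median search. The selection rule itself (keep the dimension whose cut leaves the largest "smaller" sub-box length) follows the comments in RCB::compute() in this commit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not in the source).
// lo,hi = bounds of the parent box
// cut   = candidate cut position in each dimension
// Returns the dimension whose cut leaves the largest "smaller" sub-box
// length, or -1 if no cut produces a sub-box with positive length.
int select_cut_dim(int dimension, const double *lo, const double *hi,
                   const double *cut)
{
  assert(dimension == 2 || dimension == 3);
  int dim_select = -1;
  double largest = 0.0;
  for (int dim = 0; dim < dimension; dim++) {
    // smaller = the smaller of the 2 reduced box lengths in this dimension
    double smaller = std::min(cut[dim] - lo[dim], hi[dim] - cut[dim]);
    if (smaller > largest) {
      largest = smaller;
      dim_select = dim;
    }
  }
  return dim_select;
}
```

For a 10 x 4 x 4 box with particles spread evenly, the balanced cut in x sits near the middle and x is selected, as in the older longest-dimension rule. If the particles are crowded at the low-x end so that the balanced x cut sits at x = 1, a cut in y is selected instead, which avoids a very narrow sub-box.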
:line

View File

@ -101,11 +101,11 @@ Instead you could do something like this, assuming the simulation box
is non-periodic and atoms extend from 0 to 20 in all dimensions:
change_box all x final -10 20
create_atoms 1 single -5 5 5 # this will fail to insert an atom :pre
create_atoms 1 single -5 5 5 # this will fail to insert an atom :pre
change_box all x final -10 20 boundary f s s
create_atoms 1 single -5 5 5
change_box boundary s s s # this will work :pre
change_box all boundary s s s # this will work :pre
NOTE: Unlike the earlier "displace_box" version of this command, atom
remapping is NOT performed by default. This command allows remapping

View File

@ -134,6 +134,17 @@ not overlap existing atoms inappropriately, especially if molecules
are being added. The "delete_atoms"_delete_atoms.html command can be
used to remove overlapping atoms or molecules.
NOTE: You cannot use any of the styles explained above to create atoms
that are outside the simulation box; they will just be ignored by
LAMMPS. This is true even if you are using shrink-wrapped box
boundaries, as specified by the "boundary"_boundary.html command.
However, you can first use the "change_box"_change_box.html command to
temporarily expand the box, then add atoms via create_atoms, then
finally use the change_box command again if needed to re-shrink-wrap the
new atoms. See the "change_box"_change_box.html doc page for an
example of how to do this, using the create_atoms {single} style to
insert a new atom outside the current simulation box.
:line
Individual atoms are inserted by this command, unless the {mol}

View File

@ -15,15 +15,16 @@ fix ID group-ID halt N attribute operator avalue keyword value ... :pre
ID, group-ID are documented in "fix"_fix.html command :ulb,l
halt = style name of this fix command :l
N = check halt condition every N steps :l
attribute = hstyle or v_name :l
hstyle = {bondmax}
attribute = {bondmax} or {tlimit} or v_name :l
bondmax = length of longest bond in the system
tlimit = elapsed CPU time
v_name = name of "equal-style variable"_variable.html :pre
operator = "<" or "<=" or ">" or ">=" or "==" or "!=" or "|^" :l
avalue = numeric value to compare attribute to :l
string = text string to print with optional variable names :l
zero or more keyword/value pairs may be appended :l
keyword = {error} :l
{error} value = {hard} or {soft} or {continue} :pre
keyword = {error} or {message} :l
{error} value = {hard} or {soft} or {continue}
{message} value = {yes} or {no} :pre
:ule
[Examples:]
@ -40,14 +41,33 @@ specified by the "run"_run.html or "minimize"_minimize.html command.
The specified group-ID is ignored by this fix.
The specified {attribute} can be one of the {hstyle} options listed
above, or an "equal-style variable"_variable.html referenced as
{v_name}, where "name" is the name of a variable that has been defined
previously in the input script.
The specified {attribute} can be one of the options listed above,
namely {bondmax} or {tlimit}, or an "equal-style
variable"_variable.html referenced as {v_name}, where "name" is the
name of a variable that has been defined previously in the input
script.
The only {hstyle} option currently implemented is {bondmax}. This
will loop over all bonds in the system, compute their current
lengths, and set {attribute} to the longest bond distance.
The {bondmax} attribute will loop over all bonds in the system,
compute their current lengths, and set {attribute} to the longest bond
distance.
The {tlimit} attribute queries the elapsed CPU time (in seconds) since
the current run began, and sets {attribute} to that value. This is an
alternative way to limit the length of a simulation run, similar to
the "timer"_timer.html timeout command. There are two differences in
using this method versus the timer command option. The first is that
the clock starts at the beginning of the current run (not when the
timer or fix command is specified), so that any setup time for the run
is not included in the elapsed time. The second is that the timer
invocation and syncing across all processors (via MPI_Allreduce) is
not performed once every {N} steps by this command. Instead it is
performed (typically) only a small number of times and the elapsed
times are used to predict when the end-of-the-run will be. Both of
these attributes can be useful when performing benchmark calculations
for a desired length of time with minimal overhead. For example, if
a run is performing 1000s of timesteps/sec, the overhead for syncing
the timer frequently across a large number of processors may be
non-negligible.
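The prediction step can be sketched in C++ as shown below. This is a stand-alone illustration under stated assumptions: the project_next_check() name and its argument list are hypothetical, and int64_t stands in for the LAMMPS bigint type. The arithmetic follows FixHalt::tlimit() in this commit, which first projects to half of the time limit (tratio = 0.5) and thereafter to the full limit (tratio = 1.0).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the step projection (not in the source).
// firststep = first timestep of the current run
// now       = current timestep
// cpu       = elapsed CPU seconds since the run began
// limit     = time limit in seconds (the avalue of the fix)
// tratio    = fraction of the limit to project to
// nevery    = N, the check interval of the fix
// Returns the next multiple of nevery past the projected timestep.
int64_t project_next_check(int64_t firststep, int64_t now, double cpu,
                           double limit, double tratio, int nevery)
{
  assert(cpu > 0.0 && nevery > 0);
  int64_t elapsed = now - firststep;
  // assume the run keeps advancing at the rate observed so far
  int64_t target = firststep +
    static_cast<int64_t>(tratio * limit / cpu * elapsed);
  return (target / nevery) * nevery + nevery;
}
```

For example, if 100 steps took 1 second, the limit is 60 seconds, and N = 100, the first projection (tratio = 0.5) schedules the next check at step 3100 rather than at step 200, so the timer is synced far less often than once every N steps.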
Equal-style variables evaluate to a numeric value. See the
"variable"_variable.html command for a description. They calculate
@ -100,6 +120,14 @@ Note that you may wish use the "unfix"_unfix.html command on the fix
halt ID, so that the same condition is not immediately triggered in a
subsequent run.
The optional {message} keyword determines whether a message is printed
to the screen and logfile when the halt condition is triggered. If
{message} is set to yes, a one line message with the values that
triggered the halt is printed. If {message} is set to no, no message
is printed; the run simply exits. The latter may be desirable for
post-processing tools that extract thermodynamic information from log
files.
[Restart, fix_modify, output, run start/stop, minimize info:]
No information about this fix is written to "binary restart
@ -118,4 +146,4 @@ This fix is not invoked during "energy minimization"_minimize.html.
[Default:]
The option defaults are error = hard.
The option defaults are error = hard and message = yes.

View File

@ -17,7 +17,7 @@ status = numerical exit status (optional)
[Examples:]
quit
if "$n > 10000" then "quit 1":pre
if "$n > 10000" then "quit 1" :pre
[Description:]

View File

@ -36,7 +36,7 @@ args = list of arguments for a particular style :l
elaplong = timesteps since start of initial run in a series of runs
dt = timestep size
time = simulation time
cpu = elapsed CPU time in seconds
cpu = elapsed CPU time in seconds since start of this run
tpcpu = time per CPU second
spcpu = timesteps per CPU second
cpuremain = estimated CPU time remaining in run

View File

@ -456,6 +456,7 @@ void Balance::options(int iarg, int narg, char **arg)
wtflag = 0;
varflag = 0;
oldrcb = 0;
outflag = 0;
int outarg = 0;
fp = NULL;
@ -491,6 +492,9 @@ void Balance::options(int iarg, int narg, char **arg)
}
iarg += 2+nopt;
} else if (strcmp(arg[iarg],"old") == 0) {
oldrcb = 1;
iarg++;
} else if (strcmp(arg[iarg],"out") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal (fix) balance command");
outflag = 1;
@ -641,12 +645,21 @@ int *Balance::bisection(int sortflag)
// invoke RCB
// then invert() to create list of proc assignments for my atoms
// NOTE: (3/2017) can remove undocumented "old" option at some point
// ditto in rcb.cpp
if (wtflag) {
weight = fixstore->vstore;
rcb->compute(dim,atom->nlocal,atom->x,weight,shrinklo,shrinkhi);
} else rcb->compute(dim,atom->nlocal,atom->x,NULL,shrinklo,shrinkhi);
if (oldrcb) {
if (wtflag) {
weight = fixstore->vstore;
rcb->compute_old(dim,atom->nlocal,atom->x,weight,shrinklo,shrinkhi);
} else rcb->compute_old(dim,atom->nlocal,atom->x,NULL,shrinklo,shrinkhi);
} else {
if (wtflag) {
weight = fixstore->vstore;
rcb->compute(dim,atom->nlocal,atom->x,weight,shrinklo,shrinkhi);
} else rcb->compute(dim,atom->nlocal,atom->x,NULL,shrinklo,shrinkhi);
}
rcb->invert(sortflag);
// reset RCB lo/hi bounding box to full simulation box as needed

View File

@ -53,6 +53,7 @@ class Balance : protected Pointers {
int style; // style of LB
int xflag,yflag,zflag; // xyz LB flags
double *user_xsplit,*user_ysplit,*user_zsplit; // params for xyz LB
int oldrcb; // use old-style RCB compute
int nitermax; // params for shift LB
double stopthresh;

View File

@ -30,9 +30,10 @@
using namespace LAMMPS_NS;
using namespace FixConst;
enum{BONDMAX,VARIABLE};
enum{BONDMAX,TLIMIT,VARIABLE};
enum{LT,LE,GT,GE,EQ,NEQ,XOR};
enum{HARD,SOFT,CONTINUE};
enum{NOMSG,YESMSG};
/* ---------------------------------------------------------------------- */
@ -47,7 +48,8 @@ FixHalt::FixHalt(LAMMPS *lmp, int narg, char **arg) :
idvar = NULL;
if (strcmp(arg[4],"bondmax") == 0) attribute = BONDMAX;
if (strcmp(arg[4],"tlimit") == 0) attribute = TLIMIT;
else if (strcmp(arg[4],"bondmax") == 0) attribute = BONDMAX;
else if (strncmp(arg[4],"v_",2) == 0) {
attribute = VARIABLE;
int n = strlen(arg[4]);
@ -73,6 +75,7 @@ FixHalt::FixHalt(LAMMPS *lmp, int narg, char **arg) :
// parse optional args
eflag = SOFT;
msgflag = YESMSG;
int iarg = 7;
while (iarg < narg) {
@ -83,6 +86,12 @@ FixHalt::FixHalt(LAMMPS *lmp, int narg, char **arg) :
else if (strcmp(arg[iarg+1],"continue") == 0) eflag = CONTINUE;
else error->all(FLERR,"Illegal fix halt command");
iarg += 2;
} else if (strcmp(arg[iarg],"message") == 0) {
if (iarg+2 > narg) error->all(FLERR,"Illegal fix halt command");
if (strcmp(arg[iarg+1],"no") == 0) msgflag = NOMSG;
else if (strcmp(arg[iarg+1],"yes") == 0) msgflag = YESMSG;
else error->all(FLERR,"Illegal fix halt command");
iarg += 2;
} else error->all(FLERR,"Illegal fix halt command");
}
@ -125,6 +134,11 @@ void FixHalt::init()
if (input->variable->equalstyle(ivar) == 0)
error->all(FLERR,"Fix halt variable is not equal-style variable");
}
// settings used by TLIMIT
nextstep = (update->ntimestep/nevery)*nevery + nevery;
tratio = 0.5;
}
/* ---------------------------------------------------------------------- */
@ -135,8 +149,12 @@ void FixHalt::end_of_step()
double attvalue;
if (attribute == BONDMAX) attvalue = bondmax();
else {
if (attribute == TLIMIT) {
if (update->ntimestep != nextstep) return;
attvalue = tlimit();
} else if (attribute == BONDMAX) {
attvalue = bondmax();
} else {
modify->clearstep_compute();
attvalue = input->variable->compute_equal(ivar);
modify->addstep_compute(update->ntimestep + nevery);
@ -169,9 +187,10 @@ void FixHalt::end_of_step()
sprintf(str,"Fix halt %s condition met on step %ld with value %g",
id,update->ntimestep,attvalue);
if (eflag == HARD) error->all(FLERR,str);
else if (eflag == SOFT || eflag == CONTINUE) {
if (comm->me == 0) error->message(FLERR,str);
if (eflag == HARD) {
error->all(FLERR,str);
} else if (eflag == SOFT || eflag == CONTINUE) {
if (comm->me == 0 && msgflag == YESMSG) error->message(FLERR,str);
timer->force_timeout();
}
}
@ -218,3 +237,27 @@ double FixHalt::bondmax()
return sqrt(maxall);
}
/* ----------------------------------------------------------------------
compute synced elapsed time
reset nextstep = estimate of timestep when run will end
first project to 1/2 the run time, thereafter to end of run
------------------------------------------------------------------------- */
double FixHalt::tlimit()
{
double cpu = timer->elapsed(Timer::TOTAL);
MPI_Bcast(&cpu,1,MPI_DOUBLE,0,world);
if (cpu < value) {
bigint elapsed = update->ntimestep - update->firststep;
bigint final = update->firststep +
static_cast<bigint> (tratio*value/cpu * elapsed);
nextstep = (final/nevery)*nevery + nevery;
tratio = 1.0;
}
//printf("EVAL %ld %g %d\n",update->ntimestep,cpu,nevery);
return cpu;
}

View File

@ -35,11 +35,13 @@ class FixHalt : public Fix {
void post_run();
private:
int attribute,operation,eflag,ivar;
double value;
int attribute,operation,eflag,msgflag,ivar;
bigint nextstep;
double value,tratio;
char *idvar;
double bondmax();
double tlimit();
};
}

View File

@ -42,7 +42,7 @@ RCB::RCB(LAMMPS *lmp) : Pointers(lmp)
dots = NULL;
nlist = maxlist = 0;
dotlist = dotmark = NULL;
dotlist = dotmark = dotmark_select = NULL;
maxbuf = 0;
buf = NULL;
@ -73,6 +73,7 @@ RCB::~RCB()
memory->sfree(dots);
memory->destroy(dotlist);
memory->destroy(dotmark);
memory->destroy(dotmark_select);
memory->sfree(buf);
memory->destroy(recvproc);
@ -91,6 +92,9 @@ RCB::~RCB()
/* ----------------------------------------------------------------------
perform RCB balancing of N particles at coords X in bounding box LO/HI
NEW version: each RCB cut is tested in all dimensions
dimension that produces 2 boxes with largest min size is selected
this is to prevent very narrow boxes from being produced
if wt = NULL, ignore per-particle weights
if wt defined, per-particle weights > 0.0
dimension = 2 or 3
@ -103,6 +107,523 @@ RCB::~RCB()
void RCB::compute(int dimension, int n, double **x, double *wt,
double *bboxlo, double *bboxhi)
{
int i,j,k;
int keep,outgoing,incoming,incoming2;
int dim,markactive;
int indexlo,indexhi;
int first_iteration,breakflag;
double wttot,wtlo,wthi,wtsum,wtok,wtupto,wtmax;
double targetlo,targethi;
double valuemin,valuemax,valuehalf,valuehalf_select,smaller;
double tolerance;
MPI_Comm comm,comm_half;
MPI_Request request,request2;
Median med,medme;
// create list of my Dots
ndot = nkeep = noriginal = n;
if (ndot > maxdot) {
maxdot = ndot;
memory->sfree(dots);
dots = (Dot *) memory->smalloc(ndot*sizeof(Dot),"RCB:dots");
}
for (i = 0; i < ndot; i++) {
dots[i].x[0] = x[i][0];
dots[i].x[1] = x[i][1];
dots[i].x[2] = x[i][2];
dots[i].proc = me;
dots[i].index = i;
}
if (wt)
for (i = 0; i < ndot; i++) dots[i].wt = wt[i];
else
for (i = 0; i < ndot; i++) dots[i].wt = 1.0;
// initial bounding box = simulation box
// includes periodic or shrink-wrapped boundaries
lo = bbox.lo;
hi = bbox.hi;
lo[0] = bboxlo[0];
lo[1] = bboxlo[1];
lo[2] = bboxlo[2];
hi[0] = bboxhi[0];
hi[1] = bboxhi[1];
hi[2] = bboxhi[2];
cut = 0.0;
cutdim = -1;
// initialize counters
counters[0] = 0;
counters[1] = 0;
counters[2] = 0;
counters[3] = ndot;
counters[4] = maxdot;
counters[5] = 0;
counters[6] = 0;
// create communicator for use in recursion
MPI_Comm_dup(world,&comm);
// recurse until partition is a single proc = me
// proclower,procupper = lower,upper procs in partition
// procmid = 1st proc in upper half of partition
int procpartner,procpartner2;
int procmid;
int proclower = 0;
int procupper = nprocs - 1;
while (proclower != procupper) {
// if odd # of procs, lower partition gets extra one
procmid = proclower + (procupper - proclower) / 2 + 1;
// determine communication partner(s)
// readnumber = # of proc partners to read from
if (me < procmid)
procpartner = me + (procmid - proclower);
else
procpartner = me - (procmid - proclower);
int readnumber = 1;
if (procpartner > procupper) {
readnumber = 0;
procpartner--;
}
if (me == procupper && procpartner != procmid - 1) {
readnumber = 2;
procpartner2 = procpartner + 1;
}
// wttot = summed weight of entire partition
// search tolerance = largest single weight (plus epsilon)
// targetlo = desired weight in lower half of partition
// targethi = desired weight in upper half of partition
wtmax = wtsum = 0.0;
if (wt) {
for (i = 0; i < ndot; i++) {
wtsum += dots[i].wt;
if (dots[i].wt > wtmax) wtmax = dots[i].wt;
}
} else {
for (i = 0; i < ndot; i++) wtsum += dots[i].wt;
wtmax = 1.0;
}
MPI_Allreduce(&wtsum,&wttot,1,MPI_DOUBLE,MPI_SUM,comm);
if (wt) MPI_Allreduce(&wtmax,&tolerance,1,MPI_DOUBLE,MPI_MAX,comm);
else tolerance = 1.0;
tolerance *= 1.0 + TINY;
targetlo = wttot * (procmid - proclower) / (procupper + 1 - proclower);
targethi = wttot - targetlo;
// attempt a cut in each dimension
// each cut produces 2 boxes, each with a reduced box length in that dim
// smaller = the smaller of the 2 reduced box lengths in that dimension
// choose to cut in dimension which produces largest smaller value
// this should induce final proc sub-boxes to be as cube-ish as possible
// dim_select = selected cut dimension
// valuehalf_select = valuehalf in that dimension
// dotmark_select = dot markings in that dimension
int dim_select = -1;
double largest = 0.0;
for (dim = 0; dim < dimension; dim++) {
// create active list and mark array for dots
// initialize active list to all dots
if (ndot > maxlist) {
memory->destroy(dotlist);
memory->destroy(dotmark);
memory->destroy(dotmark_select);
maxlist = maxdot;
memory->create(dotlist,maxlist,"RCB:dotlist");
memory->create(dotmark,maxlist,"RCB:dotmark");
memory->create(dotmark_select,maxlist,"RCB:dotmark_select");
}
nlist = ndot;
for (i = 0; i < nlist; i++) dotlist[i] = i;
// median iteration
// zoom in on bisector until correct # of dots in each half of partition
// as each iteration of median-loop begins, require:
// all non-active dots are marked with 0/1 in dotmark
// valuemin <= every active dot <= valuemax
// wtlo, wthi = total wt of non-active dots
// when leave median-loop, require only:
// valuehalf = correct cut position
// all dots <= valuehalf are marked with 0 in dotmark
// all dots >= valuehalf are marked with 1 in dotmark
// markactive = which side of cut is active = 0/1
// indexlo,indexhi = indices of dot closest to median
wtlo = wthi = 0.0;
valuemin = lo[dim];
valuemax = hi[dim];
first_iteration = 1;
indexlo = indexhi = 0;
while (1) {
// choose bisector value
// use old value on 1st iteration if old cut dimension is the same
// on 2nd option: could push valuehalf towards geometric center
// with "1.0-factor" to force overshoot
if (first_iteration && reuse && dim == tree[procmid].dim) {
counters[5]++;
valuehalf = tree[procmid].cut;
if (valuehalf < valuemin || valuehalf > valuemax)
valuehalf = 0.5 * (valuemin + valuemax);
} else if (wt)
valuehalf = valuemin + (targetlo - wtlo) /
(wttot - wtlo - wthi) * (valuemax - valuemin);
else
valuehalf = 0.5 * (valuemin + valuemax);
first_iteration = 0;
// initialize local median data structure
medme.totallo = medme.totalhi = 0.0;
medme.valuelo = -MYHUGE;
medme.valuehi = MYHUGE;
medme.wtlo = medme.wthi = 0.0;
medme.countlo = medme.counthi = 0;
medme.proclo = medme.prochi = me;
// mark all active dots on one side or other of bisector
// also set all fields in median data struct
// save indices of closest dots on either side
for (j = 0; j < nlist; j++) {
i = dotlist[j];
if (dots[i].x[dim] <= valuehalf) { // in lower part
medme.totallo += dots[i].wt;
dotmark[i] = 0;
if (dots[i].x[dim] > medme.valuelo) { // my closest dot
medme.valuelo = dots[i].x[dim];
medme.wtlo = dots[i].wt;
medme.countlo = 1;
indexlo = i;
} else if (dots[i].x[dim] == medme.valuelo) { // tied for closest
medme.wtlo += dots[i].wt;
medme.countlo++;
}
}
else { // in upper part
medme.totalhi += dots[i].wt;
dotmark[i] = 1;
if (dots[i].x[dim] < medme.valuehi) { // my closest dot
medme.valuehi = dots[i].x[dim];
medme.wthi = dots[i].wt;
medme.counthi = 1;
indexhi = i;
} else if (dots[i].x[dim] == medme.valuehi) { // tied for closest
medme.wthi += dots[i].wt;
medme.counthi++;
}
}
}
// combine median data struct across current subset of procs
counters[0]++;
MPI_Allreduce(&medme,&med,1,med_type,med_op,comm);
// test median guess for convergence
// move additional dots that are next to cut across it
if (wtlo + med.totallo < targetlo) { // lower half TOO SMALL
wtlo += med.totallo;
valuehalf = med.valuehi;
if (med.counthi == 1) { // only one dot to move
if (wtlo + med.wthi < targetlo) { // move it, keep iterating
if (me == med.prochi) dotmark[indexhi] = 0;
}
else { // only move if beneficial
if (wtlo + med.wthi - targetlo < targetlo - wtlo)
if (me == med.prochi) dotmark[indexhi] = 0;
break; // all done
}
}
else { // multiple dots to move
breakflag = 0;
wtok = 0.0;
if (medme.valuehi == med.valuehi) wtok = medme.wthi;
if (wtlo + med.wthi >= targetlo) { // all done
MPI_Scan(&wtok,&wtupto,1,MPI_DOUBLE,MPI_SUM,comm);
wtmax = targetlo - wtlo;
if (wtupto > wtmax) wtok = wtok - (wtupto - wtmax);
breakflag = 1;
} // wtok = most I can move
for (j = 0, wtsum = 0.0; j < nlist && wtsum < wtok; j++) {
i = dotlist[j];
if (dots[i].x[dim] == med.valuehi) { // only move if better
if (wtsum + dots[i].wt - wtok < wtok - wtsum)
dotmark[i] = 0;
wtsum += dots[i].wt;
}
}
if (breakflag) break; // done if moved enough
}
wtlo += med.wthi;
if (targetlo-wtlo <= tolerance) break; // close enough
valuemin = med.valuehi; // iterate again
markactive = 1;
}
else if (wthi + med.totalhi < targethi) { // upper half TOO SMALL
wthi += med.totalhi;
valuehalf = med.valuelo;
if (med.countlo == 1) { // only one dot to move
if (wthi + med.wtlo < targethi) { // move it, keep iterating
if (me == med.proclo) dotmark[indexlo] = 1;
}
else { // only move if beneficial
if (wthi + med.wtlo - targethi < targethi - wthi)
if (me == med.proclo) dotmark[indexlo] = 1;
break; // all done
}
}
else { // multiple dots to move
breakflag = 0;
wtok = 0.0;
if (medme.valuelo == med.valuelo) wtok = medme.wtlo;
if (wthi + med.wtlo >= targethi) { // all done
MPI_Scan(&wtok,&wtupto,1,MPI_DOUBLE,MPI_SUM,comm);
wtmax = targethi - wthi;
if (wtupto > wtmax) wtok = wtok - (wtupto - wtmax);
breakflag = 1;
} // wtok = most I can move
for (j = 0, wtsum = 0.0; j < nlist && wtsum < wtok; j++) {
i = dotlist[j];
if (dots[i].x[dim] == med.valuelo) { // only move if better
if (wtsum + dots[i].wt - wtok < wtok - wtsum)
dotmark[i] = 1;
wtsum += dots[i].wt;
}
}
if (breakflag) break; // done if moved enough
}
wthi += med.wtlo;
if (targethi-wthi <= tolerance) break; // close enough
valuemax = med.valuelo; // iterate again
markactive = 0;
}
else // Goldilocks result: both partitions just right
break;
// shrink the active list
k = 0;
for (j = 0; j < nlist; j++) {
i = dotlist[j];
if (dotmark[i] == markactive) dotlist[k++] = i;
}
nlist = k;
}
// cut produces 2 sub-boxes with reduced size in dim
// compare smaller of the 2 sizes to previous dims
// keep dim that has the largest smaller
smaller = MIN(valuehalf-lo[dim],hi[dim]-valuehalf);
if (smaller > largest) {
largest = smaller;
dim_select = dim;
valuehalf_select = valuehalf;
memcpy(dotmark_select,dotmark,ndot*sizeof(int));
}
}
// copy results for best dim cut into dim,valuehalf,dotmark
dim = dim_select;
valuehalf = valuehalf_select;
memcpy(dotmark,dotmark_select,ndot*sizeof(int));
// found median
// store cut info only if I am procmid
if (me == procmid) {
cut = valuehalf;
cutdim = dim;
}
// use cut to shrink my RCB bounding box
if (me < procmid) hi[dim] = valuehalf;
else lo[dim] = valuehalf;
// outgoing = number of dots to ship to partner
// nkeep = number of dots that have never migrated
markactive = (me < procpartner);
for (i = 0, keep = 0, outgoing = 0; i < ndot; i++)
if (dotmark[i] == markactive) outgoing++;
else if (i < nkeep) keep++;
nkeep = keep;
// alert partner how many dots I'll send, read how many I'll recv
MPI_Send(&outgoing,1,MPI_INT,procpartner,0,world);
incoming = 0;
if (readnumber) {
MPI_Recv(&incoming,1,MPI_INT,procpartner,0,world,MPI_STATUS_IGNORE);
if (readnumber == 2) {
MPI_Recv(&incoming2,1,MPI_INT,procpartner2,0,world,MPI_STATUS_IGNORE);
incoming += incoming2;
}
}
// check if need to alloc more space
int ndotnew = ndot - outgoing + incoming;
if (ndotnew > maxdot) {
while (maxdot < ndotnew) maxdot += DELTA;
dots = (Dot *) memory->srealloc(dots,maxdot*sizeof(Dot),"RCB::dots");
counters[6]++;
}
counters[1] += outgoing;
counters[2] += incoming;
if (ndotnew > counters[3]) counters[3] = ndotnew;
if (maxdot > counters[4]) counters[4] = maxdot;
// malloc comm send buffer
if (outgoing > maxbuf) {
memory->sfree(buf);
maxbuf = outgoing;
buf = (Dot *) memory->smalloc(maxbuf*sizeof(Dot),"RCB:buf");
}
// fill buffer with dots that are marked for sending
// pack down the unmarked ones
keep = outgoing = 0;
for (i = 0; i < ndot; i++) {
if (dotmark[i] == markactive)
memcpy(&buf[outgoing++],&dots[i],sizeof(Dot));
else
memcpy(&dots[keep++],&dots[i],sizeof(Dot));
}
// post receives for dots
if (readnumber > 0) {
MPI_Irecv(&dots[keep],incoming*sizeof(Dot),MPI_CHAR,
procpartner,1,world,&request);
if (readnumber == 2) {
keep += incoming - incoming2;
MPI_Irecv(&dots[keep],incoming2*sizeof(Dot),MPI_CHAR,
procpartner2,1,world,&request2);
}
}
// handshake before sending dots to insure recvs have been posted
if (readnumber > 0) {
MPI_Send(NULL,0,MPI_INT,procpartner,0,world);
if (readnumber == 2) MPI_Send(NULL,0,MPI_INT,procpartner2,0,world);
}
MPI_Recv(NULL,0,MPI_INT,procpartner,0,world,MPI_STATUS_IGNORE);
// send dots to partner
MPI_Rsend(buf,outgoing*sizeof(Dot),MPI_CHAR,procpartner,1,world);
// wait until all dots are received
if (readnumber > 0) {
MPI_Wait(&request,MPI_STATUS_IGNORE);
if (readnumber == 2) MPI_Wait(&request2,MPI_STATUS_IGNORE);
}
ndot = ndotnew;
// cut partition in half, create new communicators of 1/2 size
int split;
if (me < procmid) {
procupper = procmid - 1;
split = 0;
} else {
proclower = procmid;
split = 1;
}
MPI_Comm_split(comm,split,me,&comm_half);
MPI_Comm_free(&comm);
comm = comm_half;
}
// clean up
MPI_Comm_free(&comm);
// set public variables with results of rebalance
nfinal = ndot;
if (nfinal > maxrecv) {
memory->destroy(recvproc);
memory->destroy(recvindex);
maxrecv = nfinal;
memory->create(recvproc,maxrecv,"RCB:recvproc");
memory->create(recvindex,maxrecv,"RCB:recvindex");
}
for (i = 0; i < nfinal; i++) {
recvproc[i] = dots[i].proc;
recvindex[i] = dots[i].index;
}
}
/* ----------------------------------------------------------------------
perform RCB balancing of N particles at coords X in bounding box LO/HI
OLD version: each RCB cut is made in longest dimension of sub-box
if wt = NULL, ignore per-particle weights
if wt defined, per-particle weights > 0.0
dimension = 2 or 3
as documented in rcb.h:
sets noriginal,nfinal,nkeep,recvproc,recvindex,lo,hi
all proc particles will be inside or on surface of 3-d box
defined by final lo/hi
// NOTE: worry about re-use of data structs for fix balance?
------------------------------------------------------------------------- */
void RCB::compute_old(int dimension, int n, double **x, double *wt,
double *bboxlo, double *bboxhi)
{
int i,j,k;
int keep,outgoing,incoming,incoming2;

View File

@ -42,6 +42,7 @@ class RCB : protected Pointers {
RCB(class LAMMPS *);
~RCB();
void compute(int, int, double **, double *, double *, double *);
void compute_old(int, int, double **, double *, double *, double *);
void invert(int sortflag = 0);
bigint memory_usage();
@ -99,6 +100,7 @@ class RCB : protected Pointers {
int maxlist;
int *dotlist;
int *dotmark;
int *dotmark_select;
int maxbuf;
Dot *buf;

View File

@ -55,7 +55,8 @@ using namespace MathConst;
// customize a new keyword by adding to this list:
// step, elapsed, elaplong, dt, time, cpu, tpcpu, spcpu, cpuremain, part, timeremain
// step, elapsed, elaplong, dt, time, cpu, tpcpu, spcpu, cpuremain,
// part, timeremain
// atoms, temp, press, pe, ke, etotal, enthalpy
// evdwl, ecoul, epair, ebond, eangle, edihed, eimp, emol, elong, etail
// vol, density, lx, ly, lz, xlo, xhi, ylo, yhi, zlo, zhi, xy, xz, yz,
@ -394,6 +395,11 @@ void Thermo::compute(int flag)
if (flushflag) fflush(logfile);
}
}
// set to 1, so that subsequent invocations of CPU time will be non-zero
// e.g. via variables in print command
firststep = 1;
}
/* ----------------------------------------------------------------------

View File

@ -1 +1 @@
#define LAMMPS_VERSION "7 Mar 2017"
#define LAMMPS_VERSION "10 Mar 2017"