git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@7301 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2011-12-08 00:29:41 +00:00 · 2011-12-08 00:29:41 +00:00 · 4f79dab7b3
parent 4810127eba
commit 4f79dab7b3
2 changed files with 130 additions and 2 deletions
--- a/doc/run_style.html
+++ b/doc/run_style.html
@@ -15,9 +15,10 @@
 </P>
 <PRE>run_style style args 
 </PRE>
-<UL><LI>style = <I>verlet</I> or <I>respa</I> 
+<UL><LI>style = <I>verlet</I> or <I>verlet/split</I> or <I>respa</I> 

 <PRE>  <I>verlet</I> args = none
+  <I>verlet/split</I> args = none
  <I>respa</I> args = N n1 n2 ... keyword values ...
    N = # of levels of rRESPA
    n1, n2, ... = loop factor between rRESPA levels (N-1 values)
@@ -64,6 +65,69 @@ simulations performed by LAMMPS.
 </P>
 <P>The <I>verlet</I> style is a velocity-Verlet integrator.
 </P>
+<HR>
+
+<P>The <I>verlet/split</I> style is also a velocity-Verlet integrator, but it
+splits the force calculation within each timestep over 2 partitions of
+processors.  See <A HREF = "Section_start.html#start_6">this section</A> for an
+explanation of the -partition command-line switch.
+</P>
+<P>Specifically, this style performs all computation except the
+<A HREF = "kspace_style.html">kspace_style</A> portion of the force field on the 1st
+partition.  This includes the <A HREF = "pair_style.html">pair style</A>, <A HREF = "bond_style.html">bond
+style</A>, <A HREF = "neighbor.html">neighbor list building</A>,
+<A HREF = "fix.html">fixes</A> including time integration, and output.  The
+<A HREF = "kspace_style.html">kspace_style</A> portion of the calculation is
+performed on the 2nd partition.
+</P>
+<P>This is most useful for the PPPM kspace_style when its performance on
+a large number of processors degrades due to the cost of communication
+in its 3d FFTs.  In this scenario, splitting your P total processors
+into 2 subsets of processors, P1 in the 1st partition and P2 in the
+2nd partition, can enable your simulation to run faster.  This is
+because the long-range forces in PPPM can be calculated at the same
+time as pair-wise and bonded forces are being calculated, and the FFTs
+can actually speed up when running on fewer processors.
+</P>
+<P>To use this style, you must define 2 partitions where P1 is a multiple
+of P2.  Typically having P1 be 3x larger than P2 is a good choice.
+The 3d processor layouts in each partition must overlay in the
+following sense.  If P1 is a Px1 by Py1 by Pz1 grid and P2 is a Px2 by
+Py2 by Pz2 grid, then Px1 must be an integer multiple of Px2, and
+similarly Py1 a multiple of Py2 and Pz1 a multiple of Pz2.
+</P>
+<P>Typically the best way to do this is to let the 1st partition choose
+its own optimal layout, then require the 2nd partition's layout to
+match the integer multiple constraint.  See the
+<A HREF = "processors.html">processors</A> command with its <I>part</I> keyword for a way
+to control this, e.g.
+</P>
+<PRE>processors * * * part 1 2 multiple 
+</PRE>
+<P>You can also use the <A HREF = "partition.html">partition</A> command to explicitly
+specify the processor layout on each partition.  E.g. for 2 partitions
+of 60 and 15 processors each:
+</P>
+<PRE>partition yes 1 processors 3 4 5
+partition yes 2 processors 3 1 5 
+</PRE>
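+<P>As a minimal sketch of how such a run could be launched, assume 75
+MPI tasks started with a typical mpirun launcher, and the hypothetical
+names lmp_mpi for the executable and in.script for the input script
+(neither name is defined by this doc page).  The -partition switch
+would then split the tasks 60/15:
+</P>
+<PRE>mpirun -np 75 lmp_mpi -partition 60 15 -in in.script 
+</PRE>
+<P>The input script would also need to select a long-range solver and
+this run style.  PPPM is shown since it is the case this style targets;
+the accuracy value is only illustrative:
+</P>
+<PRE>kspace_style pppm 1.0e-4
+run_style verlet/split 
+</PRE>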
+<P>When you run in 2-partition mode with this <I>verlet/split</I> style, the
+thermodynamic data for the entire simulation will be output to the log
+and screen file of the 1st partition, which are log.lammps.0 and
+screen.0 by default; see the <A HREF = "Section_start.html#start_6">-plog and -pscreen command-line
+switches</A> to change this.  The log and
+screen file for the 2nd partition will not contain thermodynamic
+output beyond the 1st timestep of the run.
+</P>
+<P>See <A HREF = "Section_accelerate.html">this section</A> of the manual for
+performance details of the speed-up offered by the <I>verlet/split</I>
+style.  One important performance consideration is the assignment of
+logical processors in the 2 partitions to the physical cores of a
+parallel machine.  <A HREF = "Section_accelerate.html">This section</A> discusses
+how to optimize this mapping.
+</P>
+<HR>
+
 <P>The <I>respa</I> style implements the rRESPA multi-timescale integrator
 <A HREF = "#Tuckerman">(Tuckerman)</A> with N hierarchical levels, where level 1 is
 the innermost loop (shortest timestep) and level N is the outermost
--- a/doc/run_style.txt
+++ b/doc/run_style.txt
@@ -12,8 +12,9 @@ run_style command :h3

 run_style style args :pre

-style = {verlet} or {respa} :ulb,l
+style = {verlet} or {verlet/split} or {respa} :ulb,l
  {verlet} args = none
+  {verlet/split} args = none
  {respa} args = N n1 n2 ... keyword values ...
    N = # of levels of rRESPA
    n1, n2, ... = loop factor between rRESPA levels (N-1 values)
@@ -59,6 +60,69 @@ simulations performed by LAMMPS.

 The {verlet} style is a velocity-Verlet integrator.

+:line
+
+The {verlet/split} style is also a velocity-Verlet integrator, but it
+splits the force calculation within each timestep over 2 partitions of
+processors.  See "this section"_Section_start.html#start_6 for an
+explanation of the -partition command-line switch.
+
+Specifically, this style performs all computation except the
+"kspace_style"_kspace_style.html portion of the force field on the 1st
+partition.  This includes the "pair style"_pair_style.html, "bond
+style"_bond_style.html, "neighbor list building"_neighbor.html,
+"fixes"_fix.html including time integration, and output.  The
+"kspace_style"_kspace_style.html portion of the calculation is
+performed on the 2nd partition.
+
+This is most useful for the PPPM kspace_style when its performance on
+a large number of processors degrades due to the cost of communication
+in its 3d FFTs.  In this scenario, splitting your P total processors
+into 2 subsets of processors, P1 in the 1st partition and P2 in the
+2nd partition, can enable your simulation to run faster.  This is
+because the long-range forces in PPPM can be calculated at the same
+time as pair-wise and bonded forces are being calculated, and the FFTs
+can actually speed up when running on fewer processors.
+
+To use this style, you must define 2 partitions where P1 is a multiple
+of P2.  Typically having P1 be 3x larger than P2 is a good choice.
+The 3d processor layouts in each partition must overlay in the
+following sense.  If P1 is a Px1 by Py1 by Pz1 grid and P2 is a Px2 by
+Py2 by Pz2 grid, then Px1 must be an integer multiple of Px2, and
+similarly Py1 a multiple of Py2 and Pz1 a multiple of Pz2.
+
+Typically the best way to do this is to let the 1st partition choose
+its own optimal layout, then require the 2nd partition's layout to
+match the integer multiple constraint.  See the
+"processors"_processors.html command with its {part} keyword for a way
+to control this, e.g.
+
+processors * * * part 1 2 multiple :pre
+
+You can also use the "partition"_partition.html command to explicitly
+specify the processor layout on each partition.  E.g. for 2 partitions
+of 60 and 15 processors each:
+
+partition yes 1 processors 3 4 5
+partition yes 2 processors 3 1 5 :pre
+
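+As a minimal sketch of how such a run could be launched, assume 75
+MPI tasks started with a typical mpirun launcher, and the hypothetical
+names lmp_mpi for the executable and in.script for the input script
+(neither name is defined by this doc page).  The -partition switch
+would then split the tasks 60/15:
+
+mpirun -np 75 lmp_mpi -partition 60 15 -in in.script :pre
+
+The input script would also need to select a long-range solver and
+this run style.  PPPM is shown since it is the case this style targets;
+the accuracy value is only illustrative:
+
+kspace_style pppm 1.0e-4
+run_style verlet/split :pre
+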
+When you run in 2-partition mode with this {verlet/split} style, the
+thermodynamic data for the entire simulation will be output to the log
+and screen file of the 1st partition, which are log.lammps.0 and
+screen.0 by default; see the "-plog and -pscreen command-line
+switches"_Section_start.html#start_6 to change this.  The log and
+screen file for the 2nd partition will not contain thermodynamic
+output beyond the 1st timestep of the run.
+
+See "this section"_Section_accelerate.html of the manual for
+performance details of the speed-up offered by the {verlet/split}
+style.  One important performance consideration is the assignment of
+logical processors in the 2 partitions to the physical cores of a
+parallel machine.  "This section"_Section_accelerate.html discusses
+how to optimize this mapping.
+
+:line
+
 The {respa} style implements the rRESPA multi-timescale integrator
 "(Tuckerman)"_#Tuckerman with N hierarchical levels, where level 1 is
 the innermost loop (shortest timestep) and level N is the outermost