git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@7301 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2011-12-08 00:29:41 +00:00
parent 4810127eba
commit 4f79dab7b3
2 changed files with 130 additions and 2 deletions


@ -15,9 +15,10 @@
</P>
<PRE>run_style style args
</PRE>
<UL><LI>style = <I>verlet</I> or <I>verlet/split</I> or <I>respa</I>
<PRE> <I>verlet</I> args = none
<I>verlet/split</I> args = none
  <I>respa</I> args = N n1 n2 ... keyword values ...
    N = # of levels of rRESPA
    n1, n2, ... = loop factor between rRESPA levels (N-1 values)
@ -64,6 +65,69 @@ simulations performed by LAMMPS.
</P>
<P>The <I>verlet</I> style is a velocity-Verlet integrator.
</P>
<HR>
<P>The <I>verlet/split</I> style is also a velocity-Verlet integrator, but it
splits the force calculation within each timestep over 2 partitions of
processors. See <A HREF = "Section_start.html#start_6">this section</A> for an
explanation of the -partition command-line switch.
</P>
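<P>For example, under MPI a split into two partitions might be launched
with a command line like the following (the script and executable
names are hypothetical):
</P>
<PRE>mpirun -np 75 lmp_machine -partition 60 15 -in in.script
</PRE>
<P>which assigns the first 60 processors to the 1st partition and the
remaining 15 to the 2nd.
</P>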
<P>Specifically, this style performs all computation except the
<A HREF = "kspace_style.html">kspace_style</A> portion of the force field on the 1st
partition. This includes the <A HREF = "pair_style.html">pair style</A>, <A HREF = "bond_style.html">bond
style</A>, <A HREF = "neighbor.html">neighbor list building</A>,
<A HREF = "fix.html">fixes</A> including time integration, and output. The
<A HREF = "kspace_style.html">kspace_style</A> portion of the calculation is
performed on the 2nd partition.
</P>
<P>This is most useful for the PPPM kspace_style when its performance on
a large number of processors degrades due to the cost of communication
in its 3d FFTs. In this scenario, splitting your P total processors
into 2 subsets of processors, P1 in the 1st partition and P2 in the
2nd partition, can enable your simulation to run faster. This is
because the long-range forces in PPPM can be calculated at the same
time as pair-wise and bonded forces are being calculated, and the FFTs
can actually speed up when running on fewer processors.
</P>
<P>To use this style, you must define 2 partitions where P1 is a multiple
of P2. Typically having P1 be 3x larger than P2 is a good choice.
The 3d processor layouts in each partition must overlay in the
following sense. If P1 is a Px1 by Py1 by Pz1 grid, and P2 is a Px2
by Py2 by Pz2 grid, then Px1 must be an integer multiple of Px2, and
similarly
for Py1 a multiple of Py2, and Pz1 a multiple of Pz2.
</P>
<P>Typically the best way to do this is to let the 1st partition choose
its own optimal layout, then require the 2nd partition's layout to
match the integer multiple constraint. See the
<A HREF = "processors.html">processors</A> command with its <I>part</I> keyword for a way
to control this, e.g.
</P>
<PRE>processors * * * part 1 2 multiple
</PRE>
<P>You can also use the <A HREF = "partition.html">partition</A> command to explicitly
specify the processor layout on each partition, e.g. for 2 partitions
of 60 and 15 processors:
</P>
<PRE>partition yes 1 processors 3 4 5
partition yes 2 processors 3 1 5
</PRE>
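<P>In this example the 1st partition's 3x4x5 grid overlays the 2nd
partition's 3x1x5 grid, since 3, 4, and 5 are integer multiples of 3,
1, and 5 respectively.
</P>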
<P>When you run in 2-partition mode with this <I>verlet/split</I> style, the
thermodynamic data for the entire simulation will be output to the log
and screen files of the 1st partition, which are log.lammps.0 and
screen.0 by default; see the <A HREF = "Section_start.html#start_6">-plog and -pscreen command-line
switches</A> to change this. The log and
screen files for the 2nd partition will not contain thermodynamic
output beyond the 1st timestep of the run.
</P>
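<P>As a sketch of how those switches could be combined with this style
(the file names are hypothetical):
</P>
<PRE>mpirun -np 75 lmp_machine -partition 60 15 -plog mylog -pscreen myscreen -in in.script
</PRE>
<P>which would write the 1st partition's output to mylog.0 and
myscreen.0.
</P>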
<P>See <A HREF = "Section_accelerate.html">this section</A> of the manual for
performance details of the speed-up offered by the <I>verlet/split</I>
style. One important performance consideration is the assignment of
logical processors in the 2 partitions to the physical cores of a
parallel machine. <A HREF = "Section_accelerate.html">This section</A> discusses
how to optimize this mapping.
</P>
<HR>
<P>The <I>respa</I> style implements the rRESPA multi-timescale integrator
<A HREF = "#Tuckerman">(Tuckerman)</A> with N hierarchical levels, where level 1 is
the innermost loop (shortest timestep) and level N is the outermost


@ -12,8 +12,9 @@ run_style command :h3
run_style style args :pre
style = {verlet} or {verlet/split} or {respa} :ulb,l
{verlet} args = none
{verlet/split} args = none
{respa} args = N n1 n2 ... keyword values ...
  N = # of levels of rRESPA
  n1, n2, ... = loop factor between rRESPA levels (N-1 values)
@ -59,6 +60,69 @@ simulations performed by LAMMPS.
The {verlet} style is a velocity-Verlet integrator.
:line
The {verlet/split} style is also a velocity-Verlet integrator, but it
splits the force calculation within each timestep over 2 partitions of
processors. See "this section"_Section_start.html#start_6 for an
explanation of the -partition command-line switch.
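For example, under MPI a split into two partitions might be launched
with a command line like the following (the script and executable
names are hypothetical):
mpirun -np 75 lmp_machine -partition 60 15 -in in.script :pre
which assigns the first 60 processors to the 1st partition and the
remaining 15 to the 2nd.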
Specifically, this style performs all computation except the
"kspace_style"_kspace_style.html portion of the force field on the 1st
partition. This includes the "pair style"_pair_style.html, "bond
style"_bond_style.html, "neighbor list building"_neighbor.html,
"fixes"_fix.html including time integration, and output. The
"kspace_style"_kspace_style.html portion of the calculation is
performed on the 2nd partition.
This is most useful for the PPPM kspace_style when its performance on
a large number of processors degrades due to the cost of communication
in its 3d FFTs. In this scenario, splitting your P total processors
into 2 subsets of processors, P1 in the 1st partition and P2 in the
2nd partition, can enable your simulation to run faster. This is
because the long-range forces in PPPM can be calculated at the same
time as pair-wise and bonded forces are being calculated, and the FFTs
can actually speed up when running on fewer processors.
To use this style, you must define 2 partitions where P1 is a multiple
of P2. Typically having P1 be 3x larger than P2 is a good choice.
The 3d processor layouts in each partition must overlay in the
following sense. If P1 is a Px1 by Py1 by Pz1 grid, and P2 is a Px2
by Py2 by Pz2 grid, then Px1 must be an integer multiple of Px2, and
similarly
for Py1 a multiple of Py2, and Pz1 a multiple of Pz2.
Typically the best way to do this is to let the 1st partition choose
its own optimal layout, then require the 2nd partition's layout to
match the integer multiple constraint. See the
"processors"_processors.html command with its {part} keyword for a way
to control this, e.g.
processors * * * part 1 2 multiple :pre
You can also use the "partition"_partition.html command to explicitly
specify the processor layout on each partition, e.g. for 2 partitions
of 60 and 15 processors:
partition yes 1 processors 3 4 5
partition yes 2 processors 3 1 5 :pre
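In this example the 1st partition's 3x4x5 grid overlays the 2nd
partition's 3x1x5 grid, since 3, 4, and 5 are integer multiples of 3,
1, and 5 respectively.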
When you run in 2-partition mode with this {verlet/split} style, the
thermodynamic data for the entire simulation will be output to the log
and screen files of the 1st partition, which are log.lammps.0 and
screen.0 by default; see the "-plog and -pscreen command-line
switches"_Section_start.html#start_6 to change this. The log and
screen files for the 2nd partition will not contain thermodynamic
output beyond the 1st timestep of the run.
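As a sketch of how those switches could be combined with this style
(the file names are hypothetical):
mpirun -np 75 lmp_machine -partition 60 15 -plog mylog -pscreen myscreen -in in.script :pre
which would write the 1st partition's output to mylog.0 and
myscreen.0.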
See "this section"_Section_accelerate.html of the manual for
performance details of the speed-up offered by the {verlet/split}
style. One important performance consideration is the assignment of
logical processors in the 2 partitions to the physical cores of a
parallel machine. "This section"_Section_accelerate.html discusses
how to optimize this mapping.
:line
The {respa} style implements the rRESPA multi-timescale integrator
"(Tuckerman)"_#Tuckerman with N hierarchical levels, where level 1 is
the innermost loop (shortest timestep) and level N is the outermost