forked from lijiext/lammps
git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@7345 f3b2605a-c512-4ea7-a41b-209d697bcdaa
parent dfdc3fe1a2
commit a866ec3edb
@@ -24,11 +24,12 @@
<PRE> <I>grid</I> arg = gstyle params ...
gstyle = <I>onelevel</I> or <I>twolevel</I> or <I>numa</I> or <I>custom</I>
onelevel params = none
twolevel params = Cx Cy Cz
twolevel params = Nc Cx Cy Cz
Nc = number of cores per node
Cx,Cy,Cz = # of cores in each dimension of 3d sub-grid assigned to each node
numa params = none
custom params = fname
fname = file containing grid layout
custom params = inname
inname = file containing grid layout
<I>map</I> arg = <I>cart</I> or <I>cart/reorder</I> or <I>xyz</I> or <I>xzy</I> or <I>yxz</I> or <I>yzx</I> or <I>zxy</I> or <I>zyx</I>
cart = use MPI_Cart() methods to map processors to 3d grid with reorder = 0
cart/reorder = use MPI_Cart() methods to map processors to 3d grid with reorder = 1
@@ -50,6 +51,7 @@
processors 2 4 4
processors * * 8 map xyz
processors * * * grid numa
processors * * * grid twolevel 4 * * 1
processors 4 8 16 grid custom myfile
processors * * * part 1 2 multiple
</PRE>
@@ -59,9 +61,9 @@ processors * * * part 1 2 multiple
simulation box. This involves 2 steps. First, if there are P
processors, it means choosing a factorization P = Px by Py by Pz so
that there are Px processors in the x dimension, and similarly for the
y and z dimensions. Second, the P processors (with MPI ranks 0 to
P-1) are mapped to the logical 3d grid. The arguments to this command
control each of these 2 steps.
y and z dimensions. Second, the P processors are mapped to the
logical 3d grid. The arguments to this command control each of these
2 steps.
</P>
<P>The Px, Py, Pz parameters affect the factorization. Any of the 3
parameters can be specified with an asterisk "*", which means LAMMPS
@@ -81,8 +83,8 @@ course of the simulation.
LAMMPS is running on. For a <A HREF = "dimension.html">2d simulation</A>, Pz must
equal 1.
</P>
<P>Note that if you run on a large, prime number of processors P, then a
grid such as 1 x P x 1 will be required, which may incur extra
<P>Note that if you run on a prime number of processors P, then a grid
such as 1 x P x 1 will be required, which may incur extra
communication costs due to the high surface area of each processor's
sub-domain.
</P>
@@ -101,8 +103,87 @@ partition yes 2 processors 2 3 2
</PRE>
<HR>

<P>The <I>grid</I> keyword affects how the P processor IDs (from 0 to P-1) are
mapped to the 3d grid of processors.
<P>The <I>grid</I> keyword affects the factorization of P into Px,Py,Pz and it
can also affect how the P processor IDs are mapped to the 3d grid of
processors.
</P>
<P>The <I>onelevel</I> style creates a 3d grid that is compatible with the
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
each processor's sub-domain, as described above. The mapping of
processors to the grid is determined by the <I>map</I> keyword setting.
</P>
<P>The <I>twolevel</I> style can be used on machines with multi-core nodes
to minimize off-node communication. It ensures that contiguous
sub-sections of the 3d grid are assigned to all the cores of a node.
For example, if <I>Nc</I> is 4, then 2x2x1 or 2x1x2 or 1x2x2 sub-sections
of the 3d grid will correspond to the cores of each node. This
affects both the factorization and mapping steps.
</P>
<P>The <I>Cx</I>, <I>Cy</I>, <I>Cz</I> settings are similar to the <I>Px</I>, <I>Py</I>, <I>Pz</I>
settings, except that their product should equal <I>Nc</I>. Any of the 3
parameters can be specified with an asterisk "*", which means LAMMPS
will choose the number of cores in that dimension of the node's
sub-grid. As with Px,Py,Pz, it will do this based on the size and
shape of the global simulation box so as to minimize the
surface-to-volume ratio of each processor's sub-domain.
</P>
<P>IMPORTANT NOTE: For the <I>twolevel</I> style to work correctly, it
assumes the MPI ranks of processors LAMMPS is running on are ordered
by core and then by node. E.g. if you are running on 2 quad-core
nodes, for a total of 8 processors, then it assumes processors 0,1,2,3
are on node 1, and processors 4,5,6,7 are on node 2. This is the
default rank ordering for most MPI implementations, but some MPIs
provide options for this ordering, e.g. via environment variable
settings.
</P>
<P>The <I>numa</I> style operates similarly to the <I>twolevel</I> keyword except
that it auto-detects the core count within the nodes. Currently, it
does this in only 2 levels, but it may be extended in the future to
account for socket topology and other non-uniform memory access (NUMA)
costs. It also uses a different algorithm (iterative) than the
<I>twolevel</I> keyword for doing the two-level factorization of the
simulation box into a 3d processor grid to minimize off-node
communication, and it does its own mapping of nodes and cores to the
logical 3d grid. Thus it may produce a different or improved layout
of the processors.
</P>
<P>The <I>numa</I> style will give an error if (a) there are fewer than 4 cores
per node, or (b) the number of MPI processes is not divisible by the
number of cores used per node, or (c) only 1 node is allocated, or (d)
any of the Px, Py, or Pz values is greater than 1.
</P>
<P>IMPORTANT NOTE: For the <I>numa</I> style to work correctly, it assumes
the MPI ranks of processors LAMMPS is running on are ordered by core
and then by node. See the same note for the <I>twolevel</I> keyword.
</P>
<P>The <I>custom</I> style uses the file <I>inname</I> to define both the 3d
factorization and the mapping of processors to the grid.
</P>
<P>The file should have the following format. Any number of initial
blank or comment lines (starting with a "#" character) can be present.
The first non-blank, non-comment line should have
3 values:
</P>
<PRE>Px Py Pz
</PRE>
<P>These must be compatible with the total number of processors
and the Px, Py, Pz settings of the processors command.
</P>
<P>This line should be immediately followed by
P = Px*Py*Pz lines of the form:
</P>
<PRE>ID I J K
</PRE>
<P>where ID is a processor ID (from 0 to P-1) and I,J,K are the
processor's location in the 3d grid. I must be a number from 1 to Px
(inclusive) and similarly for J and K. The P lines can be listed in
any order, but no processor ID should appear more than once.
</P>
<HR>

<P>The <I>map</I> keyword affects how the P processor IDs (from 0 to P-1) are
mapped to the 3d grid of processors. It is only used by the
<I>onelevel</I> and <I>twolevel</I> grid settings.
</P>
<P>The <I>cart</I> style uses the family of MPI Cartesian functions to perform
the mapping, namely MPI_Cart_create(), MPI_Cart_get(),
@@ -140,28 +221,10 @@ practice, however, few if any MPI implementations actually do this.
So it is likely that the <I>cart</I> and <I>cart/reorder</I> styles simply give
the same result as one of the IJK styles.
</P>
<HR>

<P>The <I>numa</I> keyword affects both the factorization of P into Px,Py,Pz
and the mapping of processors to the 3d grid.
</P>
<P>It operates similarly to the <I>level2</I> and <I>level3</I> keywords except that
it tries to auto-detect the count and topology of the processors and
cores within a node. Currently, it does this in only 2 levels
(assumes procs/node = 1), but it may be extended in the future.
</P>
<P>It also uses a different algorithm (iterative) than the <I>level2</I>
keyword for doing the two-level factorization of the simulation box
into a 3d processor grid to minimize off-node communication. Thus it
may give a different or improved mapping of processors to the 3d grid.
</P>
<P>The numa setting will give an error if the number of MPI processes
is not evenly divisible by the number of cores used per node.
</P>
<P>The numa setting will be ignored if (a) there are fewer than 4 cores
per node, or (b) the number of MPI processes is not divisible by the
number of cores used per node, or (c) only 1 node is allocated, or (d)
any of the Px, Py, or Pz values is greater than 1.
<P>Also note that for the <I>twolevel</I> grid style, the <I>map</I> setting is
used to first map the nodes to the 3d grid, then again to the cores
within each node. For the latter step, the <I>cart</I> and <I>cart/reorder</I>
styles are not supported, so an <I>xyz</I> style is used in their place.
</P>
<HR>
@@ -216,8 +279,8 @@ use of MPI-specific launch options such as a config file.
</P>
<P>If you have multiple partitions you should ensure that each one writes
to a different file, e.g. using a <A HREF = "variable.html">world-style variable</A>
for the filename. The file will have a self-explanatory header,
followed by one line per processor in this format:
for the filename. The file has a self-explanatory header, followed by
one line per processor in this format:
</P>
<P>I J K: world-ID universe-ID original-ID: name
</P>
@@ -245,7 +308,11 @@ same <I>name</I>.
It can be used before a restart file is read to change the 3d
processor grid from what is specified in the restart file.
</P>
<P>The <I>grid numa</I> keyword only currently works with the <I>map cart</I> option.
<P>The <I>grid numa</I> keyword only currently works with the <I>map cart</I>
option.
</P>
<P>The <I>part</I> keyword (for the receiving partition) only works with the
<I>grid onelevel</I> or <I>twolevel</I> options.
</P>
<P><B>Related commands:</B>
</P>
@@ -254,6 +321,7 @@ switch</A>
</P>
<P><B>Default:</B>
</P>
<P>The option defaults are Px Py Pz = * * *, grid = level1, and map = cart.
<P>The option defaults are Px Py Pz = * * *, grid = onelevel, and map =
cart.
</P>
</HTML>

@@ -18,11 +18,12 @@ keyword = {grid} or {map} or {part} or {file} :l
{grid} arg = gstyle params ...
gstyle = {onelevel} or {twolevel} or {numa} or {custom}
onelevel params = none
twolevel params = Cx Cy Cz
twolevel params = Nc Cx Cy Cz
Nc = number of cores per node
Cx,Cy,Cz = # of cores in each dimension of 3d sub-grid assigned to each node
numa params = none
custom params = fname
fname = file containing grid layout
custom params = inname
inname = file containing grid layout
{map} arg = {cart} or {cart/reorder} or {xyz} or {xzy} or {yxz} or {yzx} or {zxy} or {zyx}
cart = use MPI_Cart() methods to map processors to 3d grid with reorder = 0
cart/reorder = use MPI_Cart() methods to map processors to 3d grid with reorder = 1
@@ -43,6 +44,7 @@ processors * * 5
processors 2 4 4
processors * * 8 map xyz
processors * * * grid numa
processors * * * grid twolevel 4 * * 1
processors 4 8 16 grid custom myfile
processors * * * part 1 2 multiple :pre

@@ -52,9 +54,9 @@ Specify how processors are mapped as a 3d logical grid to the global
simulation box. This involves 2 steps. First, if there are P
processors, it means choosing a factorization P = Px by Py by Pz so
that there are Px processors in the x dimension, and similarly for the
y and z dimensions. Second, the P processors (with MPI ranks 0 to
P-1) are mapped to the logical 3d grid. The arguments to this command
control each of these 2 steps.
y and z dimensions. Second, the P processors are mapped to the
logical 3d grid. The arguments to this command control each of these
2 steps.

The Px, Py, Pz parameters affect the factorization. Any of the 3
parameters can be specified with an asterisk "*", which means LAMMPS
@@ -74,8 +76,8 @@ The product of Px, Py, Pz must equal P, the total # of processors
LAMMPS is running on. For a "2d simulation"_dimension.html, Pz must
equal 1.

Note that if you run on a large, prime number of processors P, then a
grid such as 1 x P x 1 will be required, which may incur extra
Note that if you run on a prime number of processors P, then a grid
such as 1 x P x 1 will be required, which may incur extra
communication costs due to the high surface area of each processor's
sub-domain.
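The factorization rule can be pictured as a brute-force search. The Python below is a minimal sketch under stated assumptions, not LAMMPS's own code. `factor_grid` is a hypothetical name, `None` stands in for "*", and "surface area" means the sum of the three distinct face areas of one processor's sub-domain. The sketch enumerates every Px*Py*Pz = P and drops candidates that conflict with a fixed value or with Pz = 1 in 2d. It then keeps the candidate with the smallest area. For a prime P, only permutations of 1 x 1 x P survive, which is the case the note above warns about.

```python
def factor_grid(nprocs, box, fixed=(None, None, None), dim=3):
    """Pick (Px, Py, Pz) with Px*Py*Pz == nprocs that minimizes the
    surface area of one processor's sub-domain (hypothetical helper).

    box:   (Lx, Ly, Lz) extent of the global simulation box
    fixed: user values for Px, Py, Pz; None plays the role of "*"
    dim:   2 forces Pz == 1, as required for a 2d simulation
    """
    lx, ly, lz = box
    best, best_area = None, None
    for px in range(1, nprocs + 1):
        if nprocs % px:
            continue
        for py in range(1, nprocs // px + 1):
            if (nprocs // px) % py:
                continue
            pz = nprocs // (px * py)
            cand = (px, py, pz)
            # Skip candidates that contradict a dimension the user fixed.
            if any(f is not None and f != c for f, c in zip(fixed, cand)):
                continue
            if dim == 2 and pz != 1:
                continue
            sx, sy, sz = lx / px, ly / py, lz / pz
            # Sum of the three distinct face areas (half the total surface).
            area = sx * sy + sy * sz + sx * sz
            if best_area is None or area < best_area:
                best, best_area = cand, area
    if best is None:
        raise ValueError("no factorization is compatible with the fixed values")
    return best


print(factor_grid(8, (1.0, 1.0, 1.0)))                          # (2, 2, 2)
print(factor_grid(16, (1.0, 1.0, 1.0), fixed=(None, None, 8)))  # (1, 2, 8), like "processors * * 8"
print(factor_grid(7, (1.0, 1.0, 1.0)))                          # prime P: a permutation of 1 x 1 x 7
```

When candidates tie, the first one found wins. LAMMPS may break ties differently.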
@@ -94,8 +96,87 @@ partition yes 2 processors 2 3 2 :pre

:line

The {grid} keyword affects how the P processor IDs (from 0 to P-1) are
mapped to the 3d grid of processors.
The {grid} keyword affects the factorization of P into Px,Py,Pz and it
can also affect how the P processor IDs are mapped to the 3d grid of
processors.

The {onelevel} style creates a 3d grid that is compatible with the
Px,Py,Pz settings, and which minimizes the surface-to-volume ratio of
each processor's sub-domain, as described above. The mapping of
processors to the grid is determined by the {map} keyword setting.

The {twolevel} style can be used on machines with multi-core nodes
to minimize off-node communication. It ensures that contiguous
sub-sections of the 3d grid are assigned to all the cores of a node.
For example, if {Nc} is 4, then 2x2x1 or 2x1x2 or 1x2x2 sub-sections
of the 3d grid will correspond to the cores of each node. This
affects both the factorization and mapping steps.

The {Cx}, {Cy}, {Cz} settings are similar to the {Px}, {Py}, {Pz}
settings, except that their product should equal {Nc}. Any of the 3
parameters can be specified with an asterisk "*", which means LAMMPS
will choose the number of cores in that dimension of the node's
sub-grid. As with Px,Py,Pz, it will do this based on the size and
shape of the global simulation box so as to minimize the
surface-to-volume ratio of each processor's sub-domain.
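One way to picture the twolevel factorization is to search node grids and core sub-grids together. In this hypothetical sketch, `twolevel_factor`, `triples` and `surface` are illustrative names. The core sub-grid (Cx, Cy, Cz) must multiply to Nc and the node grid must multiply to P/Nc. Each of Px, Py, Pz is then the core count times the node count in that dimension. The ranking is an assumption, not necessarily what LAMMPS does. It first minimizes the surface of each node's block, as a proxy for off-node communication, and then the surface of each processor's sub-domain. To keep the sketch short, fixed Px, Py, Pz values are not handled.

```python
def triples(n):
    """All ordered (a, b, c) of positive integers with a * b * c == n."""
    result = []
    for a in range(1, n + 1):
        if n % a == 0:
            for b in range(1, n // a + 1):
                if (n // a) % b == 0:
                    result.append((a, b, n // (a * b)))
    return result


def surface(a, b, c):
    """Sum of the three distinct face areas of an a x b x c block."""
    return a * b + b * c + a * c


def twolevel_factor(nprocs, ncores, box, cores=(None, None, None)):
    """Return ((Px, Py, Pz), (Cx, Cy, Cz)) for nprocs processes on nodes
    with ncores cores each; None in 'cores' plays the role of "*"."""
    if nprocs % ncores:
        raise ValueError("the process count must be a multiple of Nc")
    lx, ly, lz = box
    best, best_key = None, None
    for c in triples(ncores):
        if any(f is not None and f != v for f, v in zip(cores, c)):
            continue
        for n in triples(nprocs // ncores):
            p = (n[0] * c[0], n[1] * c[1], n[2] * c[2])
            # Assumed ranking: node-block surface first (off-node traffic),
            # then per-processor surface as a tie-breaker.
            key = (surface(lx / n[0], ly / n[1], lz / n[2]),
                   surface(lx / p[0], ly / p[1], lz / p[2]))
            if best_key is None or key < best_key:
                best, best_key = (p, c), key
    if best is None:
        raise ValueError("no core sub-grid matches the given Cx, Cy, Cz")
    return best


# 8 processes on quad-core nodes, like "grid twolevel 4 * * 1".
print(twolevel_factor(8, 4, (1.0, 1.0, 1.0), cores=(None, None, 1)))  # ((2, 2, 2), (2, 2, 1))
```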

IMPORTANT NOTE: For the {twolevel} style to work correctly, it
assumes the MPI ranks of processors LAMMPS is running on are ordered
by core and then by node. E.g. if you are running on 2 quad-core
nodes, for a total of 8 processors, then it assumes processors 0,1,2,3
are on node 1, and processors 4,5,6,7 are on node 2. This is the
default rank ordering for most MPI implementations, but some MPIs
provide options for this ordering, e.g. via environment variable
settings.

The {numa} style operates similarly to the {twolevel} keyword except
that it auto-detects the core count within the nodes. Currently, it
does this in only 2 levels, but it may be extended in the future to
account for socket topology and other non-uniform memory access (NUMA)
costs. It also uses a different algorithm (iterative) than the
{twolevel} keyword for doing the two-level factorization of the
simulation box into a 3d processor grid to minimize off-node
communication, and it does its own mapping of nodes and cores to the
logical 3d grid. Thus it may produce a different or improved layout
of the processors.

The {numa} style will give an error if (a) there are fewer than 4 cores
per node, or (b) the number of MPI processes is not divisible by the
number of cores used per node, or (c) only 1 node is allocated, or (d)
any of the Px, Py, or Pz values is greater than 1.
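The four numa error conditions can be collected into one check, together with the rule stated further down that {grid numa} currently works only with {map cart}. `numa_problems` is a hypothetical helper, not a LAMMPS function. LAMMPS auto-detects the core count, whereas this sketch takes it as an argument. The user's Px, Py, Pz use `None` for "*".

```python
def numa_problems(nprocs, cores_per_node, user=(None, None, None), map_style="cart"):
    """List the reasons, if any, that 'grid numa' would be rejected.

    nprocs:         total number of MPI processes
    cores_per_node: cores per node (LAMMPS detects this itself)
    user:           Px, Py, Pz from the processors command; None means "*"
    map_style:      the map keyword setting
    """
    problems = []
    if cores_per_node < 4:                                   # rule (a)
        problems.append("fewer than 4 cores per node")
    if nprocs % cores_per_node:                              # rule (b)
        problems.append("process count not divisible by cores per node")
    elif nprocs // cores_per_node <= 1:                      # rule (c)
        problems.append("only 1 node allocated")
    if any(p is not None and p > 1 for p in user):           # rule (d)
        problems.append("a Px, Py, or Pz value is greater than 1")
    if map_style != "cart":                                  # map restriction
        problems.append("grid numa requires map cart")
    return problems


print(numa_problems(16, 8))   # []
print(numa_problems(8, 8))    # ['only 1 node allocated']
```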

IMPORTANT NOTE: For the {numa} style to work correctly, it assumes
the MPI ranks of processors LAMMPS is running on are ordered by core
and then by node. See the same note for the {twolevel} keyword.

The {custom} style uses the file {inname} to define both the 3d
factorization and the mapping of processors to the grid.

The file should have the following format. Any number of initial
blank or comment lines (starting with a "#" character) can be present.
The first non-blank, non-comment line should have
3 values:

Px Py Pz :pre

These must be compatible with the total number of processors
and the Px, Py, Pz settings of the processors command.

This line should be immediately followed by
P = Px*Py*Pz lines of the form:

ID I J K :pre

where ID is a processor ID (from 0 to P-1) and I,J,K are the
processor's location in the 3d grid. I must be a number from 1 to Px
(inclusive) and similarly for J and K. The P lines can be listed in
any order, but no processor ID should appear more than once.
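The custom-file rules translate fairly directly into a validator. The sketch below is an illustration, not the parser LAMMPS uses. `read_custom_grid` is a hypothetical name, and it takes the file contents as a string. The user's Px, Py, Pz settings are passed in, with `None` meaning "*". It checks only the rules listed in this section. For example, it does not check whether two IDs share a grid cell, which the text does not address.

```python
def read_custom_grid(text, nprocs, user=(None, None, None)):
    """Parse a 'grid custom' layout file given as a string.

    Returns ((Px, Py, Pz), {ID: (I, J, K)}) or raises ValueError.
    'user' holds the processors command's Px, Py, Pz (None means "*").
    """
    lines = text.splitlines()
    n = 0
    # Skip any initial blank lines or comment lines starting with "#".
    while n < len(lines) and (not lines[n].strip() or lines[n].lstrip().startswith("#")):
        n += 1
    if n == len(lines):
        raise ValueError("no Px Py Pz line found")
    fields = lines[n].split()
    if len(fields) != 3:
        raise ValueError("first non-comment line must hold Px Py Pz")
    px, py, pz = (int(f) for f in fields)
    if px * py * pz != nprocs:
        raise ValueError("Px*Py*Pz does not equal the number of processors")
    for want, got in zip(user, (px, py, pz)):
        if want is not None and want != got:
            raise ValueError("grid conflicts with the processors command settings")
    body = lines[n + 1 : n + 1 + nprocs]
    if len(body) != nprocs:
        raise ValueError("expected P = Px*Py*Pz processor lines")
    layout = {}
    for line in body:
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"malformed line: {line!r}")
        rank, i, j, k = (int(f) for f in fields)
        if not 0 <= rank < nprocs:
            raise ValueError(f"processor ID {rank} outside 0..P-1")
        if not (1 <= i <= px and 1 <= j <= py and 1 <= k <= pz):
            raise ValueError(f"grid location of ID {rank} out of range")
        if rank in layout:
            raise ValueError(f"processor ID {rank} appears more than once")
        layout[rank] = (i, j, k)
    return (px, py, pz), layout


SAMPLE = """# 2 x 1 x 2 grid for 4 processors
2 1 2
0 1 1 1
3 2 1 2
1 2 1 1
2 1 1 2
"""
dims, layout = read_custom_grid(SAMPLE, 4)
print(dims, layout[3])   # (2, 1, 2) (2, 1, 2)
```

For the "processors 4 8 16 grid custom myfile" example, a call would look like `read_custom_grid(open("myfile").read(), 512, user=(4, 8, 16))`.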

:line

The {map} keyword affects how the P processor IDs (from 0 to P-1) are
mapped to the 3d grid of processors. It is only used by the
{onelevel} and {twolevel} grid settings.

The {cart} style uses the family of MPI Cartesian functions to perform
the mapping, namely MPI_Cart_create(), MPI_Cart_get(),
@@ -133,28 +214,10 @@ practice, however, few if any MPI implementations actually do this.
So it is likely that the {cart} and {cart/reorder} styles simply give
the same result as one of the IJK styles.

:line

The {numa} keyword affects both the factorization of P into Px,Py,Pz
and the mapping of processors to the 3d grid.

It operates similarly to the {level2} and {level3} keywords except that
it tries to auto-detect the count and topology of the processors and
cores within a node. Currently, it does this in only 2 levels
(assumes procs/node = 1), but it may be extended in the future.

It also uses a different algorithm (iterative) than the {level2}
keyword for doing the two-level factorization of the simulation box
into a 3d processor grid to minimize off-node communication. Thus it
may give a different or improved mapping of processors to the 3d grid.

The numa setting will give an error if the number of MPI processes
is not evenly divisible by the number of cores used per node.

The numa setting will be ignored if (a) there are fewer than 4 cores
per node, or (b) the number of MPI processes is not divisible by the
number of cores used per node, or (c) only 1 node is allocated, or (d)
any of the Px, Py, or Pz values is greater than 1.
Also note that for the {twolevel} grid style, the {map} setting is
used to first map the nodes to the 3d grid, then again to the cores
within each node. For the latter step, the {cart} and {cart/reorder}
styles are not supported, so an {xyz} style is used in their place.
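The two-pass mapping can be made concrete by computing where a given rank lands under {twolevel} when {map xyz} is selected, so that xyz applies at both levels. This is a hypothetical sketch (`twolevel_xyz` is an illustrative name) that rests on two assumptions. First, as the IMPORTANT NOTE for twolevel states, ranks are ordered by core and then by node. Second, "xyz" means the x index varies fastest, then y, then z. The indices here are 0-based, unlike the 1-based I,J,K of a custom grid file.

```python
def twolevel_xyz(rank, procs, cores):
    """Return the 0-based (i, j, k) grid cell of an MPI rank.

    procs: (Px, Py, Pz) of the whole processor grid
    cores: (Cx, Cy, Cz) sub-grid owned by each node
    """
    px, py, pz = procs
    cx, cy, cz = cores
    if px % cx or py % cy or pz % cz:
        raise ValueError("the core sub-grid must tile the processor grid")
    if not 0 <= rank < px * py * pz:
        raise ValueError("rank outside 0..P-1")
    nx, ny = px // cx, py // cy          # node grid extents in x and y
    # Ranks are assumed to fill one node's cores before moving to the next node.
    node, core = divmod(rank, cx * cy * cz)
    # First pass: place the node in the node grid, x fastest ("xyz").
    ix, iy, iz = node % nx, (node // nx) % ny, node // (nx * ny)
    # Second pass: place the core inside that node's sub-grid, also xyz.
    jx, jy, jz = core % cx, (core // cx) % cy, core // (cx * cy)
    return (ix * cx + jx, iy * cy + jy, iz * cz + jz)


# 2 quad-core nodes on a 4 x 2 x 1 grid with 2 x 2 x 1 cores per node.
for rank in range(8):
    print(rank, twolevel_xyz(rank, (4, 2, 1), (2, 2, 1)))
```

In this example, ranks 0-3 (node 1 in the note's numbering) fill the x = 0..1 half of the grid and ranks 4-7 fill x = 2..3. Each node therefore owns one contiguous 2x2x1 block.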

:line

@@ -209,8 +272,8 @@ use of MPI-specific launch options such as a config file.

If you have multiple partitions you should ensure that each one writes
to a different file, e.g. using a "world-style variable"_variable.html
for the filename. The file will have a self-explanatory header,
followed by one line per processor in this format:
for the filename. The file has a self-explanatory header, followed by
one line per processor in this format:

I J K: world-ID universe-ID original-ID: name
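A script that post-processes this file could split each per-processor line at its two colons. This is a minimal sketch that assumes only the line layout shown above. `parse_proc_line` is a hypothetical name and the sample line is made up. The caller must skip the header lines, because their exact form is not given here.

```python
def parse_proc_line(line):
    """Split 'I J K: world-ID universe-ID original-ID: name' into fields."""
    # maxsplit=2 leaves any colon inside the trailing name untouched.
    grid, ids, name = (part.strip() for part in line.split(":", 2))
    i, j, k = (int(v) for v in grid.split())
    world, universe, original = (int(v) for v in ids.split())
    return {"ijk": (i, j, k), "world": world, "universe": universe,
            "original": original, "name": name}


print(parse_proc_line("1 2 3: 5 13 5: node07"))
```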
@@ -238,7 +301,11 @@ This command cannot be used after the simulation box is defined by a
It can be used before a restart file is read to change the 3d
processor grid from what is specified in the restart file.

The {grid numa} keyword only currently works with the {map cart} option.
The {grid numa} keyword only currently works with the {map cart}
option.

The {part} keyword (for the receiving partition) only works with the
{grid onelevel} or {twolevel} options.

[Related commands:]
@@ -247,4 +314,5 @@ switch"_Section_start.html#start_6

[Default:]

The option defaults are Px Py Pz = * * *, grid = level1, and map = cart.
The option defaults are Px Py Pz = * * *, grid = onelevel, and map =
cart.