git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12453 f3b2605a-c512-4ea7-a41b-209d697bcdaa

This commit is contained in:
sjplimp 2014-09-09 17:07:45 +00:00
parent 334f57f7f3
commit 787b9fc6f8
2 changed files with 86 additions and 98 deletions

@ -65,9 +65,13 @@
<I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
<I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
<I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
<I>omp</I> args = Nthreads mode
<I>omp</I> args = Nthreads keyword value ...
Nthreads = # of OpenMP threads to associate with each MPI process
mode = force or force/neigh (optional)
zero or more keyword/value pairs may be appended
keywords = <I>neigh</I>
<I>neigh</I> value = <I>yes</I> or <I>no</I>
<I>yes</I> = threaded neighbor list build (default)
<I>no</I> = non-threaded neighbor list build
</PRE>
</UL>
@ -80,8 +84,8 @@ package gpu force/neigh 0 1 -1.0
package cuda gpu/node/special 2 0 2
package cuda test 3948
package kokkos neigh half/thread comm/forward device
package omp * force/neigh
package omp 4 force
package omp 0 neigh yes
package omp 4
package intel * mixed balance -1
</PRE>
<P><B>Description:</B>
@ -349,30 +353,25 @@ multiple threads to pack/unpack communicated data.
<P>The <I>omp</I> style invokes options associated with the use of the
USER-OMP package.
</P>
<P>The first argument allows to explicitly set the number of OpenMP
threads to be allocated for each MPI process. For example, if your
system has nodes with dual quad-core processors, it has a total of 8
cores per node. You could run MPI on 2 cores on each node (e.g. using
options for the mpirun command), and set the <I>Nthreads</I> setting to 4.
This would effectively use all 8 cores on each node. Since each MPI
process would spawn 4 threads (one of which runs as part of the MPI
process itself).
<P>The first argument sets the number of OpenMP threads allocated for
each MPI process or task. For example, if your system has nodes with
dual quad-core processors, it has a total of 8 cores per node. You
could run two MPI tasks per node (e.g. using the -ppn option of the mpirun
command), and set <I>Nthreads</I> = 4. This would effectively use all 8
cores on each node. Note that the product of MPI tasks * threads/task
should not exceed the physical number of cores on a node; otherwise
performance will suffer.
</P>
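<P>As a concrete sketch of the scenario above (illustrative only; it
assumes a run on two such 8-core nodes, an MPICH-style mpirun that
supports the -ppn option, and an input script named in.script), the
launch below places 2 MPI tasks on each node and uses OMP_NUM_THREADS
to supply the 4 threads per task, which is effectively the same as
setting <I>Nthreads</I> = 4 with this package command:
</P>
<PRE>env OMP_NUM_THREADS=4 mpirun -np 4 -ppn 2 lmp_machine -sf omp -in in.script   # 2 tasks/node, 4 threads/task (assumes 2 nodes)
</PRE>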
<P>For performance reasons, you should not set <I>Nthreads</I> to more threads
than there are physical cores (per MPI task), but LAMMPS cannot check
for this.
</P>
<P>An <I>Nthreads</I> value of '*' instructs LAMMPS to use whatever is the
<P>An <I>Nthreads</I> value of 0 instructs LAMMPS to use whatever value is the
default for the given OpenMP environment. This is usually determined
via the <I>OMP_NUM_THREADS</I> environment variable or the compiler
runtime. Please note that in most cases the default for OpenMP
capable compilers is to use one thread for each available CPU core
when <I>OMP_NUM_THREADS</I> is not set, which can lead to extremely bad
runtime. Note that in most cases the default for OpenMP capable
compilers is to use one thread for each available CPU core when
<I>OMP_NUM_THREADS</I> is not explicitly set, which can lead to poor
performance.
</P>
<P>By default LAMMPS uses 1 thread per MPI task. If the environment
variable OMP_NUM_THREADS is set to a valid value, this value is used.
You can set this environment variable when you launch LAMMPS, e.g.
<P>Here are examples of how to set the environment variable when
launching LAMMPS:
</P>
<PRE>env OMP_NUM_THREADS=4 lmp_machine -sf omp -in in.script
env OMP_NUM_THREADS=2 mpirun -np 2 lmp_machine -sf omp -in in.script
@ -383,26 +382,24 @@ All three of these examples use a total of 4 CPU cores.
</P>
<P>Note that different MPI implementations have different ways of passing
the OMP_NUM_THREADS environment variable to all MPI processes. The
2nd line above is for MPICH; the 3rd line with -x is for OpenMPI.
Check your MPI documentation for additional details.
2nd example line above is for MPICH; the 3rd example line with -x is
for OpenMPI. Check your MPI documentation for additional details.
</P>
<P>You can also set the number of threads per MPI task via the <A HREF = "package.html">package
omp</A> command, which will override any OMP_NUM_THREADS
setting.
<P>What combination of threads and MPI tasks gives the best performance
is difficult to predict and can depend on many components of your
input. Not all features of LAMMPS support OpenMP threading via the
USER-OMP package and the parallel efficiency can be very different,
too.
</P>
<P>Which combination of threads and MPI tasks gives the best performance
is difficult to predict and can depend on many components of your input.
Not all features of LAMMPS support OpenMP and the parallel efficiency
can be very different, too.
</P>
<P>The <I>mode</I> setting specifies where neighbor list calculations will be
multi-threaded as well. If <I>mode</I> is force, neighbor list calculation
is performed in serial. If <I>mode</I> is force/neigh, a multi-threaded
neighbor list build is used. Using the force/neigh setting is almost
always faster and should produce idential neighbor lists at the
expense of using some more memory (neighbor list pages are always
allocated for all threads at the same time and each thread works on
its own pages).
<P>The <I>neigh</I> keyword specifies whether neighbor list building will be
multi-threaded in addition to force calculations. If <I>neigh</I> is set
to <I>no</I> then neighbor list calculation is performed only by MPI tasks
with no OpenMP threading. If <I>neigh</I> is <I>yes</I> (the default), a
multi-threaded neighbor list build is used. Using <I>neigh</I> = <I>yes</I> is
almost always faster and should produce identical neighbor lists at the
expense of using more memory. Specifically, neighbor list pages are
allocated for all threads at the same time and each thread works
within its own pages.
</P>
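<P>For example, a hypothetical input script line that requests 4 threads
per MPI task but disables the threaded neighbor list build (trading
some speed for lower memory use) would be:
</P>
<PRE>package omp 4 neigh no   # 4 threads/task, non-threaded neighbor build
</PRE>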
<HR>
@ -455,9 +452,10 @@ the "-sf kk" <A HREF = "Section_start.html#start_7">command-line switch</A> is u
or not.
</P>
<P>If the "-sf omp" <A HREF = "Section_start.html#start_7">command-line switch</A> is
used then it is as if the command "package omp *" were invoked, to
specify default settings for the USER-OMP package. If the
command-line switch is not used, then no defaults are set, and you
must specify the appropriate package command in your input script.
used then it is as if the command "package omp 0" were invoked, to
specify settings for the USER-OMP package. The option defaults are
neigh = yes. If the command-line switch is not used, then no defaults
are set, and you must specify the appropriate "package omp" command in
your input script.
</P>
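<P>Since the switch only establishes defaults, a "package omp" command
appearing in the input script should still override them. A sketch
(assuming a hypothetical in.script that contains the line
"package omp 2 neigh no"):
</P>
<PRE>lmp_machine -sf omp -in in.script
</PRE>
<P>Here the explicit "package omp 2 neigh no" replaces the implicit
"package omp 0" that the switch would otherwise set.
</P>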
</HTML>

@ -60,9 +60,13 @@ args = arguments specific to the style :l
{neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
{comm/exchange} value = {no} or {host} or {device}
{comm/forward} value = {no} or {host} or {device}
{omp} args = Nthreads mode
{omp} args = Nthreads keyword value ...
Nthreads = # of OpenMP threads to associate with each MPI process
mode = force or force/neigh (optional) :pre
zero or more keyword/value pairs may be appended
keywords = {neigh}
{neigh} value = {yes} or {no}
{yes} = threaded neighbor list build (default)
{no} = non-threaded neighbor list build :pre
:ule
[Examples:]
@ -74,8 +78,8 @@ package gpu force/neigh 0 1 -1.0
package cuda gpu/node/special 2 0 2
package cuda test 3948
package kokkos neigh half/thread comm/forward device
package omp * force/neigh
package omp 4 force
package omp 0 neigh yes
package omp 4
package intel * mixed balance -1 :pre
[Description:]
@ -343,33 +347,25 @@ multiple threads to pack/unpack communicated data.
The {omp} style invokes options associated with the use of the
USER-OMP package.
The first argument allows to explicitly set the number of OpenMP
threads to be allocated for each MPI process. For example, if your
system has nodes with dual quad-core processors, it has a total of 8
cores per node. You could run MPI on 2 cores on each node (e.g. using
options for the mpirun command), and set the {Nthreads} setting to 4.
This would effectively use all 8 cores on each node. Since each MPI
process would spawn 4 threads (one of which runs as part of the MPI
process itself).
The first argument sets the number of OpenMP threads allocated for
each MPI process or task. For example, if your system has nodes with
dual quad-core processors, it has a total of 8 cores per node. You
could run two MPI tasks per node (e.g. using the -ppn option of the mpirun
command), and set {Nthreads} = 4. This would effectively use all 8
cores on each node. Note that the product of MPI tasks * threads/task
should not exceed the physical number of cores on a node; otherwise
performance will suffer.
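As a concrete sketch of the scenario above (illustrative only; it
assumes a run on two such 8-core nodes, an MPICH-style mpirun that
supports the -ppn option, and an input script named in.script), the
launch below places 2 MPI tasks on each node and uses OMP_NUM_THREADS
to supply the 4 threads per task, which is effectively the same as
setting {Nthreads} = 4 with this package command:
env OMP_NUM_THREADS=4 mpirun -np 4 -ppn 2 lmp_machine -sf omp -in in.script :pre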
For performance reasons, you should not set {Nthreads} to more threads
than there are physical cores (per MPI task), but LAMMPS cannot check
for this.
An {Nthreads} value of '*' instructs LAMMPS to use whatever is the
An {Nthreads} value of 0 instructs LAMMPS to use whatever value is the
default for the given OpenMP environment. This is usually determined
via the {OMP_NUM_THREADS} environment variable or the compiler
runtime. Please note that in most cases the default for OpenMP
capable compilers is to use one thread for each available CPU core
when {OMP_NUM_THREADS} is not set, which can lead to extremely bad
runtime. Note that in most cases the default for OpenMP capable
compilers is to use one thread for each available CPU core when
{OMP_NUM_THREADS} is not explicitly set, which can lead to poor
performance.
By default LAMMPS uses 1 thread per MPI task. If the environment
variable OMP_NUM_THREADS is set to a valid value, this value is used.
You can set this environment variable when you launch LAMMPS, e.g.
Here are examples of how to set the environment variable when
launching LAMMPS:
env OMP_NUM_THREADS=4 lmp_machine -sf omp -in in.script
env OMP_NUM_THREADS=2 mpirun -np 2 lmp_machine -sf omp -in in.script
@ -378,33 +374,26 @@ mpirun -x OMP_NUM_THREADS=2 -np 2 lmp_machine -sf omp -in in.script :pre
or you can set it permanently in your shell's start-up script.
All three of these examples use a total of 4 CPU cores.
Note that different MPI implementations have different ways of passing
the OMP_NUM_THREADS environment variable to all MPI processes. The
2nd line above is for MPICH; the 3rd line with -x is for OpenMPI.
Check your MPI documentation for additional details.
2nd example line above is for MPICH; the 3rd example line with -x is
for OpenMPI. Check your MPI documentation for additional details.
You can also set the number of threads per MPI task via the "package
omp"_package.html command, which will override any OMP_NUM_THREADS
setting.
What combination of threads and MPI tasks gives the best performance
is difficult to predict and can depend on many components of your
input. Not all features of LAMMPS support OpenMP threading via the
USER-OMP package and the parallel efficiency can be very different,
too.
Which combination of threads and MPI tasks gives the best performance
is difficult to predict and can depend on many components of your input.
Not all features of LAMMPS support OpenMP and the parallel efficiency
can be very different, too.
The {mode} setting specifies where neighbor list calculations will be
multi-threaded as well. If {mode} is force, neighbor list calculation
is performed in serial. If {mode} is force/neigh, a multi-threaded
neighbor list build is used. Using the force/neigh setting is almost
always faster and should produce idential neighbor lists at the
expense of using some more memory (neighbor list pages are always
allocated for all threads at the same time and each thread works on
its own pages).
The {neigh} keyword specifies whether neighbor list building will be
multi-threaded in addition to force calculations. If {neigh} is set
to {no} then neighbor list calculation is performed only by MPI tasks
with no OpenMP threading. If {neigh} is {yes} (the default), a
multi-threaded neighbor list build is used. Using {neigh} = {yes} is
almost always faster and should produce identical neighbor lists at the
expense of using more memory. Specifically, neighbor list pages are
allocated for all threads at the same time and each thread works
within its own pages.
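For example, a hypothetical input script line that requests 4 threads
per MPI task but disables the threaded neighbor list build (trading
some speed for lower memory use) would be:
package omp 4 neigh no :pre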
:line
@ -457,7 +446,8 @@ the "-sf kk" "command-line switch"_Section_start.html#start_7 is used
or not.
If the "-sf omp" "command-line switch"_Section_start.html#start_7 is
used then it is as if the command "package omp *" were invoked, to
specify default settings for the USER-OMP package. If the
command-line switch is not used, then no defaults are set, and you
must specify the appropriate package command in your input script.
used then it is as if the command "package omp 0" were invoked, to
specify settings for the USER-OMP package. The option defaults are
neigh = yes. If the command-line switch is not used, then no defaults
are set, and you must specify the appropriate "package omp" command in
your input script.
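Since the switch only establishes defaults, a "package omp" command
appearing in the input script should still override them. A sketch
(assuming a hypothetical in.script that contains the line
"package omp 2 neigh no"):
lmp_machine -sf omp -in in.script :pre
Here the explicit "package omp 2 neigh no" replaces the implicit
"package omp 0" that the switch would otherwise set.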