git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13962 f3b2605a-c512-4ea7-a41b-209d697bcdaa
parent 4d13f3d33d
commit 34c599b125
@@ -1739,9 +1739,9 @@ timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:</p>
<div class="highlight-python"><div class="highlight"><pre>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
<p>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms</p>
<div class="highlight-python"><div class="highlight"><pre>Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>MPI task timings breakdown:
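The Performance figures in the example above follow directly from the loop summary line. Below is a minimal Python sketch of that arithmetic, assuming the 2 fs timestep implied by these example numbers (the timestep itself is not shown in the log excerpt):

loop_time = 2.81192   # wall-clock seconds, from "Loop time of 2.81192 on 4 procs for 300 steps ..."
steps = 300
dt_fs = 2.0           # assumed timestep in femtoseconds (not stated in the excerpt)

steps_per_sec = steps / loop_time                      # ~106.689 timesteps/s
ns_per_day = steps_per_sec * dt_fs * 1e-6 * 86400.0    # fs simulated per wall second -> ns/day, ~18.436
hours_per_ns = 24.0 / ns_per_day                       # ~1.302

print("Performance: %.3f ns/day  %.3f hours/ns  %.3f timesteps/s"
      % (ns_per_day, hours_per_ns, steps_per_sec))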
@@ -1773,15 +1773,15 @@ Dangerous builds = 0
</pre></div>
</div>
<p>The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The <em>Performance</em> line is
is the total wall time for the section. The <em>Performance</em> line is
provided for convenience to help predicting the number of loop
continuations required and for comparing performance with other similar
MD codes.</p>
<p>The second section gives the breakdown of the CPU run time (in seconds)
into major categories:</p>
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilzation per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.</p>
<p>The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:</p>
<ul class="simple">
<li><em>Pair</em> stands for all non-bonded force computation</li>
<li><em>Bond</em> stands for bonded interactions: bonds, angles, dihedrals, impropers</li>
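The rule of thumb in the revised paragraph above can be checked mechanically: the reported CPU use should sit close to 100% times the number of OpenMP threads per MPI task (or 1 when no threading is used). A small Python sketch using the figures from the example log; the 10% slack threshold is an arbitrary illustrative choice, not a LAMMPS setting:

def expected_cpu_use(omp_threads):
    # "no OpenMP threads" in the log corresponds to 1 thread here
    return 100.0 * max(omp_threads, 1)

reported = 97.0               # from "97.0% CPU use with 4 MPI tasks x no OpenMP threads"
target = expected_cpu_use(1)  # 100.0
if reported < 0.9 * target:   # illustrative 10% slack
    print("possible delays from file I/O or insufficient thread utilization")
else:
    print("CPU utilization looks healthy")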
@@ -1799,17 +1799,17 @@ the amount of load imbalance in this segment of the calculation. Ideally
the difference between minimum, maximum and average is small and thus
the variation from the average close to zero. The final column shows
the percentage of the total loop time is spent in this section.</p>
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a> command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the <em>Reduce</em> section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:</p>
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a>
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the <em>Reduce</em> section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:</p>
<p>Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time <a href="#id17"><span class="problematic" id="id18">|%varavg|</span></a> %total
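Both wordings of the paragraph above describe the same optional report, so the thread timings block can be post-processed like the MPI breakdown to monitor load balance. A hedged Python sketch that pulls the per-section min/avg/max times from a log file, assuming the column layout shown in this excerpt (section name and times separated by "|") and the default log file name:

def thread_sections(log_path):
    # Collect (section, min_t, avg_t, max_t) rows from a "Thread timings breakdown" block.
    in_block = False
    rows = []
    with open(log_path) as f:
        for line in f:
            if line.startswith("Thread timings breakdown"):
                in_block = True
                continue
            if not in_block:
                continue
            cols = [c.strip() for c in line.split("|")]
            if len(cols) < 4:
                if rows:          # past the end of the table
                    break
                continue          # "Total threaded time ..." or blank line
            try:                  # the "Section | min time | ..." header row fails to parse
                rows.append((cols[0], float(cols[1]), float(cols[2]), float(cols[3])))
            except ValueError:
                continue
    return rows

for name, tmin, tavg, tmax in thread_sections("log.lammps"):   # assumes the default log name
    spread = tmax - tmin
    print("%-8s spread %.4f s  (%.1f%% of avg)" % (name, spread, 100.0 * spread / tavg))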
@@ -1746,8 +1746,9 @@ appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:

Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s :pre

Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre

MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
@@ -1775,16 +1776,16 @@ Neighbor list builds = 26
Dangerous builds = 0 :pre

The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The {Performance} line is
is the total wall time for the section. The {Performance} line is
provided for convenience to help predicting the number of loop
continuations required and for comparing performance with other similar
MD codes.
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilzation per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.

The second section gives the breakdown of the CPU run time (in seconds)
into major categories:
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:

{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
@@ -1803,17 +1804,17 @@ the difference between minimum, maximum and average is small and thus
the variation from the average close to zero. The final column shows
the percentage of the total loop time is spent in this section.

When using the "timers full"_timers.html setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using {timers full} and the "package omp"_package.html command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the {Reduce} section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:
When using the "timers full"_timers.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using {timers full} and the "package omp"_package.html
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the {Reduce} section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:

Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
@@ -1825,7 +1826,6 @@ Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55

The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
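As a concreteness check on the columns discussed above, the Reduce row from this excerpt can be reproduced by hand. A small Python sketch, assuming %total is taken relative to the total threaded time of 0.6846 s reported for this rank (the %varavg column is not reproduced here):

total_threaded_time = 0.6846                              # "Total threaded time 0.6846 / 90.6%"
reduce_min, reduce_avg, reduce_max = 0.0036485, 0.003737, 0.0038254

pct_total = 100.0 * reduce_avg / total_threaded_time      # ~0.55, matching the %total column
spread = reduce_max - reduce_min                          # ~0.00018 s, i.e. well balanced across threads
print("Reduce: %.2f%% of threaded time, min-max spread %.5f s" % (pct_total, spread))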