git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13962 f3b2605a-c512-4ea7-a41b-209d697bcdaa
This commit is contained in:
sjplimp 2015-08-29 00:13:46 +00:00
parent 4d13f3d33d
commit 34c599b125
3 changed files with 44 additions and 44 deletions

View File

@ -1739,9 +1739,9 @@ timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:</p>
<div class="highlight-python"><div class="highlight"><pre>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
<p>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms</p>
<div class="highlight-python"><div class="highlight"><pre>Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>MPI task timings breakdown:
@ -1773,15 +1773,15 @@ Dangerous builds = 0
</pre></div>
</div>
<p>The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The <em>Performance</em> line is
is the total wall time for the section. The <em>Performance</em> line is
provided for convenience to help predict the number of loop
continuations required and for comparing performance with other similar
MD codes.</p>
<p>The second section gives the breakdown of the CPU run time (in seconds)
into major categories:</p>
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.</p>
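<p>As an illustrative aside (not part of the original manual text), the figures
in the <em>Performance</em> and CPU-use lines can be reproduced from the loop time
with a few lines of Python; the 2.0 fs timestep used below is an inference from
the printed numbers, not a value stated in the excerpt.</p>
<div class="highlight-python"><div class="highlight"><pre># Sketch: recompute the Performance line of the example run above.
# Assumes a 2.0 fs timestep (inferred, not stated in the excerpt).
loop_time = 2.81192        # wall time of the loop in seconds
steps = 300                # timesteps in the run
timestep_fs = 2.0          # assumed timestep in femtoseconds
omp_threads = 1            # "no OpenMP threads" counts as one

steps_per_s = steps / loop_time                         # ~106.689 timesteps/s
ns_per_day = steps_per_s * timestep_fs * 1e-6 * 86400   # ~18.436 ns/day
hours_per_ns = 24.0 / ns_per_day                        # ~1.302 hours/ns
print(f"Performance: {ns_per_day:.3f} ns/day  {hours_per_ns:.3f} hours/ns  "
      f"{steps_per_s:.3f} timesteps/s")

# The CPU-use line should be close to 100% times the number of OpenMP threads.
reported_cpu_use = 97.0
utilization_ratio = reported_cpu_use / (100.0 * omp_threads)
print(f"CPU utilization ratio: {utilization_ratio:.2f} (1.0 is ideal)")
</pre></div>
</div>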
<p>The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:</p>
<ul class="simple">
<li><em>Pair</em> stands for all non-bonded force computation</li>
<li><em>Bond</em> stands for bonded interactions: bonds, angles, dihedrals, impropers</li>
@ -1799,17 +1799,17 @@ the amount of load imbalance in this segment of the calculation. Ideally
the difference between minimum, maximum and average is small and thus
the variation from the average is close to zero. The final column shows
the percentage of the total loop time spent in this section.</p>
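<p>As another illustrative aside (not from the manual), the %total column and a
simple measure of imbalance can be computed from a breakdown row with a short
Python sketch; the row times below are hypothetical, and the exact statistic
behind the %varavg column is not reproduced here.</p>
<div class="highlight-python"><div class="highlight"><pre># Sketch: interpret one row of the MPI task timings breakdown.
# The row values below are hypothetical; %total is the average time as a
# fraction of the total loop time, and a large min/max spread across ranks
# indicates load imbalance.  (The %varavg statistic itself is not recomputed.)
loop_time = 2.81192                       # total loop time in seconds
t_min, t_avg, t_max = 1.50, 1.60, 1.72    # hypothetical per-rank section times

pct_total = 100.0 * t_avg / loop_time     # share of the loop spent in this section
spread = (t_max - t_min) / t_avg          # relative spread across MPI ranks
print(f"%total = {pct_total:.2f}  relative spread = {spread:.2f}")
# A spread near zero means this part of the work is well balanced.
</pre></div>
</div>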
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a> command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the <em>Reduce</em> section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:</p>
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a>
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the <em>Reduce</em> section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:</p>
<p>Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
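<p>A minimal sketch (again not from the manual) of how such a block could be
pulled out of a log file; it assumes the column layout shown in the excerpt
above.</p>
<div class="highlight-python"><div class="highlight"><pre>import re

# Sketch: collect the per-section times from a "Thread timings breakdown"
# block in a LAMMPS log.  Assumes rows of the form
# "Name | min | avg | max |%varavg| %total" as in the excerpt above.
def parse_thread_timings(log_text):
    rows = {}
    in_block = False
    for line in log_text.splitlines():
        if line.startswith("Thread timings breakdown"):
            in_block = True
            continue
        if in_block:
            m = re.match(r"\s*(\w+)\s*\|\s*([\d.eE+-]+)\s*\|"
                         r"\s*([\d.eE+-]+)\s*\|\s*([\d.eE+-]+)", line)
            if m:
                name, tmin, tavg, tmax = m.groups()
                rows[name] = (float(tmin), float(tavg), float(tmax))
            elif rows:   # first non-matching line after the rows ends the block
                break
    return rows
</pre></div>
</div>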

View File

@ -1746,8 +1746,9 @@ appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s :pre
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
@ -1775,16 +1776,16 @@ Neighbor list builds = 26
Dangerous builds = 0 :pre
The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The {Performance} line is
is the total wall time for the section. The {Performance} line is
provided for convenience to help predict the number of loop
continuations required and for comparing performance with other similar
MD codes.
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.
The second section gives the breakdown of the CPU run time (in seconds)
into major categories:
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
@ -1803,17 +1804,17 @@ the difference between minimum, maximum and average is small and thus
the variation from the average is close to zero. The final column shows
the percentage of the total loop time spent in this section.
When using the "timers full"_timers.html setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using {timers full} and the "package omp"_package.html command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the {Reduce} section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:
When using the "timers full"_timers.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using {timers full} and the "package omp"_package.html
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the {Reduce} section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:
Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
@ -1825,7 +1826,6 @@ Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55
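As an illustrative check (not part of the original text), the %total column in
this thread breakdown is each section's average time divided by the total
threaded time; a few lines of Python reproduce the values above:

# reproduce the %total column from the thread timings excerpt above
total_threaded = 0.6846                  # "Total threaded time" reported above
avg_times = {"Kspace": 0.074541, "Neigh": 0.086969, "Reduce": 0.003737}
for name, t_avg in avg_times.items():
    # prints 10.89, 12.70 and 0.55, matching the %total column
    print(f"{name}: {100.0 * t_avg / total_threaded:.2f} %total") :pre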
The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin

File diff suppressed because one or more lines are too long