git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13962 f3b2605a-c512-4ea7-a41b-209d697bcdaa
parent 4d13f3d33d
commit 34c599b125
@@ -1739,9 +1739,9 @@ timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:</p>
<div class="highlight-python"><div class="highlight"><pre>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
<p>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms</p>
<div class="highlight-python"><div class="highlight"><pre>Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>MPI task timings breakdown:
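The Performance figures in the example above follow directly from the loop summary line. Below is a minimal Python sketch of that arithmetic, assuming the 2 fs timestep implied by these example numbers (the timestep itself is not shown in the log excerpt):

loop_time = 2.81192   # wall-clock seconds, from "Loop time of 2.81192 on 4 procs for 300 steps ..."
steps = 300
dt_fs = 2.0           # assumed timestep in femtoseconds (not stated in the excerpt)

steps_per_sec = steps / loop_time                      # ~106.689 timesteps/s
ns_per_day = steps_per_sec * dt_fs * 1e-6 * 86400.0    # fs simulated per wall second -> ns/day, ~18.436
hours_per_ns = 24.0 / ns_per_day                       # ~1.302

print("Performance: %.3f ns/day  %.3f hours/ns  %.3f timesteps/s"
      % (ns_per_day, hours_per_ns, steps_per_sec))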
@@ -1773,15 +1773,15 @@ Dangerous builds = 0
</pre></div>
</div>
<p>The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The <em>Performance</em> line is
is the total wall time for the section. The <em>Performance</em> line is
provided for convenience to help predicting the number of loop
continuations required and for comparing performance with other similar
MD codes.</p>
<p>The second section gives the breakdown of the CPU run time (in seconds)
into major categories:</p>
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilzation per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.</p>
<p>The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:</p>
<ul class="simple">
<li><em>Pair</em> stands for all non-bonded force computation</li>
<li><em>Bond</em> stands for bonded interactions: bonds, angles, dihedrals, impropers</li>
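The rule of thumb in the revised paragraph above can be checked mechanically: the reported CPU use should sit close to 100% times the number of OpenMP threads per MPI task (or 1 when no threading is used). A small Python sketch using the figures from the example log; the 10% slack threshold is an arbitrary illustrative choice, not a LAMMPS setting:

def expected_cpu_use(omp_threads):
    # "no OpenMP threads" in the log corresponds to 1 thread here
    return 100.0 * max(omp_threads, 1)

reported = 97.0               # from "97.0% CPU use with 4 MPI tasks x no OpenMP threads"
target = expected_cpu_use(1)  # 100.0
if reported < 0.9 * target:   # illustrative 10% slack
    print("possible delays from file I/O or insufficient thread utilization")
else:
    print("CPU utilization looks healthy")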
@@ -1799,17 +1799,17 @@ the amount of load imbalance in this segment of the calculation. Ideally
the difference between minimum, maximum and average is small and thus
the variation from the average close to zero. The final column shows
the percentage of the total loop time is spent in this section.</p>
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a> command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the <em>Reduce</em> section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:</p>
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a>
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the <em>Reduce</em> section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:</p>
<p>Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time <a href="#id17"><span class="problematic" id="id18">|%varavg|</span></a> %total
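Both wordings of the paragraph above describe the same optional report, so the thread timings block can be post-processed like the MPI breakdown to monitor load balance. A hedged Python sketch that pulls the per-section min/avg/max times from a log file, assuming the column layout shown in this excerpt (section name and times separated by "|") and the default log file name:

def thread_sections(log_path):
    # Collect (section, min_t, avg_t, max_t) rows from a "Thread timings breakdown" block.
    in_block = False
    rows = []
    with open(log_path) as f:
        for line in f:
            if line.startswith("Thread timings breakdown"):
                in_block = True
                continue
            if not in_block:
                continue
            cols = [c.strip() for c in line.split("|")]
            if len(cols) < 4:
                if rows:          # past the end of the table
                    break
                continue          # "Total threaded time ..." or blank line
            try:                  # the "Section | min time | ..." header row fails to parse
                rows.append((cols[0], float(cols[1]), float(cols[2]), float(cols[3])))
            except ValueError:
                continue
    return rows

for name, tmin, tavg, tmax in thread_sections("log.lammps"):   # assumes the default log name
    spread = tmax - tmin
    print("%-8s spread %.4f s  (%.1f%% of avg)" % (name, spread, 100.0 * spread / tavg))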
@@ -1746,8 +1746,9 @@ appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:

Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s :pre

Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre

MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
@@ -1775,16 +1776,16 @@ Neighbor list builds = 26
Dangerous builds = 0 :pre

The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line provides the
CPU utilzation per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file i/o or unsufficient thread utilization. The {Performance} line is
is the total wall time for the section. The {Performance} line is
provided for convenience to help predicting the number of loop
continuations required and for comparing performance with other similar
MD codes.
continuations required and for comparing performance with other
similar MD codes. The CPU use line provides the CPU utilzation per
MPI task; it should be close to 100% times the number of OpenMP
threads (or 1). Lower numbers correspond to delays due to file I/O or
insufficient thread utilization.

The second section gives the breakdown of the CPU run time (in seconds)
into major categories:
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:

{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
@@ -1803,17 +1804,17 @@ the difference between minimum, maximum and average is small and thus
the variation from the average close to zero. The final column shows
the percentage of the total loop time is spent in this section.

When using the "timers full"_timers.html setting, and additional column
is present that also prints the CPU utilization in percent. In addition,
when using {timers full} and the "package omp"_package.html command are
active, a similar timing summary of time spent in threaded regions to
monitor thread utilization and load balance is provided. A new enrty is
the {Reduce} section, which lists the time spend in reducing the per-thread
data elements to the storage for non-threaded computation. These thread
timings are taking from the first MPI rank only and and thus, as the
breakdown for MPI tasks can change from MPI rank to MPI rank, this
breakdown can be very different for individual ranks. Here is an example
output for this optional output section:
When using the "timers full"_timers.html setting, an additional column
is present that also prints the CPU utilization in percent. In
addition, when using {timers full} and the "package omp"_package.html
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
new entry is the {Reduce} section, which lists the time spend in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taking from the first MPI rank
only and and thus, as the breakdown for MPI tasks can change from MPI
rank to MPI rank, this breakdown can be very different for individual
ranks. Here is an example output for this section:

Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
@@ -1825,7 +1826,6 @@ Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55

The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
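As a concreteness check on the columns discussed above, the Reduce row from this excerpt can be reproduced by hand. A small Python sketch, assuming %total is taken relative to the total threaded time of 0.6846 s reported for this rank (the %varavg column is not reproduced here):

total_threaded_time = 0.6846                              # "Total threaded time 0.6846 / 90.6%"
reduce_min, reduce_avg, reduce_max = 0.0036485, 0.003737, 0.0038254

pct_total = 100.0 * reduce_avg / total_threaded_time      # ~0.55, matching the %total column
spread = reduce_max - reduce_min                          # ~0.00018 s, i.e. well balanced across threads
print("Reduce: %.2f%% of threaded time, min-max spread %.5f s" % (pct_total, spread))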