git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@13944 f3b2605a-c512-4ea7-a41b-209d697bcdaa
@ -1739,40 +1739,91 @@ timesteps. When the run concludes, LAMMPS prints the final
thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:</p>
<div class="highlight-python"><div class="highlight"><pre>Loop time of 49.002 on 2 procs for 2004 atoms
<div class="highlight-python"><div class="highlight"><pre>Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>Pair time (%) = 35.0495 (71.5267)
Bond time (%) = 0.092046 (0.187841)
Kspce time (%) = 6.42073 (13.103)
Neigh time (%) = 2.73485 (5.5811)
Comm time (%) = 1.50291 (3.06703)
Outpt time (%) = 0.013799 (0.0281601)
Other time (%) = 2.13669 (4.36041)
<div class="highlight-python"><div class="highlight"><pre>MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
Other | | 0.007178 | | | 0.26
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>Nlocal: 1002 ave, 1015 max, 989 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost: 8720 ave, 8724 max, 8716 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs: 354141 ave, 361422 max, 346860 min
Histogram: 1 0 0 0 0 0 0 0 0 1
<div class="highlight-python"><div class="highlight"><pre>Nlocal: 501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost: 6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs: 177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1
</pre></div>
</div>
<div class="highlight-python"><div class="highlight"><pre>Total # of neighbors = 708282
Ave neighs/atom = 353.434
<div class="highlight-python"><div class="highlight"><pre>Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Number of reneighborings = 42
Dangerous reneighborings = 2
Neighbor list builds = 26
Dangerous builds = 0
</pre></div>
</div>
<p>The first section gives the breakdown of the CPU run time (in seconds)
into major categories. The second section lists the number of owned
atoms (Nlocal), ghost atoms (Nghost), and pair-wise neighbors stored
per processor. The max and min values give the spread of these values
across processors with a 10-bin histogram showing the distribution.
The total number of histogram counts is equal to the number of
processors.</p>
<p>The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line gives the CPU
utilization per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file I/O or insufficient thread utilization. The <em>Performance</em> line is
provided for convenience, to help predict how long a continued or longer
run would take and to compare performance with other similar MD codes.</p>
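<p>As a quick sanity check of this arithmetic, the following sketch recomputes
the <em>Performance</em> line from the loop summary above; the 2 fs timestep is an
assumption (it is not reported in the summary itself), chosen because it
reproduces the printed values:</p>
<div class="highlight-python"><div class="highlight"><pre># Recompute the "Performance:" line from the loop summary above.
loop_time = 2.81192   # wall time of the loop, in seconds
nsteps    = 300       # number of timesteps in the run
dt_fs     = 2.0       # timestep size in fs (assumed, not printed above)

steps_per_sec = nsteps / loop_time                        # 106.689
ns_per_day    = steps_per_sec * dt_fs * 1.0e-6 * 86400.0  # 18.436
hours_per_ns  = 24.0 / ns_per_day                         # 1.302

print("Performance: %.3f ns/day %.3f hours/ns %.3f timesteps/s"
      % (ns_per_day, hours_per_ns, steps_per_sec))
</pre></div>
</div>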
<p>The second section gives the breakdown of the CPU run time (in seconds)
into major categories:</p>
<ul class="simple">
<li><em>Pair</em> stands for all non-bonded force computation</li>
<li><em>Bond</em> stands for bonded interactions: bonds, angles, dihedrals, impropers</li>
<li><em>Kspace</em> stands for reciprocal space interactions: Ewald, PPPM, MSM</li>
<li><em>Neigh</em> stands for neighbor list construction</li>
<li><em>Comm</em> stands for communicating atoms and their properties</li>
<li><em>Output</em> stands for writing dumps and thermo output</li>
<li><em>Modify</em> stands for fixes and computes called by them</li>
<li><em>Other</em> is the remaining time</li>
</ul>
<p>For each category, the minimum, average, and maximum wall time any
processor spent on this section is reported, along with the percent
variation from the average. Together, these numbers allow one to gauge
the load imbalance in this segment of the calculation. Ideally the
differences between minimum, maximum, and average are small, and thus
the variation from the average is close to zero. The final column shows
the percentage of the total loop time spent in this section.</p>
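<p>To make this concrete, here is a minimal sketch that derives such a table
row from per-rank wall times. The four per-rank values are hypothetical, and
the variation measure (the square root of the variance-to-mean ratio,
expressed in percent) is an assumption that is consistent with the numbers
printed above:</p>
<div class="highlight-python"><div class="highlight"><pre>import math

times = [1.9808, 2.0102, 2.0310, 2.0318]  # hypothetical per-rank Pair times (s)
loop_time = 2.81192                        # total loop wall time (s)

t_min, t_max = min(times), max(times)
t_avg = sum(times) / len(times)
t_sq  = sum(t * t for t in times) / len(times)

# Percent variation from the average: sqrt(variance / mean) * 100.
varavg = math.sqrt(max(t_sq / t_avg - t_avg, 0.0)) * 100.0
total  = 100.0 * t_avg / loop_time         # percent of the total loop time

print("Pair | %g | %g | %g | %4.1f | %5.2f"
      % (t_min, t_avg, t_max, varavg, total))
</pre></div>
</div>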
<p>When using the <code class="xref doc docutils literal"><span class="pre">timers</span> <span class="pre">full</span></code> setting, an additional column
is present that also prints the CPU utilization in percent. In addition,
when <em>timers full</em> and the <a class="reference internal" href="package.html"><em>package omp</em></a> command are
both active, a similar timing summary of the time spent in threaded regions
is provided, to monitor thread utilization and load balance. A new entry is
the <em>Reduce</em> section, which lists the time spent reducing the per-thread
data elements into the storage for non-threaded computation. These thread
timings are taken from the first MPI rank only; since the load can differ
from MPI rank to MPI rank, this breakdown can look quite different on
other ranks. Here is an example of this optional output section:</p>
<div class="highlight-python"><div class="highlight"><pre>Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55
</pre></div>
</div>
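<p>Since this block has the same tabular layout as the MPI breakdown, it is
easy to pull into a script. A minimal parsing sketch; it assumes the layout
shown above and skips any line that does not look like a table row:</p>
<div class="highlight-python"><div class="highlight"><pre>import re

def parse_thread_timings(log_text):
    """Collect {section: (min, avg, max)} rows from a thread timings table."""
    row = re.compile(r"\s*(\w+)\s*\|\s*([\d.eE+-]+)\s*\|\s*([\d.eE+-]+)"
                     r"\s*\|\s*([\d.eE+-]+)\s*\|")
    timings = {}
    in_table = False
    for line in log_text.splitlines():
        if line.startswith("Thread timings breakdown"):
            in_table = True          # header found; rows follow
        elif in_table:
            m = row.match(line)
            if m:
                name, tmin, tavg, tmax = m.groups()
                timings[name] = (float(tmin), float(tavg), float(tmax))
            elif timings:
                break                # past the last row; stop scanning
    return timings
</pre></div>
</div>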
<p>The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
histogram showing the distribution. The total number of histogram counts
is equal to the number of processors.</p>
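<p>One way to reproduce the printed counts (an assumption about the exact
binning, but it matches the example above) is to split the min-to-max range
into 10 equal-width bins and let each processor contribute one count. A small
sketch, using hypothetical per-processor Nlocal values whose spread matches
the second example above:</p>
<div class="highlight-python"><div class="highlight"><pre>def histogram10(values):
    """10 equal-width bins from min to max; one count per processor."""
    lo, hi = min(values), max(values)
    span = hi - lo
    bins = [0] * 10
    for v in values:
        # Map lo..hi onto bins 0..9; the maximum lands in the last bin.
        i = 9 if span == 0 else min(int((v - lo) / span * 10), 9)
        bins[i] += 1
    return bins

nlocal = [490, 501, 503, 508]   # hypothetical per-processor atom counts
print("Histogram:", *histogram10(nlocal))   # 1 0 0 0 0 0 1 1 0 1
</pre></div>
</div>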
<p>The last section gives aggregate statistics for pair-wise neighbors
and special neighbors that LAMMPS keeps track of (see the
<a class="reference internal" href="special_bonds.html"><em>special_bonds</em></a> command). The number of times
@ -1789,21 +1840,24 @@ takes place.</p>
<a class="reference internal" href="minimize.html"><em>minimize</em></a> command, additional information is printed,
e.g.</p>
<div class="highlight-python"><div class="highlight"><pre>Minimization stats:
E initial, next-to-last, final = -0.895962 -2.94193 -2.94342
Gradient 2-norm init/final= 1920.78 20.9992
Gradient inf-norm init/final= 304.283 9.61216
Iterations = 36
Force evaluations = 177
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-6372.3765206 -8328.46998942 -8328.46998942
Force two-norm initial, final = 1059.36 5.36874
Force max component initial, final = 58.6026 1.46872
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
Iterations, force evaluations = 701 1516
</pre></div>
</div>
<p>The first line lists the initial and final energy, as well as the
energy on the next-to-last iteration. The next 2 lines give a measure
of the gradient of the energy (force on all atoms). The 2-norm is the
“length” of this force vector; the inf-norm is the largest component.
The last 2 lines are statistics on how many iterations and
force-evaluations the minimizer required. Multiple force evaluations
are typically done at each iteration to perform a 1d line minimization
in the search direction.</p>
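<p>Both norms are straightforward to compute from the flattened force vector;
a short sketch with a small, hypothetical array:</p>
<div class="highlight-python"><div class="highlight"><pre>import math

# Hypothetical flattened force vector (fx1, fy1, fz1, fx2, ...).
forces = [58.6026, -3.2, 0.7, -12.9, 4.4, -0.05]

two_norm = math.sqrt(sum(f * f for f in forces))  # "length" of the vector
inf_norm = max(abs(f) for f in forces)            # largest single component

print("two-norm = %g, inf-norm = %g" % (two_norm, inf_norm))
</pre></div>
</div>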
<p>The first line prints the criterion that determined the minimization
to be completed. The third line lists the initial and final energy,
as well as the energy on the next-to-last iteration. The next 2 lines
give a measure of the gradient of the energy (force on all atoms).
The 2-norm is the “length” of this force vector; the inf-norm is the
largest component. Then come some details about the line search,
followed by statistics on how many iterations and force evaluations
the minimizer required. Multiple force evaluations are typically done
at each iteration to perform a 1d line minimization in the search
direction.</p>
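<p>The final line search values in the example above are mutually consistent:
multiplying the step size alpha by the largest force component gives the
printed maximum atom move:</p>
<div class="highlight-python"><div class="highlight"><pre>alpha = 2.7842e-10   # final line search alpha, from the output above
fmax  = 1.46872      # final force max component, from the output above

print("max atom move = %.4e" % (alpha * fmax))   # 4.0892e-10, as printed
</pre></div>
</div>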
<p>If a <a class="reference internal" href="kspace_style.html"><em>kspace_style</em></a> long-range Coulombics solve was
performed during the run (PPPM, Ewald), then additional information is
printed, e.g.</p>
@ -1745,36 +1745,92 @@ thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:

Loop time of 49.002 on 2 procs for 2004 atoms :pre
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
97.0% CPU use with 4 MPI tasks x no OpenMP threads
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s :pre

Pair time (%) = 35.0495 (71.5267)
Bond time (%) = 0.092046 (0.187841)
Kspce time (%) = 6.42073 (13.103)
Neigh time (%) = 2.73485 (5.5811)
Comm time (%) = 1.50291 (3.06703)
Outpt time (%) = 0.013799 (0.0281601)
Other time (%) = 2.13669 (4.36041) :pre
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
Other | | 0.007178 | | | 0.26 :pre

Nlocal: 1002 ave, 1015 max, 989 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Nghost: 8720 ave, 8724 max, 8716 min
Histogram: 1 0 0 0 0 0 0 0 0 1
Neighs: 354141 ave, 361422 max, 346860 min
Histogram: 1 0 0 0 0 0 0 0 0 1 :pre
Nlocal: 501 ave 508 max 490 min
Histogram: 1 0 0 0 0 0 1 1 0 1
Nghost: 6586.25 ave 6628 max 6548 min
Histogram: 1 0 1 0 0 0 1 0 0 1
Neighs: 177007 ave 180562 max 170212 min
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre

Total # of neighbors = 708282
Ave neighs/atom = 353.434
Total # of neighbors = 708028
Ave neighs/atom = 353.307
Ave special neighs/atom = 2.34032
Number of reneighborings = 42
Dangerous reneighborings = 2 :pre
Neighbor list builds = 26
Dangerous builds = 0 :pre

The first section gives the breakdown of the CPU run time (in seconds)
into major categories. The second section lists the number of owned
atoms (Nlocal), ghost atoms (Nghost), and pair-wise neighbors stored
per processor. The max and min values give the spread of these values
across processors with a 10-bin histogram showing the distribution.
The total number of histogram counts is equal to the number of
processors.
The first section provides a global loop timing summary. The loop time
is the total wall time for the section. The second line gives the CPU
utilization per MPI task; it should be close to 100% times the number
of OpenMP threads (or 1). Lower numbers correspond to delays due to
file I/O or insufficient thread utilization. The {Performance} line is
provided for convenience, to help predict how long a continued or longer
run would take and to compare performance with other similar MD codes.

The second section gives the breakdown of the CPU run time (in seconds)
into major categories:

{Pair} stands for all non-bonded force computation
{Bond} stands for bonded interactions: bonds, angles, dihedrals, impropers
{Kspace} stands for reciprocal space interactions: Ewald, PPPM, MSM
{Neigh} stands for neighbor list construction
{Comm} stands for communicating atoms and their properties
{Output} stands for writing dumps and thermo output
{Modify} stands for fixes and computes called by them
{Other} is the remaining time :ul

For each category, the minimum, average, and maximum wall time any
processor spent on this section is reported, along with the percent
variation from the average. Together, these numbers allow one to gauge
the load imbalance in this segment of the calculation. Ideally the
differences between minimum, maximum, and average are small, and thus
the variation from the average is close to zero. The final column shows
the percentage of the total loop time spent in this section.
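
A common single-number summary of this imbalance is the ratio of the
maximum time to the average time, which is 1.0 for a perfectly balanced
section. A minimal sketch (the per-rank times below are hypothetical):

def imbalance_ratio(times):
    "Return max/avg of the per-rank times; 1.0 means perfectly balanced."
    return max(times) / (sum(times) / len(times))

print(imbalance_ratio([1.9808, 2.0102, 2.0310, 2.0318]))  # ~1.009 :pre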

When using the "timers full"_timers.html setting, an additional column
is present that also prints the CPU utilization in percent. In addition,
when {timers full} and the "package omp"_package.html command are both
active, a similar timing summary of the time spent in threaded regions
is provided, to monitor thread utilization and load balance. A new entry is
the {Reduce} section, which lists the time spent reducing the per-thread
data elements into the storage for non-threaded computation. These thread
timings are taken from the first MPI rank only; since the load can differ
from MPI rank to MPI rank, this breakdown can look quite different on
other ranks. Here is an example output for this optional output section:

Thread timings breakdown (MPI rank 0):
Total threaded time 0.6846 / 90.6%
Section | min time | avg time | max time |%varavg| %total
---------------------------------------------------------------
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55 :pre

The third section lists the number of owned atoms (Nlocal), ghost atoms
(Nghost), and pair-wise neighbors stored per processor. The max and min
values give the spread of these values across processors with a 10-bin
histogram showing the distribution. The total number of histogram counts
is equal to the number of processors.

The last section gives aggregate statistics for pair-wise neighbors
and special neighbors that LAMMPS keeps track of (see the
@ -1794,20 +1850,23 @@ If an energy minimization was performed via the
e.g.

Minimization stats:
E initial, next-to-last, final = -0.895962 -2.94193 -2.94342
Gradient 2-norm init/final= 1920.78 20.9992
Gradient inf-norm init/final= 304.283 9.61216
Iterations = 36
Force evaluations = 177 :pre
Stopping criterion = linesearch alpha is zero
Energy initial, next-to-last, final =
-6372.3765206 -8328.46998942 -8328.46998942
Force two-norm initial, final = 1059.36 5.36874
Force max component initial, final = 58.6026 1.46872
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
Iterations, force evaluations = 701 1516 :pre

The first line lists the initial and final energy, as well as the
energy on the next-to-last iteration. The next 2 lines give a measure
of the gradient of the energy (force on all atoms). The 2-norm is the
"length" of this force vector; the inf-norm is the largest component.
The last 2 lines are statistics on how many iterations and
force-evaluations the minimizer required. Multiple force evaluations
are typically done at each iteration to perform a 1d line minimization
in the search direction.
The first line prints the criterion that determined the minimization
to be completed. The third line lists the initial and final energy,
as well as the energy on the next-to-last iteration. The next 2 lines
give a measure of the gradient of the energy (force on all atoms).
The 2-norm is the "length" of this force vector; the inf-norm is the
largest component. Then come some details about the line search,
followed by statistics on how many iterations and force evaluations
the minimizer required. Multiple force evaluations are typically done
at each iteration to perform a 1d line minimization in the search
direction.

If a "kspace_style"_kspace_style.html long-range Coulombics solve was
performed during the run (PPPM, Ewald), then additional information is