forked from lijiext/lammps
177 lines
8.0 KiB
Plaintext
177 lines
8.0 KiB
Plaintext
"Higher level section"_Run_head.html - "LAMMPS WWW Site"_lws - "LAMMPS
|
|
Documentation"_ld - "LAMMPS Commands"_lc :c
|
|
|
|
:link(lws,http://lammps.sandia.gov)
|
|
:link(ld,Manual.html)
|
|
:link(lc,Commands_all.html)
|
|
|
|
:line
|
|
|
|
Screen and logfile output :h3
|
|
|
|
As LAMMPS reads an input script, it prints information to both the
|
|
screen and a log file about significant actions it takes to setup a
|
|
simulation. When the simulation is ready to begin, LAMMPS performs
|
|
various initializations, and prints info about the run it is about to
|
|
perform, including the amount of memory (in MBytes per processor) that
|
|
the simulation requires. It also prints details of the initial
|
|
thermodynamic state of the system. During the run itself,
|
|
thermodynamic information is printed periodically, every few
|
|
timesteps. When the run concludes, LAMMPS prints the final
|
|
thermodynamic state and a total run time for the simulation. It also
|
|
appends statistics about the CPU time and storage requirements for the
|
|
simulation. An example set of statistics is shown here:
|
|
|
|
Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
|
|
|
|
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
|
|
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
|
|
|
|
MPI task timings breakdown:
|
|
Section | min time | avg time | max time |%varavg| %total
|
|
---------------------------------------------------------------
|
|
Pair | 1.9808 | 2.0134 | 2.0318 | 1.4 | 71.60
|
|
Bond | 0.0021894 | 0.0060319 | 0.010058 | 4.7 | 0.21
|
|
Kspace | 0.3207 | 0.3366 | 0.36616 | 3.1 | 11.97
|
|
Neigh | 0.28411 | 0.28464 | 0.28516 | 0.1 | 10.12
|
|
Comm | 0.075732 | 0.077018 | 0.07883 | 0.4 | 2.74
|
|
Output | 0.00030518 | 0.00042665 | 0.00078821 | 1.0 | 0.02
|
|
Modify | 0.086606 | 0.086631 | 0.086668 | 0.0 | 3.08
|
|
Other | | 0.007178 | | | 0.26 :pre
|
|
|
|
Nlocal: 501 ave 508 max 490 min
|
|
Histogram: 1 0 0 0 0 0 1 1 0 1
|
|
Nghost: 6586.25 ave 6628 max 6548 min
|
|
Histogram: 1 0 1 0 0 0 1 0 0 1
|
|
Neighs: 177007 ave 180562 max 170212 min
|
|
Histogram: 1 0 0 0 0 0 0 1 1 1 :pre
|
|
|
|
Total # of neighbors = 708028
|
|
Ave neighs/atom = 353.307
|
|
Ave special neighs/atom = 2.34032
|
|
Neighbor list builds = 26
|
|
Dangerous builds = 0 :pre
|
|
|
|
:line
|
|
|
|
The first section provides a global loop timing summary. The {loop
|
|
time} is the total wall-clock time for the simulation to run. The
|
|
{Performance} line is provided for convenience to help predict how
|
|
long it will take to run a desired physical simulation. The {CPU use}
|
|
line provides the CPU utilization per MPI task; it should be close to
|
|
100% times the number of OpenMP threads (or 1 of not using OpenMP).
|
|
Lower numbers correspond to delays due to file I/O or insufficient
|
|
thread utilization.
|
|
|
|
:line
|
|
|
|
The {MPI task} section gives the breakdown of the CPU run time (in
|
|
seconds) into major categories:
|
|
|
|
{Pair} = non-bonded force computations
|
|
{Bond} = bonded interactions: bonds, angles, dihedrals, impropers
|
|
{Kspace} = long-range interactions: Ewald, PPPM, MSM
|
|
{Neigh} = neighbor list construction
|
|
{Comm} = inter-processor communication of atoms and their properties
|
|
{Output} = output of thermodynamic info and dump files
|
|
{Modify} = fixes and computes invoked by fixes
|
|
{Other} = all the remaining time :ul
|
|
|
|
For each category, there is a breakdown of the least, average and most
|
|
amount of wall time any processor spent on this category of
|
|
computation. The "%varavg" is the percentage by which the max or min
|
|
varies from the average. This is an indication of load imbalance. A
|
|
percentage close to 0 is perfect load balance. A large percentage is
|
|
imbalance. The final "%total" column is the percentage of the total
|
|
loop time is spent in this category.
|
|
|
|
When using the "timer full"_timer.html setting, an additional column
|
|
is added that also prints the CPU utilization in percent. In addition,
|
|
when using {timer full} and the "package omp"_package.html command are
|
|
active, a similar timing summary of time spent in threaded regions to
|
|
monitor thread utilization and load balance is provided. A new {Thread
|
|
timings} section is also added, which lists the time spent in reducing
|
|
the per-thread data elements to the storage for non-threaded
|
|
computation. These thread timings are measured for the first MPI rank
|
|
only and thus, because the breakdown for MPI tasks can change from
|
|
MPI rank to MPI rank, this breakdown can be very different for
|
|
individual ranks. Here is an example output for this section:
|
|
|
|
Thread timings breakdown (MPI rank 0):
|
|
Total threaded time 0.6846 / 90.6%
|
|
Section | min time | avg time | max time |%varavg| %total
|
|
---------------------------------------------------------------
|
|
Pair | 0.5127 | 0.5147 | 0.5167 | 0.3 | 75.18
|
|
Bond | 0.0043139 | 0.0046779 | 0.0050418 | 0.5 | 0.68
|
|
Kspace | 0.070572 | 0.074541 | 0.07851 | 1.5 | 10.89
|
|
Neigh | 0.084778 | 0.086969 | 0.089161 | 0.7 | 12.70
|
|
Reduce | 0.0036485 | 0.003737 | 0.0038254 | 0.1 | 0.55 :pre
|
|
|
|
:line
|
|
|
|
The third section above lists the number of owned atoms (Nlocal),
|
|
ghost atoms (Nghost), and pair-wise neighbors stored per processor.
|
|
The max and min values give the spread of these values across
|
|
processors with a 10-bin histogram showing the distribution. The total
|
|
number of histogram counts is equal to the number of processors.
|
|
|
|
:line
|
|
|
|
The last section gives aggregate statistics (across all processors)
|
|
for pair-wise neighbors and special neighbors that LAMMPS keeps track
|
|
of (see the "special_bonds"_special_bonds.html command). The number
|
|
of times neighbor lists were rebuilt is tallied, as is the number of
|
|
potentially {dangerous} rebuilds. If atom movement triggered neighbor
|
|
list rebuilding (see the "neigh_modify"_neigh_modify.html command),
|
|
then dangerous reneighborings are those that were triggered on the
|
|
first timestep atom movement was checked for. If this count is
|
|
non-zero you may wish to reduce the delay factor to insure no force
|
|
interactions are missed by atoms moving beyond the neighbor skin
|
|
distance before a rebuild takes place.
|
|
|
|
:line
|
|
|
|
If an energy minimization was performed via the
|
|
"minimize"_minimize.html command, additional information is printed,
|
|
e.g.
|
|
|
|
Minimization stats:
|
|
Stopping criterion = linesearch alpha is zero
|
|
Energy initial, next-to-last, final =
|
|
-6372.3765206 -8328.46998942 -8328.46998942
|
|
Force two-norm initial, final = 1059.36 5.36874
|
|
Force max component initial, final = 58.6026 1.46872
|
|
Final line search alpha, max atom move = 2.7842e-10 4.0892e-10
|
|
Iterations, force evaluations = 701 1516 :pre
|
|
|
|
The first line prints the criterion that determined minimization was
|
|
converged. The next line lists the initial and final energy, as well
|
|
as the energy on the next-to-last iteration. The next 2 lines give a
|
|
measure of the gradient of the energy (force on all atoms). The
|
|
2-norm is the "length" of this 3N-component force vector; the largest
|
|
component (x, y, or z) of force (infinity-norm) is also given. Then
|
|
information is provided about the line search and statistics on how
|
|
many iterations and force-evaluations the minimizer required.
|
|
Multiple force evaluations are typically done at each iteration to
|
|
perform a 1d line minimization in the search direction. See the
|
|
"minimize"_minimize.html doc page for more details.
|
|
|
|
:line
|
|
|
|
If a "kspace_style"_kspace_style.html long-range Coulombics solver
|
|
that performs FFTs was used during the run (PPPM, Ewald), then
|
|
additional information is printed, e.g.
|
|
|
|
FFT time (% of Kspce) = 0.200313 (8.34477)
|
|
FFT Gflps 3d 1d-only = 2.31074 9.19989 :pre
|
|
|
|
The first line is the time spent doing 3d FFTs (several per timestep)
|
|
and the fraction it represents of the total KSpace time (listed
|
|
above). Each 3d FFT requires computation (3 sets of 1d FFTs) and
|
|
communication (transposes). The total flops performed is 5Nlog_2(N),
|
|
where N is the number of points in the 3d grid. The FFTs are timed
|
|
with and without the communication and a Gflop rate is computed. The
|
|
3d rate is with communication; the 1d rate is without (just the 1d
|
|
FFTs). Thus you can estimate what fraction of your FFT time was spent
|
|
in communication, roughly 75% in the example above.
|