formatting improvements and small corrections for timer settings and output discussions

This commit is contained in:
Axel Kohlmeyer 2017-01-10 23:47:14 -05:00
parent d014e00e53
commit 1d0e600ab7
2 changed files with 23 additions and 21 deletions


@@ -1727,7 +1727,7 @@ thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:
-Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
@@ -1757,14 +1757,14 @@ Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre
-The first section provides a global loop timing summary. The loop time
+The first section provides a global loop timing summary. The {loop time}
is the total wall time for the section. The {Performance} line is
provided for convenience to help predict the number of loop
-continuations required and for comparing performance with other
-similar MD codes. The CPU use line provides the CPU utilzation per
+continuations required and for comparing performance with other,
+similar MD codes. The {CPU use} line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
-threads (or 1). Lower numbers correspond to delays due to file I/O or
-insufficient thread utilization.
+threads (or 1 if no OpenMP). Lower numbers correspond to delays due
+to file I/O or insufficient thread utilization.
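As a quick sanity check, the {Performance} numbers follow directly from the loop time, the number of steps, and the timestep size (a 2 fs timestep is assumed below, since that is what the quoted numbers imply; it is not printed in this excerpt):

106.689 timesteps/s = 300 steps / 2.81192 s
18.436 ns/day = 106.689 steps/s x 2 fs/step x 86400 s/day x 1e-6 ns/fs
1.302 hours/ns = 24 h/day / 18.436 ns/day :pre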
The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
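For reference, a typical breakdown table looks as follows (the values are illustrative, chosen so that the average times sum to the loop time quoted above; runs with bonded or long-range styles also show {Bond} and {Kspace} rows):

Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 1.7808     | 1.8281     | 1.8674     |   2.3 | 65.01
Neigh   | 0.2110     | 0.2171     | 0.2316     |   1.6 |  7.72
Comm    | 0.5112     | 0.5300     | 0.5630     |   2.1 | 18.85
Output  | 0.0011     | 0.0012     | 0.0013     |   0.1 |  0.04
Modify  | 0.1930     | 0.1990     | 0.2110     |   1.2 |  7.08
Other   |            | 0.0365     |            |       |  1.30 :pre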
@@ -1791,7 +1791,7 @@ is present that also prints the CPU utilization in percent. In
addition, when {timer full} and the "package omp"_package.html
command are both active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
-new entry is the {Reduce} section, which lists the time spend in
+new entry is the {Reduce} section, which lists the time spent in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taken from the first MPI rank
only and thus, as the breakdown for MPI tasks can change from MPI
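A minimal input sketch that enables this per-thread summary (assuming LAMMPS was built with the USER-OMP package; the thread count is illustrative):

package omp 4     # 4 OpenMP threads per MPI task
suffix omp        # select the /omp accelerated styles
timer full        # also report CPU and thread utilization :pre

The same can be requested from the command line with "-sf omp -pk omp 4".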


@@ -33,14 +33,14 @@ timer loop :pre
Select the level of detail at which LAMMPS performs its CPU timings.
Multiple keywords can be specified with the {timer} command. For
keywords that are mutually exclusive, the last one specified takes
-effect.
+precedence.
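For example (the keyword combinations are illustrative):

timer normal nosync
timer loop full sync :pre

In the second line, {loop} and {full} are mutually exclusive, so the later {full} takes precedence and is combined with {sync}.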
During a simulation run LAMMPS collects information about how much
time is spent in different sections of the code and thus can provide
information for determining performance and load imbalance problems.
This can be done at different levels of detail and accuracy. For more
information about the timing output, see this "discussion of screen
output"_Section_start.html#start_8.
output in Section 2.8"_Section_start.html#start_8.
The {off} setting will turn all time measurements off. The {loop}
setting will only measure the total time for a run and not collect any
@@ -52,20 +52,22 @@ processors. The {full} setting adds information about CPU
utilization and thread utilization, when multi-threading is enabled.
With the {sync} setting, all MPI tasks are synchronized at each timer
-call which meaures load imbalance more accuractly, though it can also
-slow down the simulation. Using the {nosync} setting (which is the
-default) turns off this synchronization.
+call which measures load imbalance for each section more accurately,
+though it can also slow down the simulation by prohibiting overlapping
+independent computations on different MPI ranks. Using the {nosync}
+setting (which is the default) turns this synchronization off.
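One way to probe load imbalance is to time the same run both ways (a sketch; the step counts are illustrative):

timer normal nosync
run 1000
timer normal sync
run 1000 :pre

With {sync} the per-section times include the wait for the slowest rank, so imbalanced sections stand out in the breakdown instead of inflating the communication time.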
-With the {timeout} keyword a walltime limit can be imposed that
+With the {timeout} keyword a walltime limit can be imposed, which
affects the "run"_run.html and "minimize"_minimize.html commands.
-This can be convenient when runs have to confirm to time limits,
-e.g. when running under a batch system and you want to maximize
-the utilization of the batch time slot, especially when the time
-per timestep varies and is thus difficult to predict how many
-steps a simulation can perform, or for difficult to converge
-minimizations. The timeout {elapse} value should be somewhat smaller
-than the time requested from the batch system, as there is usually
-some overhead to launch jobs, and it may be advisable to write
+This can be convenient when calculations have to comply with execution
+time limits, e.g. when running under a batch system where you want to
+maximize the utilization of the batch time slot, especially for runs
+where the time per timestep varies a lot and it thus becomes difficult
+to predict how many steps a simulation can perform for a given walltime
+limit. This also applies to difficult-to-converge minimizations.
+The timeout {elapse} value should be somewhat smaller than the maximum
+wall time requested from the batch system, as there is usually
+some overhead to launch jobs, and it is advisable to write
out a restart after terminating a run due to a timeout.
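A sketch for a one-hour batch slot (the safety margin and intervals are illustrative choices):

timer timeout 0:58:00 every 100    # stop ~2 minutes early, check timer every 100 steps
restart 10000 tmp.restart          # write restarts so the run can be continued
run 10000000 upto :pre

The {every} keyword controls how often the timer is checked; checking less often reduces overhead but makes the stopping point less precise.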
The timeout timer starts when the command is issued. When the time