formatting improvements and small corrections for timer settings and output discussions
commit 1d0e600ab7 (parent d014e00e53)
@@ -1727,7 +1727,7 @@ thermodynamic state and a total run time for the simulation. It then
 appends statistics about the CPU time and storage requirements for the
 simulation. An example set of statistics is shown here:
 
-Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre
 
 Performance: 18.436 ns/day  1.302 hours/ns  106.689 timesteps/s
 97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
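
For orientation, the numbers on the {Performance} line follow directly from the loop time. A quick consistency check (the 2 fs timestep is inferred from these figures, it is not stated in the excerpt):

timesteps/s: 300 steps / 2.81192 s                            = 106.689
ns/day:      106.689 steps/s * 2 fs/step * 86400 s/day / 1e6  = 18.436
hours/ns:    24 hours/day / 18.436 ns/day                     = 1.302 :pre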
@@ -1757,14 +1757,14 @@ Ave special neighs/atom = 2.34032
 Neighbor list builds = 26
 Dangerous builds = 0 :pre
 
-The first section provides a global loop timing summary. The loop time
+The first section provides a global loop timing summary. The {loop time}
 is the total wall time for the section. The {Performance} line is
 provided for convenience to help predict the number of loop
-continuations required and for comparing performance with other
-similar MD codes. The CPU use line provides the CPU utilzation per
+continuations required and for comparing performance with other,
+similar MD codes. The {CPU use} line provides the CPU utilization per
 MPI task; it should be close to 100% times the number of OpenMP
-threads (or 1). Lower numbers correspond to delays due to file I/O or
-insufficient thread utilization.
+threads (or 1 if no OpenMP). Lower numbers correspond to delays due
+to file I/O or insufficient thread utilization.
 
 The MPI task section gives the breakdown of the CPU run time (in
 seconds) into major categories:
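
To make the {CPU use} expectation concrete, consider a hypothetical hybrid launch (the launcher line, binary name, and input file are illustrative, not from the source):

mpirun -np 4 lmp -sf omp -pk omp 2 -in in.melt :pre

With 2 OpenMP threads per MPI task, full utilization would be reported as close to 200%; the 97.0% in the example above corresponds to plain MPI (1 thread per task), with the small shortfall explained by the delays mentioned in the text.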
@@ -1791,7 +1791,7 @@ is present that also prints the CPU utilization in percent. In
 addition, when {timer full} and the "package omp"_package.html
 command are active, a similar timing summary of time spent in threaded
 regions to monitor thread utilization and load balance is provided. A
-new entry is the {Reduce} section, which lists the time spend in
+new entry is the {Reduce} section, which lists the time spent in
 reducing the per-thread data elements to the storage for non-threaded
 computation. These thread timings are taken from the first MPI rank
 only and thus, as the breakdown for MPI tasks can change from MPI

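A minimal sketch of how one might enable this threaded timing summary (the thread count is arbitrary; assumes a LAMMPS build with the USER-OMP package and that these lines appear near the top of the input script):

package omp 4    # 4 OpenMP threads per MPI task
suffix omp       # select the threaded style variants
timer full :pre
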
@@ -33,14 +33,14 @@ timer loop :pre
 Select the level of detail at which LAMMPS performs its CPU timings.
 Multiple keywords can be specified with the {timer} command. For
 keywords that are mutually exclusive, the last one specified takes
-effect.
+precedence.
 
 During a simulation run LAMMPS collects information about how much
 time is spent in different sections of the code and thus can provide
 information for determining performance and load imbalance problems.
 This can be done at different levels of detail and accuracy. For more
 information about the timing output, see this "discussion of screen
-output"_Section_start.html#start_8.
+output in Section 2.8"_Section_start.html#start_8.
 
 The {off} setting will turn all time measurements off. The {loop}
 setting will only measure the total time for a run and not collect any
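
As a usage sketch of the keyword semantics just described (a contrived combination for illustration):

timer loop full    # mutually exclusive levels: {full} takes precedence
timer full sync    # compatible keywords can be combined :pre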
@@ -52,20 +52,22 @@ processors. The {full} setting adds information about CPU
 utilization and thread utilization, when multi-threading is enabled.
 
 With the {sync} setting, all MPI tasks are synchronized at each timer
-call which meaures load imbalance more accuractly, though it can also
-slow down the simulation. Using the {nosync} setting (which is the
-default) turns off this synchronization.
+call, which measures load imbalance for each section more accurately,
+though it can also slow down the simulation by prohibiting overlapping
+independent computations on different MPI ranks. Using the {nosync}
+setting (which is the default) turns this synchronization off.
 
-With the {timeout} keyword a walltime limit can be imposed that
+With the {timeout} keyword, a walltime limit can be imposed that
 affects the "run"_run.html and "minimize"_minimize.html commands.
-This can be convenient when runs have to confirm to time limits,
-e.g. when running under a batch system and you want to maximize
-the utilization of the batch time slot, especially when the time
-per timestep varies and is thus difficult to predict how many
-steps a simulation can perform, or for difficult to converge
-minimizations. The timeout {elapse} value should be somewhat smaller
-than the time requested from the batch system, as there is usually
-some overhead to launch jobs, and it may be advisable to write
+This can be convenient when calculations have to comply with execution
+time limits, e.g. when running under a batch system and you want to
+maximize the utilization of the batch time slot, especially for runs
+where the time per timestep varies significantly and thus it becomes
+difficult to predict how many steps a simulation can perform for a
+given walltime limit. This also applies to difficult-to-converge
+minimizations. The timeout {elapse} value should be somewhat smaller
+than the maximum wall time requested from the batch system, as there
+is usually some overhead to launch jobs, and it is advisable to write
 out a restart after terminating a run due to a timeout.
 
 The timeout timer starts when the command is issued. When the time
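
A sketch of the batch-slot pattern described above, assuming a 12-hour queue limit (the elapse value, check interval, and restart file name are placeholders):

timer timeout 11:30:00 every 100
run 10000000
write_restart restart.timeout :pre

If the limit is reached, the run stops cleanly at the next check and the script continues, so the "write_restart"_write_restart.html command can save the current state for resumption in the next batch slot.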