formatting improvements and small corrections for timer settings and output discussions
commit 1d0e600ab7 (parent d014e00e53)
@@ -1727,7 +1727,7 @@ thermodynamic state and a total run time for the simulation. It then
appends statistics about the CPU time and storage requirements for the
simulation. An example set of statistics is shown here:

-Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms
+Loop time of 2.81192 on 4 procs for 300 steps with 2004 atoms :pre

Performance: 18.436 ns/day 1.302 hours/ns 106.689 timesteps/s
97.0% CPU use with 4 MPI tasks x no OpenMP threads :pre
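The three {Performance} figures above are consistent with the loop time and with each other; as a cross-check (assuming the 2 fs timestep implied by these numbers, which the excerpt itself does not state):

timesteps/s = 300 steps / 2.81192 s = 106.689
ns/day = 106.689 steps/s x 86400 s/day x 2.0e-6 ns/step = 18.436
hours/ns = 24 h/day / 18.436 ns/day = 1.302 :pre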
@@ -1757,14 +1757,14 @@ Ave special neighs/atom = 2.34032
Neighbor list builds = 26
Dangerous builds = 0 :pre

-The first section provides a global loop timing summary. The loop time
+The first section provides a global loop timing summary. The {loop time}
is the total wall time for the section. The {Performance} line is
provided for convenience to help predict the number of loop
-continuations required and for comparing performance with other
-similar MD codes. The CPU use line provides the CPU utilzation per
+continuations required and for comparing performance with other,
+similar MD codes. The {CPU use} line provides the CPU utilization per
MPI task; it should be close to 100% times the number of OpenMP
-threads (or 1). Lower numbers correspond to delays due to file I/O or
-insufficient thread utilization.
+threads (or 1 if no OpenMP). Lower numbers correspond to delays due
+to file I/O or insufficient thread utilization.

The MPI task section gives the breakdown of the CPU run time (in
seconds) into major categories:
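To relate the {CPU use} line to a concrete launch configuration, a sketch (the executable name, thread count, and input file are placeholders, not taken from the text above):

# 4 MPI tasks x 2 OpenMP threads each: CPU use should be close to 200% per MPI task
mpirun -np 4 lmp_mpi -sf omp -pk omp 2 -in in.melt
# 4 MPI tasks without OpenMP threads: CPU use should be close to 100% per MPI task
mpirun -np 4 lmp_mpi -in in.melt :pre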
@@ -1791,7 +1791,7 @@ is present that also prints the CPU utilization in percent. In
addition, when the {timer full} setting and the "package omp"_package.html
command are active, a similar timing summary of time spent in threaded
regions to monitor thread utilization and load balance is provided. A
-new entry is the {Reduce} section, which lists the time spend in
+new entry is the {Reduce} section, which lists the time spent in
reducing the per-thread data elements to the storage for non-threaded
computation. These thread timings are taken from the first MPI rank
only and thus, as the breakdown for MPI tasks can change from MPI
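A minimal input fragment that enables this per-thread timing summary could look as follows; the thread count and run length are placeholders, and it assumes a LAMMPS binary built with the USER-OMP package (threaded styles would additionally be selected via the /omp suffix or the -sf omp command-line switch):

package omp 2    # request 2 OpenMP threads per MPI task
timer full       # adds CPU utilization and the per-thread/Reduce timing sections
run 1000 :pre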
@@ -33,14 +33,14 @@ timer loop :pre
Select the level of detail at which LAMMPS performs its CPU timings.
Multiple keywords can be specified with the {timer} command. For
keywords that are mutually exclusive, the last one specified takes
-effect.
+precedence.

During a simulation run LAMMPS collects information about how much
time is spent in different sections of the code and thus can provide
information for determining performance and load imbalance problems.
This can be done at different levels of detail and accuracy. For more
information about the timing output, see this "discussion of screen
-output"_Section_start.html#start_8.
+output in Section 2.8"_Section_start.html#start_8.

The {off} setting will turn all time measurements off. The {loop}
setting will only measure the total time for a run and not collect any
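As an illustration of the keyword handling described above (a sketch, not part of the patch):

timer loop full sync :pre

Here {loop} and {full} are mutually exclusive detail levels, so the later {full} takes precedence, while {sync} is an independent setting and is applied in addition.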
@@ -52,20 +52,22 @@ processors. The {full} setting adds information about CPU
utilization and thread utilization, when multi-threading is enabled.

With the {sync} setting, all MPI tasks are synchronized at each timer
-call which meaures load imbalance more accuractly, though it can also
-slow down the simulation. Using the {nosync} setting (which is the
-default) turns off this synchronization.
+call, which measures load imbalance for each section more accurately,
+though it can also slow down the simulation by prohibiting overlapping
+independent computations on different MPI ranks. Using the {nosync}
+setting (which is the default) turns this synchronization off.

-With the {timeout} keyword a walltime limit can be imposed that
+With the {timeout} keyword, a walltime limit can be imposed that
affects the "run"_run.html and "minimize"_minimize.html commands.
-This can be convenient when runs have to confirm to time limits,
-e.g. when running under a batch system and you want to maximize
-the utilization of the batch time slot, especially when the time
-per timestep varies and is thus difficult to predict how many
-steps a simulation can perform, or for difficult to converge
-minimizations. The timeout {elapse} value should be somewhat smaller
-than the time requested from the batch system, as there is usually
-some overhead to launch jobs, and it may be advisable to write
+This can be convenient when calculations have to comply with execution
+time limits, e.g. when running under a batch system and you want to
+maximize the utilization of the batch time slot, especially for runs
+where the time per timestep varies a lot and thus it becomes difficult
+to predict how many steps a simulation can perform for a given walltime
+limit. This also applies to difficult-to-converge minimizations.
+The timeout {elapse} value should be somewhat smaller than the maximum
+wall time requested from the batch system, as there is usually
+some overhead to launch jobs, and it is advisable to write
out a restart after terminating a run due to a timeout.

The timeout timer starts when the command is issued. When the time
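A short usage sketch combining the {sync} and {timeout} settings discussed above; the detail level, wall-time limit, check interval, and file name are placeholders:

timer full sync                    # per-section timings, synchronized to expose load imbalance
timer timeout 3:30:00 every 100    # stop cleanly before e.g. a 4-hour batch slot expires, checking every 100 steps
run 10000000
write_restart tmp.restart :pre

Writing a restart after the run makes it straightforward to continue from wherever the timeout stopped the simulation.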