LAMMPS benchmark problems

This directory contains 5 benchmark problems which are discussed in
the Benchmark section of the LAMMPS documentation, and on the
Benchmark page of the LAMMPS WWW site (lammps.sandia.gov/bench).

This directory also has several sub-directories:

FERMI       benchmark scripts for a desktop machine with Fermi GPUs (Tesla)
KEPLER      benchmark scripts for a GPU cluster with Kepler GPUs
POTENTIALS  benchmark scripts for various potentials in LAMMPS

The results for all of these benchmarks are displayed and discussed on
the Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.

The remainder of this file describes the 5 problems in the top level
of this directory and how to run them on CPUs, either in serial or in
parallel. The sub-directories have their own README files, which you
should refer to before running those scripts.

----------------------------------------------------------------------

Each of the 5 problems has 32,000 atoms and runs for 100 timesteps.
Each can be run as a serial benchmark (on one processor) or in
parallel. In parallel, each benchmark can be run as a fixed-size or
scaled-size problem. For fixed-size benchmarking, the same 32K atom
problem is run on various numbers of processors. For scaled-size
benchmarking, the model size is increased with the number of
processors. E.g. on 8 processors, a 256K-atom problem is run; on 1024
processors, a 32-million atom problem is run, etc.
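
As a quick check of the scaled-size arithmetic (the processor grids
below are only illustrative), the expected atom count is just 32,000
times the number of processors:

# expected scaled-size atom count for a Px x Py x Pz processor grid
echo $(( 2 * 2 * 2 * 32000 ))    # 8 procs    -> 256000 atoms
echo $(( 8 * 8 * 16 * 32000 ))   # 1024 procs -> 32768000 atoms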

A few sample log file outputs on different machines and different
numbers of processors are included in this directory for you to
compare your answers to. E.g. a log file like
log.date.chain.lmp.scaled.foo.P is for a scaled-size version of the
Chain benchmark, run on P processors of machine "foo" with the LAMMPS
version of that date. Note that the EAM and LJ benchmarks may not give
identical answers on different machines because of the "velocity loop
geom" option, which assigns velocities based on atom coordinates - see
the discussion of the velocity command in the documentation for
details.

The CPU time (in seconds) for the run is in the "Loop time" line
of the log files, e.g.

Loop time of 3.89418 on 8 procs for 100 steps with 32000 atoms
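
If you have produced several log files and want to collect all of
these timings at once, a simple grep works (a sketch only; adjust the
file names to whatever your runs produced):

grep "Loop time" log.*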

Timing results for these problems run on various machines are listed
on the Benchmark page of the LAMMPS WWW site.

----------------------------------------------------------------------

These are the 5 benchmark problems:

LJ = atomic fluid, Lennard-Jones potential with 2.5 sigma cutoff (55
neighbors per atom), NVE integration

Chain = bead-spring polymer melt of 100-mer chains, FENE bonds and LJ
pairwise interactions with a 2^(1/6) sigma cutoff (5 neighbors per
atom), NVE integration

EAM = metallic solid, Cu EAM potential with 4.95 Angstrom cutoff (45
neighbors per atom), NVE integration

Chute = granular chute flow, frictional history potential with 1.1
sigma cutoff (7 neighbors per atom), NVE integration

Rhodo = rhodopsin protein in solvated lipid bilayer, CHARMM force
field with a 10 Angstrom LJ cutoff (440 neighbors per atom),
particle-particle particle-mesh (PPPM) for long-range Coulombics, NPT
integration

----------------------------------------------------------------------

Here is a src/Make.py command which will perform a parallel build of a
LAMMPS executable "lmp_mpi" with all the packages needed by all the
examples. This assumes you have an MPI library installed on your
machine so that "mpicxx" can be used as the wrapper compiler, and that
you have an Intel compiler to use as the base compiler. You can leave
off the "-cc mpi wrap=icc" switch if that is not the case. You can
also leave off the "-fft fftw3" switch if you do not have FFTW (v3)
installed as an FFT library, in which case the default KISS FFT
library will be used.

cd src
Make.py -j 16 -p none molecule manybody kspace granular orig \
        -cc mpi wrap=icc -fft fftw3 -a file mpi
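
If your LAMMPS version does not ship the src/Make.py tool, a roughly
equivalent build with the traditional make procedure installs the same
packages by hand (a sketch only; FFT and compiler settings go in the
machine Makefile for your system, e.g. src/MAKE/Makefile.mpi):

cd src
make yes-molecule yes-manybody yes-kspace yes-granular   # install needed packages
make mpi                                                 # build the lmp_mpi executable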

----------------------------------------------------------------------

Here is how to run each problem, assuming the LAMMPS executable is
named lmp_mpi, and you are using the mpirun command to launch parallel
runs:

Serial runs (one processor):

lmp_mpi < in.lj
lmp_mpi < in.chain
lmp_mpi < in.eam
lmp_mpi < in.chute
lmp_mpi < in.rhodo

Parallel fixed-size runs (on 8 procs in this case):

mpirun -np 8 lmp_mpi < in.lj
mpirun -np 8 lmp_mpi < in.chain
mpirun -np 8 lmp_mpi < in.eam
mpirun -np 8 lmp_mpi < in.chute
mpirun -np 8 lmp_mpi < in.rhodo

Parallel scaled-size runs (on 16 procs in this case):

mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.lj
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.chain.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.eam
mpirun -np 16 lmp_mpi -var x 4 -var y 4 < in.chute.scaled
mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 < in.rhodo.scaled

For each of the scaled-size runs you must set 3 variables as -var
command-line switches. The variables x,y,z are used in the input
scripts to scale up the problem size in each dimension. Imagine the P
processors arrayed as a 3d grid, so that P = Px * Py * Pz. For P =
16, you might use Px = 2, Py = 2, Pz = 4. To scale up equally in all
dimensions you roughly want Px = Py = Pz. Using the var switches, set
x = Px, y = Py, and z = Pz.
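
For example, on 64 processors (an illustrative count, not one of the
runs above) a roughly cubic grid of Px = Py = Pz = 4 gives a
64 * 32K = ~2-million atom LJ problem:

mpirun -np 64 lmp_mpi -var x 4 -var y 4 -var z 4 < in.lj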

For Chute runs, you must have Pz = 1. Therefore P = Px * Py and you
only need to set variables x and y.