These are build and input and run scripts used to run the LJ benchmark
in the top-level bench directory using all the various accelerator
packages currently available in LAMMPS.  The results of running these
benchmarks on a GPU cluster with Kepler GPUs are shown in the "GPU
(Kepler)" section of the Benchmark page of the LAMMPS WWW site:
lammps.sandia.gov/bench.
The specifics of the benchmark machine are as follows:
It is a small GPU cluster at Sandia National Labs called "shannon".  It
has 32 nodes, each with two 8-core Sandy Bridge Xeon CPUs (E5-2670,
2.6 GHz, HT deactivated), for a total of 512 cores.  Twenty-four of the
nodes have two NVIDIA Kepler GPUs (K20x, 2688 cores at 732 MHz).  LAMMPS
was compiled with the Intel icc compiler, using module
openmpi/1.8.1/intel/13.1.SP1.106/cuda/6.0.37.

------------------------------------------------------------------------

You can of course build LAMMPS yourself with any of the accelerator
packages for your platform.

The build.py script will build LAMMPS for the various accelerator
packages using the Makefile.* files in this dir, which you can edit if
necessary for your platform.  You must set the "lmpdir" variable at
the top of build.py to the home directory of LAMMPS as installed on
your system.  Then typing, for example,

The "xyz" settings determine the problem size. The "t" setting
|
||||
determines the number of timesteps.
|
||||
python build.py cpu gpu

will build executables for the CPU (no accelerators) and 3 GPU
variants (double, mixed, single precision).  See the list
of possible targets at the top of the build.py script.

Note that the build.py script will un-install all packages in LAMMPS,
then install only the ones needed for the benchmark.  The Makefile.*
files in this dir are copied into lammps/src/MAKE as a dummy
Makefile.foo, so they will not conflict with makefiles that may
already be there.  The build.py script also builds the auxiliary
GPU and USER-CUDA libraries as needed.

The various LAMMPS executables are copied into this directory
when the build.py script finishes each build.

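In outline, each build step behaves roughly like the following sketch
(a simplification; the function, target, and executable names here are
illustrative, and the actual logic, including building the auxiliary
GPU and USER-CUDA libraries, lives in build.py itself):

import os, shutil, subprocess

lmpdir = "/home/user/lammps"   # hypothetical path; set to your LAMMPS home dir
srcdir = os.path.join(lmpdir, "src")

def build(target, packages, makefile):
    # uninstall all packages, then install only what this target needs
    subprocess.check_call(["make", "no-all"], cwd=srcdir)
    for pkg in packages:
        subprocess.check_call(["make", "yes-" + pkg], cwd=srcdir)
    # copy the local Makefile in under a dummy name, so makefiles
    # already in lammps/src/MAKE are not clobbered
    shutil.copy("Makefile." + makefile,
                os.path.join(srcdir, "MAKE", "Makefile.foo"))
    subprocess.check_call(["make", "foo"], cwd=srcdir)
    # keep the finished executable alongside the benchmark scripts
    shutil.copy(os.path.join(srcdir, "lmp_foo"), "lmp_" + target)

build("cpu", [], "cpu")   # hypothetical invocation for a plain CPU build
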
------------------------------------------------------------------------

The in.* files have settings for the benchmark appropriate to each
accelerator package.  Many of them, including the problem size
and number of timesteps, must be set as command-line arguments
when the input script is run.
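
For example, a CPU-only LJ run might look like this (the executable
name is illustrative; the "x", "y", "z" variables set the problem size
and "t" the number of timesteps):

mpirun -np 12 ./lmp_cpu -v x 16 -v y 16 -v z 16 -v t 100 < in.lj.cpu
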
------------------------------------------------------------------------

The run*.sh scripts have sample mpirun commands for running the input
scripts on a single node.  These are provided for illustration
purposes, to show what command-line arguments are used with each
accelerator package, in combination with settings in the input scripts
themselves.
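
For example (executable names are again illustrative; "g" sets the
number of GPUs per compute node, and with the GPU package the number
of MPI tasks per node can exceed the number of GPUs, while the
USER-CUDA package requires one MPI task per GPU):

mpirun -np 12 ./lmp_single -sf gpu -v g 1 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj.gpu
mpirun -np 2 ./lmp_double -c on -sf cuda -v g 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam.cuda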

Note that we generate these run scripts, either for interactive or
batch submission, via Python scripts which produce a long list of runs
to exercise a combination of options.  To perform a quick benchmark
calculation on your platform, you will typically only want to run a
few commands out of any run*.sh script.
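
A minimal sketch of such a generator (the option lists here are
hypothetical; the real scripts sweep the full set of executables,
task counts, and inputs for each accelerator package):

import itertools

executables = ["./lmp_cpu", "./lmp_single", "./lmp_double"]
ntasks = [1, 2, 4, 8, 16]   # MPI tasks per node to try
sizes = [8, 16, 32]         # problem size via the x/y/z variables

with open("run.sh", "w") as f:
    for exe, np, s in itertools.product(executables, ntasks, sizes):
        f.write("mpirun -np %d %s -v x %d -v y %d -v z %d -v t 100 < in.lj.cpu\n"
                % (np, exe, s, s, s))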