git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12394 f3b2605a-c512-4ea7-a41b-209d697bcdaa

This commit is contained in:
sjplimp 2014-09-04 15:39:04 +00:00
parent f3e11380a7
commit 0c0e5c356f
1 changed file with 49 additions and 54 deletions


@@ -1,68 +1,63 @@
These are input scripts used to run versions of several of the
benchmarks in the top-level bench directory using the GPU and
USER-CUDA accelerator packages. The results of running these scripts
on two different machines (a desktop with 2 Tesla GPUs and the ORNL
Titan supercomputer) are shown on the "GPU (Fermi)" section of the
Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
These are build, input, and run scripts used to run the LJ benchmark
in the top-level bench directory using all the various accelerator
packages currently available in LAMMPS. The results of running these
benchmarks on a GPU cluster with Kepler GPUs are shown on the "GPU
(Kepler)" section of the Benchmark page of the LAMMPS WWW site:
lammps.sandia.gov/bench.
Examples are shown below of how to run these scripts. This assumes
you have built 3 executables with both the GPU and USER-CUDA packages
installed, e.g.
The specifics of the benchmark machine are as follows:
lmp_linux_single
lmp_linux_mixed
lmp_linux_double
The precision (single, mixed, double) refers to the GPU and USER-CUDA
package precision. See the README files in the lib/gpu and lib/cuda
directories for instructions on how to build the packages with
different precisions. The doc/Section_accelerate.html file also has a
summary description.
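As a rough sketch only (Makefile names and precision flags vary by platform
and LAMMPS version; the lib/gpu and lib/cuda READMEs are authoritative), a
build of one precision might look like:
cd lib/gpu && make -f Makefile.linux   # GPU library; precision chosen via CUDA_PRECISION in the Makefile
cd ../cuda && make                     # USER-CUDA library; precision set as described in its README
cd ../../src
make yes-gpu yes-user-cuda             # install both accelerator packages
make linux                             # repeat per precision, renaming the executable each time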
It is a small GPU cluster at Sandia National Labs called "shannon". It
has 32 nodes, each with two 8-core Sandy Bridge Xeon CPUs (E5-2670,
2.6GHz, HT deactivated), for a total of 512 cores. Twenty-four of the
nodes have two NVIDIA Kepler GPUs (K20x, 2688 cores at 732 MHz). LAMMPS
was compiled with the Intel icc compiler, using module
openmpi/1.8.1/intel/13.1.SP1.106/cuda/6.0.37.
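For reference, loading that toolchain on such a cluster typically amounts to
a single module command (exact module names vary from system to system):
module load openmpi/1.8.1/intel/13.1.SP1.106/cuda/6.0.37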
------------------------------------------------------------------------
If the script has "cpu" in its name, it is meant to be run in CPU-only
mode (without using the GPU or USER-CUDA styles). For example:
You can of course build LAMMPS yourself with any of the accelerator
packages for your platform.
mpirun -np 1 ../lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj.cpu
mpirun -np 12 ../lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.lj.cpu
The build.py script will build LAMMPS for the various accelerator
packages using the Makefile.* files in this dir, which you can edit if
necessary for your platform. You must set the "lmpdir" variable at
the top of build.py to the home directory of LAMMPS as installed on
your system. Then typing, for example,
The "xyz" settings determine the problem size. The "t" setting
determines the number of timesteps.
python build.py cpu gpu
will build executables for the CPU (no accelerators), and 3 GPU
variants (double, mixed, single precision). See the list
of possible targets at the top of the build.py script.
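The lmpdir setting mentioned above is a plain Python assignment at the top
of build.py; as a hypothetical illustration:
lmpdir = "/home/me/lammps"    # hypothetical path; point it at your LAMMPS home directory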
Note that the build.py script will un-install all packages in LAMMPS,
then only install the ones needed for the benchmark. The Makefile.*
files in this dir are copied into lammps/src/MAKE, as a dummy
Makefile.foo, so they will not conflict with makefiles that may
already be there. The build.py script also builds the auxiliary
GPU and USER-CUDA library as needed.
The various LAMMPS executables are copied into this directory
when the build.py script finishes each build.
------------------------------------------------------------------------
If the script has "gpu" in its name, it is meant to be run using
the GPU package. For example:
mpirun -np 12 ../lmp_linux_single -sf gpu -v g 1 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj.gpu
mpirun -np 8 ../lmp_linux_mixed -sf gpu -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj.gpu
The "xyz" settings determine the problem size. The "t" setting
determines the number of timesteps. The "np" setting determines how
many MPI tasks per compute node the problem will run on, and the "g"
setting determines how many GPUs per compute node the problem will run
on, i.e. 1 or 2 in this case. Note that you can use more MPI tasks
than GPUs (both per compute node) with the GPU package.
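For instance, on a 16-core node with 2 GPUs, an oversubscribed run (a
hypothetical variation on the commands above) would look like:
mpirun -np 16 ../lmp_linux_mixed -sf gpu -v g 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.lj.gpu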
The in.* files have settings for the benchmark appropriate to each
accelerator package. Many of them, including the problem size
and the number of timesteps, must be set as command-line arguments
when the input script is run.
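As a rough sketch only (the actual in.* files differ in their details), those
command-line variables are consumed with standard LAMMPS variable syntax, e.g.:
variable  xx equal 20*$x                     # -v x 32 scales the box in x
variable  yy equal 20*$y
variable  zz equal 20*$z
region    box block 0 ${xx} 0 ${yy} 0 ${zz}  # problem size from x,y,z
run       $t                                 # -v t 100 sets the number of timesteps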
------------------------------------------------------------------------
If the script has "cuda" in its name, it is meant to be run using
the USER-CUDA package. For example:
The run*.sh scripts have sample mpirun commands for running the input
scripts on a single node. These are provided for illustration
purposes, to show what command-line arguments are used with each
accelerator package, in combination with settings in the input scripts
themselves.
mpirun -np 1 ../lmp_linux_single -c on -sf cuda -v g 1 -v x 16 -v y 16 -v z 16 -v t 100 < in.lj.cuda
mpirun -np 2 ../lmp_linux_double -c on -sf cuda -v g 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam.cuda
The "xyz" settings determine the problem size. The "t" setting
determines the number of timesteps. The "np" setting determines how
many MPI tasks per compute node the problem will run on, and the "g"
setting determines how many GPUs per compute node the problem will run
on, i.e. 1 or 2 in this case. For the USER-CUDA package, the number
of MPI tasks and GPUs (both per compute node) must be equal.
------------------------------------------------------------------------
If the script has "titan" in its name, it was run on the Titan supercomputer
at ORNL.
Note that we generate these run scripts, either for interactive or
batch submission, via Python scripts which produce a long list of runs
to exercise a combination of options. To perform a quick benchmark
calculation on your platform, you will typically only want to run a
few commands out of any run*.sh script.
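For a quick test it is usually enough to copy one or two mpirun lines out of
a run*.sh file by hand, e.g. (script name hypothetical):
grep mpirun run_gpu.sh | head -2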