2014-09-04 22:28:10 +08:00
|
|
|
These are input scripts used to run versions of several of the
|
|
|
|
benchmarks in the top-level bench directory using the GPU and
|
|
|
|
USER-CUDA accelerator packages. The results of running these scripts
|
|
|
|
on two different machines (a desktop with 2 Tesla GPUs and the ORNL
|
|
|
|
Titan supercomputer) are shown on the "GPU (Fermi)" section of the
|
|
|
|
Benchmark page of the LAMMPS WWW site: lammps.sandia.gov/bench.
|
2012-07-04 06:16:40 +08:00
|
|
|
|
|
|
|
Examples are shown below of how to run these scripts. This assumes
|
|
|
|
you have built 3 executables with both the GPU and USER-CUDA packages
|
|
|
|
installed, e.g.
|
|
|
|
|
|
|
|
lmp_linux_single
|
|
|
|
lmp_linux_mixed
|
|
|
|
lmp_linux_double
|
|
|
|
|
|
|
|
The precision (single, mixed, double) refers to the GPU and USER-CUDA
|
|
|
|
pacakge precision. See the README files in the lib/gpu and lib/cuda
|
|
|
|
directories for instructions on how to build the packages with
|
2014-09-12 00:00:27 +08:00
|
|
|
different precisions. The GPU and USER-CUDA sub-sections of the
|
|
|
|
doc/Section_accelerate.html file also describes this process.
|
2012-07-04 06:16:40 +08:00
|
|
|
|
2015-01-10 00:16:10 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all orig -m linux -o cpu -a exe
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all opt orig -m linux -o opt -a exe
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all omp orig -m linux -o omp -a exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-gpu mode=double arch=20 -o gpu_double -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-gpu mode=mixed arch=20 -o gpu_mixed -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all gpu orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-gpu mode=single arch=20 -o gpu_single -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-cuda mode=double arch=20 -o cuda_double -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-cuda mode=mixed arch=20 -o cuda_mixed -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all cuda orig -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-cuda mode=single arch=20 -o cuda_single -a libs exe
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all intel orig -m linux -o intel_cpu -a exe
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all kokkos orig -m linux -o kokkos_omp -a exe
|
2014-10-07 23:24:30 +08:00
|
|
|
Make.py -d ~/lammps -j 16 -p #all kokkos orig -kokkos cuda arch=20 \
|
2015-01-10 00:16:10 +08:00
|
|
|
-m cuda -o kokkos_cuda -a exe
|
2014-10-07 23:24:30 +08:00
|
|
|
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
|
|
|
|
-gpu mode=double arch=20 -cuda mode=double arch=20 -m linux \
|
2015-01-10 00:16:10 +08:00
|
|
|
-o all -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
|
|
|
|
Make.py -d ~/lammps -j 16 -p #all opt omp gpu cuda intel kokkos orig \
|
|
|
|
-kokkos cuda arch=20 -gpu mode=double arch=20 \
|
2015-01-10 00:16:10 +08:00
|
|
|
-cuda mode=double arch=20 -m cuda -o all_cuda -a libs exe
|
2014-10-07 23:24:30 +08:00
|
|
|
|
2012-07-04 06:16:40 +08:00
|
|
|
------------------------------------------------------------------------
|
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
To run on just CPUs (without using the GPU or USER-CUDA styles),
|
|
|
|
do something like the following:
|
2012-07-04 06:16:40 +08:00
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
mpirun -np 1 lmp_linux_double -v x 8 -v y 8 -v z 8 -v t 100 < in.lj
|
2014-09-12 00:03:27 +08:00
|
|
|
mpirun -np 12 lmp_linux_double -v x 16 -v y 16 -v z 16 -v t 100 < in.eam
|
2012-07-04 06:16:40 +08:00
|
|
|
|
|
|
|
The "xyz" settings determine the problem size. The "t" setting
|
|
|
|
determines the number of timesteps.
|
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
These mpirun commands run on a single node. To run on multiple
|
|
|
|
nodes, scale up the "-np" setting.
|
2012-07-04 06:16:40 +08:00
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
------------------------------------------------------------------------
|
2012-07-04 06:16:40 +08:00
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
To run with the GPU package, do something like the following:
|
2012-07-04 06:16:40 +08:00
|
|
|
|
2014-09-12 00:03:27 +08:00
|
|
|
mpirun -np 12 lmp_linux_single -sf gpu -v x 32 -v y 32 -v z 64 -v t 100 < in.lj
|
|
|
|
mpirun -np 8 lmp_linux_mixed -sf gpu -pk gpu 2 -v x 32 -v y 32 -v z 64 -v t 100 < in.eam
|
2012-07-04 06:16:40 +08:00
|
|
|
|
|
|
|
The "xyz" settings determine the problem size. The "t" setting
|
|
|
|
determines the number of timesteps. The "np" setting determines how
|
2014-09-12 00:03:27 +08:00
|
|
|
many MPI tasks (per node) the problem will run on. The numeric
|
|
|
|
argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
|
|
|
|
is the default. Note that you can use more MPI tasks than GPUs (per
|
|
|
|
node) with the GPU package.
|
2014-09-12 00:00:27 +08:00
|
|
|
|
2014-09-12 00:03:27 +08:00
|
|
|
These mpirun commands run on a single node. To run on multiple nodes,
|
|
|
|
scale up the "-np" setting, and control the number of MPI tasks per
|
|
|
|
node via a "-ppn" setting.
|
2012-07-04 06:16:40 +08:00
|
|
|
|
|
|
|
------------------------------------------------------------------------
|
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
To run with the USER-CUDA package, do something like the following:
|
|
|
|
|
2014-09-12 00:03:27 +08:00
|
|
|
mpirun -np 1 lmp_linux_single -c on -sf cuda -v x 16 -v y 16 -v z 16 -v t 100 < in.lj
|
|
|
|
mpirun -np 2 lmp_linux_double -c on -sf cuda -pk cuda 2 -v x 32 -v y 64 -v z 64 -v t 100 < in.eam
|
2012-07-04 06:16:40 +08:00
|
|
|
|
|
|
|
The "xyz" settings determine the problem size. The "t" setting
|
|
|
|
determines the number of timesteps. The "np" setting determines how
|
2014-09-12 00:03:27 +08:00
|
|
|
many MPI tasks (per node) the problem will run on. The numeric
|
|
|
|
argument to the "-pk" setting is the number of GPUs (per node); 1 GPU
|
|
|
|
is the default. Note that the number of MPI tasks must equal the
|
|
|
|
number of GPUs (both per node) with the USER-CUDA package.
|
|
|
|
|
|
|
|
These mpirun commands run on a single node. To run on multiple nodes,
|
|
|
|
scale up the "-np" setting, and control the number of MPI tasks per
|
|
|
|
node via a "-ppn" setting.
|
2014-09-12 00:00:27 +08:00
|
|
|
|
2014-09-04 22:28:10 +08:00
|
|
|
------------------------------------------------------------------------
|
|
|
|
|
2014-09-12 00:00:27 +08:00
|
|
|
If the script has "titan" in its name, it was run on the Titan
|
|
|
|
supercomputer at ORNL.
|