<H4><A NAME = "2_8"></A>2.8 Running on GPUs
</H4>
<P>A few LAMMPS <A HREF = "pair_style.html">pair styles</A> can be run on graphics
processing units (GPUs). We plan to add more over time. Currently,
they only support NVIDIA GPU cards. To use them, you need to install
certain NVIDIA CUDA software on your system:
</P>
<UL><LI>Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
<LI>Go to http://www.nvidia.com/object/cuda_get.html
<LI>Install a driver and toolkit appropriate for your system (the SDK is not necessary)
<LI>Run make in lammps/lib/gpu, editing a Makefile if necessary (see the sketch after this list)
<LI>Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties
</UL>
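<P>As a sketch, the build-and-verify steps might look like this from a
shell (which Makefile needs editing, if any, depends on your system):
</P>
<PRE>cd lammps/lib/gpu
make
./nvc_get_devices
</PRE>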
<H4>GPU hardware
</H4>
<P>When using GPUs, you are restricted to one physical GPU per LAMMPS
process. The GPUs can be on a single node or spread across multiple
nodes. For each GPU pair style, the first two arguments (the GPU mode
followed by the GPU parameter) control how GPUs are selected. If you
are running on a single node, the mode is "one/node" and the parameter
is the ID of the first GPU to select:
</P>
<PRE>pair_style lj/cut/gpu one/node 0 2.5
</PRE>
<P>The ID is the GPU ID reported by the driver for CUDA-enabled graphics
cards. For multiple GPU cards on a node, an MPI process should be run
for each graphics card. In this case, each process will grab the GPU
with ID equal to its process rank plus the GPU parameter.
</P>
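<P>For example, with 3 MPI processes on one node, the line
</P>
<PRE>pair_style lj/cut/gpu one/node 1 2.5
</PRE>
<P>would assign GPU 1 to rank 0, GPU 2 to rank 1, and GPU 3 to rank 2,
since each process selects the GPU with ID equal to its rank plus the
parameter (here 1).
</P>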
<P>For multiple nodes with one GPU per node, the mode is "one/gpu" and
the parameter is the ID of the GPU used on every node:
</P>
<PRE>pair_style lj/cut/gpu one/gpu 1 2.5
</PRE>
<P>In this case, MPI should be run with exactly one process per node.
</P>
<P>For multiple nodes with multiple GPUs, the mode is "multi/gpu" and the
parameter is the number of GPUs per node:
</P>
<PRE>pair_style lj/cut/gpu multi/gpu 3 2.5
</PRE>
<P>In this case, LAMMPS will attempt to grab 3 GPUs per node, which
requires that exactly 3 processes run on each node. The first GPU
selected must have ID zero for this mode (in the example, GPUs 0, 1,
and 2 will be selected on every node). An additional constraint is
that the MPI processes must be filled by slot on each node, such that
the process ranks on any given node are always sequential. This is an
option for the MPI launcher (mpirun/mpiexec) and will be the default
on many clusters.
</P>
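<P>As a sketch, the multi/gpu example above could be launched on 2 nodes
(6 processes total, 3 per node) with a command like the following; the
executable name is hypothetical and the flag for per-node placement
varies by MPI implementation:
</P>
<PRE>mpirun -np 6 -npernode 3 ./lmp_linux -in in.script
</PRE>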
<H4>GPU single vs double precision
</H4>
<P>See the lammps/lib/gpu/README for instructions on how to build the
LAMMPS gpu library for single vs double precision. The latter
requires that your GPU card supports double precision.
</P>
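<P>As an illustration only (check the README for the exact names in your
version of the library), the precision is typically selected by a
compile-time define in a lib/gpu Makefile, e.g.:
</P>
<PRE>CUDA_PRECISION = -D_SINGLE_SINGLE    # single precision
# or
CUDA_PRECISION = -D_DOUBLE_DOUBLE    # double precision
</PRE>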
<H4>GPU memory
</H4>
<P>Upon initialization of the pair style, LAMMPS will reserve memory for
64K atoms per GPU or 70% of each card's GPU memory, whichever limit is
reached first. If the GPU library is compiled for double precision,
the maximum number of atoms per GPU is 32K. When running a periodic
system and/or in parallel, this maximum atom count includes ghost
atoms.
</P>
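<P>As a rough worked example of these limits: a 60K-atom periodic system
split across 2 GPUs gives about 30K owned atoms per GPU plus ghost
atoms, which fits under the 64K single-precision cap but would likely
exceed the 32K cap of a double-precision build.
</P>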
<P>The value of 70% can be changed by editing the PERCENT_GPU_MEMORY
definition in the appropriate lammps/lib/gpu source file. The value of
64K cannot be increased; it is the maximum number of atoms allowed per
GPU. By default, enough memory to store at least the maximum number
of neighbors per atom is reserved on the GPU, which is set by the
<A HREF = "neigh_modify.html">neigh_modify one</A> command. The default value of
2000 is much higher than needed in many cases. If memory on the
graphics card is limiting, the number of atoms allowed can be
increased by decreasing the maximum number of neighbors. For example,
placing
</P>
<PRE>neigh_modify one 100
</PRE>
<P>in the input script will decrease the maximum number of neighbors per
atom to 100, allowing more atoms to be run on the GPU.
</P>
<HR>