                 --------------------------------
                    LAMMPS ACCELERATOR LIBRARY
                 --------------------------------

                     W. Michael Brown (ORNL)
               Trung Dac Nguyen (ORNL/Northwestern)
                       Peng Wang (NVIDIA)
                    Axel Kohlmeyer (Temple)
                      Steve Plimpton (SNL)
                     Inderaj Bains (NVIDIA)

-------------------------------------------------------------------

This directory has source files to build a library that LAMMPS
links against when using the GPU package.

This library must be built with a C++ compiler before LAMMPS is
built, so that LAMMPS can link against it.

You can type "make lib-gpu" from the src directory to see help on how
to build this library via make commands, or you can do the same thing
by typing "python Install.py" from within this directory, or you can
do it manually by following the instructions below.
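
For example, the two scripted routes can be invoked as follows (a
minimal sketch; the options accepted by Install.py vary between LAMMPS
versions, so check its help output first):

  # from the LAMMPS src directory: print help for building this library
  make lib-gpu

  # or, from within lib/gpu: run the build helper script
  python Install.py -h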

Build the library using one of the provided Makefile.* files or create
your own, specific to your compiler and system.  For example:

make -f Makefile.linux

When you are done building this library, two files should
exist in this directory:

libgpu.a                the library LAMMPS will link against
Makefile.lammps         settings the LAMMPS Makefile will import

Makefile.lammps is created by the make command, by copying one of the
Makefile.lammps.* files.  See the EXTRAMAKE setting at the top of the
Makefile.* files.
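
For instance, the top of a Makefile.* typically selects which file to
copy via a line like the following (the target file name is an example;
it depends on your platform):

  EXTRAMAKE = Makefile.lammps.standard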

IMPORTANT: You should examine the final Makefile.lammps to ensure it is
correct for your system, otherwise the LAMMPS build can fail.

IMPORTANT: If you re-build the library, e.g. for a different precision
(see below), you should do a "make clean" first, e.g. "make -f
Makefile.linux clean", to ensure all previously generated files are
removed before the new build is done.

Makefile.lammps has settings for 3 variables:

user-gpu_SYSINC = leave blank for this package
user-gpu_SYSLIB = CUDA libraries needed by this package
user-gpu_SYSPATH = path(s) to where those libraries are

If you have the CUDA compilers on your system, you should also have
the needed libraries.  If the CUDA development tools were installed
in the standard manner, the settings in the Makefile.lammps.standard
file should work.
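
For reference, such a file assigns the three variables along these
lines (an illustrative sketch; the library path depends on where your
CUDA toolkit is installed):

  user-gpu_SYSINC =
  user-gpu_SYSLIB = -lcudart -lcuda
  user-gpu_SYSPATH = -L/usr/local/cuda/lib64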

-------------------------------------------------------------------

GENERAL NOTES
--------------------------------

This library, libgpu.a, provides routines for GPU acceleration
of certain LAMMPS styles and neighbor list builds.  Compiling this
library requires installing the CUDA GPU driver and CUDA toolkit for
your operating system.  Installation of the CUDA SDK is not necessary.
In addition to the LAMMPS library, the binary nvc_get_devices will also
be built.  This can be used to query the names and properties of GPU
devices on your system.  A Makefile for OpenCL and ROCm HIP compilation
is provided, but it is not officially supported by the developers.
Details of the implementation are provided in:

----

Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Short Range
Forces. Computer Physics Communications. 2011. 182: p. 898-911.

and

Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing
Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle
Particle-Mesh. Computer Physics Communications. 2012. 183: p. 449-459.

and

Brown, W.M., Yamada, M. Implementing Molecular Dynamics on Hybrid High
Performance Computers - Three-Body Potentials. Computer Physics
Communications. 2013. 184: p. 2785-2793.

----

NOTE: Installation of the CUDA SDK is not required, only the CUDA
toolkit itself or an OpenCL 1.2 compatible header and library.

Pair styles supporting GPU acceleration via this library are marked
in the list of pair style potentials with a "g".  See the online
version at: https://lammps.sandia.gov/doc/Commands_pair.html

The (plain) pppm kspace style is supported as well.

MULTIPLE LAMMPS PROCESSES
--------------------------------

Multiple LAMMPS MPI processes can share GPUs on the system, but multiple
GPUs cannot be utilized by a single MPI process.  In many cases, the
best performance will be obtained by running as many MPI processes as
there are CPU cores available, with the condition that the number of MPI
processes is an integer multiple of the number of GPUs being used.  See
the LAMMPS user manual for details on running with GPU acceleration.
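
As an illustration, on a node with 12 CPU cores and 2 GPUs such a run
could look like the following (the executable and input script names
are hypothetical):

  # 12 MPI ranks sharing 2 GPUs (12 is an integer multiple of 2)
  mpirun -np 12 ./lmp_linux -sf gpu -pk gpu 2 -in in.melt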

BUILDING AND PRECISION MODES
--------------------------------

To build, edit the CUDA_ARCH, CUDA_PRECISION, CUDA_HOME variables in one of
the Makefiles.  CUDA_ARCH should be set based on the compute capability of
your GPU.  This can be verified by running the nvc_get_devices executable
after the build is complete.  Additionally, the GPU package must be installed
and compiled for LAMMPS.  This may require editing the gpu_SYSPATH variable
in the LAMMPS makefile.
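
As a concrete sketch, a Makefile.linux edited for a compute capability
6.0 card might contain (all three values are examples; adjust them to
your toolkit location, GPU, and desired precision):

  CUDA_HOME = /usr/local/cuda
  CUDA_ARCH = -arch=sm_60
  CUDA_PRECISION = -D_SINGLE_DOUBLE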

Please note that the GPU library accesses the CUDA driver library directly,
so it needs to be linked not only to the CUDA runtime library (libcudart.so)
that ships with the CUDA toolkit, but also to the CUDA driver library
(libcuda.so) that ships with the Nvidia driver.  If you are compiling LAMMPS
on the head node of a GPU cluster, this library may not be installed,
so you may need to copy it over from one of the compute nodes (best into
this directory).  Recent CUDA toolkits, starting with CUDA 9, provide a dummy
libcuda.so library (typically under $(CUDA_HOME)/lib64/stubs) that can be
used for linking.
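
When linking against the stub, the library search path in
Makefile.lammps can point at the stubs directory; a minimal sketch,
assuming the variable naming used in this README:

  user-gpu_SYSPATH = -L$(CUDA_HOME)/lib64/stubs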

The gpu library supports 3 precision modes as determined by
the CUDA_PRECISION variable:

CUDA_PRECISION = -D_SINGLE_SINGLE  # Single precision for all calculations
CUDA_PRECISION = -D_DOUBLE_DOUBLE  # Double precision for all calculations
CUDA_PRECISION = -D_SINGLE_DOUBLE  # Accumulation of forces, etc. in double

As of CUDA 7.5 only GPUs with compute capability 2.0 (Fermi) or newer are
supported, and as of CUDA 9.0 only compute capability 3.0 (Kepler) or newer.
Support for GPUs older than that is kept for historical reasons, but it has
limitations: it requires an additional preprocessor flag and restricts the
available features.  There is no value in trying to use those GPUs for
production calculations.

You have to make sure that you set a CUDA_ARCH line suitable for your
hardware and CUDA toolkit version: e.g. -arch=sm_35 for Tesla K20 or K40,
or -arch=sm_52 for GeForce GTX Titan X.  A detailed list of GPU
architectures and CUDA compatible GPUs can be found e.g. here:
https://en.wikipedia.org/wiki/CUDA#GPUs_supported

NOTE: when compiling with CMake, all of the considerations listed below
are handled automatically by the CMake configuration process, so no
separate compilation of the gpu library is required.  This will also build
in support for all compute architectures that are supported by the CUDA
toolkit version used to build the gpu library.

Please note the CUDA_CODE settings in Makefile.linux_multi, which allow
compiling this library with support for multiple GPU architectures; an
illustrative excerpt follows below.  This list can be extended for newer
GPUs with newer CUDA toolkits, and should allow building a single GPU
library compatible with all GPUs that are worth using for GPU acceleration
and supported by the current CUDA toolkits and drivers.
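
The settings take the following form (the exact architecture list in
Makefile.linux_multi depends on the LAMMPS version and CUDA toolkit;
treat these values as examples):

  CUDA_CODE = -gencode arch=compute_50,code=[sm_50,compute_50] \
              -gencode arch=compute_60,code=[sm_60,compute_60] \
              -gencode arch=compute_70,code=[sm_70,compute_70]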

NOTE: The system-specific setting LAMMPS_SMALLBIG (default), LAMMPS_BIGBIG,
or LAMMPS_SMALLSMALL, if specified when building LAMMPS (i.e. in
src/MAKE/Makefile.foo), should be consistent with that specified
when building libgpu.a (i.e. by LMP_INC in the lib/gpu/Makefile.bar).
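
For example, if the size setting is chosen explicitly, the LMP_INC
lines in the two (placeholder-named) makefiles should match:

  # in src/MAKE/Makefile.foo and in lib/gpu/Makefile.bar alike
  LMP_INC = -DLAMMPS_SMALLBIG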

BUILDING FOR HIP FRAMEWORK
--------------------------------

1. Install the latest ROCm framework (https://github.com/RadeonOpenCompute/ROCm).
2. GPU sorting requires installing hipcub
   (https://github.com/ROCmSoftwarePlatform/hipCUB).  The HIP CUDA-backend
   additionally requires cub (https://nvlabs.github.io/cub).  Download and
   extract the cub directory to lammps/lib/gpu/ or specify an appropriate
   path in lammps/lib/gpu/Makefile.hip.
3. In Makefile.hip it is possible to specify the target platform via
   export HIP_PLATFORM=hcc or HIP_PLATFORM=nvcc, as well as the target
   architecture (gfx803, gfx900, gfx906 etc.); see the sketch after this
   list.
4. If your MPI implementation does not support the `mpicxx --showme`
   command, you must specify the corresponding MPI compiler and linker
   flags in lammps/lib/gpu/Makefile.hip and in
   lammps/src/MAKE/OPTIONS/Makefile.hip.
5. Build the GPU library (libgpu.a):
   cd lammps/lib/gpu; make -f Makefile.hip -j
6. Build the LAMMPS executable (lmp_hip):
   cd ../../src; make hip -j
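
A condensed walk-through of steps 3-6 for an AMD gfx906 card might look
like this (the platform and architecture values are examples, and setting
them in the environment assumes Makefile.hip picks them up; you can edit
the corresponding variables in Makefile.hip instead):

  # select the HIP backend and target architecture (example values)
  export HIP_PLATFORM=hcc
  export HIP_ARCH=gfx906

  # build the GPU library, then the LAMMPS executable
  cd lammps/lib/gpu
  make -f Makefile.hip -j
  cd ../../src
  make hip -j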

EXAMPLE CONVENTIONAL BUILD PROCESS
--------------------------------

cd ~/lammps/lib/gpu
emacs Makefile.linux
make -f Makefile.linux
./nvc_get_devices
cd ../../src
emacs ./MAKE/Makefile.linux
make yes-asphere
make yes-kspace
make yes-gpu
make linux