<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en"> <!--<![endif]-->
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>5. USER-CUDA package — LAMMPS 15 May 2015 version documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/sphinxcontrib-images/LightBox2/lightbox2/css/lightbox.css" type="text/css" />
<link rel="top" title="LAMMPS 15 May 2015 version documentation" href="index.html" />
<script src="_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-nav-search">
<a href="Manual.html" class="icon icon-home"> LAMMPS
</a>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<ul>
<li class="toctree-l1"><a class="reference internal" href="Section_intro.html">1. Introduction</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_start.html">2. Getting Started</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_commands.html">3. Commands</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_packages.html">4. Packages</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_accelerate.html">5. Accelerating LAMMPS performance</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_howto.html">6. How-to discussions</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_example.html">7. Example problems</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_perf.html">8. Performance &amp; scalability</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_tools.html">9. Additional tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_modify.html">10. Modifying &amp; extending LAMMPS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_python.html">11. Python interface to LAMMPS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_errors.html">12. Errors</a></li>
<li class="toctree-l1"><a class="reference internal" href="Section_history.html">13. Future and history</a></li>
</ul>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="Manual.html">LAMMPS</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="Manual.html">Docs</a> » </li>
<li>5. USER-CUDA package</li>
<li class="wy-breadcrumbs-aside">
<a href="http://lammps.sandia.gov">Website</a>
<a href="Section_commands.html#comm">Commands</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<p><a class="reference internal" href="Section_accelerate.html"><em>Return to Section accelerate overview</em></a></p>
<div class="section" id="user-cuda-package">
<h1>5. USER-CUDA package<a class="headerlink" href="#user-cuda-package" title="Permalink to this headline">¶</a></h1>
<p>The USER-CUDA package was developed by Christian Trott (Sandia) while
at U Technology Ilmenau in Germany. It provides NVIDIA GPU versions
of many pair styles, many fixes, a few computes, and of long-range
Coulombics via the PPPM command. It has the following general
features:</p>
<ul class="simple">
<li>The package is designed to allow an entire LAMMPS calculation, for
many timesteps, to run entirely on the GPU (except for inter-processor
MPI communication), so that atom-based data (e.g. coordinates, forces)
do not have to move back-and-forth between the CPU and GPU.</li>
<li>The speed-up advantage of this approach is typically better when the
number of atoms per GPU is large.</li>
<li>Data will stay on the GPU until a timestep where a non-USER-CUDA fix
or compute is invoked. Whenever a non-GPU operation occurs (fix,
compute, output), data automatically moves back to the CPU as needed.
This may incur a performance penalty, but should otherwise work
transparently.</li>
<li>Neighbor lists are constructed on the GPU.</li>
<li>The package only supports use of a single MPI task, running on a
single CPU (core), assigned to each GPU.</li>
</ul>
<p>Here is a quick overview of how to use the USER-CUDA package:</p>
<ul class="simple">
<li>build the library in lib/cuda for your GPU hardware with desired precision</li>
<li>include the USER-CUDA package and build LAMMPS</li>
<li>use the mpirun command to specify 1 MPI task per GPU (on each node)</li>
<li>enable the USER-CUDA package via the “-c on” command-line switch</li>
<li>specify the # of GPUs per node</li>
<li>use USER-CUDA styles in your input script</li>
</ul>
<p>The latter two steps can be done using the “-pk cuda” and “-sf cuda”
<a class="reference internal" href="Section_start.html#start-7"><span>command-line switches</span></a> respectively. Or
the effect of the “-pk” or “-sf” switches can be duplicated by adding
the <a class="reference internal" href="package.html"><em>package cuda</em></a> or <a class="reference internal" href="suffix.html"><em>suffix cuda</em></a> commands
respectively to your input script.</p>
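<p>For example, the following input-script lines (a sketch, assuming you want 2
GPUs per node) duplicate the effect of the “-pk cuda 2” and “-sf cuda”
command-line switches:</p>
<div class="highlight-python"><div class="highlight"><pre>package cuda 2    # same effect as the -pk cuda 2 switch
suffix cuda       # same effect as the -sf cuda switch
</pre></div>
</div>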
<p><strong>Required hardware/software:</strong></p>
<p>To use this package, you need one or more NVIDIA GPUs and the NVIDIA
CUDA software installed on your system:</p>
<p>Your NVIDIA GPU needs to support Compute Capability 1.3. This list may
help you find the Compute Capability of your card:</p>
<p><a class="reference external" href="http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units">http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units</a></p>
<p>Install the NVIDIA CUDA Toolkit (version 3.2 or higher) and the
corresponding GPU drivers. The NVIDIA CUDA SDK is not required, but
we recommend installing it as well; you can then verify that its sample
projects compile without problems.</p>
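<p>One quick way to check that the toolkit and driver are visible (a sketch;
these commands assume a standard install with nvcc on your PATH) is:</p>
<div class="highlight-python"><div class="highlight"><pre>nvcc --version    # reports the installed CUDA Toolkit version
nvidia-smi        # lists visible NVIDIA GPUs and the driver version
</pre></div>
</div>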
<p><strong>Building LAMMPS with the USER-CUDA package:</strong></p>
<p>This requires two steps (a,b): build the USER-CUDA library, then build
LAMMPS with the USER-CUDA package.</p>
<p>You can do both these steps in one line, using the src/Make.py script,
described in <a class="reference internal" href="Section_start.html#start-4"><span>Section 2.4</span></a> of the manual.
Type “Make.py -h” for help. If run from the src directory, this
command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
starting Makefile.machine:</p>
<div class="highlight-python"><div class="highlight"><pre>Make.py -p cuda -cuda mode=single arch=20 -o cuda -a lib-cuda file mpi
</pre></div>
</div>
<p>Or you can follow these two (a,b) steps:</p>
<ol class="loweralpha simple">
<li>Build the USER-CUDA library</li>
</ol>
<p>The USER-CUDA library is in lammps/lib/cuda. If your <em>CUDA</em> toolkit
is not installed in the default system directory <em>/usr/local/cuda</em>, edit
the file <em>lib/cuda/Makefile.common</em> accordingly.</p>
<p>To build the library with the settings in lib/cuda/Makefile.default,
simply type:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">make</span>
</pre></div>
</div>
<p>To set options when the library is built, type “make OPTIONS”, where
<em>OPTIONS</em> is one or more of the following. The settings will be
written to <em>lib/cuda/Makefile.defaults</em> before the build.</p>
<pre class="literal-block">
<em>precision=N</em> to set the precision level
  N = 1 for single precision (default)
  N = 2 for double precision
  N = 3 for positions in double precision
  N = 4 for positions and velocities in double precision
<em>arch=M</em> to set GPU compute capability
  M = 35 for Kepler GPUs
  M = 20 for CC2.0 (GF100/110, e.g. C2050,GTX580,GTX470) (default)
  M = 21 for CC2.1 (GF104/114, e.g. GTX560, GTX460, GTX450)
  M = 13 for CC1.3 (GF200, e.g. C1060, GTX285)
<em>prec_timer=0/1</em> to use hi-precision timers
  0 = do not use them (default)
  1 = use them
  this is usually only useful for Mac machines
<em>dbg=0/1</em> to activate debug mode
  0 = no debug mode (default)
  1 = yes debug mode
  this is only useful for developers
<em>cufft=1</em> for use of the CUDA FFT library
  0 = no CUFFT support (default)
  in the future other CUDA-enabled FFT libraries might be supported
</pre>
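<p>For example, a hypothetical build of the library in double precision for a
Kepler GPU would combine the options above as:</p>
<div class="highlight-python"><div class="highlight"><pre>cd lib/cuda
make clean
make precision=2 arch=35
</pre></div>
</div>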
<p>If the build is successful, it will produce the files liblammpscuda.a and
Makefile.lammps.</p>
<p>Note that if you change any of the options (like precision), you need
to re-build the entire library. Do a “make clean” first, followed by
“make”.</p>
<ol class="loweralpha simple" start="2">
<li>Build LAMMPS with the USER-CUDA package</li>
</ol>
<div class="highlight-python"><div class="highlight"><pre>cd lammps/src
make yes-user-cuda
make machine
</pre></div>
</div>
<p>No additional compile/link flags are needed in Makefile.machine.</p>
<p>Note that if you change the USER-CUDA library precision (discussed
above) and rebuild the USER-CUDA library, then you also need to
re-install the USER-CUDA package and re-build LAMMPS, so that all
affected files are re-compiled and linked to the new USER-CUDA
library.</p>
<p><strong>Run with the USER-CUDA package from the command line:</strong></p>
<p>The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the mpirun command in MPICH does this via
its -np and -ppn switches. Ditto for OpenMPI via -np and -npernode.</p>
<p>When using the USER-CUDA package, you must use exactly one MPI task
per physical GPU.</p>
<p>You must use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to enable the USER-CUDA package.
The “-c on” switch also issues a default <a class="reference internal" href="package.html"><em>package cuda 1</em></a>
command which sets various USER-CUDA options to default values, as
discussed on the <a class="reference internal" href="package.html"><em>package</em></a> command doc page.</p>
<p>Use the “-sf cuda” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a>,
which will automatically append “cuda” to styles that support it. Use
the “-pk cuda Ng” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to
set Ng = # of GPUs per node to a different value than the default set
by the “-c on” switch (1 GPU), or to change other <a class="reference internal" href="package.html"><em>package cuda</em></a> options.</p>
<div class="highlight-python"><div class="highlight"><pre>lmp_machine -c on -sf cuda -pk cuda 1 -in in.script                       # 1 MPI task uses 1 GPU
mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script          # 2 MPI tasks use 2 GPUs on a single 16-core (or whatever) node
mpirun -np 24 -ppn 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script  # ditto on 12 16-core nodes
</pre></div>
</div>
<p>The syntax for the “-pk” switch is the same as the “package
cuda” command. See the <a class="reference internal" href="package.html"><em>package</em></a> command doc page for
details, including the default values used for all its options if it
is not specified.</p>
<p>Note that the default for the <a class="reference internal" href="package.html"><em>package cuda</em></a> command is
to set the Newton flag to “off” for both pairwise and bonded
interactions. This typically gives the fastest performance. If the
<a class="reference internal" href="newton.html"><em>newton</em></a> command is used in the input script, it can
override these defaults.</p>
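<p>For example, adding this line to your input script would override the
package default and turn the Newton flag back on for both pairwise and
bonded interactions:</p>
<div class="highlight-python"><div class="highlight"><pre>newton on
</pre></div>
</div>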
<p><strong>Or run with the USER-CUDA package by editing an input script:</strong></p>
<p>The discussion above for the mpirun/mpiexec command and the requirement
of one MPI task per GPU is the same.</p>
<p>You must still use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to enable the USER-CUDA package.</p>
<p>Use the <a class="reference internal" href="suffix.html"><em>suffix cuda</em></a> command, or you can explicitly add a
“cuda” suffix to individual styles in your input script, e.g.</p>
<div class="highlight-python"><div class="highlight"><pre>pair_style lj/cut/cuda 2.5
</pre></div>
</div>
<p>You only need to use the <a class="reference internal" href="package.html"><em>package cuda</em></a> command if you
wish to change any of its option defaults, including the number of
GPUs/node (default = 1), as set by the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a>.</p>
<p><strong>Speed-ups to expect:</strong></p>
<p>The performance of a GPU versus a multi-core CPU is a function of your
hardware, which pair style is used, the number of atoms/GPU, and the
precision used on the GPU (double, single, mixed).</p>
<p>See the <a class="reference external" href="http://lammps.sandia.gov/bench.html">Benchmark page</a> of the
LAMMPS web site for performance of the USER-CUDA package on different
hardware.</p>
< p > < strong > Guidelines for best performance:< / strong > < / p >
< ul class = "simple" >
< li > The USER-CUDA package offers more speed-up relative to CPU performance
2014-09-10 23:32:24 +08:00
when the number of atoms per GPU is large, e.g. on the order of tens
2015-07-30 22:53:28 +08:00
or hundreds of 1000s.< / li >
< li > As noted above, this package will continue to run a simulation
2014-09-10 23:32:24 +08:00
entirely on the GPU(s) (except for inter-processor MPI communication),
for multiple timesteps, until a CPU calculation is required, either by
a fix or compute that is non-GPU-ized, or until output is performed
(thermo or dump snapshot or restart file). The less often this
2015-07-30 22:53:28 +08:00
occurs, the faster your simulation will run.< / li >
< / ul >
<div class="section" id="restrictions">
<h2>Restrictions<a class="headerlink" href="#restrictions" title="Permalink to this headline">¶</a></h2>
<p>None.</p>
</div>
</div>
</div>
</div>
<footer>
<hr/>
<div role="contentinfo">
<p>
© Copyright.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
    URL_ROOT:'./',
    VERSION:'15 May 2015 version',
    COLLAPSE_INDEX:false,
    FILE_SUFFIX:'.html',
    HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/lightbox.min.js"></script>
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2-customize/jquery-noconflict.js"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
    SphinxRtdTheme.StickyNav.enable();
});
</script>
</body>
</html>