forked from lijiext/lammps
372 lines
17 KiB
HTML
372 lines
17 KiB
HTML
|
|
|
|
<!DOCTYPE html>
|
|
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
|
|
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
|
|
<head>
|
|
<meta charset="utf-8">
|
|
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
|
|
|
<title>5.USER-CUDA package — LAMMPS documentation</title>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
|
|
|
|
|
|
|
|
<link rel="stylesheet" href="_static/sphinxcontrib-images/LightBox2/lightbox2/css/lightbox.css" type="text/css" />
|
|
|
|
|
|
|
|
<link rel="top" title="LAMMPS documentation" href="index.html"/>
|
|
|
|
|
|
<script src="_static/js/modernizr.min.js"></script>
|
|
|
|
</head>
|
|
|
|
<body class="wy-body-for-nav" role="document">
|
|
|
|
<div class="wy-grid-for-nav">
|
|
|
|
|
|
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
|
|
<div class="wy-side-nav-search">
|
|
|
|
|
|
|
|
<a href="Manual.html" class="icon icon-home"> LAMMPS
|
|
|
|
|
|
|
|
</a>
|
|
|
|
|
|
<div role="search">
|
|
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
|
|
<input type="text" name="q" placeholder="Search docs" />
|
|
<input type="hidden" name="check_keywords" value="yes" />
|
|
<input type="hidden" name="area" value="default" />
|
|
</form>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
|
|
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
|
|
|
|
|
|
|
|
<ul>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_intro.html">1. Introduction</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_start.html">2. Getting Started</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_commands.html">3. Commands</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_packages.html">4. Packages</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_accelerate.html">5. Accelerating LAMMPS performance</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_howto.html">6. How-to discussions</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_example.html">7. Example problems</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_perf.html">8. Performance & scalability</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_tools.html">9. Additional tools</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_modify.html">10. Modifying & extending LAMMPS</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_python.html">11. Python interface to LAMMPS</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_errors.html">12. Errors</a></li>
|
|
<li class="toctree-l1"><a class="reference internal" href="Section_history.html">13. Future and history</a></li>
|
|
</ul>
|
|
|
|
|
|
|
|
</div>
|
|
|
|
</nav>
|
|
|
|
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
|
|
|
|
|
|
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
|
|
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
|
|
<a href="Manual.html">LAMMPS</a>
|
|
</nav>
|
|
|
|
|
|
|
|
<div class="wy-nav-content">
|
|
<div class="rst-content">
|
|
<div role="navigation" aria-label="breadcrumbs navigation">
|
|
<ul class="wy-breadcrumbs">
|
|
<li><a href="Manual.html">Docs</a> »</li>
|
|
|
|
<li>5.USER-CUDA package</li>
|
|
<li class="wy-breadcrumbs-aside">
|
|
|
|
|
|
<a href="http://lammps.sandia.gov">Website</a>
|
|
<a href="Section_commands.html#comm">Commands</a>
|
|
|
|
</li>
|
|
</ul>
|
|
<hr/>
|
|
|
|
</div>
|
|
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
|
|
<div itemprop="articleBody">
|
|
|
|
<p><a class="reference internal" href="Section_accelerate.html"><em>Return to Section accelerate overview</em></a></p>
|
|
<div class="section" id="user-cuda-package">
|
|
<h1>5.USER-CUDA package<a class="headerlink" href="#user-cuda-package" title="Permalink to this headline">¶</a></h1>
|
|
<p>The USER-CUDA package was developed by Christian Trott (Sandia) while
|
|
at U Technology Ilmenau in Germany. It provides NVIDIA GPU versions
|
|
of many pair styles, many fixes, a few computes, and for long-range
|
|
Coulombics via the PPPM command. It has the following general
|
|
features:</p>
|
|
<ul class="simple">
|
|
<li>The package is designed to allow an entire LAMMPS calculation, for
|
|
many timesteps, to run entirely on the GPU (except for inter-processor
|
|
MPI communication), so that atom-based data (e.g. coordinates, forces)
|
|
do not have to move back-and-forth between the CPU and GPU.</li>
|
|
<li>The speed-up advantage of this approach is typically better when the
|
|
number of atoms per GPU is large</li>
|
|
<li>Data will stay on the GPU until a timestep where a non-USER-CUDA fix
|
|
or compute is invoked. Whenever a non-GPU operation occurs (fix,
|
|
compute, output), data automatically moves back to the CPU as needed.
|
|
This may incur a performance penalty, but should otherwise work
|
|
transparently.</li>
|
|
<li>Neighbor lists are constructed on the GPU.</li>
|
|
<li>The package only supports use of a single MPI task, running on a
|
|
single CPU (core), assigned to each GPU.</li>
|
|
</ul>
|
|
<p>Here is a quick overview of how to use the USER-CUDA package:</p>
|
|
<ul class="simple">
|
|
<li>build the library in lib/cuda for your GPU hardware with desired precision</li>
|
|
<li>include the USER-CUDA package and build LAMMPS</li>
|
|
<li>use the mpirun command to specify 1 MPI task per GPU (on each node)</li>
|
|
<li>enable the USER-CUDA package via the “-c on” command-line switch</li>
|
|
<li>specify the # of GPUs per node</li>
|
|
<li>use USER-CUDA styles in your input script</li>
|
|
</ul>
|
|
<p>The latter two steps can be done using the “-pk cuda” and “-sf cuda”
|
|
<a class="reference internal" href="Section_start.html#start-7"><span>command-line switches</span></a> respectively. Or
|
|
the effect of the “-pk” or “-sf” switches can be duplicated by adding
|
|
the <a class="reference internal" href="package.html"><em>package cuda</em></a> or <a class="reference internal" href="suffix.html"><em>suffix cuda</em></a> commands
|
|
respectively to your input script.</p>
|
|
<p><strong>Required hardware/software:</strong></p>
|
|
<p>To use this package, you need to have one or more NVIDIA GPUs and
|
|
install the NVIDIA Cuda software on your system:</p>
|
|
<p>Your NVIDIA GPU needs to support Compute Capability 1.3. This list may
|
|
help you to find out the Compute Capability of your card:</p>
|
|
<p><a class="reference external" href="http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units">http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units</a></p>
|
|
<p>Install the Nvidia Cuda Toolkit (version 3.2 or higher) and the
|
|
corresponding GPU drivers. The Nvidia Cuda SDK is not required, but
|
|
we recommend it also be installed. You can then make sure its sample
|
|
projects can be compiled without problems.</p>
|
|
<p><strong>Building LAMMPS with the USER-CUDA package:</strong></p>
|
|
<p>This requires two steps (a,b): build the USER-CUDA library, then build
|
|
LAMMPS with the USER-CUDA package.</p>
|
|
<p>You can do both these steps in one line, using the src/Make.py script,
|
|
described in <a class="reference internal" href="Section_start.html#start-4"><span>Section 2.4</span></a> of the manual.
|
|
Type “Make.py -h” for help. If run from the src directory, this
|
|
command will create src/lmp_cuda using src/MAKE/Makefile.mpi as the
|
|
starting Makefile.machine:</p>
|
|
<div class="highlight-python"><div class="highlight"><pre>Make.py -p cuda -cuda mode=single arch=20 -o cuda -a lib-cuda file mpi
|
|
</pre></div>
|
|
</div>
|
|
<p>Or you can follow these two (a,b) steps:</p>
|
|
<ol class="loweralpha simple">
|
|
<li>Build the USER-CUDA library</li>
|
|
</ol>
|
|
<p>The USER-CUDA library is in lammps/lib/cuda. If your <em>CUDA</em> toolkit
|
|
is not installed in the default system directoy <em>/usr/local/cuda</em> edit
|
|
the file <em>lib/cuda/Makefile.common</em> accordingly.</p>
|
|
<p>To build the library with the settings in lib/cuda/Makefile.default,
|
|
simply type:</p>
|
|
<div class="highlight-python"><div class="highlight"><pre><span class="n">make</span>
|
|
</pre></div>
|
|
</div>
|
|
<p>To set options when the library is built, type “make OPTIONS”, where
|
|
<em>OPTIONS</em> are one or more of the following. The settings will be
|
|
written to the <em>lib/cuda/Makefile.defaults</em> before the build.</p>
|
|
<pre class="literal-block">
|
|
<em>precision=N</em> to set the precision level
|
|
N = 1 for single precision (default)
|
|
N = 2 for double precision
|
|
N = 3 for positions in double precision
|
|
N = 4 for positions and velocities in double precision
|
|
<em>arch=M</em> to set GPU compute capability
|
|
M = 35 for Kepler GPUs
|
|
M = 20 for CC2.0 (GF100/110, e.g. C2050,GTX580,GTX470) (default)
|
|
M = 21 for CC2.1 (GF104/114, e.g. GTX560, GTX460, GTX450)
|
|
M = 13 for CC1.3 (GF200, e.g. C1060, GTX285)
|
|
<em>prec_timer=0/1</em> to use hi-precision timers
|
|
0 = do not use them (default)
|
|
1 = use them
|
|
this is usually only useful for Mac machines
|
|
<em>dbg=0/1</em> to activate debug mode
|
|
0 = no debug mode (default)
|
|
1 = yes debug mode
|
|
this is only useful for developers
|
|
<em>cufft=1</em> for use of the CUDA FFT library
|
|
0 = no CUFFT support (default)
|
|
in the future other CUDA-enabled FFT libraries might be supported
|
|
</pre>
|
|
<p>If the build is successful, it will produce the files liblammpscuda.a and
|
|
Makefile.lammps.</p>
|
|
<p>Note that if you change any of the options (like precision), you need
|
|
to re-build the entire library. Do a “make clean” first, followed by
|
|
“make”.</p>
|
|
<ol class="loweralpha simple" start="2">
|
|
<li>Build LAMMPS with the USER-CUDA package</li>
|
|
</ol>
|
|
<div class="highlight-python"><div class="highlight"><pre>cd lammps/src
|
|
make yes-user-cuda
|
|
make machine
|
|
</pre></div>
|
|
</div>
|
|
<p>No additional compile/link flags are needed in Makefile.machine.</p>
|
|
<p>Note that if you change the USER-CUDA library precision (discussed
|
|
above) and rebuild the USER-CUDA library, then you also need to
|
|
re-install the USER-CUDA package and re-build LAMMPS, so that all
|
|
affected files are re-compiled and linked to the new USER-CUDA
|
|
library.</p>
|
|
<p><strong>Run with the USER-CUDA package from the command line:</strong></p>
|
|
<p>The mpirun or mpiexec command sets the total number of MPI tasks used
|
|
by LAMMPS (one or multiple per compute node) and the number of MPI
|
|
tasks used per node. E.g. the mpirun command in MPICH does this via
|
|
its -np and -ppn switches. Ditto for OpenMPI via -np and -npernode.</p>
|
|
<p>When using the USER-CUDA package, you must use exactly one MPI task
|
|
per physical GPU.</p>
|
|
<p>You must use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to enable the USER-CUDA package.
|
|
The “-c on” switch also issues a default <a class="reference internal" href="package.html"><em>package cuda 1</em></a>
|
|
command which sets various USER-CUDA options to default values, as
|
|
discussed on the <a class="reference internal" href="package.html"><em>package</em></a> command doc page.</p>
|
|
<p>Use the “-sf cuda” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a>,
|
|
which will automatically append “cuda” to styles that support it. Use
|
|
the “-pk cuda Ng” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to
|
|
set Ng = # of GPUs per node to a different value than the default set
|
|
by the “-c on” switch (1 GPU) or change other <a class="reference internal" href="package.html"><em>package cuda</em></a> options.</p>
|
|
<div class="highlight-python"><div class="highlight"><pre>lmp_machine -c on -sf cuda -pk cuda 1 -in in.script # 1 MPI task uses 1 GPU
|
|
mpirun -np 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script # 2 MPI tasks use 2 GPUs on a single 16-core (or whatever) node
|
|
mpirun -np 24 -ppn 2 lmp_machine -c on -sf cuda -pk cuda 2 -in in.script # ditto on 12 16-core nodes
|
|
</pre></div>
|
|
</div>
|
|
<p>The syntax for the “-pk” switch is the same as same as the “package
|
|
cuda” command. See the <a class="reference internal" href="package.html"><em>package</em></a> command doc page for
|
|
details, including the default values used for all its options if it
|
|
is not specified.</p>
|
|
<p>Note that the default for the <a class="reference internal" href="package.html"><em>package cuda</em></a> command is
|
|
to set the Newton flag to “off” for both pairwise and bonded
|
|
interactions. This typically gives fastest performance. If the
|
|
<a class="reference internal" href="newton.html"><em>newton</em></a> command is used in the input script, it can
|
|
override these defaults.</p>
|
|
<p><strong>Or run with the USER-CUDA package by editing an input script:</strong></p>
|
|
<p>The discussion above for the mpirun/mpiexec command and the requirement
|
|
of one MPI task per GPU is the same.</p>
|
|
<p>You must still use the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to enable the USER-CUDA package.</p>
|
|
<p>Use the <a class="reference internal" href="suffix.html"><em>suffix cuda</em></a> command, or you can explicitly add a
|
|
“cuda” suffix to individual styles in your input script, e.g.</p>
|
|
<div class="highlight-python"><div class="highlight"><pre>pair_style lj/cut/cuda 2.5
|
|
</pre></div>
|
|
</div>
|
|
<p>You only need to use the <a class="reference internal" href="package.html"><em>package cuda</em></a> command if you
|
|
wish to change any of its option defaults, including the number of
|
|
GPUs/node (default = 1), as set by the “-c on” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a>.</p>
|
|
<p><strong>Speed-ups to expect:</strong></p>
|
|
<p>The performance of a GPU versus a multi-core CPU is a function of your
|
|
hardware, which pair style is used, the number of atoms/GPU, and the
|
|
precision used on the GPU (double, single, mixed).</p>
|
|
<p>See the <a class="reference external" href="http://lammps.sandia.gov/bench.html">Benchmark page</a> of the
|
|
LAMMPS web site for performance of the USER-CUDA package on different
|
|
hardware.</p>
|
|
<p><strong>Guidelines for best performance:</strong></p>
|
|
<ul class="simple">
|
|
<li>The USER-CUDA package offers more speed-up relative to CPU performance
|
|
when the number of atoms per GPU is large, e.g. on the order of tens
|
|
or hundreds of 1000s.</li>
|
|
<li>As noted above, this package will continue to run a simulation
|
|
entirely on the GPU(s) (except for inter-processor MPI communication),
|
|
for multiple timesteps, until a CPU calculation is required, either by
|
|
a fix or compute that is non-GPU-ized, or until output is performed
|
|
(thermo or dump snapshot or restart file). The less often this
|
|
occurs, the faster your simulation will run.</li>
|
|
</ul>
|
|
<div class="section" id="restrictions">
|
|
<h2>Restrictions<a class="headerlink" href="#restrictions" title="Permalink to this headline">¶</a></h2>
|
|
<p>None.</p>
|
|
</div>
|
|
</div>
|
|
|
|
|
|
</div>
|
|
</div>
|
|
<footer>
|
|
|
|
|
|
<hr/>
|
|
|
|
<div role="contentinfo">
|
|
<p>
|
|
© Copyright 2013 Sandia Corporation.
|
|
</p>
|
|
</div>
|
|
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
|
|
|
|
</footer>
|
|
|
|
</div>
|
|
</div>
|
|
|
|
</section>
|
|
|
|
</div>
|
|
|
|
|
|
|
|
|
|
|
|
<script type="text/javascript">
|
|
var DOCUMENTATION_OPTIONS = {
|
|
URL_ROOT:'./',
|
|
VERSION:'',
|
|
COLLAPSE_INDEX:false,
|
|
FILE_SUFFIX:'.html',
|
|
HAS_SOURCE: true
|
|
};
|
|
</script>
|
|
<script type="text/javascript" src="_static/jquery.js"></script>
|
|
<script type="text/javascript" src="_static/underscore.js"></script>
|
|
<script type="text/javascript" src="_static/doctools.js"></script>
|
|
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
|
|
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/jquery-1.11.0.min.js"></script>
|
|
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2/js/lightbox.min.js"></script>
|
|
<script type="text/javascript" src="_static/sphinxcontrib-images/LightBox2/lightbox2-customize/jquery-noconflict.js"></script>
|
|
|
|
|
|
|
|
|
|
|
|
<script type="text/javascript" src="_static/js/theme.js"></script>
|
|
|
|
|
|
|
|
|
|
<script type="text/javascript">
|
|
jQuery(function () {
|
|
SphinxRtdTheme.StickyNav.enable();
|
|
});
|
|
</script>
|
|
|
|
|
|
</body>
|
|
</html> |