<!DOCTYPE html>
<!-- [if IE 8]><html class="no - js lt - ie9" lang="en" > <![endif] -->
<!-- [if gt IE 8]><! --> < html class = "no-js" lang = "en" > <!-- <![endif] -->
< head >
< meta charset = "utf-8" >
< meta name = "viewport" content = "width=device-width, initial-scale=1.0" >
< title > 5.USER-INTEL package — LAMMPS 15 May 2015 version documentation< / title >
< link rel = "stylesheet" href = "_static/css/theme.css" type = "text/css" / >
< link rel = "stylesheet" href = "_static/sphinxcontrib-images/LightBox2/lightbox2/css/lightbox.css" type = "text/css" / >
< link rel = "top" title = "LAMMPS 15 May 2015 version documentation" href = "index.html" / >
< script src = "_static/js/modernizr.min.js" > < / script >
< / head >
< body class = "wy-body-for-nav" role = "document" >
< div class = "wy-grid-for-nav" >
< nav data-toggle = "wy-nav-shift" class = "wy-nav-side" >
< div class = "wy-side-nav-search" >
< a href = "Manual.html" class = "icon icon-home" > LAMMPS
< / a >
< div role = "search" >
< form id = "rtd-search-form" class = "wy-form" action = "search.html" method = "get" >
< input type = "text" name = "q" placeholder = "Search docs" / >
< input type = "hidden" name = "check_keywords" value = "yes" / >
< input type = "hidden" name = "area" value = "default" / >
< / form >
< / div >
< / div >
< div class = "wy-menu wy-menu-vertical" data-spy = "affix" role = "navigation" aria-label = "main navigation" >
< ul >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_intro.html" > 1. Introduction< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_start.html" > 2. Getting Started< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_commands.html" > 3. Commands< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_packages.html" > 4. Packages< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_accelerate.html" > 5. Accelerating LAMMPS performance< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_howto.html" > 6. How-to discussions< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_example.html" > 7. Example problems< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_perf.html" > 8. Performance & scalability< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_tools.html" > 9. Additional tools< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_modify.html" > 10. Modifying & extending LAMMPS< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_python.html" > 11. Python interface to LAMMPS< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_errors.html" > 12. Errors< / a > < / li >
< li class = "toctree-l1" > < a class = "reference internal" href = "Section_history.html" > 13. Future and history< / a > < / li >
< / ul >
< / div >
< / nav >
< section data-toggle = "wy-nav-shift" class = "wy-nav-content-wrap" >
< nav class = "wy-nav-top" role = "navigation" aria-label = "top navigation" >
< i data-toggle = "wy-nav-top" class = "fa fa-bars" > < / i >
< a href = "Manual.html" > LAMMPS< / a >
< / nav >
< div class = "wy-nav-content" >
< div class = "rst-content" >
< div role = "main" class = "document" itemscope = "itemscope" itemtype = "http://schema.org/Article" >
< div itemprop = "articleBody" >
< p > < a class = "reference internal" href = "Section_accelerate.html" > < em > Return to Section accelerate overview< / em > < / a > < / p >
< div class = "section" id = "user-intel-package" >
< h1 > 5.USER-INTEL package< a class = "headerlink" href = "#user-intel-package" title = "Permalink to this headline" > ¶< / a > < / h1 >
< p > The USER-INTEL package was developed by Mike Brown at Intel
Corporation. It accelerates simulations by offloading neighbor list
and non-bonded force calculations to Intel(R) Xeon Phi(TM) coprocessors
(in offload mode, rather than running natively on the coprocessor as the
KOKKOS package does). Additionally, it supports running simulations in
single, mixed, or double precision with vectorization, even if no
coprocessor is present, i.e. on an Intel(R) CPU alone. The same C++ code
is used for both cases. When offloading to a coprocessor, the routine is
run twice, once with an offload flag, so that the CPU cores and the
coprocessor can work simultaneously.< / p >
< p > The USER-INTEL package can be used in tandem with the USER-OMP
package. This is useful when offloading pair style computations to
coprocessors, so that other styles not supported by the USER-INTEL
package, e.g. bond, angle, dihedral, improper, and long-range
electrostatics, can run simultaneously in threaded mode on the CPU
cores. Since fewer MPI tasks than CPU cores are typically launched
when running with coprocessors, this lets the otherwise idle CPU cores
do useful computation.< / p >
< p > If LAMMPS is built with both the USER-INTEL and USER-OMP packages
installed, this mode of operation is easier to use: the
“-suffix hybrid intel omp” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switch< / span > < / a >
and the < a class = "reference internal" href = "suffix.html" > < em > suffix hybrid intel omp< / em > < / a > command both set a
second-choice suffix of “omp”, so that a style from the USER-OMP package is
used if available, after first checking whether a style from the USER-INTEL
package is available.< / p >
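< p > The lookup order described above can be pictured as a small shell function. This is a hypothetical sketch of the selection logic only, not the way LAMMPS implements it internally; the function name resolve_style and the list of “available” styles passed to it are made up for illustration. < / p >

```shell
#!/bin/sh
# Hypothetical sketch of the "-suffix hybrid intel omp" lookup order:
# try the "intel" variant first, then the "omp" variant, and fall back
# to the unaccelerated style if neither exists.
# $1 = base style name; remaining arguments = styles available in the binary.
resolve_style() {
  base=$1
  shift
  for suffix in intel omp; do
    candidate="$base/$suffix"
    for avail in "$@"; do
      if [ "$avail" = "$candidate" ]; then
        echo "$candidate"
        return 0
      fi
    done
  done
  # No accelerated variant found: use the plain style.
  echo "$base"
}

# Made-up list of available styles, for illustration only.
resolve_style lj/cut lj/cut/intel lj/cut/omp harmonic/omp
resolve_style harmonic lj/cut/intel lj/cut/omp harmonic/omp
```

< p > With this made-up list, lj/cut resolves to lj/cut/intel, while harmonic has no intel variant and resolves to harmonic/omp. < / p >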
< p > When using the USER-INTEL package, you must choose at build time whether the
binary will support offload to Xeon Phi coprocessors. Binaries supporting
offload can still be run in CPU-only (host-only) mode.< / p >
< p > Here is a quick overview of how to use the USER-INTEL package
for CPU-only acceleration:< / p >
< ul class = "simple" >
< li > specify these CCFLAGS in your src/MAKE/Makefile.machine: -openmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost< / li >
< li > specify -openmp with LINKFLAGS in your Makefile.machine< / li >
< li > include the USER-INTEL package and (optionally) USER-OMP package and build LAMMPS< / li >
< li > specify how many OpenMP threads per MPI task to use< / li >
< li > use USER-INTEL and (optionally) USER-OMP styles in your input script< / li >
< / ul >
< p > Note that many of these settings can only be used with the Intel
compiler, as discussed below.< / p >
< p > Using the USER-INTEL package to offload work to the Intel(R)
Xeon Phi(TM) coprocessor is the same except for these additional
steps:< / p >
< ul class = "simple" >
< li > add the flag -DLMP_INTEL_OFFLOAD to CCFLAGS in your Makefile.machine< / li >
< li > add the flag -offload to LINKFLAGS in your Makefile.machine< / li >
< / ul >
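< p > Comparing the two lists, the offload build differs from the CPU-only build by just two flags. The sketch below, assuming the Intel compiler and a POSIX shell, prints the settings from those lists; intel_flags is a hypothetical helper, and a real Makefile.machine contains other settings (compiler name, MPI paths, etc.) that are not shown. The -xHost flag also assumes you build on the same architecture as the compute nodes. < / p >

```shell
#!/bin/sh
# Hypothetical helper that prints the USER-INTEL compiler/linker settings
# from the lists above, for pasting into src/MAKE/Makefile.machine.
# $1 = "cpu" for a host-only build, "phi" for offload to Xeon Phi.
intel_flags() {
  ccflags="-openmp -DLAMMPS_MEMALIGN=64 -restrict -xHost"
  linkflags="-openmp"
  if [ "$1" = "phi" ]; then
    # Offload support adds one compile flag and one link flag.
    ccflags="$ccflags -DLMP_INTEL_OFFLOAD"
    linkflags="$linkflags -offload"
  fi
  echo "CCFLAGS = $ccflags"
  echo "LINKFLAGS = $linkflags"
}

intel_flags cpu
intel_flags phi
```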
< p > The last two steps of the CPU-only case (setting the number of OpenMP
threads per MPI task and using USER-INTEL styles) can be done with the
“-pk intel” and “-sf intel” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switches< / span > < / a >, respectively;
the “-pk intel” switch is also where the number of coprocessors per node is
given when offloading. Alternatively, the effect of the “-pk” or “-sf” switches
can be duplicated by adding the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > or < a class = "reference internal" href = "suffix.html" > < em > suffix intel< / em > < / a >
commands, respectively, to your input script.< / p >
< p > < strong > Required hardware/software:< / strong > < / p >
< p > To use the offload option, you must have one or more Intel(R) Xeon
Phi(TM) coprocessors and use an Intel(R) C++ compiler.< / p >
< p > Optimizations for vectorization have only been tested with the
Intel(R) compiler. Other compilers may not vectorize the code, or may
give poor performance.< / p >
< p > For CPU-only use, an Intel C++ compiler is recommended but not required
(g++ does not recognize some of the settings above, so they cannot be used
with it). The compiler must support the OpenMP interface.< / p >
< p > The recommended version of the Intel(R) compiler is 14.0.1.106.
Versions 15.0.1.133 and later are also supported. If using Intel(R)
MPI, versions 15.0.2.044 and later are recommended.< / p >
< p > < strong > Building LAMMPS with the USER-INTEL package:< / strong > < / p >
< p > You can choose to build with or without support for offload to an
Intel(R) Xeon Phi(TM) coprocessor. If you build with support for a
coprocessor, the same binary can be used on nodes with and without
coprocessors installed. However, if your system has no coprocessors,
building without offload support produces a smaller binary.< / p >
< p > You can do either in one line, using the src/Make.py script, described
in < a class = "reference internal" href = "Section_start.html#start-4" > < span > Section 2.4< / span > < / a > of the manual. Type
“Make.py -h” for help. If run from the src directory, these commands
will create src/lmp_intel_cpu and src/lmp_intel_phi, using
src/MAKE/Makefile.mpi as the starting Makefile.machine:< / p >
< div class = "highlight-python" > < div class = "highlight" > < pre > Make.py -p intel omp -intel cpu -o intel_cpu -cc icc -a file mpi
Make.py -p intel omp -intel phi -o intel_phi -cc icc -a file mpi
< / pre > < / div >
< / div >
< p > Note that this assumes that your MPI and its mpicxx wrapper
are using the Intel compiler. If they are not, you should
leave off the “-cc icc” switch.< / p >
< p > Or you can follow these steps:< / p >
< div class = "highlight-python" > < div class = "highlight" > < pre > cd lammps/src
make yes-user-intel
make yes-user-omp    # optional: also install the USER-OMP package
make machine         # "machine" is the suffix of your src/MAKE/Makefile.machine
< / pre > < / div >
< / div >
< p > Note that if the USER-OMP package is also installed, you can use
styles from both packages, as described below.< / p >
< p > The Makefile.machine needs a “ -fopenmp” flag for OpenMP support in
both the CCFLAGS and LINKFLAGS variables. You also need to add
-DLAMMPS_MEMALIGN=64 and -restrict to CCFLAGS.< / p >
< p > If you are compiling on the same architecture that will be used for
the runs, adding the flag < em > -xHost< / em > to CCFLAGS will enable
vectorization with the Intel(R) compiler. Otherwise, you must
provide the correct compute node architecture to the -x option
(e.g. -xAVX).< / p >
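< p > The choice of -x option reduces to a small rule. This is a sketch assuming you know the compute nodes' instruction set; vec_flag is a hypothetical helper name, and AVX is just the example target mentioned above. < / p >

```shell
#!/bin/sh
# Hypothetical helper choosing the Intel compiler -x option for CCFLAGS.
# $1 = instruction set of the compute nodes (e.g. AVX), or empty when the
#      build machine has the same architecture as the compute nodes.
vec_flag() {
  if [ -z "$1" ]; then
    printf '%s\n' "-xHost"   # vectorize for the machine doing the build
  else
    printf '%s\n' "-x$1"     # vectorize for an explicitly named target
  fi
}

vec_flag        # building on a compute node
vec_flag AVX    # building elsewhere for AVX compute nodes
```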
< p > In order to build with support for an Intel(R) Xeon Phi(TM)
coprocessor, the flag < em > -offload< / em > should be added to the LINKFLAGS line
and the flag -DLMP_INTEL_OFFLOAD should be added to the CCFLAGS line.< / p >
< p > Example makefiles Makefile.intel_cpu and Makefile.intel_phi are
included in the src/MAKE/OPTIONS directory with settings that perform
well with the Intel(R) compiler. The latter file has support for
offload to coprocessors; the former does not.< / p >
< p > < strong > Notes on CPU and core affinity:< / strong > < / p >
< p > Setting core affinity is often used to pin MPI tasks and OpenMP
threads to a core or group of cores so that memory access can be
uniform. Unless disabled at build time, affinity for MPI tasks and
OpenMP threads on the host is set by default when offloading to a
coprocessor. In this case, it is unnecessary to use other methods to
control affinity (e.g. taskset, numactl, I_MPI_PIN_DOMAIN, etc.). This
default can be disabled in an input script
with the < em > no_affinity< / em > option to the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a >
command or by disabling the option at build time (by adding
-DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your Makefile).
Disabling this option is not recommended, especially when running
on a machine with hyperthreading disabled.< / p >
< p > < strong > Running with the USER-INTEL package from the command line:< / strong > < / p >
< p > The mpirun or mpiexec command sets the total number of MPI tasks used
by LAMMPS (one or multiple per compute node) and the number of MPI
tasks used per node. E.g. the mpirun command in MPICH does this via
its -np and -ppn switches; Open MPI uses -np and -npernode.< / p >
< p > If you plan to compute (any portion of) pairwise interactions using
USER-INTEL pair styles on the CPU, or use USER-OMP styles on the CPU,
you need to choose how many OpenMP threads per MPI task to use. Note
that the product of MPI tasks * OpenMP threads/task should not exceed
the number of physical cores (on a node), otherwise performance will
suffer.< / p >
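< p > The rule above is simple arithmetic, so it can be checked before launching a job. A minimal sketch, assuming you know the number of physical cores per node (hyperthreads not counted); check_cores is a hypothetical helper, not part of LAMMPS or any MPI launcher. < / p >

```shell
#!/bin/sh
# Hypothetical pre-launch check: MPI tasks per node times OpenMP threads
# per task should not exceed the physical cores on a node.
# $1 = MPI tasks per node, $2 = OpenMP threads per task, $3 = physical cores
check_cores() {
  used=$(( $1 * $2 ))
  if [ "$used" -gt "$3" ]; then
    echo "oversubscribed: $used threads on $3 cores"
    return 1
  fi
  echo "ok: $used threads on $3 cores"
}

check_cores 4 4 16                                          # 4 tasks x 4 threads fits
check_cores 16 2 16 || echo "reduce MPI tasks or threads per task"
```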
< p > If LAMMPS was built with coprocessor support for the USER-INTEL
package, you also need to specify the number of coprocessors per node and,
optionally, the number of coprocessor threads per MPI task to use. Note that
coprocessor threads (which run on the coprocessor) are totally
independent from OpenMP threads (which run on the CPU). The default
values for the settings that affect coprocessor threads are typically
fine, as discussed below.< / p >
< p > Use the “ -sf intel” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switch< / span > < / a > ,
which will automatically append “ intel” to styles that support it.
OpenMP threads per MPI task can be set via the “ -pk intel Nphi omp Nt” or
“ -pk omp Nt” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switches< / span > < / a > , which
set Nt = # of OpenMP threads per MPI task to use. The “ -pk omp” form
is only allowed if LAMMPS was also built with the USER-OMP package.< / p >
< p > Use the “ -pk intel Nphi” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switch< / span > < / a > to set Nphi = # of Xeon Phi(TM)
coprocessors/node, if LAMMPS was built with coprocessor support. All
the available coprocessor threads on each Phi will be divided among
MPI tasks, unless the < em > tptask< / em > option of the “ -pk intel” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switch< / span > < / a > is used to limit the coprocessor
threads per MPI task. See the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command
for details.< / p >
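< p > As a rough illustration of how the coprocessor threads are shared, the arithmetic below uses the numbers from the performance guidelines later in this section (60 offload cores with 4 hardware threads each, shared by 24 MPI tasks). It is a sketch that assumes an even split and simply drops any remainder; phi_threads_per_task is a hypothetical name, and the exact behavior of the tptask keyword is documented on the package doc page. < / p >

```shell
#!/bin/sh
# Hypothetical arithmetic for dividing coprocessor threads among MPI tasks.
# $1 = coprocessor cores available for offload, $2 = hardware threads per
# core, $3 = MPI tasks sharing the coprocessor, $4 = optional tptask limit.
phi_threads_per_task() {
  total=$(( $1 * $2 ))
  per_task=$(( total / $3 ))   # integer division; any remainder is ignored here
  if [ -n "$4" ]; then
    if [ "$4" -lt "$per_task" ]; then
      per_task=$4              # tptask caps the threads per MPI task
    fi
  fi
  echo "$per_task"
}

phi_threads_per_task 60 4 24     # 240 threads shared by 24 tasks
phi_threads_per_task 60 4 24 8   # same, but limited via tptask
```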
< div class = "highlight-python" > < div class = "highlight" > < pre > # CPU-only without USER-OMP (but using Intel vectorization on CPU):
mpirun -np 32 lmp_machine -sf intel -in in.script # 32 MPI tasks on as many nodes as needed (e.g. 2 16-core nodes)
lmp_machine -sf intel -pk intel 0 omp 16 -in in.script # 1 MPI task and 16 threads
< / pre > < / div >
< / div >
< div class = "highlight-python" > < div class = "highlight" > < pre > # CPU-only with USER-OMP (and Intel vectorization on CPU):
lmp_machine -sf hybrid intel omp -pk intel 0 omp 16 -in in.script # 1 MPI task on a 16-core node with 16 threads
mpirun -np 4 lmp_machine -sf hybrid intel omp -pk omp 4 -in in.script # 4 MPI tasks each with 4 threads on a single 16-core node
< / pre > < / div >
< / div >
< div class = "highlight-python" > < div class = "highlight" > < pre > # CPUs + Xeon Phi(TM) coprocessors with or without USER-OMP:
mpirun -np 32 -ppn 16 lmp_machine -sf intel -pk intel 1 -in in.script # 2 nodes with 16 MPI tasks on each, 240 total threads on coprocessor
mpirun -np 16 -ppn 8 lmp_machine -sf intel -pk intel 1 omp 2 -in in.script # 2 nodes, 8 MPI tasks on each node, 2 threads for each task, 240 total threads on coprocessor
mpirun -np 16 -ppn 8 lmp_machine -sf hybrid intel omp -pk intel 1 omp 2 -in in.script # 2 nodes, 8 MPI tasks on each node, 2 threads for each task, 240 total threads on coprocessor, USER-OMP package for some styles
< / pre > < / div >
< / div >
< p > Note that if the “ -sf intel” switch is used, it also invokes a
default command: < a class = "reference internal" href = "package.html" > < em > package intel 1< / em > < / a > . If the “ -sf hybrid intel omp”
switch is used, the default USER-OMP command < a class = "reference internal" href = "package.html" > < em > package omp 0< / em > < / a > is
also invoked. Both set the number of OpenMP threads per MPI task via the
OMP_NUM_THREADS environment variable. The first command sets the number of
Xeon Phi(TM) coprocessors/node to 1 (and the precision mode to “ mixed” , as one
of its option defaults). The latter command is not invoked if LAMMPS was not
built with the USER-OMP package. The Nphi = 1 value for the first command is
ignored if LAMMPS was not built with coprocessor support.< / p >
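< p > The defaults in the paragraph above amount to a small lookup. The sketch below only prints the implied command text and does not run LAMMPS; implied_package_cmds is a hypothetical name, and its second argument stands in for whether the binary includes USER-OMP. < / p >

```shell
#!/bin/sh
# Hypothetical sketch of the default package commands invoked by -sf.
# $1 = suffix argument ("intel" or "hybrid intel omp"),
# $2 = "yes" if LAMMPS was also built with the USER-OMP package.
implied_package_cmds() {
  case "$1" in
    "intel")
      echo "package intel 1"
      ;;
    "hybrid intel omp")
      echo "package intel 1"
      # The USER-OMP default is skipped when that package is not installed.
      if [ "$2" = "yes" ]; then
        echo "package omp 0"
      fi
      ;;
  esac
}

implied_package_cmds "hybrid intel omp" yes
```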
< p > Using the “-pk intel” or “-pk omp” switches explicitly allows
direct setting of the number of OpenMP threads per MPI task, and
additional options for either of the USER-INTEL or USER-OMP packages.
In particular, the “-pk intel” switch sets the number of
coprocessors/node and can limit the number of coprocessor threads per
MPI task. The syntax for these two switches is the same as for the
< a class = "reference internal" href = "package.html" > < em > package omp< / em > < / a > and < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > commands.
See the < a class = "reference internal" href = "package.html" > < em > package< / em > < / a > command doc page for details, including
the default values used for all its options if these switches are not
specified, and how to set the number of OpenMP threads via the
OMP_NUM_THREADS environment variable if desired.< / p >
< p > < strong > Or run with the USER-INTEL package by editing an input script:< / strong > < / p >
< p > The discussion above for the mpirun/mpiexec command, MPI tasks/node,
OpenMP threads per MPI task, and coprocessor threads per MPI task
applies here as well.< / p >
< p > Use the < a class = "reference internal" href = "suffix.html" > < em > suffix intel< / em > < / a > command, or you can explicitly add an
“ intel” suffix to individual styles in your input script, e.g.< / p >
< div class = "highlight-python" > < div class = "highlight" > < pre > pair_style lj/cut/intel 2.5
< / pre > < / div >
< / div >
< p > You must also use the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command, unless the
“ -sf intel” or “ -pk intel” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switches< / span > < / a > were used. It specifies how many
coprocessors/node to use, as well as other OpenMP threading and
coprocessor options. Its doc page explains how to set the number of
OpenMP threads via an environment variable if desired.< / p >
< p > If LAMMPS was also built with the USER-OMP package, you must also use
the < a class = "reference internal" href = "package.html" > < em > package omp< / em > < / a > command to enable that package, unless
the “ -sf hybrid intel omp” or “ -pk omp” < a class = "reference internal" href = "Section_start.html#start-7" > < span > command-line switches< / span > < / a > were used. It specifies how many
OpenMP threads per MPI task to use, as well as other options. Its doc
page explains how to set the number of OpenMP threads via an
environment variable if desired.< / p >
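< p > For a binary built with both packages, the input-script route needs a few setup lines near the top of the script, because the suffix command only affects styles defined after it. This is a sketch that prints those lines; accel_header is a hypothetical helper, the coprocessor and thread counts are placeholders, and the full keyword list is on the package doc page. < / p >

```shell
#!/bin/sh
# Hypothetical helper printing accelerator setup lines for an input script,
# for a binary built with both USER-INTEL and USER-OMP.
# $1 = coprocessors per node (0 for CPU-only), $2 = OpenMP threads per task.
accel_header() {
  printf 'package intel %s omp %s\n' "$1" "$2"
  printf 'package omp %s\n' "$2"
  # suffix applies to styles defined after it, so these lines go first.
  printf 'suffix hybrid intel omp\n'
}

accel_header 1 4
```

< p > The first line mirrors the “-pk intel Nphi omp Nt” form of the command-line switch; with “suffix hybrid intel omp” in place, a plain “pair_style lj/cut 2.5” later in the script would pick up the intel variant. < / p >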
< p > < strong > Speed-ups to expect:< / strong > < / p >
< p > If LAMMPS was not built with coprocessor support when including the
USER-INTEL package, then accelerated styles will run on the CPU using
vectorization optimizations and the specified precision. This may
give a substantial speed-up for a pair style, particularly if mixed or
single precision is used.< / p >
< p > If LAMMPS was built with coprocessor support, the pair styles will run
on one or more Intel(R) Xeon Phi(TM) coprocessors (per node). The
performance of a Xeon Phi versus a multi-core CPU is a function of
your hardware, which pair style is used, the number of
atoms/coprocessor, and the precision used on the coprocessor (double,
single, mixed).< / p >
< p > See the < a class = "reference external" href = "http://lammps.sandia.gov/bench.html" > Benchmark page< / a > of the
LAMMPS web site for performance of the USER-INTEL package on different
hardware.< / p >
< p > < strong > Guidelines for best performance on an Intel(R) Xeon Phi(TM)
coprocessor:< / strong > < / p >
< ul class = "simple" >
< li > The default for the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command is to have
all the MPI tasks on a given compute node use a single Xeon Phi(TM)
coprocessor. In general, running with a large number of MPI tasks on
each node will perform best with offload. Each MPI task will
automatically get affinity to a subset of the hardware threads
available on the coprocessor. For example, if your card has 61 cores,
with 60 cores available for offload and 4 hardware threads per core
(240 total threads), running with 24 MPI tasks per node will cause
each MPI task to use a subset of 10 threads on the coprocessor. Fine
tuning of the number of threads to use per MPI task or the number of
threads to use per core can be accomplished with keyword settings of
the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command.< / li >
< li > If desired, only a fraction of the pair style computation can be
offloaded to the coprocessors. This is accomplished by using the
< em > balance< / em > keyword in the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command. A
balance of 0 runs all calculations on the CPU. A balance of 1 runs
all calculations on the coprocessor. A balance of 0.5 runs half of
the calculations on the coprocessor. Setting the balance to -1 (the
default) enables dynamic load balancing that continuously adjusts
the fraction of offloaded work throughout the simulation. This option
typically produces results within 5 to 10 percent of the optimal fixed
balance.< / li >
< li > When using offload with CPU hyperthreading disabled, it may help
performance to use fewer MPI tasks and OpenMP threads than available
cores, because additional threads are generated internally to handle
the asynchronous offload tasks.< / li >
< li > If running short benchmark runs with dynamic load balancing, adding a
short warm-up run (10-20 steps) will allow the load-balancer to find a
near-optimal setting that will carry over to additional runs.< / li >
< li > If pair computations are being offloaded to an Intel(R) Xeon Phi(TM)
coprocessor, a diagnostic line is printed to the screen (not to the
log file) during the setup phase of a run, indicating that offload
mode is being used and the number of coprocessor threads
per MPI task. Additionally, an offload timing summary is printed at
the end of each run. When offloading, the frequency for < a class = "reference internal" href = "atom_modify.html" > < em > atom sorting< / em > < / a > is changed to 1 so that the per-atom data is
effectively sorted at every rebuild of the neighbor lists.< / li >
< li > For simulations with long-range electrostatics or bond, angle,
dihedral, or improper calculations, computation and data transfer to the
coprocessor will run concurrently with computations and MPI
communications for these calculations on the host CPU. The USER-INTEL
package has two modes for deciding which atoms will be handled by the
coprocessor. This choice is controlled with the < em > ghost< / em > keyword of
the < a class = "reference internal" href = "package.html" > < em > package intel< / em > < / a > command. When set to 0, ghost atoms
(atoms at the borders between MPI tasks) are not offloaded to the
card. This allows for overlap of MPI communication of forces with
computation on the coprocessor when the < a class = "reference internal" href = "newton.html" > < em > newton< / em > < / a > setting
is “on”. The default depends on the style being used; however,
better performance may be achieved by setting this option
explicitly.< / li >
< / ul >
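< p > To make a fixed balance value concrete, the sketch below splits a given amount of pair work between the coprocessor and the CPU. It is illustrative only: split_work is a hypothetical helper, measuring work in abstract “units” is an assumption (LAMMPS's internal decomposition may differ), and the dynamic mode (balance -1) is not modeled. awk is used because POSIX shell arithmetic is integer-only. < / p >

```shell
#!/bin/sh
# Hypothetical illustration of a fixed balance value for package intel:
# 0 = all work on the CPU, 1 = all on the coprocessor, 0.5 = half each.
# Dynamic balancing (balance -1) is not modeled here.
# $1 = units of pair work, $2 = fixed balance between 0 and 1.
split_work() {
  awk -v n="$1" -v b="$2" 'BEGIN {
    phi = int(n * b + 0.5)   # round to the nearest whole unit
    printf "coprocessor=%d cpu=%d\n", phi, n - phi
  }'
}

split_work 32000 0.5
split_work 32000 0.75
```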
< div class = "section" id = "restrictions" >
< h2 > Restrictions< a class = "headerlink" href = "#restrictions" title = "Permalink to this headline" > ¶< / a > < / h2 >
< p > When offloading to a coprocessor, < a class = "reference internal" href = "pair_hybrid.html" > < em > hybrid< / em > < / a > styles
that require skip lists for neighbor builds cannot be offloaded.
Using < a class = "reference internal" href = "pair_hybrid.html" > < em > hybrid/overlay< / em > < / a > is allowed. Only one USER-INTEL
accelerated style may be used with hybrid styles.
< a class = "reference internal" href = "special_bonds.html" > < em > Special_bonds< / em > < / a > exclusion lists are not currently
supported with offload; however, the same effect can often be
accomplished by setting cutoffs for excluded atom types to 0. None of
the pair styles in the USER-INTEL package currently support the
“inner”, “middle”, and “outer” options for rRESPA integration via the
< a class = "reference internal" href = "run_style.html" > < em > run_style respa< / em > < / a > command; only the “pair” option is
supported.< / p >
< / div >
< / div >
< / div >
< / div >
< footer >
< hr / >
< div role = "contentinfo" >
< p >
© Copyright .
< / p >
< / div >
Built with < a href = "http://sphinx-doc.org/" > Sphinx< / a > using a < a href = "https://github.com/snide/sphinx_rtd_theme" > theme< / a > provided by < a href = "https://readthedocs.org" > Read the Docs< / a > .
< / footer >
< / div >
< / div >
< / section >
< / div >
< script type = "text/javascript" >
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'15 May 2015 version',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
< / script >
< script type = "text/javascript" src = "_static/jquery.js" > < / script >
< script type = "text/javascript" src = "_static/underscore.js" > < / script >
< script type = "text/javascript" src = "_static/doctools.js" > < / script >
< script type = "text/javascript" src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" > < / script >
< script type = "text/javascript" src = "_static/sphinxcontrib-images/LightBox2/lightbox2/js/jquery-1.11.0.min.js" > < / script >
< script type = "text/javascript" src = "_static/sphinxcontrib-images/LightBox2/lightbox2/js/lightbox.min.js" > < / script >
< script type = "text/javascript" src = "_static/sphinxcontrib-images/LightBox2/lightbox2-customize/jquery-noconflict.js" > < / script >
< script type = "text/javascript" src = "_static/js/theme.js" > < / script >
< script type = "text/javascript" >
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
< / script >
< / body >
< / html >