Corporation. It provides two methods for accelerating simulations,
depending on the hardware you have. The first is acceleration on
Intel(R) CPUs by running in single, mixed, or double precision with
vectorization. The second is acceleration on Intel(R) Xeon Phi(TM)
coprocessors via offloading neighbor list and non-bonded force
calculations to the Phi. The same C++ code is used in both cases.
When offloading to a coprocessor from a CPU, the same routine is run
twice, once on the CPU and once with an offload flag.</p>
<p>Note that the USER-INTEL package supports use of the Phi in “offload”
mode, not “native” mode like the <a class="reference internal" href="accelerate_kokkos.html"><em>KOKKOS package</em></a>.</p>
<p>Also note that the USER-INTEL package can be used in tandem with the
<aclass="reference internal"href="accelerate_omp.html"><em>USER-OMP package</em></a>. This is useful when
offloading pair style computations to the Phi, so that other styles
not supported by the USER-INTEL package, e.g. bond, angle, dihedral,
improper, and long-range electrostatics, can run simultaneously in
threaded mode on the CPU cores. Since fewer MPI tasks than CPU cores
are typically invoked when running with coprocessors, this lets the
remaining CPU cores be used for useful computation.</p>
<p>As illustrated below, if LAMMPS is built with both the USER-INTEL and
USER-OMP packages, this dual mode of operation is made easier to use,
via the “-suffix hybrid intel omp” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> or the <a class="reference internal" href="suffix.html"><em>suffix hybrid intel omp</em></a> command. Both set a second-choice suffix to “omp” so
that styles from the USER-INTEL package will be used if available,
with styles from the USER-OMP package as a second choice.</p>
<p>Here is a quick overview of how to use the USER-INTEL package for CPU
acceleration, assuming one or more 16-core nodes. More details
follow.</p>
<divclass="highlight-python"><divclass="highlight"><pre>use an Intel compiler
use these CCFLAGS settings in Makefile.machine: -fopenmp, -DLAMMPS_MEMALIGN=64, -restrict, -xHost, -fno-alias, -ansi-alias, -override-limits
use these LINKFLAGS settings in Makefile.machine: -fopenmp, -xHost
make yes-user-intel yes-user-omp # including user-omp is optional
make mpi # build with the USER-INTEL package, if settings (including compiler) added to Makefile.mpi
make intel_cpu # or Makefile.intel_cpu already has settings, uses Intel MPI wrapper
Make.py -v -p intel omp -intel cpu -a file mpich_icc # or one-line build via Make.py for MPICH
Make.py -v -p intel omp -intel cpu -a file ompi_icc # or for OpenMPI
Make.py -v -p intel omp -intel cpu -a file intel_cpu # or for Intel MPI wrapper
</pre></div></div>
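<p>For reference, launch lines for a CPU-only build might look as follows (a sketch only; the binary name lmp_machine, the node counts, and the MPI launcher options are assumptions, not requirements):</p>
<div class="highlight-python"><div class="highlight"><pre>lmp_machine -sf intel -pk intel 0 omp 16 -in in.script          # 1 node, 1 MPI task, 16 threads/task, no USER-OMP
mpirun -np 32 lmp_machine -sf intel -in in.script               # 2 nodes, 16 MPI tasks/node, no threading, no USER-OMP
mpirun -np 32 -ppn 4 lmp_machine -sf hybrid intel omp -pk intel 0 omp 4 -pk omp 4 -in in.script   # 8 nodes, 4 MPI tasks/node, 4 threads/task, with USER-OMP
</pre></div></div>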
<p>If you compute (any portion of) pairwise interactions using USER-INTEL
pair styles on the CPU, or use USER-OMP styles on the CPU, you need to
choose how many OpenMP threads per MPI task to use. If both packages
are used, the thread count must be set for both, and the same value
should be used for each. Note that the product of MPI tasks *
threads/task should not exceed the physical number of cores (on a
node), otherwise performance will suffer.</p>
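<p>For example (a sketch, assuming a 16-core node and a binary named lmp_machine), the thread count can be set through the environment and is then picked up by both packages:</p>
<div class="highlight-python"><div class="highlight"><pre>export OMP_NUM_THREADS=4                                      # 4 OpenMP threads per MPI task
mpirun -np 4 lmp_machine -sf hybrid intel omp -in in.script   # 4 tasks x 4 threads = 16 cores on one node
</pre></div></div>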
<p>When using the USER-INTEL package for the Phi, you also need to
specify the number of coprocessors/node and optionally the number of
coprocessor threads per MPI task to use. Note that coprocessor
threads (which run on the coprocessor) are totally independent from
OpenMP threads (which run on the CPU). The default values for the
settings that affect coprocessor threads are typically fine, as
discussed below.</p>
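<p>As an illustration (the binary name and the specific values are assumptions), the coprocessor count and the maximum coprocessor threads per MPI task can be set explicitly on the command line:</p>
<div class="highlight-python"><div class="highlight"><pre>mpirun -np 8 lmp_machine -sf intel -pk intel 1 tptask 30 -in in.script   # 1 coprocessor/node, at most 30 coprocessor threads per MPI task
</pre></div></div>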
<p>As in the lines above, use the “-sf intel” or “-sf hybrid intel omp”
<aclass="reference internal"href="Section_start.html#start-7"><span>command-line switch</span></a>, which will
automatically append “intel” to styles that support it. In the second
case, “omp” will be appended if an “intel” style does not exist.</p>
<p>Note that if either switch is used, it also invokes a default command:
<aclass="reference internal"href="package.html"><em>package intel 1</em></a>. If the “-sf hybrid intel omp” switch
is used, the default USER-OMP command <aclass="reference internal"href="package.html"><em>package omp 0</em></a> is
also invoked (if LAMMPS was built with USER-OMP). Both set the number
of OpenMP threads per MPI task via the OMP_NUM_THREADS environment
variable. The first command sets the number of Xeon Phi(TM)
coprocessors/node to 1 (ignored if USER-INTEL is built for CPU-only),
and the precision mode to “mixed” (default value).</p>
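<p>In other words, using these switches is roughly equivalent to placing the following lines at the top of the input script (a sketch of the defaults just described):</p>
<div class="highlight-python"><div class="highlight"><pre>package intel 1        # default from -sf intel: 1 coprocessor/node, mixed precision, threads from OMP_NUM_THREADS
package omp 0          # additional default from -sf hybrid intel omp: thread count taken from OMP_NUM_THREADS
</pre></div></div>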
<p>You can also use the “-pk intel Nphi” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to explicitly set Nphi = # of Xeon
Phi(TM) coprocessors/node, as well as additional options. Nphi should
be >= 1 if LAMMPS was built with coprocessor support, otherwise Nphi
= 0 for a CPU-only build. All the available coprocessor threads on
each Phi will be divided among MPI tasks, unless the <em>tptask</em> option
of the “-pk intel” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> is
used to limit the coprocessor threads per MPI task. See the <a class="reference internal" href="package.html"><em>package intel</em></a> command for details, including the default values
used for all its options if not specified, and how to set the number
of OpenMP threads via the OMP_NUM_THREADS environment variable if
desired.</p>
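<p>For example (the binary name and the values chosen are illustrative only):</p>
<div class="highlight-python"><div class="highlight"><pre>mpirun -np 16 lmp_machine -sf intel -pk intel 2 omp 4 mode double -in in.script   # 2 coprocessors/node, 4 OpenMP threads/task, double precision
</pre></div></div>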
<p>If LAMMPS was built with the USER-OMP package, you can also use the
“-pk omp Nt” <a class="reference internal" href="Section_start.html#start-7"><span>command-line switch</span></a> to
explicitly set Nt = # of OpenMP threads per MPI task to use, as well
as additional options. Nt should be the same number of threads per MPI
task as set for the USER-INTEL package, e.g. via the “-pk intel Nphi
omp Nt” command. Again, see the <a class="reference internal" href="package.html"><em>package omp</em></a> command for
details, including the default values used for all its options if not
specified.</p>
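<p>A sketch of matching the thread count for both packages on the command line (the values are assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre>mpirun -np 8 lmp_machine -sf hybrid intel omp -pk intel 1 omp 2 -pk omp 2 -in in.script   # same 2 threads/task for USER-INTEL and USER-OMP styles
</pre></div></div>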
<p>Use the <aclass="reference internal"href="suffix.html"><em>suffix intel</em></a> or <aclass="reference internal"href="suffix.html"><em>suffix hybrid intel omp</em></a> commands, or you can explicitly add an “intel” or
“omp” suffix to individual styles in your input script, e.g.</p>
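<p>A minimal input-script sketch (the pair style and cutoff are illustrative assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre>suffix intel                     # or: suffix hybrid intel omp
pair_style lj/cut 2.5            # the suffix turns this into lj/cut/intel

pair_style lj/cut/intel 2.5      # or append the suffix to an individual style explicitly
</pre></div></div>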
<p>You must also use the <aclass="reference internal"href="package.html"><em>package intel</em></a> command, unless the
“-sf intel” or “-pk intel”<aclass="reference internal"href="Section_start.html#start-7"><span>command-line switches</span></a> were used. It specifies how many
the “-sf hybrid intel omp” or “-pk omp”<aclass="reference internal"href="Section_start.html#start-7"><span>command-line switches</span></a> were used. It specifies how many
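<p>For example, the equivalent input-script commands might look like this (the values are assumptions):</p>
<div class="highlight-python"><div class="highlight"><pre>package intel 1 omp 4            # 1 coprocessor/node (ignored for CPU-only builds), 4 OpenMP threads/task
package omp 4                    # same thread count for USER-OMP styles
</pre></div></div>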
<pclass="last">Setting core affinity is often used to pin MPI tasks
and OpenMP threads to a core or group of cores so that memory access
can be uniform. Unless disabled at build time, affinity for MPI tasks
and OpenMP threads on the host (CPU) will be set by default on the
host when using offload to a coprocessor. In this case, it is
unnecessary to use other methods to control affinity (e.g. taskset,
numactl, I_MPI_PIN_DOMAIN, etc.). This can be disabled in an input
script with the <em>no_affinity</em> option to the <a class="reference internal" href="package.html"><em>package intel</em></a> command or by disabling the option at build time
(by adding -DINTEL_OFFLOAD_NOAFFINITY to the CCFLAGS line of your
Makefile). Disabling this option is not recommended, especially when
running on a machine with hyperthreading disabled.</p>
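<p>If it must be disabled from an input script, a sketch of the relevant line (the coprocessor count is an assumption):</p>
<div class="highlight-python"><div class="highlight"><pre>package intel 1 no_affinity      # disable automatic core affinity for MPI tasks and OpenMP threads
</pre></div></div>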
the end of each run. When offloading, the frequency for <a class="reference internal" href="atom_modify.html"><em>atom sorting</em></a> is changed to 1 so that the per-atom data is
effectively sorted at every rebuild of the neighbor lists.</li>
<li>For simulations with long-range electrostatics or bond, angle,
dihedral, or improper interactions, those computations run on the host
CPU (e.g. in threaded mode via the USER-OMP package, as described
above) concurrently with the pair-style work offloaded to the
coprocessor.</li>
Built with <ahref="http://sphinx-doc.org/">Sphinx</a> using a <ahref="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <ahref="https://readthedocs.org">Read the Docs</a>.