git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12318 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2014-08-14 20:26:52 +00:00
parent af7d84de2d
commit b2f3ef52e4
6 changed files with 188 additions and 166 deletions

@@ -978,45 +978,52 @@ LAMMPS.
 <P>The USER-INTEL package was developed by Mike Brown at Intel
 Corporation. It provides a capability to accelerate simulations by
 offloading neighbor list and non-bonded force calculations to Intel
-coprocessors. Additionally, it supports running simulations in
-single, mixed, or double precision with vectorization, even if a
-coprocessor is not present. The same C++ code is used for both cases.
-When offloading to a coprocessor, the routine is run twice, once with
-an offload flag.
+coprocessors (Xeon Phi). Additionally, it supports running
+simulations in single, mixed, or double precision with vectorization,
+even if a coprocessor is not present, i.e. on an Intel CPU. The same
+C++ code is used for both cases. When offloading to a coprocessor,
+the routine is run twice, once with an offload flag.
 </P>
-<P>The USER-INTEL package will work with the USER-OMP package. Specifying
-use of the Intel package implicitly includes the OMP package allowing
-it to be used for angle, bond, dihedral, and long-range
-electrostatics. Using the <A HREF = "suffix.html">suffix intel</A> command will use
-styles from the Intel package if available; otherwise it will use
-styles from the OMP package if available.
+<P>The USER-INTEL package can be used in tandem with the USER-OMP
+package. This is useful when a USER-INTEL pair style is used, so that
+other styles not supported by the USER-INTEL package, e.g. for bond,
+angle, dihedral, improper, and long-range electrostatics can be run
+with the USER-OMP package versions. If you have built LAMMPS with
+both the USER-INTEL and USER-OMP packages, then this mode of operation
+is made easier, because the "-suffix intel" <A HREF = "Section_start.html#start_7">command-line
+switch</A> and the the <A HREF = "suffix.html">suffix
+intel</A> command will both set a second-choice suffix to
+"omp" so that styles from the USER-OMP package will be used if
+available.
 </P>
 <P><B>Building LAMMPS with the USER-INTEL package:</B>
 </P>
 <P>The procedure for building LAMMPS with the USER-INTEL package is
 simple. You have to edit your machine specific makefile to add the
 flags to enable OpenMP support (<I>-openmp</I>) to both the CCFLAGS and
-LINKFLAGS variables. You also need to add -restrict to CCFLAGS. If
-you are compiling on the same architecture that will be used for the
-runs, adding the flag <I>-xHost</I> will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
+-restrict to CCFLAGS.
+</P>
+<P>If you are compiling on the same architecture that will be used for
+the runs, adding the flag <I>-xHost</I> will enable vectorization with the
+Intel compiler. In order to build with support for an Intel
 coprocessor, the flag <I>-offload</I> should be added to the LINKFLAGS line
 and the flag <I>-DLMP_INTEL_OFFLOAD</I> should be added to the CCFLAGS
 line.
 </P>
 <P>The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
-are provided with options that perform well with the Intel
-compiler. The latter Makefile has support for offload to coprocessors
-and the former does not.
+are included in the src/MAKE directory with options that perform well
+with the Intel compiler. The latter Makefile has support for offload
+to coprocessors and the former does not.
 </P>
 <P>It is recommended that Intel Compiler 2013 SP1 update 1 be used for
 compiling. Newer versions have some performance issues that are being
 addressed. If using Intel MPI, version 5 or higher is recommended.
 </P>
 <P>The rest of the compilation is the same as for any other package that
-has no additional library dependencies:
+has no additional library dependencies, e.g.
 </P>
-<PRE>make yes-user-omp yes-user-intel
+<PRE>make yes-user-intel yes-user-omp
 make machine
 </PRE>
 <P><B>Running an input script:</B>
@@ -1032,94 +1039,97 @@ commands, and is independent of the Intel package.
 <P>Input script requirements to run using pair styles with a <I>intel</I>
 suffix are as follows:
 </P>
-<P>To invoke specific styles from the Intel package, either append
+<P>To invoke specific styles from the UESR-INTEL package, either append
 "intel" to the style name (e.g. pair_style lj/cut/intel), or use the
 <A HREF = "Section_start.html#start_7">-suffix command-line switch</A>, or use the
 <A HREF = "suffix.html">suffix</A> command in the input script.
 </P>
 <P>Unless the <A HREF = "Section_start.html#start_7">-suffix intel command-line
-switch</A> is used, the <A HREF = "package.html">package
+switch</A> is used, a <A HREF = "package.html">package
 intel</A> command must be used near the beginning of the
-script. The default precision mode for the Intel package is <I>mixed</I>,
-meaning that accumulation is performed in double precision and other
-calculations are performed in single precision. In order to use all
-single or all double precision, the "package intel" line must be used
-in the input script with a "single" or "double" keyword specified.
+input script. The default precision mode for the USER-INTEL package
+is <I>mixed</I>, meaning that accumulation is performed in double precision
+and other calculations are performed in single precision. In order to
+use all single or all double precision, the <A HREF = "package.html">package
+intel</A> command must be used in the input script with a
+"single" or "double" keyword specified.
 </P>
 <P><B>Running with an Intel coprocessor:</B>
 </P>
-<P>The Intel package supports offload of a fraction of the work to Intel
-coprocessors. This is accomplished by setting a balance fraction on
-the <A HREF = "package.html">package intel</A> line. A balance of 0 runs all
-calculations on the CPU. A balance of 1 runs all calculations on the
-coprocessor. A balance of 0.5 runs half of the calculations on the
-coprocessor. Setting the balance to -1 will enable dynamic load
-balancing that continously adjusts the fraction of offloaded work
-throughout the simulation. This option is typically within 5 to 10
-percent of the optimal fixed balance. By default, using the suffix
-command or command-line switch will use offload to a coprocessor with
-the balance set to -1. If LAMMPS is built without offload support,
-this setting is ignored.
+<P>The USER-INTEL package supports offload of a fraction of the work to
+Intel coprocessors (Xeon Phi). This is accomplished by setting a
+balance fraction on the <A HREF = "package.html">package intel</A> command. A
+balance of 0 runs all calculations on the CPU. A balance of 1 runs
+all calculations on the coprocessor. A balance of 0.5 runs half of
+the calculations on the coprocessor. Setting the balance to -1 will
+enable dynamic load balancing that continously adjusts the fraction of
+offloaded work throughout the simulation. This option typically
+produces results within 5 to 10 percent of the optimal fixed balance.
+By default, using the <A HREF = "suffix.html">suffix</A> command or <A HREF = "Section_start.html#start_7">-suffix
+command-line switch</A> will use offload to a
+coprocessor with the balance set to -1. If LAMMPS is built without
+offload support, this setting is ignored.
 </P>
 <P>If one is running short benchmark runs with dynamic load balancing,
 adding a short warm-up run (10-20 steps) will allow the load-balancer
-to find a setting that will be carried over to additional runs.
+to find a setting that will carry over to additional runs.
 </P>
 <P>The default for the <A HREF = "package.html">package intel</A> command is to have
-all of the MPI tasks on a given compute node use a single
-coprocessor. In general, running with a large number of MPI tasks on
-each node will perform best with offload. Each MPI task will
+all the MPI tasks on a given compute node use a single coprocessor
+(Xeon Phi). In general, running with a large number of MPI tasks on
+each node will perform best with offload. Each MPI task will
 automatically get affinity to a subset of the hardware threads
-available on the coprocessor. For example, if your card has 61 cores,
-with 60 cores available for offload and 4 hardware threads per core,
-running with 24 MPI tasks per node will cause each MPI task to use a
-subset of 10 threads on the coprocessor. Fine tuning of the number of
-threads to use per MPI task or the number of threads to use per core
-can be accomplished with keywords to the <A HREF = "package.html">package intel</A>
-command.
+available on the coprocessor. For example, if your card has 61 cores,
+with 60 cores available for offload and 4 hardware threads per core
+(240 total threads), running with 24 MPI tasks per node will cause
+each MPI task to use a subset of 10 threads on the coprocessor. Fine
+tuning of the number of threads to use per MPI task or the number of
+threads to use per core can be accomplished with keywords to the
+<A HREF = "package.html">package intel</A> command.
 </P>
-<P>If LAMMPS is using offload to a coprocessor, a diagnostic line during
-the setup for a run is printed to the screen (not to log files)
-indicating that offload is being used and the number of coprocessor
-threads per MPI task. Additionally, an offload timing summary is
-printed at the end of each run. When using offload, the
-<A HREF = "atom_modify.html">sort</A> frequency for atom data is changed to 1 such
-that the data is sorted every neighbor build.
+<P>If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+line during the setup for a run is printed to the screen (not to log
+files) indicating that offload is being used and the number of
+coprocessor threads per MPI task. Additionally, an offload timing
+summary is printed at the end of each run. When using offload, the
+<A HREF = "atom_modify.html">sort</A> frequency for atom data is changed to 1 so
+that the per-atom data is sorted every neighbor build.
 </P>
-<P>In order to use multiple coprocessors on each compute node, the
+<P>To use multiple coprocessors (Xeon Phis) on each compute node, the
 <I>offload_cards</I> keyword can be specified with the <A HREF = "package.html">package
 intel</A> command to specify the number of coprocessors to
 use.
 </P>
-<P>For simulations involving long-range electrostatics or angle, bond,
-and dihedral calculations, computation and data transfer to the
+<P>For simulations with long-range electrostatics or bond, angle,
+dihedral, improper calculations, computation and data transfer to the
 coprocessor will run concurrently with computations and MPI
-communications for these routines on the host. The Intel package has
-two modes for deciding which atoms will be handled by the coprocessor.
-The setting is controlled with the "offload_ghost" option. When set to
-0, ghost atoms (atoms at the borders between MPI tasks) are not
-offloaded to the card. This allows for overlap of MPI communication of
-forces with computation on the coprocessor when the
-<A HREF = "newton.html">newton</A> setting is "on". The default is dependent on the
-style being used, however, better performance might be achieving by
+communications for these routines on the host. The USER-INTEL package
+has two modes for deciding which atoms will be handled by the
+coprocessor. The setting is controlled with the "offload_ghost"
+option. When set to 0, ghost atoms (atoms at the borders between MPI
+tasks) are not offloaded to the card. This allows for overlap of MPI
+communication of forces with computation on the coprocessor when the
+<A HREF = "newton.html">newton</A> setting is "on". The default is dependent on the
+style being used, however, better performance might be achieved by
 setting this explictly.
 </P>
 <P>In order to control the number of OpenMP threads used on the host, the
 OMP_NUM_THREADS environment variable should be set. This variable will
 not influence the number of threads used on the coprocessor. Only the
-"package intel" command can be used to control thread counts on the
-coprocessor.
+<A HREF = "package.html">package intel</A> command can be used to control thread
+counts on the coprocessor.
 </P>
 <P><B>Restrictions:</B>
 </P>
 <P>When using offload, <A HREF = "pair_hybrid.html">hybrid</A> styles that require skip
 lists for neighbor builds cannot be offloaded to the coprocessor.
-Using <A HREF = "pair_hybrid.html">hybrid/overlay</A> is allowed. Only one intel
-accelerated style may be used with hybrid styles. Exclusion lists are
+Using <A HREF = "pair_hybrid.html">hybrid/overlay</A> is allowed. Only one intel
+accelerated style may be used with hybrid styles. Exclusion lists are
 not currently supported with offload, however, the same effect can
-often be accomplished by setting cutoffs for excluded atom types to
-0. None of the pair styles in the USER-OMP package support the
-"inner", "middle", "outer" options for r-RESPA integration.
+often be accomplished by setting cutoffs for excluded atom types to 0.
+None of the pair styles in the USER-OMP package currently support the
+"inner", "middle", "outer" options for rRESPA integration via the
+<A HREF = "run_style.html">run_style respa</A> command.
 </P>
 <HR>
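The thread-affinity example in the USER-INTEL text above is plain division of the coprocessor's hardware threads among the MPI tasks on a node. The sketch below shows that arithmetic in Python. It assumes an even split and ignores any remainder. The helper `threads_per_task` is hypothetical and not part of LAMMPS, and the package's actual affinity assignment may handle uneven cases differently.

```python
def threads_per_task(offload_cores: int, threads_per_core: int,
                     tasks_per_node: int) -> int:
    """Coprocessor hardware threads available to each MPI task on a node.

    Hypothetical helper: assumes the threads are split evenly across tasks
    and simply drops any remainder.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    total_threads = offload_cores * threads_per_core
    return total_threads // tasks_per_node


# Numbers from the documentation: a 61-core card with 60 cores usable for
# offload, 4 hardware threads per core (240 threads), 24 MPI tasks per node.
print(threads_per_task(60, 4, 24))  # 10
```

The per-task and per-core thread counts that LAMMPS actually uses are set through keywords of the package intel command, not through a function like this one.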

@@ -974,45 +974,52 @@ LAMMPS.
 The USER-INTEL package was developed by Mike Brown at Intel
 Corporation. It provides a capability to accelerate simulations by
 offloading neighbor list and non-bonded force calculations to Intel
-coprocessors. Additionally, it supports running simulations in
-single, mixed, or double precision with vectorization, even if a
-coprocessor is not present. The same C++ code is used for both cases.
-When offloading to a coprocessor, the routine is run twice, once with
-an offload flag.
+coprocessors (Xeon Phi). Additionally, it supports running
+simulations in single, mixed, or double precision with vectorization,
+even if a coprocessor is not present, i.e. on an Intel CPU. The same
+C++ code is used for both cases. When offloading to a coprocessor,
+the routine is run twice, once with an offload flag.

-The USER-INTEL package will work with the USER-OMP package. Specifying
-use of the Intel package implicitly includes the OMP package allowing
-it to be used for angle, bond, dihedral, and long-range
-electrostatics. Using the "suffix intel"_suffix.html command will use
-styles from the Intel package if available; otherwise it will use
-styles from the OMP package if available.
+The USER-INTEL package can be used in tandem with the USER-OMP
+package. This is useful when a USER-INTEL pair style is used, so that
+other styles not supported by the USER-INTEL package, e.g. for bond,
+angle, dihedral, improper, and long-range electrostatics can be run
+with the USER-OMP package versions. If you have built LAMMPS with
+both the USER-INTEL and USER-OMP packages, then this mode of operation
+is made easier, because the "-suffix intel" "command-line
+switch"_Section_start.html#start_7 and the the "suffix
+intel"_suffix.html command will both set a second-choice suffix to
+"omp" so that styles from the USER-OMP package will be used if
+available.

 [Building LAMMPS with the USER-INTEL package:]

 The procedure for building LAMMPS with the USER-INTEL package is
 simple. You have to edit your machine specific makefile to add the
 flags to enable OpenMP support ({-openmp}) to both the CCFLAGS and
-LINKFLAGS variables. You also need to add -restrict to CCFLAGS. If
-you are compiling on the same architecture that will be used for the
-runs, adding the flag {-xHost} will enable vectorization with the
-Intel compiler. In order to build with support for an Intel
+LINKFLAGS variables. You also need to add -DLAMMPS_MEMALIGN=64 and
+-restrict to CCFLAGS.
+
+If you are compiling on the same architecture that will be used for
+the runs, adding the flag {-xHost} will enable vectorization with the
+Intel compiler. In order to build with support for an Intel
 coprocessor, the flag {-offload} should be added to the LINKFLAGS line
 and the flag {-DLMP_INTEL_OFFLOAD} should be added to the CCFLAGS
 line.

 The files src/MAKE/Makefile.intel and src/MAKE/Makefile.intel_offload
-are provided with options that perform well with the Intel
-compiler. The latter Makefile has support for offload to coprocessors
-and the former does not.
+are included in the src/MAKE directory with options that perform well
+with the Intel compiler. The latter Makefile has support for offload
+to coprocessors and the former does not.

 It is recommended that Intel Compiler 2013 SP1 update 1 be used for
 compiling. Newer versions have some performance issues that are being
 addressed. If using Intel MPI, version 5 or higher is recommended.

 The rest of the compilation is the same as for any other package that
-has no additional library dependencies:
+has no additional library dependencies, e.g.

-make yes-user-omp yes-user-intel
+make yes-user-intel yes-user-omp
 make machine :pre

 [Running an input script:]
@@ -1028,94 +1035,97 @@ commands, and is independent of the Intel package.
 Input script requirements to run using pair styles with a {intel}
 suffix are as follows:

-To invoke specific styles from the Intel package, either append
+To invoke specific styles from the UESR-INTEL package, either append
 "intel" to the style name (e.g. pair_style lj/cut/intel), or use the
 "-suffix command-line switch"_Section_start.html#start_7, or use the
 "suffix"_suffix.html command in the input script.

 Unless the "-suffix intel command-line
-switch"_Section_start.html#start_7 is used, the "package
+switch"_Section_start.html#start_7 is used, a "package
 intel"_package.html command must be used near the beginning of the
-script. The default precision mode for the Intel package is {mixed},
-meaning that accumulation is performed in double precision and other
-calculations are performed in single precision. In order to use all
-single or all double precision, the "package intel" line must be used
-in the input script with a "single" or "double" keyword specified.
+input script. The default precision mode for the USER-INTEL package
+is {mixed}, meaning that accumulation is performed in double precision
+and other calculations are performed in single precision. In order to
+use all single or all double precision, the "package
+intel"_package.html command must be used in the input script with a
+"single" or "double" keyword specified.

 [Running with an Intel coprocessor:]

-The Intel package supports offload of a fraction of the work to Intel
-coprocessors. This is accomplished by setting a balance fraction on
-the "package intel"_package.html line. A balance of 0 runs all
-calculations on the CPU. A balance of 1 runs all calculations on the
-coprocessor. A balance of 0.5 runs half of the calculations on the
-coprocessor. Setting the balance to -1 will enable dynamic load
-balancing that continously adjusts the fraction of offloaded work
-throughout the simulation. This option is typically within 5 to 10
-percent of the optimal fixed balance. By default, using the suffix
-command or command-line switch will use offload to a coprocessor with
-the balance set to -1. If LAMMPS is built without offload support,
-this setting is ignored.
+The USER-INTEL package supports offload of a fraction of the work to
+Intel coprocessors (Xeon Phi). This is accomplished by setting a
+balance fraction on the "package intel"_package.html command. A
+balance of 0 runs all calculations on the CPU. A balance of 1 runs
+all calculations on the coprocessor. A balance of 0.5 runs half of
+the calculations on the coprocessor. Setting the balance to -1 will
+enable dynamic load balancing that continously adjusts the fraction of
+offloaded work throughout the simulation. This option typically
+produces results within 5 to 10 percent of the optimal fixed balance.
+By default, using the "suffix"_suffix.html command or "-suffix
+command-line switch"_Section_start.html#start_7 will use offload to a
+coprocessor with the balance set to -1. If LAMMPS is built without
+offload support, this setting is ignored.

 If one is running short benchmark runs with dynamic load balancing,
 adding a short warm-up run (10-20 steps) will allow the load-balancer
-to find a setting that will be carried over to additional runs.
+to find a setting that will carry over to additional runs.

 The default for the "package intel"_package.html command is to have
-all of the MPI tasks on a given compute node use a single
-coprocessor. In general, running with a large number of MPI tasks on
-each node will perform best with offload. Each MPI task will
+all the MPI tasks on a given compute node use a single coprocessor
+(Xeon Phi). In general, running with a large number of MPI tasks on
+each node will perform best with offload. Each MPI task will
 automatically get affinity to a subset of the hardware threads
-available on the coprocessor. For example, if your card has 61 cores,
-with 60 cores available for offload and 4 hardware threads per core,
-running with 24 MPI tasks per node will cause each MPI task to use a
-subset of 10 threads on the coprocessor. Fine tuning of the number of
-threads to use per MPI task or the number of threads to use per core
-can be accomplished with keywords to the "package intel"_package.html
-command.
+available on the coprocessor. For example, if your card has 61 cores,
+with 60 cores available for offload and 4 hardware threads per core
+(240 total threads), running with 24 MPI tasks per node will cause
+each MPI task to use a subset of 10 threads on the coprocessor. Fine
+tuning of the number of threads to use per MPI task or the number of
+threads to use per core can be accomplished with keywords to the
+"package intel"_package.html command.

-If LAMMPS is using offload to a coprocessor, a diagnostic line during
-the setup for a run is printed to the screen (not to log files)
-indicating that offload is being used and the number of coprocessor
-threads per MPI task. Additionally, an offload timing summary is
-printed at the end of each run. When using offload, the
-"sort"_atom_modify.html frequency for atom data is changed to 1 such
-that the data is sorted every neighbor build.
+If LAMMPS is using offload to a coprocessor (Xeon Phi), a diagnostic
+line during the setup for a run is printed to the screen (not to log
+files) indicating that offload is being used and the number of
+coprocessor threads per MPI task. Additionally, an offload timing
+summary is printed at the end of each run. When using offload, the
+"sort"_atom_modify.html frequency for atom data is changed to 1 so
+that the per-atom data is sorted every neighbor build.

-In order to use multiple coprocessors on each compute node, the
+To use multiple coprocessors (Xeon Phis) on each compute node, the
 {offload_cards} keyword can be specified with the "package
 intel"_package.html command to specify the number of coprocessors to
 use.

-For simulations involving long-range electrostatics or angle, bond,
-and dihedral calculations, computation and data transfer to the
+For simulations with long-range electrostatics or bond, angle,
+dihedral, improper calculations, computation and data transfer to the
 coprocessor will run concurrently with computations and MPI
-communications for these routines on the host. The Intel package has
-two modes for deciding which atoms will be handled by the coprocessor.
-The setting is controlled with the "offload_ghost" option. When set to
-0, ghost atoms (atoms at the borders between MPI tasks) are not
-offloaded to the card. This allows for overlap of MPI communication of
-forces with computation on the coprocessor when the
-"newton"_newton.html setting is "on". The default is dependent on the
-style being used, however, better performance might be achieving by
+communications for these routines on the host. The USER-INTEL package
+has two modes for deciding which atoms will be handled by the
+coprocessor. The setting is controlled with the "offload_ghost"
+option. When set to 0, ghost atoms (atoms at the borders between MPI
+tasks) are not offloaded to the card. This allows for overlap of MPI
+communication of forces with computation on the coprocessor when the
+"newton"_newton.html setting is "on". The default is dependent on the
+style being used, however, better performance might be achieved by
 setting this explictly.

 In order to control the number of OpenMP threads used on the host, the
 OMP_NUM_THREADS environment variable should be set. This variable will
 not influence the number of threads used on the coprocessor. Only the
-"package intel" command can be used to control thread counts on the
-coprocessor.
+"package intel"_package.html command can be used to control thread
+counts on the coprocessor.

 [Restrictions:]

 When using offload, "hybrid"_pair_hybrid.html styles that require skip
 lists for neighbor builds cannot be offloaded to the coprocessor.
-Using "hybrid/overlay"_pair_hybrid.html is allowed. Only one intel
-accelerated style may be used with hybrid styles. Exclusion lists are
+Using "hybrid/overlay"_pair_hybrid.html is allowed. Only one intel
+accelerated style may be used with hybrid styles. Exclusion lists are
 not currently supported with offload, however, the same effect can
-often be accomplished by setting cutoffs for excluded atom types to
-0. None of the pair styles in the USER-OMP package support the
-"inner", "middle", "outer" options for r-RESPA integration.
+often be accomplished by setting cutoffs for excluded atom types to 0.
+None of the pair styles in the USER-OMP package currently support the
+"inner", "middle", "outer" options for rRESPA integration via the
+"run_style respa"_run_style.html command.

 :line
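The default mixed precision mode described above keeps accumulation in double precision and does other arithmetic in single precision. The Python sketch below shows why the precision of the accumulator matters. It is only an illustration and does not reproduce the USER-INTEL kernels. It emulates 32-bit floats with the standard-library `struct` module, then compares a float32 running sum with a float64 running sum of the same float32 terms. The helper names `to_float32` and `accumulate` are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, double_accumulator: bool) -> float:
    """Sum single-precision terms with a float64 or an emulated float32 accumulator."""
    total = 0.0
    for v in values:
        term = to_float32(v)  # each term is rounded to single precision
        if double_accumulator:
            total = total + term  # "mixed": accumulate in double precision
        else:
            total = to_float32(total + term)  # "single": round after every add
    return total


terms = [0.1] * 100_000
exact = 100_000 * to_float32(0.1)  # reference sum of the float32 terms
single_err = abs(accumulate(terms, double_accumulator=False) - exact)
mixed_err = abs(accumulate(terms, double_accumulator=True) - exact)
print(single_err > mixed_err)  # True
```

The float32 running sum loses low-order bits on every addition once the total grows. The float64 accumulator keeps the error many orders of magnitude smaller while still using single-precision terms.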

@@ -1497,8 +1497,9 @@ if desired.
 default Intel settings, as if the command "package intel * mixed
 balance -1" were used at the top of your input script. These settings
 can be changed by using the <A HREF = "package.html">package intel</A> command in
-your script if desired. The intel suffix will attempt to use styles
-from the OMP package if they are not present in the Intel package.
+your script if desired. If the USER-OMP package is installed, the
+intel suffix will make the omp suffix a second choice, if a requested
+style is not available in the USER-INTEL package.
 </P>
 <P>For the KOKKOS package, using this command-line switch also invokes
 the default KOKKOS settings, as if the command "package kokkos neigh
@@ -1511,9 +1512,9 @@ default OMP settings, as if the command "package omp *" were used at
 the top of your input script. These settings can be changed by using
 the <A HREF = "package.html">package omp</A> command in your script if desired.
 </P>
-<P>The <A HREF = "suffix.html">suffix</A> command can also be used set a suffix and it
-can also turn off or back on any suffix setting made via the command
-line.
+<P>The <A HREF = "suffix.html">suffix</A> command can also be used to set a suffix and
+it can also turn off or back on any suffix setting made via the
+command line.
 </P>
 <PRE>-var name value1 value2 ...
 </PRE>
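The lookup order that this commit documents for the intel suffix works as follows. LAMMPS tries the `/intel` variant first. If the USER-OMP package is installed, it tries the `/omp` variant next. If no variant exists, it creates the standard style. The Python sketch below models that order under stated assumptions. `resolve_style` is a hypothetical function, and a single flat set of style names stands in for LAMMPS's internal per-category style registries. In this model, having USER-OMP installed simply means that `/omp` names appear in the set.

```python
def resolve_style(name: str, available: set, suffix: str = "intel") -> str:
    """Pick the style variant a suffix lookup would use (hypothetical model).

    `available` holds the suffixed style names compiled into this build;
    USER-OMP styles are present only if that package was installed.
    """
    candidates = [f"{name}/{suffix}"]
    if suffix == "intel":
        # "omp" is the second choice when the intel suffix is active.
        candidates.append(f"{name}/omp")
    for candidate in candidates:
        if candidate in available:
            return candidate
    return name  # no variant exists, so the standard style is used


with_omp = {"lj/cut/intel", "lj/cut/omp", "harmonic/omp"}
without_omp = {"lj/cut/intel"}

print(resolve_style("lj/cut", with_omp))       # lj/cut/intel
print(resolve_style("harmonic", with_omp))     # harmonic/omp
print(resolve_style("harmonic", without_omp))  # harmonic
```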

@@ -1491,8 +1491,9 @@ For the Intel package, using this command-line switch also invokes the
 default Intel settings, as if the command "package intel * mixed
 balance -1" were used at the top of your input script. These settings
 can be changed by using the "package intel"_package.html command in
-your script if desired. The intel suffix will attempt to use styles
-from the OMP package if they are not present in the Intel package.
+your script if desired. If the USER-OMP package is installed, the
+intel suffix will make the omp suffix a second choice, if a requested
+style is not available in the USER-INTEL package.

 For the KOKKOS package, using this command-line switch also invokes
 the default KOKKOS settings, as if the command "package kokkos neigh
@@ -1505,9 +1506,9 @@ default OMP settings, as if the command "package omp *" were used at
 the top of your input script. These settings can be changed by using
 the "package omp"_package.html command in your script if desired.

-The "suffix"_suffix.html command can also be used set a suffix and it
-can also turn off or back on any suffix setting made via the command
-line.
+The "suffix"_suffix.html command can also be used to set a suffix and
+it can also turn off or back on any suffix setting made via the
+command line.

 -var name value1 value2 ... :pre

@@ -80,9 +80,9 @@ If the variant version does not exist, the standard version is
 created.
 </P>
 <P>When using the intel suffix, LAMMPS will first attempt to use a style
-with the intel suffix. If this does not exist, a style with the omp
-suffix is attempted. If this also does not exist, the style without
-any suffix is used.
+with the intel suffix. If the USER-OMP package is installed, the the
+omp suffix will be tried as a second choice, if a requested style is
+not available in the USER-INTEL package.
 </P>
 <P>If the specified style is <I>off</I>, then any previously specified suffix
 is temporarily disabled, whether it was specified by a command-line

@@ -77,9 +77,9 @@ If the variant version does not exist, the standard version is
 created.

 When using the intel suffix, LAMMPS will first attempt to use a style
-with the intel suffix. If this does not exist, a style with the omp
-suffix is attempted. If this also does not exist, the style without
-any suffix is used.
+with the intel suffix. If the USER-OMP package is installed, the the
+omp suffix will be tried as a second choice, if a requested style is
+not available in the USER-INTEL package.

 If the specified style is {off}, then any previously specified suffix
 is temporarily disabled, whether it was specified by a command-line