From 0e398e5e65e49d9bac3a94ff2314ab963c4be649 Mon Sep 17 00:00:00 2001
From: sjplimp
Date: Mon, 5 Oct 2015 15:18:49 +0000
Subject: [PATCH] ''

git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@14089 f3b2605a-c512-4ea7-a41b-209d697bcdaa
---
 doc/doc2/Section_start.html    | 31 ++++++++++-------
 doc/doc2/accelerate_intel.html | 62 ++++++++++++++++------------------
 doc/doc2/package.html          |  2 +-
 doc/doc2/suffix.html           | 24 ++++++++-----
 4 files changed, 64 insertions(+), 55 deletions(-)

diff --git a/doc/doc2/Section_start.html b/doc/doc2/Section_start.html
index 9d501b5901..0217938215 100644
--- a/doc/doc2/Section_start.html
+++ b/doc/doc2/Section_start.html
@@ -1653,15 +1653,22 @@ multi-partition mode, if the specified file is "none", then no screen output is performed. Option -pscreen will override the name of the partition screen files file.N.

-
-suffix style 
+
-suffix style args 
 

Use variants of various styles if they exist. The specified style can -be cuda, gpu, intel, kk, omp, or opt. These refer to -optional packages that LAMMPS can be built with, as described above in +be cuda, gpu, intel, kk, omp, opt, or hybrid. These refer +to optional packages that LAMMPS can be built with, as described above in Section 2.3. The "cuda" style corresponds to the USER-CUDA package, the "gpu" style to the GPU package, the "intel" style to the USER-INTEL package, the "kk" style to the KOKKOS package, the "opt" -style to the OPT package, and the "omp" style to the USER-OMP package. +style to the OPT package, and the "omp" style to the USER-OMP package. The +hybrid style is the only style that accepts arguments. It allows for two +packages to be specified. The first package specified is the default and +will be used if it is available. If no style is available for the first +package, the style for the second package will be used if available. For +example, "-suffix hybrid intel omp" will use styles from the USER-INTEL +package if they are installed and available, but styles for the USER-OMP +package otherwise.
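The first-choice/second-choice lookup described for the hybrid style can be sketched as below. This is a minimal Python illustration, not LAMMPS's actual C++ implementation: the function name resolve_style and the set of installed style names are hypothetical. The final fallback to the unsuffixed style follows the suffix command page ("If the variant version does not exist, the standard version is created").

```python
def resolve_style(name, available, first, second=None):
    """Return the style name that would be instantiated for `name`.

    Hypothetical sketch of the lookup order: try name/first, then
    name/second (hybrid only), then the plain, unaccelerated style.
    `available` is a set of style names assumed to be compiled in.
    """
    for suffix in (first, second):
        if suffix is None:
            continue
        candidate = f"{name}/{suffix}"
        if candidate in available:
            return candidate
    # No accelerated variant exists: fall back to the standard style.
    return name


# Invented build contents for illustration only; real packages differ.
installed = {"lj/cut", "lj/cut/intel", "lj/cut/omp", "morse", "morse/omp", "table"}

print(resolve_style("lj/cut", installed, "intel", "omp"))  # lj/cut/intel
print(resolve_style("morse", installed, "intel", "omp"))   # morse/omp
print(resolve_style("table", installed, "intel", "omp"))   # table
```

With a plain "-suffix intel" (second=None), "morse" would resolve to the standard "morse" style, because no second suffix is consulted.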

Along with the "-package" command-line switch, this is a convenient mechanism for invoking accelerator packages and their options without @@ -1688,14 +1695,14 @@ gpu command in your script. invokes the default USER-INTEL settings, as if the command "package intel 1" were used at the top of your input script. These settings can be changed by using the "-package intel" command-line switch or -the package intel command in your script. If the -USER-OMP package is also installed, the intel suffix will make the omp -suffix a second choice, if a requested style is not available in the -USER-INTEL package. It will also invoke the default USER-OMP -settings, as if the command "package omp 0" were used at the top of -your input script. These settings can be changed by using the -"-package omp" command-line switch or the package omp -command in your script. +the package intel command in your script. If the +USER-OMP package is also installed, the hybrid style with "intel omp" +arguments can be used to make the omp suffix a second choice, if a +requested style is not available in the USER-INTEL package. It will +also invoke the default USER-OMP settings, as if the command "package +omp 0" were used at the top of your input script. These settings can +be changed by using the "-package omp" command-line switch or the +package omp command in your script.

For the KOKKOS package, using this command-line switch also invokes the default KOKKOS settings, as if the command "package kokkos" were
diff --git a/doc/doc2/accelerate_intel.html b/doc/doc2/accelerate_intel.html
index 459b314d9f..aa1a7f938e 100644
--- a/doc/doc2/accelerate_intel.html
+++ b/doc/doc2/accelerate_intel.html
@@ -34,16 +34,16 @@ when running with coprocessors, this enables the extra CPU cores to be used for useful computation.

If LAMMPS is built with both the USER-INTEL and USER-OMP packages -intsalled, this mode of operation is made easier to use, because the -"-suffix intel" command-line switch or -the suffix intel command will both set a second-choice -suffix to "omp" so that styles from the USER-OMP package will be used -if available, after first testing if a style from the USER-INTEL +installed, this mode of operation is made easier to use, with the +"-suffix hybrid intel omp" command-line switch +or the suffix hybrid intel omp command will both set a +second-choice suffix to "omp" so that styles from the USER-OMP package will be +used if available, after first testing if a style from the USER-INTEL package is available.

-

When using the USER-INTEL package, you must choose at build time -whether you are building for CPU-only acceleration or for using the -Xeon Phi in offload mode. +

When using the USER-INTEL package, you must choose at build time whether the +binary will support offload to Xeon Phi coprocessors. Binaries supporting +offload can still be run in CPU-only (host-only) mode.

Here is a quick overview of how to use the USER-INTEL package for CPU-only acceleration: @@ -170,16 +170,15 @@ suffer.

If LAMMPS was built with coprocessor support for the USER-INTEL package, you also need to specify the number of coprocessor/node and -the number of coprocessor threads per MPI task to use. Note that +optionally the number of coprocessor threads per MPI task to use. Note that coprocessor threads (which run on the coprocessor) are totally independent from OpenMP threads (which run on the CPU). The default values for the settings that affect coprocessor threads are typically fine, as discussed below.
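The per-task coprocessor thread counts quoted in the example comments of this patch (240 threads for 1 task, 60 threads/task for 4 tasks sharing one coprocessor, 120 threads/task for 8 tasks and 4 coprocessors) follow a simple division. The sketch below reproduces that arithmetic under stated assumptions that are inferred from those comments, not taken from LAMMPS source: each Xeon Phi offers 240 hardware threads, they are split evenly among the MPI tasks on a node, and an explicit tptask value acts as an upper limit. The function name is hypothetical.

```python
def coprocessor_threads_per_task(tasks_per_node, nphi, threads_per_phi=240, tptask=None):
    """Estimate coprocessor threads available to each MPI task.

    Assumptions (inferred from the example comments, not LAMMPS source):
    each coprocessor has `threads_per_phi` hardware threads, those threads
    are divided evenly among the MPI tasks on a node, and `tptask`, if
    given, caps the per-task count.
    """
    if tasks_per_node < 1 or nphi < 1:
        raise ValueError("need at least one MPI task and one coprocessor per node")
    share = (threads_per_phi * nphi) // tasks_per_node
    return share if tptask is None else min(share, tptask)


print(coprocessor_threads_per_task(1, 1))             # 240
print(coprocessor_threads_per_task(1, 1, tptask=32))  # 32
print(coprocessor_threads_per_task(4, 1))             # 60
print(coprocessor_threads_per_task(8, 4))             # 120
print(coprocessor_threads_per_task(16, 1))            # 15, e.g. -np 32 -ppn 16
```

Per-node task counts matter here, not the total -np value, which is why the -ppn setting in the multi-node examples determines each task's share.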

Use the "-sf intel" command-line switch, -which will automatically append "intel" to styles that support it. If -a style does not support it, an "omp" suffix is tried next. OpenMP -threads per MPI task can be set via the "-pk intel Nphi omp Nt" or +which will automatically append "intel" to styles that support it. +OpenMP threads per MPI task can be set via the "-pk intel Nphi omp Nt" or "-pk omp Nt" command-line switches, which set Nt = # of OpenMP threads per MPI task to use. The "-pk omp" form is only allowed if LAMMPS was also built with the USER-OMP package. @@ -194,37 +193,34 @@ threads per MPI task. See the package intel comman for details.

CPU-only without USER-OMP (but using Intel vectorization on CPU):
-lmp_machine -sf intel -in in.script                 # 1 MPI task
-mpirun -np 32 lmp_machine -sf intel -in in.script   # 32 MPI tasks on as many nodes as needed (e.g. 2 16-core nodes) 
+mpirun -np 32 lmp_machine -sf intel -in in.script         # 32 MPI tasks on as many nodes as needed (e.g. 2 16-core nodes)
+lmp_machine -sf intel -pk intel 0 omp 16 -in in.script    # 1 MPI task and 16 threads 
 
CPU-only with USER-OMP (and Intel vectorization on CPU):
-lmp_machine -sf intel -pk intel 16 0 -in in.script             # 1 MPI task on a 16-core node
-mpirun -np 4 lmp_machine -sf intel -pk omp 4 -in in.script     # 4 MPI tasks each with 4 threads on a single 16-core node
-mpirun -np 32 lmp_machine -sf intel -pk omp 4 -in in.script    # ditto on 8 16-core nodes 
+lmp_machine -sf hybrid intel omp -pk intel 0 omp 16 -in in.script         # 1 MPI task on a 16-core node with 16 threads
+mpirun -np 4 lmp_machine -sf hybrid intel omp -pk omp 4 -in in.script     # 4 MPI tasks each with 4 threads on a single 16-core node 
 
CPUs + Xeon Phi(TM) coprocessors with or without USER-OMP:
-lmp_machine -sf intel -pk intel 1 omp 16 -in in.script                       # 1 MPI task, 16 OpenMP threads on CPU, 1 coprocessor, all 240 coprocessor threads
-lmp_machine -sf intel -pk intel 1 omp 16 tptask 32 -in in.script             # 1 MPI task, 16 OpenMP threads on CPU, 1 coprocessor, only 32 coprocessor threads
-mpirun -np 4 lmp_machine -sf intel -pk intel 1 omp 4 -in in.script           # 4 MPI tasks, 4 OpenMP threads/task, 1 coprocessor, 60 coprocessor threads/task
-mpirun -np 32 -ppn 4 lmp_machine -sf intel -pk intel 1 omp 4 -in in.script   # ditto on 8 16-core nodes
-mpirun -np 8 lmp_machine -sf intel -pk intel 4 omp 2 -in in.script           # 8 MPI tasks, 2 OpenMP threads/task, 4 coprocessors, 120 coprocessor threads/task 
+mpirun -np 32 -ppn 16 lmp_machine -sf intel -pk intel 1 -in in.script                 # 2 nodes with 16 MPI tasks on each, 240 total threads on coprocessor
+mpirun -np 16 -ppn 8 lmp_machine -sf intel -pk intel 1 omp 2 -in in.script            # 2 nodes, 8 MPI tasks on each node, 2 threads for each task, 240 total threads on coprocessor
+mpirun -np 16 -ppn 8 lmp_machine -sf hybrid intel omp -pk intel 1 omp 2 -in in.script # 2 nodes, 8 MPI tasks on each node, 2 threads for each task, 240 total threads on coprocessor, USER-OMP package for some styles 
 
-

Note that if the "-sf intel" switch is used, it also invokes two -default commands: package intel 1, followed by package -omp 0. These both set the number of OpenMP threads per -MPI task via the OMP_NUM_THREADS environment variable. The first -command sets the number of Xeon Phi(TM) coprocessors/node to 1 (and -the precision mode to "mixed", as one of its option defaults). The -latter command is not invoked if LAMMPS was not built with the -USER-OMP package. The Nphi = 1 value for the first command is ignored -if LAMMPS was not built with coprocessor support. +

Note that if the "-sf intel" switch is used, it also invokes a +default command: package intel 1. If the "-sf hybrid intel omp" +switch is used, the default USER-OMP command package omp 0 is +also invoked. Both set the number of OpenMP threads per MPI task via the +OMP_NUM_THREADS environment variable. The first command sets the number of +Xeon Phi(TM) coprocessors/node to 1 (and the precision mode to "mixed", as one +of its option defaults). The latter command is not invoked if LAMMPS was not +built with the USER-OMP package. The Nphi = 1 value for the first command is +ignored if LAMMPS was not built with coprocessor support.
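The rule in the paragraph above (which default package commands a given -sf switch implies, and that the omp default is skipped without USER-OMP) can be sketched as a small lookup. This is a hypothetical Python helper, not part of LAMMPS; it models only the intel and omp defaults stated on this page and keeps them in the order the text gives (package intel first, then package omp).

```python
# Defaults stated in the text for the two packages involved.
DEFAULT_PACKAGE_CMDS = {"intel": "package intel 1", "omp": "package omp 0"}


def implied_package_commands(sf_args, user_omp_built=True):
    """Return the default 'package' commands implied by a -sf switch.

    `sf_args` is the word list after -sf, e.g. ["intel"] or
    ["hybrid", "intel", "omp"]. Only the intel and omp defaults stated
    here are modeled; the omp default is skipped when LAMMPS was not
    built with the USER-OMP package.
    """
    suffixes = sf_args[1:] if sf_args[:1] == ["hybrid"] else sf_args
    cmds = []
    for s in suffixes:
        if s not in DEFAULT_PACKAGE_CMDS:
            raise ValueError(f"not covered by this sketch: {s}")
        if s == "omp" and not user_omp_built:
            continue
        cmds.append(DEFAULT_PACKAGE_CMDS[s])
    return cmds


print(implied_package_commands(["intel"]))                   # ['package intel 1']
print(implied_package_commands(["hybrid", "intel", "omp"]))  # ['package intel 1', 'package omp 0']
```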

Using the "-pk intel" or "-pk omp" switches explicitly allows for direct setting of the number of OpenMP threads per MPI task, and additional options for either of the USER-INTEL or USER-OMP packages. In particular, the "-pk intel" switch sets the number of coprocessors/node and can limit the number of coprocessor threads per -MPI task. The syntax for these two switches is the same as the +MPI task. The syntax for these two switches is the same as the package omp and package intel commands. See the package command doc page for details, including the default values used for all its options if these switches are not @@ -251,7 +247,7 @@ OpenMP threads via an environment variable if desired.

If LAMMPS was also built with the USER-OMP package, you must also use the package omp command to enable that package, unless -the "-sf intel" or "-pk omp" command-line +the "-sf hybrid intel omp" or "-pk omp" command-line switches were used. It specifies how many OpenMP threads per MPI task to use, as well as other options. Its doc page explains how to set the number of OpenMP threads via an
diff --git a/doc/doc2/package.html b/doc/doc2/package.html
index 82009847c8..ddf2830d24 100644
--- a/doc/doc2/package.html
+++ b/doc/doc2/package.html
@@ -382,7 +382,7 @@ USER-OMP packages, be aware that both packages allow setting of the single global Nthreads value used by OpenMP. Thus if both package commands are invoked, you should insure the two values are consistent. If they are not, the last one invoked will take precedence, for both -packages. Also note that if the "-sf intel" command-line +packages. Also note that if the "-sf hybrid intel omp" command-line switch is used, it invokes a "package intel" command, followed by a "package omp" command, both with a setting of Nthreads = 0.
diff --git a/doc/doc2/suffix.html b/doc/doc2/suffix.html
index c1aae192de..bc8cc55424 100644
--- a/doc/doc2/suffix.html
+++ b/doc/doc2/suffix.html
@@ -13,9 +13,10 @@

Syntax:

-
suffix style 
+
suffix style args 
 
-
  • style = off or on or cuda or gpu or intel or kk or omp or opt +
    • style = off or on or cuda or gpu or intel or kk or omp or opt or hybrid +
    • args = for hybrid style only, the default (first-choice) suffix followed by the alternative (second-choice) suffix
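The syntax above (a style, plus two arguments only when the style is hybrid) can be checked with a short validator. This is a hypothetical Python sketch of the rule, not LAMMPS's input parser; restricting the two hybrid arguments to package suffixes (excluding off, on, and hybrid) is an assumption based on the text saying that two packages are specified.

```python
# Styles listed in the Syntax section; only "hybrid" takes arguments.
SUFFIX_STYLES = {"off", "on", "cuda", "gpu", "intel", "kk", "omp", "opt", "hybrid"}
# Assumption: hybrid arguments must name packages, not off/on/hybrid.
PACKAGE_SUFFIXES = SUFFIX_STYLES - {"off", "on", "hybrid"}


def parse_suffix(line):
    """Parse 'suffix style args' into (style, args); raise ValueError if malformed."""
    words = line.split()
    if len(words) < 2 or words[0] != "suffix":
        raise ValueError("expected: suffix style [args]")
    style, args = words[1], words[2:]
    if style not in SUFFIX_STYLES:
        raise ValueError(f"unknown suffix style: {style}")
    if style == "hybrid":
        if len(args) != 2 or not set(args) <= PACKAGE_SUFFIXES:
            raise ValueError("hybrid needs two package suffixes, e.g. 'intel omp'")
        return style, tuple(args)
    if args:
        raise ValueError(f"suffix {style} takes no arguments")
    return style, ()


print(parse_suffix("suffix hybrid intel omp"))  # ('hybrid', ('intel', 'omp'))
print(parse_suffix("suffix kk"))                # ('kk', ())
```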

    Examples:

    @@ -23,6 +24,7 @@ suffix on suffix gpu suffix intel +suffix hybrid intel omp suffix kk

Description: @@ -32,14 +34,14 @@ exist. In that respect it operates the same as the this section of the manual. The "cuda" style corresponds to the USER-CUDA package, the "gpu" style to the GPU package, the "intel" style to the USER-INTEL package, the "kk" style to the KOKKOS package, the "omp" style to the USER-OMP package, and the "opt" style to the -OPT package, +OPT package.

These are the variants these packages provide:

@@ -63,6 +65,8 @@ multi-threading
  • OPT = a handful of pair styles, cache-optimized for faster CPU performance + +
  • HYBRID = two packages are specified, with the second used as a fallback (see below)

    As an example, all of the packages provide a pair_style lj/cut variant, with style names lj/cut/opt, lj/cut/omp, @@ -79,10 +83,12 @@ input script command creates a new atom, If the variant version does not exist, the standard version is created.

    -

    When using the intel suffix, LAMMPS will first attempt to use a style -with the intel suffix. If the USER-OMP package is installed, the the -omp suffix will be tried as a second choice, if a requested style is -not available in the USER-INTEL package. +

    For "hybrid", two packages are specified. The first is used whenever +available. If a style with the first suffix is not available, the style +with the suffix for the second package will be used if available. For +example, "hybrid intel omp" will use styles from the USER-INTEL package +as a first choice and styles from the USER-OMP package as a second choice +if no USER-INTEL variant is available.

    If the specified style is off, then any previously specified suffix is temporarily disabled, whether it was specified by a command-line