git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@12466 f3b2605a-c512-4ea7-a41b-209d697bcdaa

2014-09-10 16:25:52 +00:00 · 2014-09-10 16:25:52 +00:00 · 9d11e531e7
parent 1025e266b1
commit 9d11e531e7
2 changed files with 221 additions and 152 deletions
--- a/doc/package.html
+++ b/doc/package.html
@ -68,11 +68,21 @@
     <I>tptask</I> value = Ntptask
       Ntptask = max number of threads to use on coprocessor for each MPI task
  <I>kokkos</I> args = keyword value ...
-    one or more keyword/value pairs may be appended
+    zero or more keyword/value pairs may be appended
-    keywords = <I>neigh</I> or <I>comm/exchange</I> or <I>comm/forward</I>
+    keywords = <I>neigh</I> or <I>comm</I> or <I>comm/exchange</I> or <I>comm/forward</I>
      <I>neigh</I> value = <I>full</I> or <I>half/thread</I> or <I>half</I> or <I>n2</I> or <I>full/cluster</I>
        full = full neighbor list
        half/thread = half neighbor list built in thread-safe manner
        half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
        n2 = non-binning neighbor list build, O(N^2) algorithm
        full/cluster = full neighbor list with clustered groups of atoms
      <I>comm</I> value = <I>no</I> or <I>host</I> or <I>device</I>
        use value for both comm/exchange and comm/forward
      <I>comm/exchange</I> value = <I>no</I> or <I>host</I> or <I>device</I>
      <I>comm/forward</I> value = <I>no</I> or <I>host</I> or <I>device</I>
        no = perform communication pack/unpack in non-KOKKOS mode
        host = perform pack/unpack on host (e.g. with OpenMP threading)
        device = perform pack/unpack on device (e.g. on GPU)
  <I>omp</I> args = Nthreads keyword value ...
    Nthreads = # of OpenMP threads to associate with each MPI process
    zero or more keyword/value pairs may be appended 
@ -88,47 +98,59 @@
 <PRE>package gpu 1
 package gpu 1 split 0.75
 package gpu 2 split -1.0
-package cuda gpu/node/special 2 0 2
+package cuda 2 gpuID 0 2
-package cuda test 3948
+package cuda 1 test 3948
-package kokkos neigh half/thread comm/forward device
+package kokkos neigh half/thread comm device
-package omp 0 neigh yes
+package omp 0 neigh no
 package omp 4
 package intel * mixed balance -1 
 </PRE>
 <P><B>Description:</B>
 </P>
-<P>This command invokes package-specific settings.  Currently the
+<P>This command invokes package-specific settings for the various
-following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
+accelerator packages available in LAMMPS.  Currently the following
-USER-OMP.
+packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
 KOKKOS, and USER-OMP.
 </P>
-<P>If allows calling multiple times, all options set to their
+<P>If this command is specified in an input script, it must be near the
-defaults, whether specified or not.
+top of the script, before the simulation box has been defined.  This
 is because it specifies settings that the accelerator packages use in
 their initialization, before a simulation is defined.
 </P>
-<P>Talk about command line switch -pk as alternate option.
+<P>This command can also be specified from the command-line when
 launching LAMMPS, using the "-pk" <A HREF = "Section_start.html#start_7">command-line
 switch</A>.  The syntax is exactly the same as
 when used in an input script.
 </P>
-<P>Which packages require it to be invoked, only CUDA
+<P>Note that all of the accelerator packages require the package command
-  this is b/c can only be invoked once
+to be specified (except the OPT package), if the package is to be used
-vs optional: all others?  and allow multiple invokes
+in a simulation (LAMMPS can be built with an accelerator package
 without using it in a particular simulation).  However, in all cases,
 a default version of the command is typically invoked by other
 accelerator settings.
 </P>
-<P>Must be invoked early in script, before simulation box is defined.
+<P>The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
 <A HREF = "Section_start.html#start_7">command-line switch</A> respectively, which
 invokes a "package cuda" or "package kokkos" command with default
 settings.
 </P>
-<P>To use the accelerated GPU and USER-OMP styles, the use of the package
+<P>For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
-command is required.  However, as described in the "Defaults" section
+intel" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line switch</A>
-below, if you use the "-sf gpu" or "-sf omp" <A HREF = "Section_start.html#start_7">command-line
+is used to auto-append accelerator suffixes to various styles in the
-options</A> to enable use of these styles,
+input script, then those switches also invoke a "package gpu",
-then default package settings are enabled.  In that case you only need
+"package intel", or "package omp" command with default settings.
 to use the package command if you want to change the defaults.
 </P>
-<P>To use the accelerated USER-CUDA and KOKKOS styles, the package
+<P>IMPORTANT NOTE: A package command for a particular style can be
-command is not required as defaults are assigned internally.  You only
+invoked multiple times when a simulation is set up, e.g. by the "-c
-need to use the package command if you want to change the defaults.
+on", "-k on", "-sf", and "-pk" <A HREF = "Section_start.html#start_7">command-line
 switches</A>, and by using this command in an
 input script.  Each time it is used all of the style options are set,
 either to default values or to specified settings.  I.e. settings from
 previous invocations do not persist across multiple invocations.
 </P>
-<P>See <A HREF = "Section_accelerate.html">Section_accelerate</A> of the manual for
+<P>See the <A HREF = "Section_accelerate.html">Section Accelerate</A> section of the
-more details about using these various packages for accelerating
+manual for more details about using the various accelerator packages
-LAMMPS calculations.
+for speeding up LAMMPS simulations.
 </P>
 <P>Package GPU always sets newton pair off.  Not so for USER-CUDA
 add newton options to GPU, CUDA, KOKKOS.
 </P>
 <HR>
@ -335,32 +357,44 @@ generation Xeon Phi chip.
 <P>The <I>kokkos</I> style invokes settings associated with the use of the
 KOKKOS package.
 </P>
-<P>The <I>neigh</I> keyword determines what kinds of neighbor lists are built.
+<P>All of the settings are optional keyword/value pairs.  Each has a
-A value of <I>half</I> uses half-neighbor lists, the same as used by most
+default value as listed below.
 pair styles in LAMMPS.  A value of <I>half/thread</I> uses a threadsafe
 variant of the half-neighbor list.  It should be used instead of
 <I>half</I> when running with threads on a CPU.  A value of <I>full</I> uses a
 full-neighborlist, i.e. f_ij and f_ji are both calculated.  This
 performs twice as much computation as the <I>half</I> option, however that
 can be a win because it is threadsafe and doesn't require atomic
 operations.  A value of <I>full/cluster</I> is an experimental neighbor
 style, where particles interact with all particles within a small
 cluster, if at least one of the clusters particles is within the
 neighbor cutoff range.  This potentially allows for better
 vectorization on architectures such as the Intel Phi.  If also reduces
 the size of the neighbor list by roughly a factor of the cluster size,
 thus reducing the total memory footprint considerably.
 </P>
-<P>The <I>comm/exchange</I> and <I>comm/forward</I> keywords determine whether the
+<P>The <I>neigh</I> keyword determines how neighbor lists are built.  A value
-host or device performs the packing and unpacking of data when
+of <I>half</I> uses half-neighbor lists, the same as used by most pair
-communicating information between processors.  "Exchange"
+styles in LAMMPS.  A value of <I>half/thread</I> uses a thread-safe variant
 of the half-neighbor list.  It should be used instead of <I>half</I> when
 running with more than 1 thread per MPI task on a CPU.  A value of
 <I>n2</I> uses an O(N^2) algorithm to build the neighbor list without
 binning, where N = # of atoms on a processor.  It is typically slower
 than the other methods, which use binning.
 </P>
 <P>A value of <I>full</I> uses a full neighbor list and is the default.  This
 performs twice as much computation as the <I>half</I> option, however that
 is often a win because it is thread-safe and doesn't require atomic
 operations in the calculation of pair forces.
 </P>
 <P>A value of <I>full/cluster</I> is an experimental neighbor style, where
 particles interact with all particles within a small cluster, if at
 least one of the cluster's particles is within the neighbor cutoff
 range.  This potentially allows for better vectorization on
 architectures such as the Intel Phi.  It also reduces the size of the
 neighbor list by roughly a factor of the cluster size, thus reducing
 the total memory footprint considerably.
 </P>
 <P>The <I>comm</I> and <I>comm/exchange</I> and <I>comm/forward</I> keywords determine
 whether the host or device performs the packing and unpacking of data
 when communicating per-atom data between processors.  "Exchange"
 communication happens only on timesteps that neighbor lists are
 rebuilt.  The data is only for atoms that migrate to new processors.
 "Forward" communication happens every timestep.  The data is for atom
 coordinates and any other atom properties that need to be updated for
 ghost atoms owned by each processor.
 </P>
-<P>The value options for these keywords are <I>no</I> or <I>host</I> or <I>device</I>.
+<P>The <I>comm</I> keyword is simply a short-cut to set the same value
 for both the <I>comm/exchange</I> and <I>comm/forward</I> keywords.
 </P>
 <P>The value options for all 3 keywords are <I>no</I> or <I>host</I> or <I>device</I>.
 A value of <I>no</I> means to use the standard non-KOKKOS method of
 packing/unpacking data for the communication.  A value of <I>host</I> means
 to use the host, typically a multi-core CPU, and perform the
@ -369,10 +403,12 @@ to use the device, typically a GPU, to perform the packing/unpacking
 operation.
 </P>
 <P>The optimal choice for these keywords depends on the input script and
-the hardware used.  The <I>no</I> value is useful for verifying that Kokkos
+the hardware used.  The <I>no</I> value is useful for verifying that the
-code is working correctly.  It may also be the fastest choice when
+Kokkos-based <I>host</I> and <I>device</I> values are working correctly.  It may
-using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
+also be the fastest choice when using Kokkos styles in MPI-only mode
-When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
+(i.e. with a thread count of 1).
 </P>
 <P>When running on CPUs or Xeon Phi, the <I>host</I> and <I>device</I> values work
 identically.  When using GPUs, the <I>device</I> value will typically be
 optimal if all of your styles used in your input script are supported
 by the KOKKOS package.  In this case data can stay on the GPU for many
@ -476,11 +512,13 @@ setting</A>
 </P>
 <P><B>Default:</B>
 </P>
-<P>To use the USER-CUDA package, the package cuda command must be invoked
+<P>For the USER-CUDA package, the default is Ngpu = 1 and the option
-explicitly in your input script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
+defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
-switch</A>.  This will set the # of GPUs/node.
+enabled, and thread = auto.  These settings are made automatically by
-The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
+the required "-c on" <A HREF = "Section_start.html#start_7">command-line switch</A>.
-test = not enabled, and thread = auto.
+You can change them by using the package cuda command in your input
 script or via the "-pk cuda" <A HREF = "Section_start.html#start_7">command-line
 switch</A>.
 </P>
 <P>For the GPU package, the default is Ngpu = 1 and the option defaults
 are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -491,24 +529,21 @@ must invoke the package gpu command in your input script or via the
 "-pk gpu" <A HREF = "Section_start.html#start_7">command-line switch</A>.
 </P>
 <P>For the USER-INTEL package, the default is Nphi = 1 and the option
-defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240.  The
+defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240.  Note
-default ghost option is determined by the pair style being used.  This
+that all of these settings, except "prec", are ignored if LAMMPS was
-value used is output to the screen in the offload report at the end of
+not built with Xeon Phi coprocessor support.  The default ghost option
-each run.  These settings are made automatically if the "-sf intel"
+is determined by the pair style being used.  This value is output to
-<A HREF = "Section_start.html#start_7">command-line switch</A> is used.  If it is
+the screen in the offload report at the end of each run.  These
-not used, you must invoke the package intel command in your input
+settings are made automatically if the "-sf intel" <A HREF = "Section_start.html#start_7">command-line
-script or or via the "-pk intel" <A HREF = "Section_start.html#start_7">command-line
+switch</A> is used.  If it is not used, you
-switch</A>.
+must invoke the package intel command in your input script or via
 the "-pk intel" <A HREF = "Section_start.html#start_7">command-line switch</A>.
 </P>
-<P>The default settings for the KOKKOS package are "package kokkos neigh
+<P>For the KOKKOS package, the option defaults are neigh = full and comm =
-full comm/exchange host comm/forward host".  This is the case whether
+host.  These settings are made automatically by the required "-k on"
-the "-sf kk" <A HREF = "Section_start.html#start_7">command-line switch</A> is used
+<A HREF = "Section_start.html#start_7">command-line switch</A>.  You can change them
-or not.
+by using the package kokkos command in your input script or via the
-To use the KOKKOS package, the package kokkos command must be invoked
+"-pk kokkos" <A HREF = "Section_start.html#start_7">command-line switch</A>.
 explicitly in your input script or via the "-pk kokkos" <A HREF = "Section_start.html#start_7">command-line
 switch</A>.  This will set the # of GPUs/node.
 The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
 test = not enabled, and thread = auto.
 </P>
 <P>For the OMP package, the default is Nthreads = 0 and the option
 defaults are neigh = yes.  These settings are made automatically if
--- a/doc/package.txt
+++ b/doc/package.txt
@ -63,11 +63,21 @@ args = arguments specific to the style :l
     {tptask} value = Ntptask
       Ntptask = max number of threads to use on coprocessor for each MPI task
  {kokkos} args = keyword value ...
-    one or more keyword/value pairs may be appended
+    zero or more keyword/value pairs may be appended
-    keywords = {neigh} or {comm/exchange} or {comm/forward}
+    keywords = {neigh} or {comm} or {comm/exchange} or {comm/forward}
      {neigh} value = {full} or {half/thread} or {half} or {n2} or {full/cluster}
        full = full neighbor list
        half/thread = half neighbor list built in thread-safe manner
        half = half neighbor list, not thread-safe, only use when 1 thread/MPI task
        n2 = non-binning neighbor list build, O(N^2) algorithm
        full/cluster = full neighbor list with clustered groups of atoms
      {comm} value = {no} or {host} or {device}
        use value for both comm/exchange and comm/forward
      {comm/exchange} value = {no} or {host} or {device}
      {comm/forward} value = {no} or {host} or {device}
        no = perform communication pack/unpack in non-KOKKOS mode
        host = perform pack/unpack on host (e.g. with OpenMP threading)
        device = perform pack/unpack on device (e.g. on GPU)
  {omp} args = Nthreads keyword value ...
    Nthreads = # of OpenMP threads to associate with each MPI process
    zero or more keyword/value pairs may be appended 
@ -82,47 +92,59 @@ args = arguments specific to the style :l
 package gpu 1
 package gpu 1 split 0.75
 package gpu 2 split -1.0
-package cuda gpu/node/special 2 0 2
+package cuda 2 gpuID 0 2
-package cuda test 3948
+package cuda 1 test 3948
-package kokkos neigh half/thread comm/forward device
+package kokkos neigh half/thread comm device
-package omp 0 neigh yes
+package omp 0 neigh no
 package omp 4
 package intel * mixed balance -1 :pre
 [Description:]
-This command invokes package-specific settings.  Currently the
+This command invokes package-specific settings for the various
-following packages use it: USER-CUDA, GPU, USER-INTEL, KOKKOS, and
+accelerator packages available in LAMMPS.  Currently the following
-USER-OMP.
+packages use settings from this command: USER-CUDA, GPU, USER-INTEL,
 KOKKOS, and USER-OMP.
-If allows calling multiple times, all options set to their
+If this command is specified in an input script, it must be near the
-defaults, whether specified or not.
+top of the script, before the simulation box has been defined.  This
 is because it specifies settings that the accelerator packages use in
 their initialization, before a simulation is defined.
-Talk about command line switch -pk as alternate option.
+This command can also be specified from the command-line when
 launching LAMMPS, using the "-pk" "command-line
 switch"_Section_start.html#start_7.  The syntax is exactly the same as
 when used in an input script.
-Which packages require it to be invoked, only CUDA
+Note that all of the accelerator packages require the package command
-  this is b/c can only be invoked once
+to be specified (except the OPT package), if the package is to be used
-vs optional: all others?  and allow multiple invokes
+in a simulation (LAMMPS can be built with an accelerator package
 without using it in a particular simulation).  However, in all cases,
 a default version of the command is typically invoked by other
 accelerator settings.
-Must be invoked early in script, before simulation box is defined.
+The USER-CUDA and KOKKOS packages require a "-c on" or "-k on"
 "command-line switch"_Section_start.html#start_7 respectively, which
 invokes a "package cuda" or "package kokkos" command with default
 settings.
-To use the accelerated GPU and USER-OMP styles, the use of the package
+For the GPU, USER-INTEL, and USER-OMP packages, if a "-sf gpu" or "-sf
-command is required.  However, as described in the "Defaults" section
+intel" or "-sf omp" "command-line switch"_Section_start.html#start_7
-below, if you use the "-sf gpu" or "-sf omp" "command-line
+is used to auto-append accelerator suffixes to various styles in the
-options"_Section_start.html#start_7 to enable use of these styles,
+input script, then those switches also invoke a "package gpu",
-then default package settings are enabled.  In that case you only need
+"package intel", or "package omp" command with default settings.
 to use the package command if you want to change the defaults.
-To use the accelerated USER-CUDA and KOKKOS styles, the package
+IMPORTANT NOTE: A package command for a particular style can be
-command is not required as defaults are assigned internally.  You only
+invoked multiple times when a simulation is set up, e.g. by the "-c
-need to use the package command if you want to change the defaults.
+on", "-k on", "-sf", and "-pk" "command-line
 switches"_Section_start.html#start_7, and by using this command in an
 input script.  Each time it is used all of the style options are set,
 either to default values or to specified settings.  I.e. settings from
 previous invocations do not persist across multiple invocations.
-See "Section_accelerate"_Section_accelerate.html of the manual for
+See the "Section Accelerate"_Section_accelerate.html section of the
-more details about using these various packages for accelerating
+manual for more details about using the various accelerator packages
-LAMMPS calculations.
+for speeding up LAMMPS simulations.
 Package GPU always sets newton pair off.  Not so for USER-CUDA
 add newton options to GPU, CUDA, KOKKOS.
 :line
@ -329,32 +351,44 @@ generation Xeon Phi chip.
 The {kokkos} style invokes settings associated with the use of the
 KOKKOS package.
-The {neigh} keyword determines what kinds of neighbor lists are built.
+All of the settings are optional keyword/value pairs.  Each has a
-A value of {half} uses half-neighbor lists, the same as used by most
+default value as listed below.
 pair styles in LAMMPS.  A value of {half/thread} uses a threadsafe
 variant of the half-neighbor list.  It should be used instead of
 {half} when running with threads on a CPU.  A value of {full} uses a
 full-neighborlist, i.e. f_ij and f_ji are both calculated.  This
 performs twice as much computation as the {half} option, however that
 can be a win because it is threadsafe and doesn't require atomic
 operations.  A value of {full/cluster} is an experimental neighbor
 style, where particles interact with all particles within a small
 cluster, if at least one of the clusters particles is within the
 neighbor cutoff range.  This potentially allows for better
 vectorization on architectures such as the Intel Phi.  If also reduces
 the size of the neighbor list by roughly a factor of the cluster size,
 thus reducing the total memory footprint considerably.
-The {comm/exchange} and {comm/forward} keywords determine whether the
+The {neigh} keyword determines how neighbor lists are built.  A value
-host or device performs the packing and unpacking of data when
+of {half} uses half-neighbor lists, the same as used by most pair
-communicating information between processors.  "Exchange"
+styles in LAMMPS.  A value of {half/thread} uses a thread-safe variant
 of the half-neighbor list.  It should be used instead of {half} when
 running with more than 1 thread per MPI task on a CPU.  A value of
 {n2} uses an O(N^2) algorithm to build the neighbor list without
 binning, where N = # of atoms on a processor.  It is typically slower
 than the other methods, which use binning.
 A value of {full} uses a full neighbor list and is the default.  This
 performs twice as much computation as the {half} option, however that
 is often a win because it is thread-safe and doesn't require atomic
 operations in the calculation of pair forces.
 A value of {full/cluster} is an experimental neighbor style, where
 particles interact with all particles within a small cluster, if at
 least one of the cluster's particles is within the neighbor cutoff
 range.  This potentially allows for better vectorization on
 architectures such as the Intel Phi.  It also reduces the size of the
 neighbor list by roughly a factor of the cluster size, thus reducing
 the total memory footprint considerably.
 The {comm} and {comm/exchange} and {comm/forward} keywords determine
 whether the host or device performs the packing and unpacking of data
 when communicating per-atom data between processors.  "Exchange"
 communication happens only on timesteps that neighbor lists are
 rebuilt.  The data is only for atoms that migrate to new processors.
 "Forward" communication happens every timestep.  The data is for atom
 coordinates and any other atom properties that need to be updated for
 ghost atoms owned by each processor.
-The value options for these keywords are {no} or {host} or {device}.
+The {comm} keyword is simply a short-cut to set the same value
 for both the {comm/exchange} and {comm/forward} keywords.
 The value options for all 3 keywords are {no} or {host} or {device}.
 A value of {no} means to use the standard non-KOKKOS method of
 packing/unpacking data for the communication.  A value of {host} means
 to use the host, typically a multi-core CPU, and perform the
@ -363,9 +397,11 @@ to use the device, typically a GPU, to perform the packing/unpacking
 operation.
 The optimal choice for these keywords depends on the input script and
-the hardware used.  The {no} value is useful for verifying that Kokkos
+the hardware used.  The {no} value is useful for verifying that the
-code is working correctly.  It may also be the fastest choice when
+Kokkos-based {host} and {device} values are working correctly.  It may
-using Kokkos styles in MPI-only mode (i.e. with a thread count of 1).
+also be the fastest choice when using Kokkos styles in MPI-only mode
 (i.e. with a thread count of 1).
 When running on CPUs or Xeon Phi, the {host} and {device} values work
 identically.  When using GPUs, the {device} value will typically be
 optimal if all of your styles used in your input script are supported
@ -470,11 +506,13 @@ setting"_Section_start.html#start_7
 [Default:]
-To use the USER-CUDA package, the package cuda command must be invoked
+For the USER-CUDA package, the default is Ngpu = 1 and the option
-explicitly in your input script or via the "-pk cuda" "command-line
+defaults are gpuID = 0 to Ngpu-1, timing = not enabled, test = not
-switch"_Section_start.html#start_7.  This will set the # of GPUs/node.
+enabled, and thread = auto.  These settings are made automatically by
-The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
+the required "-c on" "command-line switch"_Section_start.html#start_7.
-test = not enabled, and thread = auto.
+You can change them by using the package cuda command in your input
 script or via the "-pk cuda" "command-line
 switch"_Section_start.html#start_7.
 For the GPU package, the default is Ngpu = 1 and the option defaults
 are neigh = yes, split = 1.0, gpuID = 0 to Ngpu-1, tpa = 1, binsize =
@ -485,24 +523,21 @@ must invoke the package gpu command in your input script or via the
 "-pk gpu" "command-line switch"_Section_start.html#start_7.
 For the USER-INTEL package, the default is Nphi = 1 and the option
-defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240.  The
+defaults are prec = mixed, balance = -1, tpc = 4, tptask = 240.  Note
-default ghost option is determined by the pair style being used.  This
+that all of these settings, except "prec", are ignored if LAMMPS was
-value used is output to the screen in the offload report at the end of
+not built with Xeon Phi coprocessor support.  The default ghost option
-each run.  These settings are made automatically if the "-sf intel"
+is determined by the pair style being used.  This value is output to
-"command-line switch"_Section_start.html#start_7 is used.  If it is
+the screen in the offload report at the end of each run.  These
-not used, you must invoke the package intel command in your input
+settings are made automatically if the "-sf intel" "command-line
-script or or via the "-pk intel" "command-line
+switch"_Section_start.html#start_7 is used.  If it is not used, you
-switch"_Section_start.html#start_7.
+must invoke the package intel command in your input script or via
 the "-pk intel" "command-line switch"_Section_start.html#start_7.
-The default settings for the KOKKOS package are "package kokkos neigh
+For the KOKKOS package, the option defaults are neigh = full and comm =
-full comm/exchange host comm/forward host".  This is the case whether
+host.  These settings are made automatically by the required "-k on"
-the "-sf kk" "command-line switch"_Section_start.html#start_7 is used
+"command-line switch"_Section_start.html#start_7.  You can change them
-or not.
+by using the package kokkos command in your input script or via the
-To use the KOKKOS package, the package kokkos command must be invoked
+"-pk kokkos" "command-line switch"_Section_start.html#start_7.
 explicitly in your input script or via the "-pk kokkos" "command-line
 switch"_Section_start.html#start_7.  This will set the # of GPUs/node.
 The options defaults are gpuID = 0 to Ngpu-1, timing = not enabled,
 test = not enabled, and thread = auto.
 For the OMP package, the default is Nthreads = 0 and the option
 defaults are neigh = yes.  These settings are made automatically if
@ -510,4 +545,3 @@ the "-sf omp" "command-line switch"_Section_start.html#start_7 is
 used.  If it is not used, you must invoke the package omp command in
 your input script or via the "-pk omp" "command-line
 switch"_Section_start.html#start_7.
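
Reviewer note: the two behaviors this change documents for the kokkos style (the comm keyword as a short-cut for comm/exchange and comm/forward, and the fact that settings from a previous invocation do not persist) can be illustrated with a small model. The sketch below is not LAMMPS code. The function name resolve_kokkos and the dictionary layout are hypothetical, and the rule that a later keyword overrides an earlier one within a single invocation is an assumption; only the keyword names, allowed values, and defaults are taken from the documentation text in this diff.

```python
# Hypothetical model of the documented "package kokkos" semantics.
# This is an illustration only, not how LAMMPS implements the command.

# Defaults as stated in the Default section: neigh = full, comm = host.
KOKKOS_DEFAULTS = {
    "neigh": "full",
    "comm/exchange": "host",
    "comm/forward": "host",
}
NEIGH_VALUES = {"full", "half/thread", "half", "n2", "full/cluster"}
COMM_VALUES = {"no", "host", "device"}


def resolve_kokkos(args):
    """Return the settings produced by ONE 'package kokkos' invocation."""
    if len(args) % 2 != 0:
        raise ValueError("keywords and values must come in pairs")
    # Every invocation starts from the defaults, so nothing set by an
    # earlier invocation carries over.
    settings = dict(KOKKOS_DEFAULTS)
    for keyword, value in zip(args[0::2], args[1::2]):
        if keyword == "neigh":
            if value not in NEIGH_VALUES:
                raise ValueError(f"bad neigh value: {value}")
            settings["neigh"] = value
        elif keyword in ("comm", "comm/exchange", "comm/forward"):
            if value not in COMM_VALUES:
                raise ValueError(f"bad {keyword} value: {value}")
            if keyword == "comm":
                # Short-cut: apply the same value to both comm keywords.
                settings["comm/exchange"] = value
                settings["comm/forward"] = value
            else:
                settings[keyword] = value
        else:
            raise ValueError(f"unknown keyword: {keyword}")
    return settings


# First invocation, e.g. from a "-pk kokkos" command-line switch.
first = resolve_kokkos("neigh half/thread comm device".split())
# Second invocation, e.g. from the input script; neigh falls back to full.
second = resolve_kokkos("comm/forward no".split())
print(first)
print(second)
```

In this model the second invocation ends with neigh = full and comm/exchange = host, even though the first invocation changed both, which mirrors the IMPORTANT NOTE added in this commit.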