git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@6222 f3b2605a-c512-4ea7-a41b-209d697bcdaa

sjplimp 2011-05-26 23:45:23 +00:00
parent ab5592e67b
commit 465cbbd0ed
10 changed files with 154 additions and 376 deletions

View File

@@ -104,9 +104,7 @@ it gives quick access to documentation for all LAMMPS commands.
<BR>
2.7 <A HREF = "Section_start.html#2_7">Screen output</A>
<BR>
-2.8 <A HREF = "Section_start.html#2_8">Running on GPUs</A>
-<BR>
-2.9 <A HREF = "Section_start.html#2_9">Tips for users of previous versions</A>
+2.8 <A HREF = "Section_start.html#2_8">Tips for users of previous versions</A>
<BR></UL>
<LI><A HREF = "Section_commands.html">Commands</A>
@@ -170,7 +168,7 @@ it gives quick access to documentation for all LAMMPS commands.
<LI><A HREF = "Section_tools.html">Additional tools</A>
-<LI><A HREF = "Section_modify.html">Modifying & Extending LAMMPS</A>
+<LI><A HREF = "Section_modify.html">Modifying & extending LAMMPS</A>
<LI><A HREF = "Section_python.html">Python interface</A>
@@ -188,19 +186,29 @@ it gives quick access to documentation for all LAMMPS commands.
<BR>
9.7 <A HREF = "Section_python.html#9_7">Example Python scripts that use LAMMPS</A>
<BR></UL>
+<LI><A HREF = "Section_accelerate.html">Using accelerated CPU and GPU styles</A>
+<UL> 10.1 <A HREF = "Section_errors.html#10_1">OPT package</A>
+<BR>
+10.2 <A HREF = "Section_errors.html#10_2">GPU package</A>
+<BR>
+10.3 <A HREF = "Section_errors.html#10_3">USER-CUDA package</A>
+<BR>
+10.4 <A HREF = "Section_errors.html#10_4">Comparison of GPU and USER-CUDA packages</A>
+<BR></UL>
<LI><A HREF = "Section_errors.html">Errors</A>
-<UL> 10.1 <A HREF = "Section_errors.html#10_1">Common problems</A>
+<UL> 11.1 <A HREF = "Section_errors.html#11_1">Common problems</A>
<BR>
-10.2 <A HREF = "Section_errors.html#10_2">Reporting bugs</A>
+11.2 <A HREF = "Section_errors.html#11_2">Reporting bugs</A>
<BR>
-10.3 <A HREF = "Section_errors.html#10_3">Error & warning messages</A>
+11.3 <A HREF = "Section_errors.html#11_3">Error & warning messages</A>
<BR></UL>
<LI><A HREF = "Section_history.html">Future and history</A>
-<UL> 11.1 <A HREF = "Section_history.html#11_1">Coming attractions</A>
+<UL> 12.1 <A HREF = "Section_history.html#12_1">Coming attractions</A>
<BR>
-11.2 <A HREF = "Section_history.html#11_2">Past versions</A>
+12.2 <A HREF = "Section_history.html#12_2">Past versions</A>
<BR></UL>
</OL>
@@ -301,6 +309,12 @@ it gives quick access to documentation for all LAMMPS commands.

View File

@@ -86,8 +86,7 @@ it gives quick access to documentation for all LAMMPS commands.
2.5 "Running LAMMPS"_2_5 :b
2.6 "Command-line options"_2_6 :b
2.7 "Screen output"_2_7 :b
2.8 "Running on GPUs"_2_8 :b
2.9 "Tips for users of previous versions"_2_9 :ule,b
2.8 "Tips for users of previous versions"_2_8 :ule,b
"Commands"_Section_commands.html :l
3.1 "LAMMPS input script"_3_1 :ulb,b
3.2 "Parsing rules"_3_2 :b
@@ -119,7 +118,7 @@ it gives quick access to documentation for all LAMMPS commands.
"Example problems"_Section_example.html :l
"Performance & scalability"_Section_perf.html :l
"Additional tools"_Section_tools.html :l
"Modifying & Extending LAMMPS"_Section_modify.html :l
"Modifying & extending LAMMPS"_Section_modify.html :l
"Python interface"_Section_python.html :l
9.1 "Extending Python with a serial version of LAMMPS"_9_1 :ulb,b
9.2 "Creating a shared MPI library"_9_2 :b
@@ -128,13 +127,18 @@ it gives quick access to documentation for all LAMMPS commands.
9.5 "Testing the Python-LAMMPS interface"_9_5 :b
9.6 "Using LAMMPS from Python"_9_6 :b
9.7 "Example Python scripts that use LAMMPS"_9_7 :ule,b
"Using accelelerated CPU and GPU styles"_Section_accelerate.html :l
10.1 "OPT package"_10_1 :ulb,b
10.2 "GPU package"_10_2 :b
10.3 "USER-CUDA package"_10_3 :b
10.4 "Comparison of GPU and USER-CUDA packages"_10_4 :ule,b
"Errors"_Section_errors.html :l
10.1 "Common problems"_10_1 :ulb,b
10.2 "Reporting bugs"_10_2 :b
10.3 "Error & warning messages"_10_3 :ule,b
11.1 "Common problems"_11_1 :ulb,b
11.2 "Reporting bugs"_11_2 :b
11.3 "Error & warning messages"_11_3 :ule,b
"Future and history"_Section_history.html :l
11.1 "Coming attractions"_11_1 :ulb,b
11.2 "Past versions"_11_2 :ule,b
12.1 "Coming attractions"_12_1 :ulb,b
12.2 "Past versions"_12_2 :ule,b
:ole
:link(1_1,Section_intro.html#1_1)
@@ -151,7 +155,6 @@ it gives quick access to documentation for all LAMMPS commands.
:link(2_6,Section_start.html#2_6)
:link(2_7,Section_start.html#2_7)
:link(2_8,Section_start.html#2_8)
-:link(2_9,Section_start.html#2_9)
:link(3_1,Section_commands.html#3_1)
:link(3_2,Section_commands.html#3_2)
@@ -192,8 +195,13 @@ it gives quick access to documentation for all LAMMPS commands.
:link(10_1,Section_errors.html#10_1)
:link(10_2,Section_errors.html#10_2)
:link(10_3,Section_errors.html#10_3)
+:link(10_4,Section_errors.html#10_4)
-:link(11_1,Section_history.html#11_1)
-:link(11_2,Section_history.html#11_2)
+:link(11_1,Section_errors.html#11_1)
+:link(11_2,Section_errors.html#11_2)
+:link(11_3,Section_errors.html#11_3)
+:link(12_1,Section_history.html#12_1)
+:link(12_2,Section_history.html#12_2)
</BODY>

View File

@@ -11,18 +11,18 @@ Section</A>
<HR>
-<H3>10. Errors
+<H3>11. Errors
</H3>
<P>This section describes the various kinds of errors you can encounter
when using LAMMPS.
</P>
-10.1 <A HREF = "#10_1">Common problems</A><BR>
-10.2 <A HREF = "#10_2">Reporting bugs</A><BR>
-10.3 <A HREF = "#10_3">Error & warning messages</A> <BR>
+11.1 <A HREF = "#11_1">Common problems</A><BR>
+11.2 <A HREF = "#11_2">Reporting bugs</A><BR>
+11.3 <A HREF = "#11_3">Error & warning messages</A> <BR>
<HR>
<A NAME = "10_1"></A><H4>10.1 Common problems
<A NAME = "11_1"></A><H4>11.1 Common problems
</H4>
<P>If two LAMMPS runs do not produce the same answer on different
machines or different numbers of processors, this is typically not a
@@ -81,7 +81,7 @@ decide if the WARNING is important or not. A WARNING message that is
generated in the middle of a run is only printed to the screen, not to
the logfile, to avoid cluttering up thermodynamic output. If LAMMPS
crashes or hangs without spitting out an error message first then it
-could be a bug (see <A HREF = "#10_2">this section</A>) or one of the following
+could be a bug (see <A HREF = "#11_2">this section</A>) or one of the following
cases:
</P>
<P>LAMMPS runs in the available memory a processor allows to be
@@ -112,7 +112,7 @@ buffering or boost the sizes of messages that can be buffered.
</P>
<HR>
<A NAME = "10_2"></A><H4>10.2 Reporting bugs
<A NAME = "11_2"></A><H4>11.2 Reporting bugs
</H4>
<P>If you are confident that you have found a bug in LAMMPS, follow these
steps.
@@ -142,7 +142,7 @@ causing the problem.
</P>
<HR>
<H4><A NAME = "10_3"></A>10.3 Error & warning messages
<H4><A NAME = "11_3"></A>11.3 Error & warning messages
</H4>
<P>These are two alphabetic lists of the <A HREF = "#error">ERROR</A> and
<A HREF = "#warn">WARNING</A> messages LAMMPS prints out and the reason why. If the

View File

@@ -8,18 +8,18 @@ Section"_Section_history.html :c
:line
-10. Errors :h3
+11. Errors :h3
This section describes the various kinds of errors you can encounter
when using LAMMPS.
10.1 "Common problems"_#10_1
10.2 "Reporting bugs"_#10_2
10.3 "Error & warning messages"_#10_3 :all(b)
11.1 "Common problems"_#11_1
11.2 "Reporting bugs"_#11_2
11.3 "Error & warning messages"_#11_3 :all(b)
:line
-10.1 Common problems :link(10_1),h4
+11.1 Common problems :link(11_1),h4
If two LAMMPS runs do not produce the same answer on different
machines or different numbers of processors, this is typically not a
@@ -78,7 +78,7 @@ decide if the WARNING is important or not. A WARNING message that is
generated in the middle of a run is only printed to the screen, not to
the logfile, to avoid cluttering up thermodynamic output. If LAMMPS
crashes or hangs without spitting out an error message first then it
-could be a bug (see "this section"_#10_2) or one of the following
+could be a bug (see "this section"_#11_2) or one of the following
cases:
LAMMPS runs in the available memory a processor allows to be
@@ -109,7 +109,7 @@ buffering or boost the sizes of messages that can be buffered.
:line
-10.2 Reporting bugs :link(10_2),h4
+11.2 Reporting bugs :link(11_2),h4
If you are confident that you have found a bug in LAMMPS, follow these
steps.
@@ -139,7 +139,7 @@ As a last resort, you can send an email directly to the
:line
-10.3 Error & warning messages :h4,link(10_3)
+11.3 Error & warning messages :h4,link(11_3)
These are two alphabetic lists of the "ERROR"_#error and
"WARNING"_#warn messages LAMMPS prints out and the reason why. If the

View File

@@ -9,18 +9,18 @@
<HR>
-<H3>11. Future and history
+<H3>12. Future and history
</H3>
<P>This section lists features we are planning to add to LAMMPS, features
of previous versions of LAMMPS, and features of other parallel
molecular dynamics codes I've distributed.
</P>
-11.1 <A HREF = "#11_1">Coming attractions</A><BR>
-11.2 <A HREF = "#11_2">Past versions</A> <BR>
+12.1 <A HREF = "#12_1">Coming attractions</A><BR>
+12.2 <A HREF = "#12_2">Past versions</A> <BR>
<HR>
<H4><A NAME = "11_1"></A>11.1 Coming attractions
<H4><A NAME = "12_1"></A>12.1 Coming attractions
</H4>
<P>The current version of LAMMPS incorporates nearly all the features
from previous parallel MD codes developed at Sandia. These include
@@ -49,7 +49,7 @@ page</A> on the LAMMPS WWW site for more details.
</UL>
<HR>
<H4><A NAME = "11_2"></A>11.2 Past versions
<H4><A NAME = "12_2"></A>12.2 Past versions
</H4>
<P>LAMMPS development began in the mid 1990s under a cooperative research
& development agreement (CRADA) between two DOE labs (Sandia and LLNL)

View File

@@ -6,18 +6,18 @@
:line
-11. Future and history :h3
+12. Future and history :h3
This section lists features we are planning to add to LAMMPS, features
of previous versions of LAMMPS, and features of other parallel
molecular dynamics codes I've distributed.
11.1 "Coming attractions"_#11_1
11.2 "Past versions"_#11_2 :all(b)
12.1 "Coming attractions"_#12_1
12.2 "Past versions"_#12_2 :all(b)
:line
-11.1 Coming attractions :h4,link(11_1)
+12.1 Coming attractions :h4,link(12_1)
The current version of LAMMPS incorporates nearly all the features
from previous parallel MD codes developed at Sandia. These include
@@ -46,7 +46,7 @@ Direct Simulation Monte Carlo - DSMC :ul
:line
-11.2 Past versions :h4,link(11_2)
+12.2 Past versions :h4,link(12_2)
LAMMPS development began in the mid 1990s under a cooperative research
& development agreement (CRADA) between two DOE labs (Sandia and LLNL)

View File

@@ -1,5 +1,5 @@
<HTML>
-<CENTER><A HREF = "Section_modify.html">Previous Section</A> - <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> - <A HREF = "Manual.html">LAMMPS Documentation</A> - <A HREF = "Section_commands.html#comm">LAMMPS Commands</A> - <A HREF = "Section_errors.html">Next Section</A>
+<CENTER><A HREF = "Section_modify.html">Previous Section</A> - <A HREF = "http://lammps.sandia.gov">LAMMPS WWW Site</A> - <A HREF = "Manual.html">LAMMPS Documentation</A> - <A HREF = "Section_commands.html#comm">LAMMPS Commands</A> - <A HREF = "Section_accelerate.html">Next Section</A>
</CENTER>

View File

@@ -1,4 +1,4 @@
"Previous Section"_Section_modify.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_errors.html :c
"Previous Section"_Section_modify.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_accelerate.html :c
:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)

View File

@@ -21,8 +21,7 @@ experienced users.
2.5 <A HREF = "#2_5">Running LAMMPS</A><BR>
2.6 <A HREF = "#2_6">Command-line options</A><BR>
2.7 <A HREF = "#2_7">Screen output</A><BR>
-2.8 <A HREF = "#2_8">Running on GPUs</A><BR>
-2.9 <A HREF = "#2_9">Tips for users of previous versions</A> <BR>
+2.8 <A HREF = "#2_8">Tips for users of previous versions</A> <BR>
<HR>
@@ -467,21 +466,19 @@ documentation.
<A NAME = "2_3_2"></A><B><I>Including/excluding packages:</I></B>
-<P>Any or all packages can be included or excluded independently BEFORE
+<P>To use or not use a package, you must include or exclude it before
LAMMPS is built.
</P>
-<P>The two exceptions to this are the "gpu" and "opt" packages. Some of
-the files in these packages require other packages to also be
-included. If this is not the case, then those subsidiary files in
-"gpu" and "opt" will not be installed either. To install all the
-files in package "gpu", the "asphere" and "kspace" packages must also be
-installed. To install all the files in package "opt", the "kspace" and
-"manybody" packages must also be installed.
+<P>Some packages have individual files that depend on other packages
+being included, but LAMMPS checks for this and does the right thing.
+I.e. individual files are only included if their dependencies are
+already included. Likewise, if a package is excluded, other files
+dependent on that package are also excluded.
</P>
-<P>You may wish to exclude certain packages if you will never run certain
-kinds of simulations. This will keep you from having to build
-auxiliary libraries (see below) and will produce a smaller executable
-which may run a bit faster.
+<P>The reason to exclude packages is that you will never run certain kinds
+of simulations. This will keep you from having to build auxiliary
+libraries (see below) and will produce a smaller executable which may
+run a bit faster.
</P>
<P>By default, LAMMPS includes only the "kspace", "manybody", and
"molecule" packages.
@@ -531,15 +528,24 @@ link with the library file.
</P>
<P>The "atc" library in lib/atc is used by the user-atc package. It
provides continuum field estimation and molecular dynamics-finite
-element coupling methods. It was written primarily by Reese Jones,
-Jeremy Templeton and Jonathan Zimmerman at Sandia.
+element coupling methods.
</P>
<P>The "gpu" library in lib/gpu is used by the gpu package. It
contains code to enable portions of LAMMPS to run on a GPU chip
associated with your CPU. Currently, only NVIDIA GPUs are supported.
<P>The "cuda" library in lib/cuda is used by the user-cuda package. It
contains code to enable portions of LAMMPS to run on NVIDIA GPUs
associated with your CPUs. Currently, only NVIDIA GPUs are supported.
Building this library requires NVIDIA Cuda tools to be installed on
your system. See the <A HREF = "#2_8">Running on GPUs</A> section below for more
info about installing and using Cuda.
your system. See <A HREF = "Section_accelerate.html#10_3">this section</A> of the
manual for more information about using this package effectively and
how it differs from the gpu package.
</P>
<P>The "gpu" library in lib/gpu is used by the gpu package. It contains
code to enable portions of LAMMPS to run on GPUs associated with your
CPUs. Currently, only NVIDIA GPUs are supported, but eventually this
may be extended to OpenCL. Building this library requires NVIDIA Cuda
tools to be installed on your system. See <A HREF = "Section_accelerate.html#10_2">this
section</A> of the manual for more
information about using this package effectively and how it differs
from the user-cuda package.
</P>
<P>The "meam" library in lib/meam is used by the meam package.
computes the modified embedded atom method potential, which is a
@@ -573,8 +579,8 @@ obtained by adding a machine-specific macro definition to the CCFLAGS
variable in your Makefile e.g. -D_IBM. See pair_reax_fortran.h for
more info.
</P>
-<P>As described in its README file, each library is built by typing
-something like
+<P>As described in the README file in each lib directory, each library is
+typically built by typing something like
</P>
<PRE>make -f Makefile.g++
</PRE>
@@ -586,6 +592,11 @@ need to edit or add one. For example, in the case of Fortran-based
libraries, your system must have a Fortran compiler, the settings for
which will be in the Makefile.
</P>
+<P>Note that the cuda library, used by the user-cuda package, is an
+exception.  See its README file and <A HREF = "Section_accelerate.html#10_3">this
+section</A> of the manual for instructions
+on how to build it.
+</P>
<HR>
<A NAME = "2_3_4"></A><B><I>Additional Makefile settings for extra libraries:</I></B>
@@ -604,10 +615,10 @@ make g++
from LAMMPS. As in this example, you must also include the package
that uses and wraps the library before you build LAMMPS itself.
</P>
-<P>As discussed in point (2.4) of <A HREF = "#2_2_4">this section</A> above, there are
+<P>As discussed in point (3.e) of <A HREF = "#2_2_4">this section</A> above, there are
settings in the low-level Makefile that specify additional system
-libraries needed by individual LAMMPS add-on libraries. These are the
-settings you must specify correctly in your low-level Makefile in
+libraries needed by some of the LAMMPS add-on libraries. These are
+the settings you must specify correctly in your low-level Makefile in
lammps/src/MAKE, such as Makefile.foo:
</P>
<P>To use the gpu package and library, the settings for gpu_SYSLIB and
@@ -620,7 +631,7 @@ reax_SYSPATH must be correct. This is so that the C++ compiler can
perform a cross-language link using the appropriate system Fortran
libraries.
</P>
-<P>To use the user-atc package and library, the settings for
+<P>To use the user-atc package and atc library, the settings for
user-atc_SYSLIB and user-atc_SYSPATH must be correct. This is so that
the appropriate BLAS and LAPACK libs, used by the user-atc library,
can be found.
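<P>As a sketch, such settings in a low-level Makefile might look as
follows (the paths and library names here are hypothetical and depend
on where CUDA, BLAS, and LAPACK are installed on your system):
</P>
<PRE>gpu_SYSLIB =       -lcudart
gpu_SYSPATH =      -L/usr/local/cuda/lib64
user-atc_SYSLIB =  -lblas -llapack
user-atc_SYSPATH =
</PRE>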
@@ -825,9 +836,9 @@ standard version is created.
</P>
<P>The default value of this switch is "none", unless LAMMPS was built
with the USER-CUDA package, in which case the default value is "cuda".
-See the <A HREF = "accelerator.html">acclerator</A> command doc page for info on how
-to turn off/on the suffix associated with this switch within your
-input script.
+See the <A HREF = "accelerator.html">accelerator</A> command for info on how to turn
+off/on the suffix associated with this switch within your input
+script.
</P>
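<P>For example, a launch that turns on the "cuda" suffix from the
command line might look like this (a sketch; lmp_foo and in.script are
placeholders for your executable and input file):
</P>
<PRE>lmp_foo -suffix cuda -in in.script
</PRE>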
<PRE>-echo style
</PRE>
@@ -1016,140 +1027,7 @@ communication, roughly 75% in the example above.
</P>
<HR>
<H4><A NAME = "2_8"></A>2.8 Running on GPUs
</H4>
<P>A few LAMMPS <A HREF = "pair_style.html">pair styles</A> can be run on graphical
processing units (GPUs). We plan to add more over time. Currently,
they only support NVIDIA GPU cards. To use them you need to install
certain NVIDIA CUDA software on your system:
</P>
<UL><LI>Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
<LI>Go to http://www.nvidia.com/object/cuda_get.html
<LI>Install a driver and toolkit appropriate for your system (SDK is not necessary)
<LI>Follow the instructions in README in lammps/lib/gpu to build the library
<LI>Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties
</UL>
<H4>GPU configuration
</H4>
<P>When using GPUs, you are restricted to one physical GPU per LAMMPS
process. Multiple processes can share a single GPU and in many cases
it will be more efficient to run with multiple processes per GPU. Any
GPU accelerated style requires that <A HREF = "fix_gpu.html">fix gpu</A> be used in
the input script to select and initialize the GPUs. The format for the
fix is:
</P>
<PRE>fix <I>name</I> all gpu <I>mode</I> <I>first</I> <I>last</I> <I>split</I>
</PRE>
<P>where <I>name</I> is the name for the fix. The gpu fix must be the first
fix specified for a given run, otherwise the program will exit with an
error. The gpu fix will not have any effect on runs that do not use
GPU acceleration; there should be no problem with specifying the fix
first in any input script.
</P>
<P><I>mode</I> can be either "force" or "force/neigh". In the former, neighbor
list calculation is performed on the CPU using the standard LAMMPS
routines. In the latter, the neighbor list calculation is performed on
the GPU. The GPU neighbor list can be used for better performance,
however, it cannot be used with a triclinic box or with
<A HREF = "pair_hybrid.html">hybrid</A> pair styles.
</P>
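<P>For example (illustrative settings: one GPU per node, all particles
on the GPU), the two modes would be selected as:
</P>
<PRE>fix 0 all gpu force/neigh 0 0 1.0
fix 0 all gpu force 0 0 1.0
</PRE>
<P>The second form keeps neighbor list builds on the CPU, which is
required for triclinic boxes or hybrid pair styles as noted above.
</P>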
<P>There are cases when it might be more efficient to select the CPU for
neighbor list builds. If a non-GPU enabled style requires a neighbor
list, it will also be built using CPU routines. Redundant CPU and GPU
neighbor list calculations will typically be less efficient.
</P>
<P><I>first</I> is the ID (as reported by lammps/lib/gpu/nvc_get_devices) of
the first GPU that will be used on each node. <I>last</I> is the ID of the
last GPU that will be used on each node. If you have only one GPU per
node, <I>first</I> and <I>last</I> will typically both be 0. Selecting a
non-sequential set of GPU IDs (e.g. 0,1,3) is not currently supported.
</P>
<P><I>split</I> is the fraction of particles whose forces, torques, energies,
and/or virials will be calculated on the GPU. This can be used to
perform CPU and GPU force calculations simultaneously. If <I>split</I> is
negative, the software will attempt to calculate the optimal fraction
automatically every 25 timesteps based on CPU and GPU timings. Because
the GPU speedups are dependent on the number of particles, automatic
calculation of the split can be less efficient, but typically results
in loop times within 20% of an optimal fixed split.
</P>
<P>If you have two GPUs per node, 8 CPU cores per node, and would like to
run on 4 nodes with dynamic balancing of force calculation across CPU
and GPU cores, the fix might be
</P>
<PRE>fix 0 all gpu force/neigh 0 1 -1
</PRE>
<P>with LAMMPS run on 32 processes. In this case, all CPU cores and GPU
devices on the nodes would be utilized. Each GPU device would be
shared by 4 CPU cores. The CPU cores would perform force calculations
for some fraction of the particles at the same time the GPUs performed
force calculation for the other particles.
</P>
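<P>The corresponding launch might be something like this (a sketch; the
mpirun syntax and the lmp_foo executable name are placeholders for
your MPI installation and build):
</P>
<PRE>mpirun -np 32 lmp_foo -in in.script
</PRE>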
<P>Because of the large number of cores on each GPU device, it might be
more efficient to run on fewer processes per GPU when the number of
particles per process is small (hundreds of particles); this can be
necessary to keep the GPU cores busy.
</P>
<H4>GPU input script
</H4>
<P>To use GPU acceleration in LAMMPS, <A HREF = "fix_gpu.html">fix_gpu</A>
should be used to initialize and configure the GPUs for
use. Additionally, GPU enabled styles must be selected in the input
script. Currently, this is limited to a few <A HREF = "pair_style.html">pair
styles</A> and PPPM. Some GPU-enabled styles have
additional restrictions listed in their documentation.
</P>
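<P>A minimal sketch of such a script fragment (the lj/cut/gpu style and
the numeric values are placeholders for whatever GPU-enabled style and
settings your simulation uses):
</P>
<PRE>fix 0 all gpu force/neigh 0 0 1.0
pair_style lj/cut/gpu 2.5
pair_coeff * * 1.0 1.0
</PRE>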
<H4>GPU asynchronous pair computation
</H4>
<P>The GPU accelerated pair styles can be used to perform pair style
force calculation on the GPU while other calculations are performed on
the CPU. One method to do this is to specify a <I>split</I> in the gpu fix
as described above. In this case, force calculation for the pair
style will also be performed on the CPU.
</P>
<P>When the CPU work in a GPU pair style has finished, the next force
computation will begin, possibly before the GPU has finished. If
<I>split</I> is 1.0 in the gpu fix, the next force computation will begin
almost immediately. This can be used to run a
<A HREF = "pair_hybrid.html">hybrid</A> GPU pair style at the same time as a hybrid
CPU pair style. In this case, the GPU pair style should be first in
the hybrid command in order to perform simultaneous calculations. This
also allows <A HREF = "bond_style.html">bond</A>, <A HREF = "angle_style.html">angle</A>,
<A HREF = "dihedral_style.html">dihedral</A>, <A HREF = "improper_style.html">improper</A>, and
<A HREF = "kspace_style.html">long-range</A> force computations to be run
simultaneously with the GPU pair style. Once all CPU force
computations have completed, the gpu fix will block until the GPU has
finished all work before continuing the run.
</P>
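<P>A hypothetical hybrid setup along these lines, with the GPU
sub-style listed first as described above (style names, cutoffs, and
coefficients are placeholders):
</P>
<PRE>pair_style hybrid lj/cut/gpu 2.5 lj/cut 2.5
pair_coeff 1 1 lj/cut/gpu 1.0 1.0
pair_coeff 1 2 lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.0 1.0
fix 0 all gpu force/neigh 0 0 1.0
</PRE>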
<H4>GPU timing
</H4>
<P>GPU accelerated pair styles can perform computations asynchronously
with CPU computations. The "Pair" time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with <A HREF = "bond_style.html">bond</A>,
<A HREF = "angle_style.html">angle</A>, <A HREF = "dihedral_style.html">dihedral</A>,
<A HREF = "improper_style.html">improper</A>, and <A HREF = "kspace_style.html">long-range</A>
calculations will not be included in the "Pair" time.
</P>
<P>When <I>mode</I> for the gpu fix is force/neigh, the time for neighbor list
calculations on the GPU will be added into the "Pair" time, not the
"Neigh" time. A breakdown of the times required for various tasks on
the GPU (data copy, neighbor calculations, force computations, etc.)
is output only with the LAMMPS screen output at the end of each
run. These timings represent total time spent on the GPU for each
routine, regardless of asynchronous CPU calculations.
</P>
<H4>GPU single vs double precision
</H4>
<P>See the lammps/lib/gpu/README file for instructions on how to build
the LAMMPS gpu library for single, mixed, and double precision. The
latter requires that your GPU card supports double precision.
</P>
<HR>
<H4><A NAME = "2_9"></A>2.9 Tips for users of previous LAMMPS versions
<H4><A NAME = "2_8"></A>2.8 Tips for users of previous LAMMPS versions
</H4>
<P>The current C++ version began with a complete rewrite of LAMMPS 2001, which
was written in F90. Features of earlier versions of LAMMPS are listed

View File

@@ -18,8 +18,7 @@ experienced users.
2.5 "Running LAMMPS"_#2_5
2.6 "Command-line options"_#2_6
2.7 "Screen output"_#2_7
2.8 "Running on GPUs"_#2_8
2.9 "Tips for users of previous versions"_#2_9 :all(b)
2.8 "Tips for users of previous versions"_#2_8 :all(b)
:line
@@ -460,21 +459,19 @@ documentation.
[{Including/excluding packages:}] :link(2_3_2)
-Any or all packages can be included or excluded independently BEFORE
+To use or not use a package, you must include or exclude it before
LAMMPS is built.
-The two exceptions to this are the "gpu" and "opt" packages. Some of
-the files in these packages require other packages to also be
-included. If this is not the case, then those subsidiary files in
-"gpu" and "opt" will not be installed either. To install all the
-files in package "gpu", the "asphere" and "kspace" packages must also be
-installed. To install all the files in package "opt", the "kspace" and
-"manybody" packages must also be installed.
+Some packages have individual files that depend on other packages
+being included, but LAMMPS checks for this and does the right thing.
+I.e. individual files are only included if their dependencies are
+already included. Likewise, if a package is excluded, other files
+dependent on that package are also excluded.
-You may wish to exclude certain packages if you will never run certain
-kinds of simulations. This will keep you from having to build
-auxiliary libraries (see below) and will produce a smaller executable
-which may run a bit faster.
+The reason to exclude packages is that you will never run certain kinds
+of simulations. This will keep you from having to build auxiliary
+libraries (see below) and will produce a smaller executable which may
+run a bit faster.
By default, LAMMPS includes only the "kspace", "manybody", and
"molecule" packages.
@@ -524,15 +521,24 @@ Here is a bit of information about each library:
The "atc" library in lib/atc is used by the user-atc package. It
provides continuum field estimation and molecular dynamics-finite
-element coupling methods. It was written primarily by Reese Jones,
-Jeremy Templeton and Jonathan Zimmerman at Sandia.
+element coupling methods.
The "gpu" library in lib/gpu is used by the gpu package. It
contains code to enable portions of LAMMPS to run on a GPU chip
associated with your CPU. Currently, only NVIDIA GPUs are supported.
The "cuda" library in lib/cuda is used by the user-cuda package. It
contains code to enable portions of LAMMPS to run on NVIDIA GPUs
associated with your CPUs. Currently, only NVIDIA GPUs are supported.
Building this library requires NVIDIA Cuda tools to be installed on
your system. See the "Running on GPUs"_#2_8 section below for more
info about installing and using Cuda.
your system. See "this section"_Section_accelerate.html#10_3 of the
manual for more information about using this package effectively and
how it differs from the gpu package.
The "gpu" library in lib/gpu is used by the gpu package. It contains
code to enable portions of LAMMPS to run on GPUs associated with your
CPUs. Currently, only NVIDIA GPUs are supported, but eventually this
may be extended to OpenCL. Building this library requires NVIDIA Cuda
tools to be installed on your system. See "this
section"_Section_accelerate.html#10_2 of the manual for more
information about using this package effectively and how it differs
from the user-cuda package.
The "meam" library in lib/meam is used by the meam package.
computes the modified embedded atom method potential, which is a
@@ -566,8 +572,8 @@ obtained by adding a machine-specific macro definition to the CCFLAGS
variable in your Makefile e.g. -D_IBM. See pair_reax_fortran.h for
more info.
-As described in its README file, each library is built by typing
-something like
+As described in the README file in each lib directory, each library is
+typically built by typing something like
make -f Makefile.g++ :pre
@@ -579,6 +585,11 @@ need to edit or add one. For example, in the case of Fortran-based
libraries, your system must have a Fortran compiler, the settings for
which will be in the Makefile.
+Note that the cuda library, used by the user-cuda package, is an
+exception. See its README file and "this
+section"_Section_accelerate.html#10_3 of the manual for instructions
+on how to build it.
:line
[{Additional Makefile settings for extra libraries:}] :link(2_3_4)
@@ -597,10 +608,10 @@ Also note that simply building the library is not sufficient to use it
from LAMMPS. As in this example, you must also include the package
that uses and wraps the library before you build LAMMPS itself.
-As discussed in point (2.4) of "this section"_#2_2_4 above, there are
+As discussed in point (3.e) of "this section"_#2_2_4 above, there are
settings in the low-level Makefile that specify additional system
-libraries needed by individual LAMMPS add-on libraries. These are the
-settings you must specify correctly in your low-level Makefile in
+libraries needed by some of the LAMMPS add-on libraries. These are
+the settings you must specify correctly in your low-level Makefile in
lammps/src/MAKE, such as Makefile.foo:
To use the gpu package and library, the settings for gpu_SYSLIB and
@@ -613,7 +624,7 @@ reax_SYSPATH must be correct. This is so that the C++ compiler can
perform a cross-language link using the appropriate system Fortran
libraries.
-To use the user-atc package and library, the settings for
+To use the user-atc package and atc library, the settings for
user-atc_SYSLIB and user-atc_SYSPATH must be correct. This is so that
the appropriate BLAS and LAPACK libs, used by the user-atc library,
can be found.
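As a sketch, such settings in a low-level Makefile might look as
follows (the paths and library names here are hypothetical and depend
on where CUDA, BLAS, and LAPACK are installed on your system):
gpu_SYSLIB = -lcudart
gpu_SYSPATH = -L/usr/local/cuda/lib64
user-atc_SYSLIB = -lblas -llapack
user-atc_SYSPATH = :pre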
@@ -815,9 +826,9 @@ standard version is created.
The default value of this switch is "none", unless LAMMPS was built
with the USER-CUDA package, in which case the default value is "cuda".
See the "acclerator"_accelerator.html command doc page for info on how
to turn off/on the suffix associated with this switch within your
input script.
See the "acclerator"_accelerator.html command for info on how to turn
off/on the suffix associated with this switch within your input
script.
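For example, a launch that turns on the "cuda" suffix from the
command line might look like this (a sketch; lmp_foo and in.script are
placeholders for your executable and input file):
lmp_foo -suffix cuda -in in.script :pre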
-echo style :pre
@@ -1006,140 +1017,7 @@ communication, roughly 75% in the example above.
:line
2.8 Running on GPUs :h4,link(2_8)
A few LAMMPS "pair styles"_pair_style.html can be run on graphical
processing units (GPUs). We plan to add more over time. Currently,
they only support NVIDIA GPU cards. To use them you need to install
certain NVIDIA CUDA software on your system:
Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
Go to http://www.nvidia.com/object/cuda_get.html
Install a driver and toolkit appropriate for your system (SDK is not necessary)
Follow the instructions in README in lammps/lib/gpu to build the library
Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties :ul
GPU configuration :h4
When using GPUs, you are restricted to one physical GPU per LAMMPS
process. Multiple processes can share a single GPU and in many cases
it will be more efficient to run with multiple processes per GPU. Any
GPU accelerated style requires that "fix gpu"_fix_gpu.html be used in
the input script to select and initialize the GPUs. The format for the
fix is:
fix {name} all gpu {mode} {first} {last} {split} :pre
where {name} is the name for the fix. The gpu fix must be the first
fix specified for a given run, otherwise the program will exit with an
error. The gpu fix will not have any effect on runs that do not use
GPU acceleration; there should be no problem with specifying the fix
first in any input script.
{mode} can be either "force" or "force/neigh". In the former, neighbor
list calculation is performed on the CPU using the standard LAMMPS
routines. In the latter, the neighbor list calculation is performed on
the GPU. The GPU neighbor list can be used for better performance,
however, it cannot be used with a triclinic box or with
"hybrid"_pair_hybrid.html pair styles.
There are cases when it might be more efficient to select the CPU for
neighbor list builds. If a non-GPU enabled style requires a neighbor
list, it will also be built using CPU routines. Redundant CPU and GPU
neighbor list calculations will typically be less efficient.
{first} is the ID (as reported by lammps/lib/gpu/nvc_get_devices) of
the first GPU that will be used on each node. {last} is the ID of the
last GPU that will be used on each node. If you have only one GPU per
node, {first} and {last} will typically both be 0. Selecting a
non-sequential set of GPU IDs (e.g. 0,1,3) is not currently supported.
{split} is the fraction of particles whose forces, torques, energies,
and/or virials will be calculated on the GPU. This can be used to
perform CPU and GPU force calculations simultaneously. If {split} is
negative, the software will attempt to calculate the optimal fraction
automatically every 25 timesteps based on CPU and GPU timings. Because
the GPU speedups are dependent on the number of particles, automatic
calculation of the split can be less efficient, but typically results
in loop times within 20% of an optimal fixed split.
If you have two GPUs per node, 8 CPU cores per node, and would like to
run on 4 nodes with dynamic balancing of force calculation across CPU
and GPU cores, the fix might be
fix 0 all gpu force/neigh 0 1 -1 :pre
with LAMMPS run on 32 processes. In this case, all CPU cores and GPU
devices on the nodes would be utilized. Each GPU device would be
shared by 4 CPU cores. The CPU cores would perform force calculations
for some fraction of the particles at the same time the GPUs performed
force calculation for the other particles.
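The corresponding launch might be something like this (a sketch; the
mpirun syntax and the lmp_foo executable name are placeholders for
your MPI installation and build):
mpirun -np 32 lmp_foo -in in.script :pre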
Because of the large number of cores on each GPU device, it might be
more efficient to run on fewer processes per GPU when the number of
particles per process is small (hundreds of particles); this can be
necessary to keep the GPU cores busy.
GPU input script :h4
To use GPU acceleration in LAMMPS, "fix_gpu"_fix_gpu.html
should be used to initialize and configure the GPUs for
use. Additionally, GPU enabled styles must be selected in the input
script. Currently, this is limited to a few "pair
styles"_pair_style.html and PPPM. Some GPU-enabled styles have
additional restrictions listed in their documentation.
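A minimal sketch of such a script fragment (the lj/cut/gpu style and
the numeric values are placeholders for whatever GPU-enabled style and
settings your simulation uses):
fix 0 all gpu force/neigh 0 0 1.0
pair_style lj/cut/gpu 2.5
pair_coeff * * 1.0 1.0 :pre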
GPU asynchronous pair computation :h4
The GPU accelerated pair styles can be used to perform pair style
force calculation on the GPU while other calculations are performed on
the CPU. One method to do this is to specify a {split} in the gpu fix
as described above. In this case, force calculation for the pair
style will also be performed on the CPU.
When the CPU work in a GPU pair style has finished, the next force
computation will begin, possibly before the GPU has finished. If
{split} is 1.0 in the gpu fix, the next force computation will begin
almost immediately. This can be used to run a
"hybrid"_pair_hybrid.html GPU pair style at the same time as a hybrid
CPU pair style. In this case, the GPU pair style should be first in
the hybrid command in order to perform simultaneous calculations. This
also allows "bond"_bond_style.html, "angle"_angle_style.html,
"dihedral"_dihedral_style.html, "improper"_improper_style.html, and
"long-range"_kspace_style.html force computations to be run
simultaneously with the GPU pair style. Once all CPU force
computations have completed, the gpu fix will block until the GPU has
finished all work before continuing the run.
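A hypothetical hybrid setup along these lines, with the GPU sub-style
listed first as described above (style names, cutoffs, and
coefficients are placeholders):
pair_style hybrid lj/cut/gpu 2.5 lj/cut 2.5
pair_coeff 1 1 lj/cut/gpu 1.0 1.0
pair_coeff 1 2 lj/cut 1.0 1.0
pair_coeff 2 2 lj/cut 1.0 1.0
fix 0 all gpu force/neigh 0 0 1.0 :pre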
GPU timing :h4
GPU accelerated pair styles can perform computations asynchronously
with CPU computations. The "Pair" time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with "bond"_bond_style.html,
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html
calculations will not be included in the "Pair" time.
When {mode} for the gpu fix is force/neigh, the time for neighbor list
calculations on the GPU will be added into the "Pair" time, not the
"Neigh" time. A breakdown of the times required for various tasks on
the GPU (data copy, neighbor calculations, force computations, etc.)
are output only with the LAMMPS screen output at the end of each
run. These timings represent total time spent on the GPU for each
routine, regardless of asynchronous CPU calculations.
GPU single vs double precision :h4
See the lammps/lib/gpu/README file for instructions on how to build
the LAMMPS gpu library for single, mixed, and double precision. The
latter requires that your GPU card supports double precision.
:line
-2.9 Tips for users of previous LAMMPS versions :h4,link(2_9)
+2.8 Tips for users of previous LAMMPS versions :h4,link(2_8)
The current C++ version began with a complete rewrite of LAMMPS 2001, which
was written in F90. Features of earlier versions of LAMMPS are listed