mirror of https://github.com/lammps/lammps.git
git-svn-id: svn://svn.icms.temple.edu/lammps-ro/trunk@14267 f3b2605a-c512-4ea7-a41b-209d697bcdaa
parent 77f8955d4e
commit 0bdd0e36cc
@@ -38,7 +38,7 @@
 <I>gpu</I> args = Ngpu keyword value ...
 Ngpu = # of GPUs per node
 zero or more keyword/value pairs may be appended
-keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>split</I> or <I>gpuID</I> or <I>tpa</I> or <I>device</I>
+keywords = <I>neigh</I> or <I>newton</I> or <I>binsize</I> or <I>split</I> or <I>gpuID</I> or <I>tpa</I> or <I>device</I> or <I>blocksize</I>
 <I>neigh</I> value = <I>yes</I> or <I>no</I>
 yes = neighbor list build on GPU (default)
 no = neighbor list build on CPU
@@ -320,13 +320,6 @@ large cutoffs or with a small number of particles per GPU, increasing
 the value can improve performance. The number of threads per atom must
 be a power of 2 and currently cannot be greater than 32.
 </P>
-<P>The <I>blocksize</I> keyword allows you to tweak the number of threads used
-per thread block. This number should be a multiple of 32 (for GPUs)
-and its maximum depends on the specific GPU hardware. Typical choices
-are 64, 128, or 256. A larger blocksize increases occupancy of
-individual GPU cores, but reduces the total number of thread blocks,
-thus may lead to load imbalance.
-</P>
 <P>The <I>device</I> keyword can be used to tune parameters optimized for a
 specific accelerator, when using OpenCL. For CUDA, the <I>device</I>
 keyword is ignored. Currently, the device type is limited to NVIDIA
@@ -335,6 +328,13 @@ may be added later. The default device type can be specified when
 building LAMMPS with the GPU library, via settings in the
 lib/gpu/Makefile that is used.
 </P>
+<P>The <I>blocksize</I> keyword allows you to tweak the number of threads used
+per thread block. This number should be a multiple of 32 (for GPUs)
+and its maximum depends on the specific GPU hardware. Typical choices
+are 64, 128, or 256. A larger blocksize increases occupancy of
+individual GPU cores, but reduces the total number of thread blocks,
+thus may lead to load imbalance.
+</P>
 <HR>
 
 <P>The <I>intel</I> style invokes settings associated with the use of the
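The documented constraints on <I>tpa</I> and <I>blocksize</I> can be sketched as a small pre-flight check, for instance before writing a line such as `package gpu 1 tpa 4 blocksize 128` into an input script (that line is only an illustration following the "Ngpu keyword value ..." syntax shown in the diff). The function name `check_gpu_package_args` is hypothetical and is not part of LAMMPS; the rules it encodes are only the ones stated in the text above, and the hardware-dependent maximum for blocksize is deliberately left unchecked.

```python
def check_gpu_package_args(tpa, blocksize):
    """Return a list of problems; an empty list means both values
    satisfy the rules quoted in the documentation text.

    Hypothetical helper for illustration, not a LAMMPS API.
    """
    problems = []

    # tpa: must be a power of 2 and currently cannot be greater than 32.
    if tpa < 1 or (tpa & (tpa - 1)) != 0:
        problems.append("tpa must be a power of 2, got %d" % tpa)
    elif tpa > 32:
        problems.append("tpa cannot be greater than 32, got %d" % tpa)

    # blocksize: should be a multiple of 32 (for GPUs).
    # The maximum depends on the GPU hardware and is not checked here.
    if blocksize < 32 or blocksize % 32 != 0:
        problems.append("blocksize should be a multiple of 32, got %d" % blocksize)

    return problems


if __name__ == "__main__":
    # Typical choices named in the text are 64, 128, or 256.
    for tpa, blocksize in [(4, 128), (3, 128), (64, 100)]:
        print(tpa, blocksize, check_gpu_package_args(tpa, blocksize))
```

Returning a list of messages rather than raising on the first failure keeps both checks visible at once, which is convenient when several keyword values are tuned together.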