"Previous Section"_Section_python.html - "LAMMPS WWW Site"_lws - "LAMMPS Documentation"_ld - "LAMMPS Commands"_lc - "Next Section"_Section_errors.html :c

:link(lws,http://lammps.sandia.gov)
:link(ld,Manual.html)
:link(lc,Section_commands.html#comm)

:line

10. Using accelerated CPU and GPU styles :h3

NOTE: These doc pages are still incomplete as of 1Jun11.

NOTE: The USER-CUDA package discussed below has not yet been
officially released in LAMMPS.

Accelerated versions of various "pair_style"_pair_style.html,
"fixes"_fix.html, "computes"_compute.html, and other commands have
been added to LAMMPS, which will typically run faster than the
standard non-accelerated versions, if you have the appropriate
hardware on your system.

The accelerated styles have the same name as the standard styles,
except that a suffix is appended. Otherwise, the syntax for the
command is identical, their functionality is the same, and the
numerical results they produce should also be identical, except for
precision and round-off issues.

For example, all of these variants of the basic Lennard-Jones pair
style exist in LAMMPS:

"pair_style lj/cut"_pair_lj.html
"pair_style lj/cut/opt"_pair_lj.html
"pair_style lj/cut/gpu"_pair_lj.html
"pair_style lj/cut/cuda"_pair_lj.html :ul

Assuming you have built LAMMPS with the appropriate package, these
styles can be invoked by specifying them explicitly in your input
script. Or you can use the "-suffix command-line
switch"_Section_start.html#2_6 to invoke the accelerated versions
automatically, without changing your input script. The
"suffix"_suffix.html command also allows you to set a suffix and to
turn the command-line switch setting off/on within your input script.
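
For example, either of the following would select the gpu variants of
styles (a sketch, assuming an executable named lmp_machine and an
input script named in.script; gpu styles also have additional
requirements described in the GPU package section below):

lmp_machine -sf gpu < in.script :pre

or, equivalently, inside the input script itself:

suffix gpu
pair_style lj/cut 2.5 :pre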

Styles with an "opt" suffix are part of the OPT package and typically
speed up the pairwise calculations of your simulation by 5-25%.

Styles with a "gpu" or "cuda" suffix are part of the GPU or USER-CUDA
packages, and can be run on NVIDIA GPUs associated with your CPUs.
The speed-up due to GPU usage depends on a variety of factors, as
discussed below.

To see what styles are currently available in each of the accelerated
packages, see "this section"_Section_commands.html#3_5 of the manual.
A list of accelerated styles is included in the pair, fix, compute,
and kspace sections.

The following sections explain:

what hardware and software the accelerated styles require
how to install the accelerated packages
what kind of problems they run best on
guidelines for how to use them to best advantage
the kinds of speed-ups you can expect :ul

The final section compares and contrasts the GPU and USER-CUDA
packages, since they are both designed to use NVIDIA GPU hardware.

10.1 "OPT package"_#10_1
10.2 "GPU package"_#10_2
10.3 "USER-CUDA package"_#10_3
10.4 "Comparison of GPU and USER-CUDA packages"_#10_4 :all(b)

:line

10.1 OPT package :h4,link(10_1)

The OPT package was developed by James Fischer (High Performance
Technologies), David Richie and Vincent Natoli (Stone Ridge
Technologies). It contains a handful of pair styles whose compute()
methods were rewritten in C++ templated form to reduce the overhead
due to if tests and other conditional code.

The procedure for building LAMMPS with the OPT package is simple. It
is the same as for any other package which has no additional library
dependencies:

make yes-opt
make machine :pre

If your input script uses one of the OPT pair styles, you can run it
as follows:

lmp_machine -sf opt < in.script
mpirun -np 4 lmp_machine -sf opt < in.script :pre

You should see a reduction in the "Pair time" printed out at the end
of the run. On most machines and problems, this will typically be a 5
to 20% savings.

:line

10.2 GPU package :h4,link(10_2)

The GPU package was developed by Mike Brown at ORNL.

Additional requirements in your input script to run the styles with a
{gpu} suffix are as follows:

The "newton pair"_newton.html setting must be {off} and the "fix
gpu"_fix_gpu.html command must be used. The fix controls the GPU
selection and initialization steps.
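
A minimal sketch of these requirements in an input script (assuming a
single GPU with ID 0 and the lj/cut/gpu pair style as an example)
might look like:

newton off
fix 0 all gpu force/neigh 0 0 -1
pair_style lj/cut/gpu 2.5 :pre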

A few LAMMPS "pair styles"_pair_style.html can be run on graphical
processing units (GPUs). We plan to add more over time. Currently,
they only support NVIDIA GPU cards. To use them you need to install
certain NVIDIA CUDA software on your system:

Check if you have an NVIDIA card: cat /proc/driver/nvidia/cards/0
Go to http://www.nvidia.com/object/cuda_get.html
Install a driver and toolkit appropriate for your system (the SDK is not necessary)
Follow the instructions in the README in lammps/lib/gpu to build the library
Run lammps/lib/gpu/nvc_get_devices to list supported devices and properties :ul
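
For example, the card check and the device-listing steps above can be
run from a shell as follows (assuming LAMMPS is unpacked in a
directory named lammps):

cat /proc/driver/nvidia/cards/0
lammps/lib/gpu/nvc_get_devices :pre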

GPU configuration :h4

When using GPUs, you are restricted to one physical GPU per LAMMPS
process. Multiple processes can share a single GPU and in many cases
it will be more efficient to run with multiple processes per GPU. Any
GPU accelerated style requires that "fix gpu"_fix_gpu.html be used in
the input script to select and initialize the GPUs. The format for the
fix is:

fix {name} all gpu {mode} {first} {last} {split} :pre

where {name} is the name for the fix. The gpu fix must be the first
fix specified for a given run, otherwise the program will exit with an
error. The gpu fix will not have any effect on runs that do not use
GPU acceleration; there should be no problem with specifying the fix
first in any input script.

{mode} can be either "force" or "force/neigh". In the former, the
neighbor list calculation is performed on the CPU using the standard
LAMMPS routines. In the latter, the neighbor list calculation is
performed on the GPU. The GPU neighbor list can be used for better
performance, however, it cannot be used with a triclinic box or with
"hybrid"_pair_hybrid.html pair styles.

There are cases when it might be more efficient to select the CPU for
neighbor list builds. If a non-GPU enabled style requires a neighbor
list, it will also be built using CPU routines. Redundant CPU and GPU
neighbor list calculations will typically be less efficient.

{first} is the ID (as reported by lammps/lib/gpu/nvc_get_devices) of
the first GPU that will be used on each node. {last} is the ID of the
last GPU that will be used on each node. If you have only one GPU per
node, {first} and {last} will typically both be 0. Selecting a
non-sequential set of GPU IDs (e.g. 0,1,3) is not currently supported.

{split} is the fraction of particles whose forces, torques, energies,
and/or virials will be calculated on the GPU. This can be used to
perform CPU and GPU force calculations simultaneously. If {split} is
negative, the software will attempt to calculate the optimal fraction
automatically every 25 timesteps based on CPU and GPU timings. Because
the GPU speedups are dependent on the number of particles, automatic
calculation of the split can be less efficient, but typically results
in loop times within 20% of an optimal fixed split.
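
For example, a fixed split in which the GPU handles 70% of the
particles (an illustrative value, assuming a single GPU with ID 0)
could be specified as:

fix 0 all gpu force/neigh 0 0 0.7 :pre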

If you have two GPUs per node, 8 CPU cores per node, and would like to
run on 4 nodes with dynamic balancing of force calculation across CPU
and GPU cores, the fix might be

fix 0 all gpu force/neigh 0 1 -1 :pre

with LAMMPS run on 32 processes. In this case, all CPU cores and GPU
devices on the nodes would be utilized. Each GPU device would be
shared by 4 CPU cores. The CPU cores would perform force calculations
for some fraction of the particles at the same time the GPUs performed
force calculation for the other particles.
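
The corresponding launch command for this 4-node, 32-process scenario
might look like the following (a sketch; the exact mpirun options and
executable name depend on your MPI installation and build, and the
gpu styles are assumed to be selected via the -sf switch):

mpirun -np 32 lmp_machine -sf gpu < in.script :pre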

Because of the large number of cores on each GPU device, it might be
more efficient to run on fewer processes per GPU when the number of
particles per process is small (100's of particles); this can be
necessary to keep the GPU cores busy.

GPU input script :h4

In order to use GPU acceleration in LAMMPS, "fix_gpu"_fix_gpu.html
should be used to initialize and configure the GPUs for use.
Additionally, GPU enabled styles must be selected in the input
script. Currently, this is limited to a few "pair
styles"_pair_style.html and PPPM. Some GPU-enabled styles have
additional restrictions listed in their documentation.

GPU asynchronous pair computation :h4

The GPU accelerated pair styles can be used to perform pair style
force calculation on the GPU while other calculations are performed on
the CPU. One method to do this is to specify a {split} in the gpu fix
as described above. In this case, force calculation for the pair
style will also be performed on the CPU.

When the CPU work in a GPU pair style has finished, the next force
computation will begin, possibly before the GPU has finished. If
{split} is 1.0 in the gpu fix, the next force computation will begin
almost immediately. This can be used to run a
"hybrid"_pair_hybrid.html GPU pair style at the same time as a hybrid
CPU pair style. In this case, the GPU pair style should be first in
the hybrid command in order to perform simultaneous calculations. This
also allows "bond"_bond_style.html, "angle"_angle_style.html,
"dihedral"_dihedral_style.html, "improper"_improper_style.html, and
"long-range"_kspace_style.html force computations to be run
simultaneously with the GPU pair style. Once all CPU force
computations have completed, the gpu fix will block until the GPU has
finished all work before continuing the run.
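
As an illustration (a sketch only; the sub-styles and cutoffs are
placeholder values, and the {force} mode is used because the GPU
neighbor list cannot be combined with hybrid pair styles), such a
setup with the GPU-accelerated sub-style listed first might look
like:

fix 0 all gpu force 0 0 1.0
pair_style hybrid lj/cut/gpu 2.5 lj/cut 2.5 :pre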

GPU timing :h4

GPU accelerated pair styles can perform computations asynchronously
with CPU computations. The "Pair" time reported by LAMMPS will be the
maximum of the time required to complete the CPU pair style
computations and the time required to complete the GPU pair style
computations. Any time spent for GPU-enabled pair styles for
computations that run simultaneously with "bond"_bond_style.html,
"angle"_angle_style.html, "dihedral"_dihedral_style.html,
"improper"_improper_style.html, and "long-range"_kspace_style.html
calculations will not be included in the "Pair" time.

When {mode} for the gpu fix is force/neigh, the time for neighbor list
calculations on the GPU will be added into the "Pair" time, not the
"Neigh" time. A breakdown of the times required for various tasks on
the GPU (data copy, neighbor calculations, force computations, etc.)
is output only with the LAMMPS screen output at the end of each
run. These timings represent total time spent on the GPU for each
routine, regardless of asynchronous CPU calculations.

GPU single vs double precision :h4

See the lammps/lib/gpu/README file for instructions on how to build
the LAMMPS gpu library for single, mixed, and double precision. The
latter requires that your GPU card supports double precision.

:line

10.3 USER-CUDA package :h4,link(10_3)

The USER-CUDA package was developed by Christian Trott at Ilmenau
University of Technology in Germany.

This package is only useful if you have a CUDA(tm)-enabled NVIDIA(tm)
graphics card. Your GPU needs to support Compute Capability 1.3 or
higher. This list may help you find out the Compute Capability of
your card:

http://en.wikipedia.org/wiki/Comparison_of_Nvidia_graphics_processing_units

Install the Nvidia Cuda Toolkit in version 3.2 or higher and the
corresponding GPU drivers. The Nvidia Cuda SDK is not required for
the USER-CUDA package, but we recommend installing it and making sure
that the sample projects can be compiled without problems.

You should also be able to compile LAMMPS by typing

{make YourMachine}

inside the src directory of the LAMMPS root path. If not, you should
consult the LAMMPS documentation.

Compilation :h4

If your {CUDA} toolkit is not installed in the default directory
{/usr/local/cuda}, edit the file {lib/cuda/Makefile.common}
accordingly.

Go to {lib/cuda/} and type

{make OPTIONS}

where {OPTIONS} are one or more of the following:

{precision = 2} set precision level: 1 .. single precision, 2
.. double precision, 3 .. positions in double precision, 4
.. positions and velocities in double precision :ulb,l

{arch = 20} set GPU compute capability: 20 .. CC2.0 (GF100/110,
e.g. C2050, GTX580, GTX470), 21 .. CC2.1 (GF104/114, e.g. GTX560,
GTX460, GTX450), 13 .. CC1.3 (GF200, e.g. C1060, GTX285) :l

{prec_timer = 1} precision timers are not used if set to 0. This is
usually only useful when compiling on Mac machines. :l

{dbg = 0} activates debug mode when set to 1. Only useful for
developers. :l

{cufft = 1} set the CUDA FFT library. Currently this can only be used
to compile without cufft support (set to 0). In the future other
CUDA-enabled FFT libraries might be supported. :l,ule

The settings will be written to {lib/cuda/Makefile.defaults}. When
compiling with plain {make}, only those settings will be used.

Go to {src}, install the USER-CUDA package with {make yes-USER-CUDA},
and compile the binary with {make YourMachine}. You might need to
delete old object files if you compiled without the USER-CUDA package
before, using the same machine file ({rm Obj_YourMachine/*}).
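
Put together, a typical build sequence might look like this (a
sketch, assuming a double-precision build for a CC2.0 card and a
machine file named YourMachine; passing the options as make variables
is one way to set them):

cd lib/cuda
make precision=2 arch=20
cd ../../src
make yes-USER-CUDA
make YourMachine :pre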

CUDA versions of classes are only installed if the corresponding CPU
versions are installed as well. For example, you need to install the
KSPACE package to use {pppm/cuda}.

Usage :h4

In order to make use of the GPU acceleration provided by the USER-CUDA
package, you only have to add

{accelerator cuda}

at the top of your input script. See the
"accelerator"_accelerator.html command for details of additional
options.

When compiling with USER-CUDA support, the "-accelerator command-line
switch"_Section_start.html#2_6 is effectively set to "cuda" by default
and does not have to be given.

If you want to run simulations without using the "cuda" styles with
the same binary, you need to turn it off explicitly by giving "-a
none", "-a opt" or "-a gpu" as a command-line argument.
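
For example (a sketch, assuming an executable named lmp_machine and
an input script named in.script), a run that ignores the "cuda"
styles would be launched as:

lmp_machine -a none < in.script :pre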

The kspace style {pppm/cuda} has to be requested explicitly.

:line

10.4 Comparison of GPU and USER-CUDA packages :h4,link(10_4)

The USER-CUDA package is an alternative package for GPU acceleration
that runs as much of the simulation as possible on the GPU. Depending on
the simulation, this can provide a significant speedup when the number
of atoms per GPU is large.

The styles available for GPU acceleration differ between the two
packages.

The main difference between the GPU and the USER-CUDA package is
that, while the latter aims at calculating everything on the device,
the GPU package uses it only as an accelerator for the pair force,
neighbor list, and PPPM calculations. As a consequence, either package
can be faster depending on the scenario. Generally the GPU package is
faster than the USER-CUDA package if the number of atoms per device
is small. The GPU package also profits from oversubscribing devices;
hence one usually wants to launch two (or more) MPI processes per
device.

The exact crossover where the USER-CUDA package becomes faster depends
strongly on the pair style. For example, for a simple Lennard-Jones
system the crossover (in single precision) can often be found between
50,000 and 100,000 atoms per device. When performing double precision
calculations, this threshold can be significantly smaller. As a result,
the GPU package can show better "strong scaling" behaviour in
comparison with the USER-CUDA package as long as this limit of atoms
per GPU is not reached.

Another scenario where the GPU package can be faster is when a lot of
bonded interactions are calculated. Both packages handle those on the
host while the device simultaneously calculates the pair forces.
Since one launches several MPI processes per device when using the
GPU package, this work is spread over more CPU cores than when
running the same simulation with the USER-CUDA package.

As a side note, GPU package performance depends to some extent on the
available bandwidth between host and device. Hence its performance is
affected if fewer than the full 16 PCIe lanes are available for each
device. In HPC environments this can be the case when S2050/70 servers
are used, where two devices generally share one PCIe 2.0 16x slot.
Many multi-GPU mainboards also do not provide the full 16 lanes to
each of their PCIe 2.0 16x slots.

While the GPU package uses considerably more device memory than the
USER-CUDA package, this is generally not much of a problem; typically
run times become larger than desired before the memory is exhausted.

Currently the USER-CUDA package supports a wider range of force
fields. On the other hand, its performance is considerably reduced if
a fix which is not yet available in a CUDA-accelerated version has to
be invoked every timestep.

In the end, it is best to simply try both packages for each simulation
and see which one performs better in the particular situation.

Benchmark :h4

In the following, 4 benchmark systems which are supported by both the
GPU and the USER-CUDA package are shown:

1. Lennard-Jones, 2.5 A
256,000 atoms
2.5 A cutoff
0.844 density

2. Lennard-Jones, 5.0 A
256,000 atoms
5.0 A cutoff
0.844 density

3. Rhodopsin model
256,000 atoms
10 A cutoff
Coulomb via PPPM

4. Lithium-Phosphate
295,650 atoms
15 A cutoff
Coulomb via PPPM

Hardware:

Workstation:
2x GTX470
i7 950 @ 3GHz
24GB DDR3 @ 1066MHz
CentOS 5.5
CUDA 3.2
Driver 260.19.12

eStella:
6 nodes
2x C2050
2x QDR InfiniBand interconnect (aggregate bandwidth 80GBps)
Intel X5650 HexCore @ 2.67GHz
SL 5.5
CUDA 3.2
Driver 260.19.26

Keeneland:
HP SL-390 (Ariston) cluster
120 nodes
2x Intel Westmere hex-core CPUs
3x C2070s
QDR InfiniBand interconnect