Commit Graph

55 Commits

Author SHA1 Message Date
Jason Henline 9fc16d4e11 [SE] Fix config bug with CUDA tests
Summary:
It turns out CMake errors out if a processed directory contains source
files that are not used. This was causing an error with the CUDATest.cpp
file when configuring StreamExecutor with the CUDA platform disabled.

Moving CUDATest.cpp to its own directory fixes this problem.

Reviewers: jlebar, jprice

Subscribers: beanz, mgorny, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24618

llvm-svn: 281654
2016-09-15 20:26:28 +00:00
Jason Henline 70720a7e1b [SE] Support CUDA dynamic shared memory
Summary:
Add proper handling for shared memory arguments in the CUDA platform. Also add
in unit tests for CUDA.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24596

llvm-svn: 281635
2016-09-15 18:11:04 +00:00
Jason Henline b2d62bd071 [SE] Let users specify CUDA path
Summary: Add logic to allow users to specify the CUDA path at configuration time.

Reviewers: jlebar

Subscribers: beanz, mgorny, jlebar, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24580

llvm-svn: 281626
2016-09-15 16:48:55 +00:00
Jason Henline 6bfc863d74 [SE] Add CUDA platform
Summary:
Basic CUDA platform implementation and cmake infrastructure to control
whether it's used. A few important TODOs will be handled in later
patches:

* Log some error messages that can't easily be returned as Errors.
* Cache modules and kernels to prevent reloading them if someone tries to
  reload a kernel that's already loaded.
* Tolerate shared memory arguments for kernel launches.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24538

llvm-svn: 281524
2016-09-14 19:58:34 +00:00
Jason Henline b38d8a3a3b [SE] Pack global dev handle addresses
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24528

llvm-svn: 281424
2016-09-13 23:59:10 +00:00
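Note on b38d8a3a3b: the difference between packing a handle and packing its address can be sketched as below. This is only an illustration, not the StreamExecutor code. `DeviceHandle` and `PackedArgs` are hypothetical names, and the layout assumed is the one used by launch calls that take a `void **` array with one pointer per argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an opaque device memory handle.
using DeviceHandle = void *;

// Hypothetical argument pack: one address per kernel argument.
class PackedArgs {
public:
  // Records the address of the handle, not the handle's value.
  // The caller must keep Handle alive until the launch has been issued.
  void addHandle(const DeviceHandle &Handle) {
    Addresses.push_back(const_cast<DeviceHandle *>(&Handle));
  }

  // Array of argument addresses, suitable for a void** launch parameter.
  void **data() { return Addresses.data(); }
  std::size_t size() const { return Addresses.size(); }

private:
  std::vector<void *> Addresses;
};
```

Each entry in the array points at the handle variable, so the launch code reads the handle through one extra level of indirection.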
Jason Henline 3a90112591 Device doc says device is small
llvm-svn: 281423
2016-09-13 23:56:47 +00:00
Jason Henline 16a5352121 [SE] Platforms return Device values
Summary:
Platforms were returning Device pointers, but a Device is now basically
just a pointer to an underlying PlatformDevice, so we will now just pass
it around as a value.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24537

llvm-svn: 281422
2016-09-13 23:56:46 +00:00
Jason Henline b459eb3529 [SE] KernelSpec return best PTX
Summary:
Before, the kernel spec would only return PTX for exactly the requested
compute capability. With this patch it will now return the PTX with the
largest compute capability that does not exceed that requested compute
capability.

Reviewers: jlebar

Subscribers: jprice, jlebar, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24531

llvm-svn: 281417
2016-09-13 23:29:25 +00:00
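Note on b459eb3529: the selection rule described above (largest compute capability that does not exceed the requested one) can be sketched with an ordered map. This is a minimal sketch, not the KernelSpec implementation. `ComputeCapability` and `findBestPTX` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <utility>

// Compute capability as (major, minor); pairs compare lexicographically.
using ComputeCapability = std::pair<int, int>;

// Returns the PTX registered for the largest compute capability that does
// not exceed Requested, or nullptr if every registered capability is larger.
inline const std::string *
findBestPTX(const std::map<ComputeCapability, std::string> &PTXByCapability,
            ComputeCapability Requested) {
  // upper_bound yields the first key strictly greater than Requested, so the
  // entry just before it (if any) is the best match.
  auto It = PTXByCapability.upper_bound(Requested);
  if (It == PTXByCapability.begin())
    return nullptr;
  return &std::prev(It)->second;
}
```

With PTX registered for 2.0, 3.5 and 5.0, a request for 4.0 falls back to the 3.5 entry, and a request for 1.0 finds nothing.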
Jason Henline 46b5e48fde [SE] Use real HostPlatformDevice for testing
Summary:
Replace uses of SimpleHostPlatformDevice in tests with
HostPlatformDevice.

Reviewers: jlebar

Subscribers: jlebar, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24519

llvm-svn: 281384
2016-09-13 20:14:44 +00:00
Jason Henline 3088696499 [SE] Host platform implementation
Summary:
This implementation does not currently support multiple concurrent streams, and
it won't allow kernels to be launched with grids larger than one block or
blocks larger than one thread. These limitations could be removed in the future
by launching new threads on the host, but that is not done in this
implementation.

Reviewers: jlebar

Subscribers: beanz, mgorny, jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24473

llvm-svn: 281377
2016-09-13 19:28:02 +00:00
Jason Henline fb62147949 [SE] Add .clang-tidy
Summary:
The .clang-tidy file is copied from the top-level LLVM source directory.

Also fix warnings generated by clang-tidy:

* Moved SimpleHostPlatformDevice.h so its header include guard could
  have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it
  by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and
  conditionals were surrounded with braces. (This was not found by the
  current clang-tidy, but with a local patch that I hope to upstream
  soon.)

Reviewers: jlebar, jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24468

llvm-svn: 281374
2016-09-13 19:25:43 +00:00
Jason Henline 45b467523b [SE] Stop using llvm-config --cxxflags
Summary:
Build configuration was adding $(llvm-config --cxxflags) to the
StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even
for debug builds, and was making debugging difficult.

The llvm-config call was originally introduced to handle the -fno-rtti
flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM.
This patch converts to using LLVM_ENABLE_RTTI and only adding the
`-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS.

I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will
probably have to be done to support MSVC.

Reviewers: jlebar

Subscribers: beanz, jprice, parallel_libs-commits, mgorny

Differential Revision: https://reviews.llvm.org/D24474

llvm-svn: 281347
2016-09-13 15:44:18 +00:00
Jason Henline c16fb8748d [SE] Clean up device and host memory slices
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT to slicing methods in order to
  emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice`
  in order to match the naming convention of `llvm::ArrayRef` and host
  memory slice.
* Change the parameter names of host memory slice functions to
  `DropCount` and `TakeCount` to match device memory slice declarations.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24464

llvm-svn: 281239
2016-09-12 17:20:43 +00:00
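Note on c16fb8748d: a slicing method that is not done in place, with `DropCount` and `TakeCount` parameters, might look like the sketch below. `MemorySlice` is a hypothetical class, not the StreamExecutor type, and the standard C++17 `[[nodiscard]]` attribute is used here as a stand-in for LLVM_ATTRIBUTE_UNUSED_RESULT.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over Count elements starting at Data.
template <typename T> class MemorySlice {
public:
  MemorySlice(T *Data, std::size_t Count) : Data(Data), Count(Count) {}

  // Returns a new view that skips DropCount elements and keeps TakeCount.
  // The receiver is left unchanged, so ignoring the result is a mistake that
  // [[nodiscard]] lets the compiler warn about.
  [[nodiscard]] MemorySlice slice(std::size_t DropCount,
                                  std::size_t TakeCount) const {
    assert(DropCount + TakeCount <= Count && "slice out of range");
    return MemorySlice(Data + DropCount, TakeCount);
  }

  // Returns a new view that skips DropCount elements and keeps the rest.
  [[nodiscard]] MemorySlice slice(std::size_t DropCount) const {
    assert(DropCount <= Count && "slice out of range");
    return MemorySlice(Data + DropCount, Count - DropCount);
  }

  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  T *Data;
  std::size_t Count;
};
```

Writing `S.slice(1, 3);` as a bare statement would draw a compiler warning, because the original view `S` is not modified.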
Jason Henline 57ea481945 [SE] RegisteredHostMemory for async device copies
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.

Reviewers: jlebar

Subscribers: jprice, beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24353

llvm-svn: 281225
2016-09-12 16:09:41 +00:00
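Note on 57ea481945: one way to keep unregistered host pointers out of asynchronous copies is to accept only a wrapper type that can be obtained solely through registration. The sketch below illustrates that idea under stated assumptions. The `Device` methods shown (`registerHostMemory`, `asyncCopyH2D`) are hypothetical, and `std::memcpy` stands in for a real asynchronous transfer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class Device;

// Hypothetical wrapper; only Device can construct it, so holding one is
// evidence that the buffer went through registration.
template <typename T> class RegisteredHostMemory {
public:
  T *data() const { return Data; }
  std::size_t size() const { return Count; }

private:
  friend class Device;
  RegisteredHostMemory(T *Data, std::size_t Count)
      : Data(Data), Count(Count) {}
  T *Data;
  std::size_t Count;
};

class Device {
public:
  // A real platform would register (pin) the buffer here; this sketch only
  // records the pointer and element count.
  template <typename T>
  RegisteredHostMemory<T> registerHostMemory(T *Data, std::size_t Count) {
    return RegisteredHostMemory<T>(Data, Count);
  }

  // Accepts only registered memory as the source. Passing a raw T* as Src
  // does not compile. memcpy stands in for the asynchronous copy.
  template <typename T>
  void asyncCopyH2D(const RegisteredHostMemory<T> &Src, T *Dst) {
    std::memcpy(Dst, Src.data(), Src.size() * sizeof(T));
  }
};
```

The misuse described in the commit message becomes a compile error instead of a silent slowdown at run time.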
Jason Henline a3ad6dcfaf [SE] Remove Utils directory
Summary:
There is no purpose in splitting out the Error class from the rest of
the StreamExecutor code. This organization was just a vestige of an old
failed design.

Plus, this change fixes a bug in the build where the utilities library
was not being statically linked in with libstreamexecutor.

Reviewers: jlebar, jprice

Subscribers: beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24434

llvm-svn: 281118
2016-09-09 23:33:58 +00:00
Justin Lebar b9e51397bf [StreamExecutor] Make SE work with an in-tree LLVM build.
Summary:
With these changes, we can put parallel-libs within llvm/projects and
build as normal.

This is kind of the minimal change I could figure out how to make while
still making us compatible with llvm's build system.  Some things I'm
not thrilled about include:

 * The creation of a CoreTests directory (the macros really seemed to
   want this)

 * Pulling SimpleHostPlatformDevice.h into CoreTests.  It seems to me
   this should live inside unittests/include, or maybe tests/include,
   but I didn't want to make that change in this patch.

One important piece of work that remains to be done is to make

  $ ninja check-streamexecutor

run all the tests.  Right now the only way I've figured out to run the
tests is

  $ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
  $ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests

Reviewers: jhen

Subscribers: beanz, parallel_libs-commits, jprice

Differential Revision: https://reviews.llvm.org/D24368

llvm-svn: 281091
2016-09-09 21:01:02 +00:00
Jason Henline 5755bb42ff Add streamexecutor-config
Summary:
Similar to llvm-config, gets command-line flags that are needed to build
applications linking against StreamExecutor.

Reviewers: jprice, jlebar

Subscribers: beanz, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24302

llvm-svn: 280955
2016-09-08 16:12:33 +00:00
Jason Henline fe51c2f7b4 [SE] Add getName method to Device class
Reviewers: jhen

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24240

llvm-svn: 280872
2016-09-07 22:26:20 +00:00
Jason Henline 19eeb37b8c [SE] Rename PlatformInterfaces to PlatformDevice
Summary:
The only interface that we ever plan to have in this file is
PlatformDevice, so it makes sense to rename the file to reflect that.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24269

llvm-svn: 280737
2016-09-06 19:27:00 +00:00
Jason Henline 18ea094df1 [SE] Remove Platform*Handle classes
Summary:
As pointed out by jprice, these classes don't serve a purpose. Instead,
we stay consistent with the way memory is managed and let the Stream and
Kernel classes directly hold opaque handles to device Stream and Kernel
instances, respectively.

Reviewers: jprice, jlebar

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24213

llvm-svn: 280719
2016-09-06 17:07:22 +00:00
Jason Henline 3956b2840b [SE] Add getByteCount methods for device memory
Summary:
Simple utility methods will prevent users from making mistakes when
converting element counts to byte counts.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24197

llvm-svn: 280563
2016-09-03 00:32:07 +00:00
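Note on 3956b2840b: the element-count to byte-count conversion that such a utility method hides is a single multiplication, as in the sketch below. `DeviceArray` is a hypothetical class used only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical typed device array that knows its element count.
template <typename T> class DeviceArray {
public:
  explicit DeviceArray(std::size_t ElementCount)
      : ElementCount(ElementCount) {}

  std::size_t getElementCount() const { return ElementCount; }

  // Byte count derived from the element type, so callers never write
  // "Count * sizeof(T)" by hand with the wrong T.
  std::size_t getByteCount() const { return ElementCount * sizeof(T); }

private:
  std::size_t ElementCount;
};
```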
Jason Henline 91f199c4ca [SE] Remove broken doc ref
llvm-svn: 280512
2016-09-02 18:07:48 +00:00
Jason Henline 1ce1856133 [SE] Doc tweaks
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24198

llvm-svn: 280511
2016-09-02 17:59:12 +00:00
Jason Henline 31b88cb030 [SE] GlobalDeviceMemory owns its handle
Summary:
Final step in getting GlobalDeviceMemory to own its handle.

* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make
  GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase
  can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend
  so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.

This change just consumes any errors generated during device memory freeing.
The real error handling will be added in a future patch.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24195

llvm-svn: 280509
2016-09-02 17:22:42 +00:00
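Note on 31b88cb030: the ownership pattern listed above (movable but not copyable, private constructor, Device as a friend, memory freed in the destructor) can be sketched as follows. This is a simplified illustration, not the StreamExecutor classes. `GlobalMemory` is a hypothetical name, `std::malloc` and `std::free` stand in for the platform allocator, and the sketch frees directly in the destructor rather than going through a private Device function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>
#include <utility>

class Device;

// Hypothetical owning wrapper around an opaque device memory handle.
class GlobalMemory {
public:
  // Moving transfers ownership and leaves the source empty.
  GlobalMemory(GlobalMemory &&Other) noexcept
      : Handle(std::exchange(Other.Handle, nullptr)) {}
  GlobalMemory &operator=(GlobalMemory &&Other) noexcept {
    if (this != &Other) {
      release();
      Handle = std::exchange(Other.Handle, nullptr);
    }
    return *this;
  }

  // Copying is disabled so that two objects never own the same handle.
  GlobalMemory(const GlobalMemory &) = delete;
  GlobalMemory &operator=(const GlobalMemory &) = delete;

  // The destructor frees the memory; users never free it explicitly.
  ~GlobalMemory() { release(); }

  bool isValid() const { return Handle != nullptr; }

private:
  friend class Device; // Only Device may construct instances.
  explicit GlobalMemory(void *Handle) : Handle(Handle) {}

  void release() {
    std::free(Handle); // free(nullptr) is a no-op.
    Handle = nullptr;
  }

  void *Handle;
};

class Device {
public:
  // malloc stands in for a platform-specific device allocation.
  GlobalMemory allocate(std::size_t ByteCount) {
    return GlobalMemory(std::malloc(ByteCount));
  }
};
```

Because the type is move-only, it has to be passed by reference or moved explicitly, which matches the earlier steps in this patch series.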
Jason Henline 75fbe01eeb [SE] Add "install" actions to cmake build
The "install" build target will now copy the StreamExecutor library and
headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX.

llvm-svn: 280506
2016-09-02 17:19:19 +00:00
Jason Henline f26ef0a27a [SE] Don't pack raw device mem args
Summary:
Step 4 of getting GlobalDeviceMemory to own its handle.

Take out code to pack untyped device memory types as kernel arguments.
When GlobalDeviceMemory owns its handle, users will never touch untyped
device memory types, so they will never pass them as kernel args.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24177

llvm-svn: 280496
2016-09-02 16:10:51 +00:00
Jason Henline c15c9ebb1d [StreamExecutor] Pass device memory by ref
Summary:
Step 3 of getting GlobalDeviceMemory to own its handle.

Since GlobalDeviceMemory will no longer be copy-constructible, we must
pass instances by reference rather than by value.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24172

llvm-svn: 280439
2016-09-02 00:25:52 +00:00
Jason Henline dc2dff6c68 [SE] Make Kernel movable
Summary:
Kernel is basically just a smart pointer to the underlying
implementation, so making it movable prevents having to store a
std::unique_ptr to it.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24150

llvm-svn: 280437
2016-09-02 00:22:05 +00:00
Jason Henline e091f8e814 [StreamExecutor] Read dev array directly in test
Summary:
Step 2 of getting GlobalDeviceMemory to own its handle.

Use the SimpleHostPlatformDevice allocate methods to create device
arrays for tests, and check for successful copies by dereferencing the
device array handle directly because we know it is really a host
pointer.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24148

llvm-svn: 280428
2016-09-01 23:27:39 +00:00
Jason Henline 8e5b54021e [StreamExecutor] Dev handles in platform interface
Summary:
This is the first in a series of patches that will convert
GlobalDeviceMemory to own its device memory handle. The first step is to
remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and
use raw handles there instead. This is useful because
GlobalDeviceMemoryBase is going to lose its importance in this process.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24114

llvm-svn: 280401
2016-09-01 18:48:21 +00:00
Jason Henline e9a12f1175 [SE] Make Stream movable
Summary:
The example code makes it clear that this is a much better design
decision.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24142

llvm-svn: 280397
2016-09-01 18:35:37 +00:00
Jason Henline a8a7fb95ef [SE] Docs use JAVADOC_AUTOBRIEF
That way we don't have to explicitly annotate each brief description as
\brief.

llvm-svn: 280384
2016-09-01 17:47:17 +00:00
Jason Henline c1e2b83d09 [StreamExecutor] getOrDie and dieIfError utils
Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24107

llvm-svn: 280312
2016-08-31 23:30:41 +00:00
Jason Henline 2eb1da8ed0 Exclude examples, unittests from doc gen
Public documentation shouldn't be generated for unit test code and code
that is only meant to be used as snippets in other documentation.

llvm-svn: 280278
2016-08-31 19:02:47 +00:00
Jason Henline 5b363dd294 [StreamExecutor] Add Doxygen main page
Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24066

llvm-svn: 280277
2016-08-31 19:02:44 +00:00
Jason Henline ba65d4412e [StreamExecutor] Add Stream::blockHostUntilDone
Summary: Add the type-safe wrapper to the platform-specific implementation.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24063

llvm-svn: 280182
2016-08-31 00:11:14 +00:00
Jason Henline 90ce6e1e64 [StreamExecutor] Simplify Kernel classes
Summary:
Make the Kernel class follow the pattern of the other classes. It now
has a type-safe user wrapper and a typeless, platform-specific handle.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D24043

llvm-svn: 280176
2016-08-30 23:35:24 +00:00
Jason Henline f14306b01e [StreamExecutor] Fix KernelSpec Doxygen
Summary:
There was a typo where \endcode was spelled as \encode and it was
keeping the whole file's documentation from rendering. I also added in some \c
annotations for inline code stuff to make it look nicer.

Reviewers: jprice

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23941

llvm-svn: 279855
2016-08-26 19:55:32 +00:00
Jason Henline 20cf1eb161 [StreamExecutor] Add Platform and PlatformManager
Summary: Abstractions for a StreamExecutor platform

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23857

llvm-svn: 279779
2016-08-25 21:33:07 +00:00
Jason Henline bcc77b6249 [StreamExecutor] Rename Executor to Device
Summary: This more clearly describes what the class is.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23851

llvm-svn: 279669
2016-08-24 21:31:53 +00:00
Jason Henline 3053bbf3b2 [StreamExecutor] Fix allocateDeviceMemory
Summary:
The return value from PlatformExecutor::allocateDeviceMemory needs to be
converted from Expected<GlobalDeviceMemoryBase> to
Expected<GlobalDeviceMemory<T>> in Executor::allocateDeviceMemory.

A similar bug is also fixed for Executor::allocateHostMemory.

Thanks to jprice for identifying this bug.

Reviewers: jprice, jlebar

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23849

llvm-svn: 279658
2016-08-24 19:42:03 +00:00
Jason Henline 424fc7e611 [StreamExecutor] Clean up device copy comments
Summary:
Consolidate Executor::synchronousCopy* and Stream::thenCopy* methods into
Doxygen method groups and combine all their comments into one section.

Also add a "doc" target to the build files to use Doxygen to build the
documentation.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23845

llvm-svn: 279654
2016-08-24 18:56:26 +00:00
Jason Henline bb1322d495 [StreamExecutor] Executor add synchronous methods
Summary:
Add Executor methods that block the host until completion. Since these
methods are host-synchronous, they don't require Stream arguments.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23577

llvm-svn: 279640
2016-08-24 16:58:20 +00:00
Jason Henline a91dc70b18 [StreamExecutor] Rename StreamExecutor to Executor
Summary: No functional changes, just renaming this class for better readability.

Reviewers: jlebar

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23574

llvm-svn: 278833
2016-08-16 18:18:32 +00:00
Jason Henline 68b97c7dc9 [StreamExecutor] Add basic Stream operations
Summary: Add the Stream class and a few of the operations it supports.

Reviewers: jlebar, tra

Subscribers: jprice, parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23333

llvm-svn: 278829
2016-08-16 17:58:31 +00:00
Jason Henline b071092756 [StreamExecutor] Add DeviceMemory and kernel arg packing
Summary:
Add types for device memory and add the code that knows how to pack these
device memory types if they are passed as arguments to kernel launches.

Reviewers: jlebar, tra

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23211

llvm-svn: 278021
2016-08-08 16:45:19 +00:00
Jason Henline 7b1fbead89 [StreamExecutor] Add kernel types
Summary: Add StreamExecutor kernel types.

Reviewers: jlebar, tra

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23138

llvm-svn: 277827
2016-08-05 16:05:44 +00:00
Jason Henline 8c04cbf882 [StreamExecutor] Add KernelLoaderSpec
Summary:
Add definitions for the KernelLoaderSpec and MultiKernelLoaderSpec
classes to StreamExecutor. Instances of these classes are generated by the
compiler in order to provide host code with a handle to device code.

Reviewers: jlebar, tra

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D23038

llvm-svn: 277615
2016-08-03 18:04:13 +00:00
Jason Henline 401219ae3f [StreamExecutor] Add error handling library
Summary:
Error handling in StreamExecutor is based on llvm::Error and
llvm::Expected. This CL sets up the StreamExecutor wrapper classes in
the streamexecutor namespace.

All the other StreamExecutor code makes use of this error handling code,
so this is the first CL for checking in StreamExecutor.

Reviewers: jlebar, tra

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D22687

llvm-svn: 277210
2016-07-29 20:45:52 +00:00
Jason Henline a12aa1fa9c Add .clang-format to parallel-libs
Summary:
The format style is set to LLVM. This is consistent with the
parallel-libs project charter which specifies that its libraries will
conform to LLVM coding style.

Reviewers: jlebar

Subscribers: parallel_libs-commits

Differential Revision: https://reviews.llvm.org/D22576

llvm-svn: 276145
2016-07-20 17:49:55 +00:00