Summary:
It turns out CMake errors out if a processed directory contains source
files that are not used. This was causing an error with the CUDATest.cpp
file when configuring StreamExecutor with the CUDA platform disabled.
Moving CUDATest.cpp to its own directory fixes this problem.
Reviewers: jlebar, jprice
Subscribers: beanz, mgorny, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24618
llvm-svn: 281654
Summary:
Add proper handling for shared memory arguments in the CUDA platform. Also add
in unit tests for CUDA.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24596
llvm-svn: 281635
Summary:
Basic CUDA platform implementation and cmake infrastructure to control
whether it's used. A few important TODOs will be handled in later
patches:
* Log some error messages that can't easily be returned as Errors.
* Cache modules and kernels to prevent reloading them if someone tries to
reload a kernel that's already loaded.
* Tolerate shared memory arguments for kernel launches.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24538
llvm-svn: 281524
Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24528
llvm-svn: 281424
Summary:
Platforms were returning Device pointers, but a Device is now basically
just a pointer to an underlying PlatformDevice, so we will now just pass
it around as a value.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24537
llvm-svn: 281422
Summary:
Before, the kernel spec would only return PTX for exactly the requested
compute capability. With this patch it will now return the PTX with the
largest compute capability that does not exceed that requested compute
capability.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24531
llvm-svn: 281417
Summary:
This implementation does not currently support multiple concurrent streams, and
it won't allow kernels to be launched with grids larger than one block or
blocks larger than one thread. These limitations could be removed in the future
by launching new threads on the host, but that is not done in this
implementation.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24473
llvm-svn: 281377
Summary:
The .clang-tidy file is copied from the top-level LLVM source directory.
Also fix warnings generated by clang-format:
* Moved SimpleHostPlatformDevice.h so its header include guard could
have the right format.
* Changed signatures of methods taking llvm::Twine by value to take it
by const ref instead.
* Add "noexcept" to some move constructors and assignment operators.
* Removed a bunch of places where single-statement loops and
conditionals were surrounded with braces. (This was not found by the
current clang-tidy, but with a local patch that I hope to upstream
soon.)
Reviewers: jlebar, jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24468
llvm-svn: 281374
Summary:
Build configuration was adding $(llvm-config --cxxflags) to the
StreamExecutor CXXFLAGS, but this was causing "-O3" to be passed even
for debug builds, and was making debugging difficult.
The llvm-config call was originally introduced to handle the -fno-rtti
flag because an RTTI StreamExecutor could not link with a no-RTTI LLVM.
This patch converts to using LLVM_ENABLE_RTTI and only adding the
`-fno-rtti` flag if needed, not all the rest of the LLVM CXXFLAGS.
I have tested this with clang-4.0 and gcc-4.8 on Ubuntu. Some work will
probably have to be done to support MSVC.
Reviewers: jlebar
Subscribers: beanz, jprice, parallel_libs-commits, mgorny
Differential Revision: https://reviews.llvm.org/D24474
llvm-svn: 281347
Summary:
* Add LLVM_ATTRIBUTE_UNUSED_RESULT used to slicing methods in order to
emphasize that the slicing is not done in place.
* Change device memory slice function name from `drop_front` to `slice`
in order to match the naming convention of `llvm::ArrayRef` and host
memory slice.
* Change the parameter names of host memory slice functions to
`DropCount` and `TakeCount` to match device memory slice declarations.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24464
llvm-svn: 281239
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.
Reviewers: jlebar
Subscribers: jprice, beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24353
llvm-svn: 281225
Summary:
There is no purpose in splitting out the Error class from the rest of
the StreamExecutor code. This organization was just a vestige of an old
failed design.
Plus, this change fixes a bug in the build where the utilites library
was not being statically linked in with libstreamexecutor.
Reviewers: jlebar, jprice
Subscribers: beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24434
llvm-svn: 281118
Summary:
With these changes, we can put parallel-libs within llvm/projects and
build as normal.
This is kind of the minimal change I could figure out how to make while
still making us compatible with llvm's build system. Some things I'm
not thrilled about include:
* The creation of a CoreTests directory (the macros really seemed to
want this)
* Pulling SimpleHostPlatformDevice.h into CoreTests. It seems to me
this should live inside unittests/include, or maybe tests/include,
but I didn't want to make that change in this patch.
One important piece of work that remains to be done is to make
$ ninja check-streamexecutor
run all the tests. Right now the only way I've figured out to run the
tests is
$ ninja projects/parallel-libs/streamexecutor/unittests/StreamExecutorUnitTests
$ projects/parallel-libs/streamexecutor/unittests/CoreTests/CoreTests
Reviewers: jhen
Subscribers: beanz, parallel_libs-commits, jprice
Differential Revision: https://reviews.llvm.org/D24368
llvm-svn: 281091
Summary:
Similar to llvm-config, gets command-line flags that are needed to build
applications linking against StreamExecutor.
Reviewers: jprice, jlebar
Subscribers: beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24302
llvm-svn: 280955
Summary:
The only interface that we ever plan to have in this file is
PlatformDevice, so it makes sense to rename the file to reflect that.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24269
llvm-svn: 280737
Summary:
As pointed out by jprice, these classes don't serve a purpose. Instead,
we stay consistent with the way memory is managed and let the Stream and
Kernel classes directly hold opaque handles to device Stream and Kernel
instances, respectively.
Reviewers: jprice, jlebar
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24213
llvm-svn: 280719
Summary:
Simple utility methods will prevent users from making mistakes when
converting element counts to byte counts.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24197
llvm-svn: 280563
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24198
llvm-svn: 280511
Summary:
Final step in getting GlobalDeviceMemory to own its handle.
* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make
GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase
can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend
so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.
This change just consumes any errors generated during device memory freeing.
The real error handling will be added in a future patch.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24195
llvm-svn: 280509
The "install" build target will now copy the StreamExecutor library and
headers to the appropriate subdirectories of CMAKE_INSTALL_PREFIX.
llvm-svn: 280506
Summary:
Step 4 of getting GlobalDeviceMemory to own its handle.
Take out code to pack untyped device memory types as kernel arguments.
When GlobalDeviceMemory owns its handle, users will never touch untyped
device memory types, so they will never pass them as kernel args.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24177
llvm-svn: 280496
Summary:
Step 3 of getting GlobalDeviceMemory to own its handle.
Since GlobalDeviceMemory will no longer by copy-constructible, we must
pass instances by reference rather than by value.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24172
llvm-svn: 280439
Summary:
Kernel is basically just a smart pointer to the underlying
implementation, so making it movable prevents having to store a
std::unique_ptr to it.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24150
llvm-svn: 280437
Summary:
Step 2 of getting GlobalDeviceMemory to own its handle.
Use the SimpleHostPlatformDevice allocate methods to create device
arrays for tests, and check for successful copies by dereferncing the
device array handle directly because we know it is really a host
pointer.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24148
llvm-svn: 280428
Summary:
This is the first in a series of patches that will convert
GlobalDeviceMemory to own its device memory handle. The first step is to
remove GlobalDeviceMemoryBase from the PlatformInterface interfaces and
use raw handles there instead. This is useful because
GlobalDeviceMemoryBase is going to lose its importance in this process.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24114
llvm-svn: 280401
Summary:
The example code makes it clear that this is a much better design
decision.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24142
llvm-svn: 280397
Public documentation shouldn't be generated for unit test code and code
that is only meant to be used as snippets in other documentation.
llvm-svn: 280278
Summary:
Make the Kernel class follow the pattern of the other classes. It now
has a type-safe user wrapper and a typeless, platform-specific handle.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24043
llvm-svn: 280176
Summary:
There was a typo where \endcode was spelled as \encode and it was
keeping the whole file document from rendering. I also added in some \c
annotations for inline code stuff to make it look nicer.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23941
llvm-svn: 279855
Summary: This more clearly describes what the class is.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23851
llvm-svn: 279669
Summary:
The return value from PlatformExecutor::allocateDeviceMemory needs to be
converted from Expected<GlobalDeviceMemoryBase> to
Expected<GlobalDeviceMemory<T>> in Executor::allocateDeviceMemory.
A similar bug is also fixed for Executor::allocateHostMemory.
Thanks to jprice for identifying this bug.
Reviewers: jprice, jlebar
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23849
llvm-svn: 279658
Summary:
Consolidate Executor::synchronousCopy* and Stream::thenCopy* methods into
Doxygen method groups and combine all their comments into one section.
Also a "doc" target to the build files to use Doxygen to build the
documentation.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23845
llvm-svn: 279654
Summary:
Add Executor methods that block the host until completion. Since these
methods are host-synchronous, they don't require Stream arguments.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23577
llvm-svn: 279640
Summary: No functional changes just renaming this class for better readability.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23574
llvm-svn: 278833
Summary: Add the Stream class and a few of the operations it supports.
Reviewers: jlebar, tra
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23333
llvm-svn: 278829
Summary:
Add types for device memory and add the code that knows how to pack these
device memory types if they are passed as arguments to kernel launches.
Reviewers: jlebar, tra
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23211
llvm-svn: 278021
Summary:
Add definitions for the KernelLoaderSpec and MultiKernelLoaderSpec
classes to StreamExecutor. Instances of these classes are generated by the
compiler in order to provide host code with a handle to device code.
Reviewers: jlebar, tra
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D23038
llvm-svn: 277615
Summary:
Error handling in StreamExecutor is based on llvm::Error and
llvm::Expected. This CL sets up the StreamExecutor wrapper classes in
the streamexecutor namespace.
All the other StreamExecutor code makes use of this error handling code,
so this is the first CL for checking in StreamExecutor.
Reviewers: jlebar, tra
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D22687
llvm-svn: 277210
Summary:
The format style is set to LLVM. This is consistent with the
parallel-libs project charter which specifies that its libraries will
conform to LLVM coding style.
Reviewers: jlebar
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D22576
llvm-svn: 276145