Summary:
We were packing global device memory handles in
`PackedKernelArgumentArray`, but as I was implementing the CUDA
platform, I realized that CUDA wants the address of the handle, not the
handle itself. So this patch switches to packing the address of the
handle.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24528
llvm-svn: 281424
Summary:
Platforms were returning Device pointers, but a Device is now basically
just a pointer to an underlying PlatformDevice, so we will now just pass
it around as a value.
Reviewers: jlebar
Subscribers: jprice, jlebar, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24537
llvm-svn: 281422
Summary:
This implementation does not currently support multiple concurrent streams, and
it won't allow kernels to be launched with grids larger than one block or
blocks larger than one thread. These limitations could be removed in the future
by launching new threads on the host, but that is not done in this
implementation.
Reviewers: jlebar
Subscribers: beanz, mgorny, jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24473
llvm-svn: 281377
Summary:
Improve the error-prone interface that allows users to pass host
pointers that haven't been registered to asynchronous copy methods. In
CUDA, this is an extremely easy error to make, and instead of failing at
runtime, it succeeds and gives the right answers by turning the async
copy into a sync copy. So, you silently get a huge performance
degradation if you misuse the old interface. This new interface should
prevent that.
Reviewers: jlebar
Subscribers: jprice, beanz, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24353
llvm-svn: 281225
Summary:
* Sections on main page.
* Use std algorithm for equality check in example.
* Add tree view on left side.
* Add extra CSS sheet to restrict content width.
* Add mild background color.
* Restrict alphabetic indexes to 1 column.
* Round corners of content boxes.
* Rename example to CUDASaxpy.cpp.
* Add CUDASaxpy.cpp to "Examples" section.
Reviewers: jprice
Subscribers: parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24198
llvm-svn: 280511
Summary:
Final step in getting GlobalDeviceMemory to own its handle.
* Make GlobalDeviceMemory movable, but no longer copyable.
* Make Device::freeDeviceMemory function private and make
GlobalDeviceMemoryBase a friend of Device so GlobalDeviceMemoryBase
can free its memory in its destructor.
* Make GlobalDeviceMemory constructor private and make Device a friend
so it can construct GlobalDeviceMemory.
* Remove SharedDeviceMemoryBase class because it is never used.
* Remove explicit memory freeing from example code.
This change just consumes any errors generated during device memory freeing.
The real error handling will be added in a future patch.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24195
llvm-svn: 280509
Summary:
Kernel is basically just a smart pointer to the underlying
implementation, so making it movable prevents having to store a
std::unique_ptr to it.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24150
llvm-svn: 280437
Summary:
The example code makes it clear that this is a much better design
decision.
Reviewers: jlebar
Subscribers: jprice, parallel_libs-commits
Differential Revision: https://reviews.llvm.org/D24142
llvm-svn: 280397