The vote.ballot instruction is gone in recent CUDA versions, and
vote.sync.ballot cannot be used because it needs a thread mask parameter.
Fortunately, PTX 6.2 (introduced with CUDA-9.2) provides the activemask.b32
instruction for this.
Differential Revision: https://reviews.llvm.org/D66665
llvm-svn: 370792
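The distinction drawn in the message above can be sketched on the host. This is a simplified C++ model, not device code and not the header's implementation: a 32-lane warp is represented as two bitmasks, and WarpModel, model_activemask and model_ballot_sync are hypothetical names invented for this illustration. It only shows why a sync-style ballot cannot stand in for an active-mask query: the ballot needs the caller to supply a member mask, which is exactly the value being asked for.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of a 32-lane warp (assumption: not device code).
struct WarpModel {
  std::uint32_t active;     // bit i set => lane i is currently executing
  std::uint32_t predicate;  // bit i set => lane i's predicate is true
};

// Models an active-mask query: it takes no mask argument.
inline std::uint32_t model_activemask(const WarpModel &w) { return w.active; }

// Models a sync-style ballot: the caller must already know which lanes
// participate and pass them in as member_mask. Only lanes that are named
// in the mask and are active contribute their predicate bit.
inline std::uint32_t model_ballot_sync(const WarpModel &w,
                                       std::uint32_t member_mask) {
  return w.predicate & w.active & member_mask;
}

// Small usage example: lanes 0-3 are active, lanes 0 and 2 vote true.
inline void demo_activemask_model() {
  WarpModel w{0x0000000Fu, 0x00000005u};
  assert(model_activemask(w) == 0x0000000Fu);
  assert(model_ballot_sync(w, 0xFFFFFFFFu) == 0x00000005u);
}
```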
Summary:
These all had somewhat custom file headers with different text from the
ones I searched for previously, and so I missed them. Thanks to Hal and
Kristina and others who prompted me to fix this, and sorry it took so
long.
Reviewers: hfinkel
Subscribers: mcrosier, javed.absar, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60406
llvm-svn: 357941
* __shfl_{up,down}* now use unsigned int for the third parameter.
* Added [unsigned] long overloads for non-sync shuffles.
Differential Revision: https://reviews.llvm.org/D41521
llvm-svn: 321326
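One common way to provide shuffle overloads for types wider than 32 bits is to shuffle the two 32-bit halves separately and recombine them. The sketch below models that idea on the host in plain C++; it is an assumption about the general technique, not a copy of the header. model_shfl_down32 and model_shfl_down64 are hypothetical names, and the lane-offset parameter is declared unsigned to mirror the first bullet above.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned kWarpSize = 32;

// Hypothetical host-side stand-in for a 32-bit "shuffle down": lane i reads
// the value held by lane i + delta, or keeps its own value if that lane
// would fall outside the warp.
inline std::array<std::uint32_t, kWarpSize>
model_shfl_down32(const std::array<std::uint32_t, kWarpSize> &vals,
                  unsigned delta) {
  std::array<std::uint32_t, kWarpSize> out{};
  for (unsigned i = 0; i < kWarpSize; ++i)
    out[i] = (delta < kWarpSize - i) ? vals[i + delta] : vals[i];
  return out;
}

// 64-bit variant built from two 32-bit shuffles: split, shuffle, recombine.
inline std::array<std::uint64_t, kWarpSize>
model_shfl_down64(const std::array<std::uint64_t, kWarpSize> &vals,
                  unsigned delta) {
  std::array<std::uint32_t, kWarpSize> lo{}, hi{};
  for (unsigned i = 0; i < kWarpSize; ++i) {
    lo[i] = static_cast<std::uint32_t>(vals[i]);
    hi[i] = static_cast<std::uint32_t>(vals[i] >> 32);
  }
  lo = model_shfl_down32(lo, delta);
  hi = model_shfl_down32(hi, delta);
  std::array<std::uint64_t, kWarpSize> out{};
  for (unsigned i = 0; i < kWarpSize; ++i)
    out[i] = (static_cast<std::uint64_t>(hi[i]) << 32) | lo[i];
  return out;
}

// Usage example: each lane holds (lane << 32) | (100 + lane).
inline void demo_shfl64_model() {
  std::array<std::uint64_t, kWarpSize> vals{};
  for (unsigned i = 0; i < kWarpSize; ++i)
    vals[i] = (static_cast<std::uint64_t>(i) << 32) | (100u + i);
  auto out = model_shfl_down64(vals, 1u);
  assert(out[0] == vals[1]);    // lane 0 received lane 1's full 64-bit value
  assert(out[31] == vals[31]);  // last lane has no source and keeps its own
}
```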
Summary:
MSVC seems to use "__in" and "__out" for its own purposes, so we have to
pick different names in this macro.
Reviewers: tra
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D28325
llvm-svn: 291138
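As an illustration of the kind of fix described above, the following C++ sketch shows a function-generating macro whose internal names are chosen to avoid identifiers that a toolchain may already define. DEMO_MAKE_CAST_FN, value_in and value_out are hypothetical names invented here; the actual macro and the replacement names used in the header are not shown in the message. A real system header would use reserved, underscore-prefixed names; the sketch uses ordinary ones to stay within portable user code.

```cpp
#include <cassert>

// Hypothetical helper macro. If the parameter and local were spelled __in and
// __out, a toolchain that predefines those identifiers as macros (as the
// message reports for MSVC) would rewrite them before the compiler sees the
// function body. Distinct names sidestep that.
#define DEMO_MAKE_CAST_FN(FnName, InType, OutType)      \
  inline OutType FnName(InType value_in) {              \
    OutType value_out = static_cast<OutType>(value_in); \
    return value_out;                                   \
  }

DEMO_MAKE_CAST_FN(demo_int_to_double, int, double)
DEMO_MAKE_CAST_FN(demo_double_to_int, double, int)

// Usage example for the two generated functions.
inline void demo_macro_names() {
  assert(demo_int_to_double(3) == 3.0);
  assert(demo_double_to_int(2.75) == 2);  // conversion truncates toward zero
}
```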
Summary: Clang changes to make use of the LLVM intrinsics added in D21160.
Reviewers: tra
Subscribers: jholewinski, cfe-commits
Differential Revision: http://reviews.llvm.org/D21162
llvm-svn: 272299
Summary: The order is [x, y, z, w], not [w, x, y, z].
Subscribers: cfe-commits, tra
Differential Revision: http://reviews.llvm.org/D20794
llvm-svn: 271215
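The ordering stated above can be made concrete with a small host-side C++ sketch. Int4Model and model_from_vector are hypothetical names for this illustration, and the code is an assumption about what "order" means here: element 0 of a 4-element vector maps to x, element 1 to y, and so on, rather than element 0 mapping to w.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a 4-component vector type with named fields.
struct Int4Model {
  int x, y, z, w;
};

// Maps vector elements to fields in [x, y, z, w] order.
inline Int4Model model_from_vector(const std::array<int, 4> &v) {
  return Int4Model{v[0], v[1], v[2], v[3]};
}

// Usage example: the first element lands in x and the last in w.
inline void demo_element_order() {
  Int4Model r = model_from_vector({10, 20, 30, 40});
  assert(r.x == 10);
  assert(r.y == 20);
  assert(r.z == 30);
  assert(r.w == 40);
}
```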
Summary:
Previously it was implemented as inline asm in the CUDA headers.
This change allows us to use the [addr+imm] addressing mode when
executing ld.global.nc instructions. This translates into a 1.3x
speedup on some benchmarks that call this instruction from within an
unrolled loop.
Reviewers: tra, rsmith
Subscribers: jhen, cfe-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D19990
llvm-svn: 270150