forked from OSchip/llvm-project
2e4ecfdebe
Summary: Previously it was implemented as inline asm in the CUDA headers. This change allows us to use the [addr+imm] addressing mode when executing ld.global.nc instructions. This translates into a 1.3x speedup on some benchmarks that call this instruction from within an unrolled loop. Reviewers: tra, rsmith Subscribers: jhen, cfe-commits, jholewinski Differential Revision: http://reviews.llvm.org/D19990 llvm-svn: 270150 |
||
---|---|---|
clang | ||
clang-tools-extra | ||
compiler-rt | ||
debuginfo-tests | ||
libclc | ||
libcxx | ||
libcxxabi | ||
libunwind | ||
lld | ||
lldb | ||
llgo | ||
llvm | ||
openmp | ||
polly |