forked from OSchip/llvm-project
51dfc27589
We may generate GPU kernels that store into scalars when we run sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important not to keep the scalar values in thread-local registers, but to store them back to the device memory objects that back them. We currently store scalars back only at the end of a kernel, which is correct only if exactly one thread is executed. If more than one thread may run, we currently invalidate the scop. To support such cases correctly, we would need to always load from and store to the corresponding global memory slot instead of a thread-local alloca slot.

llvm-svn: 281838
| File |
|---|
| cuda-annotations.ll |
| double-parallel-loop.ll |
| host-control-flow.ll |
| host-statement.ll |
| invalid-kernel.ll |
| kernel-params-only-some-arrays.ll |
| kernel-params-scop-parameter.ll |
| non-read-only-scalars.ll |
| non-zero-array-offset.ll |
| only-part-of-array-modified.ll |
| parametric-loop-bound.ll |
| phi-nodes-in-kernel.ll |
| private-memory.ll |
| region-stmt.ll |
| remove-dead-instructions-in-stmt-2.ll |
| remove-dead-instructions-in-stmt.ll |
| run-time-check.ll |
| scalar-param-and-value-32-bit.ll |
| scalar-param-and-value-use.ll |
| scalar-parameter-fp128.ll |
| scalar-parameter-half.ll |
| scalar-parameter-i80.ll |
| scalar-parameter-i120.ll |
| scalar-parameter-i128.ll |
| scalar-parameter-i3000.ll |
| scalar-parameter-ppc_fp128.ll |
| scalar-parameter-x86_fp80.ll |
| scalar-parameter.ll |
| scheduler-timeout.ll |
| shared-memory-scalar.ll |
| shared-memory-two-dimensional.ll |
| shared-memory.ll |
| size-cast.ll |
| untouched-arrays.ll |
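
The store-back behavior described in the commit message at the top of this page can be illustrated with a small host-side C++ model. This is only a sketch under stated assumptions, not Polly's generated code: the function names, the loop body (`scalar += i`), the cyclic distribution of iterations over threads, and the fixed order in which threads store back are all hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <vector>

// Sequential reference: one thread runs every iteration and
// accumulates into the scalar.
int run_sequential(int initial, int iterations) {
    int scalar = initial;
    for (int i = 0; i < iterations; ++i)
        scalar += i;
    return scalar;
}

// Hypothetical model of "thread-local slot, store back at kernel end".
// Iterations are distributed cyclically over num_threads simulated threads.
int run_with_private_slots(int initial, int iterations, int num_threads) {
    assert(num_threads >= 1);
    int device_scalar = initial;  // device memory object backing the scalar

    // Kernel entry: every thread loads the scalar into its private slot.
    std::vector<int> slot(num_threads, device_scalar);

    // Kernel body: each thread updates only its own private copy.
    for (int t = 0; t < num_threads; ++t)
        for (int i = t; i < iterations; i += num_threads)
            slot[t] += i;

    // Kernel end: every thread stores its slot back; the last writer wins
    // (assumed here to be the thread with the highest index).
    for (int t = 0; t < num_threads; ++t)
        device_scalar = slot[t];

    return device_scalar;
}
```

With `initial = 10` and `iterations = 4`, `run_sequential` returns 16, and `run_with_private_slots` with one thread also returns 16. With two threads the model returns 14, because each thread only sees its own updates and the final store overwrites the other thread's result. This is the kind of divergence the commit avoids by invalidating the scop when more than one thread may run.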