In CUDA mode all local variables are actually thread local|threadprivate, not private, and, thus, they cannot be shared between threads|lanes. llvm-svn: 326590