Summary:
OpenCL 2.0 specification defines '-cl-uniform-work-group-size' option,
which requires that the global work-size be a multiple of the work-group
size specified to clEnqueueNDRangeKernel and allows optimizations that
are made possible by this restriction.
The patch introduces the support of this option.
To keep information about whether an OpenCL kernel has uniform work
group size or not, clang generates 'uniform-work-group-size' function
attribute for every kernel:
- "uniform-work-group-size"="true" for OpenCL 1.2 and lower,
- "uniform-work-group-size"="true" for OpenCL 2.0 and higher if
'-cl-uniform-work-group-size' option was specified,
- "uniform-work-group-size"="false" for OpenCL 2.0 and higher if no
'-cl-uniform-work-group-size' options was specified.
If the function is not an OpenCL kernel, 'uniform-work-group-size'
attribute isn't generated.
Patch by: krisb
Reviewers: yaxunl, Anastasia, b-sumner
Reviewed By: yaxunl, Anastasia
Subscribers: nhaehnle, yaxunl, Anastasia, cfe-commits
Differential Revision: https://reviews.llvm.org/D43570
llvm-svn: 325771
This was done for CUDA functions in r261779, and for the same
reason this also needs to be done for OpenCL. An arbitrary
function could have a barrier() call in it, which in turn
requires the calling function to be convergent.
llvm-svn: 315094
Certain OpenCL builtin functions are supposed to be executed by all threads in a work group or sub group. Such functions should not be made divergent during transformation. It makes sense to mark them with convergent attribute.
The adding of convergent attribute is based on Ettore Speziale's work and the original proposal and patch can be found at https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html.
Differential Revision: https://reviews.llvm.org/D25343
llvm-svn: 285725