Commit Graph

5 Commits

Author SHA1 Message Date
Jeroen Ketema 1364d268a4 Implement mem_fence on ptx
PTX does not differentiate between read and write fences. Hence, these are
lowered to a mem_fence call. The mem_fence function compiles to the
“membar.cta” instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a “membar.cta” instruction should be issued at all.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
2017-10-09 19:43:04 +00:00
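
The lowering described in commit 1364d268a4 can be sketched on the host as follows. This is an illustrative sketch, not the code from the commit: the names ending in `_sketch` and the `membar_cta_stub` counter are hypothetical stand-ins, and the flag values are assumed to match the usual OpenCL C definitions. The stub only counts how often a fence would be issued, since the real instruction exists only on the device.

```c
/* Assumed flag values, mirroring the usual OpenCL C definitions. */
typedef unsigned int cl_mem_fence_flags;
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical stand-in for the membar.cta instruction.
 * It only counts how many fences would have been issued. */
static int membar_cta_count = 0;
static void membar_cta_stub(void) { membar_cta_count++; }

/* The flags only decide whether a fence is issued at all;
 * global and local memory are not treated differently. */
static void mem_fence_sketch(cl_mem_fence_flags flags)
{
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE))
        membar_cta_stub();
}

/* Read and write fences are not distinguished, so both
 * simply forward to the combined fence. */
static void read_mem_fence_sketch(cl_mem_fence_flags flags)
{
    mem_fence_sketch(flags);
}

static void write_mem_fence_sketch(cl_mem_fence_flags flags)
{
    mem_fence_sketch(flags);
}
```

Calling `mem_fence_sketch(0)` leaves the counter unchanged, while any call with at least one of the two flags set increments it exactly once, regardless of which flag was passed.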
Jan Vesely a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implicitarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
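
For reference, the relationship between get_global_id and get_global_offset that commit a82e080b57 refers to can be sketched on the host as below. This is a minimal sketch and not the AMDGPU implementation: the `work_item_state` struct and the `_sketch` functions are hypothetical, and the formula is the one OpenCL defines for the global ID (group ID times local size, plus local ID, plus global offset).

```c
#include <stddef.h>

/* Hypothetical host-side snapshot of one work-item's coordinates. */
typedef struct {
    size_t group_id[3];
    size_t local_size[3];
    size_t local_id[3];
    size_t global_offset[3];
} work_item_state;

/* Returns the global offset for a dimension, or 0 if dim is out of range. */
static size_t global_offset_sketch(const work_item_state *wi, unsigned dim)
{
    return dim < 3 ? wi->global_offset[dim] : 0;
}

/* Global ID including the offset; returns 0 if dim is out of range. */
static size_t global_id_sketch(const work_item_state *wi, unsigned dim)
{
    if (dim >= 3)
        return 0;
    return wi->group_id[dim] * wi->local_size[dim]
         + wi->local_id[dim]
         + global_offset_sketch(wi, dim);
}
```

An implementation that ignores the offset, as the message says remains the case for ptx, would return only the first two terms of the sum.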
Peter Collingbourne 7b0ad09cc3 Add barrier.cl to SOURCES, spotted by Jin Wang.
llvm-svn: 163227
2012-09-05 18:13:55 +00:00
Peter Collingbourne a385c53413 PTX: move implementations of work-item and synchronisation functions
to lib, and add header files in generic.  Incorporates a patch by
Tom Stellard!

llvm-svn: 161313
2012-08-05 22:25:37 +00:00
Peter Collingbourne d5395fbf03 Initial commit.
llvm-svn: 147756
2012-01-08 22:09:58 +00:00