Jan Vesely
3a7e8e77e6
amdgcn: Consolidate atomic minmax helpers
...
Removes most overrides
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewer: Aaron Watry
llvm-svn: 347665
2018-11-27 16:01:13 +00:00
Jan Vesely
f663e7e6da
amdgcn: Move __clc_amdgcn_s_waitcnt definition to clc file
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 346082
2018-11-04 00:39:27 +00:00
Jan Vesely
0e95b6a579
amdgcn: Convert get_num_groups to clc
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 346081
2018-11-04 00:39:25 +00:00
Jan Vesely
97283de27d
amdgcn: Convert get_global_size to clc
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 346080
2018-11-04 00:39:20 +00:00
Jan Vesely
ea2f32b75d
amdgcn: Convert get_local_size to clc
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-By: Aaron Watry <awatry@gmail.com>
llvm-svn: 346079
2018-11-04 00:39:16 +00:00
Jan Vesely
faa1ff16c1
amdgcn: Use __constant AS for amdgcn builtins.
...
Fixes build after clang r338707.
Reviewer: Matthew.Arsenault@amd.com
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 338898
2018-08-03 15:14:08 +00:00
Jan Vesely
70a270da5f
Add initial support for half precision builtins
...
v2: fix fmax implementation
use consistent checks for __CLC_FP_SIZE
add missing TODOs
fix whitespace in definitions.h
v3: undef ZERO in modf.inc
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
reviewer: Jeroen Ketema <j.ketema@xs4all.nl>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 332677
2018-05-17 22:55:30 +00:00
Jan Vesely
8fa100dfe3
amdgcn/fmin: Fix typos that reduced precision
...
Not sure how these sneaked in.
Fixes fminD and a few other tests (fractD, cosD) on carrizo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 330198
2018-04-17 18:11:29 +00:00
Jan Vesely
fd11db19c2
amdgcn: Update datalayout after LLVM r328656
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 329290
2018-04-05 14:47:44 +00:00
Jan Vesely
f96b1b88f8
amdgcn/fmax: fcanonicalize operands
...
v_max instruction needs canonicalized operands.
Passes CTS on carrizo
Reviewer: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 327076
2018-03-08 23:01:01 +00:00
Jan Vesely
e724e346ab
amdgcn/fmin: fcanonicalize operands
...
v_min instruction needs canonicalized operands.
Passes CTS on carrizo
Reviewer: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 327075
2018-03-08 23:00:58 +00:00
Jan Vesely
04a46bf0a2
amdgcn,popcount: Work around broken llvm.ctpop intrinsic on some GCN ASICs
...
This is only really needed for VI+ ASICs. However, LLVM would cast the value to
i32 for older ASICs anyway. The proper fix is in LLVM-7 (r326535).
Fixes CTS popcount on carrizo.
Reviewer: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 327044
2018-03-08 18:58:07 +00:00
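The workaround described above can be illustrated with a minimal host-side sketch. It assumes the idea is to zero-extend narrow integer types to 32 bits before counting bits, which matches the remark that LLVM casts to i32 anyway. The function names are hypothetical and this is not the libclc source; the bit-counting loop merely stands in for the hardware instruction.

```c
#include <stdint.h>

/* Reference 32-bit population count (stand-in for the hardware op). */
static uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Narrow types are widened to 32 bits first. Signed inputs go through
 * the unsigned type of the same width so that sign extension does not
 * add spurious set bits. */
static uint8_t popcount_u8(uint8_t x) {
    return (uint8_t)popcount32((uint32_t)x);
}

static int8_t popcount_s8(int8_t x) {
    return (int8_t)popcount32((uint32_t)(uint8_t)x);
}

static uint16_t popcount_u16(uint16_t x) {
    return (uint16_t)popcount32((uint32_t)x);
}
```

The cast through `uint8_t` in `popcount_s8` is the important detail: converting `-1` directly to a 32-bit value would set all 32 bits rather than 8.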
Jan Vesely
1ad6a94676
amdgcn: Fix build after GDS/const AS swap in r325030
...
Acked-by: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325866
2018-02-23 07:37:01 +00:00
Jan Vesely
eda1872d04
amdgcn: Fix datalayout after addition of 32bit const AS in r324747
...
Acked-by: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325865
2018-02-23 07:36:54 +00:00
Jan Vesely
911666f3fa
amdgcn: Fix datalayout after clang r324101
...
r324101 switched around AS numbering
Acked-by: Aaron Watry <awatry@gmail.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 325863
2018-02-23 07:36:39 +00:00
Jan Vesely
c420b61b26
amdgcn: Add missing datalayout info to .ll files
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 316239
2017-10-20 21:10:18 +00:00
Jeroen Ketema
fe9fa89854
Let get_work_dim take exactly 0 arguments
...
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314634
2017-10-01 20:11:46 +00:00
Jan Vesely
ce29e8cde1
Restore support for llvm-3.9
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
2017-09-29 19:06:41 +00:00
Jan Vesely
c9bbbe2403
Implement cl_khr_int64_extended_atomics builtins
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
2017-09-20 20:42:19 +00:00
Jan Vesely
3d1db3de74
amdgcn,waitcnt: Add datalayout info
...
This file is only compiled for GCN targets, which all share the same datalayout
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312493
2017-09-04 15:52:07 +00:00
Jan Vesely
999b1d9426
amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrier
...
Specs require using fences when barrier() is invoked:
"The barrier function will either flush any variables stored in local memory
or queue a memory fence to ensure correct ordering of memory operations to local memory."
and
"The barrier function will queue a memory fence to ensure correct ordering
of memory operations to global memory."
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311022
2017-08-16 17:09:00 +00:00
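The ordering that the quoted spec text calls for (memory fence first, then the execution barrier) can be modelled on a host as below. This is only a sketch: `stub_mem_fence` and `stub_s_barrier` are hypothetical stand-ins for the OpenCL `mem_fence` builtin and clang's `__builtin_amdgcn_s_barrier`, the flag values are made up for the example, and the call log exists only so the order can be inspected.

```c
#include <stddef.h>

/* Flag values chosen for this sketch only. */
enum { SKETCH_LOCAL_MEM_FENCE = 1, SKETCH_GLOBAL_MEM_FENCE = 2 };

/* Records which stub ran, in order. */
enum call_kind { CALL_MEM_FENCE, CALL_S_BARRIER };

#define CALL_LOG_CAPACITY 8
static enum call_kind call_log[CALL_LOG_CAPACITY];
static size_t call_count = 0;

static void record(enum call_kind kind) {
    if (call_count < CALL_LOG_CAPACITY)
        call_log[call_count++] = kind;
}

/* Hypothetical stand-in for mem_fence(flags). */
static void stub_mem_fence(unsigned flags) {
    (void)flags;
    record(CALL_MEM_FENCE);
}

/* Hypothetical stand-in for __builtin_amdgcn_s_barrier(). */
static void stub_s_barrier(void) {
    record(CALL_S_BARRIER);
}

/* barrier(): order memory operations first, then synchronize
 * the work-items of the work-group. */
static void sketch_barrier(unsigned flags) {
    stub_mem_fence(flags);
    stub_s_barrier();
}
```

Calling `sketch_barrier(SKETCH_LOCAL_MEM_FENCE)` once leaves two entries in `call_log`, the fence followed by the barrier.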
Jan Vesely
1977092dc3
amdgcn: Implement {read_,write_,}mem_fence builtin
...
v2: add more detailed comment about waitcnt instruction
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
2017-08-16 17:08:56 +00:00
Matt Arsenault
958fce3192
amdgcn: Fix return type of get_num_groups
...
llvm-svn: 279723
2016-08-25 07:31:40 +00:00
Matt Arsenault
26d9c41ff6
amdgcn: Fix return type for get_global_size
...
llvm-svn: 279644
2016-08-24 17:52:04 +00:00
Matt Arsenault
314364cbd2
amdgpu: Fix default case value for get_local_size
...
llvm-svn: 279359
2016-08-20 04:17:17 +00:00
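For context on the default case: the OpenCL specification has `get_local_size` return 1 for a dimension index outside the valid range, and the surrounding commits settle on `size_t` as the return type. A minimal host-side sketch of that behaviour follows; the function name and the `sizes` parameter are hypothetical, since the real builtin reads the sizes from the device rather than from an argument.

```c
#include <stddef.h>

/* Returns the work-group size for dimension `dim`, or 1 when `dim`
 * is not a valid dimension index (0, 1 or 2). */
static size_t sketch_get_local_size(unsigned dim, const size_t sizes[3]) {
    switch (dim) {
    case 0:
        return sizes[0];
    case 1:
        return sizes[1];
    case 2:
        return sizes[2];
    default:
        return 1; /* out-of-range dimension */
    }
}
```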
Matt Arsenault
220268d177
amdgcn: Fix get_local_size IR return type
...
llvm-svn: 279350
2016-08-20 00:01:21 +00:00
Matt Arsenault
2ce3d94a01
amdgcn: Correct return types to be size_t
...
llvm-svn: 279343
2016-08-19 22:49:39 +00:00
Jan Vesely
a82e080b57
AMDGPU: Implement get_global_offset builtin
...
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.
v2: split to a separate patch
v3: Switch R600 to use implicitarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
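The relationship between `get_global_id` and the global offset mentioned above can be written out as a small host-side sketch. The `launch_dim` struct and both function names are hypothetical; on a real device these values come from hardware registers and implicit kernel arguments, not from a struct.

```c
#include <stddef.h>

/* Hypothetical per-dimension launch state. */
typedef struct {
    size_t group_id;      /* index of the work-group */
    size_t local_size;    /* work-items per work-group */
    size_t local_id;      /* index inside the work-group */
    size_t global_offset; /* offset supplied at enqueue time */
} launch_dim;

/* Older behaviour: the enqueue offset is ignored. */
static size_t global_id_without_offset(const launch_dim *d) {
    return d->group_id * d->local_size + d->local_id;
}

/* Behaviour with the offset taken into account. */
static size_t global_id_with_offset(const launch_dim *d) {
    return global_id_without_offset(d) + d->global_offset;
}
```

With `group_id = 2`, `local_size = 64`, `local_id = 5` and `global_offset = 1000`, the two functions return 133 and 1133 respectively.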
Jan Vesely
74f02db922
AMDGPU: Use clang intrinsics for workitem builtins
...
v2: split into 2 patches
use clang builtins for other intrinsics as well
v3: Fix warnings
Switch r600 to use implicitarg.ptr
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
2016-07-22 17:24:20 +00:00
Matt Arsenault
b456c6dd56
Replace llvm.AMDGPU.ldexp with llvm.amdgcn.ldexp
...
It didn't really work on r600 to begin with, which should
get its own intrinsic.
llvm-svn: 275813
2016-07-18 16:42:50 +00:00
Matt Arsenault
45e6eaaa05
amdgcn: Use new workitem intrinsics
...
llvm-svn: 261042
2016-02-17 00:27:27 +00:00
Matt Arsenault
a48e15c6cb
Split sources for amdgcn and r600
...
Most files remain in a common amdgpu directory.
Also switches barriers to use convergent,
and use llvm.amdgcn.s.barrier.
This now requires 3.9/trunk to build amdgcn.
llvm-svn: 260777
2016-02-13 01:01:59 +00:00