Commit Graph

43 Commits

Author SHA1 Message Date
Matt Arsenault 958fce3192 amdgcn: Fix return type of get_num_groups
llvm-svn: 279723
2016-08-25 07:31:40 +00:00
Matt Arsenault 26d9c41ff6 amdgcn: Fix return type for get_global_size
llvm-svn: 279644
2016-08-24 17:52:04 +00:00
Matt Arsenault 314364cbd2 amdgpu: Fix default case value for get_local_size
llvm-svn: 279359
2016-08-20 04:17:17 +00:00
Matt Arsenault 220268d177 amdgcn: Fix get_local_size IR return type
llvm-svn: 279350
2016-08-20 00:01:21 +00:00
Jan Vesely a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
Jan Vesely 74f02db922 AMDGPU: Use clang intrinsics for workitem builtins
v2: split into 2 patches
    use clang builtins for other intrinsics as well

v3: Fix warnings
    Switch r600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276442
2016-07-22 17:24:20 +00:00
Matt Arsenault 1ab0d9c1ee R600: Use new barrier intrinsic
llvm-svn: 275874
2016-07-18 18:42:17 +00:00
Matt Arsenault 45e6eaaa05 amdgcn: Use new workitem intrinsics
llvm-svn: 261042
2016-02-17 00:27:27 +00:00
Matt Arsenault a48e15c6cb Split sources for amdgcn and r600
Most files remain in a common amdgpu directory.

Also switches barriers to to use convergent,
and use llvm.amdgcn.s.barrier.

This now requires 3.9/trunk to build amdgcn.

llvm-svn: 260777
2016-02-13 01:01:59 +00:00
Tom Stellard a59fd49ba4 r600: Add image writing builtins.
Patch by: Zoltan Gilian

llvm-svn: 248161
2015-09-21 14:59:56 +00:00
Tom Stellard 9a7d4a940f r600: Add image reading builtins.
Patch by: Zoltan Gilian

llvm-svn: 248160
2015-09-21 14:59:54 +00:00
Tom Stellard ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00
Tom Stellard 4f8d26230c R600: Implement accurate double precision sqrt v2
v2:
  - Use same implementation for R600 and gcn.

llvm-svn: 241907
2015-07-10 13:37:08 +00:00
Jan Vesely 30978bb99d r600: Use __clc_ldexp on asics that don't implement the intruction
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236649
2015-05-06 21:59:30 +00:00
Tom Stellard 2ca909d824 math: Add ldexp implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Patch By: Aaron Watry

llvm-svn: 236639
2015-05-06 20:53:32 +00:00
Tom Stellard f30d5fc01d Implement ldexp for R600/SI
llvm-svn: 236638
2015-05-06 20:53:29 +00:00
Tom Stellard 24ea64e050 r600: get_work_dim: Update metadata syntax for LLVM 3.6
llvm-svn: 225042
2014-12-31 15:27:59 +00:00
Jan Vesely ae50c89589 r600: Fix get_work_dim range metadata
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220388
2014-10-22 14:32:53 +00:00
Jan Vesely 260827caa2 r600: Use llvm intrinsic to read work dimension information
v2: Fix function declaration
    Add range metadata to r600 implementation
v3: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
2014-10-15 15:08:06 +00:00
Aaron Watry f4133b8a10 R600: Map Address spaces for atomic_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217925
2014-09-16 22:34:59 +00:00
Aaron Watry e210cae126 R600: Map address spaces for atomic_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217924
2014-09-16 22:34:58 +00:00
Aaron Watry 0545fa3fb0 R600: Map address spaces for atomic_min
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217923
2014-09-16 22:34:56 +00:00
Aaron Watry dd754f4b33 R600: Map address spaces for atomic_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217922
2014-09-16 22:34:55 +00:00
Aaron Watry ea32a57060 R600: Map addr spaces and use atomic_max
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217921
2014-09-16 22:34:53 +00:00
Aaron Watry 5ab82be926 R600: Map address spaces for atomic_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217920
2014-09-16 22:34:52 +00:00
Aaron Watry 348db3c666 R600: Map atomic_and address spaces
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217919
2014-09-16 22:34:51 +00:00
Aaron Watry f991505d02 vload/vstore: Use casts instead of scalarizing everything in CLC version
This generates bitcode which is indistinguishable from what was
hand-written for int32 types in v[load|store]_impl.ll.

v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom)
v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll
v2: (Per Matt Arsenault) Fix alignment issues with vector load stores

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216069
2014-08-20 13:58:57 +00:00
Jeroen Ketema 526fe2d501 Move clcmacro.h to avoid cluttering user namespace v2
v2: - use quotes instead of <>
    - add include to r600/lib/math/nextafter.c changed

Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211576
2014-06-24 09:36:32 +00:00
Tom Stellard d2e83929a9 R600: Set the noduplicate attribute on barrier() intrinsics
This will prevent LLVM optimization passes from creating illegal uses
of the barrier() intrinsic (e.g. calling barrier() from a conditional
that is not executed by all threads).

llvm-svn: 193753
2013-10-31 15:50:48 +00:00
Tom Stellard 6c7b86c106 Implement nextafter() builtin
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter.  Clang replaces this builtin with
a call to nextafter which is part of libm.  Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).

2. The other implementation is written in OpenCL C.  This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.

llvm-svn: 192383
2013-10-10 19:08:51 +00:00
Aaron Watry 283e3fa011 Add atomic_sub and atomic_dec builtin functions
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190201
2013-09-06 20:20:21 +00:00
Aaron Watry 50a7bcbac9 Add atomic_inc and atomic_add builtins
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 190058
2013-09-05 16:04:01 +00:00
Aaron Watry c0aa6e0291 Enable assembly vload3 int/uint constant/global for R600
It's supported by the R600 LLVM back-end now, at least for evergreen.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188180
2013-08-12 14:42:50 +00:00
Aaron Watry 7d52565321 Add vload* for addrspace(2) and use as constant load for R600
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188179
2013-08-12 14:42:49 +00:00
Aaron Watry bde11213e7 Added get_num_groups
The get_num_groups function was missing for r600g. I did the same
thing as the other workitem functions.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 187059
2013-07-24 18:03:38 +00:00
Aaron Watry 99a2f3b274 Fix and re-enable R600 vload/vstore assembly
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.

Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.

This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.

v2: 1) Leave v[load|store]_impl.ll in generic/lib
    2) Remove vload_if.ll and vstore_if.ll interfaces
    3) Fix address+offset calculations
    3) Remove offset from assembly arg list
llvm-svn: 186416
2013-07-16 14:29:01 +00:00
Tom Stellard 3a81b5d083 Implement barrier() builtin
Reviewed and Tested-by: Aaron Watry <awatry@gmail.com>

llvm-svn: 185837
2013-07-08 17:26:39 +00:00
Tom Stellard c0af47de00 r600: Fix implementations of get_group_id.ll and get_local_size.ll
llvm-svn: 185005
2013-06-26 18:22:00 +00:00
Tom Stellard 88812274dc r600: Add overrides file
llvm-svn: 184983
2013-06-26 18:20:08 +00:00
Tom Stellard f2f5a86620 R600: Replace cl implementations with LLVM IR implementation
This allows libclc to be built for R600 with upstream clang and LLVM.

llvm-svn: 184980
2013-06-26 18:20:00 +00:00
Tom Stellard 38f0ac9d5e r600: Add get_global_size() implementation
llvm-svn: 184977
2013-06-26 18:19:44 +00:00
Tom Stellard ac14c4e878 r600: Fix get_global_id implementation
llvm-svn: 184976
2013-06-26 18:19:39 +00:00
Tom Stellard 879327fcdc r600: Initial support
This includes a get_global_id() implementation and function stubs for
the other workitem and synchronization functions.

llvm-svn: 184975
2013-06-26 18:18:59 +00:00