llvm-project

Commit Graph

Author	SHA1	Message	Date
Jan Vesely	85e2fa44c6	libclc/r600: Use target specific builtins to implement rsqrt and native_rsqrt Fixes OCL CTS rsqrt and half_rsqrt (1 thread, scalar) tests on AMD Turks. Reviewer: awatry Differential Revision: https://reviews.llvm.org/D74016	2020-02-09 14:42:15 -05:00
Jan Vesely	2d32cd1585	r600: Convert barrier to clc Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewer: Aaron Watry llvm-svn: 346078	2018-11-04 00:35:15 +00:00
Jan Vesely	35b7ac4c30	r600: Convert get_num_groups to clc Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewer: Aaron Watry llvm-svn: 346077	2018-11-04 00:35:12 +00:00
Jan Vesely	cc6c2ef3b4	r600: Convert get_global_size to clc Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewer: Aaron Watry llvm-svn: 346076	2018-11-04 00:35:08 +00:00
Jan Vesely	5fa4e06e27	r600: Convert get_local_size to clc Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewer: Aaron Watry llvm-svn: 346075	2018-11-04 00:35:03 +00:00
Jan Vesely	e0edcaa4a9	r600/fmin: Flush denormals before calling builtin. Same reason as amdgcn. Fixes fmin, minmag CTS on turks. Reviewer: Tom Stellard <tstellar@redhat.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 334228	2018-06-07 20:27:58 +00:00
Jan Vesely	e23c0ec086	r600/fmax: Flush denormals before calling builtin. Same reason as amdgcn. Fixes fmax, maxmag CTS on turks. Reviewer: Tom Stellard <tstellar@redhat.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 334227	2018-06-07 20:27:56 +00:00
Matt Arsenault	958fce3192	amdgcn: Fix return type of get_num_groups llvm-svn: 279723	2016-08-25 07:31:40 +00:00
Matt Arsenault	26d9c41ff6	amdgcn: Fix return type for get_global_size llvm-svn: 279644	2016-08-24 17:52:04 +00:00
Matt Arsenault	220268d177	amdgcn: Fix get_local_size IR return type llvm-svn: 279350	2016-08-20 00:01:21 +00:00
Jan Vesely	a82e080b57	AMDGPU: Implement get_global_offset builtin Also fix get_global_id to consider offset No idea how to add this for ptx, so they are stuck with the old get_global_id implementation. v2: split to a separate patch v3: Switch R600 to use implicitarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276443	2016-07-22 17:24:24 +00:00
Jan Vesely	74f02db922	AMDGPU: Use clang intrinsics for workitem builtins v2: split into 2 patches use clang builtins for other intrinsics as well v3: Fix warnings Switch r600 to use implicitarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276442	2016-07-22 17:24:20 +00:00
Matt Arsenault	45e6eaaa05	amdgcn: Use new workitem intrinsics llvm-svn: 261042	2016-02-17 00:27:27 +00:00
Matt Arsenault	a48e15c6cb	Split sources for amdgcn and r600 Most files remain in a common amdgpu directory. Also switches barriers to use convergent, and use llvm.amdgcn.s.barrier. This now requires 3.9/trunk to build amdgcn. llvm-svn: 260777	2016-02-13 01:01:59 +00:00
Tom Stellard	a59fd49ba4	r600: Add image writing builtins. Patch by: Zoltan Gilian llvm-svn: 248161	2015-09-21 14:59:56 +00:00
Tom Stellard	9a7d4a940f	r600: Add image reading builtins. Patch by: Zoltan Gilian llvm-svn: 248160	2015-09-21 14:59:54 +00:00
Tom Stellard	ccc0ec1ddb	Add image attribute getter builtins Added get_image_* OpenCL builtins to the headers. Added implementation to the r600 target. Patch by: Zoltan Gilian llvm-svn: 248159	2015-09-21 14:47:53 +00:00
Tom Stellard	4f8d26230c	R600: Implement accurate double precision sqrt v2 v2: - Use same implementation for R600 and gcn. llvm-svn: 241907	2015-07-10 13:37:08 +00:00
Tom Stellard	f30d5fc01d	Implement ldexp for R600/SI llvm-svn: 236638	2015-05-06 20:53:29 +00:00
Jan Vesely	260827caa2	r600: Use llvm intrinsic to read work dimension information v2: Fix function declaration Add range metadata to r600 implementation v3: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219793	2014-10-15 15:08:06 +00:00
Aaron Watry	f991505d02	vload/vstore: Use casts instead of scalarizing everything in CLC version This generates bitcode which is indistinguishable from what was hand-written for int32 types in v[load\|store]_impl.ll. v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom) v3: Also remove unused generic/lib/shared/v[load\|store]_impl.ll v2: (Per Matt Arsenault) Fix alignment issues with vector load stores Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> CC: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 216069	2014-08-20 13:58:57 +00:00
Tom Stellard	d2e83929a9	R600: Set the noduplicate attribute on barrier() intrinsics This will prevent LLVM optimization passes from creating illegal uses of the barrier() intrinsic (e.g. calling barrier() from a conditional that is not executed by all threads). llvm-svn: 193753	2013-10-31 15:50:48 +00:00
Tom Stellard	6c7b86c106	Implement nextafter() builtin There are two implementations of nextafter(): 1. Using clang's __builtin_nextafter. Clang replaces this builtin with a call to nextafter which is part of libm. Therefore, this implementation will only work for targets with an implementation of libm (e.g. most CPU targets). 2. The other implementation is written in OpenCL C. This function is known internally as __clc_nextafter and can be used by targets that don't have access to libm. llvm-svn: 192383	2013-10-10 19:08:51 +00:00
Aaron Watry	50a7bcbac9	Add atomic_inc and atomic_add builtins Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 190058	2013-09-05 16:04:01 +00:00
Aaron Watry	bde11213e7	Added get_num_groups The get_num_groups function was missing for r600g. I did the same thing as the other workitem functions. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 187059	2013-07-24 18:03:38 +00:00
Aaron Watry	99a2f3b274	Fix and re-enable R600 vload/vstore assembly The assembly optimizations were making unsafe assumptions about which address spaces had which identifiers. Also, fix vload/vstore with 64-bit pointers. This was broken previously on Radeon SI. This version still only has assembly versions of int/uint 2/4/8/16 for global loads and stores on R600, but it does it in a way that would be very easily extended to private/local/constant and could also be handled easily on other architectures. v2: 1) Leave v[load\|store]_impl.ll in generic/lib 2) Remove vload_if.ll and vstore_if.ll interfaces 3) Fix address+offset calculations 4) Remove offset from assembly arg list llvm-svn: 186416	2013-07-16 14:29:01 +00:00
Tom Stellard	3a81b5d083	Implement barrier() builtin Reviewed and Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 185837	2013-07-08 17:26:39 +00:00
Tom Stellard	f2f5a86620	R600: Replace cl implementations with LLVM IR implementation This allows libclc to be built for R600 with upstream clang and LLVM. llvm-svn: 184980	2013-06-26 18:20:00 +00:00
Tom Stellard	38f0ac9d5e	r600: Add get_global_size() implementation llvm-svn: 184977	2013-06-26 18:19:44 +00:00
Tom Stellard	879327fcdc	r600: Initial support This includes a get_global_id() implementation and function stubs for the other workitem and synchronization functions. llvm-svn: 184975	2013-06-26 18:18:59 +00:00

30 Commits