llvm-project

Commit Graph

Author	SHA1	Message	Date
Jan Vesely	e4d5d10076	tan: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317258	2017-11-02 19:48:57 +00:00
Jan Vesely	b0fab2696a	sqrt: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317257	2017-11-02 19:48:55 +00:00
Jan Vesely	e3802356e2	sinpi: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317256	2017-11-02 19:48:53 +00:00
Jan Vesely	4708b10878	sinh: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317255	2017-11-02 19:48:51 +00:00
Jan Vesely	a3febe3fa9	sin: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317254	2017-11-02 19:48:50 +00:00
Jan Vesely	25671b40d7	native_log: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317253	2017-11-02 19:48:48 +00:00
Jan Vesely	fd13434d83	native_log2: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317252	2017-11-02 19:48:46 +00:00
Jan Vesely	d6ad07687d	native_log10: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317251	2017-11-02 19:48:44 +00:00
Jan Vesely	139185dfc7	log: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317250	2017-11-02 19:48:43 +00:00
Jan Vesely	27dffff6e8	logb: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317249	2017-11-02 19:48:41 +00:00
Jan Vesely	7fc23fbdcb	log2: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317248	2017-11-02 19:48:39 +00:00
Jan Vesely	a9132ce347	log1p: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317247	2017-11-02 19:48:37 +00:00
Jan Vesely	4e062cb74e	lgamma: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317246	2017-11-02 19:48:35 +00:00
Jan Vesely	4cb612e140	exp2: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317245	2017-11-02 19:48:33 +00:00
Jan Vesely	e99ba9a23d	cospi: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317244	2017-11-02 19:48:31 +00:00
Jan Vesely	c708278f13	cosh: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317243	2017-11-02 19:48:30 +00:00
Jan Vesely	f76371d948	cos: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317242	2017-11-02 19:48:27 +00:00
Jan Vesely	50a3cccdbe	cbrt: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317241	2017-11-02 19:48:25 +00:00
Jan Vesely	a4df39bcad	atanpi: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317240	2017-11-02 19:48:23 +00:00
Jan Vesely	1bd2ac257a	atanh: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317239	2017-11-02 19:48:22 +00:00
Jan Vesely	0942b5e1bf	atan: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317238	2017-11-02 19:48:20 +00:00
Jan Vesely	d3d5e322e3	asinpi: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317237	2017-11-02 19:48:18 +00:00
Jan Vesely	48bda32986	asinh: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317236	2017-11-02 19:48:16 +00:00
Jan Vesely	ba4b98c691	asin: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317235	2017-11-02 19:48:15 +00:00
Jan Vesely	61171847b7	acospi: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317234	2017-11-02 19:48:13 +00:00
Jan Vesely	720783d9f5	acosh: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317233	2017-11-02 19:48:11 +00:00
Jan Vesely	caca914218	acos: Use unary_decl instead of custom inc file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-By: Aaron Watry <awatry@gmail.com> llvm-svn: 317232	2017-11-02 19:48:06 +00:00
Jan Vesely	47e093da9b	math: Implement native_log10 Use llvm intrinsic by default Provide amdgpu workaround v2: drop old amd copyrights Reviewer: Aaron Watry Reviewed-by: Vedran Miletić <vedran@miletic.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 316588	2017-10-25 16:49:22 +00:00
Jan Vesely	7ab2d0bdcd	shared: Implement aligned vector stores (vstorea_half) Float version passes newly posted piglit tests on turks, float and double pass on carrizo. v2: scalar vstorea_half v3: fix typo Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 316291	2017-10-22 14:21:59 +00:00
Jan Vesely	12061c7125	shared: Implement aligned vector loads (vloada_half) Passes newly posted piglits on turks and carrizo v2: add scalar vloada_half v3: fix typo Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 316290	2017-10-22 14:21:56 +00:00
Jan Vesely	3d349ea98e	Make image builtins r600/llvm-3.9 only The implementation uses r600 specific intrinsics LLVM-4 switched to _ro_t and _rw_t image types Portions of the code can be moved back as more targets/llvm versions add image support Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315341	2017-10-10 18:10:21 +00:00
Jan Vesely	1de1444d62	Do not include clc_nextafter header globally Drop unused clc/math/clc_nextafter.h header Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315190	2017-10-08 19:33:58 +00:00
Jan Vesely	6a5c8ddb3a	math/nextafter: Use custom declaration inc file Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315189	2017-10-08 19:33:55 +00:00
Jan Vesely	72be1cc0be	math/binary_decl.inc: Do not declare mixed float/double functions fmin/fmax only need vector/scalar mix Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315188	2017-10-08 19:33:53 +00:00
Jan Vesely	beb6591753	ldexp: Fix double precision function return type Fixes ~1200 external calls from nvptx library. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315170	2017-10-08 06:56:14 +00:00
Jan Vesely	a02d0e2c50	integer/sub_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314703	2017-10-02 18:39:03 +00:00
Jan Vesely	1964df8fad	integer/add_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314702	2017-10-02 18:39:00 +00:00
Jan Vesely	943057a288	integer/clz: Use clang builtin instead of llvm asm The generated llvm IR is mostly identical. char/uchar case is a bit worse. reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314701	2017-10-02 18:38:57 +00:00
Jeroen Ketema	fe9fa89854	Let get_work_dim take exactly 0 arguments Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314634	2017-10-01 20:11:46 +00:00
Jeroen Ketema	17fdf263c5	Do not circularly define NULL Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314633	2017-10-01 20:10:14 +00:00
Jan Vesely	41b1500db0	geometric: geometric functions are only supported for vector lengths <=4 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314545	2017-09-29 19:06:47 +00:00
Jan Vesely	1fa727d615	Rework atomic ops to use clang builtins rather than llvm asm reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314112	2017-09-25 16:07:34 +00:00
Jan Vesely	c9bbbe2403	Implement cl_khr_int64_extended_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313811	2017-09-20 20:42:19 +00:00
Jan Vesely	1c81f4b0e3	Implement cl_khr_int64_base_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313810	2017-09-20 20:42:14 +00:00
Aaron Watry	e62f5fa64d	Add native_recip(x) as ((1)/(x)) Signed-off-by: Aaron Watry <awatry@gmail.com> Acked-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 313107	2017-09-13 01:40:25 +00:00
Aaron Watry	415a60f303	integer: Add popcount implementation using ctpop intrinsic Also copy/modify the unary_intrin.inc from math/ to make the intrinsic declaration somewhat reusable. Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2 Tested-by on GCN 1.0 (Pitcairn) Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312854	2017-09-09 02:23:54 +00:00
Jan Vesely	285d2fb85c	Implement vload_half{,n} and vload(half) v2: add vload(half) as well make helpers amdgpu specific (NVPTX uses different private AS numbering) use clang builtin on clang >= 6 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312839	2017-09-08 23:59:00 +00:00
Jan Vesely	661ac03a1b	vstore: Cleanup and add vstore(half) Add missing undefs Make helpers amdgpu specific (NVPTX uses different numbering for private AS) Use clang builtins on clang >= 6 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312838	2017-09-08 23:58:57 +00:00
Jan Vesely	1796d590c1	Fixup clc.h comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312491	2017-09-04 15:52:03 +00:00
Aaron Watry	0bf96b1712	relational: Implement shuffle2 builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in_dual_input v2: Add half support to shuffle2 Move shuffle2 to misc/ Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312404	2017-09-02 02:23:28 +00:00
Aaron Watry	880f15dae6	relational: Implement shuffle builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in v2: Add half-precision support to shuffle when available. Move to misc/ and add section 6.12.12 to clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312403	2017-09-02 02:23:26 +00:00
Aaron Watry	da8dfefd1c	Add halfN types and enable fp16 when generating builtin declarations Uses the same mechanism to enable fp16 as we use for fp64 when processing clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312402	2017-09-02 02:23:16 +00:00
Jan Vesely	1977092dc3	amdgcn: Implement {read_,write_,}mem_fence builtin v2: add more detailed comment about waitcnt instruction Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 311021	2017-08-16 17:08:56 +00:00
Jan Vesely	09f0a560e1	add __kernel_exec macros also consolidate macros into one file, and rename to clcmacros.h Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 309358	2017-07-28 03:39:03 +00:00
Jan Vesely	2f2a3bc0dc	generic: add missing get_work_dim include Fixes a few piglits since clang r304193 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 304556	2017-06-02 15:58:35 +00:00
Jan Vesely	9f7172965c	math: Implement sinh function mostly copied from amd_builtins llvm-svn: 296233	2017-02-25 02:46:53 +00:00
Aaron Watry	dfec3c8e95	math: Add native_tan as wrapper to tan Trivially define native_tan as a redirect to tan. If there are any targets with a native implementation, we can deal with it later. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <arsenm2@gmail.com> llvm-svn: 295920	2017-02-23 01:46:57 +00:00
Jeroen Ketema	ed98e8d099	Add the correct prefixes to the cl_khr_fp64 pragma llvm-svn: 294915	2017-02-12 21:31:41 +00:00
Matt Arsenault	9df2b9781c	math: Add native_rsqrt builtin function Trivial define to rsqrt. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 294608	2017-02-09 18:39:26 +00:00
Aaron Watry	c606efabb7	math: Add logb builtin Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292335	2017-01-18 03:14:10 +00:00
Aaron Watry	900bd7eb7f	math: Add expm1 builtin function Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292334	2017-01-18 03:13:37 +00:00
Jan Vesely	0a5aac3fc4	Provide vstore_half helper to workaround clc restrictions clang won't accept half precision loads and stores without cl_khr_fp16 since r281904 llvm-svn: 282106	2016-09-21 20:15:55 +00:00
Aaron Watry	af569547fa	math: Implement tgamma Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281566	2016-09-15 00:17:34 +00:00
Aaron Watry	e9009cdd21	math: Implement lgamma Just use lgamma_r and ignore the value returned in the second argument Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281565	2016-09-15 00:17:31 +00:00
Aaron Watry	0ab07e1bde	math: Implement lgamma_r Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281564	2016-09-15 00:17:28 +00:00
Aaron Watry	f969413a82	Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE This macro is currently unused, but I plan to use it shortly. The previous form did casts of pointers without an address space, which doesn't work so well for CL 1.x. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281563	2016-09-15 00:17:22 +00:00
Matt Arsenault	fbfd828d2a	Replace nextafter implementation This one passes conformance. llvm-svn: 280961	2016-09-08 16:37:56 +00:00
Jan Vesely	eade17271a	Avoid ambiguity in calling atom_add functions. clang (since r280553) allows pointer casts in function overloads, so we need to disambiguate the second argument. clang might be smarter about overloads in the future see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway. llvm-svn: 280871	2016-09-07 22:11:02 +00:00
Jan Vesely	ad8672727c	Implement vstore_half{,n} Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 278962	2016-08-17 20:02:11 +00:00
Jan Vesely	4c59714a52	Make min follow the OCL 1.0 specs OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined." OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined." The 1.0 version is stricter so use that one. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276704	2016-07-25 22:36:22 +00:00
Tom Stellard	d835b3f1af	Implement cbrt builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276497	2016-07-22 23:45:15 +00:00
Tom Stellard	9cb070f96a	Implement cosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276496	2016-07-22 23:45:13 +00:00
Tom Stellard	ff13926a60	geometric/floatn.inc: Add vec8 and vec16 types llvm-svn: 276495	2016-07-22 23:45:11 +00:00
Jan Vesely	a82e080b57	AMDGPU: Implement get_global_offset builtin Also fix get_global_id to consider offset No idea how to add this for ptx, so they are stuck with the old get_global_id implementation. v2: split to a separate patch v3: Switch R600 to use implicitarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276443	2016-07-22 17:24:24 +00:00
Jan Vesely	3317f253de	64 bit integers are legal in full profile without an extension Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273042	2016-06-17 20:30:41 +00:00
Jan Vesely	973c1fa5f5	math: Use single precision fmax in sp path Fixes fdim piglit on Turks v2: use CL fmax instead of __builtin Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom.stellard@amd.com> llvm-svn: 269807	2016-05-17 19:44:01 +00:00
Jan Vesely	c374cb76f4	math: Add erf ported from amd-builtins The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals. reviewers: jvesely Patch by: Vedran Miletić <rivanvx@gmail.com> llvm-svn: 268766	2016-05-06 18:02:30 +00:00
Aaron Watry	55a8e0fd6d	math: Add fdim implementation Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation. Passes piglit (float) tests on pitcairn. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 268708	2016-05-06 03:34:45 +00:00
Aaron Watry	09f3c99a86	math: Fix ilogb(double) return type Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 261714	2016-02-24 00:52:15 +00:00
Aaron Watry	d6d0454231	math: Add ilogb ported from amd-builtins The scalar float/double function bodies are a direct copy/paste with usage of the CLC wrappers to vectorize them. This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are equal to the results of ilogb(0.0f) and ilogb(float nan) respectively. v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 261639	2016-02-23 14:43:09 +00:00
Jan Vesely	7fbb96b907	math: Fix log2 vectorization on non-fp64 hw reviewer: tstellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260301	2016-02-09 22:17:42 +00:00
Aaron Watry	8872800eff	math: Add frexp ported from amd-builtins The float implementation is almost a direct port from the amd-builtins, but instead of just having a scalar and float4 implementation, it has a scalar and arbitrary width vector implementation. The double scalar is also a direct port from AMD's builtin release. The double vector implementation copies the logic in the float vector implementation using the values from the double scalar version. Both have been tested in piglit using tests sent to that project's mailing list. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260114	2016-02-08 17:07:21 +00:00
Tom Stellard	37d19875fa	Implement modf math builtin V2: use the reference implementation as suggested by Matt Arsenault Patch By: Pavel Ondračka llvm-svn: 258933	2016-01-27 14:52:10 +00:00
Tom Stellard	a249f50970	Add _CLC_V_V_VP_VECTORIZE macro Patch by: Pavel Ondračka llvm-svn: 258932	2016-01-27 14:52:07 +00:00
Aaron Watry	23faa5a1f9	integer: remove explicit casts from _MIN definitions The spec says (section 6.12.3, CL version 1.2): The macro names given in the following list must use the values specified. The values shall all be constant expressions suitable for use in #if preprocessing directives. This commit addresses the second part of that statement. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk> CC: Serge Martin <edb+libclc@sigluy.net> llvm-svn: 249445	2015-10-06 19:12:12 +00:00
Niels Ole Salscheider	f51df5ba8c	Implement tanh builtin This is a port from the AMD builtin library. llvm-svn: 248780	2015-09-29 06:39:09 +00:00
Tom Stellard	e2bab44ca9	Add sampler defines. Patch by: Zoltan Gilian llvm-svn: 248163	2015-09-21 14:59:58 +00:00
Tom Stellard	50dfd44577	Add image attribute defines. Patch by: Zoltan Gilian llvm-svn: 248162	2015-09-21 14:59:57 +00:00
Tom Stellard	a59fd49ba4	r600: Add image writing builtins. Patch by: Zoltan Gilian llvm-svn: 248161	2015-09-21 14:59:56 +00:00
Tom Stellard	9a7d4a940f	r600: Add image reading builtins. Patch by: Zoltan Gilian llvm-svn: 248160	2015-09-21 14:59:54 +00:00
Tom Stellard	ccc0ec1ddb	Add image attribute getter builtins Added get_image_* OpenCL builtins to the headers. Added implementation to the r600 target. Patch by: Zoltan Gilian llvm-svn: 248159	2015-09-21 14:47:53 +00:00
Aaron Watry	43ee367d1e	integer: Update integer limits to comply with spec The values for the char/short/integer/long minimums were declared with their actual values, not the definitions from the CL spec (v1.1). As a result, (-2147483648) was actually being treated as a long by the compiler, not an int, which caused issues when trying to add/subtract that value from a vector. Update the definitions to use the values declared by the spec, and also add explicit casts for the char/short/int minimums so that the compiler actually treats them as shorts/chars. Without those casts, they actually end up stored as integers, and the compiler may end up storing the INT_MIN as a long. The compiler can sign extend the values if it needs to convert the char->short, short->int, or int->long v2: Add explicit cast for INT_MIN and fix some typos and wrapping in the commit message. Reported-by: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk> CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> Signed-off-by: Aaron Watry <awatry@gmail.com> llvm-svn: 247661	2015-09-15 03:56:21 +00:00
Jeroen Ketema	d7be603ab1	Remove files accidentally not removed in r244310 llvm-svn: 244987	2015-08-13 23:43:12 +00:00
Tom Stellard	7a09e88b6e	Fix double implementation of log We need to use M_LOG2E instead of M_LOG2E_F. llvm-svn: 243132	2015-07-24 18:07:14 +00:00
Tom Stellard	44b6117dfd	Implement accurate log2 function Use the implementation ported from the AMD builtin library rather than LLVM Intrinsics. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 243131	2015-07-24 18:07:12 +00:00
Tom Stellard	f01ffa9ddc	Use llvm intrinsics for native_log and native_log2 llvm-svn: 243130	2015-07-24 18:07:06 +00:00
Tom Stellard	2ef5ec6b2b	Fix implementation of sqrt v2 Passing values less than 0 to the llvm.sqrt() intrinsic results in undefined behavior, so we need to check the input and return NaN if it is less than 0. v2: - Fix build failures. llvm-svn: 241906	2015-07-10 13:37:07 +00:00
Tom Stellard	a64bad8338	Use a more accurate implementation for exp Using exp2(x * M_LOG2E_F) does not give us accurate enough results for OpenCL. If you look at the new exp implementation you'll see that it does multiply the input by M_LOG2E_F, but it still uses the original input in part of the calculation. This exp implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237229	2015-05-13 03:55:09 +00:00
Tom Stellard	d538fdc217	Implement exp2 using OpenCL C rather than using an intrinsic Not all targets support the intrinsic, so it's better to have a generic implementation which does not use it. This exp2 implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237228	2015-05-13 03:55:07 +00:00
Tom Stellard	4294541290	Implement sin for double types This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237155	2015-05-12 17:18:47 +00:00
Tom Stellard	2e6ff0c66e	Implement cos for double types This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237154	2015-05-12 17:18:46 +00:00
Tom Stellard	37406a209c	Implement atan2pi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237138	2015-05-12 14:48:26 +00:00
Tom Stellard	79cc3eda1e	Implement atan2 for doubles This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237131	2015-05-12 13:48:51 +00:00
Jan Vesely	b0fb990b54	math: limit half_sqrt to single precision Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236941	2015-05-09 22:31:03 +00:00
Jan Vesely	7c829fe149	geometric: Limit fast_{distance,length} functions to single precision Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236940	2015-05-09 22:31:01 +00:00
Jan Vesely	071833d454	Fix ldexp fp64 build error Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 236939	2015-05-09 22:30:59 +00:00
Tom Stellard	17ec3a51c3	Implement fast_normalize builtin v4 This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove f suffix from constant in double implementations. - Consolidate implementations using the .cl/.inc approach. v3: - Use __CLC_FPSIZE instead of __CLC_FP{32,64} v4 (Jan Vesely): - Limit to single precision. llvm-svn: 236920	2015-05-09 00:04:12 +00:00
Tom Stellard	2ddfa0c5b2	Implement half_rsqrt builtin v3 This is a generic implementation which just calls rsqrt. Targets should override this if they want a faster implementation. v2: - Alphabetize SOURCES v3 (Jan Vesely): Limit to single precision types. llvm-svn: 236915	2015-05-08 23:28:44 +00:00
Jan Vesely	90e7ad589e	Move ldexp soft implementation to a separate file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236648	2015-05-06 21:59:29 +00:00
Jan Vesely	bc81ebefb7	Implement sinpi builtin Ported from AMD builtin library, passes piglit on Turks. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236647	2015-05-06 21:59:26 +00:00
Tom Stellard	2ca909d824	math: Add ldexp implementation Signed-off-by: Aaron Watry <awatry@gmail.com> Tom Stellard: - Add denormal handling. - Share vectorization code with r600 implementation. Patch By: Aaron Watry llvm-svn: 236639	2015-05-06 20:53:32 +00:00
Tom Stellard	f30d5fc01d	Implement ldexp for R600/SI llvm-svn: 236638	2015-05-06 20:53:29 +00:00
Tom Stellard	aed5f3cf7e	Fix implementation of normalize builtin The new implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 236608	2015-05-06 16:06:31 +00:00
Tom Stellard	ba742f58af	Allow compilation depending on the LLVM version This makes it possible to keep temporary compatibility with older versions. For example, this can be used when changes are not too large. Patch by: EdB llvm-svn: 236113	2015-04-29 15:37:06 +00:00
Jan Vesely	44e768e777	Fix compilation warnings without cl_khr_fp64 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 235762	2015-04-24 19:54:17 +00:00
Tom Stellard	9447de37a9	Implement fract builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 235620	2015-04-23 18:50:14 +00:00
Tom Stellard	d9ca1f1596	configure: Add --enable-runtime-subnormal option This makes it possible for runtime implementations to disable subnormal handling at runtime. When this flag is enabled, decisions about how to handle subnormals in the library will be controlled by an external variable called __CLC_SUBNORMAL_DISABLE. Function implementations should use these new helpers for querying subnormal support: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); In order for the library to link correctly with this feature, users will be required to either: 1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs). 2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the linker. These files are distributed with libclc and installed to $(installdir). e.g.: llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc or llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc If you do not supply the --enable-runtime-subnormal then the library behaves the same as it did before this commit. In addition to these changes, the patch adds helper functions that should be used when implementing library functions that need special handling for denormals: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); llvm-svn: 235329	2015-04-20 18:49:50 +00:00
Tom Stellard	da2969fca7	Implement atanh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234324	2015-04-07 16:20:22 +00:00
Tom Stellard	ca4d382e11	Implement acosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234323	2015-04-07 16:20:20 +00:00
Tom Stellard	03dc366e79	Implement atanpi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233928	2015-04-02 17:01:58 +00:00
Tom Stellard	eea0997566	Implement asinpi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233927	2015-04-02 17:01:56 +00:00
Tom Stellard	2b4ef39b2f	Implement asinh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233926	2015-04-02 17:01:54 +00:00
Tom Stellard	084124a8fa	Implement acospi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233925	2015-04-02 17:01:52 +00:00
Tom Stellard	1ded220cc0	Implement fmax using __builtin_fmax This ensures correct handling of NaN. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233713	2015-03-31 16:59:23 +00:00
Tom Stellard	310da7bfd2	Implement fmin using __builtin_fmin This ensures correct handling of NaN. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233712	2015-03-31 16:59:21 +00:00
Tom Stellard	bd4da7a0ef	Implement fast_distance builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 232978	2015-03-23 18:10:04 +00:00
Tom Stellard	cb80e14f2c	Implement fast_length builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 232977	2015-03-23 18:10:02 +00:00
Tom Stellard	d2a1559846	Implement half_sqrt builtin v2 This is a generic implementation which just calls sqrt. Targets should override this if they want a faster implementation. v2: - Alphabetize SOURCES llvm-svn: 232965	2015-03-23 17:01:37 +00:00
Tom Stellard	551a669e80	Implement distance builtin v2 This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove unnecessary copyright. llvm-svn: 232964	2015-03-23 17:01:35 +00:00
Tom Stellard	cb1c0d7939	Fix implementation of length builtin v2 v2: - Move common code into a macro - Use the same constant for all vector types. llvm-svn: 232963	2015-03-23 17:01:33 +00:00
Tom Stellard	8d3a4e3af2	Add __clc_ prefix to functions in sincos_helpers.cl This will help avoid naming conflicts with functions defined in kernels linking with libclc. llvm-svn: 232960	2015-03-23 16:20:24 +00:00
Aaron Watry	2cf4d5f312	math: Implement erfc Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 232674	2015-03-18 21:52:07 +00:00
Tom Stellard	adfd96f742	Fix bitselect for float/double types v2 We need to reinterpret float/double types as uint/ulong in order to perform the bitwise operations. This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Use vector operations rather than splitting vectors into scalar components. Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 231373	2015-03-05 15:31:05 +00:00
Aaron Watry	1314630ec3	Move mix from math to common It has been part of the common functions since 1.0 Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 231137	2015-03-03 21:25:08 +00:00
Tom Stellard	9d0d374c5b	Implement step builtin This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 230970	2015-03-02 15:29:41 +00:00
Tom Stellard	1f28b14bba	Implement smoothstep builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Fix typo in smoothstep.h llvm-svn: 230969	2015-03-02 15:29:39 +00:00
Tom Stellard	f5e5b0171d	Implement radians builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Move to the common/ directory llvm-svn: 230968	2015-03-02 15:29:37 +00:00
Tom Stellard	8336b3a604	Implement degrees builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Move to the common/ directory llvm-svn: 230967	2015-03-02 15:29:35 +00:00
Aaron Watry	f89bcca0b7	libclc/math: Add cospi Ported from the libclc/amd-builtins branch v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4 Add cospi(double) implementation instead of using llvm.cos Notes: The sincosD_piby4.h file is mostly the same as the builtin implementation released by AMD. The inline attribute declaration is changed, and M_PI is used instead of a constant double. Otherwise, the only difference is that the header explicitly enables the fp64 pragma. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk> CC: Tom Stellard <tom@stellard.net> CC: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 230641	2015-02-26 15:42:00 +00:00
Jan Vesely	51702e6e75	Implement log10 v2: Use constant and multiplication instead of division v3: Use hex constants Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 227585	2015-01-30 18:00:34 +00:00
Jeroen Ketema	c9526139bc	Remove wrong semi-colons Patch by Alastair Donaldson llvm-svn: 224568	2014-12-19 09:18:23 +00:00
Jeroen Ketema	7a22aebbda	Don't include <stddef.h> Including a standard or system header isn't allowed in OpenCL. The type "size_t" needs to be explicitly defined now. v2: Use __SIZE_TYPE__ instead of unsigned int. v3: Define ptrdiff_t and NULL. Patch-by: Jean-Sébastien Pédron Reviewed-by: Jeroen Ketema Reviewed-by: Jan Vesely llvm-svn: 222235	2014-11-18 14:19:27 +00:00
NAKAMURA Takumi	729be14435	Prune CRLF. llvm-svn: 220678	2014-10-27 12:37:26 +00:00
Jan Vesely	260827caa2	r600: Use llvm intrinsic to read work dimension information v2: Fix function declaration Add range metadata to r600 implementation v3: change prefix to AMDGPU Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219793	2014-10-15 15:08:06 +00:00
Tom Stellard	bf9f76fbe0	Implement log1p builtin llvm-svn: 219230	2014-10-07 20:22:42 +00:00
Jan Vesely	8f64c3d842	Implement fmod Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 219087	2014-10-05 20:24:52 +00:00
Tom Stellard	081e778d22	Implement async_work_group_copy builtin v3 This is a simple implementation which just copies data synchronously. v2: - Use size_t. v3: - Fix possible race condition by splitting the copy among multiple work items. llvm-svn: 219008	2014-10-03 19:49:39 +00:00
Tom Stellard	ed5bbfdb1b	Implement async_work_group_strided_copy builtin v2 This is a simple implementation which just copies data synchronously. v2: - Use size_t. llvm-svn: 219007	2014-10-03 19:49:37 +00:00
Tom Stellard	b5064f79ef	Implement wait_group_events builtin v2 This is a simple default implementation which just calls barrier(). v2: - Only call barrier() once. llvm-svn: 219006	2014-10-03 19:49:34 +00:00
Jeroen Ketema	87d2ca57d7	Remove more redundant semi-colons llvm-svn: 218039	2014-09-18 09:23:40 +00:00

394 Commits