llvm-project

Commit Graph

Author	SHA1	Message	Date
Jan Vesely	a02d0e2c50	integer/sub_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314703	2017-10-02 18:39:03 +00:00
Jan Vesely	1964df8fad	integer/add_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314702	2017-10-02 18:39:00 +00:00
Jan Vesely	943057a288	integer/clz: Use clang builtin instead of llvm asm The generated llvm IR mostly identical. char/uchar case is a bit worse. reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314701	2017-10-02 18:38:57 +00:00
Jan Vesely	1fa727d615	Rework atomic ops to use clang builtins rather than llvm asm reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314112	2017-09-25 16:07:34 +00:00
Jan Vesely	c9bbbe2403	Implement cl_khr_int64_extended_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313811	2017-09-20 20:42:19 +00:00
Jan Vesely	1c81f4b0e3	Implement cl_khr_int64_base_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313810	2017-09-20 20:42:14 +00:00
Jan Vesely	661ac03a1b	vstore: Cleanup and add vstore(half) Add missing undefs Make helpers amdgpu specific (NVPTX uses different numbering for private AS) Use clang builtins on clang >= 6 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312838	2017-09-08 23:58:57 +00:00
Aaron Watry	0bf96b1712	relational: Implement shuffle2 builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in_dual_input v2: Add half support to shuffle2 Move shuffle2 to misc/ Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312404	2017-09-02 02:23:28 +00:00
Aaron Watry	880f15dae6	relational: Implement shuffle builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in v2: Add half-precision support to shuffle when available. Move to misc/ and add section 6.12.12 to clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312403	2017-09-02 02:23:26 +00:00
Jan Vesely	9f7172965c	math: Implement sinh function mostly copied form amd_builtins llvm-svn: 296233	2017-02-25 02:46:53 +00:00
Aaron Watry	c606efabb7	math: Add logb builtin Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292335	2017-01-18 03:14:10 +00:00
Aaron Watry	900bd7eb7f	math: Add expm1 builtin function Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292334	2017-01-18 03:13:37 +00:00
Jan Vesely	0a5aac3fc4	Provide vstore_half helper to workaround clc restrictions clang won't accept half precision loads and stores without cl_khr_fp16 since r281904 llvm-svn: 282106	2016-09-21 20:15:55 +00:00
Aaron Watry	af569547fa	math: Implement tgamma Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281566	2016-09-15 00:17:34 +00:00
Aaron Watry	e9009cdd21	math: Implement lgamma Just use lgamma_r and ignore the value returned in the second argument Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281565	2016-09-15 00:17:31 +00:00
Aaron Watry	0ab07e1bde	math: Implement lgamma_r Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281564	2016-09-15 00:17:28 +00:00
Tom Stellard	d835b3f1af	Implement cbrt builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276497	2016-07-22 23:45:15 +00:00
Tom Stellard	9cb070f96a	Implement cosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276496	2016-07-22 23:45:13 +00:00
Jan Vesely	c374cb76f4	math: Add erf ported from amd-builtins The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals. reviewers: jvesely Patch by: Vedran Miletić <rivanvx@gmail.com> llvm-svn: 268766	2016-05-06 18:02:30 +00:00
Aaron Watry	55a8e0fd6d	math: Add fdim implementation Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation. Passes piglit (float) tests on pitcairn. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 268708	2016-05-06 03:34:45 +00:00
Aaron Watry	d6d0454231	math: Add ilogb ported from amd-builtins The scalar float/double function bodies are a direct copy/paste with usage of the CLC wrappers to vectorize them. This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are equal to the results of ilogb(0.0f) and ilogb(float nan) respectively. v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 261639	2016-02-23 14:43:09 +00:00
Aaron Watry	8872800eff	math: Add frexp ported from amd-builtins The float implementation is almost a direct port from the amd-builtins, but instead of just having a scalar and float4 implementation, it has a scalar and arbitrary width vector implementation. The double scalar is also a direct port from AMD's builtin release. The double vector implementation copies the logic in the float vector implementation using the values from the double scalar version. Both have been tested in piglit using tests sent to that project's mailing list. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 260114	2016-02-08 17:07:21 +00:00
Tom Stellard	37d19875fa	Implement modf math builtin V2: use the reference implementation as suggested by Matt Arsenault Patch By: Pavel Ondračka llvm-svn: 258933	2016-01-27 14:52:10 +00:00
Niels Ole Salscheider	f51df5ba8c	Implement tanh builtin This is a port from the AMD builtin library. llvm-svn: 248780	2015-09-29 06:39:09 +00:00
Tom Stellard	ccc0ec1ddb	Add image attribute getter builtins Added get_image_* OpenCL builtins to the headers. Added implementation to the r600 target. Patch by: Zoltan Gilian llvm-svn: 248159	2015-09-21 14:47:53 +00:00
Tom Stellard	7a09e88b6e	Fix double implementation of log We need to use M_LOG2E instead of M_LOG2E_F. llvm-svn: 243132	2015-07-24 18:07:14 +00:00
Tom Stellard	44b6117dfd	Implement accurate log2 function Use the implementation was ported from the AMD builtin library rather than LLVM Intrinsics. This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 243131	2015-07-24 18:07:12 +00:00
Tom Stellard	f01ffa9ddc	Use llvm intrinsics for native_log and native_log2 llvm-svn: 243130	2015-07-24 18:07:06 +00:00
Tom Stellard	2ef5ec6b2b	Fix implementation of sqrt v2 Passing values less than 0 to the llvm.sqrt() intrinsic results in undefined behavior, so we need to check the input and return NaN if is is less than 0. v2: - Fix build failures. llvm-svn: 241906	2015-07-10 13:37:07 +00:00
Tom Stellard	d538fdc217	Implement exp2 using OpenCL C rather than using an intrinsic Not all targets support the intrinsic, so it's better to have a generic implementation which does not use it. This exp2 implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237228	2015-05-13 03:55:07 +00:00
Tom Stellard	37406a209c	Implement atan2pi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 237138	2015-05-12 14:48:26 +00:00
Tom Stellard	17ec3a51c3	Implement fast_normalize builtin v4 This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove f suffix from constant in double implementations. - Consolidate implementations using the .cl/.inc approach. v3: - Use __CLC_FPSIZE instead of __CLC_FP{32,64} v4 (Jan Vesely): - Limit to single precision. llvm-svn: 236920	2015-05-09 00:04:12 +00:00
Tom Stellard	2ddfa0c5b2	Implement half_rsqrt builtin v3 This is a generic implementation which just calls rsqrt. Targets should override this if they want a faster implementation. v2: - Alphabettize SOURCES v3 (Jan Vesely): Limit to single precision types. llvm-svn: 236915	2015-05-08 23:28:44 +00:00
Jan Vesely	90e7ad589e	Move ldexp soft implementation to a separate file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236648	2015-05-06 21:59:29 +00:00
Jan Vesely	bc81ebefb7	Implement sinpi builtin Ported from AMD builtin library, passes piglit on Turks. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 236647	2015-05-06 21:59:26 +00:00
Tom Stellard	2ca909d824	math: Add ldexp implementation Signed-off-by: Aaron Watry <awatry@gmail.com> Tom Stellard: - Add denormal handling. - Share vectorization code with r600 implementation. Patch By: Aaron Watry llvm-svn: 236639	2015-05-06 20:53:32 +00:00
Tom Stellard	9447de37a9	Implement fract builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 235620	2015-04-23 18:50:14 +00:00
Tom Stellard	d9ca1f1596	configure: Add --enable-runtime-subnormal option This makes it possible for runtime implementations to disable subnormal handling at runtime. When this flag is enabled, decisions about how to handle subnormals in the library will be controlled by an external variable called __CLC_SUBNORMAL_DISABLE. Function implementations should use these new helpers for querying subnormal support: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); In order for the library to link correctly with this feature, users will be required to either: 1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs). 2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the linker. These files are distributed with liblclc and installed to $(installdir). e.g.: llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc or llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc If you do not supply the --enable-runtime-subnormal then the library behaves the same as it did before this commit. In addition to these changes, the patch adds helper functions that should be used when implementing library functions that need special handling for denormals: __clc_fp16_subnormals_supported(); __clc_fp32_subnormals_supported(); __clc_fp64_subnormals_supported(); llvm-svn: 235329	2015-04-20 18:49:50 +00:00
Tom Stellard	da2969fca7	Implement atanh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234324	2015-04-07 16:20:22 +00:00
Tom Stellard	ca4d382e11	Implement acosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 234323	2015-04-07 16:20:20 +00:00
Tom Stellard	03dc366e79	Implement atanpi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233928	2015-04-02 17:01:58 +00:00
Tom Stellard	eea0997566	Implement asinpi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233927	2015-04-02 17:01:56 +00:00
Tom Stellard	2b4ef39b2f	Implement asinh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233926	2015-04-02 17:01:54 +00:00
Tom Stellard	084124a8fa	Implement acospi builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 233925	2015-04-02 17:01:52 +00:00
Tom Stellard	bd4da7a0ef	Implement fast_distance builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 232978	2015-03-23 18:10:04 +00:00
Tom Stellard	cb80e14f2c	Implement fast_length builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 232977	2015-03-23 18:10:02 +00:00
Tom Stellard	d2a1559846	Implement half_sqrt builtin v2 This is a generic implementation which just calls sqrt. Targets should override this if they want a faster implementation. v2: - Alphabetize SOURCES llvm-svn: 232965	2015-03-23 17:01:37 +00:00
Tom Stellard	551a669e80	Implement distance builtin v2 This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Remove unnecessary copyright. llvm-svn: 232964	2015-03-23 17:01:35 +00:00
Aaron Watry	2cf4d5f312	math: Implement erfc Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 232674	2015-03-18 21:52:07 +00:00
Tom Stellard	adfd96f742	Fix bitselect for float/double types v2 We need to reinterpret float/double types as uint/ulong in order to perform the bitwise operations. This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Use vector operations rather than splitting vectors into scalar components. Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 231373	2015-03-05 15:31:05 +00:00
Aaron Watry	1314630ec3	Move mix from math to common It has been part of the common functions since 1.0 Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 231137	2015-03-03 21:25:08 +00:00
Tom Stellard	9d0d374c5b	Implement step builtin This has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 230970	2015-03-02 15:29:41 +00:00
Tom Stellard	1f28b14bba	Implement smoothstep builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Fix typo in smoothstep.h llvm-svn: 230969	2015-03-02 15:29:39 +00:00
Tom Stellard	f5e5b0171d	Implement radians builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Move to the common/ directory llvm-svn: 230968	2015-03-02 15:29:37 +00:00
Tom Stellard	8336b3a604	Implement degrees builtin v2 This has been tested with piglit, OpenCV, and the ocl conformance tests. v2: - Move to the common/ directory llvm-svn: 230967	2015-03-02 15:29:35 +00:00
Aaron Watry	f89bcca0b7	libclc/math: Add cospi Ported from the libclc/amd-builtins branch v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4 Add cospi(double) implementation instead of using llvm.cos Notes: The sincosD_piby4.h file is mostly the same as the builtin implementation released by AMD. The inline attribute declaration is changed, and M_PI is used instead of a constant double. Otherwise, the only difference is that the header explicitly enables the fp64 pragma. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk> CC: Tom Stellard <tom@stellard.net> CC: Matt Arsenault <Matthew.Arsenault@amd.com> llvm-svn: 230641	2015-02-26 15:42:00 +00:00
Jan Vesely	51702e6e75	Implement log10 v2: Use constant and multiplication instead of division v3: Use hex constants Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 227585	2015-01-30 18:00:34 +00:00
Tom Stellard	bf9f76fbe0	Implement log1p builtin llvm-svn: 219230	2014-10-07 20:22:42 +00:00
Jan Vesely	8f64c3d842	Implement fmod Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 219087	2014-10-05 20:24:52 +00:00
Tom Stellard	081e778d22	Implement async_work_group_copy builtin v3 This is a simple implementation which just copies data synchronously. v2: - Use size_t. v3: - Fix possible race condition by splitting the copy among multiple work items. llvm-svn: 219008	2014-10-03 19:49:39 +00:00
Tom Stellard	ed5bbfdb1b	Implement async_work_group_strided_copy builtin v2 This is a simple implementation which just copies data synchronously. v2: - Use size_t. llvm-svn: 219007	2014-10-03 19:49:37 +00:00
Tom Stellard	b5064f79ef	Implement wait_group_events builtin v2 This is a simple default implemetation which just calls barrier(). v2: - Only call barrier() once. llvm-svn: 219006	2014-10-03 19:49:34 +00:00
Aaron Watry	0d976ba497	atomic: Add generic atom[ic]_cmpxchg Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217918	2014-09-16 22:34:49 +00:00
Aaron Watry	025d79ad6c	atomic: Implement generic atom[ic]_xchg Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217917	2014-09-16 22:34:45 +00:00
Aaron Watry	7cfa12c2a5	atomic: Add generic atomic_min implementation Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217916	2014-09-16 22:34:41 +00:00
Aaron Watry	3f0a1a4c27	atomic: Add generic atom[ic]_xor Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217915	2014-09-16 22:34:36 +00:00
Aaron Watry	31e67d1cff	atomic: Add atom[ic]_or Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217914	2014-09-16 22:34:32 +00:00
Aaron Watry	cc68405761	atomics: Add generic atom[ic]_and Not used yet. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217913	2014-09-16 22:34:28 +00:00
Aaron Watry	49614fbfd9	atomic: Add generic implementation of atom[ic]_max Not used yet... v2: Correct int/uint behavior Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217912	2014-09-16 22:34:24 +00:00
Aaron Watry	c9b88d32be	atomic: define extension functions for existing atomic implementations We were missing the local versions of the atom_* before Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 217911	2014-09-16 22:34:21 +00:00
Aaron Watry	947bdd059a	math: Add tan implementation Uses the algorithm: tan(x) = sin(x) / sqrt(1-sin^2(x)) An alternative is: tan(x) = sin(x) / cos(x) Which produces more verbose bitcode and longer assembly. Either way, the generated bitcode seems pretty nasty and a more optimized but still precise-enough solution is welcome. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 217511	2014-09-10 15:43:35 +00:00
Aaron Watry	951ab64d19	math: Add asin implementation asin(x) = atan2(x, sqrt( 1-x^2 )) alternatively: asin(x) = PI/2 - acos(x) Use the atan2 implementation since it produces slightly shorter bitcode and R600 machine code. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 217510	2014-09-10 15:43:32 +00:00
Aaron Watry	268beab921	math: Add acos implementation Passes the tests that were submitted to the piglit list Tested on R600 (Pitcairn) Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 217509	2014-09-10 15:43:29 +00:00
Jan Vesely	05a60b7ac3	add isordered builtin Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217247	2014-09-05 13:59:15 +00:00
Jan Vesely	63486c1f0e	add isunordered builtin v2: remove trailing newline Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217246	2014-09-05 13:59:13 +00:00
Jan Vesely	41a0c491de	add islessgreater builtin v2: remove trailing newline Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217245	2014-09-05 13:59:11 +00:00
Jan Vesely	369e20353c	add isnormal builtin v2: simplify and remove isnan leftovers remove trailing newline Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217244	2014-09-05 13:59:09 +00:00
Jan Vesely	a5a3b023b4	add isfinite builtin v2: simplify and remove isinf leftovers remove trailing newline Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217243	2014-09-05 13:59:06 +00:00
Tom Stellard	7a9e2c6879	Implement isinf builtin llvm-svn: 217046	2014-09-03 15:55:40 +00:00
Tom Stellard	d8a73abfc3	Fix implementation of copysign This was previously implemented with a macro and we were using __builtin_copysign(), which takes double inputs for the float version of copysign(). Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 217045	2014-09-03 15:55:38 +00:00
Jan Vesely	ef513d392b	Implement generic mad_sat v2: Fix trailing whitespace Fix signed long overflow improve comment v3: fix typo Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 216923	2014-09-02 17:55:02 +00:00
Aaron Watry	9447097636	Revert "Implement generic mad_sat" This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564. I didn't mean to commit this... Jan has a v3 incoming llvm-svn: 216322	2014-08-23 14:06:01 +00:00
Aaron Watry	6bfac7ae69	Implement generic mad_sat v2: Fix trailing whitespace Fix signed long overflow improve comment Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu> llvm-svn: 216320	2014-08-23 14:04:33 +00:00
Tom Stellard	2ad4243bf7	Implement prefetch builtin The default implementation is a no-op. Targets should override this with their own implementations. llvm-svn: 216127	2014-08-20 21:23:03 +00:00
Aaron Watry	f991505d02	vload/vstore: Use casts instead of scalarizing everything in CLC version This generates bitcode which is indistinguishable from what was hand-written for int32 types in v[load|store]_impl.ll. v4: Use vec2+scalar for vec3 load/stores to prevent corruption (per Tom) v3: Also remove unused generic/lib/shared/v[load|store]_impl.ll v2: (Per Matt Arsenault) Fix alignment issues with vector load stores Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> CC: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 216069	2014-08-20 13:58:57 +00:00
Jan Vesely	12c660827e	relational: Add islessequal(floatN) builtin v2: remove the initial undef Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 214568	2014-08-01 21:50:59 +00:00
Jan Vesely	acba2c98eb	relational: Add isless(floatN) builtin v2: remove the initial undef Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 214567	2014-08-01 21:50:55 +00:00
Tom Stellard	903a78b7c6	Implement sin builtin for float types This double version still uses @llvm.sin. llvm-svn: 213762	2014-07-23 15:16:21 +00:00
Tom Stellard	c0ab2f81e3	Implement cos builtin for float types The double version still uses @llvm.cos. llvm-svn: 213761	2014-07-23 15:16:18 +00:00
Tom Stellard	f9caca8b9d	Implement atan2 builtin llvm-svn: 213760	2014-07-23 15:16:16 +00:00
Tom Stellard	47882923c7	Implement atan builtin llvm-svn: 213759	2014-07-23 15:16:13 +00:00
Aaron Watry	d7f022a582	relational: Implement isnotequal v2: Use relational macros instead of hand-rolled ones Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 213320	2014-07-17 22:07:32 +00:00
Aaron Watry	30102536c0	relational: Implement isgreaterequal v2: Use relational macros instead of hand-rolled macros Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 213319	2014-07-17 22:07:27 +00:00
Aaron Watry	803a992f04	relational: Implement isgreater v2: Use relational macros instead of hand-rolled macros Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 213318	2014-07-17 22:07:19 +00:00
Aaron Watry	d9ee196eab	relational: Implement signbit v2 Changes: - use __builtin_signbit instead of shifting by hand - significantly improve vector shuffling - Works correctly now for signbit(float16) on radeonsi Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 211696	2014-06-25 13:29:23 +00:00
Jeroen Ketema	42df5d2a8f	Add exp10 Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 211680	2014-06-25 10:06:35 +00:00
Jeroen Ketema	09516fa27d	Add pown Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 211211	2014-06-18 19:42:23 +00:00
Aaron Watry	6af2969a61	math: Implement mix builtin Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 211047	2014-06-16 19:53:59 +00:00
Aaron Watry	f7f79d2a94	relational: Add isequal(floatN) builtin Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 211046	2014-06-16 19:53:57 +00:00
Aaron Watry	e167db9238	Add all(igentype) builtin Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 211045	2014-06-16 19:53:54 +00:00

183 Commits