116 Commits

Author SHA1 Message Date
Tom Stellard 9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Jan Vesely c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry 55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
Aaron Watry d6d0454231 math: Add ilogb ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.

This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.

v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 261639
2016-02-23 14:43:09 +00:00
Aaron Watry 8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard 37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Niels Ole Salscheider f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00
Tom Stellard 7a09e88b6e Fix double implementation of log
We need to use M_LOG2E instead of M_LOG2E_F.

llvm-svn: 243132
2015-07-24 18:07:14 +00:00
Tom Stellard 44b6117dfd Implement accurate log2 function
Use the implementation ported from the AMD builtin library rather
than LLVM intrinsics.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 243131
2015-07-24 18:07:12 +00:00
Tom Stellard f01ffa9ddc Use llvm intrinsics for native_log and native_log2
llvm-svn: 243130
2015-07-24 18:07:06 +00:00
Tom Stellard 2ef5ec6b2b Fix implementation of sqrt v2
Passing values less than 0 to the llvm.sqrt() intrinsic results in
undefined behavior, so we need to check the input and return NaN if
it is less than 0.

v2:
  - Fix build failures.

llvm-svn: 241906
2015-07-10 13:37:07 +00:00
Tom Stellard d538fdc217 Implement exp2 using OpenCL C rather than using an intrinsic
Not all targets support the intrinsic, so it's better to have a
generic implementation which does not use it.

This exp2 implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237228
2015-05-13 03:55:07 +00:00
Tom Stellard 37406a209c Implement atan2pi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 237138
2015-05-12 14:48:26 +00:00
Tom Stellard 17ec3a51c3 Implement fast_normalize builtin v4
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove f suffix from constant in double implementations.
  - Consolidate implementations using the .cl/.inc approach.

v3:
 - Use __CLC_FPSIZE instead of __CLC_FP{32,64}

v4 (Jan Vesely):
 - Limit to single precision.

llvm-svn: 236920
2015-05-09 00:04:12 +00:00
Tom Stellard 2ddfa0c5b2 Implement half_rsqrt builtin v3
This is a generic implementation which just calls rsqrt.
Targets should override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

v3 (Jan Vesely):
  Limit to single precision types.

llvm-svn: 236915
2015-05-08 23:28:44 +00:00
Jan Vesely 90e7ad589e Move ldexp soft implementation to a separate file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236648
2015-05-06 21:59:29 +00:00
Jan Vesely bc81ebefb7 Implement sinpi builtin
Ported from AMD builtin library, passes piglit on Turks.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 236647
2015-05-06 21:59:26 +00:00
Tom Stellard 2ca909d824 math: Add ldexp implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>

Tom Stellard:
  - Add denormal handling.
  - Share vectorization code with r600 implementation.

Patch By: Aaron Watry

llvm-svn: 236639
2015-05-06 20:53:32 +00:00
Tom Stellard 9447de37a9 Implement fract builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 235620
2015-04-23 18:50:14 +00:00
Tom Stellard d9ca1f1596 configure: Add --enable-runtime-subnormal option
This makes it possible for runtime implementations to disable
subnormal handling at runtime.

When this flag is enabled, decisions about how to handle subnormals
in the library will be controlled by an external variable called
__CLC_SUBNORMAL_DISABLE.

Function implementations should use these new helpers for querying subnormal
support:
__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

In order for the library to link correctly with this feature,
users will be required to either:

1. Insert this variable into the module (if using the LLVM/Clang C++/C APIs).

2. Pass either subnormal_disable.bc or subnormal_use_default.bc to the
linker.  These files are distributed with libclc and installed to
$(installdir).  e.g.:

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_disable.bc

or

llvm-link -o kernel-out.bc kernel.bc builtins-nosubnormal.bc subnormal_use_default.bc

If you do not supply the --enable-runtime-subnormal option, the library
behaves the same as it did before this commit.

In addition to these changes, the patch adds helper functions that
should be used when implementing library functions that need
special handling for denormals:

__clc_fp16_subnormals_supported();
__clc_fp32_subnormals_supported();
__clc_fp64_subnormals_supported();

llvm-svn: 235329
2015-04-20 18:49:50 +00:00
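To make the runtime-subnormal mechanism described above more concrete, here is a minimal host-side C sketch of how a library function could consult such a helper. Everything prefixed with `sketch_` is hypothetical: the variable stands in for __CLC_SUBNORMAL_DISABLE, the helper for __clc_fp32_subnormals_supported(), and the flush-to-zero policy is only one assumed use, not something the commit message specifies.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for the external variable named in the commit
 * message (__CLC_SUBNORMAL_DISABLE). In the real library it would be
 * supplied at link time rather than defined here. */
bool sketch_subnormal_disable = false;

/* Hypothetical counterpart of __clc_fp32_subnormals_supported(). */
bool sketch_fp32_subnormals_supported(void)
{
    return !sketch_subnormal_disable;
}

/* Assumed example use: flush a subnormal value to a zero of the same sign
 * when subnormals are reported as unsupported. */
float sketch_flush_if_unsupported(float x)
{
    if (!sketch_fp32_subnormals_supported() && fpclassify(x) == FP_SUBNORMAL)
        return copysignf(0.0f, x);
    return x;
}
```

Setting `sketch_subnormal_disable` to true before the call makes a subnormal input such as 1e-40f come back as zero; with the default value it is returned unchanged.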
Tom Stellard da2969fca7 Implement atanh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234324
2015-04-07 16:20:22 +00:00
Tom Stellard ca4d382e11 Implement acosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 234323
2015-04-07 16:20:20 +00:00
Tom Stellard 03dc366e79 Implement atanpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233928
2015-04-02 17:01:58 +00:00
Tom Stellard eea0997566 Implement asinpi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233927
2015-04-02 17:01:56 +00:00
Tom Stellard 2b4ef39b2f Implement asinh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233926
2015-04-02 17:01:54 +00:00
Tom Stellard 084124a8fa Implement acospi builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 233925
2015-04-02 17:01:52 +00:00
Tom Stellard bd4da7a0ef Implement fast_distance builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232978
2015-03-23 18:10:04 +00:00
Tom Stellard cb80e14f2c Implement fast_length builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 232977
2015-03-23 18:10:02 +00:00
Tom Stellard d2a1559846 Implement half_sqrt builtin v2
This is a generic implementation which just calls sqrt.  Targets should
override this if they want a faster implementation.

v2:
  - Alphabetize SOURCES

llvm-svn: 232965
2015-03-23 17:01:37 +00:00
Tom Stellard 551a669e80 Implement distance builtin v2
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Remove unnecessary copyright.

llvm-svn: 232964
2015-03-23 17:01:35 +00:00
Aaron Watry 2cf4d5f312 math: Implement erfc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
2015-03-18 21:52:07 +00:00
Tom Stellard adfd96f742 Fix bitselect for float/double types v2
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.

This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Use vector operations rather than splitting vectors into scalar
    components.

Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
2015-03-05 15:31:05 +00:00
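The reinterpretation described in the bitselect fix above can be sketched in host-side C as follows. This is a scalar illustration only, with hypothetical function names; the patch itself works on OpenCL vector types, whose code is not shown in this log.

```c
#include <stdint.h>
#include <string.h>

/* Integer form: take each bit from b where the mask c is 1, else from a. */
uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Float form: reinterpret the operands as uint32_t, select bitwise, and
 * reinterpret the result back as a float. memcpy is used for the
 * reinterpretation to avoid aliasing problems in C. */
float bitselect_f32(float a, float b, float c)
{
    uint32_t ua, ub, uc, ur;
    float r;

    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    memcpy(&uc, &c, sizeof uc);
    ur = bitselect_u32(ua, ub, uc);
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

For example, using -0.0f as the mask (only the sign bit set) takes the sign from `b` and the remaining bits from `a`.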
Aaron Watry 1314630ec3 Move mix from math to common
It has been part of the common functions since 1.0

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
2015-03-03 21:25:08 +00:00
Tom Stellard 9d0d374c5b Implement step builtin
This has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 230970
2015-03-02 15:29:41 +00:00
Tom Stellard 1f28b14bba Implement smoothstep builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Fix typo in smoothstep.h

llvm-svn: 230969
2015-03-02 15:29:39 +00:00
Tom Stellard f5e5b0171d Implement radians builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230968
2015-03-02 15:29:37 +00:00
Tom Stellard 8336b3a604 Implement degrees builtin v2
This has been tested with piglit, OpenCV, and the ocl conformance tests.

v2:
  - Move to the common/ directory

llvm-svn: 230967
2015-03-02 15:29:35 +00:00
Aaron Watry f89bcca0b7 libclc/math: Add cospi
Ported from the libclc/amd-builtins branch

v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
    Add cospi(double) implementation instead of using llvm.cos

Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely 51702e6e75 Implement log10
v2: Use constant and multiplication instead of division
v3: Use hex constants

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard bf9f76fbe0 Implement log1p builtin
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely 8f64c3d842 Implement fmod
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard 081e778d22 Implement async_work_group_copy builtin v3
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

v3:
  - Fix possible race condition by splitting the copy among multiple
    work items.

llvm-svn: 219008
2014-10-03 19:49:39 +00:00
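The v3 change to async_work_group_copy above splits the copy among multiple work items. One common way to do that is a strided loop, simulated here in host-side C. The partitioning used by the actual patch is not stated in this log, so treat the stride pattern, the function names, and the explicit `local_id` / `local_size` parameters as assumptions.

```c
#include <stddef.h>

/* Work done by a single work item: copy every local_size-th element,
 * starting at its own local_id. In a kernel these two values would be
 * queried from the runtime; here they are passed in explicitly. */
void sketch_copy_slice(float *dst, const float *src, size_t num_elements,
                       size_t local_id, size_t local_size)
{
    for (size_t i = local_id; i < num_elements; i += local_size)
        dst[i] = src[i];
}

/* Host-side simulation: run each work item of the group in turn. Every
 * element is written by exactly one work item, so no two items write
 * the same location. */
void sketch_group_copy(float *dst, const float *src, size_t num_elements,
                       size_t local_size)
{
    for (size_t id = 0; id < local_size; ++id)
        sketch_copy_slice(dst, src, num_elements, id, local_size);
}
```

Calling `sketch_group_copy(dst, src, 10, 4)` copies all ten elements, with work item 0 handling indices 0, 4, 8, work item 1 handling 1, 5, 9, and so on.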
Tom Stellard ed5bbfdb1b Implement async_work_group_strided_copy builtin v2
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard b5064f79ef Implement wait_group_events builtin v2
This is a simple default implementation which just calls barrier().

v2:
  - Only call barrier() once.

llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Aaron Watry 0d976ba497 atomic: Add generic atom[ic]_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry 025d79ad6c atomic: Implement generic atom[ic]_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry 7cfa12c2a5 atomic: Add generic atomic_min implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry 3f0a1a4c27 atomic: Add generic atom[ic]_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry 31e67d1cff atomic: Add atom[ic]_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00