Tom Stellard
da2969fca7
Implement atanh builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 234324
2015-04-07 16:20:22 +00:00
Tom Stellard
ca4d382e11
Implement acosh builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 234323
2015-04-07 16:20:20 +00:00
Tom Stellard
03dc366e79
Implement atanpi builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233928
2015-04-02 17:01:58 +00:00
Tom Stellard
eea0997566
Implement asinpi builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233927
2015-04-02 17:01:56 +00:00
Tom Stellard
2b4ef39b2f
Implement asinh builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233926
2015-04-02 17:01:54 +00:00
Tom Stellard
084124a8fa
Implement acospi builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233925
2015-04-02 17:01:52 +00:00
Tom Stellard
1ded220cc0
Implement fmax using __builtin_fmax
...
This ensures correct handling of NaN.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233713
2015-03-31 16:59:23 +00:00
Tom Stellard
310da7bfd2
Implement fmin using __builtin_fmin
...
This ensures correct handling of NaN.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 233712
2015-03-31 16:59:21 +00:00
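Note on the fmax/fmin commits above: the log does not say what the previous implementation looked like, so the following is only a sketch of why NaN needs special care. It assumes a comparison-based maximum as the "naive" version and contrasts it with a NaN-aware one; both helper names are hypothetical and neither is the libclc code, which calls __builtin_fmax instead.

```c
#include <math.h>   /* NAN macro only; no libm functions are called */

/* Naive maximum: every ordered comparison with NaN is false, so this
 * returns b (that is, NaN) whenever b is NaN. */
static float naive_max(float a, float b) {
    return a > b ? a : b;
}

/* NaN-aware maximum (hypothetical helper): if exactly one operand is
 * NaN, return the other one, which is the behaviour fmax requires. */
static float nan_aware_max(float a, float b) {
    if (a != a) return b;   /* a is NaN */
    if (b != b) return a;   /* b is NaN */
    return a > b ? a : b;
}
```

With these definitions, naive_max(1.0f, NAN) yields NaN while nan_aware_max(1.0f, NAN) yields 1.0f. The same reasoning applies to fmin with the comparison reversed.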
Tom Stellard
bd4da7a0ef
Implement fast_distance builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 232978
2015-03-23 18:10:04 +00:00
Tom Stellard
cb80e14f2c
Implement fast_length builtin
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 232977
2015-03-23 18:10:02 +00:00
Tom Stellard
d2a1559846
Implement half_sqrt builtin v2
...
This is a generic implementation which just calls sqrt. Targets should
override this if they want a faster implementation.
v2:
- Alphabetize SOURCES
llvm-svn: 232965
2015-03-23 17:01:37 +00:00
Tom Stellard
551a669e80
Implement distance builtin v2
...
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Remove unnecessary copyright.
llvm-svn: 232964
2015-03-23 17:01:35 +00:00
Tom Stellard
cb1c0d7939
Fix implementation of length builtin v2
...
v2:
- Move common code into a macro
- Use the same constant for all vector types.
llvm-svn: 232963
2015-03-23 17:01:33 +00:00
Tom Stellard
8d3a4e3af2
Add __clc_ prefix to functions in sincos_helpers.cl
...
This will help avoid naming conflicts with functions defined in
kernels linking with libclc.
llvm-svn: 232960
2015-03-23 16:20:24 +00:00
Aaron Watry
2cf4d5f312
math: Implement erfc
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 232674
2015-03-18 21:52:07 +00:00
Tom Stellard
adfd96f742
Fix bitselect for float/double types v2
...
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Use vector operations rather than splitting vectors into scalar
components.
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
2015-03-05 15:31:05 +00:00
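Note on the bitselect commit above: a minimal scalar sketch of the reinterpretation it describes, written in plain C rather than OpenCL C. The function name is hypothetical, and memcpy stands in for the as_uint()/as_float() reinterpretation that OpenCL C provides; the actual commit operates on whole vectors.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar helper: for each bit, take the bit from a where
 * the mask c has a 0 and from b where the mask c has a 1. */
static float bitselect_f32(float a, float b, float c) {
    uint32_t ua, ub, uc, ur;
    float r;

    /* Reinterpret the float bit patterns as unsigned integers,
     * because bitwise operators are not defined on floats. */
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    memcpy(&uc, &c, sizeof uc);

    ur = (ua & ~uc) | (ub & uc);

    /* Reinterpret the result back as a float. */
    memcpy(&r, &ur, sizeof r);
    return r;
}
```

For example, passing -0.0f as the mask selects only the sign bit from b, so bitselect_f32(1.5f, -2.0f, -0.0f) gives -1.5f.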
Aaron Watry
1314630ec3
Move mix from math to common
...
It has been part of the common functions since 1.0
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
2015-03-03 21:25:08 +00:00
Tom Stellard
9d0d374c5b
Implement step builtin
...
This has been tested with piglit, OpenCV, and the ocl conformance tests.
llvm-svn: 230970
2015-03-02 15:29:41 +00:00
Tom Stellard
1f28b14bba
Implement smoothstep builtin v2
...
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Fix typo in smoothstep.h
llvm-svn: 230969
2015-03-02 15:29:39 +00:00
Tom Stellard
f5e5b0171d
Implement radians builtin v2
...
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Move to the common/ directory
llvm-svn: 230968
2015-03-02 15:29:37 +00:00
Tom Stellard
8336b3a604
Implement degrees builtin v2
...
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Move to the common/ directory
llvm-svn: 230967
2015-03-02 15:29:35 +00:00
Aaron Watry
f89bcca0b7
libclc/math: Add cospi
...
Ported from the libclc/amd-builtins branch
v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
Add cospi(double) implementation instead of using llvm.cos
Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely
51702e6e75
Implement log10
...
v2: Use constant and multiplication instead of division
v3: Use hex constants
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard
bf9f76fbe0
Implement log1p builtin
...
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely
8f64c3d842
Implement fmod
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard
081e778d22
Implement async_work_group_copy builtin v3
...
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
v3:
- Fix possible race condition by splitting the copy among multiple
work items.
llvm-svn: 219008
2014-10-03 19:49:39 +00:00
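Note on the async_work_group_copy commit above: the log only says that v3 splits the copy among multiple work items. One common way to do that is a strided loop in which each work item starts at its local id and steps by the group size; the sketch below assumes that scheme. The function and parameter names are hypothetical, and in a real kernel local_id and group_size would come from get_local_id() and get_local_size().

```c
#include <stddef.h>

/* One work item's share of a group-wide copy (hypothetical helper).
 * Work item `local_id` copies elements local_id, local_id + group_size,
 * local_id + 2 * group_size, ... so no two work items write the same
 * element. */
static void copy_slice(int *dst, const int *src, size_t num_elements,
                       size_t local_id, size_t group_size) {
    for (size_t i = local_id; i < num_elements; i += group_size)
        dst[i] = src[i];
}
```

Calling copy_slice once for every local id from 0 to group_size - 1 covers each element exactly once. The data is only safe to read after all work items have finished, which fits the wait_group_events commit below calling barrier().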
Tom Stellard
ed5bbfdb1b
Implement async_work_group_strided_copy builtin v2
...
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard
b5064f79ef
Implement wait_group_events builtin v2
...
This is a simple default implementation which just calls barrier().
v2:
- Only call barrier() once.
llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Aaron Watry
0d976ba497
atomic: Add generic atom[ic]_cmpxchg
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry
025d79ad6c
atomic: Implement generic atom[ic]_xchg
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry
7cfa12c2a5
atomic: Add generic atomic_min implementation
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry
3f0a1a4c27
atomic: Add generic atom[ic]_xor
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry
31e67d1cff
atomic: Add atom[ic]_or
...
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00
Aaron Watry
cc68405761
atomics: Add generic atom[ic]_and
...
Not used yet.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
2014-09-16 22:34:28 +00:00
Aaron Watry
49614fbfd9
atomic: Add generic implementation of atom[ic]_max
...
Not used yet...
v2: Correct int/uint behavior
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
2014-09-16 22:34:24 +00:00
Aaron Watry
c9b88d32be
atomic: define extension functions for existing atomic implementations
...
We were missing the local versions of the atom_* before
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
2014-09-16 22:34:21 +00:00
Aaron Watry
947bdd059a
math: Add tan implementation
...
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))
An alternative is:
tan(x) = sin(x) / cos(x)
Which produces more verbose bitcode and longer assembly.
Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
2014-09-10 15:43:35 +00:00
Aaron Watry
951ab64d19
math: Add asin implementation
...
asin(x) = atan2(x, sqrt( 1-x^2 ))
alternatively:
asin(x) = PI/2 - acos(x)
Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
2014-09-10 15:43:32 +00:00
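Note on the asin commit above: the atan2 identity it uses can be checked in a few lines. This is a short derivation under the usual principal-value conventions, not something taken from the commit itself.

```latex
Let $\theta = \arcsin(x)$ with $|x| < 1$, so $\theta \in (-\pi/2, \pi/2)$
and $\sin\theta = x$. On that interval $\cos\theta > 0$, hence
\[
  \cos\theta = \sqrt{1 - \sin^2\theta} = \sqrt{1 - x^2}.
\]
Because $\operatorname{atan2}(\sin\theta, \cos\theta) = \theta$ whenever
$\cos\theta > 0$,
\[
  \arcsin(x) = \operatorname{atan2}\!\left(x, \sqrt{1 - x^2}\right).
\]
At the endpoints $x = \pm 1$ the second argument is $0$ and
$\operatorname{atan2}(\pm 1, 0) = \pm\pi/2$, which matches $\arcsin(\pm 1)$.
```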
Aaron Watry
268beab921
math: Add acos implementation
...
Passes the tests that were submitted to the piglit list
Tested on R600 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
2014-09-10 15:43:29 +00:00
Jan Vesely
05a60b7ac3
add isordered builtin
...
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217247
2014-09-05 13:59:15 +00:00
Jan Vesely
63486c1f0e
add isunordered builtin
...
v2: remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217246
2014-09-05 13:59:13 +00:00
Jan Vesely
41a0c491de
add islessgreater builtin
...
v2: remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217245
2014-09-05 13:59:11 +00:00
Jan Vesely
369e20353c
add isnormal builtin
...
v2: simplify and remove isnan leftovers
remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217244
2014-09-05 13:59:09 +00:00
Jan Vesely
a5a3b023b4
add isfinite builtin
...
v2: simplify and remove isinf leftovers
remove trailing newline
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217243
2014-09-05 13:59:06 +00:00
Tom Stellard
7a9e2c6879
Implement isinf builtin
...
llvm-svn: 217046
2014-09-03 15:55:40 +00:00
Tom Stellard
d8a73abfc3
Fix implementation of copysign
...
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().
Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
2014-09-03 15:55:38 +00:00
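Note on the copysign commit above: the problem it describes is that the float overload went through a double-typed builtin. The sketch below shows one way to keep the operation at float width by combining bit patterns directly. This bit-level approach is a substitution chosen for illustration, and the function name is hypothetical; the log does not say how the fixed libclc version is written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical float-width copysign: magnitude of x, sign of y. */
static float copysign_f32(float x, float y) {
    uint32_t ux, uy;
    float r;

    memcpy(&ux, &x, sizeof ux);
    memcpy(&uy, &y, sizeof uy);

    /* Keep the 31 magnitude bits of x and the sign bit of y. */
    ux = (ux & 0x7fffffffu) | (uy & 0x80000000u);

    memcpy(&r, &ux, sizeof r);
    return r;
}
```

Because only the sign bit of y is read, a negative zero counts as negative: copysign_f32(3.0f, -0.0f) gives -3.0f.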
Jan Vesely
ef513d392b
Implement generic mad_sat
...
v2: Fix trailing whitespace
Fix signed long overflow
improve comment
v3: fix typo
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 216923
2014-09-02 17:55:02 +00:00
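Note on the mad_sat commit above: mad_sat(a, b, c) computes a * b + c and clamps the result to the range of the operand type instead of wrapping. For 32-bit operands the simplest approach is to widen to 64 bits, as in the sketch below; the function name is hypothetical and this is not the libclc code. The same trick is not available for 64-bit operands because there is no wider integer type to widen to, which is presumably why the changelog calls out a signed long overflow fix.

```c
#include <stdint.h>

/* Hypothetical 32-bit saturating multiply-add. The 64-bit intermediate
 * cannot overflow: |a * b| is at most 2^62 and |c| is at most 2^31. */
static int32_t mad_sat_i32(int32_t a, int32_t b, int32_t c) {
    int64_t wide = (int64_t)a * (int64_t)b + (int64_t)c;

    if (wide > INT32_MAX) return INT32_MAX;   /* clamp high */
    if (wide < INT32_MIN) return INT32_MIN;   /* clamp low  */
    return (int32_t)wide;
}
```

For example, mad_sat_i32(INT32_MAX, 2, 0) returns INT32_MAX rather than a wrapped value.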
Aaron Watry
9447097636
Revert "Implement generic mad_sat"
...
This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.
I didn't mean to commit this... Jan has a v3 incoming
llvm-svn: 216322
2014-08-23 14:06:01 +00:00
Aaron Watry
6bfac7ae69
Implement generic mad_sat
...
v2: Fix trailing whitespace
Fix signed long overflow
improve comment
Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
llvm-svn: 216320
2014-08-23 14:04:33 +00:00
Tom Stellard
2ad4243bf7
Implement prefetch builtin
...
The default implementation is a no-op. Targets should override this
with their own implementations.
llvm-svn: 216127
2014-08-20 21:23:03 +00:00