This is a generic implementation which just calls sqrt. Targets should
override this if they want a faster implementation.
v2:
- Alphabetize SOURCES
llvm-svn: 232965
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Remove unnecessary copyright.
llvm-svn: 232964
We need to reinterpret float/double types as uint/ulong in order to
perform the bitwise operations.
This has been tested with piglit, OpenCV, and the ocl conformance tests.
v2:
- Use vector operations rather than splitting vectors into scalar
components.
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 231373
It has been part of the common functions since 1.0
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 231137
Ported from the libclc/amd-builtins branch
v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
Add cospi(double) implementation instead of using llvm.cos
Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
Some functions are implemented using hand-written LLVM IR, and
LLVM assembly format is allowed to change between versions, so we
should require a specific version of LLVM.
llvm-svn: 225041
Including a standard or system header isn't allowed in OpenCL.
The type "size_t" needs to be explicitely defined now.
v2: Use __SIZE_TYPE__ instead of unsigned int.
v3: Define ptrdiff_t and NULL.
Patch-by: Jean-Sébastien Pédron
Reviewed-by: Jeroen Ketema
Reviewed-by: Jan Vesely
llvm-svn: 222235
v2: Fix function declaration
Add range metadata to r600 implementation
v3: change prefix to AMDGPU
Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
This is a simple implementation which just copies data synchronously.
v2:
- Use size_t.
v3:
- Fix possible race condition by splitting the copy among multiple
work items.
llvm-svn: 219008
We were missing the local versions of the atom_* before
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))
An alternative is:
tan(x) = sin(x) / cos(x)
Which produces more verbose bitcode and longer assembly.
Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
Passes the tests that were submitted to the piglit list
Tested on R600 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509