The implementation uses r600-specific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
The generated LLVM IR is mostly identical. The char/uchar case is a bit worse.
reviewer: Tom Stellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314701
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.
Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2
Tested on GCN 1.0 (Pitcairn)
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
v2: add vload(half) as well
make helpers amdgpu specific (NVPTX uses different private AS numbering)
use clang builtin on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input
v2: Add half support to shuffle2
Move shuffle2 to misc/
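
For readers unfamiliar with the built-in, the selection rule that shuffle2 follows in the OpenCL specification can be sketched in plain C for the 4-element case. The function name and the array-based signature are hypothetical and chosen for illustration only; this is not the libclc implementation.

```c
/* Hypothetical host-side sketch of shuffle2 semantics for 4-element vectors.
 * Elements are numbered 0..3 in x and 4..7 in y. For 4-element sources only
 * the 3 least significant bits of each mask element are considered. */
static void shuffle2_sketch(const float x[4], const float y[4],
                            const unsigned mask[4], float out[4])
{
    for (int i = 0; i < 4; ++i) {
        unsigned idx = mask[i] & 7u;          /* keep the 3 relevant bits */
        out[i] = (idx < 4u) ? x[idx] : y[idx - 4u];
    }
}
```

A mask element of 11 therefore selects the same element as 3, because the upper bits are ignored.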
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in
v2: Add half-precision support to shuffle when available.
Move to misc/ and add section 6.12.12 to clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
also consolidate macros into one file, and rename to clcmacros.h
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309358
Trivially define native_tan as a redirect to tan.
If there are any targets with a native implementation, we can deal with it later.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
llvm-svn: 295920
Ported from the amd-builtins branch.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
Ported from the amd-builtins branch.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
Just use lgamma_r and ignore the value returned in the second argument
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
This macro is currently unused, but I plan to use it shortly.
The previous form did casts of pointers without an address space, which
doesn't work so well for CL 1.x.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281563
clang (since r280553) allows pointer casts in function overloads,
so we need to disambiguate the second argument.
clang might be smarter about overloads in the future
(see https://reviews.llvm.org/D24113), but let's be safe in libclc anyway.
llvm-svn: 280871
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."
OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."
The 1.0 version is stricter, so use that one.
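
A minimal sketch of the quoted rule, written in plain C for illustration. The name min_sketch is hypothetical and is not the libclc symbol; the body only mirrors the "Returns y if y < x, otherwise it returns x" wording.

```c
/* Hypothetical helper mirroring the spec wording quoted above:
 * returns y if y < x, otherwise returns x. */
static float min_sketch(float x, float y)
{
    return (y < x) ? y : x;
}
```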
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.
v2: split to a separate patch
v3: Switch R600 to use implicitarg.ptr
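
How the offset enters the result can be sketched in plain C using the standard OpenCL relation between group, local and global IDs. The struct and its field names are hypothetical stand-ins for the values the real built-ins read from the hardware or from implicit kernel arguments; this is not the libclc code.

```c
#include <stddef.h>

/* Hypothetical per-dimension work-item state, for illustration only. */
struct work_item_dim {
    size_t group_id;       /* as returned by get_group_id(dim)      */
    size_t local_size;     /* as returned by get_local_size(dim)    */
    size_t local_id;       /* as returned by get_local_id(dim)      */
    size_t global_offset;  /* as returned by get_global_offset(dim) */
};

/* Global ID including the global offset. */
static size_t global_id_sketch(const struct work_item_dim *d)
{
    return d->group_id * d->local_size + d->local_id + d->global_offset;
}
```

With a zero offset this reduces to the older formula without the offset term.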
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
Fixes fdim piglit on Turks
v2: use CL fmax instead of __builtin
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in the float function body that
requires subnormals.
reviewers: jvesely
Patch by: Vedran Miletić <rivanvx@gmail.com>
llvm-svn: 268766
Based on the amd-builtins version, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.
Passes piglit (float) tests on pitcairn.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.
This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.
v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions
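
To illustrate what the two macros represent, here is a plain C sketch of a float ilogb that returns them for the special inputs. The macro values (INT_MIN and INT_MAX) and all names are assumptions made for this sketch; the authoritative definitions are the ones in the libclc headers, and the function body is not the ported implementation.

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, for illustration only. */
#define SKETCH_FP_ILOGB0   INT_MIN
#define SKETCH_FP_ILOGBNAN INT_MAX

/* Hypothetical bit-level ilogb for IEEE-754 single precision. */
static int ilogb_sketch(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret without UB */
    uint32_t ax = bits & 0x7fffffffu;        /* clear the sign bit */

    if (ax == 0u)
        return SKETCH_FP_ILOGB0;             /* +0 or -0 */
    if (ax > 0x7f800000u)
        return SKETCH_FP_ILOGBNAN;           /* NaN */
    if (ax == 0x7f800000u)
        return INT_MAX;                      /* infinity, following C's ilogb */

    int e = (int)(ax >> 23);
    if (e != 0)
        return e - 127;                      /* normal: unbias the exponent */

    /* Subnormal: value is mantissa * 2^-149, so locate the top set bit. */
    int msb = 0;
    while (ax >> (msb + 1))
        ++msb;
    return msb - 149;
}
```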
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 261639