Commit Graph

253 Commits

Author SHA1 Message Date
Jan Vesely 1fa727d615 Rework atomic ops to use clang builtins rather than llvm asm
reviewer: Aaron Watry

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
2017-09-25 16:07:34 +00:00
Jan Vesely c9bbbe2403 Implement cl_khr_int64_extended_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
2017-09-20 20:42:19 +00:00
Jan Vesely 1c81f4b0e3 Implement cl_khr_int64_base_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313810
2017-09-20 20:42:14 +00:00
Aaron Watry e62f5fa64d Add native_recip(x) as ((1)/(x))
Signed-off-by: Aaron Watry <awatry@gmail.com>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313107
2017-09-13 01:40:25 +00:00
Aaron Watry 415a60f303 integer: Add popcount implementation using ctpop intrinsic
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.

Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2

Tested-by on GCN 1.0 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
2017-09-09 02:23:54 +00:00
Jan Vesely 285d2fb85c Implement vload_half{,n} and vload(half)
v2: add vload(half) as well
    make helpers amdgpu specific (NVPTX uses different private AS numbering)
    use clang builtin on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
2017-09-08 23:59:00 +00:00
Jan Vesely 661ac03a1b vstore: Cleanup and add vstore(half)
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
2017-09-08 23:58:57 +00:00
Jan Vesely 1796d590c1 Fixup clc.h comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312491
2017-09-04 15:52:03 +00:00
Aaron Watry 0bf96b1712 relational: Implement shuffle2 builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input

v2: Add half support to shuffle2
    Move shuffle2 to misc/

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
2017-09-02 02:23:28 +00:00
Aaron Watry 880f15dae6 relational: Implement shuffle builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
2017-09-02 02:23:26 +00:00
Aaron Watry da8dfefd1c Add halfN types and enable fp16 when generating builtin declarations
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
2017-09-02 02:23:16 +00:00
Jan Vesely 1977092dc3 amdgcn: Implement {read_,write_,}mem_fence builtin
v2: add more detailed comment about waitcnt instruction

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
2017-08-16 17:08:56 +00:00
Jan Vesely 09f0a560e1 add __kernel_exec macros
also consolidate macros into one file, and rename to clcmacros.h

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 309358
2017-07-28 03:39:03 +00:00
Jan Vesely 2f2a3bc0dc generic: add missing get_work_dim include
Fixes few piglits since clang r304193

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 304556
2017-06-02 15:58:35 +00:00
Jan Vesely 9f7172965c math: Implement sinh function
mostly copied form amd_builtins

llvm-svn: 296233
2017-02-25 02:46:53 +00:00
Aaron Watry dfec3c8e95 math: Add native_tan as wrapper to tan
Trivially define native_tan as a redirect to tan.

If there are any targets with a native implementation, we can deal with it later.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
llvm-svn: 295920
2017-02-23 01:46:57 +00:00
Jeroen Ketema ed98e8d099 Add the correct prefixes to the cl_khr_fp64 pragma
llvm-svn: 294915
2017-02-12 21:31:41 +00:00
Matt Arsenault 9df2b9781c math: Add native_rsqrt builtin function
Trivial define to rsqrt.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 294608
2017-02-09 18:39:26 +00:00
Aaron Watry c606efabb7 math: Add logb builtin
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292335
2017-01-18 03:14:10 +00:00
Aaron Watry 900bd7eb7f math: Add expm1 builtin function
Ported from the amd-builtins branch.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
CC: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 292334
2017-01-18 03:13:37 +00:00
Jan Vesely 0a5aac3fc4 Provide vstore_half helper to workaround clc restrictions
clang won't accept half precision loads and stores without cl_khr_fp16 since r281904

llvm-svn: 282106
2016-09-21 20:15:55 +00:00
Aaron Watry af569547fa math: Implement tgamma
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281566
2016-09-15 00:17:34 +00:00
Aaron Watry e9009cdd21 math: Implement lgamma
Just use lgamma_r and ignore the value returned in the second argument

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281565
2016-09-15 00:17:31 +00:00
Aaron Watry 0ab07e1bde math: Implement lgamma_r
Ported from the amd-builtins branch, which is itself based on the
Sun Microsystems implementation.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281564
2016-09-15 00:17:28 +00:00
Aaron Watry f969413a82 Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE
This macro is currently unused, but I plan to use it shortly.

The previous form did casts of pointers without an address space, which
doesn't work so well for CL 1.x.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 281563
2016-09-15 00:17:22 +00:00
Matt Arsenault fbfd828d2a Replace nextafter implementation
This one passes conformance.

llvm-svn: 280961
2016-09-08 16:37:56 +00:00
Jan Vesely eade17271a Avoid ambiguity in calling atom_add functions.
clang (since r280553) allows pointer casts in function overloads,
so we need to disambiguate the second argument.

clang might be smarter about overloads in the future
see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway.

llvm-svn: 280871
2016-09-07 22:11:02 +00:00
Jan Vesely ad8672727c Implement vstore_half{,n}
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 278962
2016-08-17 20:02:11 +00:00
Jan Vesely 4c59714a52 Make min follow the OCL 1.0 specs
OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x *and* y
are infinite or NaN, the return values are undefined."

OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x *or* y
are infinite or NaN, the return values are undefined."

The 1.0 version is stricter so use that one.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276704
2016-07-25 22:36:22 +00:00
Tom Stellard d835b3f1af Implement cbrt builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276497
2016-07-22 23:45:15 +00:00
Tom Stellard 9cb070f96a Implement cosh builtin
This implementation was ported from the AMD builtin library
and has been tested with piglit, OpenCV, and the ocl conformance tests.

llvm-svn: 276496
2016-07-22 23:45:13 +00:00
Tom Stellard ff13926a60 geometric/floatn.inc: Add vec8 and vec16 types
llvm-svn: 276495
2016-07-22 23:45:11 +00:00
Jan Vesely a82e080b57 AMDGPU: Implement get_global_offset builtin
Also fix get_global_id to consider offset
No idea how to add this for ptx, so they are stuck with the old get_global_id
implementation.

v2: split to a separate patch

v3: Switch R600 to use implictarg.ptr

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 276443
2016-07-22 17:24:24 +00:00
Jan Vesely 3317f253de 64 bit integers are legal in full profile without an extension
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 273042
2016-06-17 20:30:41 +00:00
Jan Vesely 973c1fa5f5 math: Use single precision fmax in sp path
Fixes fdim piglit on Turks

v2: use CL fmax instead of __builtin

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom.stellard@amd.com>
llvm-svn: 269807
2016-05-17 19:44:01 +00:00
Jan Vesely c374cb76f4 math: Add erf ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste,
aside from the removed (optional) code in float function body that
requires subnormals.

reviewers: jvesely

Patch by: Vedran Miletić <rivanvx@gmail.com>

llvm-svn: 268766
2016-05-06 18:02:30 +00:00
Aaron Watry 55a8e0fd6d math: Add fdim implementation
Based on the amd-builtin, but explicitly vectorized for all sizes (not just
float4), and includes a vectorized double implementation.

Passes piglit (float) tests on pitcairn.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 268708
2016-05-06 03:34:45 +00:00
Aaron Watry 09f3c99a86 math: Fix ilogb(double) return type
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 261714
2016-02-24 00:52:15 +00:00
Aaron Watry d6d0454231 math: Add ilogb ported from amd-builtins
The scalar float/double function bodies are a direct copy/paste
with usage of the CLC wrappers to vectorize them.

This commit also adds in the FP_ILOGB0 and FP_ILOGBNAN macros which are
equal to the results of ilogb(0.0f) and ilogb(float nan) respectively.

v2: Add FP_ILOGB0 and FP_ILOGBNAN definitions

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
v1 Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

llvm-svn: 261639
2016-02-23 14:43:09 +00:00
Jan Vesely 7fbb96b907 math: Fix log2 vectorization on non-fp64 hw
reviewer: tstellard
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260301
2016-02-09 22:17:42 +00:00
Aaron Watry 8872800eff math: Add frexp ported from amd-builtins
The float implementation is almost a direct port from the amd-builtins,
but instead of just having a scalar and float4 implementation, it has
a scalar and arbitrary width vector implementation.

The double scalar is also a direct port from AMD's builtin release.

The double vector implementation copies the logic in the float vector
implementation using the values from the double scalar version.

Both have been tested in piglit using tests sent to that project's
mailing list.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 260114
2016-02-08 17:07:21 +00:00
Tom Stellard 37d19875fa Implement modf math builtin
V2: use the reference implementation as suggested by Matt Arsenault

Patch By: Pavel Ondračka

llvm-svn: 258933
2016-01-27 14:52:10 +00:00
Tom Stellard a249f50970 Add _CLC_V_V_VP_VECTORIZE macro
Patch by: Pavel Ondračka

llvm-svn: 258932
2016-01-27 14:52:07 +00:00
Aaron Watry 23faa5a1f9 integer: remove explicit casts from _MIN definitions
The spec says (section 6.12.3, CL version 1.2):
  The macro names given in the following list must use the values
  specified. The values shall all be constant expressions suitable
  for use in #if preprocessing directives.

This commit addresses the second part of that statement.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
CC: Moritz Pflanzer <moritz.pflanzer14@imperial.ac.uk>
CC: Serge Martin <edb+libclc@sigluy.net>
llvm-svn: 249445
2015-10-06 19:12:12 +00:00
Niels Ole Salscheider f51df5ba8c Implement tanh builtin
This is a port from the AMD builtin library.

llvm-svn: 248780
2015-09-29 06:39:09 +00:00
Tom Stellard e2bab44ca9 Add sampler defines.
Patch by: Zoltan Gilian

llvm-svn: 248163
2015-09-21 14:59:58 +00:00
Tom Stellard 50dfd44577 Add image attribute defines.
Patch by: Zoltan Gilian

llvm-svn: 248162
2015-09-21 14:59:57 +00:00
Tom Stellard a59fd49ba4 r600: Add image writing builtins.
Patch by: Zoltan Gilian

llvm-svn: 248161
2015-09-21 14:59:56 +00:00
Tom Stellard 9a7d4a940f r600: Add image reading builtins.
Patch by: Zoltan Gilian

llvm-svn: 248160
2015-09-21 14:59:54 +00:00
Tom Stellard ccc0ec1ddb Add image attribute getter builtins
Added get_image_* OpenCL builtins to the headers.
Added implementation to the r600 target.

Patch by: Zoltan Gilian

llvm-svn: 248159
2015-09-21 14:47:53 +00:00