219 Commits

Author SHA1 Message Date
Aaron Watry f89bcca0b7 libclc/math: Add cospi
Ported from the libclc/amd-builtins branch

v2: Rename sincos_f_piby4 to __libclc__sincosf_piby4
    Add cospi(double) implementation instead of using llvm.cos

Notes:
The sincosD_piby4.h file is mostly the same as the builtin implementation
released by AMD. The inline attribute declaration is changed, and M_PI is
used instead of a constant double. Otherwise, the only difference is that
the header explicitly enables the fp64 pragma.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>
llvm-svn: 230641
2015-02-26 15:42:00 +00:00
Jan Vesely 51702e6e75 Implement log10
v2: Use constant and multiplication instead of division
v3: Use hex constants

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 227585
2015-01-30 18:00:34 +00:00
Tom Stellard 0f39721261 Use amdgcn triple for SI+ GPUs
llvm-svn: 225296
2015-01-06 20:42:12 +00:00
Tom Stellard 24ea64e050 r600: get_work_dim: Update metadata syntax for LLVM 3.6
llvm-svn: 225042
2014-12-31 15:27:59 +00:00
Tom Stellard 1d77071e0d Require LLVM 3.6 and bump version to 0.1.0
Some functions are implemented using hand-written LLVM IR, and
LLVM assembly format is allowed to change between versions, so we
should require a specific version of LLVM.

llvm-svn: 225041
2014-12-31 15:27:53 +00:00
Jeroen Ketema c9526139bc Remove wrong semi-colons
Patch by Alastair Donaldson

llvm-svn: 224568
2014-12-19 09:18:23 +00:00
Jeroen Ketema 7a22aebbda Don't include <stddef.h>
Including a standard or system header isn't allowed in OpenCL.

The type "size_t" needs to be explicitly defined now.

v2: Use __SIZE_TYPE__ instead of unsigned int.
v3: Define ptrdiff_t and NULL.

Patch-by: Jean-Sébastien Pédron
Reviewed-by: Jeroen Ketema
Reviewed-by: Jan Vesely
llvm-svn: 222235
2014-11-18 14:19:27 +00:00
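A minimal sketch of the approach described in v2 and v3, assuming a GCC or Clang style compiler that predefines __SIZE_TYPE__ and __PTRDIFF_TYPE__. The clc_ prefixed names are hypothetical and only avoid clashing with a hosted C library; the actual header presumably defines size_t, ptrdiff_t and NULL directly.

```c
/* __SIZE_TYPE__ and __PTRDIFF_TYPE__ are predefined by GCC and Clang,
   so no standard or system header is required. The clc_ names are
   hypothetical stand-ins for size_t, ptrdiff_t and NULL. */
typedef __SIZE_TYPE__ clc_size_t;
typedef __PTRDIFF_TYPE__ clc_ptrdiff_t;
#define CLC_NULL ((void *)0)
```

Using the compiler's own macro keeps the width correct on every target, which a fixed "unsigned int" does not.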
NAKAMURA Takumi 729be14435 Prune CRLF.
llvm-svn: 220678
2014-10-27 12:37:26 +00:00
Jan Vesely ae50c89589 r600: Fix get_work_dim range metadata
Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 220388
2014-10-22 14:32:53 +00:00
Jan Vesely 260827caa2 r600: Use llvm intrinsic to read work dimension information
v2: Fix function declaration
    Add range metadata to r600 implementation
v3: change prefix to AMDGPU

Reviewed-by: Tom Stellard <tom@stellard.net>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 219793
2014-10-15 15:08:06 +00:00
Tom Stellard bf9f76fbe0 Implement log1p builtin
llvm-svn: 219230
2014-10-07 20:22:42 +00:00
Jan Vesely 8f64c3d842 Implement fmod
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 219087
2014-10-05 20:24:52 +00:00
Tom Stellard 081e778d22 Implement async_work_group_copy builtin v3
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

v3:
  - Fix possible race condition by splitting the copy among multiple
    work items.

llvm-svn: 219008
2014-10-03 19:49:39 +00:00
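One way to read the v3 note is that each work item copies a disjoint, strided subset of the elements, so no two work items write the same location. The host-side C sketch below simulates that split. All function names are hypothetical, and local_id and local_size stand in for the OpenCL work-item queries flattened to one dimension.

```c
#include <stddef.h>

/* Portion of the copy done by one work item: elements local_id,
   local_id + local_size, local_id + 2 * local_size, and so on. */
static void copy_slice(int *dst, const int *src, size_t num_elements,
                       size_t local_id, size_t local_size)
{
    for (size_t i = local_id; i < num_elements; i += local_size)
        dst[i] = src[i];
}

/* Host-side stand-in for a work group: runs every work item in turn. */
static void simulate_group_copy(int *dst, const int *src,
                                size_t num_elements, size_t local_size)
{
    for (size_t id = 0; id < local_size; ++id)
        copy_slice(dst, src, num_elements, id, local_size);
}

/* Returns 1 when a 10-element copy split over 4 simulated work items
   reproduces the source exactly. */
static int demo_copy_is_complete(void)
{
    int src[10], dst[10] = {0};
    for (size_t i = 0; i < 10; ++i)
        src[i] = (int)(i * i) + 1;
    simulate_group_copy(dst, src, 10, 4);
    for (size_t i = 0; i < 10; ++i)
        if (dst[i] != src[i])
            return 0;
    return 1;
}
```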
Tom Stellard ed5bbfdb1b Implement async_work_group_strided_copy builtin v2
This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.

llvm-svn: 219007
2014-10-03 19:49:37 +00:00
Tom Stellard b5064f79ef Implement wait_group_events builtin v2
This is a simple default implementation which just calls barrier().

v2:
  - Only call barrier() once.

llvm-svn: 219006
2014-10-03 19:49:34 +00:00
Jeroen Ketema 87d2ca57d7 Remove more redundant semi-colons
llvm-svn: 218039
2014-09-18 09:23:40 +00:00
Aaron Watry db770b5bb5 atomic: undef macros that are included from atomic_decl.inc
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 217958
2014-09-17 15:27:39 +00:00
Jeroen Ketema 839b8a62d9 Remove redundant semi-colons
llvm-svn: 217954
2014-09-17 14:40:48 +00:00
Aaron Watry f4133b8a10 R600: Map Address spaces for atomic_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217925
2014-09-16 22:34:59 +00:00
Aaron Watry e210cae126 R600: Map address spaces for atomic_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217924
2014-09-16 22:34:58 +00:00
Aaron Watry 0545fa3fb0 R600: Map address spaces for atomic_min
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217923
2014-09-16 22:34:56 +00:00
Aaron Watry dd754f4b33 R600: Map address spaces for atomic_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217922
2014-09-16 22:34:55 +00:00
Aaron Watry ea32a57060 R600: Map addr spaces and use atomic_max
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217921
2014-09-16 22:34:53 +00:00
Aaron Watry 5ab82be926 R600: Map address spaces for atomic_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217920
2014-09-16 22:34:52 +00:00
Aaron Watry 348db3c666 R600: Map atomic_and address spaces
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217919
2014-09-16 22:34:51 +00:00
Aaron Watry 0d976ba497 atomic: Add generic atom[ic]_cmpxchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217918
2014-09-16 22:34:49 +00:00
Aaron Watry 025d79ad6c atomic: Implement generic atom[ic]_xchg
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217917
2014-09-16 22:34:45 +00:00
Aaron Watry 7cfa12c2a5 atomic: Add generic atomic_min implementation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217916
2014-09-16 22:34:41 +00:00
Aaron Watry 3f0a1a4c27 atomic: Add generic atom[ic]_xor
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217915
2014-09-16 22:34:36 +00:00
Aaron Watry 31e67d1cff atomic: Add atom[ic]_or
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217914
2014-09-16 22:34:32 +00:00
Aaron Watry cc68405761 atomics: Add generic atom[ic]_and
Not used yet.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217913
2014-09-16 22:34:28 +00:00
Aaron Watry 49614fbfd9 atomic: Add generic implementation of atom[ic]_max
Not used yet...

v2: Correct int/uint behavior

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217912
2014-09-16 22:34:24 +00:00
Aaron Watry c9b88d32be atomic: define extension functions for existing atomic implementations
We were missing the local versions of the atom_* before

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 217911
2014-09-16 22:34:21 +00:00
Aaron Watry 947bdd059a math: Add tan implementation
Uses the algorithm:
tan(x) = sin(x) / sqrt(1-sin^2(x))

An alternative is:
tan(x) = sin(x) / cos(x)

Which produces more verbose bitcode and longer assembly.

Either way, the generated bitcode seems pretty nasty and a more optimized
but still precise-enough solution is welcome.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217511
2014-09-10 15:43:35 +00:00
Aaron Watry 951ab64d19 math: Add asin implementation
asin(x) = atan2(x, sqrt( 1-x^2 ))

alternatively:
asin(x) = PI/2 - acos(x)

Use the atan2 implementation since it produces slightly shorter bitcode and
R600 machine code.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217510
2014-09-10 15:43:32 +00:00
Aaron Watry 268beab921 math: Add acos implementation
Passes the tests that were submitted to the piglit list

Tested on R600 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 217509
2014-09-10 15:43:29 +00:00
Jan Vesely 05a60b7ac3 add isordered builtin
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217247
2014-09-05 13:59:15 +00:00
Jan Vesely 63486c1f0e add isunordered builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217246
2014-09-05 13:59:13 +00:00
Jan Vesely 41a0c491de add islessgreater builtin
v2: remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217245
2014-09-05 13:59:11 +00:00
Jan Vesely 369e20353c add isnormal builtin
v2: simplify and remove isnan leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217244
2014-09-05 13:59:09 +00:00
Jan Vesely a5a3b023b4 add isfinite builtin
v2: simplify and remove isinf leftovers
    remove trailing newline

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217243
2014-09-05 13:59:06 +00:00
Tom Stellard 7a9e2c6879 Implement isinf builtin
llvm-svn: 217046
2014-09-03 15:55:40 +00:00
Tom Stellard d8a73abfc3 Fix implementation of copysign
This was previously implemented with a macro and we were using
__builtin_copysign(), which takes double inputs for the float
version of copysign().

Reviewed-and-Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 217045
2014-09-03 15:55:38 +00:00
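A minimal C sketch of the general idea, assuming the fix amounts to giving each type its own function instead of routing everything through one macro around the double builtin. The wrapper names are hypothetical, and the sketch uses the C math library in place of the compiler builtins.

```c
#include <math.h>

/* Hypothetical wrappers: one function per type, so the float version
   never passes through the double variant. */
static float copysign_f32(float x, float y)
{
    return copysignf(x, y);
}

static double copysign_f64(double x, double y)
{
    return copysign(x, y);
}
```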
Jan Vesely ef513d392b Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 216923
2014-09-02 17:55:02 +00:00
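For illustration, a saturating multiply-add on 32-bit integers can be sketched in C by widening to 64 bits and clamping. The function name is hypothetical and this is not the libclc implementation. A 64-bit variant has no wider type to fall back on and needs a different overflow check.

```c
#include <stdint.h>

/* Hypothetical 32-bit variant: a * b + c clamped to the int32_t range.
   The 64-bit intermediate cannot overflow for any 32-bit inputs. */
static int32_t mad_sat_i32(int32_t a, int32_t b, int32_t c)
{
    int64_t wide = (int64_t)a * (int64_t)b + (int64_t)c;
    if (wide > INT32_MAX)
        return INT32_MAX;
    if (wide < INT32_MIN)
        return INT32_MIN;
    return (int32_t)wide;
}
```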
Jan Vesely 62496142d5 configure: Add rpath to prepare-builtins util
v2: use space instead of '=' to make Mac happy

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 216922
2014-09-02 17:54:59 +00:00
Michel Danzer 7b77ab7b2c Fix build against LLVM SVN >= r216488
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 216654
2014-08-28 06:19:37 +00:00
Michel Danzer a10b492ce3 Fix build against LLVM SVN >= r216393
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 216653
2014-08-28 06:19:33 +00:00
Aaron Watry 9447097636 Revert "Implement generic mad_sat"
This reverts commit cf62eded8b623a1c10d3692d25e5882b7939f564.

I didn't mean to commit this...  Jan has a v3 incoming

llvm-svn: 216322
2014-08-23 14:06:01 +00:00
Aaron Watry a4fdda01b8 Add int3/uint3 to integer-gentype.inc
These were missing and caused mad24/mul24 with int3/uint3 arg type to fail

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 216321
2014-08-23 14:04:36 +00:00
Aaron Watry 6bfac7ae69 Implement generic mad_sat
v2: Fix trailing whitespace
    Fix signed long overflow
    improve comment

Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
llvm-svn: 216320
2014-08-23 14:04:33 +00:00