Vector true is -1, not 1, which means we need to use the relational unary
macro instead of the normal unary builtin one.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213316
relational.h includes relational macros for defining functions which need to
return 1 for scalar true and -1 for vector true.
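To illustrate the convention, here is a minimal host-side C sketch. It is not the libclc macro: the function names are hypothetical and plain arrays stand in for OpenCL vector types.

```c
#include <stddef.h>

/* Hypothetical scalar predicate: true is 1, false is 0. */
static int scalar_isless(float x, float y) {
    return x < y;
}

/* Hypothetical vector predicate: each true lane is -1 (all bits set),
 * each false lane is 0. */
static void vector_isless(const float *x, const float *y, int *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = scalar_isless(x[i], y[i]) ? -1 : 0;
    }
}

/* Small fixed example so one lane can be inspected at a time. */
static int demo_lane(size_t i) {
    const float x[4] = {1.0f, 5.0f, 2.0f, 9.0f};
    const float y[4] = {3.0f, 4.0f, 8.0f, 9.0f};
    int out[4];
    vector_isless(x, y, out, 4);
    return out[i];
}
```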
I believe that this is the only place that this behavior is required, so the
macro is placed at its lowest useful level (same directory as it is used in).
This also creates re-usable unary/binary declaration and floatn includes which
should simplify relational builtin declarations.
Mostly patterned off of include/math/[binary_decl|unary_decl|floatn].inc
but with required changes for relational functions.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 213315
The vector components were mistakenly using () instead of {}, which caused
all but the last vector component to be dropped on the floor.
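The effect can be reproduced in plain C, where a parenthesized list is a comma expression. This is only a sketch with hypothetical function names, and an array stands in for the vector literal.

```c
/* (a, b, c, d) is a comma expression: a, b and c are evaluated and
 * discarded, and only d survives. The (void) casts merely silence
 * unused-value warnings. */
static int paren_value(int a, int b, int c, int d) {
    return ((void)a, (void)b, (void)c, d);
}

/* {a, b, c, d} initializes every component. */
static int brace_sum(int a, int b, int c, int d) {
    int v[4] = {a, b, c, d};
    return v[0] + v[1] + v[2] + v[3];
}
```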
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jeroen Ketema <j.ketema@imperial.ac.uk>
llvm-svn: 211733
v2 Changes:
- use __builtin_signbit instead of shifting by hand
- significantly improve vector shuffling
- Works correctly now for signbit(float16) on radeonsi
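For comparison, a host-side C sketch of the two approaches to the scalar case, assuming a GCC- or clang-compatible compiler and a 32-bit IEEE-754 float. The function names are hypothetical and this is not the libclc code.

```c
#include <stdint.h>
#include <string.h>

/* Shifting by hand: reinterpret the float's bits and extract bit 31. */
static int signbit_by_hand(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)(bits >> 31);
}

/* Using the compiler builtin. It returns some nonzero value when the sign
 * bit is set, so the result is normalized to 0 or 1 here. */
static int signbit_builtin(float x) {
    return __builtin_signbit(x) != 0;
}
```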
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 211696
v2: - use quotes instead of <>
- add include to r600/lib/math/nextafter.c changed
Reviewed-by: Tom Stellard <tom@stellard.net>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211576
v3: change __builtin_nanf() to __builtin_nanf("")
This doesn't work yet, but it was agreed to commit as-is with the logic
that "broken" is better than "completely missing" and this should be
fixed in clang.
v2: use __builtin_inff() and also add nan/huge_val definitions
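A host-side C sketch of builtin-based definitions of this kind, assuming a GCC- or clang-compatible compiler. The SKETCH_ macro names are hypothetical, and the problem described above concerns OpenCL C, not host C.

```c
/* Hypothetical names; the builtins themselves are provided by GCC and clang. */
#define SKETCH_INFINITY  __builtin_inff()
#define SKETCH_NAN       __builtin_nanf("")   /* takes a string argument */
#define SKETCH_HUGE_VALF __builtin_huge_valf()

/* NaN is the only value that compares unequal to itself. */
static int is_nan(float x) {
    return x != x;
}

/* Positive infinity is greater than the largest finite float. */
static int is_pos_inf(float x) {
    return x > 3.402823466e+38f;
}
```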
Signed-off-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 211065
Use separate implementations instead of a macro
to ensure that the constant used in the multiplication
is of higher precision.
v2: Use the correct formula, spotted by Dan Liew <daniel.liew@imperial.ac.uk>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 210891
Some function definitions were using _CLC_DECL, which meant that they
weren't being marked as always_inline.
Reviewed-by and Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 193754
There are two implementations of nextafter():
1. Using clang's __builtin_nextafter. Clang replaces this builtin with
a call to nextafter which is part of libm. Therefore, this
implementation will only work for targets with an implementation of
libm (e.g. most CPU targets).
2. The other implementation is written in OpenCL C. This function is
known internally as __clc_nextafter and can be used by targets that
don't have access to libm.
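To show how nextafter can be computed without libm, here is a rough C sketch for 32-bit IEEE-754 floats. It is not the __clc_nextafter source; the function names are hypothetical and the bit-stepping approach is an assumption about how such a routine might look.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical libm-free nextafter for float. */
static float sketch_nextafterf(float x, float y) {
    if (x != x || y != y) {
        return x + y;                 /* NaN in, NaN out */
    }
    if (x == y) {
        return y;
    }
    if (x == 0.0f) {
        /* Smallest subnormal, carrying the sign of y. */
        return bits_float((float_bits(y) & 0x80000000u) | 1u);
    }
    uint32_t u = float_bits(x);
    /* Moving away from zero increments the magnitude bits;
     * moving toward zero decrements them. */
    if ((x < y) == (x > 0.0f)) {
        u += 1;
    } else {
        u -= 1;
    }
    return bits_float(u);
}
```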
llvm-svn: 192383
Everything except long/ulong is handled by just casting to the next larger type,
doing the math and then shifting/casting the result.
For 64-bit types, we break the high/low parts of each operand apart, and do
a FOIL-based multiplication.
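A minimal C sketch of both cases, assuming the operation is a high-half multiply (the message does not name the builtin). The function names are hypothetical and only the unsigned variants are shown.

```c
#include <stdint.h>

/* 32-bit case: widen, multiply, shift the high half down. */
static uint32_t mul_hi_u32(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* 64-bit case: split each operand into 32-bit halves and combine the
 * First, Outer, Inner and Last partial products. */
static uint64_t mul_hi_u64(uint64_t x, uint64_t y) {
    const uint64_t mask = 0xffffffffu;
    uint64_t x_lo = x & mask, x_hi = x >> 32;
    uint64_t y_lo = y & mask, y_hi = y >> 32;

    uint64_t first = x_hi * y_hi;
    uint64_t outer = x_hi * y_lo;
    uint64_t inner = x_lo * y_hi;
    uint64_t last  = x_lo * y_lo;

    /* Carry out of the low 64 bits of the full 128-bit product. */
    uint64_t carry = ((last >> 32) + (outer & mask) + (inner & mask)) >> 32;

    return first + (outer >> 32) + (inner >> 32) + carry;
}
```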
v2:
Discard the stack-overflow implementation due to copyright concerns.
- The implementation is still FOIL-based, but discards the previous code.
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188684
rhadd = (x+y+1)>>1
Implemented as:
(x>>1) + (y>>1) + ((x&1)|(y&1))
This avoids having to do the addition in assembly with overflow detection.
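The identity can be checked exhaustively for 8-bit values. The following C sketch uses hypothetical names and compares the overflow-free form against the widened reference formula.

```c
#include <stdint.h>

/* Overflow-free rounding half-add, as in the formula above. */
static uint8_t rhadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x >> 1) + (y >> 1) + ((x & 1) | (y & 1)));
}

/* Counts inputs where the result differs from (x + y + 1) >> 1
 * evaluated in a wider type. */
static unsigned rhadd_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            if (rhadd_u8((uint8_t)x, (uint8_t)y) != ((x + y + 1) >> 1)) {
                ++bad;
            }
        }
    }
    return bad;
}
```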
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188477
(x + y) >> 1 gets changed to:
(x>>1) + (y>>1) + (x&y&1)
Saves us having to do any llvm assembly and overflow checking in the addition.
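As a sanity check, the rewritten form can be compared against the widened reference for every pair of 8-bit inputs. This C sketch uses hypothetical names.

```c
#include <stdint.h>

/* Overflow-free half-add: the low bits contribute 1 only when both are set. */
static uint8_t hadd_u8(uint8_t x, uint8_t y) {
    return (uint8_t)((x >> 1) + (y >> 1) + (x & y & 1));
}

/* Counts inputs where the result differs from (x + y) >> 1
 * evaluated in a wider type. */
static unsigned hadd_mismatches(void) {
    unsigned bad = 0;
    for (unsigned x = 0; x < 256; ++x) {
        for (unsigned y = 0; y < 256; ++y) {
            if (hadd_u8((uint8_t)x, (uint8_t)y) != ((x + y) >> 1)) {
                ++bad;
            }
        }
    }
    return bad;
}
```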
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188476
Not hooked up to R600 yet due to current lack of support, at least on EG.
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 188181
Reduces all vector upsamples down to their scalar components, so it is probably
not the most efficient approach, but it does what the
spec says it needs to do.
Another possible implementation would be to convert/cast everything as
unsigned if necessary, upsample the input vectors, create the upsampled
value, and then cast back to signed if required.
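A C sketch of the per-component approach for 8-bit inputs, with a signed variant that goes through the unsigned path. The names are hypothetical, arrays stand in for vectors, and the signed conversion assumes two's complement.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar upsample: hi becomes the upper byte, lo the lower byte. */
static uint16_t upsample_u8(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((unsigned)hi << 8) | lo);
}

/* Signed variant: build the value as unsigned, then cast back to signed. */
static int16_t upsample_s8(int8_t hi, uint8_t lo) {
    return (int16_t)upsample_u8((uint8_t)hi, lo);
}

/* "Vector" upsample reduced to its scalar components. */
static void upsample_u8_vec(const uint8_t *hi, const uint8_t *lo,
                            uint16_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = upsample_u8(hi[i], lo[i]);
    }
}

/* Small fixed example so one lane can be inspected at a time. */
static uint16_t upsample_demo_lane(size_t i) {
    const uint8_t hi[2] = {0x12, 0xff};
    const uint8_t lo[2] = {0x34, 0x01};
    uint16_t out[2];
    upsample_u8_vec(hi, lo, out, 2);
    return out[i];
}
```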
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 186691
The assembly optimizations were making unsafe assumptions about which address
spaces had which identifiers.
Also, fix vload/vstore with 64-bit pointers. This was broken previously on
Radeon SI.
This version still only has assembly versions of int/uint 2/4/8/16 for global
loads and stores on R600, but it does it in a way that would be very easily
extended to private/local/constant and could also be handled easily on other
architectures.
v2: 1) Leave v[load|store]_impl.ll in generic/lib
2) Remove vload_if.ll and vstore_if.ll interfaces
3) Fix address+offset calculations
4) Remove offset from assembly arg list
llvm-svn: 186416
This commit gets us back to pure CLC and fixes offset calculations.
The next commit will re-enable the assembly implementation for R600,
fix bugs related to 64-bit address spaces, and also fix the
incorrect assumption that address space identifiers are the same in
all architectures.
llvm-svn: 186415
libclc was defining and undefing GENTYPE and several other macros with
common names in its header files. This was preventing applications from
defining macros with identical names as command line arguments to the
compiler, because the definitions in the header files were masking the
macros defined as compiler arguments.
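The masking can be reproduced with the preprocessor alone. In this C sketch the first #define stands in for a -DGENTYPE=7 compiler argument, the "header" section is hypothetical, and its leading #undef is an assumption made to avoid a redefinition diagnostic.

```c
/* Stands in for an application passing -DGENTYPE=7 on the command line. */
#define GENTYPE 7

/* --- hypothetical library header --- */
#undef GENTYPE
#define GENTYPE float
static GENTYPE lib_twice(GENTYPE x) { return x + x; }
#undef GENTYPE
/* --- end of header --- */

/* After the header, the application's definition is gone. */
#ifdef GENTYPE
#define APP_STILL_SEES_GENTYPE 1
#else
#define APP_STILL_SEES_GENTYPE 0
#endif

static int app_still_sees_gentype(void) {
    return APP_STILL_SEES_GENTYPE;
}
```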
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 185838
The assembly should be generic, but at least currently R600 only supports
32-bit stores of [u]int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component stores to multiple 4-component stores.
The unoptimized C versions of the other variants are left in place.
Patch by: Aaron Watry
llvm-svn: 185009
The assembly should be generic, but at least currently R600 only supports
32-bit loads of int1/4, and I believe that only global is well-supported.
R600 lowers the 8/16 component vectors to multiple 4-component loads.
The unoptimized C versions of the other variants are left in place.
Patch by: Aaron Watry
llvm-svn: 185008
Squashed commit of the following:
commit a0df0a0e86c55c1bdc0b9c0f5a739e5adef4b056
Author: Aaron Watry <awatry@gmail.com>
Date: Mon Apr 15 18:42:04 2013 -0500
libclc: Rename clz.ll to clz_if.ll to ensure it gets built.
configure.py treats files that have the same name with the .cl and .ll
extensions as overriding each other.
E.g. If you have clz.cl and clz.ll both specified to be built in the same
SOURCES file, only the first file listed will actually be built.
Since the contents of clz.ll were an interface that is implemented in
clz_impl.ll, rename clz.ll to clz_if.ll to make sure that the interface is
built.
commit 931b62bed05c58f737de625bd415af09571a6a5a
Author: Aaron Watry <awatry@gmail.com>
Date: Sat Apr 13 12:32:54 2013 -0500
libclc: llvm assembly implementation of clz
Untested... currently crashes in the same manner as add_sat.
commit 6ef0b7b0b6d2e5584086b4b9a9243743b2e0538f
Author: Aaron Watry <awatry@gmail.com>
Date: Sat Mar 23 12:35:27 2013 -0500
libclc: Add stub clz builtin
For scalar int/uint, attempt to use the clz llvm builtin. For all others,
return 0 until an actual implementation is finished.
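For reference, a host-side C sketch of a scalar count-leading-zeros built on the compiler builtin, assuming a GCC- or clang-compatible compiler with a 32-bit unsigned int. The function name is hypothetical, and the zero case follows what I understand the OpenCL clz rule to be (return the bit width).

```c
#include <stdint.h>

/* Hypothetical scalar clz for 32-bit values. */
static int clz32(uint32_t x) {
    if (x == 0) {
        return 32;              /* __builtin_clz(0) is undefined */
    }
    return __builtin_clz(x);
}
```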
Patch by: Aaron Watry
llvm-svn: 185004
For any GENTYPE that isn't scalar, we need to implement a mixed
vector/scalar version of clamp/max.
This depends on the min() patches I sent to the list a few minutes ago.
Patch by: Aaron Watry
llvm-svn: 185003
Checks if the current GENTYPE is scalar, and if not, then defines a separate
implementation of the function which casts the second arg to vector before
proceeding.
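The idea can be sketched in host C with a struct standing in for a 4-component vector. All names are hypothetical and max is used only as an example operation; the real code is generated from GENTYPE-based includes.

```c
/* Stand-in for an OpenCL int4. */
typedef struct { int s[4]; } int4_sketch;

/* "Cast" a scalar to a vector by replicating it into every lane. */
static int4_sketch splat4(int v) {
    int4_sketch r = {{v, v, v, v}};
    return r;
}

/* Vector/vector form. */
static int4_sketch max_vv(int4_sketch a, int4_sketch b) {
    int4_sketch r;
    for (int i = 0; i < 4; ++i) {
        r.s[i] = a.s[i] > b.s[i] ? a.s[i] : b.s[i];
    }
    return r;
}

/* Mixed vector/scalar form: widen the second argument, then reuse max_vv. */
static int4_sketch max_vs(int4_sketch a, int b) {
    return max_vv(a, splat4(b));
}

/* Small fixed example so one lane can be inspected at a time. */
static int max_vs_demo_lane(int i) {
    int4_sketch a = {{1, 5, -3, 4}};
    return max_vs(a, 4).s[i];
}
```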
Patch by: Aaron Watry
llvm-svn: 185002