397 Commits

Author SHA1 Message Date
Jan Vesely 7ab2d0bdcd shared: Implement aligned vector stores (vstorea_half)
Float version passes newly posted piglit tests on turks, float and double pass on carrizo.
v2: scalar vstorea_half
v3: fix typo

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316291
2017-10-22 14:21:59 +00:00
Jan Vesely 12061c7125 shared: Implement aligned vector loads (vloada_half)
Passes newly posted piglits on turks and carrizo
v2: add scalar vloada_half
v3: fix typo

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 316290
2017-10-22 14:21:56 +00:00
Jan Vesely c420b61b26 amdgcn: Add missing datalayout info to .ll files
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 316239
2017-10-20 21:10:18 +00:00
Jan Vesely 66b32ad9ad r600: Add missing datalayout to .ll files
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 316238
2017-10-20 21:00:31 +00:00
Jan Vesely 577c52b9c7 travis: enable checks of nvptx libraries
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315343
2017-10-10 18:10:25 +00:00
Jan Vesely 2601429bac travis: Enable external function call checks on llvm-{4,5}
Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315342
2017-10-10 18:10:24 +00:00
Jan Vesely 3d349ea98e Make image builtins r600/llvm-3.9 only
The implementation uses r600-specific intrinsics
LLVM-4 switched to _ro_t and _rw_t image types
Portions of the code can be moved back as more targets/llvm versions add image support

Reviewer: Aaron Watry
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315341
2017-10-10 18:10:21 +00:00
Jeroen Ketema 1364d268a4 Implement mem_fence on ptx
PTX does not differentiate between read and write fences. Hence, these are
lowered to a mem_fence call. The mem_fence function compiles to the
"membar.cta" instruction, which commits all outstanding reads and writes
of a thread such that these become visible to all other threads in the same
CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored, except
for deciding whether a "membar.cta" instruction should be issued at all.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315235
2017-10-09 19:43:04 +00:00
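
Note (illustrative, not part of the commit): the flag handling described above can be sketched in plain C. This is only a host-side stand-in under stated assumptions: the `sketch_*` names, the `fences_issued` counter, and the flag values are hypothetical, and a C11 `atomic_thread_fence` replaces the PTX fence instruction, which cannot be issued from host code.

```c
#include <stdatomic.h>

/* Flag values assumed for illustration only. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

/* Hypothetical counter so the behaviour can be observed on a host. */
int fences_issued = 0;

/* The flags only decide whether a fence is issued at all; local and
 * global memory are not treated differently. */
void sketch_mem_fence(unsigned flags) {
    if (flags & (CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE)) {
        atomic_thread_fence(memory_order_seq_cst); /* stand-in for the PTX fence */
        ++fences_issued;
    }
}

/* Read and write fences are not distinguished, so both forward
 * to the same full fence. */
void sketch_read_mem_fence(unsigned flags)  { sketch_mem_fence(flags); }
void sketch_write_mem_fence(unsigned flags) { sketch_mem_fence(flags); }
```

Calling `sketch_mem_fence(0)` issues nothing, while any call with a local or global flag set issues exactly one fence.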
Jeroen Ketema 4f5a3d5d6f Make ptx barrier work irrespective of the cl_mem_fence_flags
This generates a "bar.sync 0" instruction, which not only causes the
threads to wait, but also acts as a memory fence, as required by
OpenCL. The fence does not differentiate between local and global
memory. Unfortunately, there is no similar instruction which does
not include a memory fence. Hence, we cannot optimize the case
where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is
passed.

llvm-svn: 315228
2017-10-09 18:36:48 +00:00
Jan Vesely 3c51ae5bd9 travis: Make sure we report failure even if only earlier checked files fail
for loop would only report status of the last command
v2: return '1'
    call test instead of '['

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315193
2017-10-08 20:07:58 +00:00
Jan Vesely 136381dc38 check_external_calls.sh: Print number of calls in tested file.
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315192
2017-10-08 20:07:56 +00:00
Jan Vesely 80bb52ae75 ptx: Use __clc_nextafter to implement nextafter
using clang builtin results in external library call

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315191
2017-10-08 19:34:00 +00:00
Jan Vesely 1de1444d62 Do not include clc_nextafter header globally
Drop unused clc/math/clc_nextafter.h header

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315190
2017-10-08 19:33:58 +00:00
Jan Vesely 6a5c8ddb3a math/nextafter: Use custom declaration inc file
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315189
2017-10-08 19:33:55 +00:00
Jan Vesely 72be1cc0be math/binary_decl.inc: Do not declare mixed float/double functions
fmin/fmax only need vector/scalar mix

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315188
2017-10-08 19:33:53 +00:00
Jan Vesely beb6591753 ldexp: Fix double precision function return type
Fixes ~1200 external calls from nvptx library.

Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315170
2017-10-08 06:56:14 +00:00
Jan Vesely 391305638c configure: Fix handling of directories with compats only source lists
Reviewer: Jeroen Ketema
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 315018
2017-10-05 20:16:28 +00:00
Jeroen Ketema 957151bd86 Add vload_half helpers for ptx
This removes the vload_half unresolved calls from the nvptx libraries.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314998
2017-10-05 18:17:40 +00:00
Jeroen Ketema feefb0870f Add vstore_half helpers for ptx
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314925
2017-10-04 19:07:48 +00:00
Jan Vesely a02d0e2c50 integer/sub_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314703
2017-10-02 18:39:03 +00:00
Jan Vesely 1964df8fad integer/add_sat: Use clang builtin instead of llvm asm
reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314702
2017-10-02 18:39:00 +00:00
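
Note (illustrative, not part of the commit): a minimal sketch of saturating add and subtract written with compiler overflow builtins instead of hand-written LLVM assembly. The log does not say which builtins the two commits use; this sketch assumes the generic `__builtin_add_overflow` / `__builtin_sub_overflow` available in clang and GCC, and the `sat_*` names are hypothetical.

```c
#include <limits.h>

/* Saturating signed add: clamp to INT_MAX / INT_MIN on overflow. */
int sat_add_i32(int x, int y) {
    int r;
    if (__builtin_add_overflow(x, y, &r))
        /* Overflow needs operands of the same sign, so y's sign
         * tells us which bound was crossed. */
        return y < 0 ? INT_MIN : INT_MAX;
    return r;
}

/* Saturating signed subtract. */
int sat_sub_i32(int x, int y) {
    int r;
    if (__builtin_sub_overflow(x, y, &r))
        /* Subtracting a negative overflows upward, a positive downward. */
        return y < 0 ? INT_MAX : INT_MIN;
    return r;
}
```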
Jan Vesely 943057a288 integer/clz: Use clang builtin instead of llvm asm
The generated llvm IR is mostly identical. char/uchar case is a bit worse.

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314701
2017-10-02 18:38:57 +00:00
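
Note (illustrative, not part of the commit): a sketch of count-leading-zeros built on `__builtin_clz`, assuming a 32-bit `unsigned int`. The function names are hypothetical. The builtin is undefined for a zero argument, so zero is handled explicitly. The 8-bit variant has to widen the value and subtract the extra leading zeros, which may be related to the slightly worse char/uchar code mentioned above, although the commit does not say so.

```c
#include <stdint.h>

/* 32-bit clz; returns the bit width for x == 0. */
uint32_t clz_u32(uint32_t x) {
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}

/* 8-bit clz: widen to 32 bits, then drop the 24 leading zeros
 * that the widening introduced. */
uint8_t clz_u8(uint8_t x) {
    return x ? (uint8_t)(__builtin_clz((unsigned)x) - 24) : (uint8_t)8;
}
```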
Jeroen Ketema fe9fa89854 Let get_work_dim take exactly 0 arguments
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314634
2017-10-01 20:11:46 +00:00
Jeroen Ketema 17fdf263c5 Do not circularly define NULL
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314633
2017-10-01 20:10:14 +00:00
Jan Vesely 2b7fa1c6f6 Fix amdgcn-amdhsa on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314548
2017-09-29 19:06:52 +00:00
Jan Vesely aee030f284 travis: Check built libraries on llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314547
2017-09-29 19:06:50 +00:00
Jan Vesely 8c8c287adf Add script to check for unresolved function calls
v2: add shell shebang
    improve error checks and reporting
v3: fix typo

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314546
2017-09-29 19:06:48 +00:00
Jan Vesely 41b1500db0 geometric: geometric functions are only supported for vector lengths <=4
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314545
2017-09-29 19:06:47 +00:00
Jan Vesely 8d08f01eff travis: add build using llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314544
2017-09-29 19:06:45 +00:00
Jan Vesely ce29e8cde1 Restore support for llvm-3.9
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Acked-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 314543
2017-09-29 19:06:41 +00:00
Jan Vesely 3bb50f6f7b Add missing HAVE_LLVM define to fix build with latest llvm
Broken since r314111

V2: pointed out by Jan Vesely
   - Use format() instead of % formatting

Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314261
2017-09-26 23:15:54 +00:00
Jan Vesely 1fa727d615 Rework atomic ops to use clang builtins rather than llvm asm
reviewer: Aaron Watry

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314112
2017-09-25 16:07:34 +00:00
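
Note (illustrative, not part of the commit): a sketch of atomic operations expressed through compiler builtins rather than LLVM assembly. The log does not name the builtins used; this sketch assumes the `__sync_*` and `__atomic_*` families that clang and GCC both provide, and the `sketch_atomic_*` names are hypothetical.

```c
/* Each function returns the value stored at *p before the operation. */

int sketch_atomic_add(volatile int *p, int val) {
    return __sync_fetch_and_add(p, val);
}

int sketch_atomic_cmpxchg(volatile int *p, int cmp, int val) {
    /* Stores val only if *p == cmp. */
    return __sync_val_compare_and_swap(p, cmp, val);
}

int sketch_atomic_xchg(volatile int *p, int val) {
    return __atomic_exchange_n(p, val, __ATOMIC_SEQ_CST);
}
```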
Jan Vesely 760052047b prepare_builtins: Fix compile breakage with older LLVM
Fixes r314050

reviewer: Tom Stellard

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 314111
2017-09-25 16:04:37 +00:00
Reid Kleckner 3fc649cb76 [Support] Rename tool_output_file to ToolOutputFile, NFC
This class isn't similar to anything from the STL, so it shouldn't use
the STL naming conventions.

llvm-svn: 314050
2017-09-23 01:03:17 +00:00
Jan Vesely c9bbbe2403 Implement cl_khr_int64_extended_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313811
2017-09-20 20:42:19 +00:00
Jan Vesely 1c81f4b0e3 Implement cl_khr_int64_base_atomics builtins
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 313810
2017-09-20 20:42:14 +00:00
Jan Vesely d0320d5289 Add travis CI configuration file
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313773
2017-09-20 17:28:58 +00:00
Aaron Watry e62f5fa64d Add native_recip(x) as ((1)/(x))
Signed-off-by: Aaron Watry <awatry@gmail.com>
Acked-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 313107
2017-09-13 01:40:25 +00:00
Aaron Watry 415a60f303 integer: Add popcount implementation using ctpop intrinsic
Also copy/modify the unary_intrin.inc from math/ to make the
intrinsic declaration somewhat reusable.

Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2

Tested on GCN 1.0 (Pitcairn)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312854
2017-09-09 02:23:54 +00:00
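
Note (illustrative, not part of the commit): the commit declares the ctpop intrinsic directly; a rough host-side equivalent, assuming clang or GCC, is `__builtin_popcount`, which clang typically lowers to the same intrinsic. The function names below are hypothetical.

```c
#include <stdint.h>

/* Number of set bits in a 32-bit value. */
uint32_t popcount_u32(uint32_t x) {
    return (uint32_t)__builtin_popcount(x);
}

/* Number of set bits in a 64-bit value. */
uint64_t popcount_u64(uint64_t x) {
    return (uint64_t)__builtin_popcountll(x);
}
```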
Jan Vesely 285d2fb85c Implement vload_half{,n} and vload(half)
v2: add vload(half) as well
    make helpers amdgpu specific (NVPTX uses different private AS numbering)
    use clang builtin on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312839
2017-09-08 23:59:00 +00:00
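
Note (illustrative, not part of the commit): vload_half reads a 16-bit half value from memory and returns it as a float. The commit relies on target helpers or a clang builtin for this; the sketch below instead shows the conversion done by hand in portable C, assuming IEEE 754 binary16 storage and a 32-bit IEEE float. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert raw binary16 bits to a float. */
float half_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                 /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {              /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {              /* signed zero */
        bits = sign;
    } else {                            /* subnormal: normalize first */
        uint32_t e = 113u;
        while (!(man & 0x400u)) { man <<= 1; --e; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);        /* bit cast without aliasing issues */
    return f;
}

/* Scalar load: element `offset` of a buffer of half values. */
float vload_half_sketch(size_t offset, const uint16_t *p) {
    return half_bits_to_float(p[offset]);
}
```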
Jan Vesely 661ac03a1b vstore: Cleanup and add vstore(half)
Add missing undefs
Make helpers amdgpu specific (NVPTX uses different numbering for private AS)
Use clang builtins on clang >= 6

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312838
2017-09-08 23:58:57 +00:00
Jan Vesely b9dbaae3fb configure.py: Simplify compatibility sources
Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
llvm-svn: 312837
2017-09-08 23:58:53 +00:00
Jan Vesely 3d1db3de74 amdgcn,waitcnt: Add datalayout info
This file is only compiled for GCN which all share the same layout

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312493
2017-09-04 15:52:07 +00:00
Jan Vesely e337b30c7d r600: Cleanup barrier implementation.
We don't have memory fences for r600 so just call group barrier directly
Make sure that barrier is called even with 0 flags

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 312492
2017-09-04 15:52:05 +00:00
Jan Vesely 1796d590c1 Fixup clc.h comment
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312491
2017-09-04 15:52:03 +00:00
Aaron Watry 0bf96b1712 relational: Implement shuffle2 builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input

v2: Add half support to shuffle2
    Move shuffle2 to misc/

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
2017-09-02 02:23:28 +00:00
Aaron Watry 880f15dae6 relational: Implement shuffle builtin
This was added in CL 1.1

Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in

v2: Add half-precision support to shuffle when available.
    Move to misc/ and add section 6.12.12 to clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
2017-09-02 02:23:26 +00:00
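
Note (illustrative, not part of the commits): the element selection that shuffle and shuffle2 perform can be sketched in plain C for 4-element vectors. The struct types and function names are hypothetical stand-ins for OpenCL vector types. For 4-element inputs, only the low 2 bits of each mask element are used by shuffle and the low 3 bits by shuffle2, where indices 4..7 select from the second input.

```c
#include <stdint.h>

typedef struct { float s[4]; }    float4_t; /* stand-in for float4 */
typedef struct { uint32_t s[4]; } uint4_t;  /* stand-in for uint4  */

/* shuffle: every result element is picked from x by the mask. */
float4_t shuffle4(float4_t x, uint4_t mask) {
    float4_t r;
    for (int i = 0; i < 4; ++i)
        r.s[i] = x.s[mask.s[i] & 3u];
    return r;
}

/* shuffle2: indices 0..3 pick from x, 4..7 pick from y. */
float4_t shuffle2_4(float4_t x, float4_t y, uint4_t mask) {
    float4_t r;
    for (int i = 0; i < 4; ++i) {
        uint32_t idx = mask.s[i] & 7u;
        r.s[i] = idx < 4u ? x.s[idx] : y.s[idx - 4u];
    }
    return r;
}
```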
Aaron Watry da8dfefd1c Add halfN types and enable fp16 when generating builtin declarations
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
2017-09-02 02:23:16 +00:00
Jan Vesely 999b1d9426 amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrier
Specs require using fences when barrier() is invoked:
"The barrier function will either flush any variables stored in local memory
or queue a memory fence to ensure correct ordering of memory operations to local memory."
and
"The barrier function will queue a memory fence to ensure correct ordering
of memory operations to global memory."

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311022
2017-08-16 17:09:00 +00:00
Jan Vesely 1977092dc3 amdgcn: Implement {read_,write_,}mem_fence builtin
v2: add more detailed comment about waitcnt instruction

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Aaron Watry <awatry@gmail.com>
Tested-by: Aaron Watry <awatry@gmail.com>
llvm-svn: 311021
2017-08-16 17:08:56 +00:00