llvm-project

Commit Graph

Author	SHA1	Message	Date
Jan Vesely	7ab2d0bdcd	shared: Implement aligned vector stores (vstorea_half) Float version passes newly posted piglit tests on turks, float and double pass on carrizo. v2: scalar vstorea_half v3: fix typo Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 316291	2017-10-22 14:21:59 +00:00
Jan Vesely	12061c7125	shared: Implement aligned vector loads (vloada_half) Passes newly posted piglits on turks and carrizo v2: add scalar vloada_half v3: fix typo Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 316290	2017-10-22 14:21:56 +00:00
Jan Vesely	c420b61b26	amdgcn: Add missing datalayout info to .ll files Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 316239	2017-10-20 21:10:18 +00:00
Jan Vesely	66b32ad9ad	r600: Add missing datalayout to .ll files Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 316238	2017-10-20 21:00:31 +00:00
Jan Vesely	577c52b9c7	travis: enable checks of nvptx libraries Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315343	2017-10-10 18:10:25 +00:00
Jan Vesely	2601429bac	travis: Enable external function call checks on llvm-{4,5} Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315342	2017-10-10 18:10:24 +00:00
Jan Vesely	3d349ea98e	Make image builtins r600/llvm-3.9 only The implementation uses r600 specific intrinsics LLVM-4 switched to _ro_t and _rw_t image types Portions of the code can be moved back as more targets/llvm versions add image support Reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315341	2017-10-10 18:10:21 +00:00
Jeroen Ketema	1364d268a4	Implement mem_fence on ptx PTX does not differentiate between read and write fences. Hence, these are lowered to a mem_fence call. The mem_fence function compiles to the "membar.cta" instruction, which commits all outstanding reads and writes of a thread such that these become visible to all other threads in the same CTA (i.e., work-group). The instruction does not differentiate between global and local memory. Hence, the flags parameter is ignored, except for deciding whether a "membar.cta" instruction should be issued at all. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315235	2017-10-09 19:43:04 +00:00
Jeroen Ketema	4f5a3d5d6f	Make ptx barrier work irrespective of the cl_mem_fence_flags This generates a "bar.sync 0" instruction, which not only causes the threads to wait, but also acts as a memory fence, as required by OpenCL. The fence does not differentiate between local and global memory. Unfortunately, there is no similar instruction which does not include a memory fence. Hence, we cannot optimize the case where neither CLK_LOCAL_MEM_FENCE nor CLK_GLOBAL_MEM_FENCE is passed. llvm-svn: 315228	2017-10-09 18:36:48 +00:00
Jan Vesely	3c51ae5bd9	travis: Make sure we report failure even if only earlier checked files fail for loop would only report status of the last command v2: return '1' call test instead of '[' Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315193	2017-10-08 20:07:58 +00:00
Jan Vesely	136381dc38	check_external_calls.sh: Print number of calls in tested file. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315192	2017-10-08 20:07:56 +00:00
Jan Vesely	80bb52ae75	ptx: Use __clc_nextafter to implement nextafter using clang builtin results in external library call Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315191	2017-10-08 19:34:00 +00:00
Jan Vesely	1de1444d62	Do not include clc_nextafter header globally Drop unused clc/math/clc_nextafter.h header Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315190	2017-10-08 19:33:58 +00:00
Jan Vesely	6a5c8ddb3a	math/nextafter: Use custom declaration inc file Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315189	2017-10-08 19:33:55 +00:00
Jan Vesely	72be1cc0be	math/binary_decl.inc: Do not declare mixed float/double functions fmin/fmax only need vector/scalar mix Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315188	2017-10-08 19:33:53 +00:00
Jan Vesely	beb6591753	ldexp: Fix double precision function return type Fixes ~1200 external calls from nvptx library. Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315170	2017-10-08 06:56:14 +00:00
Jan Vesely	391305638c	configure: Fix handling of directories with compats only source lists Reviewer: Jeroen Ketema Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 315018	2017-10-05 20:16:28 +00:00
Jeroen Ketema	957151bd86	Add vload_half helpers for ptx This removes the vload_half unresolved calls from the nvptx libraries. Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314998	2017-10-05 18:17:40 +00:00
Jeroen Ketema	feefb0870f	Add vstore_half helpers for ptx Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314925	2017-10-04 19:07:48 +00:00
Jan Vesely	a02d0e2c50	integer/sub_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314703	2017-10-02 18:39:03 +00:00
Jan Vesely	1964df8fad	integer/add_sat: Use clang builtin instead of llvm asm reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314702	2017-10-02 18:39:00 +00:00
Jan Vesely	943057a288	integer/clz: Use clang builtin instead of llvm asm The generated llvm IR is mostly identical. char/uchar case is a bit worse. reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314701	2017-10-02 18:38:57 +00:00
Jeroen Ketema	fe9fa89854	Let get_work_dim take exactly 0 arguments Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314634	2017-10-01 20:11:46 +00:00
Jeroen Ketema	17fdf263c5	Do not circularly define NULL Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314633	2017-10-01 20:10:14 +00:00
Jan Vesely	2b7fa1c6f6	Fix amdgcn-amdhsa on llvm-3.9 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314548	2017-09-29 19:06:52 +00:00
Jan Vesely	aee030f284	travis: Check built libraries on llvm-3.9 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314547	2017-09-29 19:06:50 +00:00
Jan Vesely	8c8c287adf	Add script to check for unresolved function calls v2: add shell shebang improve error checks and reporting v3: fix typo Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314546	2017-09-29 19:06:48 +00:00
Jan Vesely	41b1500db0	geometric: geometric functions are only supported for vector lengths <=4 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314545	2017-09-29 19:06:47 +00:00
Jan Vesely	8d08f01eff	travis: add build using llvm-3.9 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314544	2017-09-29 19:06:45 +00:00
Jan Vesely	ce29e8cde1	Restore support for llvm-3.9 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-by: Aaron Watry <awatry@gmail.com> llvm-svn: 314543	2017-09-29 19:06:41 +00:00
Jan Vesely	3bb50f6f7b	Add missing HAVE_LLVM define to fix build with latest llvm Broken since r314111 V2: pointed out by Jan Vesely - Use format() instead of % formatting Patch-by: Pavel Ondračka <pavel.ondracka@gmail.com> Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314261	2017-09-26 23:15:54 +00:00
Jan Vesely	1fa727d615	Rework atomic ops to use clang builtins rather than llvm asm reviewer: Aaron Watry Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314112	2017-09-25 16:07:34 +00:00
Jan Vesely	760052047b	prepare_builtins: Fix compile breakage with older LLVM Fixes r314050 reviewer: Tom Stellard Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 314111	2017-09-25 16:04:37 +00:00
Reid Kleckner	3fc649cb76	[Support] Rename tool_output_file to ToolOutputFile, NFC This class isn't similar to anything from the STL, so it shouldn't use the STL naming conventions. llvm-svn: 314050	2017-09-23 01:03:17 +00:00
Jan Vesely	c9bbbe2403	Implement cl_khr_int64_extended_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313811	2017-09-20 20:42:19 +00:00
Jan Vesely	1c81f4b0e3	Implement cl_khr_int64_base_atomics builtins Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 313810	2017-09-20 20:42:14 +00:00
Jan Vesely	d0320d5289	Add travis CI configuration file Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 313773	2017-09-20 17:28:58 +00:00
Aaron Watry	e62f5fa64d	Add native_recip(x) as ((1)/(x)) Signed-off-by: Aaron Watry <awatry@gmail.com> Acked-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 313107	2017-09-13 01:40:25 +00:00
Aaron Watry	415a60f303	integer: Add popcount implementation using ctpop intrinsic Also copy/modify the unary_intrin.inc from math/ to make the intrinsic declaration somewhat reusable. Passes CL CTS integer_ops/test_integer_ops popcount tests for CL 1.2 Tested-by on GCN 1.0 (Pitcairn) Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312854	2017-09-09 02:23:54 +00:00
Jan Vesely	285d2fb85c	Implement vload_half{,n} and vload(half) v2: add vload(half) as well make helpers amdgpu specific (NVPTX uses different private AS numbering) use clang builtin on clang >= 6 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312839	2017-09-08 23:59:00 +00:00
Jan Vesely	661ac03a1b	vstore: Cleanup and add vstore(half) Add missing undefs Make helpers amdgpu specific (NVPTX uses different numbering for private AS) Use clang builtins on clang >= 6 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312838	2017-09-08 23:58:57 +00:00
Jan Vesely	b9dbaae3fb	configure.py: Simplify compatibility sources Just add the SOURCE_X.Y list to the list of sources if X.Y is the current llvm version. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tstellar@redhat.com> llvm-svn: 312837	2017-09-08 23:58:53 +00:00
Jan Vesely	3d1db3de74	amdgcn,waitcnt: Add datalayout info This file is only compiled for GCN which all share the same layout Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 312493	2017-09-04 15:52:07 +00:00
Jan Vesely	e337b30c7d	r600: Cleanup barrier implementation. We don't have memory fences for r600 so just call group barrier directly Make sure that barrier is called even with 0 flags Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 312492	2017-09-04 15:52:05 +00:00
Jan Vesely	1796d590c1	Fixup clc.h comment Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312491	2017-09-04 15:52:03 +00:00
Aaron Watry	0bf96b1712	relational: Implement shuffle2 builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in_dual_input v2: Add half support to shuffle2 Move shuffle2 to misc/ Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312404	2017-09-02 02:23:28 +00:00
Aaron Watry	880f15dae6	relational: Implement shuffle builtin This was added in CL 1.1 Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via: test_conformance/relationals/test_relationals shuffle_built_in v2: Add half-precision support to shuffle when available. Move to misc/ and add section 6.12.12 to clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312403	2017-09-02 02:23:26 +00:00
Aaron Watry	da8dfefd1c	Add halfN types and enable fp16 when generating builtin declarations Uses the same mechanism to enable fp16 as we use for fp64 when processing clc.h Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 312402	2017-09-02 02:23:16 +00:00
Jan Vesely	999b1d9426	amdgcn: rewrite barrier() using fence and clang __builtin_amdgcn_s_barrier Specs require using fences when barrier() is invoked: "The barrier function will either flush any variables stored in local memory or queue a memory fence to ensure correct ordering of memory operations to local memory." and "The barrier function will queue a memory fence to ensure correct ordering of memory operations to global memory." Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 311022	2017-08-16 17:09:00 +00:00
Jan Vesely	1977092dc3	amdgcn: Implement {read_,write_,}mem_fence builtin v2: add more detailed comment about waitcnt instruction Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 311021	2017-08-16 17:08:56 +00:00
Jan Vesely	7fc4c79fa5	configure.py: Drop explicit import of int builtin I can't reproduce the error that made me add this. Reported-by: Kim Gräsman <kim.grasman@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Kim Gräsman <kim.grasman@gmail.com> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 310968	2017-08-15 22:24:05 +00:00
Jan Vesely	a4a20cd2f3	configure.py: Make python3 friendly mostly prints and exceptions. Few behavioral changes are documented in the text Generated Makefile is identical between python2 and python3 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 309820	2017-08-02 15:00:59 +00:00
Jan Vesely	09f0a560e1	add __kernel_exec macros also consolidate macros into one file, and rename to clcmacros.h Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 309358	2017-07-28 03:39:03 +00:00
Jan Vesely	2f2a3bc0dc	generic: add missing get_work_dim include Fixes few piglits since clang r304193 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Aaron Watry <awatry@gmail.com> llvm-svn: 304556	2017-06-02 15:58:35 +00:00
Jan Vesely	9f7172965c	math: Implement sinh function mostly copied from amd_builtins llvm-svn: 296233	2017-02-25 02:46:53 +00:00
Jan Vesely	c3868c8f8d	.gitignore: Ignore amdgcn-mesa object directory llvm-svn: 296164	2017-02-24 20:32:18 +00:00
Aaron Watry	dfec3c8e95	math: Add native_tan as wrapper to tan Trivially define native_tan as a redirect to tan. If there are any targets with a native implementation, we can deal with it later. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <arsenm2@gmail.com> llvm-svn: 295920	2017-02-23 01:46:57 +00:00
Jeroen Ketema	80d2e8ffc1	Move BufferPtr into the block where it is being used The previous location outside the block would crash prepare-builtins when the builtins file was accidentally not passed on the command line. llvm-svn: 294916	2017-02-12 21:33:49 +00:00
Jeroen Ketema	ed98e8d099	Add the correct prefixes to the cl_khr_fp64 pragma llvm-svn: 294915	2017-02-12 21:31:41 +00:00
Matt Arsenault	9df2b9781c	math: Add native_rsqrt builtin function Trivial define to rsqrt. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 294608	2017-02-09 18:39:26 +00:00
Aaron Watry	c606efabb7	math: Add logb builtin Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292335	2017-01-18 03:14:10 +00:00
Aaron Watry	900bd7eb7f	math: Add expm1 builtin function Ported from the amd-builtins branch. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <Matthew.Arsenault@amd.com> CC: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 292334	2017-01-18 03:13:37 +00:00
Tom Stellard	d83eb34ee7	Fix build since r286752. llvm-svn: 286839	2016-11-14 16:06:33 +00:00
Tom Stellard	088faab429	Fix build since llvm r286566 and require at least llvm 4.0 llvm-svn: 286634	2016-11-11 21:34:47 +00:00
Jan Vesely	0a5aac3fc4	Provide vstore_half helper to workaround clc restrictions clang won't accept half precision loads and stores without cl_khr_fp16 since r281904 llvm-svn: 282106	2016-09-21 20:15:55 +00:00
Tom Stellard	6b195ece57	configure: Add amdgcn-mesa-mesa3d target llvm-svn: 281793	2016-09-16 22:43:33 +00:00
Tom Stellard	f19cf403c4	amdgcn-amdhsa: Add get_num_groups implementation llvm-svn: 281792	2016-09-16 22:43:31 +00:00
Tom Stellard	e7ad23bad3	amdgcn-amdhsa: Add get_global_size() implementation llvm-svn: 281791	2016-09-16 22:43:29 +00:00
Aaron Watry	af569547fa	math: Implement tgamma Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281566	2016-09-15 00:17:34 +00:00
Aaron Watry	e9009cdd21	math: Implement lgamma Just use lgamma_r and ignore the value returned in the second argument Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281565	2016-09-15 00:17:31 +00:00
Aaron Watry	0ab07e1bde	math: Implement lgamma_r Ported from the amd-builtins branch, which is itself based on the Sun Microsystems implementation. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281564	2016-09-15 00:17:28 +00:00
Aaron Watry	f969413a82	Add ADDR_SPACE parameter to _CLC_V_V_VP_VECTORIZE This macro is currently unused, but I plan to use it shortly. The previous form did casts of pointers without an address space, which doesn't work so well for CL 1.x. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 281563	2016-09-15 00:17:22 +00:00
Matt Arsenault	fbfd828d2a	Replace nextafter implementation This one passes conformance. llvm-svn: 280961	2016-09-08 16:37:56 +00:00
Jan Vesely	eade17271a	Avoid ambiguity in calling atom_add functions. clang (since r280553) allows pointer casts in function overloads, so we need to disambiguate the second argument. clang might be smarter about overloads in the future see https://reviews.llvm.org/D24113, but let's be safe in libclc anyway. llvm-svn: 280871	2016-09-07 22:11:02 +00:00
Niels Ole Salscheider	63f71057c0	configure.py: Add polaris10 and polaris11 llvm-svn: 280121	2016-08-30 18:00:41 +00:00
Matt Arsenault	958fce3192	amdgcn: Fix return type of get_num_groups llvm-svn: 279723	2016-08-25 07:31:40 +00:00
Matt Arsenault	7ef7e6aacd	Strip opencl.ocl.version metadata This should be uniqued when linking, but right now it creates a lot of metadata spam listing the same version. This should also probably be reporting the compiled version of the user program, which may differ from the library. Currently the library IR files report 1.0 while 1.1/1.2 are the default for user programs. llvm-svn: 279692	2016-08-25 00:25:10 +00:00
Matt Arsenault	d0a275228e	amdgcn: Also correct get_local_size type for HSA llvm-svn: 279656	2016-08-24 19:11:52 +00:00
Matt Arsenault	26d9c41ff6	amdgcn: Fix return type for get_global_size llvm-svn: 279644	2016-08-24 17:52:04 +00:00
Matt Arsenault	314364cbd2	amdgpu: Fix default case value for get_local_size llvm-svn: 279359	2016-08-20 04:17:17 +00:00
Matt Arsenault	220268d177	amdgcn: Fix get_local_size IR return type llvm-svn: 279350	2016-08-20 00:01:21 +00:00
Matt Arsenault	2ce3d94a01	amdgcn: Correct return types to be size_t llvm-svn: 279343	2016-08-19 22:49:39 +00:00
Jan Vesely	ad8672727c	Implement vstore_half{,n} Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 278962	2016-08-17 20:02:11 +00:00
Jan Vesely	4c59714a52	Make min follow the OCL 1.0 specs OpenCL 1.0: "Returns y if y < x, otherwise it returns x. If x and y are infinite or NaN, the return values are undefined." OpenCL 1.1+: "Returns y if y < x, otherwise it returns x. If x or y are infinite or NaN, the return values are undefined." The 1.0 version is stricter so use that one. Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276704	2016-07-25 22:36:22 +00:00
Tom Stellard	d835b3f1af	Implement cbrt builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276497	2016-07-22 23:45:15 +00:00
Tom Stellard	9cb070f96a	Implement cosh builtin This implementation was ported from the AMD builtin library and has been tested with piglit, OpenCV, and the ocl conformance tests. llvm-svn: 276496	2016-07-22 23:45:13 +00:00
Tom Stellard	ff13926a60	geometric/floatn.inc: Add vec8 and vec16 types llvm-svn: 276495	2016-07-22 23:45:11 +00:00
Jan Vesely	a82e080b57	AMDGPU: Implement get_global_offset builtin Also fix get_global_id to consider offset No idea how to add this for ptx, so they are stuck with the old get_global_id implementation. v2: split to a separate patch v3: Switch R600 to use implicitarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276443	2016-07-22 17:24:24 +00:00
Jan Vesely	74f02db922	AMDGPU: Use clang intrinsics for workitem builtins v2: split into 2 patches use clang builtins for other intrinsics as well v3: Fix warnings Switch r600 to use implicitarg.ptr Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 276442	2016-07-22 17:24:20 +00:00
Jan Vesely	7846c9b8f0	ptx: Fix builtin names after clang r274770 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Acked-By: Aaron Watry <awatry@gmail.com> llvm-svn: 276423	2016-07-22 15:00:08 +00:00
Matt Arsenault	633d749da7	amdgpu: Use right builtin for rsq The r600 path has never actually worked since double is not implemented there. llvm-svn: 276009	2016-07-19 19:02:01 +00:00
Matt Arsenault	1ab0d9c1ee	R600: Use new barrier intrinsic llvm-svn: 275874	2016-07-18 18:42:17 +00:00
Matt Arsenault	b456c6dd56	Replace llvm.AMDGPU.ldexp with llvm.amdgcn.ldexp It didn't really work on r600 to begin with, which should get its own intrinsic. llvm-svn: 275813	2016-07-18 16:42:50 +00:00
Jan Vesely	e97deffb6a	configure: Remove device specific defines Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273044	2016-06-17 20:30:50 +00:00
Jan Vesely	5fd84d028d	nvptx: Drop feature defines. This is now handled by clang Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273043	2016-06-17 20:30:49 +00:00
Jan Vesely	3317f253de	64 bit integers are legal in full profile without an extension Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom@stellard.net> llvm-svn: 273042	2016-06-17 20:30:41 +00:00
Jan Vesely	973c1fa5f5	math: Use single precision fmax in sp path Fixes fdim piglit on Turks v2: use CL fmax instead of __builtin Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewed-by: Tom Stellard <tom.stellard@amd.com> llvm-svn: 269807	2016-05-17 19:44:01 +00:00
Jan Vesely	c374cb76f4	math: Add erf ported from amd-builtins The scalar float/double function bodies are a direct copy/paste, aside from the removed (optional) code in float function body that requires subnormals. reviewers: jvesely Patch by: Vedran Miletić <rivanvx@gmail.com> llvm-svn: 268766	2016-05-06 18:02:30 +00:00
Aaron Watry	55a8e0fd6d	math: Add fdim implementation Based on the amd-builtin, but explicitly vectorized for all sizes (not just float4), and includes a vectorized double implementation. Passes piglit (float) tests on pitcairn. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 268708	2016-05-06 03:34:45 +00:00
Tom Stellard	6cb18a09b1	prepare-builtins: Remove call to getGlobalContext() This function has been removed from LLVM. Patch By: Laurent Carlier llvm-svn: 266430	2016-04-15 14:18:58 +00:00

447 Commits