llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	69aed7c364	[X86] Make _xgetbv/_xsetbv on non-windows platforms Summary: This patch attempts to redo what was tried in r278783, but was reverted. These intrinsics should be available on non-windows platforms with "xsave" feature check. But on Windows platforms they shouldn't have feature check since that's how MSVC behaves. To accomplish this I've added a MS builtin with no feature check. And a normal gcc builtin with a feature check. When _MSC_VER is not defined _xgetbv/_xsetbv will be macros pointing to the gcc builtin name. I've moved the forward declarations from intrin.h to immintrin.h to match the MSDN documentation and used that as the header file for the MS builtin. I'm not super happy with this implementation, and I'm open to suggestions for better ways to do it. Reviewers: rnk, RKSimon, spatel Reviewed By: rnk Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D56686 llvm-svn: 351160	2019-01-15 05:03:18 +00:00
Mandeep Singh Grang	1b9337f16b	[COFF, ARM64] Add __byteswap intrinsics Reviewers: rnk, efriedma, ssijaric, TomTan, haripul Reviewed By: efriedma Subscribers: javed.absar, cfe-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D56685 llvm-svn: 351147	2019-01-15 01:26:26 +00:00
Mandeep Singh Grang	0055f80b3f	[COFF, ARM64] Add __nop intrinsic Reviewers: rnk, efriedma, TomTan, haripul, ssijaric Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, cfe-commits Differential Revision: https://reviews.llvm.org/D56671 llvm-svn: 351135	2019-01-14 23:26:01 +00:00
Craig Topper	49488407aa	[X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead. Fixes PR40259 llvm-svn: 351036	2019-01-14 08:46:51 +00:00
Craig Topper	bdbe5c7dc7	[X86] Make the pointer arguments to avx512 gather/scatter intrinsics 'void' to match gcc and Intel's documentation. The avx2 gather intrinsics are documented to use 'int', 'long long', 'float', or 'double' . So I'm leaving those. This matches gcc. llvm-svn: 350696	2019-01-09 07:36:01 +00:00
Craig Topper	cd9e232a4d	Recommit r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." The MSVC limit hit in AutoUpgrade.cpp has been worked around for now. llvm-svn: 350568	2019-01-07 21:00:41 +00:00
Craig Topper	33c9088783	Revert r350555 "[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins." Had to revert the LLVM patch this depends on to fix a MSVC compiler limit in AutoUpgrade.cpp llvm-svn: 350563	2019-01-07 19:39:25 +00:00
Craig Topper	e34f2bb807	[X86] Use funnel shift intrinsics for the VBMI2 vshld/vshrd builtins. Differential Revision: https://reviews.llvm.org/D56365 llvm-svn: 350555	2019-01-07 19:10:22 +00:00
Ulrich Weigand	22ca9c628a	[SystemZ] Fix wrong codegen caused by typos in vecintrin.h The following two bugs in SystemZ high-level vector intrinsics are fixes by this patch: - The float case of vec_insert_and_zero should generate a VLLEZF pattern, but currently erroneously generates VLLEZLF. - The float and double versions of vec_orc erroneously generate and-with-complement instead of or-with-complement. The patch also fixes a couple of typos in the associated test. llvm-svn: 349751	2018-12-20 13:09:09 +00:00
Craig Topper	1f2b181689	[Builltins][X86] Provide implementations of __lzcnt16, __lzcnt, __lzcnt64 for MS compatibility. Remove declarations from intrin.h and implementations from lzcntintrin.h intrin.h had forward declarations for these and lzcntintrin.h had implementations that were only available with -mlzcnt or a -march that supported the lzcnt feature. For MS compatibility we should always have these builtins available regardless of X86 being the target or the CPU support the lzcnt instruction. The backends should be able to gracefully fallback to something support even if its just shifts and bit ops. Unfortunately, gcc also implements 2 of the 3 function names here on X86 when lzcnt feature is enabled. This patch adds builtins for these for MSVC compatibility and drops the forward declarations from intrin.h. To keep the gcc compatibility the two intrinsics that collided have been turned into macros that use the X86 specific builtins with the lzcnt feature check. These macros are only defined when _MSC_VER is not defined. Without them being macros we can get a redefinition error because -ms-extensions doesn't seem to set _MSC_VER but does make the MS builtins available. Should fix PR40014 Differential Revision: https://reviews.llvm.org/D55677 llvm-svn: 349098	2018-12-14 00:21:02 +00:00
Craig Topper	6d7a7ef9eb	[X86] Remove the addcarry builtins. Leaving only the addcarryx builtins since that matches gcc. The addcarry and addcarryx builtins do the same thing. The only difference is that addcarryx previously required adx feature. This commit removes the adx feature check from addcarryx and removes the addcarry builtin. This matches the builtins that gcc has. We don't guarantee compatibility in builtins, but we generally try to be consistent if its not a burden. llvm-svn: 348738	2018-12-10 06:07:59 +00:00
Artem Belevich	43cfce88b4	[CUDA] Added missing 'inline' for functions defined in a header. llvm-svn: 348662	2018-12-07 22:20:53 +00:00
Stefan Granitz	5c1ec2e9cb	[CMake] Store path to vendor-specific headers in clang-headers target property Summary: LLDB.framework wants a copy these headers. With this change LLDB can easily glob for the list of files: ``` get_target_property(clang_include_dir clang-headers RUNTIME_OUTPUT_DIRECTORY) file(GLOB_RECURSE clang_vendor_headers RELATIVE ${clang_include_dir} "${clang_include_dir}/*") ``` By default `RUNTIME_OUTPUT_DIRECTORY` is unset for custom targets like `clang-headers`. Reviewers: aprantl, JDevlieghere, davide, friss, dexonsmith Reviewed By: JDevlieghere Subscribers: mgorny, #lldb, cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D55128 llvm-svn: 348116	2018-12-03 10:34:25 +00:00
Nemanja Ivanovic	2447baff84	[PowerPC] Vector load/store builtins overstate alignment of pointers A number of builtins in altivec.h load/store vectors from pointers to scalar types. Currently they just cast the pointer to a vector pointer, but expressions like that have the alignment of the target type. Of course, the input pointer did not have that alignment so this triggers UBSan (and rightly so). This resolves https://bugs.llvm.org/show_bug.cgi?id=39704 Differential revision: https://reviews.llvm.org/D54787 llvm-svn: 347556	2018-11-26 14:35:38 +00:00
Zi Xuan Wu	71c35e13c3	[PowerPC] [Clang] [AltiVec] The second parameter of vec_sr function should be modulo the number of bits in the element The second parameter of vec_sr function is representing shift bits and it should be modulo the number of bits in the element like what vec_sl does now. This is actually required by the ABI: Each element of the result vector is the result of logically right shifting the corresponding element of ARG1 by the number of bits specified by the value of the corresponding element of ARG2, modulo the number of bits in the element. The bits that are shifted out are replaced by zeros. Differential Revision: https://reviews.llvm.org/D54087 llvm-svn: 346471	2018-11-09 03:35:32 +00:00
Andrew Savonichev	3fee351867	[OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension Summary: Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt Patch by Kristina Bessonova Reviewers: Anastasia, yaxunl, shafik Reviewed By: Anastasia Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits Differential Revision: https://reviews.llvm.org/D51484 llvm-svn: 346392	2018-11-08 11:25:41 +00:00
Andrew Savonichev	3b12b7e702	Revert r346326 [OpenCL] Add support of cl_intel_device_side_avc_motion_estimation This patch breaks Index/opencl-types.cl LIT test: Script: -- : 'RUN: at line 1'; stage1/bin/c-index-test -test-print-type llvm/tools/clang/test/Index/opencl-types.cl -cl-std=CL2.0 \| stage1/bin/FileCheck llvm/tools/clang/test/Index/opencl-types.cl -- Command Output (stderr): -- llvm/tools/clang/test/Index/opencl-types.cl:3:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:4:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:8:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:11:8: error: declaring variable of type 'half' is not allowed llvm/tools/clang/test/Index/opencl-types.cl:15:3: error: use of type 'double' requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:16:3: error: use of type 'double4' (vector of 4 'double' values) requires cl_khr_fp64 extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:26:26: warning: unsupported OpenCL extension 'cl_khr_gl_msaa_sharing' - ignoring [-Wignored-pragmas] llvm/tools/clang/test/Index/opencl-types.cl:35:44: error: use of type '__read_only image2d_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:36:49: error: use of type '__read_only image2d_array_msaa_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:37:49: error: use of type '__read_only image2d_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm/tools/clang/test/Index/opencl-types.cl:38:54: error: use of type '__read_only image2d_array_msaa_depth_t' requires cl_khr_gl_msaa_sharing extension to be enabled llvm-svn: 346338	2018-11-07 18:34:19 +00:00
Andrew Savonichev	35dfce723c	[OpenCL] Add support of cl_intel_device_side_avc_motion_estimation extension Summary: Documentation can be found at https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_device_side_avc_motion_estimation.txt Patch by Kristina Bessonova Reviewers: Anastasia, yaxunl, shafik Reviewed By: Anastasia Subscribers: arphaman, sidorovd, AlexeySotkin, krisb, bader, asavonic, cfe-commits Differential Revision: https://reviews.llvm.org/D51484 llvm-svn: 346326	2018-11-07 15:44:01 +00:00
Reid Kleckner	902fa41188	[MS] Zero out ECX in __cpuid in intrin.h Summary: Some CPUID leafs depend on the value of ECX as well as EAX, but we left it uninitialized. Originally reported as https://crbug.com/901547 Reviewers: craig.topper, hans Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D54171 llvm-svn: 346265	2018-11-06 20:45:26 +00:00
Mandeep Singh Grang	574caddc0d	[COFF, ARM64] Implement InterlockedDecrement_ builtins This is eight in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54068 llvm-svn: 346208	2018-11-06 05:07:43 +00:00
Mandeep Singh Grang	fdf74d9751	[COFF, ARM64] Implement InterlockedIncrement_ builtins This is seventh in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54067 llvm-svn: 346207	2018-11-06 05:05:32 +00:00
Mandeep Singh Grang	c89157b5c1	[COFF, ARM64] Implement InterlockedAnd_ builtins This is sixth in a series of patches to move intrinsic definitions out of intrin.h. Differential: https://reviews.llvm.org/D54066 llvm-svn: 346206	2018-11-06 05:03:13 +00:00
Mandeep Singh Grang	806f10701b	[COFF, ARM64] Implement InterlockedXor_ builtins This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Note: This was reviewed and approved in D54065 but somehow that diff was messed up. Committing this again with the proper diff. llvm-svn: 346205	2018-11-06 04:55:20 +00:00
Mandeep Singh Grang	d9f70b1495	Revert "[COFF, ARM64] Implement InterlockedXor_ builtins" This reverts commit cc3d3cd0fbeb88412d332354c261ff139c4ede6b. llvm-svn: 346192	2018-11-06 01:14:24 +00:00
Mandeep Singh Grang	d8a4455d97	[COFF, ARM64] Implement InterlockedXor_ builtins Summary: This is fifth in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54065 llvm-svn: 346191	2018-11-06 01:12:29 +00:00
Mandeep Singh Grang	ec62b31e2c	[COFF, ARM64] Implement InterlockedOr_ builtins This is fourth in a series of patches to move intrinsic definitions out of intrin.h. llvm-svn: 346190	2018-11-06 01:11:25 +00:00
Mandeep Singh Grang	6b880689f0	[COFF, ARM64] Implement InterlockedCompareExchange_ builtins Summary: This is third in a series of patches to move intrinsic definitions out of intrin.h. Reviewers: rnk, efriedma, mstorsjo, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54062 llvm-svn: 346189	2018-11-06 00:36:48 +00:00
Mandeep Singh Grang	7fa07e554d	[COFF, ARM64] Implement InterlockedExchange_ builtins Summary: Windows SDK needs these intrinsics to be proper builtins. This is second in a series of patches to move intrinsic defintions out of intrin.h. Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: rnk, efriedma Subscribers: javed.absar, kristof.beyls, chrib, jfb, kristina, cfe-commits Differential Revision: https://reviews.llvm.org/D54046 llvm-svn: 346044	2018-11-02 21:18:23 +00:00
Eli Friedman	b262d1631e	[ARM64] [Windows] Implement _InterlockedExchangeAdd_ builtins. These apparently need to be proper builtins to handle the Windows SDK. Differential Revision: https://reviews.llvm.org/D53916 llvm-svn: 345779	2018-10-31 21:31:09 +00:00
Andrew Savonichev	70b6bbe2bb	[OpenCL] Remove PIPE_RESERVE_ID_VALID_BIT from opencl-c.h Summary: PIPE_RESERVE_ID_VALID_BIT is implementation defined, so lets not keep it in the header. Previously the topic was discussed here: https://reviews.llvm.org/D32896 Reviewers: Anastasia, yaxunl Reviewed By: Anastasia Subscribers: cfe-commits, asavonic, bader Differential Revision: https://reviews.llvm.org/D52658 llvm-svn: 345051	2018-10-23 17:05:29 +00:00
Andrew Savonichev	700c3bea9e	[OpenCL] Add cl_intel_planar_yuv extension Just adding a preprocessor #define for the extension. Patch by Alexey Sotkin and Dmitry Sidorov Phabricator review: https://reviews.llvm.org/D51402 llvm-svn: 345044	2018-10-23 16:13:16 +00:00
Craig Topper	eae26bf737	[X86] Add more intrinsics to match icc. This adds _mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8 _mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16 _mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8 _mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16 llvm-svn: 344862	2018-10-20 19:28:52 +00:00
Craig Topper	58508be3c0	[X86] Add missing intrinsics to match icc. This adds _mm_and_epi32, _mm_and_epi64 _mm_andnot_epi32, _mm_andnot_epi64 _mm_or_epi32, _mm_or_epi64 _mm_xor_epi32, _mm_xor_epi64 _mm256_and_epi32, _mm256_and_epi64 _mm256_andnot_epi32, _mm256_andnot_epi64 _mm256_or_epi32, _mm256_or_epi64 _mm256_xor_epi32, _mm256_xor_epi64 _mm_loadu_epi32, _mm_loadu_epi64 _mm_load_epi32, _mm_load_epi64 _mm256_loadu_epi32, _mm256_loadu_epi64 _mm256_load_epi32, _mm256_load_epi64 _mm512_loadu_epi32, _mm512_loadu_epi64 _mm512_load_epi32, _mm512_load_epi64 _mm_storeu_epi32, _mm_storeu_epi64 _mm_store_epi32, _mm_load_epi64 _mm256_storeu_epi32, _mm256_storeu_epi64 _mm256_store_epi32, _mm256_load_epi64 _mm512_storeu_epi32, _mm512_storeu_epi64 _mm512_store_epi32,V _mm512_load_epi64 llvm-svn: 344861	2018-10-20 19:28:50 +00:00
Mandeep Singh Grang	2147b1af95	[COFF, ARM64] Add _ReadStatusReg and_WriteStatusReg intrinsics Reviewers: rnk, compnerd, mstorsjo, efriedma, TomTan, haripul, javed.absar Reviewed By: efriedma Subscribers: dmajor, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D53115 llvm-svn: 344765	2018-10-18 23:35:35 +00:00
Mandeep Singh Grang	df7929676d	[COFF, ARM64] Add _InterlockedAdd intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D52811 llvm-svn: 343894	2018-10-05 21:57:41 +00:00
Mandeep Singh Grang	ecc82ef0c2	[COFF, ARM64] Add __getReg intrinsic Reviewers: rnk, mstorsjo, compnerd, TomTan, haripul, javed.absar, efriedma Reviewed By: efriedma Subscribers: peter.smith, efriedma, kristof.beyls, chrib, cfe-commits Differential Revision: https://reviews.llvm.org/D52838 llvm-svn: 343824	2018-10-04 22:32:42 +00:00
Matt Arsenault	a01151294a	OpenCL: Mark printf format string argument Fixes not warning on format string errors. llvm-svn: 343653	2018-10-03 02:01:19 +00:00
Craig Topper	716e8e6858	[X86] Add more of the icc unaligned load/store to/from 128 bit vector intrinsics Summary: This patch adds _mm_loadu_si32 _mm_loadu_si16 _mm_storeu_si64 _mm_storeu_si32 _mm_storeu_si16 We already had _mm_load_si64. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D52665 llvm-svn: 343388	2018-09-29 17:49:42 +00:00
Craig Topper	6ad9220067	[X86] Add the movbe instruction intrinsics from icc. These intrinsics exist in icc. They can be found on the Intel Intrinsics Guide website. All the backend support is in place to pattern match a load+bswap or a bswap+store pattern to the MOVBE instructions. So we just need to get the frontend to emit the correct IR. The pointer arguments in icc are declared as void so I had to jump through a packed struct to forcing a specific alignment on the load/store. Same trick we use in the unaligned vector load/store intrinsics Differential Revision: https://reviews.llvm.org/D52586 llvm-svn: 343343	2018-09-28 17:09:51 +00:00
Craig Topper	fb5d9f2849	[X86] For lzcnt/tzcnt intrinsics use cttz/ctlz intrinsics with zero_undef flag set to false. Previously we used a select and the zero_undef=true intrinsic. In -O2 this pattern will get optimized to zero_undef=false. But in -O0 this optimization won't happen. This results in a compare and cmov being wrapped around a tzcnt/lzcnt instruction. By using the zero_undef=false intrinsic directly without the select, we can improve the -O0 codegen to just an lzcnt/tzcnt instruction. Differential Revision: https://reviews.llvm.org/D52392 llvm-svn: 343126	2018-09-26 17:01:44 +00:00
Artem Belevich	44ecb0e3c2	[CUDA] Added basic support for compiling with CUDA-10.0 llvm-svn: 342924	2018-09-24 23:10:44 +00:00
Craig Topper	d88f76a891	[X86] Add ktest intrinsics to match gcc and icc. These aren't documented in the Intel Intrinsics Guide, but are supported by gcc and icc. Includes these intrinsics: _ktestc_mask8_u8, _ktestz_mask8_u8, _ktest_mask8_u8 _ktestc_mask16_u8, _ktestz_mask16_u8, _ktest_mask16_u8 _ktestc_mask32_u8, _ktestz_mask32_u8, _ktest_mask32_u8 _ktestc_mask64_u8, _ktestz_mask64_u8, _ktest_mask64_u8 llvm-svn: 341265	2018-08-31 22:29:56 +00:00
Craig Topper	42a4d0822e	[X86] Add k-mask conversion and load/store instrinsics to match gcc and icc. This adds: _cvtmask8_u32, _cvtmask16_u32, _cvtmask32_u32, _cvtmask64_u64 _cvtu32_mask8, _cvtu32_mask16, _cvtu32_mask32, _cvtu64_mask64 _load_mask8, _load_mask16, _load_mask32, _load_mask64 _store_mask8, _store_mask16, _store_mask32, _store_mask64 These are currently missing from the Intel Intrinsics Guide webpage. llvm-svn: 341251	2018-08-31 20:41:06 +00:00
Craig Topper	2aa8efc820	[X86] Add kshift intrinsics to match gcc and icc. This adds the following intrinsics: _kshiftli_mask8 _kshiftli_mask16 _kshiftli_mask32 _kshiftli_mask64 _kshiftri_mask8 _kshiftri_mask16 _kshiftri_mask32 _kshiftri_mask64 llvm-svn: 341234	2018-08-31 18:22:52 +00:00
Craig Topper	a65bf65e0b	[X86] Add kadd intrinsics to match gcc and icc. This adds the following intrinsics: _kadd_mask64 _kadd_mask32 _kadd_mask16 _kadd_mask8 These are missing from the Intel Intrinsics Guide, but are implemented by both gcc and icc. llvm-svn: 340879	2018-08-28 22:32:14 +00:00
Craig Topper	cb5fd56c7f	[X86] Add kortest intrinsics for 8, 32, and 64 bit masks. Add new intrinsic names for 16 bit masks. This matches gcc and icc despite not being documented in the Intel Intrinsics Guide. llvm-svn: 340798	2018-08-28 06:28:25 +00:00
Craig Topper	c330ca8611	[X86] Add intrinsics for kand/kandn/knot/kor/kxnor/kxor with 8, 32, and 64-bit mask registers. This also adds a second intrinsic name for the 16-bit mask versions. These intrinsics match gcc and icc. They just aren't published in the Intel Intrinsics Guide so I only recently found they existed. llvm-svn: 340719	2018-08-27 06:20:22 +00:00
Craig Topper	9a022280b5	[X86] Remove min_vector_width 512 from some intrinsics that operate only on k-registers. llvm-svn: 340718	2018-08-27 06:20:20 +00:00
Craig Topper	e0b5d4cd9d	[X86] Rename __DEFAULT_FN_ATTRS to a__DEFAULT_FN_ATTRS512 in avx512dqintrin.h and avx512bwintrin.h. This is preparation for adding removing min_vector_width 512 from some intrinsics. llvm-svn: 340717	2018-08-27 06:20:19 +00:00
Craig Topper	266b858705	[X86] Undef __DEFAULT_FN_ATTRS in avx512fintrin.h. Fixes test failure after r340713 llvm-svn: 340714	2018-08-27 05:44:45 +00:00
Craig Topper	f821f5314e	[X86] Don't set min_vector_width to 512 on intrinsics that only operate on k registers. llvm-svn: 340713	2018-08-27 05:27:15 +00:00
Nico Weber	b2c53d3393	Make __shiftleft128 / __shiftright128 real compiler built-ins. r337619 added __shiftleft128 / __shiftright128 as functions in intrin.h. Microsoft's STL plans on using these functions, and they're using intrin0.h which just has declarations of built-ins to not pull in the huge intrin.h header in the standard library headers. That requires that these functions are real built-ins. https://reviews.llvm.org/D50907 llvm-svn: 340048	2018-08-17 17:19:06 +00:00
Craig Topper	72a7606433	[X86] Remove masking from the 512-bit paddus/psubus builtins. Use a select builtin instead. llvm-svn: 339845	2018-08-16 07:28:06 +00:00
Craig Topper	0609d1e211	[X86] Remove masking from the 512-bit padds and psubs builtins. Use select builtin instead. llvm-svn: 339843	2018-08-16 06:20:29 +00:00
Pirama Arumuga Nainar	3c1a7bc290	[Headers] Define _HAS_SUBNORM for FLT, DBL, LDBL Summary: These macros are defined in the C11 standard and can be defined based on the ___HAS_DENORM__ default macros. Reviewers: bruno, rsmith, doug.gregor Subscribers: llvm-commits, enh, srhines Differential Revision: https://reviews.llvm.org/D37302 llvm-svn: 339284	2018-08-08 20:38:38 +00:00
Martin Storsjo	1662647995	[Headers] Expand _Unwind_Exception for SEH on MinGW/x86_64 This matches how GCC defines this struct. Differential Revision: https://reviews.llvm.org/D50380 llvm-svn: 339170	2018-08-07 20:02:40 +00:00
Louis Dionne	58529c3f57	[clang] Fix broken include_next in float.h Summary: The code defines __FLOAT_H and then includes the next <float.h>, which is guarded on __FLOAT_H so it gets skipped entirely. This commit uses the header guard __CLANG_FLOAT_H, like other headers (such as limits.h) do. Reviewers: jfb Subscribers: dexonsmith, cfe-commits Differential Revision: https://reviews.llvm.org/D50276 llvm-svn: 339016	2018-08-06 14:29:47 +00:00
Fangrui Song	6907ce2f8f	Remove trailing space sed -Ei 's/[[:space:]]+$//' include/*/.{def,h,td} lib/*/.{cpp,h} llvm-svn: 338291	2018-07-30 19:24:48 +00:00
Nico Weber	e5898c1911	[ms] Add __shiftleft128 / __shiftright128 intrinsics Carefully match the pattern matched by ISel so that this produces shld / shrd (unless Subtarget->isSHLDSlow() is true). Thanks to Craig Topper for providing the LLVM IR pattern that gets successfully matched. Fixes PR37755. llvm-svn: 337619	2018-07-20 21:02:09 +00:00
Artem Belevich	fa07bb646c	[CUDA] Provide integer SIMD functions for CUDA-9.2 CUDA-9.2 made all integer SIMD functions into compiler builtins, so clang no longer has access to the implementation of these functions in either headers of libdevice and has to provide its own implementation. This is mostly a 1:1 mapping to a corresponding PTX instructions with an exception of vhadd2/vhadd4 that don't have an equivalent instruction and had to be implemented with a bit hack. Performance of this implementation will be suboptimal for SM_50 and newer GPUs where PTXAS generates noticeably worse code for the SIMD instructions compared to the code it generates for the inline assembly generated by nvcc (or used to come with CUDA headers). Differential Revision: https://reviews.llvm.org/D49274 llvm-svn: 337587	2018-07-20 17:44:34 +00:00
Mandeep Singh Grang	0054f48b44	[COFF] Add more missing MSVC ARM64 intrinsics Summary: Added the following intrinsics: _BitScanForward, _BitScanReverse, _BitScanForward64, _BitScanReverse64 _InterlockedAnd64, _InterlockedDecrement64, _InterlockedExchange64, _InterlockedExchangeAdd64, _InterlockedExchangeSub64, _InterlockedIncrement64, _InterlockedOr64, _InterlockedXor64. Reviewers: compnerd, mstorsjo, rnk, javed.absar Reviewed By: mstorsjo Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49445 llvm-svn: 337327	2018-07-17 22:03:24 +00:00
Eric Christopher	83225b4075	Remove unnecessary trailing ; in macro intrinsic definition. llvm-svn: 337321	2018-07-17 20:22:17 +00:00
Mikhail Dvoretckii	d1bf9ef0c7	[X86] Lowering integer truncation intrinsics to native IR This patch lowers the _mm[256\|512]_cvtepi{64\|32\|16}_epi{32\|16\|8} intrinsics to native IR in cases where the result's length is less than 128 bits. The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead Differential Revision: https://reviews.llvm.org/D48712 llvm-svn: 336643	2018-07-10 08:22:44 +00:00
Craig Topper	36ab775cc1	[X86] Use masked the masked scalar fma builtins to implement the default rounding version of the fma intrinsics. The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call. Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp. llvm-svn: 336637	2018-07-10 04:38:29 +00:00
Craig Topper	638426fc36	[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics. This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics. This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling. I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle. llvm-svn: 336622	2018-07-10 00:37:25 +00:00
Craig Topper	74c10e3236	[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter. Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible. To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future. To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin. To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute. To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins. There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch. Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue. Differential Revision: https://reviews.llvm.org/D48617 llvm-svn: 336583	2018-07-09 19:00:16 +00:00
Craig Topper	0a485d13a4	[X86] Remove some unnecessarily escaped new lines from avx512fintrin.h llvm-svn: 336499	2018-07-07 22:03:19 +00:00
Craig Topper	3e720a302c	[X86] Fix a few intrinsics that were ignoring their rounding mode argument and hardcoded _MM_FROUND_CUR_DIRECTION internally. I believe these have been broken since their introduction into clang. I've enhanced the tests for these intrinsics to using a real rounding mode and checking all the intrinsic arguments instead of just the name. llvm-svn: 336498	2018-07-07 22:03:16 +00:00
Craig Topper	218da62091	[X86] Change _mm512_shuffle_pd and _mm512_shuffle_ps to use target specific shuffle builtins instead of generic __builtin_shufflevector. I added the builtins for 128, 256, and 512 bits recently but looks like I failed to convert to using the 512 bit one. llvm-svn: 336488	2018-07-07 17:03:34 +00:00
Craig Topper	5cbeeedd27	[X86] Fix various type mismatches in intrinsic headers and intrinsic tests that cause extra bitcasts to be emitted in the IR. Found via imprecise grepping of the -O0 IR. There could still be more bugs out there. llvm-svn: 336487	2018-07-07 17:03:32 +00:00
Craig Topper	10f20fc42b	[X86] Add missing scalar fma intrinsics with rounding, but no mask. We had the mask versions of the rounding intrinsics, but not one without masking. Also change the rounding tests to not use the CUR_DIRECTION rounding mode. llvm-svn: 336470	2018-07-06 22:08:43 +00:00
Craig Topper	0029470dde	[X86] Correct the width of mask arguments in intrinsic headers and tests. All of these found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there. Some of these were real bugs where we lost bits from the user input: _mm512_mask_broadcast_f32x8 _mm512_maskz_broadcast_f32x8 _mm512_mask_broadcast_i32x8 _mm512_maskz_broadcast_i32x8 _mm256_mask_cvtusepi16_storeu_epi8 llvm-svn: 336042	2018-06-30 06:05:17 +00:00
Craig Topper	0e9de769a0	[X86] Remove masking from the avx512 rotate builtins. Use a select builtin instead. llvm-svn: 336036	2018-06-30 01:32:14 +00:00
Justin Lebar	2a192abaec	[CUDA] Make __host__/__device__ min/max overloads constexpr in C++14. Summary: Tests in a separate change to the test-suite. Reviewers: rsmith, tra Subscribers: lahwaacz, sanjoy, cfe-commits Differential Revision: https://reviews.llvm.org/D48151 llvm-svn: 336026	2018-06-29 22:28:09 +00:00
Justin Lebar	5cb41c2acf	[CUDA] Make min/max shims host+device. Summary: Fixes PR37753: min/max can't be called from __host__ __device__ functions in C++14 mode. Testcase in a separate test-suite commit. Reviewers: rsmith Subscribers: sanjoy, lahwaacz, cfe-commits Differential Revision: https://reviews.llvm.org/D48036 llvm-svn: 336025	2018-06-29 22:27:56 +00:00
Craig Topper	8bf793fb35	[X86] Remove masking from the avx512 packed sqrt builtins. Use select builtins instead. llvm-svn: 335945	2018-06-29 05:43:33 +00:00
Craig Topper	1763dbb278	[X86] Correct the inline assembly implementations of __movsb/w/d/q and __stosw/d/q to mark registers/memory as modified The inline assembly for these didn't mark that edi, esi, ecx are modified by movs/stos instruction. It also didn't mark that memory is modified. This issue was reported to llvm-dev last year http://lists.llvm.org/pipermail/cfe-dev/2017-November/055863.html but no bug was ever filed. Differential Revision: https://reviews.llvm.org/D48448 llvm-svn: 335270	2018-06-21 18:56:30 +00:00
Craig Topper	b2431c6c33	[Intrinsics] Add/move some builtin declarations in intrin.h to get ms-intrinsics.c to not issue warnings ud2 and int2c were missing declarations entirely. And the bitscans were only under x86_64, but they seem to be in BuiltinsARM.def as well and are tested by ms_intrinsics.c Differential Revision: https://reviews.llvm.org/D48187 llvm-svn: 335259	2018-06-21 17:07:04 +00:00
Craig Topper	ddfe69cc99	[X86] Rewrite the add/mul/or/and reduction intrinsics to make better use of other intrinsics and remove undef shuffle indices. Similar to what was done to max/min recently. These already reduced the vector width to 256 and 128 bit as we go unlike the original max/min code. Differential Revision: https://reviews.llvm.org/D48346 llvm-svn: 335253	2018-06-21 16:41:28 +00:00
Craig Topper	2da60bc231	[X86] Remove masking from the 512-bit floating point max/min builtins. Use select in IR instead. llvm-svn: 335200	2018-06-21 05:01:01 +00:00
Craig Topper	d334615c23	[X86] Undefine _mm512_mask_reduce_operator macro in avx512fintrin.h before redefining it. llvm-svn: 335086	2018-06-20 00:31:39 +00:00
Craig Topper	61495b3650	Recommit r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible."" Test has been updated to reflect the IRGen. llvm-svn: 335075	2018-06-19 21:00:30 +00:00
Craig Topper	79b13a0348	Revert r335070 "[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible." The test changes are failing the buildbot and its going to take me some time to fix it. llvm-svn: 335072	2018-06-19 19:37:07 +00:00
Craig Topper	873afd0262	[X86] Rewrite the max and min reduction intrinsics to make better use of other functions and to reduce width to 256 and 128 bits were possible. We only need to use 512 bit vectors all the way through v8i64 reductions since those max instructions are new to avx512f and only available in 512 bits until SKX. For v16i32 and floating point we have legacy 128/256 bit instructions we can use. I've tried to use other intrinsics to reduce the verbosity of the code and avoid having to mention all the shuffles. I've also removed all the -1 shuffle indices so the output sequence is fully specified and not left to backend optimization. Differential Revision: https://reviews.llvm.org/D47401 llvm-svn: 335070	2018-06-19 19:13:54 +00:00
Craig Topper	31730ae761	[X86] Rename __builtin_ia32_pslldqi128 to __builtin_ia32_pslldqi128_byteshift and similar for other sizes. Remove the multiply by 8 from the header files. The previous names took the shift amount in bits to match gcc and required a multiply by 8 in the header. This creates a misleading error message when we check the range of the immediate to the builtin since the allowed range also got multiplied by 8. This commit changes the builtins to use a byte shift amount to match the underlying instruction and the Intel intrinsic. Fixes the remaining issue from PR37795. llvm-svn: 334773	2018-06-14 22:02:35 +00:00
Craig Topper	b521dc3acf	[X86] Add inline assembly versions of _InterlockedExchange_HLEAcquire/Release and _InterlockedCompareExchange_HLEAcquire/Release for MSVC compatibility. Clang/LLVM doesn't have a way to pass an HLE hint through to the X86 backend to emit HLE prefixed instructions. So this is a good short term fix. Differential Revision: https://reviews.llvm.org/D47672 llvm-svn: 334751	2018-06-14 18:43:52 +00:00
Tomasz Krupa	82aa42af49	[X86] Lowering Mask Scalar intrinsics to native IR (Clang part) Summary: Lowering add, sub, mul, and div mask scalar intrinsic calls to native IR. Reviewers: craig.topper, RKSimon, spatel, sroland Reviewed By: craig.topper Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D47979 llvm-svn: 334741	2018-06-14 17:36:23 +00:00
Craig Topper	2527c378c6	[X86] Remove masking from avx512vbmi2 concat and shift by immediate builtins. Use select builtins instead. llvm-svn: 334577	2018-06-13 07:19:28 +00:00
Craig Topper	91bbe98757	[X86] Remove masking from dbpsadbw builtins, use select builtin instead. llvm-svn: 334385	2018-06-11 06:18:29 +00:00
Craig Topper	3614b41a8e	[X86] Remove masking from the 512-bit packed floating point add/sub/mul/div builtins. Use select in IR instead. llvm-svn: 334359	2018-06-10 06:01:42 +00:00
Craig Topper	88097d9355	[X86] Add back some masked vector truncate builtins. Custom IRgen a a few others. I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl. By using special buitlins we can emit a select without using the 128/256-bit select builtins. llvm-svn: 334331	2018-06-08 21:50:08 +00:00
Craig Topper	5f50f33806	[X86] Fold masking into subvector extract builtins. I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features. The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported. llvm-svn: 334330	2018-06-08 21:50:07 +00:00
Craig Topper	03f4f04b91	[X86] Add builtins for vpermq/vpermpd instructions to enable target feature checking. llvm-svn: 334311	2018-06-08 18:00:25 +00:00
Craig Topper	9d3962f4f1	[X86] Change immediate type for some builtins from char to int. These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec. llvm-svn: 334310	2018-06-08 18:00:22 +00:00
Craig Topper	422a1bbb84	[X86] Add builtins for shufps and shufpd to enable target feature and immediate range checking. llvm-svn: 334266	2018-06-08 07:18:33 +00:00
Craig Topper	03de166ccd	[X86] Add builtins for pshufd, pshuflw, and pshufhw to enable target feature and immediate range checking. llvm-svn: 334265	2018-06-08 06:13:16 +00:00
Craig Topper	573dab1553	[X86] Fix some typecasts in intrinsic headers that I messed up in r334261. This was caught by the Header tests, but not the CodeGen tests. llvm-svn: 334264	2018-06-08 04:09:14 +00:00
Craig Topper	3428beeb2f	[X86] Add subvector insert and extract builtins to enable target feature checking and immediate range checking. Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc. llvm-svn: 334261	2018-06-08 03:24:47 +00:00
Craig Topper	acf5601961	[X86] Add builtins for vpermilps/pd instructions to enable target feature checking. llvm-svn: 334256	2018-06-08 00:59:27 +00:00
Craig Topper	7d17d7278b	[X86] Add builtins for blend with immediate control to enforce target feature requirements and check immediate range. llvm-svn: 334249	2018-06-08 00:00:21 +00:00

1 2 3 4 5 ...

1530 Commits