llvm-project

Commit Graph

Author	SHA1	Message	Date
Oliver Stannard	2fcee8bd52	[ARM,AArch64] Add intrinsics for dot product instructions The ACLE spec which describes these intrinsics hasn't been published yet, but this is based on the final draft which will be published soon, and these have already been implemented by GCC. Differential revision: https://reviews.llvm.org/D46109 llvm-svn: 331039	2018-04-27 14:03:32 +00:00
Sven van Haastregt	4700faa28e	[OpenCL] Add separate read_only and write_only pipe IR types SPIR-V encodes the read_only and write_only access qualifiers of pipes, so separate LLVM IR types are required to target SPIR-V. Other backends may also find this useful. These new types are `opencl.pipe_ro_t` and `opencl.pipe_wo_t`, which replace `opencl.pipe_t`. This replaces __get_pipe_num_packets(...) and __get_pipe_max_packets(...) which took a read_only pipe with separate versions for read_only and write_only pipes, namely: * __get_pipe_num_packets_ro(...) * __get_pipe_num_packets_wo(...) * __get_pipe_max_packets_ro(...) * __get_pipe_max_packets_wo(...) These separate versions exist to avoid needing a bitcast to one of the two qualified pipe types. Patch by Stuart Brady. Differential Revision: https://reviews.llvm.org/D46015 llvm-svn: 331026	2018-04-27 10:37:04 +00:00
Chandler Carruth	16429acacb	[x86] Revert r330322 (& r330323): Lowering x86 adds/addus/subs/subus intrinsics The LLVM commit introduces a crash in LLVM's instruction selection. I filed http://llvm.org/PR37260 with the test case. llvm-svn: 330997	2018-04-26 21:46:01 +00:00
Alexander Ivchenko	d96ddccdb4	Lowering x86 adds/addus/subs/subus intrinsics (clang) This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. Patch by tkrupa Differential Revision: https://reviews.llvm.org/D44786 llvm-svn: 330323	2018-04-19 12:15:11 +00:00
Artem Belevich	0ae8590354	[NVPTX, CUDA] Added support for m8n32k16 and m32n8k16 variants of wmma instructions. The new instructions were added added for sm_70+ GPUs in CUDA-9.1. Differential Revision: https://reviews.llvm.org/D45068 llvm-svn: 330296	2018-04-18 21:51:48 +00:00
Keith Wyss	f437e35671	[XRay] Add clang builtin for xray typed events. Summary: A clang builtin for xray typed events. Differs from __xray_customevent(...) by the presence of a type tag that is vended by compiler-rt in typical usage. This allows xray handlers to expand logged events with their type description and plugins to process traced events based on type. This change depends on D45633 for the intrinsic definition. Reviewers: dberris, pelikan, rnk, eizan Subscribers: cfe-commits, llvm-commits Differential Revision: https://reviews.llvm.org/D45716 llvm-svn: 330220	2018-04-17 21:32:43 +00:00
Aaron Ballman	fe93546b11	Add modifiers for unsigned char and signed char field printing for __builtin_dump_struct. Patch by Paul Semel. llvm-svn: 330188	2018-04-17 14:00:06 +00:00
Aaron Ballman	b6a7702297	Add checks for format specifiers used by __builtin_dump_struct and added a new specifier for null-terminated constant strings. Patch by Paul Semel. llvm-svn: 330185	2018-04-17 11:57:47 +00:00
Ivan A. Kosarev	9cdb2c75d9	[NEON] Support vrndns_f32 intrinsic Differential Revision: https://reviews.llvm.org/D45515 llvm-svn: 330012	2018-04-13 12:46:02 +00:00
Dean Michael Berris	488f7c2b67	[XRay][clang] Add flag to choose instrumentation bundles Summary: This change addresses http://llvm.org/PR36926 by allowing users to pick which instrumentation bundles to use, when instrumenting with XRay. In particular, the flag `-fxray-instrumentation-bundle=` has four valid values: - `all`: the default, emits all instrumentation kinds - `none`: equivalent to -fnoxray-instrument - `function`: emits the entry/exit instrumentation - `custom`: emits the custom event instrumentation These can be combined either as comma-separated values, or as repeated flag values. Reviewers: echristo, kpw, eizan, pelikan Reviewed By: pelikan Subscribers: mgorny, cfe-commits Differential Revision: https://reviews.llvm.org/D44970 llvm-svn: 329985	2018-04-13 02:31:58 +00:00
Aaron Ballman	0652534131	Introduce a new builtin, __builtin_dump_struct, that is useful for dumping structure contents at runtime in circumstances where debuggers may not be easily available (such as in kernel work). Patch by Paul Semel. llvm-svn: 329762	2018-04-10 21:58:13 +00:00
Craig Topper	304edc1e75	[X86] Emit native IR for pmuldq/pmuludq builtins. I believe all the pieces are now in place in the backend to make this work correctly. We can either mask the input to 32 bits for pmuludg or shl/ashr for pmuldq and use a regular mul instruction. The backend should combine this to PMULUDQ/PMULDQ and then SimplifyDemandedBits will remove the and/shifts. Differential Revision: https://reviews.llvm.org/D45421 llvm-svn: 329605	2018-04-09 19:17:54 +00:00
Alexander Kornienko	2a8c18d991	Fix typos in clang Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399	2018-04-06 15:14:32 +00:00
Krzysztof Parzyszek	49fb6b5ecf	[Hexagon] Remove default values from lambda parameters llvm-svn: 329394	2018-04-06 13:51:48 +00:00
Gor Nishanov	2a78fa5209	[coroutines] Add __builtin_coro_noop => llvm.coro.noop A recent addition to Coroutines TS (https://wg21.link/p0913) adds a pre-defined coroutine noop_coroutine that does nothing. To implement this feature, we implemented an llvm.coro.noop intrinsic that returns a coroutine handle to a coroutine that does nothing when resumed or destroyed. This patch adds a builtin __builtin_coro_noop() that maps to llvm.coro.noop intrinsic. Related llvm change: https://reviews.llvm.org/D45114 llvm-svn: 328993	2018-04-02 17:35:37 +00:00
Krzysztof Parzyszek	790e422be9	[Hexagon] Aid bit-reverse load intrinsics lowering with bitcode The conversion of operatios to bitcode helps to eliminate an additional store in certain cases. We used to lower these load intrinsics in DAG to DAG conversion by which time, the "Dead Store Elimination" pass is already run. There is an associated LLVM patch. Patch by Sumanth Gundapaneni. llvm-svn: 328776	2018-03-29 13:54:31 +00:00
Krzysztof Parzyszek	1ef2a1f414	[Hexagon] Add support for "new" circular buffer intrinsics These instructions have been around for a long time, but we haven't supported intrinsics for them. The "new" vesrions use the CSx register for the start of the buffer instead of the K field in the Mx register. There is a related llvm patch. Patch by Brendon Cahoon. llvm-svn: 328725	2018-03-28 19:40:57 +00:00
Abderrazek Zaafrani	b5ac56fb81	[ARM] Add ARMv8.2-A FP16 vector intrinsic Putting back the code in commit r327189 that was reverted in r322737. The code is being committed in three stages and this one is the last stage: 1) r327455 fp16 feature flags, 2) r327836 pass half type or i16 based on FullFP16, and 3) the code here which the front-end fp16 vector intrinsic for ARM. Differential revision https://reviews.llvm.org/D43650 llvm-svn: 328277	2018-03-23 00:08:40 +00:00
Artem Belevich	30512869ff	[NVPTX] Make tensor shape part of WMMA intrinsic's name. This is needed for the upcoming implementation of the new 8x32x16 and 32x8x16 variants of WMMA instructions introduced in CUDA 9.1. Differential Revision: https://reviews.llvm.org/D44719 llvm-svn: 328158	2018-03-21 21:55:02 +00:00
Eric Fiselier	fa752f23cc	[Builtins] Overload __builtin_operator_new/delete to allow forwarding to usual allocation/deallocation functions. Summary: Libc++'s default allocator uses `__builtin_operator_new` and `__builtin_operator_delete` in order to allow the calls to new/delete to be ellided. However, libc++ now needs to support over-aligned types in the default allocator. In order to support this without disabling the existing optimization Clang needs to support calling the aligned new overloads from the builtins. See llvm.org/PR22634 for more information about the libc++ bug. This patch changes `__builtin_operator_new`/`__builtin_operator_delete` to call any usual `operator new`/`operator delete` function. It does this by performing overload resolution with the arguments passed to the builtin to determine which allocation function to call. If the selected function is not a usual allocation function a diagnostic is issued. One open issue is if the `align_val_t` overloads should be considered "usual" when `LangOpts::AlignedAllocation` is disabled. In order to allow libc++ to detect this new behavior the value for `__has_builtin(__builtin_operator_new)` has been updated to `201802`. Reviewers: rsmith, majnemer, aaron.ballman, erik.pilkington, bogner, ahatanak Reviewed By: rsmith Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D43047 llvm-svn: 328134	2018-03-21 19:19:48 +00:00
Abderrazek Zaafrani	585051ae74	[AArch64] Add vmulxh_lane fp16 vector intrinsic https://reviews.llvm.org/D44591 llvm-svn: 328038	2018-03-20 20:37:31 +00:00
Artem Belevich	914d4babec	[NVPTX] Make tensor load/store intrinsics overloaded. This way we can support address-space specific variants without explicitly encoding the space in the name of the intrinsic. Less intrinsics to deal with -> less boilerplate. Added a bit of tablegen magic to match/replace an intrinsics with a pointer argument in particular address space with the space-specific instruction variant. Updated tests to use non-default address spaces. Differential Revision: https://reviews.llvm.org/D43268 llvm-svn: 328006	2018-03-20 17:18:59 +00:00
Sjoerd Meijer	87793e7599	[ARM] Pass half or i16 types for NEON intrinsics For generating NEON intrinsics, this determines the NEON data type, and whether it should be a half type or an i16 type. I.e., we always pass a half type for AArch64, this hasn't changed, but now also for ARM but only when FullFP16 is enabled, and i16 otherwise. This is intended to be non-functional change, but together with the backend work in D44538 which adds support for f16 vectors, this enables adding the AArch32 FP16 (vector) intrinsics. Differential Revision: https://reviews.llvm.org/D44561 llvm-svn: 327836	2018-03-19 13:22:49 +00:00
Sjoerd Meijer	95da875898	This reverts "r327189 - [ARM] Add ARMv8.2-A FP16 vector intrinsic" This is causing problems in testing, and PR36683 was raised. Reverting it until we have sorted out how to pass f16 vectors. llvm-svn: 327437	2018-03-13 19:38:56 +00:00
Abderrazek Zaafrani	5bd68cf742	[ARM] Add ARMv8.2-A FP16 vector intrinsic Add the fp16 neon vector intrinsic for ARM as described in the ARM ACLE document. Reviews in https://reviews.llvm.org/D43650 llvm-svn: 327189	2018-03-09 23:39:34 +00:00
Craig Topper	ebb0838f74	[X86] Reverse the operand order of the implementation of the kunpack builtins. The second operand needs to be in the lower bits of the concatenation. This matches llvm 5.0, gcc, and icc behavior. Fixes PR36360. llvm-svn: 324954	2018-02-12 22:38:52 +00:00
Abderrazek Zaafrani	e7ed880761	[AArch64] Fixes for ARMv8.2-A FP16 scalar intrinsic - clang portion https://reviews.llvm.org/D42993 llvm-svn: 324940	2018-02-12 21:26:06 +00:00
Craig Topper	a57d64e30f	[X86] Change the signature of the AVX512 packed fp compare intrinsics to return vXi1 mask. Make bitcasts to scalar explicit in IR Summary: This is the clang equivalent of r324827 Reviewers: zvi, delena, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43143 llvm-svn: 324828	2018-02-10 23:34:27 +00:00
Craig Topper	c0b2e982d9	[X86] Replace kortest intrinsics with native IR. llvm-svn: 324647	2018-02-08 20:16:17 +00:00
Peter Collingbourne	9e31f0a389	IRGen: Emit an inline implementation of __builtin_wmemcmp on MSVCRT platforms. The MSVC runtime library does not provide a definition of wmemcmp, so we need an inline implementation. Differential Revision: https://reviews.llvm.org/D42441 llvm-svn: 323362	2018-01-24 18:59:58 +00:00
Dan Gohman	4f637e0ccc	[WebAssembly] Add mem.* builtin functions. This corresponds to r323222 in LLVM. The new names are not yet finalized, so use them at your own risk. llvm-svn: 323224	2018-01-23 17:04:04 +00:00
Abderrazek Zaafrani	ce8746d178	[AArch64] Add ARMv8.2-A FP16 scalar intrinsics https://reviews.llvm.org/D41792 llvm-svn: 323006	2018-01-19 23:11:18 +00:00
Craig Topper	f517f1a516	[X86] Implement old kunpck intrinsics using vector ops on vXi1 instead of integer shift/and/or Summary: kunpck intrinsics were removed in favor of native IR a few months ago. The implementation lowers them as by operation on the integer types passed to the intrinsic and then just shifting, masking, and oring them together. A special X86 DAG combine was added to recognize this patter and turn it into a concat_vector operation. I think it makes more sense to keep the IR implementation closer to vector operations on vXi1. Given that we expect these builtins to be used around other builtins that operate on k-registers which we try to represent in IR with vXi1. InstCombine should be able to get rid of the bitcasts between integers and vXi1 leaving only the vector operations. Reviewers: RKSimon, spatel, zvi, jina.nahias Reviewed By: RKSimon Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D42016 llvm-svn: 322461	2018-01-14 19:23:50 +00:00
Craig Topper	de91dff5d4	[X86] Replace cvt*2mask intrinsics with native IR using 'icmp slt X, zeroinitializer. llvm-svn: 322038	2018-01-08 22:37:56 +00:00
Benjamin Kramer	dfecbe9ad8	Add support for a limited subset of TS 18661-3 math builtins. These just overloads for _Float128. They're supported by GCC 7 and used by glibc. APFloat support is already there so just add the overloads. __builtin_copysignf128 __builtin_fabsf128 __builtin_huge_valf128 __builtin_inff128 __builtin_nanf128 __builtin_nansf128 This is the same support that GCC has, according to the documentation, but limited to _Float128. llvm-svn: 321948	2018-01-06 21:49:54 +00:00
Vedant Kumar	bbafd50756	[CGBuiltin] Handle unsigned mul overflow properly (PR35750) r320902 fixed the IRGen for some types of checked multiplications. It did not handle unsigned overflow correctly in the case where the signed operand is negative (PR35750). Eli pointed out that on overflow, the result must be equal to the unique value that is equivalent to the mathematically-correct result modulo two raised to the k power, where k is the number of bits in the result type. This patch fixes the specialized IRGen from r320902 accordingly. Testing: Apart from check-clang, I modified the test harness from r320902 to validate the results of all multiplications -- not just the ones which don't overflow: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081 llvm.org/PR35750, rdar://34963321 Differential Revision: https://reviews.llvm.org/D41717 llvm-svn: 321771	2018-01-03 23:11:32 +00:00
Coby Tayree	2268576fa0	[x86][icelake][bitalg] added bitalg feature recognition added intrinsics support for bitalg instructions _mm512_popcnt_epi16 _mm512_mask_popcnt_epi16 _mm512_maskz_popcnt_epi16 _mm512_popcnt_epi8 _mm512_mask_popcnt_epi8 _mm512_maskz_popcnt_epi8 _mm512_mask_bitshuffle_epi64_mask _mm512_bitshuffle_epi64_mask _mm256_popcnt_epi16 _mm256_mask_popcnt_epi16 _mm256_maskz_popcnt_epi16 _mm128_popcnt_epi16 _mm128_mask_popcnt_epi16 _mm128_maskz_popcnt_epi16 _mm256_popcnt_epi8 _mm256_mask_popcnt_epi8 _mm256_maskz_popcnt_epi8 _mm128_popcnt_epi8 _mm128_mask_popcnt_epi8 _mm128_maskz_popcnt_epi8 _mm256_mask_bitshuffle_epi32_mask _mm256_bitshuffle_epi32_mask _mm128_mask_bitshuffle_epi16_mask _mm128_bitshuffle_epi16_mask matching a similar work on the backend (D40222) Differential Revision: https://reviews.llvm.org/D41564 llvm-svn: 321483	2017-12-27 10:01:00 +00:00
Craig Topper	170de4b4ba	[X86] Allow _mm_prefetch (both the header implementation and the builtin) to accept bit 2 which is supposed to indicate the prefetched addresses will be written to Add the appropriate _MM_HINT_ET0/ET1 defines to match gcc. llvm-svn: 321325	2017-12-21 23:50:22 +00:00
Abderrazek Zaafrani	abb890b7be	[AArch64] Enable fp16 data type for the Builtin for AArch64 only. Differential Revision: https:://reviews.llvm.org/D41360 llvm-svn: 321301	2017-12-21 20:10:03 +00:00
Abderrazek Zaafrani	f58a132eef	[AARch64] Add ARMv8.2-A FP16 vector intrinsics Putting back the code that was reverted few weeks ago. Differential Revision: https://reviews.llvm.org/D34161 llvm-svn: 321294	2017-12-21 19:20:01 +00:00
Vedant Kumar	09b5bfdd85	[ubsan] Diagnose noreturn functions which return Diagnose 'unreachable' UB when a noreturn function returns. 1. Insert a check at the end of functions marked noreturn. 2. A decl may be marked noreturn in the caller TU, but not marked in the TU where it's defined. To diagnose this scenario, strip away the noreturn attribute on the callee and insert check after calls to it. Testing: check-clang, check-ubsan, check-ubsan-minimal, D40700 rdar://33660464 Differential Revision: https://reviews.llvm.org/D40698 llvm-svn: 321231	2017-12-21 00:10:25 +00:00
Adrian Prantl	f3b3ccda59	Silence a bunch of implicit fallthrough warnings llvm-svn: 321115	2017-12-19 22:06:11 +00:00
Craig Topper	5028ace602	[X86] Implement kand/kandn/kor/kxor/kxnor/knot intrinsics using native IR. llvm-svn: 320919	2017-12-16 08:26:22 +00:00
Craig Topper	b846d1ff76	[X86] Add builtins and tests for 128 and 256 bit vpopcntdq. llvm-svn: 320915	2017-12-16 06:02:31 +00:00
Vedant Kumar	fa5a0e59f0	[CodeGen] Specialize mixed-sign mul-with-overflow (fix PR34920) This patch introduces a specialized way to lower overflow-checked multiplications with mixed-sign operands. This fixes link failures and ICEs on code like this: void mul(int64_t a, uint64_t b) { int64_t res; __builtin_mul_overflow(a, b, &res); } The generic checked-binop irgen would use a 65-bit multiplication intrinsic here, which requires runtime support for _muloti4 (128-bit multiplication), and therefore fails to link on i386. To get an ICE on x86_64, change the example to use __int128_t / __uint128_t. Adding runtime and backend support for 65-bit or 129-bit checked multiplication on all of our supported targets is infeasible. This patch solves the problem by using simpler, specialized irgen for the mixed-sign case. llvm.org/PR34920, rdar://34963321 Testing: Apart from check-clang, I compared the output from this fairly comprehensive test driver using unpatched & patched clangs: https://gist.github.com/vedantk/3eb9c88f82e5c32f2e590555b4af5081 Differential Revision: https://reviews.llvm.org/D41149 llvm-svn: 320902	2017-12-16 01:28:25 +00:00
Reid Kleckner	627f45fe52	[CodeGen][X86] Implement _InterlockedCompareExchange128 intrinsic Summary: InterlockedCompareExchange128 is a bit more complicated than the other InterlockedCompareExchange functions, so it requires a bit more work. It doesn't directly refer to 128bit ints, instead it takes pointers to 64bit ints for Destination and ComparandResult, and exchange is taken as two 64bit ints (high & low). The previous value is written to ComparandResult, and success is returned. This implementation does the following in order to produce a cmpxchg instruction: 1. Cast everything to 128bit ints or int pointers, and glues together the Exchange values 2. Reads from CompareandResult to get the comparand 3. Calls cmpxchg volatile (on X86 this will produce a lock cmpxchg16b instruction) 1. Result 0 (previous value) is written back to ComparandResult 2. Result 1 (success bool) is zext'ed to a uchar and returned Resolves bug https://llvm.org/PR35251 Patch by Colden Cullen! Reviewers: rnk, agutowski Reviewed By: rnk Subscribers: majnemer, cfe-commits Differential Revision: https://reviews.llvm.org/D41032 llvm-svn: 320730	2017-12-14 19:00:21 +00:00
Krzysztof Parzyszek	5a6558382c	[Hexagon] Intrinsic support for V62 and V65 llvm-svn: 320609	2017-12-13 19:56:03 +00:00
Sanjay Patel	08fba37e9d	[CodeGen] fix mapping from fmod calls to frem instruction Similar to D40044 and discussed in D40594. llvm-svn: 319619	2017-12-02 17:52:00 +00:00
Sanjay Patel	0c0f77d03d	[CodeGen] remove stale comment; NFC The libm functions with LLVM intrinsic twins were moved above this blob with: https://reviews.llvm.org/rL319593 llvm-svn: 319618	2017-12-02 16:29:34 +00:00
Sanjay Patel	3e287b4d35	[CodeGen] convert math libcalls/builtins to equivalent LLVM intrinsics There are 20 LLVM math intrinsics that correspond to mathlib calls according to the LangRef: http://llvm.org/docs/LangRef.html#standard-c-library-intrinsics We were only converting 3 mathlib calls (sqrt, fma, pow) and 12 builtin calls (ceil, copysign, fabs, floor, fma, fmax, fmin, nearbyint, pow, rint, round, trunc) to their intrinsic-equivalents. This patch pulls the transforms together and handles all 20 cases. The switch is guarded by a check for const-ness to make sure we're not doing the transform if errno could possibly be set by the libcall or builtin. Differential Revision: https://reviews.llvm.org/D40044 llvm-svn: 319593	2017-12-01 23:15:52 +00:00

1 2 3 4 5 ...

869 Commits