llvm-project

Commit Graph

Author	SHA1	Message	Date
Reid Kleckner	211b1f324f	Revert "[IR] Make AttributeSetNode public, avoid temporary AttributeList copies" This reverts r299875. A Linux bot came back with a test failure: http://bb.pgr.jp/builders/test-clang-i686-linux-RA/builds/741/steps/test_clang/logs/Clang%20%3A%3A%20CodeGen__2006-05-19-SingleEltReturn.c llvm-svn: 299878	2017-04-10 20:34:19 +00:00
Reid Kleckner	324c99dee5	[IR] Make AttributeSetNode public, avoid temporary AttributeList copies Summary: AttributeList::get(Fn\|Ret\|Param)Attributes no longer creates a temporary AttributeList just to hide the AttributeSetNode type. I've also added a factory method to create AttributeLists from a parallel array of AttributeSetNodes. I think this simplifies construction of AttributeLists when rewriting function prototypes. Previously we would test if a particular index had attributes, and conditionally add a temporary attribute list to a vector. Now the attribute set vector is parallel to the argument vector already that these passes already construct. My long term vision is to wrap AttributeSetNode* inside an AttributeSet type that holds the enum attributes, but that will come in a follow up change. I haven't done any performance measurements for this change because profiling hasn't shown that any of the affected code is hot. Reviewers: pete, chandlerc, sanjoy, hfinkel Reviewed By: pete Subscribers: jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D31198 llvm-svn: 299875	2017-04-10 20:18:10 +00:00
Joerg Sonnenberger	28bed106e0	Do not translate rint into nearbyint, but truncate it like nearbyint. A common way to implement nearbyint is by fiddling with the floating point environment and calling rint. This is used at least by the BSD libm and musl. As such, canonicalizing the latter to the former will create infinite loops for libm and generally pessimize performance, at least when the generic C versions are used. This change preserves the rint in the libcall translation and also handles the domain truncation logic, so that rint with float argument will be reduced to rintf etc. llvm-svn: 299247	2017-03-31 19:58:07 +00:00
Dehao Chen	fed890ea3a	Fix the InstCombine to reserve the VP metadata and sets correct call count. Summary: Currently the VP metadata was dropped when InstCombine converts a call to direct call. This patch converts the VP metadata to branch_weights so that its hotness is recorded. Reviewers: eraman, davidxl Reviewed By: davidxl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D31344 llvm-svn: 299228	2017-03-31 15:59:52 +00:00
Simon Pilgrim	68168d17b9	Spelling mistakes in comments. NFCI. Based on corrections mentioned in patch for clang for PR27635 llvm-svn: 299072	2017-03-30 12:59:53 +00:00
Matt Arsenault	4c7795dd31	AMDGPU: Fold rcp/rsq of undef to undef llvm-svn: 298725	2017-03-24 19:04:57 +00:00
Reid Kleckner	b518054b87	Rename AttributeSet to AttributeList Summary: This class is a list of AttributeSetNodes corresponding the function prototype of a call or function declaration. This class used to be called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is typically accessed by parameter and return value index, so "AttributeList" seems like a more intuitive name. Rename AttributeSetImpl to AttributeListImpl to follow suit. It's useful to rename this class so that we can rename AttributeSetNode to AttributeSet later. AttributeSet is the set of attributes that apply to a single function, argument, or return value. Reviewers: sanjoy, javed.absar, chandlerc, pete Reviewed By: pete Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits Differential Revision: https://reviews.llvm.org/D31102 llvm-svn: 298393	2017-03-21 16:57:19 +00:00
Matt Arsenault	d81f557fe2	AMDGPU: Fold icmp/fcmp into icmp intrinsic The typical use is a library vote function which compares to 0. Fold the user condition into the intrinsic. llvm-svn: 297650	2017-03-13 18:14:02 +00:00
Mikael Holmen	760dc9aba7	Remove sometimes faulty rewrite of memcpy in instcombine. Summary: Solves PR 31990. The bad rewrite could replace a memcpy of one word with store i4 -1 while it should actually be store i8 -1 Hopefully opt and llc has improved enough so the original optimization done by the code isn't needed anymore. One already existing testcase is affected. It originally tested that the memcpy was replaced with load double but since we now remove that rewrite it will be load i64 instead. Patch suggestion by Eli Friedman. Reviewers: eli.friedman, majnemer, efriedma Reviewed By: efriedma Subscribers: efriedma, llvm-commits Differential Revision: https://reviews.llvm.org/D30254 llvm-svn: 296585	2017-03-01 06:45:20 +00:00
Matt Arsenault	cdb468c0f9	AMDGPU: Basic folds for fmed3 intrinsic Constant fold, canonicalize constants to RHS, reduce to minnum/maxnum when inputs are nan/undef. llvm-svn: 296409	2017-02-27 23:08:49 +00:00
Matt Arsenault	d4bca1e9ef	AMDGPU: Replace disabled exp inputs with undef llvm-svn: 295914	2017-02-23 00:44:03 +00:00
Matt Arsenault	f5262256a1	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Matt Arsenault	1f17c66890	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Matt Arsenault	920576042d	InstCombine: Canonicalize fast fmuladd to fmul + fadd llvm-svn: 295353	2017-02-16 18:46:24 +00:00
Craig Topper	3731f4d173	[AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit. llvm-svn: 295294	2017-02-16 07:35:23 +00:00
Igor Laevsky	a9b6872908	[InstComobineCalls] Fix buildbot failures after r294453. Some targets don't support uint64_t options. Change type to unsigned. Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294461	2017-02-08 15:21:48 +00:00
Igor Laevsky	900ffa34c8	[InstCombineCalls] Unfold element atomic memcpy instruction Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294453	2017-02-08 14:32:04 +00:00
Igor Laevsky	4b317fa24e	[InstCombineCalls] Remove zero length atomic memcpy intrinsics Differential Revision: https://reviews.llvm.org/D28909 llvm-svn: 294452	2017-02-08 14:23:47 +00:00
Sanjoy Das	e0e5795f6b	[InstCombine] Allow InstCombine to merge adjacent guards Summary: If there are two adjacent guards with different conditions, we can remove one of them and include its condition into the condition of another one. This patch allows InstCombine to merge them by the following pattern: guard(a); guard(b) -> guard(a & b). Reviewers: reames, apilipenko, igor-laevsky, anna, sanjoy Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29378 llvm-svn: 293778	2017-02-01 16:34:55 +00:00
Davide Italiano	aec4617dc8	[Instcombine] Combine consecutive identical fences Differential Revision: https://reviews.llvm.org/D29314 llvm-svn: 293661	2017-01-31 18:09:05 +00:00
Justin Lebar	25ebe2d767	[NVPTX] [InstCombine] Add llvm_unreachable to appease MSVC. llvm-svn: 293253	2017-01-27 02:04:07 +00:00
Justin Lebar	e3ac0fb948	[NVPTX] Fix use-after-stack-free bug in InstCombineCalls. Introduced in r293244. llvm-svn: 293251	2017-01-27 01:49:39 +00:00
Justin Lebar	698c31b8db	[NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. Summary: There are many NVVM intrinsics that we can't entirely get rid of, but that nonetheless often correspond to target-generic LLVM intrinsics. For example, if flush denormals to zero (ftz) is enabled, we can convert @llvm.nvvm.ceil.ftz.f to @llvm.ceil.f32. On the other hand, if ftz is disabled, we can't do this, because @llvm.ceil.f32 will be lowered to a non-ftz PTX instruction. In this case, we can, however, simplify the non-ftz nvvm ceil intrinsic, @llvm.nvvm.ceil.f, to @llvm.ceil.f32. These transformations are particularly useful because they let us constant fold instructions that appear in libdevice, the bitcode library that ships with CUDA and essentially functions as its libm. Reviewers: tra Subscribers: hfinkel, majnemer, llvm-commits Differential Revision: https://reviews.llvm.org/D28794 llvm-svn: 293244	2017-01-27 00:58:58 +00:00
Sanjoy Das	7516192a71	Revert a couple of InstCombine/Guard checkins This change reverts: r293061: "[InstCombine] Canonicalize guards for NOT OR condition" r293058: "[InstCombine] Canonicalize guards for AND condition" They miscompile cases like: ``` declare void @llvm.experimental.guard(i1, ...) define void @test_guard_not_or(i1 %A, i1 %B) { %C = or i1 %A, %B %D = xor i1 %C, true call void(i1, ...) @llvm.experimental.guard(i1 %D, i32 20, i32 30)[ "deopt"() ] ret void } ``` because they do transfer the `i32 20, i32 30` parameters to newly created guard instructions. llvm-svn: 293227	2017-01-26 23:38:11 +00:00
Craig Topper	b6122122c9	[X86] Add demanded elts support for the inputs to pclmul intrinsic This intrinsic uses bit 0 and bit 4 of an immediate argument to determine which bits of its inputs to read. This patch uses this information to simplify the demanded elements of the input vectors. Differential Revision: https://reviews.llvm.org/D28979 llvm-svn: 293151	2017-01-26 05:17:13 +00:00
Artur Pilipenko	b85f7a5d99	[InstCombine] Canonicalize guards for NOT OR condition This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29075 Patch by Maxim Kazantsev. llvm-svn: 293061	2017-01-25 14:45:12 +00:00
Simon Pilgrim	6f6b279109	[InstCombine][SSE] Add support for PACKSS/PACKUS constant folding Differential Revision: https://reviews.llvm.org/D28949 llvm-svn: 293060	2017-01-25 14:37:24 +00:00
Artur Pilipenko	4df4c4a4aa	[InstCombine] Canonicalize guards for AND condition This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: apilipenko Differential Revision: https://reviews.llvm.org/D29074 Patch by Maxim Kazantsev. llvm-svn: 293058	2017-01-25 14:20:52 +00:00
Artur Pilipenko	e812ca00bb	[InstCombine] Allow InstrCombine to remove one of adjacent guards if they are equivalent This is a partial fix for Bug 31520 - [guards] canonicalize guards in instcombine Reviewed By: majnemer, apilipenko Differential Revision: https://reviews.llvm.org/D29071 Patch by Maxim Kazantsev. llvm-svn: 293056	2017-01-25 14:12:12 +00:00
Simon Pilgrim	78f8630ac0	[InstCombine][X86] MULDQ/MULUDQ undef -> zero Added early out for single undef input - we were already supporting (and testing) this in the constant folding code, we just do it quicker now Drop undef handling from demanded elts code now that we handle it fully in InstCombiner::visitCallInst llvm-svn: 292913	2017-01-24 11:07:41 +00:00
Matt Arsenault	954a624fb9	SimplifyLibCalls: Replace more unary libcalls with intrinsics llvm-svn: 292855	2017-01-23 23:55:08 +00:00
Simon Pilgrim	f6f3a36159	[InstCombine][X86] Add MULDQ/MULUDQ constant folding support llvm-svn: 292793	2017-01-23 15:22:59 +00:00
Simon Pilgrim	bb13fdabec	[InstCombine][X86] MULDQ/MULUDQ undef -> zero Match generic mul behaviour so that <X x i64> multiply and muldq/muludq pattern act the same llvm-svn: 292784	2017-01-23 12:07:32 +00:00
Simon Pilgrim	a50a93fcd0	[InstCombine][X86] Add MULDQ/MULUDQ undef handling llvm-svn: 292627	2017-01-20 18:20:30 +00:00
Simon Pilgrim	a22c3a1c0f	[InstCombine] Remove unnecessary intrinsics demanded elts handling As discussed on D28777 - we don't need to handle 'all element' shuffles inside InstCombiner::visitCallInst as InstCombiner::SimplifyDemandedVectorElts will do everything we need. llvm-svn: 292365	2017-01-18 13:44:04 +00:00
Simon Pilgrim	d4eb800b03	[InstCombine][X86][AVX] Add DemandedElts support for VPERMILPD/VPERMILPS instructions Simplify a vpermilvar shuffle mask based on the elements of the mask that are actually demanded. llvm-svn: 292209	2017-01-17 11:35:03 +00:00
Matt Arsenault	7233344c28	SimplifyLibCalls: Replace fabs libcalls with intrinsics Add missing fabs(fpext) optimzation that worked with the call, and also fixes it creating a second fpext when there were multiple uses. llvm-svn: 292172	2017-01-17 00:10:40 +00:00
Simon Pilgrim	73a68c25a0	[InstCombine][SSE] Add DemandedElts support for PSHUFB instructions Simplify a pshufb shuffle mask based on the elements of the mask that are actually demanded. Differential Revision: https://reviews.llvm.org/D28745 llvm-svn: 292101	2017-01-16 11:30:41 +00:00
Hal Finkel	8a9a783f2c	Make processing @llvm.assume more efficient - Add affected values to the assumption cache Here's my second try at making @llvm.assume processing more efficient. My previous attempt, which leveraged operand bundles, r289755, didn't end up working: it did make assume processing more efficient but eliminating the assumption cache made ephemeral value computation too expensive. This is a more-targeted change. We'll keep the assumption cache, but extend it to keep a map of affected values (i.e. values about which an assumption might provide some information) to the corresponding assumption intrinsics. This allows ValueTracking and LVI to find assumptions relevant to the value being queried without scanning all assumptions in the function. The fact that ValueTracking started doing O(number of assumptions in the function) work, for every known-bits query, has become prohibitively expensive in some cases. As discussed during the review, this is a pragmatic fix that, longer term, will likely be replaced by a more-principled solution (perhaps based on an extended SSA form). Differential Revision: https://reviews.llvm.org/D28459 llvm-svn: 291671	2017-01-11 13:24:24 +00:00
Matt Arsenault	3f509042b0	InstCombine: Set operands instead of creating new call llvm-svn: 291612	2017-01-10 23:17:52 +00:00
Matt Arsenault	3bdd75d01e	InstCombine: Fold cos(-x) -> cos(x) Also cos(fabs(x)) -> cos(x) llvm-svn: 291022	2017-01-04 22:49:03 +00:00
Matt Arsenault	56ff4839ae	InstCombine: Fold fabs on select of constants llvm-svn: 290913	2017-01-03 22:40:34 +00:00
Sanjay Patel	f0d1e77373	[InstCombine] use 'match' to reduce code bloat; NFCI I wrote this patch before seeing the comment in: https://reviews.llvm.org/D27114 ...that suggests we should actually be canonicalizing the other way. So just in case we decide this is the right way, we might as well have a cleaner implementation. llvm-svn: 290912	2017-01-03 22:25:31 +00:00
Matt Arsenault	b264c94963	InstCombine: Add fma with constant transforms DAGCombine already does these. llvm-svn: 290860	2017-01-03 04:32:35 +00:00
Matt Arsenault	1cc294c85d	InstCombine: Add fma + fabs/fneg transforms fma (fneg x), (fneg y), z -> fma x, y, z fma (fabs x), (fabs x), z -> fma x, x, z llvm-svn: 290859	2017-01-03 04:32:31 +00:00
Craig Topper	d00db69227	[InstCombine][AVX-512] Teach InstCombine that llvm.x86.avx512.vcomi.sd and llvm.x86.avx512.vcomi.ss don't use the upper elements of their input. This was already done for the SSE/SSE2 version of the intrinsics. llvm-svn: 290776	2016-12-31 00:45:06 +00:00
Craig Topper	991636312b	[InstCombine][AVX-512] When turning intrinsics with masking into native IR, don't emit a select if the mask is known to be all ones. This saves InstCombine the burden of having to optimize the select later. llvm-svn: 290774	2016-12-30 23:06:28 +00:00
Craig Topper	72f2d4e8d6	[InstCombine][X86] Add DemandedElts support for 512-bit PMULDQ/PMULUDQ instructions PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. This builds on r290554 which added supported for 128 and 256-bit. llvm-svn: 290582	2016-12-27 05:30:09 +00:00
Craig Topper	7f8540b5e7	[AVX-512][InstCombine] Teach InstCombine to turn masked scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed. llvm-svn: 290566	2016-12-27 01:56:30 +00:00
Craig Topper	020b228155	[AVX-512][InstCombine] Teach InstCombine to turn packed add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. llvm-svn: 290559	2016-12-27 00:23:16 +00:00

1 2 3 4 5 ...

447 Commits