llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	c9cf7fc7a4	[InstCombine][X86] Add DemandedElts support for PMULDQ/PMULUDQ instructions PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use. Differential Revision: https://reviews.llvm.org/D28119 llvm-svn: 290554	2016-12-26 23:28:17 +00:00
Craig Topper	7b788ada2d	[AVX-512][InstCombine] Teach InstCombine to turn scalar add/sub/mul/div with rounding intrinsics into normal IR operations if the rounding mode is CUR_DIRECTION. Summary: I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon. I'll do the same thing for packed add/sub/mul/div in a future patch. Reviewers: delena, RKSimon, zvi, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27879 llvm-svn: 290535	2016-12-26 06:33:19 +00:00
Craig Topper	e328045711	[AVX-512][InstCombine] Teach InstCombine to converted masked vpermv intrinsics into shufflevector instructions Summary: This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants. We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that. Reviewers: zvi, delena, spatel, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27825 llvm-svn: 290530	2016-12-25 23:58:57 +00:00
George Burgess IV	3f08914e7e	[Analysis] Centralize objectsize lowering logic. We're currently doing nearly the same thing for @llvm.objectsize in three different places: two of them are missing checks for overflow, and one of them could subtly break if InstCombine gets much smarter about removing alloc sites. Seems like a good idea to not do that. llvm-svn: 290214	2016-12-20 23:46:36 +00:00
Daniel Jasper	aec2fa352f	Revert @llvm.assume with operator bundles (r289755-r289757) This creates non-linear behavior in the inliner (see more details in r289755's commit thread). llvm-svn: 290086	2016-12-19 08:22:17 +00:00
Craig Topper	ab5f355d8c	[AVX-512][InstCombine] Add masked scalar FMA intrinsics to SimplifyDemandedVectorElts. llvm-svn: 289759	2016-12-15 03:49:45 +00:00
Hal Finkel	3ca4a6bcf1	Remove the AssumptionCache After r289755, the AssumptionCache is no longer needed. Variables affected by assumptions are now found by using the new operand-bundle-based scheme. This new scheme is more computationally efficient, and also we need much less code... llvm-svn: 289756	2016-12-15 03:02:15 +00:00
Hal Finkel	cb9f78e1c3	Make processing @llvm.assume more efficient by using operand bundles There was an efficiency problem with how we processed @llvm.assume in ValueTracking (and other places). The AssumptionCache tracked all of the assumptions in a given function. In order to find assumptions relevant to computing known bits, etc. we searched every assumption in the function. For ValueTracking, that means that we did O(#assumes * #values) work in InstCombine and other passes (with a constant factor that can be quite large because we'd repeat this search at every level of recursion of the analysis). Several of us discussed this situation at the last developers' meeting, and this implements the discussed solution: Make the values that an assume might affect operands of the assume itself. To avoid exposing this detail to frontends and passes that need not worry about it, I've used the new operand-bundle feature to add these extra call "operands" in a way that does not affect the intrinsic's signature. I think this solution is relatively clean. InstCombine adds these extra operands based on what ValueTracking, LVI, etc. will need and then those passes need only search the users of the values under consideration. This should fix the computational-complexity problem. At this point, no passes depend on the AssumptionCache, and so I'll remove that as a follow-up change. Differential Revision: https://reviews.llvm.org/D27259 llvm-svn: 289755	2016-12-15 02:53:42 +00:00
Craig Topper	aeaa52cc11	[X86][InstCombine] Handle demanded elements for operand of AVX-512 scalar floating point to integer conversion intrinsics. llvm-svn: 289639	2016-12-14 07:46:12 +00:00
Craig Topper	268b3abe6d	[X86][InstCombine] Teach SimplifyDemandedVectorElts to handle masked scalar add/sub/mul/div/max/min intrinsics better. Now we can remove these intrinsics if element 0 isn't used. Also fix undef element tracking. llvm-svn: 289636	2016-12-14 06:06:58 +00:00
Craig Topper	dfd268d76b	[X86][InstCombine] Handle scalar fmadd intrinsics correctly in SimplifyDemandedVectorElts. Now we pass a modified version of DemandedElts to each operand and we calculate undef elts correctly. llvm-svn: 289632	2016-12-14 05:43:05 +00:00
Craig Topper	eb6a20e79e	[X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar round intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Similarly we clear bit 0 for optimizing operand 0. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289629	2016-12-14 03:17:30 +00:00
Craig Topper	a0372dec26	[X86][InstCombine] Teach SimplifyDemandedVectorElts to handle scalar min/max/cmp intrinsics more correctly. Now we only pass bit 0 of the DemandedElts to optimize operand 1 as we recurse since the upper bits are unused. Also calculate UndefElts correctly. Simplify InstCombineCalls for these instrinics to just call SimplifyDemandedVectorElts for the call instrution to reuse this support. llvm-svn: 289628	2016-12-14 03:17:27 +00:00
Craig Topper	ac75bca1eb	[X86][InstCombine] Fix SimplifyDemandedVectorElts to handle frcz scalar intrinsics correctly. Only the lower bits of the input element are used. And only the lower element can be undef since the upper bits are zeroed. Have InstCombineCalls call SimplifyDemandedVectorElts for these intrinsics to reuse this support. llvm-svn: 289523	2016-12-13 07:45:45 +00:00
Craig Topper	61b280e7b0	[X86][InstCombine] Teach InstCombineCalls to simplify demanded elements for scalar FMA intrinsics. These intrinsics don't read the upper bits of their second and third inputs so we can try to simplify them. llvm-svn: 289372	2016-12-11 07:42:06 +00:00
Craig Topper	d96395365a	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded for scalar cmp intrinsics with masking and rounding. These intrinsics don't read the upper elements of their first and second input. These are slightly different the the SSE version which does use the upper bits of its first element as passthru bits since the result goes to an XMM register. For AVX-512 the result goes to a mask register instead. llvm-svn: 289371	2016-12-11 07:42:04 +00:00
Craig Topper	790d0fa569	[AVX-512][InstCombine] Teach InstCombineCalls how to simplify demanded elements for scalar add,div,mul,sub,max,min intrinsics with masking and rounding. These intrinsics don't read the upper bits of their second input. And the third input is the passthru for masking and that only uses the lower element as well. llvm-svn: 289370	2016-12-11 07:42:01 +00:00
Craig Topper	58917f3508	[AVX-512][InstCombine] Add 512-bit vpermilvar intrinsics to InstCombineCalls to match 128 and 256-bit. llvm-svn: 289354	2016-12-11 01:59:36 +00:00
Craig Topper	9a63d7ade5	[X86][InstCombine] Teach InstCombineCalls to turn pshufb intrinsic into a shufflevector if the indices are constant. llvm-svn: 289348	2016-12-11 00:23:50 +00:00
David Majnemer	d5648c7a7d	Replace some callers of setTailCall with setTailCallKind We were a little sloppy with adding tailcall markers. Be more consistent by using setTailCallKind instead of setTailCall. llvm-svn: 287955	2016-11-25 22:35:09 +00:00
Craig Topper	1de753f7f5	[InstCombine][AVX-512] Teach InstCombineCalls how to handle the intrinsics for variable shift with 16-bit elements. This is a straightforward extension of the existing support for 32/64-bit element types. Just needed to add the additional instrinsics to the switches. llvm-svn: 287316	2016-11-18 06:04:33 +00:00
Craig Topper	6910fa0ef4	[X86] Remove the scalar intrinsics for fadd/fsub/fdiv/fmul Summary: These intrinsics have been unused for clang for a while. This patch removes them. We auto upgrade them to extractelements, a scalar operation and then an insertelement. This matches the sequence used by clangs intrinsic file. Reviewers: zvi, delena, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26660 llvm-svn: 287083	2016-11-16 05:24:10 +00:00
Craig Topper	b4173a5a70	[InstCombine][AVX-512] Teach InstCombineCalls to handle the new unmasked AVX-512 variable shift intrinsics. llvm-svn: 286755	2016-11-13 07:26:19 +00:00
Craig Topper	8b831cbb2a	[InstCombine][AVX-512] Expand vector shift handling to work on the AVX-512 shift by immediate and shift by single value. This does not include support for the AVX-512 variable shifts. That will be coming in a future patch. llvm-svn: 286739	2016-11-13 01:51:55 +00:00
Andrea Di Biagio	f3fd316223	[InstCombine][SSE4a] Fix assertion failure in the insertq/insertqi combining logic. This fixes a similar issue to the one already fixed by r280804 (revieved in D24256). Revision 280804 fixed the problem with unsafe dyn_casts in the extrq/extrqi combining logic. However, it turns out that even the insertq/insertqi logic was affected by the same problem. llvm-svn: 280807	2016-09-07 12:47:53 +00:00
Andrea Di Biagio	8df5b9cf48	[InstCombine][SSE4a] Fix assertion failure caused by unsafe dyn_casts on the operands of extrq/extrqi intrinsic calls. This patch fixes an assertion failure caused by unsafe dynamic casts on the constant operands of sse4a intrinsic calls to extrq/extrqi The combine logic that simplifies sse4a extrq/extrqi intrinsic calls currently checks if the input operands are constants. Internally, that logic relies on dyn_casts of values returned by calls to method Constant::getAggregateElement. However, method getAggregateElemet may return nullptr if the constant element cannot be retrieved. So, all the dyn_casts can potentially fail. This is what happens for example if a constexpr value is passed in input to an extrq/extrqi intrinsic call. This patch fixes the problem by using a dyn_cast_or_null (instead of a simple dyn_cast) on the result of each call to Constant::getAggregateElement. Added reproducible test cases to x86-sse4a.ll. Differential Revision: https://reviews.llvm.org/D24256 llvm-svn: 280804	2016-09-07 12:03:03 +00:00
Dorit Nuzman	abd15f69b2	[InstCombine] Preserve llvm.mem.parallel_loop_access metadata when replacing memcpy with ld/st. When InstCombine replaces a memcpy with loads+stores it does not copy over the llvm.mem.parallel_loop_access from the memcpy instruction. This patch fixes that. Differential Revision: https://reviews.llvm.org/D23499 llvm-svn: 280617	2016-09-04 07:49:39 +00:00
Dorit Nuzman	7673ba7ac2	Test commit. llvm-svn: 280615	2016-09-04 07:06:00 +00:00
Matt Arsenault	46a0382ab2	AMDGPU: Do basic folding of class intrinsic This allows more of the OCML builtin library to be constant folded. llvm-svn: 280586	2016-09-03 07:06:58 +00:00
Amaury Sechet	763c59dc9a	Make cltz and cttz zero undef when the operand cannot be zero in InstCombine Summary: Also add popcount(n) == bitsize(n) -> n == -1 transformation. Reviewers: majnemer, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D23134 llvm-svn: 279141	2016-08-18 20:43:50 +00:00
Justin Bogner	cd1d5aaf2e	Replace a few more "fall through" comments with LLVM_FALLTHROUGH Follow up to r278902. I had missed "fall through", with a space. llvm-svn: 278970	2016-08-17 20:30:52 +00:00
Eugene Zelenko	cdc7161281	Fix some Clang-tidy modernize and Include What You Use warnings. Differential revision: https://reviews.llvm.org/D23291 llvm-svn: 278364	2016-08-11 17:20:18 +00:00
Sanjay Patel	38ae83de38	fix comment; NFC llvm-svn: 278342	2016-08-11 15:23:56 +00:00
Sanjay Patel	e3c335cbed	use auto* with dyn_cast ; NFC llvm-svn: 278340	2016-08-11 15:21:21 +00:00
Sanjay Patel	5a470950b9	getParent()->getParent() == getFunction() ; NFC llvm-svn: 278339	2016-08-11 15:16:06 +00:00
Sanjay Patel	8e3ab17c44	[InstCombine] refactor ctlz/cttz folds (NFCI) Note that this fold really belongs in InstSimplify. Refactoring here anyway as an intermediate step because there's a planned addition to this function in D23134. Differential Revision: https://reviews.llvm.org/D23223 llvm-svn: 277883	2016-08-05 22:42:46 +00:00
Justin Bogner	9979840f59	InstCombine: Replace some never-null pointers with references. NFC llvm-svn: 277792	2016-08-05 01:06:44 +00:00
Vitaly Buka	0ab23cf1c8	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277072	2016-07-28 22:59:03 +00:00
Vitaly Buka	2fae6a7702	Should be committed as one CL. This reverts commits r277068 r277067 r277066. llvm-svn: 277071	2016-07-28 22:59:01 +00:00
Vitaly Buka	f0500b6ae5	Do not remove empty lifetime.start/lifetime.end ranges Summary: Asan stack-use-after-scope check should poison alloca even if there is no access between start and end. This is possible for code like this: for (int i = 0; i < 3; i++) { int x; p = &x; } "Loop Invariant Code Motion" will move "p = &x;" out of the loop, making start/end range empty. PR27453 Reviewers: eugenis Differential Revision: https://reviews.llvm.org/D22842 llvm-svn: 277068	2016-07-28 22:50:48 +00:00
Vitaly Buka	3645793872	maned llvm-svn: 277067	2016-07-28 22:50:45 +00:00
Vitaly Buka	caca9da4ff	range llvm-svn: 277066	2016-07-28 22:50:43 +00:00
David Majnemer	666aa945a5	[InstCombine] Masked loads with undef masks can fold to normal loads We were able to fold masked loads with an all-ones mask to a normal load. However, we couldn't turn a masked load with a mask with mixed ones and undefs into a normal load. llvm-svn: 275380	2016-07-14 06:58:42 +00:00
David Majnemer	d77a3b61eb	Move a transform from InstCombine to InstSimplify. This transform doesn't require any new instructions, it can safely live in InstSimplify. llvm-svn: 275344	2016-07-13 23:32:53 +00:00
Sean Silva	45835e731d	Remove dead TLI arg of isKnownNonNull and propagate deadness. NFC. This actually uncovered a surprisingly large chain of ultimately unused TLI args. From what I can gather, this argument is a remnant of when isKnownNonNull would look at the TLI directly. The current approach seems to be that InferFunctionAttrs runs early in the pipeline and uses TLI to annotate the TLI-dependent non-null information as return attributes. This also removes the dependence of functionattrs on TLI altogether. llvm-svn: 274455	2016-07-02 23:47:27 +00:00
Matt Arsenault	802ebcb4bb	InstCombine: Don't strip convergent from intrinsic callsites Specific instances of intrinsic calls may want to be convergent, such as certain register reads but the intrinsic declaration is not. llvm-svn: 273188	2016-06-20 19:04:44 +00:00
Craig Topper	99d1eab327	[IR] Require ArrayRef of 'uint32_t' instead of 'int' for the mask argument for one of the signatures of CreateShuffleVector. This better emphasises that you can't use it for the -1 as undef behavior. llvm-svn: 272491	2016-06-12 00:41:19 +00:00
Benjamin Kramer	46e38f3678	Avoid copies of std::strings and APInt/APFloats where we only read from it As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126	2016-06-08 10:01:20 +00:00
Simon Pilgrim	db9893fb90	[InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit). If the shift amount is constant we can sometimes convert these instructions to native shifts: 1 - if all shift amounts are in range then the conversion is trivial. 2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion. 3 - logical shifts just return zero if all elements have out of range shift amounts. In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts. Differential Revision: http://reviews.llvm.org/D19675 llvm-svn: 271996	2016-06-07 10:27:15 +00:00
Simon Pilgrim	91e3ac8293	[InstCombine][SSE] Add MOVMSK constant folding (PR27982) This patch adds support for folding undef/zero/constant inputs to MOVMSK instructions. The SSE/AVX versions can be fully folded, but the MMX version can only handle undef inputs. Differential Revision: http://reviews.llvm.org/D20998 llvm-svn: 271990	2016-06-07 08:18:35 +00:00
Craig Topper	8287fd8abd	[X86] Remove SSE/AVX unaligned store intrinsics as clang no longer uses them. Auto upgrade to native unaligned store instructions. llvm-svn: 271236	2016-05-30 23:15:56 +00:00
Simon Pilgrim	9602d678cb	[X86][SSE] (Reapplied) Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused. Reapplied now that the the companion patch (D20684) removes/auto-upgrade the clang intrinsics has been committed. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 271131	2016-05-28 18:03:41 +00:00
Simon Pilgrim	4642a57fbf	Revert: r270973 - [X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) llvm-svn: 270976	2016-05-27 09:02:25 +00:00
Simon Pilgrim	c013e5737b	[X86][SSE] Replace (V)PMOVSX and (V)PMOVZX integer extension intrinsics with generic IR (llvm) This patch removes the llvm intrinsics VPMOVSX and (V)PMOVZX sign/zero extension intrinsics and auto-upgrades to SEXT/ZEXT calls instead. We already did this for SSE41 PMOVSX sometime ago so much of that implementation can be reused. A companion patch (D20684) removes/auto-upgrade the clang intrinsics. Differential Revision: http://reviews.llvm.org/D20686 llvm-svn: 270973	2016-05-27 08:49:15 +00:00
Craig Topper	a423aa4642	[X86] Add the AVX storeu intrinsics to InstCombine and LoopStrengthReduce in the same places that the SSE/SSE2 storeu intrinsics appear. I don't really know how to test this. Just seemed like we should be consistent. llvm-svn: 270819	2016-05-26 04:28:45 +00:00
Arnaud A. de Grandmaison	333ef381b8	[InstCombine] Remove trivially empty va_start/va_end and va_copy/va_end ranges. When a va_start or va_copy is immediately followed by a va_end (ignoring debug information or other start/end in between), then it is safe to remove the pair. As this code shares some commonalities with the lifetime markers, this has been factored to helper functions. This InstCombine pattern kicks-in 3 times when running the LLVM test suite. llvm-svn: 269033	2016-05-10 09:24:49 +00:00
Simon Pilgrim	ca140b17cb	[InstCombine][SSE] Added support to VPERMD/VPERMPS to shuffle combine to accept UNDEF elements. llvm-svn: 268206	2016-05-01 20:43:02 +00:00
Simon Pilgrim	eeacc40e27	[InstCombine][SSE] Added support to VPERMILVAR to shuffle combine to accept UNDEF elements. llvm-svn: 268204	2016-05-01 20:22:42 +00:00
Simon Pilgrim	e5e8c2fde0	[InstCombine][SSE] Added support to PSHUFB to shuffle combine to accept UNDEF elements. llvm-svn: 268202	2016-05-01 19:26:21 +00:00
Simon Pilgrim	8cddf8b3c6	[InstCombine][AVX2] Combine VPERMD/VPERMPS intrinsics with constant masks to shufflevector. llvm-svn: 268199	2016-05-01 16:41:22 +00:00
Simon Pilgrim	640f9964c7	[InstCombine][AVX] VPERMILVAR to shuffle combine to use general aggregate elements. NFCI. Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements. llvm-svn: 268158	2016-04-30 07:23:30 +00:00
Simon Pilgrim	bf60cc492c	[InstCombine][SSE] PSHUFB to shuffle combine to use general aggregate elements. NFCI. Make use of Constant::getAggregateElement instead of checking constant types - first step towards adding support for UNDEF mask elements. llvm-svn: 268115	2016-04-29 21:34:54 +00:00
David Majnemer	231a68cc22	[InstCombine] Propagate operand bundles We neglected to transfer operand bundles for some transforms. These were found via inspection, I'll try to come up with some test cases. llvm-svn: 268010	2016-04-29 08:07:20 +00:00
Simon Pilgrim	424da1637a	[InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2) This patch improves support for determining the demanded vector elements through SSE scalar intrinsics: 1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input) 2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input Differential Revision: http://reviews.llvm.org/D17490 llvm-svn: 267356	2016-04-24 18:12:42 +00:00
Simon Pilgrim	1c9a9f255c	[InstCombine] Avoid updating argument demanded elements in separate passes. As discussed on D17490, we should attempt to update an intrinsic's arguments demanded elements in one pass if we can. llvm-svn: 267355	2016-04-24 17:57:27 +00:00
Simon Pilgrim	2f6097d113	[X86][InstCombine] Tidyup VPERMILVAR -> shufflevector conversion to helper function. NFCI. llvm-svn: 267352	2016-04-24 17:23:46 +00:00
Simon Pilgrim	c0c56e747a	[X86][InstCombine] Tidyup PSHUFB -> shufflevector conversion to helper function. NFCI. llvm-svn: 267351	2016-04-24 17:00:34 +00:00
Sanjay Patel	5e5056d939	[x86, InstCombine] fix masked load pass-through operand to be a zero vector This bug was introduced with: http://reviews.llvm.org/rL262269 AVX masked loads are specified to set vector lanes to zero when the high bit of the mask element for that lane is zero: "If the mask is 0, the corresponding data element is set to zero in the load form of these instructions, and unmodified in the store form." --Intel manual Differential Revision: http://reviews.llvm.org/D19017 llvm-svn: 266148	2016-04-12 23:16:23 +00:00
George Burgess IV	278199f615	Add the allocsize attribute to LLVM. `allocsize` is a function attribute that allows users to request that LLVM treat arbitrary functions as allocation functions. This patch makes LLVM accept the `allocsize` attribute, and makes `@llvm.objectsize` recognize said attribute. The review for this was split into two patches for ease of reviewing: D18974 and D14933. As promised on the revisions, I'm landing both patches as a single commit. Differential Revision: http://reviews.llvm.org/D14933 llvm-svn: 266032	2016-04-12 01:05:35 +00:00
David Majnemer	fcc5811797	[InstCombine] Add a peephole for redundant assumes Two or more identical assumes are occasionally next to each other in a basic block. While our generic machinery will turn a redundant assume into a no-op, it is not super cheap. We can perform a simpler check to achieve the same result for this case. llvm-svn: 265801	2016-04-08 16:37:12 +00:00
Matt Arsenault	2fe4fbc184	AMDGPU: Add frexp_exp intrinsic llvm-svn: 264944	2016-03-30 22:28:52 +00:00
Matt Arsenault	5cd4f8f89f	AMDGPU: Constant folding for frexp_mant llvm-svn: 264943	2016-03-30 22:28:26 +00:00
Justin Lebar	9d94397859	[attrs] Handle convergent CallSites. Summary: Previously we had a notion of convergent functions but not of convergent calls. This is insufficient to correctly analyze calls where the target is unknown, e.g. indirect calls. Now a call is convergent if it targets a known-convergent function, or if it's explicitly marked as convergent. As usual, we can remove convergent where we can prove that no convergent operations are performed in the call. Originally landed as r261544, then reverted in r261544 for (incidental) build breakage. Re-landed here with no changes. Reviewers: chandlerc, jingyue Subscribers: llvm-commits, tra, jhen, hfinkel Differential Revision: http://reviews.llvm.org/D17739 llvm-svn: 263481	2016-03-14 20:18:54 +00:00
Sanjay Patel	c4acbae63f	[x86, InstCombine] delete x86 SSE2 masked store with zero mask This follows up on the related AVX instruction transforms, but this one is too strange to do anything more with. Intel's behavioral description of this instruction in its Software Developer's Manual is tragi-comic. llvm-svn: 263340	2016-03-12 15:16:59 +00:00
Sanjay Patel	6f2c01f712	[x86, InstCombine] transform more x86 masked loads to LLVM intrinsics Continuation of: http://reviews.llvm.org/rL262269 llvm-svn: 262273	2016-02-29 23:59:00 +00:00
Sanjay Patel	98a71505f5	[x86, InstCombine] transform x86 AVX masked loads to LLVM intrinsics The intended effect of this patch in conjunction with: http://reviews.llvm.org/rL259392 http://reviews.llvm.org/rL260145 is that customers using the AVX intrinsics in C will benefit from combines when the load mask is constant: __m128 mload_zeros(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(0)); } __m128 mload_fakeones(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(1)); } __m128 mload_ones(float f) { return _mm_maskload_ps(f, _mm_set1_epi32(0x80000000)); } __m128 mload_oneset(float f) { return _mm_maskload_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0)); } ...so none of the above will actually generate a masked load for optimized code. This is the masked load counterpart to: http://reviews.llvm.org/rL262064 llvm-svn: 262269	2016-02-29 23:16:48 +00:00
Reid Kleckner	892ae2e2b6	[InstCombine] Be more conservative about removing stackrestore We ended up removing a save/restore pair around an inalloca call, leading to a miscompile in Chromium. llvm-svn: 262095	2016-02-27 00:53:54 +00:00
Sanjay Patel	fc7e7ebf36	[x86, InstCombine] transform x86 AVX2 masked stores to LLVM intrinsics Replicate everything for integers...because x86. Continuation of: http://reviews.llvm.org/rL262064 llvm-svn: 262077	2016-02-26 21:51:44 +00:00
Sanjay Patel	1ace99351f	[x86, InstCombine] transform x86 AVX masked stores to LLVM intrinsics The intended effect of this patch in conjunction with: http://reviews.llvm.org/rL259392 http://reviews.llvm.org/rL260145 is that customers using the AVX intrinsics in C will benefit from combines when the store mask is constant: void mstore_zero_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(0), v); } void mstore_fake_ones_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(1), v); } void mstore_ones_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set1_epi32(0x80000000), v); } void mstore_one_set_elt_mask(float f, __m128 v) { _mm_maskstore_ps(f, _mm_set_epi32(0x80000000, 0, 0, 0), v); } ...so none of the above will actually generate a masked store for optimized code. Differential Revision: http://reviews.llvm.org/D17485 llvm-svn: 262064	2016-02-26 21:04:14 +00:00
Artur Pilipenko	31bcca47d3	NFC. Move isDereferenceable to Loads.h/cpp This is a part of the refactoring to unify isSafeToLoadUnconditionally and isDereferenceablePointer functions. In subsequent change I'm going to eliminate isDerferenceableAndAlignedPointer from Loads API, leaving isSafeToLoadSpecualtively the only function to check is load instruction can be speculated. Reviewed By: hfinkel Differential Revision: http://reviews.llvm.org/D16180 llvm-svn: 261736	2016-02-24 12:49:04 +00:00
Justin Lebar	ccbd8f5a02	Revert "[attrs] Handle convergent CallSites." This reverts r261544, which was causing a test failure in Transforms/FunctionAttrs/readattrs.ll. llvm-svn: 261549	2016-02-22 18:24:43 +00:00
Justin Lebar	7bf9187abb	[attrs] Handle convergent CallSites. Summary: Previously we had a notion of convergent functions but not of convergent calls. This is insufficient to correctly analyze calls where the target is unknown, e.g. indirect calls. Now a call is convergent if it targets a known-convergent function, or if it's explicitly marked as convergent. As usual, we can remove convergent where we can prove that no convergent operations are performed in the call. Reviewers: chandlerc, jingyue Subscribers: hfinkel, jhen, tra, llvm-commits Differential Revision: http://reviews.llvm.org/D17317 llvm-svn: 261544	2016-02-22 17:51:35 +00:00
Sanjay Patel	2440130437	fix inaccurate comment; NFC llvm-svn: 261484	2016-02-21 17:33:31 +00:00
Sanjay Patel	368ac5dbf7	[InstCombine] add getNegativeIsTrueBoolVec() helper function; NFC Originally part of: http://reviews.llvm.org/D17485 We need this when simplifying masked memory ops too. llvm-svn: 261483	2016-02-21 17:29:33 +00:00
Simon Pilgrim	471efd244a	[InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element llvm-svn: 261460	2016-02-20 23:17:35 +00:00
Richard Trieu	7a08381403	Remove uses of builtin comma operator. Cleanup for upcoming Clang warning -Wcomma. No functionality change intended. llvm-svn: 261270	2016-02-18 22:09:30 +00:00
Artur Pilipenko	44e7c51b05	Don't propagate dereferenceable attribute through gc.relocate in InstCombine Reviewed By: reames Differential Revision: http://reviews.llvm.org/D16143 llvm-svn: 260509	2016-02-11 11:22:46 +00:00
Philip Reames	ea4d8e8ce9	[InstCombine][GC] Handle gc.relocations of vector type We introduced gc.relocates of vector-of-pointer types a couple of weeks back. Somehow, I missed updating the InstCombine rule to account for this. If we hit this code path with a vector-of-pointers gc.relocate, we'd crash on a cast<PointerType>. I also took the chance to do a bit of code style cleanup. llvm-svn: 260279	2016-02-09 21:09:22 +00:00
Sanjay Patel	4b198802b3	function names start with a lowercase letter; NFC llvm-svn: 259425	2016-02-01 22:23:39 +00:00
Sanjay Patel	103ab7d571	[InstCombine] simplify masked scatter/gather intrinsics with zero masks A masked scatter with a zero mask means there's no store. A masked gather with a zero mask means the passthru arg is returned. This is a continuation of: http://reviews.llvm.org/rL259369 http://reviews.llvm.org/rL259392 llvm-svn: 259421	2016-02-01 22:10:26 +00:00
Sanjay Patel	04f792bdc9	[InstCombine] simplify masked store intrinsics with all ones or zeros masks A masked store with a zero mask means there's no store. A masked store with an allOnes mask means it's a normal vector store. This is a continuation of: http://reviews.llvm.org/rL259369 llvm-svn: 259392	2016-02-01 19:39:52 +00:00
Sanjay Patel	b695c5557c	[InstCombine] simplify masked load intrinsics with all ones or zeros masks A masked load with a zero mask means there's no load. A masked load with an allOnes mask means it's a normal vector load. Differential Revision: http://reviews.llvm.org/D16691 llvm-svn: 259369	2016-02-01 17:00:10 +00:00
Sanjay Patel	0069f56e33	add helper function for minnum/maxnum ; NFC llvm-svn: 259326	2016-01-31 16:35:23 +00:00
Sanjay Patel	6038d3e5c6	function names start with a lower case letter ; NFC llvm-svn: 259264	2016-01-29 23:27:03 +00:00
Sanjay Patel	f9f5d3cc45	fix formatting; NFC llvm-svn: 259262	2016-01-29 23:14:58 +00:00
Sanjay Patel	03c03f57ee	less indenting; NFCI llvm-svn: 259002	2016-01-28 00:03:16 +00:00
Matt Arsenault	bef34e21c7	AMDGPU: Rename intrinsics to use amdgcn prefix The intrinsic target prefix should match the target name as it appears in the triple. This is not yet complete, but gets most of the important ones. llvm.AMDGPU.* intrinsics used by mesa and libclc are still handled for compatability for now. llvm-svn: 258557	2016-01-22 21:30:34 +00:00
Sanjay Patel	cd4377c74d	don't repeat function names in comments; NFC llvm-svn: 258360	2016-01-20 22:24:38 +00:00
Sanjay Patel	1c600c6e83	80-cols; NFC llvm-svn: 258323	2016-01-20 16:41:43 +00:00
Sanjay Patel	142c49bc42	remove outdated comment; NFC llvm-svn: 258147	2016-01-19 17:29:22 +00:00

1 2 3 4 5 ...

447 Commits