llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	2ea8dbf564	[CostModel][X86] Add more exhaustive masked load/store/gather/scatter/expand/compress cost tests llvm-svn: 357838	2019-04-06 12:08:37 +00:00
Guozhi Wei	36fc9c3107	[LCG] Add aliased functions as LCG roots Current LCG doesn't check aliased functions. So if an internal function has a public alias it will not be added to CG SCC, but it is still reachable from outside through the alias. So this patch adds aliased functions to SCC. Differential Revision: https://reviews.llvm.org/D59898 llvm-svn: 357795	2019-04-05 18:51:08 +00:00
Sanjoy Das	32fd32bc6f	[SCEV] Check the cache in get{S\|U}MaxExpr before doing any work Summary: This lets us avoid e.g. checking if A >=s B in getSMaxExpr(A, B) if we've already established that (A smax B) is the best we can do. Fixes PR41225. Reviewers: asbirlea Subscribers: mcrosier, jlebar, bixia, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60010 llvm-svn: 357320	2019-03-29 22:00:12 +00:00
Alina Sbirlea	f085cc5aa7	[MemorySSA] Limit clobber walks. Summary: This patch limits all getClobberingMemoryAccess() walks to MaxCheckLimit. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59569 llvm-svn: 357319	2019-03-29 21:56:09 +00:00
Alina Sbirlea	e589067e61	[MemorySSA] Don't optimize incomplete phis. Summary: MemoryPhis cannot be optimized out until they are complete. Resolves PR41254. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59966 llvm-svn: 357315	2019-03-29 21:16:31 +00:00
James Y Knight	c0e6b8ac3a	IR: Support parsing numeric block ids, and emit them in textual output. Just as as llvm IR supports explicitly specifying numeric value ids for instructions, and emits them by default in textual output, now do the same for blocks. This is a slightly incompatible change in the textual IR format. Previously, llvm would parse numeric labels as string names. E.g. define void @f() { br label %"55" 55: ret void } defined a label named "55", even without needing to be quoted, while the reference required quoting. Now, if you intend a block label which looks like a value number to be a name, you must quote it in the definition too (e.g. `"55":`). Previously, llvm would print nameless blocks only as a comment, and would omit it if there was no predecessor. This could cause confusion for readers of the IR, just as unnamed instructions did prior to the addition of "%5 = " syntax, back in 2008 (PR2480). Now, it will always print a label for an unnamed block, with the exception of the entry block. (IMO it may be better to print it for the entry-block as well. However, that requires updating many more tests.) Thus, the following is supported, and is the canonical printing: define i32 @f(i32, i32) { %3 = add i32 %0, %1 br label %4 4: ret i32 %3 } New test cases covering this behavior are added, and other tests updated as required. Differential Revision: https://reviews.llvm.org/D58548 llvm-svn: 356789	2019-03-22 18:27:13 +00:00
Sjoerd Meijer	633fb0f266	[TTI] getMemcpyCost This adds new function getMemcpyCost to TTI so that the cost of a memcpy can be modeled and queried. The default implementation returns Expensive, but targets can override this function to model the cost more accurately. Differential Revision: https://reviews.llvm.org/D59252 llvm-svn: 356555	2019-03-20 14:15:46 +00:00
Matt Arsenault	e0c1f9e76d	AMDGPU: Partially fix default device for HSA There are a few different issues, mostly stemming from using generation based checks for anything instead of subtarget features. Stop adding flat-address-space as a feature for HSA, as it should only be a device property. This was incorrectly allowing flat instructions to select for SI. Increase the default generation for HSA to avoid the encoding error when emitting objects. This has some other side effects from various checks which probably should be separate subtarget features (in the cost model and for dealing with the DS offset folding issue). Partial fix for bug 41070. It should probably be an error to try using amdhsa without flat support. llvm-svn: 356347	2019-03-17 21:31:35 +00:00
Tim Renouf	e30aa6a136	[AMDGPU] Prepare for introduction of v3 and v5 MVTs AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This commit does not add them, but makes preparatory changes: * Fixed assumptions of power-of-2 vector type in kernel arg handling, and added v5 kernel arg tests and v3/v5 shader arg tests. * Added v5 tests for cost analysis. * Added vec3/vec5 arg test cases. Some of this patch is from Matt Arsenault, also of AMD. Differential Revision: https://reviews.llvm.org/D58928 Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd llvm-svn: 356342	2019-03-17 21:04:16 +00:00
Teresa Johnson	4ab0a9f0a4	[SCEV] Use depth limit for trunc analysis Summary: This fixes an extremely long compile time caused by recursive analysis of truncs, which were not previously subject to any depth limits unlike some of the other ops. I decided to use the same control used for sext/zext, since the routines analyzing these are sometimes mutually recursive with the trunc analysis. Reviewers: mkazantsev, sanjoy Subscribers: sanjoy, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58994 llvm-svn: 355949	2019-03-12 18:28:05 +00:00
Ryan Taylor	67f36903ae	[AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions Summary: This adds support for 64 bit buffer atomic arithmetic instructions but does not include cmpswap as that depends on a fix to the way the register pairs are handled Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58918 llvm-svn: 355520	2019-03-06 17:02:06 +00:00
Florian Hahn	98f11a7d75	[SCEV] Handle case where MaxBECount is less precise than ExactBECount for OR. In some cases, MaxBECount can be less precise than ExactBECount for AND and OR (the AND case was PR26207). In the OR test case, both ExactBECounts are undef, but MaxBECount are different, so we hit the assertion below. This patch uses the same solution the AND case already uses. Assertion failed: ((isa<SCEVCouldNotCompute>(ExactNotTaken) \|\| !isa<SCEVCouldNotCompute>(MaxNotTaken)) && "Exact is not allowed to be less precise than Max"), function ExitLimit This patch also consolidates test cases for both AND and OR in a single test case. Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13245 Reviewers: sanjoy, efriedma, mkazantsev Reviewed By: sanjoy Differential Revision: https://reviews.llvm.org/D58853 llvm-svn: 355259	2019-03-02 02:31:44 +00:00
Alina Sbirlea	fcfa7c5f92	[MemorySSA] Make insertDef insert corresponding phi nodes. Summary: The original assumption for the insertDef method was that it would not materialize Defs out of no-where, hence it will not insert phis needed after inserting a Def. However, when cloning an instruction (use case used in LICM), we do materialize Defs "out of no-where". If the block receiving a Def has at least one other Def, then no processing is needed. If the block just received its first Def, we must check where Phi placement is needed. The only new usage of insertDef is in LICM, hence the trigger for the bug. But the original goal of the method also fails to apply for the move() method. If we move a Def from the entry point of a diamond to either the left or right blocks, then the merge block must add a phi. While this usecase does not currently occur, or may be viewed as an incorrect transformation, MSSA must behave corectly given the scenario. Resolves PR40749 and PR40754. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, Prazek, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58652 llvm-svn: 355040	2019-02-27 22:20:22 +00:00
Alina Sbirlea	9026404125	[MemorySSA & SimpleLoopUnswitch] Update MemorySSA in ReplaceUsesOfWith. SimpleLoopUnswitch must update MemorySSA when removing instructions. Resolves PR39197. llvm-svn: 354919	2019-02-26 19:44:52 +00:00
Simon Pilgrim	42bf2dd629	[TTI] Add generic cost model for smul/umul overflow intrinsics Based off smul/umul fixed costs and the implementation in TargetLowering::expandMULO. llvm-svn: 354784	2019-02-25 13:30:23 +00:00
Dmitri Gribenko	751c5fbf6a	Fixed typos in tests: s/CEHCK/CHECK/ Reviewers: ilya-biryukov Subscribers: sanjoy, sdardis, javed.absar, jrtc27, atanasyan, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58608 llvm-svn: 354781	2019-02-25 13:12:33 +00:00
Simon Pilgrim	9caf0f0d15	[TTI] Add generic cost model for fixed point smul/umul Based on an IR equivalent of target lowering's generic expansion - target specific costs will typically be lower (IR doesn't have a good mull/mulh equivalent) but we need a baseline. Differential Revision: https://reviews.llvm.org/D57925 llvm-svn: 354774	2019-02-25 11:59:23 +00:00
Alina Sbirlea	151100787d	[MemorySSA] Update test with minimized one. NFCI llvm-svn: 354658	2019-02-22 07:34:54 +00:00
Alina Sbirlea	90d2e3a16d	[MemorySSA & LoopPassManager] Resolve PR40038. The correct edge being deleted is not to the unswitched exit block, but to the original block before it was split. That's the key in the map, not the value. The insert is correct. The new edge is to the .split block. The splitting turns OriginalBB into: OriginalBB -> OriginalBB.split. Assuming the orignal CFG edge: ParentBB->OriginalBB, we must now delete ParentBB->OriginalBB, not ParentBB->OriginalBB.split. llvm-svn: 354656	2019-02-22 07:18:37 +00:00
Alina Sbirlea	97468e9282	[MemorySSA & LoopPassManager] Update MemorySSA in formDedicatedExitBlocks. MemorySSA is now updated when forming dedicated exit blocks. Resolves PR40037. llvm-svn: 354623	2019-02-21 21:13:34 +00:00
Sam Parker	0b53e8454b	[BPI] Look through bitcasts in calcZeroHeuristic Constant hoisting may have hidden a constant behind a bitcast so that it isn't folded into its users. However, this prevents BPI from calculating some of its heuristics that are based upon constant values. So, I've added a simple helper function to look through these casts. Differential Revision: https://reviews.llvm.org/D58166 llvm-svn: 354119	2019-02-15 11:50:21 +00:00
Alina Sbirlea	d77edc00a8	[MemorySSA] Remove verifyClobberSanity. Summary: This verification may fail after certain transformations due to BasicAA's fragility. Added a small explanation and a testcase that triggers the assert in checkClobberSanity (before its removal). Addresses PR40509. Reviewers: george.burgess.iv Subscribers: sanjoy, jlebar, llvm-commits, Prazek Tags: #llvm Differential Revision: https://reviews.llvm.org/D57973 llvm-svn: 353739	2019-02-11 19:51:21 +00:00
Simon Pilgrim	fbb3086fc8	[CostModel][X86] Add UMUL fixed point cost tests llvm-svn: 353153	2019-02-05 10:55:38 +00:00
Max Kazantsev	437ee05885	[SCEV] Do not bother creating separate SCEVUnknown for unreachable nodes Currently, SCEV creates SCEVUnknown for every node of unreachable code. If we have a huge amounts of such code, we will be littering SE with these nodes. We could just state that they all are undef and save some memory. Differential Revision: https://reviews.llvm.org/D57567 Reviewed By: sanjoy llvm-svn: 353017	2019-02-04 05:04:19 +00:00
Philip Pfaffe	9438585fe4	[DA][NewPM] Handle transitive dependencies in the new-pm version of DA Summary: The analysis result of DA caches pointers to AA, SCEV, and LI, but it never checks for their invalidation. Fix that. Reviewers: chandlerc, dmgreen, bogner Reviewed By: dmgreen Subscribers: hiraditya, bollu, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D56381 llvm-svn: 352986	2019-02-03 12:25:41 +00:00
Max Kazantsev	b37419ef66	[SCEV] Prohibit SCEV transformations for huge SCEVs Currently SCEV attempts to limit transformations so that they do not work with big SCEVs (that may take almost infinite compile time). But for this, it uses heuristics such as recursion depth and number of operands, which do not give us a guarantee that we don't actually have big SCEVs. This situation is still possible, though it is not likely to happen. However, the bug PR33494 showed a bunch of simple corner case tests where we still produce huge SCEVs, even not reaching big recursion depth etc. This patch introduces a concept of 'huge' SCEVs. A SCEV is huge if its expression size (intoduced in D35989) exceeds some threshold value. We prohibit optimizing transformations if any of SCEVs we are dealing with is huge. This gives us a reliable check that we don't spend too much time working with them. As the next step, we can possibly get rid of old limiting mechanisms, such as recursion depth thresholds. Differential Revision: https://reviews.llvm.org/D35990 Reviewed By: reames llvm-svn: 352728	2019-01-31 06:19:25 +00:00
Max Kazantsev	468ad52213	[SCEV] Take correct loop in AddRec simplification. PR40420 The code of AddRec simplification is using wrong loop when it creates a new AddRecExpr. It should be using AddRecLoop which we have saved and against which all gate checks are made, and not calling AddRec->getLoop() over and over again because AddRec may change and become an AddRecurrency from outer loop during the transform iterations. Considering this change trivial, commiting for postcommit review. llvm-svn: 352451	2019-01-29 05:37:59 +00:00
Max Kazantsev	d4de606ddb	[NFC] Merge failing test from PR40420 llvm-svn: 352450	2019-01-29 05:12:40 +00:00
Nikita Popov	8e1a464e6a	[CodeGen][X86] Expand UADDSAT to NOT+UMIN+ADD Followup to D56636, this time handling the UADDSAT case by expanding uadd.sat(a, b) to umin(a, ~b) + b. Differential Revision: https://reviews.llvm.org/D56869 llvm-svn: 352409	2019-01-28 19:19:09 +00:00
Tim Corringham	824ca3f3dd	[AMDGPU] Add intrinsics for 16 bit interpolation Summary: Added the intrinsics llvm.amdgcn.interp.p1.f16() and llvm.amdgcn.interp.p2.f16() and related LIT test. The p1 intrinsic generates code appropriate for both 16 and 32 bank LDS. Reviewers: #amdgpu, dstuttard, arsenm, tpr Reviewed By: #amdgpu, arsenm Subscribers: jvesely, mgorny, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46754 llvm-svn: 352357	2019-01-28 13:48:59 +00:00
Simon Pilgrim	adca820927	[TTI] Add generic SADDSAT/SSUBSAT costs Add generic costs calculation for SADDSAT/SSUBSAT intrinsics, this uses generic costs for sadd_with_overflow/ssub_with_overflow, an extra sign comparison + a selects based on the sign/overflow. This completes PR40316 Differential Revision: https://reviews.llvm.org/D57239 llvm-svn: 352315	2019-01-27 13:51:59 +00:00
Nemanja Ivanovic	7d007ddedf	[PowerPC] Update Vector Costs for P9 For the power9 CPU, vector operations consume a pair of execution units rather than one execution unit like a scalar operation. Update the target transform cost functions to reflect the higher cost of vector operations when targeting Power9. Patch by RolandF. Differential revision: https://reviews.llvm.org/D55461 llvm-svn: 352261	2019-01-26 01:18:48 +00:00
Simon Pilgrim	30b206b5da	[CostModel][X86] Add SMUL fixed point cost tests llvm-svn: 352046	2019-01-24 13:48:20 +00:00
Simon Pilgrim	47ca8606ba	[TTI] Add generic SADDO/SSUBO costs Added x86 scalar sadd_with_overflow/ssub_with_overflow costs. llvm-svn: 352045	2019-01-24 13:36:45 +00:00
Simon Pilgrim	a131e4e296	[TTI] Add generic UADDSAT/USUBSAT costs Add generic costs calculation for UADDSAT/USUBSAT intrinsics, this fallbacks to using generic costs for uadd_with_overflow/usub_with_overflow + a select. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352044	2019-01-24 12:27:10 +00:00
Simon Pilgrim	2d1964b90f	[TTI] Add generic UADDO/USUBO costs Added x86 scalar uadd_with_overflow/usub_with_overflow costs. Differential Revision: https://reviews.llvm.org/D56907 llvm-svn: 352043	2019-01-24 12:10:20 +00:00
Simon Pilgrim	f87226eb70	[IR] Match intrinsic parameter by scalar/vectorwidth This patch replaces the existing LLVMVectorSameWidth matcher with LLVMScalarOrSameVectorWidth. The matching args must be either scalars or vectors with the same number of elements, but in either case the scalar/element type can differ, specified by LLVMScalarOrSameVectorWidth. I've updated the _overflow intrinsics to demonstrate this - allowing it to return a i1 or <N x i1> overflow result, matching the scalar/vectorwidth of the other (add/sub/mul) result type. The masked load/store/gather/scatter intrinsics have also been updated to use this, although as we specify the reference type to be llvm_anyvector_ty we guarantee the mask will be <N x i1> so no change in behaviour Differential Revision: https://reviews.llvm.org/D57090 llvm-svn: 351957	2019-01-23 16:00:22 +00:00
Simon Pilgrim	ee900efb30	[CostModel][X86] Add ICMP Predicate specific costs First step towards PR40376, this patch adds support for getCmpSelInstrCost to use the (optional) Instruction CmpInst predicate to indicate the type of integer comparison we're performing and alter the costs accordingly. Differential Revision: https://reviews.llvm.org/D57013 llvm-svn: 351810	2019-01-22 12:29:38 +00:00
Simon Pilgrim	44feb4a87b	[CostModel][X86] Add XOP icmp cost tests (PR40376) llvm-svn: 351741	2019-01-21 11:33:52 +00:00
Simon Pilgrim	c934d3a01b	[CostModel][X86] Add explicit vector select costs Prior to SSE41 (and sometimes on AVX1), vector select has to be performed as a ((X & C)\|(Y & ~C)) bit select. Exposes a couple of issues with the min/max reduction costs (which only go down to SSE42 for some reason). The increase pre-SSE41 selection costs also prevent a couple of tests from firing any longer, so I've either tweaked the target or added AVX tests as well to the existing SSE2 tests. llvm-svn: 351685	2019-01-20 13:55:01 +00:00
Simon Pilgrim	1231904c48	[CostModel][X86] Add explicit fcmp costs for pre-SSE42 targets Typical throughputs: cmpss/cmpps = 1cy and cmpsd/cmppd = 2cy before the Core2 era llvm-svn: 351684	2019-01-20 13:21:43 +00:00
Simon Pilgrim	60e5a3accb	[CostModel][X86] Split icmp/fcmp costs tests and test all comparison codes llvm-svn: 351682	2019-01-20 12:10:42 +00:00
Simon Pilgrim	5d7182ecb6	[CostModel][X86] Add masked load/store/gather/scatter tests for SSE2/SSE42/AVX1 targets llvm-svn: 351681	2019-01-20 11:23:01 +00:00
Simon Pilgrim	a8b009fd14	[CostModel][X86] Add non-constant vselect cost tests Also add AVX512 costs at the same time llvm-svn: 351680	2019-01-20 11:19:35 +00:00
Neil Henning	3ed09f8e0c	[AMDGPU] Add some missing always-uniform values. This commit adds some missing intrinsics into the isAlwaysUniform list for the AMDGPU backend. Differential Revision: https://reviews.llvm.org/D56845 llvm-svn: 351562	2019-01-18 16:39:27 +00:00
Nikita Popov	d3b86b79fa	Reapply "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Reapplying with updated SLPVectorizer tests. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351219	2019-01-15 18:43:41 +00:00
James Y Knight	693d39dd12	Remove irrelevant references to legacy git repositories from compiler identification lines in test-cases. (Doing so only because it's then easier to search for references which are actually important and need fixing.) llvm-svn: 351200	2019-01-15 16:18:52 +00:00
Nikita Popov	5885eec35a	Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors" This reverts commit r351125. I missed test changes in an SLPVectorizer test, due to the cost model changes. Reverting for now. llvm-svn: 351129	2019-01-14 22:18:39 +00:00
Nikita Popov	8e9a8432a8	[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors Related to https://bugs.llvm.org/show_bug.cgi?id=40123. Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB, which produces much better code for X86. Differential Revision: https://reviews.llvm.org/D56636 llvm-svn: 351125	2019-01-14 21:43:30 +00:00
Nikita Popov	9f6e9cf71b	[ConstantFolding] Fold undef for integer intrinsics This fixes https://bugs.llvm.org/show_bug.cgi?id=40110. This implements handling of undef operands for integer intrinsics in ConstantFolding, in particular for the bitcounting intrinsics (ctpop, cttz, ctlz), the with.overflow intrinsics, the saturating math intrinsics and the funnel shift intrinsics. The undef behavior follows what InstSimplify does for the general cas e of non-constant operands. For the bitcount intrinsics (where InstSimplify doesn't do undef handling -- there cannot be a combination of an undef + non-constant operand) I'm using a 0 result if the intrinsic is defined for zero and undef otherwise. Differential Revision: https://reviews.llvm.org/D55950 llvm-svn: 350971	2019-01-11 21:18:00 +00:00

1 2 3 4 5 ...

1681 Commits