llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	1ac72a0774	[IPConstProp] Regenerate check lines. Preparation for D84447.	2020-07-30 09:52:16 +01:00
Yuanfang Chen	8224c5047e	For some tests targeting SystemZ, -march=z13 ---> -mcpu=z13. z13 is not a target. It is a CPU.	2020-07-29 19:18:01 -07:00
Juneyoung Lee	111a02decd	[JumpThreading] Fold br(freeze(undef)) This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction is only used by the branch. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84818	2020-07-30 09:38:50 +09:00
Hiroshi Yamauchi	ae7589e1f1	Revert "[PGO] Include the mem ops into the function hash." This reverts commit `120e66b341`. Due to a buildbot failure.	2020-07-29 15:04:57 -07:00
Sanjay Patel	fef513f5cc	[InstSimplify] fold min/max intrinsic with undef operand	2020-07-29 17:03:50 -04:00
Sanjay Patel	5cd695dd7f	[InstSimplify] fold min/max with opposite of limit value	2020-07-29 17:03:50 -04:00
Hiroshi Yamauchi	120e66b341	[PGO] Include the mem ops into the function hash. To avoid hash collisions when the only difference is in mem ops. Differential Revision: https://reviews.llvm.org/D84782	2020-07-29 13:59:40 -07:00
Hiroshi Yamauchi	cd890944ad	[PGO] Remove insignificant function hash values from some tests. This is to avoid the need to update a bunch of test files when the PGO instrumentation function hashing changes. Split off of D84782. Differential Revision: https://reviews.llvm.org/D84865	2020-07-29 10:23:42 -07:00
Craig Topper	3efc978bae	[LV] Add abs/smin/smax/umin/umax intrinsics to isTriviallyVectorizable This patch adds support for vectorizing these intrinsics. Differential Revision: https://reviews.llvm.org/D84796	2020-07-29 10:23:07 -07:00
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHIs, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike its friend, the common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently I've picked the same one as for code sinking, but I suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprisingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks I've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression I saw I fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but I'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprisingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal when optimizing for code size, so one might want to tune the pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
Arthur Eubanks	4a10029d7e	[NewPM][Attributor] Pin tests with -attributor to legacy PM All these tests already explicitly test against both legacy PM and NPM. $ sed -i 's/ -attributor / -attributor -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) $ sed -i 's/ -attributor-cgscc / -attributor-cgscc -enable-new-pm=0 /g' $(rg --path-separator // -l -- -passes=) Now all tests in Transforms/Attributor/ pass under NPM. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84813	2020-07-29 09:02:30 -07:00
Sanjay Patel	3e8534fbc6	[InstSimplify] allow partial undef constants for vector min/max folds	2020-07-29 11:53:41 -04:00
Sanjay Patel	3c20ede18b	[InstSimplify] fold integer min/max intrinsic with same args	2020-07-29 11:53:41 -04:00
David Sherwood	9ad7c980bb	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
Juneyoung Lee	672df0fc67	[InstSimplify] add tests for expandCommutativeBinOp; NFC	2020-07-29 23:21:39 +09:00
David Green	9ddb28964c	[ARM] Tune getCastInstrCost for extending masked loads and truncating masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but it's not the case as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (Because if it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall) Original Patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79163	2020-07-29 13:41:34 +01:00
Matt Arsenault	c230965ccf	AMDGPU: Make saturating add/sub legal for DAG path	2020-07-29 08:27:31 -04:00
Florian Hahn	2aa2c40d23	[NewGVN] Require asserts for crashing tests. Without asserts, it might take a long time for the tests to crash. Only run them with assert builds.	2020-07-29 12:41:05 +01:00
Yevgeny Rouban	5d6cd61904	[LoopSimplifyCFG] Delete landing pads in dead exit blocks In addition to removing phi nodes this patch removes any landing pad that the dead exit block might have. Without this fix, the Verifier complains that a new switch instruction jumps to a block with a landing pad. Differential Revision: https://reviews.llvm.org/D84320	2020-07-29 18:36:51 +07:00
Juneyoung Lee	1ae766e3e0	[InstCombine] Add tests for select(freeze(undef)); NFC	2020-07-29 15:27:09 +09:00
Florian Hahn	8e67982384	[NewGVN] Add test cases for remaining known issues. This patch adds IR tests for the known NewGVN issues. The intention is that adding them now will make it easier to keep track of fixes.	2020-07-28 20:33:04 +01:00
Sanjay Patel	3fb13b8484	[InstSimplify] allow undefs in icmp with vector constant folds This is the main icmp simplification shortcoming seen in D84655. Alive2 agrees that the basic examples are correct at least: define <2 x i1> @src(<2 x i8> %x) { %0: %r = icmp sle <2 x i8> { undef, 128 }, %x ret <2 x i1> %r } => define <2 x i1> @tgt(<2 x i8> %x) { %0: ret <2 x i1> { 1, 1 } } Transformation seems to be correct! define <2 x i1> @src(<2 x i32> %X) { %0: %A = or <2 x i32> %X, { 63, 63 } %B = icmp ult <2 x i32> %A, { undef, 50 } ret <2 x i1> %B } => define <2 x i1> @tgt(<2 x i32> %X) { %0: ret <2 x i1> { 0, 0 } } Transformation seems to be correct! https://alive2.llvm.org/ce/z/omt2ee https://alive2.llvm.org/ce/z/GW4nP_ Differential Revision: https://reviews.llvm.org/D84762	2020-07-28 15:13:53 -04:00
Sanjay Patel	f75cf240d6	[InstCombine] avoid crashing on vector constant expression (PR46872)	2020-07-28 15:02:36 -04:00
Sanjay Patel	496fc3f196	[InstSimplify] add tests for icmp with partial undef constant; NFC	2020-07-28 15:00:33 -04:00
Juneyoung Lee	4c9af6d0e0	[JumpThreading] Add a basic support for freeze instruction This patch adds a basic support for freeze instruction to JumpThreading by making ComputeValueKnownInPredecessorsImpl look into its operand. Reviewed By: efriedma, nikic Differential Revision: https://reviews.llvm.org/D84598	2020-07-29 03:12:14 +09:00
Juneyoung Lee	4887495a3e	[JumpThreading] Add tests that have a cast of freeze and vice versa	2020-07-29 02:16:44 +09:00
Arthur Eubanks	2ca6c422d2	[FunctionAttrs] Rename functionattrs -> function-attrs To match NewPM pass name, and also for readability. Also rename rpo-functionattrs -> rpo-function-attrs while we're here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84694	2020-07-28 09:09:13 -07:00
Florian Hahn	be2ea29ee1	[SCEV] Add additional tests. Increase test coverage for upcoming changes to how SCEV deals with LCSSA phis.	2020-07-28 16:15:57 +01:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Luofan Chen	5ee07dc53f	[Attributor] Track AA dependency using dependency graph This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78861	2020-07-28 18:02:49 +08:00
Roman Lebedev	e40315d2b4	[GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, fewer false negatives While this doesn't appear to help with the perf issue being exposed by D84108, the function as-is is very weird, convoluted, and what's worse, recursive. There was no need for `SpeculativelyAvaliableAndUsedForSpeculation`, tri-state choice is enough. We don't even ever check for that state. The basic idea here is that we need to perform a depth-first traversal of the predecessors of the basic block in question, either finding a preexisting state for the block in a map, or inserting a "placeholder" `SpeculativelyAvaliable`. If we encounter an `Unavaliable` block, then we need to give up the search, and back-propagate the `Unavaliable` state to each successor of said block, more specifically to each `SpeculativelyAvaliable` we've just created. However, if we have traversed the entirety of the predecessors and have not encountered an `Unavaliable` block, then it must mean the value is fully available. We could update each inserted `SpeculativelyAvaliable` into an `Avaliable`, but we don't need to, as the assertion exercises, because we can assume that if we see a `SpeculativelyAvaliable` entry, it is actually `Avaliable`, because during the time we've produced it, if we would have found that it has an `Unavaliable` predecessor, we would have updated its successors, including this block, into `Unavaliable`. Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D84181	2020-07-28 10:19:28 +03:00
Wei Mi	a23f62343c	Supplement instr profile with sample profile. PGO profile is usually more precise than sample profile. However, PGO profile needs to be collected from loadtest and loadtest may not be representative enough to the production workload. Sample profile collected from production can be used as a supplement -- for functions cold in loadtest but warm/hot in production, we can scale up the related function in PGO profile if the function is warm or hot in sample profile. The implementation contains changes in compiler side and llvm-profdata side. Given an instr profile and a sample profile, for a function cold in PGO profile but warm/hot in sample profile, llvm-profdata will either mark all the counters in the profile to be -1 or scale up the max count in the function to be above hot threshold, depending on the zero counter ratio in the profile. The assumption is if there are too many counters being zero in the function profile, the profile is more likely to cause harm than good, then llvm-profdata will mark all the counters to be -1 indicating the function is hot but the profile is unaccountable. In compiler side, if a function profile with all -1 counters is seen, the function entry count will be set to be above hot threshold but its internal profile will be dropped. In the long run, it may be useful to let compiler support using PGO profile and sample profile at the same time, but that requires more careful design and more substantial changes to make two profiles work seamlessly. The patch here serves as a simple intermediate solution. Differential Revision: https://reviews.llvm.org/D81981	2020-07-27 20:17:40 -07:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite; revert while investigating.	2020-07-27 21:07:00 +00:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch removes support for QPX/A2Q in llvm, BGQ/BGP in clang, and CNK in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Shinji Okumura	bef19abcf7	[Attributor][NFC] Add tests to noalias.ll Summary: Add tests to `noalias.ll` to make changes in D84665 clear Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Subscribers: uenoku, kuter, bbn, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84688	2020-07-28 03:53:06 +09:00
Roman Lebedev	1da9834557	[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857) SplitBlockPredecessors() can not split blocks that have such terminators, and in two other places we already ensure that we don't end up calling SplitBlockPredecessors() on such blocks. Do so in one more place. Fixes https://bugs.llvm.org/show_bug.cgi?id=46857	2020-07-27 15:39:03 +03:00
Sanjay Patel	1ebcf03551	[InstSimplify] add tests for min/max intrinsics; NFC	2020-07-27 08:26:27 -04:00
Juneyoung Lee	6701c0bf73	[JumpThreading] Add a test case that has a phi with undef; NFC	2020-07-27 19:08:45 +09:00
Juneyoung Lee	c891f519e1	[JumpThreading] Add a test that threads jumps with frozen branch conditions	2020-07-27 19:04:50 +09:00
Juneyoung Lee	e1eacf27c6	[InstCombine] Fold freeze into phi if one operand is not undef This patch adds folding freeze into phi if it has only one operand to target. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84601	2020-07-27 17:07:27 +09:00
Juneyoung Lee	194a4beedd	[InstCombine] Add more tests to freeze-phi.ll; NFC	2020-07-27 09:43:00 +09:00
Juneyoung Lee	ab4e1be7ab	[InstCombine] Add a test for folding freeze into phi; NFC	2020-07-27 02:24:00 +09:00
Sanjay Patel	0481e1ae3c	[InstSimplify] fold integer min/max intrinsics with limit constant	2020-07-26 09:41:54 -04:00
Sanjay Patel	c6cf71107a	[InstSimplify] add tests for min/max intrinsics; NFC	2020-07-26 09:04:37 -04:00
Sanjay Patel	b89ae102e6	[InstSimplify] fold fcmp using isKnownNeverInfinity + isKnownNeverNaN Follow-up to D84035 / rG7393d7574c09. This sidesteps a question of FMF/poison on fcmp raised in PR46077: http://bugs.llvm.org/PR46077 https://alive2.llvm.org/ce/z/TCsyzD define i1 @src(float %x) { %0: %x42 = fadd nnan ninf float %x, 42.000000 %r = fcmp ueq float %x42, inf ret i1 %r } => define i1 @tgt(float %x) { %0: ret i1 0 } Transformation seems to be correct! https://alive2.llvm.org/ce/z/FQaH7a define i1 @src(i8 %x) { %0: %cast = uitofp i8 %x to float %r = fcmp one float inf, %cast ret i1 %r } => define i1 @tgt(i8 %x) { %0: ret i1 1 } Transformation seems to be correct!	2020-07-26 09:04:37 -04:00
Sanjay Patel	912e9e5262	[InstSimplify] add tests for fcmp with infinity constant; NFC	2020-07-26 09:04:36 -04:00
Juneyoung Lee	920e267974	[JumpThreading] Add a test for D84598; NFC	2020-07-26 22:00:01 +09:00
Juneyoung Lee	9f074214b7	[ValueTracking] Instruction::isBinaryOp should be used for constexprs This is a simple patch that makes canCreateUndefOrPoison use Instruction::isBinaryOp because BinaryOperator inherits Instruction. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84596	2020-07-26 21:48:51 +09:00
Juneyoung Lee	02dadab1b4	NFC; add an example that subtracts pointers to two global vars	2020-07-26 20:47:33 +09:00
Nikita Popov	f4199b8f0b	[SCCP] Add assume non null test (NFC)	2020-07-25 16:02:15 +02:00
Nikita Popov	ad16e71c95	Reapply [SCCP] Directly remove non-feasible edges Reapply with DTU update moved after CFG update, which is a requirement of the API. ----- Non-feasible control-flow edges are currently removed by replacing the branch condition with a constant and then calling ConstantFoldTerminator. This happens in a rather roundabout manner, by inspecting the users (effectively: predecessors) of unreachable blocks, and further complicated by the need to explicitly materialize the condition for "forced" edges. I would like to extend SCCP to discard switch conditions that are non-feasible based on range information, but this is incompatible with the current approach (as there is no single constant we could use.) Instead, this patch explicitly removes non-feasible edges. It currently only needs to handle the case where there is a single feasible edge. The llvm_unreachable() branch will need to be implemented for the aforementioned switch improvement. Differential Revision: https://reviews.llvm.org/D84264	2020-07-25 14:52:35 +02:00
Florian Hahn	3c1476d26c	[IPSCCP] Drop argmemonly after replacing pointer argument. This patch updates IPSCCP to drop argmemonly and inaccessiblemem_or_argmemonly if it replaces a pointer argument. Fixes PR46717. Reviewers: efriedma, davide, nikic, jdoerfert Reviewed By: efriedma, jdoerfert Differential Revision: https://reviews.llvm.org/D84432	2020-07-25 11:52:14 +01:00
Rong Xu	1dd39b1133	[PGO] Fix incorrect function entry count Function entry count might be zero after the profile counts reset and before reentry to the function. Zero profile entry count is very bad as the profile count from BFI will be wrong. A simple fix is to set the profile entry count to 1 if there are non-zero profile counts in this function. Differential Revision: https://reviews.llvm.org/D84378	2020-07-24 17:39:55 -07:00
Rong Xu	31bd15c562	[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction Skip profile count promotion if any of the ExitBlocks contains a ret instruction. This is to prevent dumping of an incomplete profile -- if the loop is a long running loop and dump is called in the middle of the loop, the resulting profile is incomplete. ExitBlocks containing a ret instruction is an indication of a long running loop -- early exit to error handling code. Differential Revision: https://reviews.llvm.org/D84379	2020-07-24 17:38:31 -07:00
Rong Xu	dcf1bca0de	Revert "[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction" This reverts commit `867ef4472d`.	2020-07-24 17:33:49 -07:00
Rong Xu	867ef4472d	[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction Forgot including the tests in the commit `6fdc6f6c7d`.	2020-07-24 17:23:33 -07:00
Johannes Doerfert	aa09db495a	[SROA] Teach promote to register about droppable instructions This is the second of two patches to address PR46753. We basically allow SROA to promote allocas that are used in droppable instructions, for now that means `llvm.assume`. The (transitive) uses are replaced by `undef` in the droppable instructions. See also D83976. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D83978	2020-07-24 15:15:39 -05:00
Johannes Doerfert	ce8928f2e4	[Mem2Reg] Teach promote to register about droppable instructions This is the first of two patches to address PR46753. We basically allow mem2reg to promote allocas that are used in droppable instructions, for now that means `llvm.assume`. The uses of the alloca (or a bitcast or zero offset GEP from there) are replaced by `undef` in the droppable instructions. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D83976	2020-07-24 15:15:38 -05:00
Johannes Doerfert	ce2d69b557	[SROA][Mem2Reg] Do not crash on alloca + addrspacecast SROA knows that it can look through addrspacecast but PromoteMemoryToRegister did not handle them. This caused an assertion error for the test case, exposed while running `Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978 applied. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84085	2020-07-24 15:15:38 -05:00
Arthur Eubanks	9bb6ce78be	Rename scoped-noalias -> scoped-noalias-aa Summary: To match NewPM name. Also the new name is clearer and more consistent. Subscribers: jvesely, nhaehnle, hiraditya, asbirlea, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84542	2020-07-24 12:14:27 -07:00
Roman Lebedev	3319d05630	[NFC][GVN] Improve loadpre-missed-opportunity.ll test again thanks to @fhahn	2020-07-24 20:32:51 +03:00
Florian Hahn	1c7c69c795	[ValueTracking] Check for ConstantExpr before using recursive helpers. Make sure we do not call containsConstantExpression/containsUndefElement on ConstantExpression, which is not supported. In particular, containsUndefElement/containsConstantExpression are only supported on constants which are supported by getAggregateElement. Unfortunately there's no convenient way to check if a constant supports getAggregateElement, so just check for non-constantexpressions with vector type. Other users of those functions do so too. Reviewers: spatel, nikic, craig.topper, lebedev.ri, jdoerfert, aqjune Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84512	2020-07-24 17:37:09 +01:00
Roman Lebedev	804622053a	[NFC][GVN] Clean loadpre-missed-opportunity.ll test some more	2020-07-24 12:44:22 +03:00
Florian Hahn	2c1799f892	[IPSCCP] Add another test case with argmemonly callsite attributes.	2020-07-24 10:13:51 +01:00
Fangrui Song	4637daa990	Revert D84264 "[SCCP] Directly remove non-feasible edges" & `5db5b4bc43` It breaks stage-2 build. Clang crashed when compiling llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node	2020-07-23 17:51:48 -07:00
Roman Lebedev	0a5971139a	[NFC][GVN] Add a (horrible) test for D84181 demonstrating non-NFC'ness	2020-07-24 01:28:23 +03:00
Sidharth Baveja	38a8217931	[Loop Fusion] Integrate Loop Peeling into Loop Fusion (re-land after fixing ASAN build failures) This patch adds the ability to peel off iterations of the first loop in loop fusion. This can allow for both loops to have the same trip count, making it legal for them to be fused together. Here is a simple scenario peeling can be used in loop fusion: for (i = 0; i < 10; ++i) a[i] = a[i] + 3; for (j = 1; j < 10; ++j) b[j] = b[j] + 5; Here we can make use of peeling, and then fuse the two loops together. We can peel off the 0th iteration of the loop i, and then combine loop i and j for i = 1 to 10. a[0] = a[0] + 3; for (i = 1; i < 10; ++i) { a[i] = a[i] + 3; b[i] = b[i] + 5; } Currently peeling with loop fusion is only supported for loops with constant trip counts and a single exit point. Both unguarded and guarded loops are supported. Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song) Differential Revision: https://reviews.llvm.org/D82927	2020-07-23 21:02:04 +00:00
Nikita Popov	183342c0a9	[SCCP] Add another switch+phi test (NFC)	2020-07-23 21:51:09 +02:00
Nikita Popov	9394c3ec88	[SCCP] Directly remove non-feasible edges Non-feasible control-flow edges are currently removed by replacing the branch condition with a constant and then calling ConstantFoldTerminator. This happens in a rather roundabout manner, by inspecting the users (effectively: predecessors) of unreachable blocks, and further complicated by the need to explicitly materialize the condition for "forced" edges. I would like to extend SCCP to discard switch conditions that are non-feasible based on range information, but this is incompatible with the current approach (as there is no single constant we could use.) Instead, this patch explicitly removes non-feasible edges. It currently only needs to handle the case where there is a single feasible edge. The llvm_unreachable() branch will need to be implemented for the aforementioned switch improvement. Differential Revision: https://reviews.llvm.org/D84264	2020-07-23 20:32:57 +02:00
Nikita Popov	def48b0e88	[PredicateInfo][SCCP] Remove assertion (PR46814) As long as RenamedOp is not guaranteed to be accurate, we cannot assert here and should just return false. This was already done for the other conditions in this function. Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.	2020-07-23 19:36:51 +02:00
Florian Hahn	0f80d598b0	[IPSCCP] Add test case for PR46717 for argmemonly handling.	2020-07-23 17:25:26 +01:00
Sanjay Patel	cfe40acd16	[VectorCombine] add tests for load vectorization; NFC	2020-07-23 11:24:04 -04:00
Florian Hahn	82e35197e6	[LSR] Re-generate check lines for test. The test is quite fragile, as the check lines match IR numbers and it is not obvious why only a very small subset is checked. Re-generate check lines, so further changes are more obvious.	2020-07-23 13:53:53 +01:00
Florian Hahn	09c96a31ef	[LoopIdiom] Add additional test cases.	2020-07-23 13:53:26 +01:00
Hamilton Tobon Mosquera	6f0d99d2b9	[OpenMPOpt] Regression test for hiding latency of H2D mem transfers	2020-07-22 20:02:54 -05:00
Fangrui Song	27650ec554	Revert D81682 "[PGO] Extend the value profile buckets for mem op sizes." This reverts commit `4a539faf74`. There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang: ``` (gdb) bt llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) () llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const, llvm::SCEV const, llvm::SCEV const*, unsigned int) () ``` (The body of __llvm_profile_instrument_range is inlined, so we can only find __llvm_profile_instrument_target in the trace) ``` 23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1) 24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1) 25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx 26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV 27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx) 28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342> ```	2020-07-22 16:08:25 -07:00
Rong Xu	50da55a585	[PGO] Supporting code for always instrumenting entry block This patch includes the supporting code that enables always instrumenting the function entry block by default. This patch will NOT change the default behavior. It adds a variant bit in the profile version, adds new directives in text profile format, and changes llvm-profdata tool accordingly. This patch is a split of D83024 (https://reviews.llvm.org/D83024) Many test changes from D83024 are also included. Differential Revision: https://reviews.llvm.org/D84261	2020-07-22 15:01:53 -07:00
Nikita Popov	e20b3079c1	[SCCP] Add additional multi-edge + phi tests (NFC)	2020-07-22 22:10:23 +02:00
Nikita Popov	33f6542014	[SCCP] Regenerate test checks (NFC) And adjust the indbrtest4 test to actually test what it's supposed to. BB1 is supposed to be eliminated here, but isn't, because BB0 still branches to it. This was lost due to the incomplete CHECK lines.	2020-07-22 22:10:23 +02:00
Nikita Popov	eae6bb3807	[SCCP] Add multi-edge switch + phi test case (NFC)	2020-07-22 20:28:22 +02:00
Anton Afanasyev	56c92bf4b7	[SLP][Test] Precommit tests for D83779. NFC.	2020-07-22 18:25:45 +03:00
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic, TargetTransformInfo::simplifyDemandedUseBitsIntrinsic, and TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic. A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows moving about 3000 lines out of InstCombine and into the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Alexey Bataev	be37f13e2d	[SLP]Add an extra test for vectorization of non-pow-2 trees, NFC.	2020-07-22 09:13:30 -04:00
Sebastian Neubauer	2c659082bd	[AMDGPU] Don't combine memory intrs to v3i16 v3i16 and v3f16 currently cannot be legalized and lowered so they should not be emitted by inst combining. Moved the check down to still allow extracting 1 or 2 elements via the dmask. Fixes image intrinsics being combined to return v3x16. Differential Revision: https://reviews.llvm.org/D84223	2020-07-22 12:44:01 +02:00
Max Kazantsev	360ab70712	[SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls We do not thread blocks with convergent calls, but this check was missing when we decide to insert PR Phis into it (which we only do for threading). Differential Revision: https://reviews.llvm.org/D83936 Reviewed By: nikic	2020-07-22 13:53:50 +07:00
Chen Zheng	e8425b27fe	[PowerPC] add store (load float) pattern to isProfitableToHoist store (load float) can be optimized to store(load i32) in InstCombine pass. Add store (load float) to isProfitableToHoist to make sure we don't break the opt in InstCombine pass. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D82341	2020-07-21 20:55:13 -04:00
Juneyoung Lee	ace0bf7490	[ValueTracking] Fix incorrect handling of canCreateUndefOrPoison .. in isGuaranteedNotToBeUndefOrPoison. This caused early exit of isGuaranteedNotToBeUndefOrPoison, making it return imprecise result. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84251	2020-07-22 09:31:16 +09:00
Nikita Popov	ef868a848e	[SCCP] Add switch+range tests (NFC)	2020-07-21 23:07:50 +02:00
Fangrui Song	8a268bec1b	Revert D82927 "[Loop Fusion] Integrate Loop Peeling into Loop Fusion" This reverts commit `bb8850d34d`. It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build. LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.	2020-07-21 12:24:50 -07:00
Hiroshi Yamauchi	7bedae7dee	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality.	2020-07-21 11:16:36 -07:00
Sidharth Baveja	bb8850d34d	[Loop Fusion] Integrate Loop Peeling into Loop Fusion Summary: This patch adds the ability to peel off iterations of the first loop in loop fusion. This can allow for both loops to have the same trip count, making it legal for them to be fused together. Here is a simple scenario where peeling can be used in loop fusion: for (i = 0; i < 10; ++i) a[i] = a[i] + 3; for (j = 1; j < 10; ++j) b[j] = b[j] + 5; Here we can make use of peeling, and then fuse the two loops together. We can peel off the 0th iteration of the loop i, and then combine loop i and j for i = 1 to 10. a[0] = a[0] + 3; for (i = 1; i < 10; ++i) { a[i] = a[i] + 3; b[i] = b[i] + 5; } Currently peeling with loop fusion is only supported for loops with constant trip counts and a single exit point. Both unguarded and guarded loops are supported. Author: sidbav (Sidharth Baveja) Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour Reviewed By: bmahjour Subscribers: bmahjour, mgorny, hiraditya, zzheng Tags: LLVM Differential Revision: https://reviews.llvm.org/D82927	2020-07-21 15:59:14 +00:00
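The peeling scenario described in the commit above can be checked with a small standalone program. This is only a source-level sketch of the idea, not the pass itself: the names `runSeparate`, `runPeeledAndFused`, and `formsAgree` are hypothetical, and the array contents are arbitrary test data.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 10;
using Arr = std::array<int, N>;

// Original form: two separate loops with different trip counts (10 and 9).
void runSeparate(Arr &a, Arr &b) {
  for (std::size_t i = 0; i < N; ++i)
    a[i] = a[i] + 3;
  for (std::size_t j = 1; j < N; ++j)
    b[j] = b[j] + 5;
}

// Peeled and fused form: iteration 0 of the first loop is executed up front,
// so both remaining loops run for i = 1..9 and can share one loop body.
void runPeeledAndFused(Arr &a, Arr &b) {
  a[0] = a[0] + 3; // peeled 0th iteration of loop i
  for (std::size_t i = 1; i < N; ++i) {
    a[i] = a[i] + 3;
    b[i] = b[i] + 5;
  }
}

// Runs both forms on identical (arbitrary) inputs and compares the results.
bool formsAgree() {
  Arr a1{}, b1{}, a2{}, b2{};
  for (std::size_t i = 0; i < N; ++i) {
    a1[i] = a2[i] = static_cast<int>(i);
    b1[i] = b2[i] = static_cast<int>(2 * i);
  }
  runSeparate(a1, b1);
  runPeeledAndFused(a2, b2);
  return a1 == a2 && b1 == b2;
}
```

Calling `formsAgree()` exercises both versions. The two arrays here do not alias, which is what makes the reordering safe in this sketch.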
Jay Foad	5e5bda74b6	[IR] Simplify Use::swap. NFCI. The new implementation makes it clear that there are exactly two conditional stores (after the initial no-op optimization). By contrast the old implementation had seven conditionals, some hidden inside other functions. This commit can change the order of operands in operand lists, hence the tweak to one test case. Differential Revision: https://reviews.llvm.org/D80116	2020-07-21 12:15:12 +01:00
Florian Hahn	752fea7c27	[SCCP] Add range metadata to call sites with known return ranges. If we inferred a range for the function return value, we can add !range at all call-sites of the function, if the range does not include undef. Reviewers: efriedma, davide, nikic Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D83952	2020-07-21 10:06:54 +01:00
Sanjay Patel	750f4c591d	[InstCombine] allow peeking through zext of shift amount to match rotate idioms (PR45701) We might want to also allow trunc of the shift amount, but that seems less likely? define i32 @src(i32 %x, i1 %y) { %0: %rem = and i1 %y, 1 %cmp = icmp eq i1 %rem, 0 %sh_prom = zext i1 %rem to i32 %sub = sub nsw nuw i1 0, %rem %sh_prom1 = zext i1 %sub to i32 %shr = lshr i32 %x, %sh_prom1 %shl = shl i32 %x, %sh_prom %or = or i32 %shl, %shr %r = select i1 %cmp, i32 %x, i32 %or ret i32 %r } => define i32 @tgt(i32 %x, i1 %y) { %0: %t = zext i1 %y to i32 %r = fshl i32 %x, i32 %x, i32 %t ret i32 %r } Transformation seems to be correct! https://alive2.llvm.org/ce/z/xgMvE3 http://bugs.llvm.org/PR45701	2020-07-20 16:18:11 -04:00
Sanjay Patel	92ec0c5da6	[InstCombine] add tests for funnel shift/rotate with narrow shift amount; NFC	2020-07-20 16:18:11 -04:00
Florian Hahn	f13a59bcff	[Matrix] Use TileInfo to create tiled loop nest for matrix multiply. This patch uses the TileInfo introduced in D77550 to generate a loop nest for tiled matrix multiplication, instead of generating the unrolled code for the whole multiplication. This makes code-generation more scalable for larger matrixes. Initially loops are only used if both the number of rows and columns are divisible by the tile size. Other cases will be added as follow-up. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D81308	2020-07-20 21:11:53 +01:00
Mircea Trofin	70f8d0ac8a	[llvm] Development-mode InlineAdvisor Summary: This is the InlineAdvisor used in 'development' mode. It enables two scenarios: - loading models via a command-line parameter, thus allowing for rapid training iteration, where models can be used for the next exploration phase without requiring recompiling the compiler. This trades off some compilation speed for the added flexibility. - collecting training logs, in the form of tensorflow.SequenceExample protobufs. We generate these as textual protobufs, which simplifies generation and testing. The protobufs may then be readily consumed by a tensorflow-based training algorithm. To speed up training, training logs may also be collected from the 'default' training policy. In that case, this InlineAdvisor does not use a model. RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html Reviewers: jdoerfert, davidxl Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83733	2020-07-20 11:01:56 -07:00
Matt Arsenault	5e999cbe8d	IR: Define byref parameter attribute This allows tracking the in-memory type of a pointer argument to a function for ABI purposes. This is essentially a stripped down version of byval to remove some of the stack-copy implications in its definition. This includes the base IR changes, and some tests for places where it should be treated similarly to byval. Codegen support will be in a future patch. My original attempt at solving some of these problems was to repurpose byval with a different address space from the stack. However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality. There is also talk of removing and replacing the byval attribute, so a new attribute would need to take its place anyway. This is intended to avoid some optimization issues with the current handling of aggregate arguments, as well as fix inflexibility in how frontends can specify the kernel ABI. The most honest representation of the amdgpu_kernel convention is to expose all kernel arguments as loads from constant memory. Today, these are raw, SSA Argument values and codegen is responsible for turning these into loads. Background: There currently isn't a satisfactory way to represent how arguments for the amdgpu_kernel calling convention are passed. In reality, arguments are passed in a single, flat, constant memory buffer implicitly passed to the function. It is also illegal to call this function in the IR, and this is only ever invoked by a driver of some kind. It does not make sense to have a stack passed parameter in this context as is implied by byval. It is never valid to write to the kernel arguments, as this would corrupt the inputs seen by other dispatches of the kernel. These arguments are also not in the same address space as the stack, so a copy is needed to an alloca. From a source C-like language, the kernel parameters are invisible. 
Semantically, a copy is always required from the constant argument memory to a mutable variable. The current clang calling convention lowering emits raw values, including aggregates into the function argument list, since using byval would not make sense. This has some unfortunate consequences for the optimizer. In the aggregate case, we end up with an aggregate store to alloca, which both SROA and instcombine turn into a store of each aggregate field. The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage. This also means the backend dictates the alignment of arguments, and arbitrarily picks the LLVM IR ABI type alignment. By allowing an explicit alignment, frontends can make better decisions. For example, there's really no advantage to an alignment higher than 4, so a frontend could choose to compact the argument layout. Similarly, there is a high penalty to using an alignment lower than 4, so a frontend could opt into more padding for small arguments. Another design consideration is when it is appropriate to expose the fact that these arguments are all really passed in adjacent memory. Currently we have a late IR optimization pass in codegen to rewrite the kernel argument values into explicit loads to enable vectorization. In most programs, unrelated argument loads can be merged together. However, exposing this property directly from the frontend has some disadvantages. We still need a way to track the original argument sizes and alignments to report to the driver. I find using some side-channel, metadata mechanism to track this unappealing. If the kernel arguments were exposed as a single buffer to begin with, alias analysis would be unaware that the padding bits between arguments are meaningless. Another family of problems is there are still some gaps in replacing all of the available parameter attributes with metadata equivalents once lowered to loads. 
The immediate plan is to start using this new attribute to handle all aggregate arguments for kernels. Long term, it makes sense to migrate all kernel arguments, including scalars, to be passed indirectly in the same manner. Additional context is in D79744.	2020-07-20 10:23:09 -04:00
Florian Hahn	dc1087d408	[Matrix] Add minimal lowering pass that only requires TTI. This patch adds a new variant of the matrix lowering pass that only does a minimal lowering and only depends on TTI. The main purpose of this pass is to have a pass with minimal dependencies to run as part of the backend pipeline. At the moment, the only difference to the regular lowering pass is that it does not support remarks. But subsequent patches will add support for tiling to the lowering pass, which will require more analysis that we do not want to run in the backend, as the lowering should happen in the middle-end in practice and running it in the backend is mostly for convenience when running llc. Reviewers: anemet, Gerolf, efriedma, hfinkel Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D76867	2020-07-20 11:16:11 +01:00
Roman Lebedev	04b729d076	[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag Common code sinking is already guarded with a (default-off!) flag, so add a flag for hoisting, too. D84108 will hopefully make hoisting off-by-default too.	2020-07-20 10:29:57 +03:00
Roman Lebedev	3de4166325	[NFC][SimplifyCFG] Add standalone test for common code hoisting xform option Also, move one test into its correct place	2020-07-20 10:29:29 +03:00
sstefan1	e3d646c699	[Attributor][NFC] applying update_test_checks with --check-attributes Summary: All tests are updated, except wrapper.ll since it is not working nicely with newly created functions. Reviewers: jdoerfert, uenoku, baziotis, homerdin Subscribers: arphaman, jfb, kuter, bbn, okura, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D84130	2020-07-20 08:17:34 +02:00
Juneyoung Lee	30201d3b61	[ValueTracking] Let isGuaranteedNotToBeUndefOrPoison use canCreateUndefOrPoison This patch adds support for more operations. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D83926	2020-07-20 09:21:39 +09:00
Wenlei He	d41d952be9	Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" This reverts commit `2d6ecfa168`.	2020-07-19 08:49:04 -07:00
Wenlei He	2d6ecfa168	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. Subscribers: mgorny, aprantl, hiraditya, llvm-commits Tags: #llvm Resubmit for https://reviews.llvm.org/D84086	2020-07-19 08:21:05 -07:00
Nikita Popov	c6e13667e7	[PredicateInfo] Add a method to interpret predicate as cmp constraint Both users of PredicateInfo (NewGVN and SCCP) are interested in getting a cmp constraint on the predicated value. They currently implement separate logic for this. This patch adds a common method for this in PredicateBase. This enables a missing bit of PredicateInfo handling in SCCP: Now the predicate on the condition itself is also used. For switches it means we know that the switched-on value is the same as the case value. For assumes/branches we know that the condition is true or false. Differential Revision: https://reviews.llvm.org/D83640	2020-07-19 15:34:32 +02:00
Sanjay Patel	7393d7574c	[InstSimplify] fold fcmp with infinity constant using isKnownNeverInfinity This is a step towards trying to remove unnecessary FP compares with infinity when compiling with -ffinite-math-only or similar. I'm intentionally not checking FMF on the fcmp itself because I'm assuming that will go away eventually. The analysis part of this was added with rGcd481136 for use with isKnownNeverNaN. Similarly, that could be an enhancement here to get predicates like 'one' and 'ueq'. Differential Revision: https://reviews.llvm.org/D84035	2020-07-19 09:24:52 -04:00
Nikita Popov	d12ec0f752	[InstCombine] Fix store merge worklist management (PR46680) Fixes https://bugs.llvm.org/show_bug.cgi?id=46680. Just like insertions through IRBuilder, InsertNewInstBefore() should be using the deferred worklist mechanism, so that processing of newly added instructions is prioritized. There's one side-effect of the worklist order change which could be classified as a regression. An add op gets pushed through a select that at the time is not a umax. We could add a reverse transform that tries to push adds in the reverse direction to restore a min/max, but that seems like a sure way of getting infinite loops... Seems like something that should best wait on min/max intrinsics. Differential Revision: https://reviews.llvm.org/D84109	2020-07-19 15:05:45 +02:00
Nikita Popov	13ae440de4	[InstCombine] Add test for PR46680 (NFC)	2020-07-18 23:37:16 +02:00
Joseph Huber	3bbbe4c4b6	[OpenMP] Add Additional Function Attribute Information to OMPKinds.def Summary: This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits Tags: #OpenMP #clang #LLVM Differential Revision: https://reviews.llvm.org/D81031	2020-07-18 12:55:50 -04:00
Roman Lebedev	8d487668d0	[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands. Yes, if operands are non-positive this comes at the extra cost of two extra negations. But a. division is already just ridiculously costly, two more subtractions can't hurt much :) and b. we have better/more analyses/folds for an unsigned division, we could end up narrowing its bitwidth, converting it to lshr, etc. This is essentially a take two on `0fdcca07ad`, which didn't fix the potential regression i was seeing, because ValueTracking's computeKnownBits() doesn't make use of dominating conditions in its analysis. While i could teach it that, this seems like the more general fix. This big hammer actually does catch said potential regression. Over vanilla test-suite + RawSpeed + darktable (10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more (+2%) SDiv's, the total instruction count at the end of middle-end pipeline is only +6, so out of +10 extra negations, ~half are folded away, and asm instr count is only +1, so practically speaking all extra negations are folded away and are therefore free. Sadly, all these new UDiv's remained, none folded away. But there are two fewer basic blocks. https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1	2020-07-18 17:59:56 +03:00
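The four sdiv-to-udiv variants listed in the commit above rest on one arithmetic identity, which can be checked exhaustively for 8-bit values. The sketch below is an illustration of that identity only, not the CVP implementation: the pass knows the operand domains at compile time, whereas this hypothetical helper `sdivViaUdiv` branches on the signs at run time. The checker `countMismatches` is likewise an assumed name; it skips division by zero and the overflowing case -128 / -1.

```cpp
#include <cstdint>

// Signed 8-bit division expressed as one unsigned division plus negations,
// mirroring variants v0..v3: negate each negative operand, divide unsigned,
// and negate the quotient when exactly one operand was negative.
std::int8_t sdivViaUdiv(std::int8_t c0, std::int8_t c1) {
  // Magnitudes are computed in int so that -(-128) does not overflow.
  const auto u0 =
      static_cast<std::uint8_t>(c0 < 0 ? -static_cast<int>(c0) : c0);
  const auto u1 =
      static_cast<std::uint8_t>(c1 < 0 ? -static_cast<int>(c1) : c1);
  const auto q = static_cast<std::uint8_t>(u0 / u1); // the "udiv"
  const bool negate = (c0 < 0) != (c1 < 0);
  return static_cast<std::int8_t>(negate ? -static_cast<int>(q)
                                         : static_cast<int>(q));
}

// Compares against ordinary truncating signed division for every valid pair.
int countMismatches() {
  int mismatches = 0;
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      if (b == 0 || (a == -128 && b == -1))
        continue; // undefined for sdiv, so excluded from the check
      const auto expected = static_cast<std::int8_t>(a / b);
      const auto actual = sdivViaUdiv(static_cast<std::int8_t>(a),
                                      static_cast<std::int8_t>(b));
      if (actual != expected)
        ++mismatches;
    }
  }
  return mismatches;
}
```

When both operands are known non-negative (variant v0), no negation is needed at all, which is the case the earlier fold already handled.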
Roman Lebedev	7b16fd8a25	[NFC][CVP] Add tests for possible sdiv->udiv where operands are not non-negative Currently that fold requires both operands to be non-negative, but the only real requirement for the fold is that we must know the domains of the operands.	2020-07-18 17:59:31 +03:00
David Green	2f4c3e8097	[LV] Add additional InLoop reduction tests. NFC	2020-07-18 12:14:23 +01:00
Chen Zheng	bb07eb944f	[PowerPC]add testcase for adding store (load float*) pattern, nfc	2020-07-17 22:57:08 -04:00
Chen Zheng	6d247f980d	[SCEV][IndVarSimplify] insert point should not be block front. Recommit after removing the unused cast instructions. Differential Revision: https://reviews.llvm.org/D80975	2020-07-17 22:25:10 -04:00
Arthur Eubanks	0dfa4a83fa	Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality." This reverts commit `30c382a7c6`. See https://crbug.com/1106813.	2020-07-17 16:47:41 -07:00
Eric Christopher	020545d386	Temporarily Revert "[OpenMP] Add Additional Function Attribute Information to OMPKinds.def" as it's causing a few unused variable warnings via the macro instantiation: sources/llvm-project/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def:649:17: error: unused variable 'InaccessibleOnlyAttrs' [-Werror,-Wunused-variable] __OMP_ATTRS_SET(InaccessibleOnlyAttrs, ^ This reverts commit `09fe0c5ab9`.	2020-07-17 15:05:42 -07:00
Eric Christopher	ae08dbc673	Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" as it is failing the inline-replay.ll test as well as sanitizers/Werror from returning a stack local variable. This reverts commit `029946b112`.	2020-07-17 14:58:01 -07:00
Joseph Huber	09fe0c5ab9	[OpenMP] Add Additional Function Attribute Information to OMPKinds.def Summary: This patch adds more function attribute information to the runtime function definitions in OMPKinds.def. The goal is to provide sufficient information about OpenMP runtime functions to perform more optimizations on OpenMP code. Reviewers: jdoerfert Subscribers: aaron.ballman cfe-commits yaxunl guansong sstefan1 llvm-commits Tags: #OpenMP #clang #llvm Differential Revision: https://reviews.llvm.org/D81031	2020-07-17 17:54:01 -04:00
Wenlei He	029946b112	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed. Reviewers: davidxl, mtrofin, wmi, hoy Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83743	2020-07-17 13:30:47 -07:00
Xinan Jiang	d8e0baf29d	[InstCombine] Fix typo in comment. Reviewers: fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D83951	2020-07-17 20:57:45 +01:00
Roman Lebedev	0fdcca07ad	[InstCombine] Fold X sdiv (-1 << C) -> -(X u>> Y) iff X is non-negative This is the one i'm seeing as missed optimization, although there are likely other possibilities, as usual. There are 4 variants of a general sdiv->udiv fold: https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1 If we really don't like sdiv (more than udiv that is), and are okay with increasing instruction count (2 new negations), and we ensure that we don't undo the fold, then we could just implement these..	2020-07-17 22:50:09 +03:00
Roman Lebedev	66b66988e6	[NFC][InstCombine] Add some tests with sdiv-by-negative-power-of-two	2020-07-17 22:50:09 +03:00
George Rokos	04713f8aa6	Added missing API call to OpenMP test	2020-07-17 10:40:11 -07:00
Sanjay Patel	acbc688263	[InstSimplify] add tests for fcmp with infinity; NFC	2020-07-17 11:51:41 -04:00
Sjoerd Meijer	7ebc6bed84	[ARM][MVE] Reorg of the LV tail-folding tests It was getting difficult to see which test was in which file, so this reorganises the test files so that now all filenames start with tail-folding-* followed by a more descriptive name for what that group of tests checks.	2020-07-17 15:54:15 +01:00
Sidharth Baveja	11e879d4f1	[Loop Simplify] Resolve an issue where metadata is not applied to a loop latch. Summary: This patch resolves an issue where the metadata of a loop is not added to the new loop latch, and not removed from the old loop latch. This issue occurs in the SplitBlockPredecessors function, which adds a new block in a loop, and in the case that the block passed into this function is the header of the loop, the loop can be modified such that the latch of the loop is replaced. This patch applies to the Loop Simplify pass since it ensures that each loop has exit blocks which only have predecessors that are inside of the loop. In the case that this is not true, the pass will create a new exit block for the loop. This guarantees that the loop preheader/header will dominate the exit blocks. Author: sidbav (Sidharth Baveja) Reviewers: asbirlea (Alina Sbirlea), chandlerc (Chandler Carruth), Whitney (Whitney Tsang), bmahjour (Bardia Mahjour) Reviewed By: asbirlea (Alina Sbirlea) Subscribers: hiraditya (Aditya Kumar), llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D83869	2020-07-17 14:02:14 +00:00
Sam Parker	ed48e6fa65	[NFC][ARM] Add SimplifyCFG test	2020-07-17 14:07:40 +01:00
Anna Welker	23c9534515	[LV] Enable the LoopVectorizer to create pointer inductions This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable, which produces far fewer instructions than the previous approach of creating scalar getelementptrs and gluing them together into a vector. Differential Revision: https://reviews.llvm.org/D81267	2020-07-17 13:35:07 +01:00
Max Kazantsev	df6e185e8f	[InstCombine][Test] Test for fix of replacing select with Phis when branch has the same labels An additional test that allows to check the correctness of handling the case of the same branch labels in the dominator when trying to replace select with phi-node. Patch By: Kirill Polushin Differential Revision: https://reviews.llvm.org/D84006 Reviewed By: mkazantsev	2020-07-17 17:16:28 +07:00
Juneyoung Lee	582901d0b5	[ValueTracking] Let isGuaranteedNotToBeUndefOrPoison consider noundef This patch adds support for noundef arguments. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D83752	2020-07-17 12:53:08 +09:00
Juneyoung Lee	cd4953246b	Add a test for D83752	2020-07-17 12:50:40 +09:00
Jon Roelofs	a0537fc35f	[SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build SimplifyCFG was incorrectly reporting to the pass manager that it had not made changes after folding away a PHI. This is detected in the EXPENSIVE_CHECKS build when the function's hash changes. Differential Revision: https://reviews.llvm.org/D83985	2020-07-16 15:34:41 -06:00
Roman Lebedev	b636e7d1fc	[NFC][PhaseOrdering] Add a test demonstrating pitfalls of common code hoisting on loop rotation Depending on the -rotation-max-header-size=?, hoisting common code early makes loop rotation impossible.	2020-07-16 23:53:26 +03:00
Mircea Trofin	9870f77441	[llvm] Moved InlineSizeEstimatorAnalysis test to .ll Summary: Following guidance in https://llvm.org/docs/TestingGuide.html#testing-analysis Reviewers: mehdi_amini Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83918	2020-07-16 12:25:16 -07:00
Eric Christopher	7bfaa40086	Temporarily Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions" due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753. An SROA change soon may obviate some of these problems. This reverts commit `8d09f20798`.	2020-07-16 11:54:04 -07:00
Matt Arsenault	0347039a6e	ValueTracking: Fix isKnownNonZero for non-0 null pointers for byval The IR doesn't have a proper concept of invalid pointers, and "null" constants are just all zeros (though it really needs one). I think it's not possible to break this for AMDGPU due to the copy semantics of byval. If you have an original stack object at 0, the byval copy will be placed above it so I don't think it's really possible to hit a 0 address.	2020-07-16 13:50:49 -04:00
Florian Hahn	037c812191	[SCCP] Add test cases for adding !range to call-sites.	2020-07-16 15:34:58 +01:00
Max Kazantsev	989ee11df6	[Test] Add test that shows how SimplifyCFG may insert redundant Phi It happens when a block cannot be threaded because of a convergent function.	2020-07-16 16:23:11 +07:00
Max Kazantsev	90798e09e2	Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG" This reverts commit `b893822e32`. + Clang test fixes + Insertion point fix for landing pads	2020-07-16 16:09:08 +07:00
Max Kazantsev	b893822e32	Revert "[InstCombine] Simplify boolean Phis with const inputs using CFG" This reverts commit `00472067c3`. Need to fix failing clang tests.	2020-07-16 12:58:39 +07:00
Max Kazantsev	00472067c3	[InstCombine] Simplify boolean Phis with const inputs using CFG This patch adds simplification for pattern: ``` if (cond) / \ ... ... \ / p = phi [true] [false] ... br p, succ_1, succ_2 ``` If we can prove that top block's branches dominate respective inputs of a block that has a Phi with constant inputs, we can use the branch condition (maybe inverted) instead of Phi. This will make proofs of implication for further jump threading more transparent. Differential Revision: https://reviews.llvm.org/D81375 Reviewed By: xbolva00	2020-07-16 12:06:10 +07:00
Craig Topper	00f3579aea	Revert "[InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms" and subsequent patches This reverts most of the following patches due to reports of miscompiles. I've left the added test cases with comments updated to be FIXMEs. `1cf6f210a2` [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison. `469da663f2` [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison `122b0640fc` [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison `ac0af12ed2` [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison `9b1e95329a` [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms	2020-07-15 22:02:33 -07:00
George Rokos	911fcf382f	Fix lit test related to declare mapper patch D67833.	2020-07-15 20:31:36 -07:00
Hongtao Yu	f3731d34fa	[LoopUnroll] Update branch weight for remainder loop Unrolling a loop with compile-time unknown trip count results in a remainder loop. The remainder loop executes the remaining iterations of the original loop when the original trip count is not a multiple of the unroll factor. For better profile counts maintenance throughout the optimization pipeline, I'm assigning an artificial weight to the latch branch of the remainder loop. A remainder loop runs up to as many times as the unroll factor subtracted by 1. Therefore I'm assigning the maximum possible trip count as the back edge weight. This should be more accurate than the default non-profile weight, which assumes the back edge runs much more frequently than the exit edge. Differential Revision: https://reviews.llvm.org/D83187	2020-07-15 12:33:29 -07:00
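The bound that the commit above relies on (a remainder loop runs at most unroll factor minus 1 times) can be seen in a hand-unrolled loop. This is a source-level sketch under assumptions: the unroll factor of 4 and the names `sumUnrolled`, `SumResult`, and `remainderBoundHolds` are hypothetical, and the sketch does not model branch weights themselves.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t UnrollFactor = 4; // assumed factor for illustration

struct SumResult {
  long long sum;
  std::size_t remainderIterations;
};

// Sums a vector with a main loop unrolled by UnrollFactor and a remainder
// loop that handles the leftover iterations when the trip count is not a
// multiple of the unroll factor.
SumResult sumUnrolled(const std::vector<int> &v) {
  const std::size_t n = v.size();
  long long sum = 0;
  std::size_t i = 0;
  for (; i + UnrollFactor <= n; i += UnrollFactor) {
    sum += v[i];
    sum += v[i + 1];
    sum += v[i + 2];
    sum += v[i + 3];
  }
  std::size_t remainderIterations = 0;
  for (; i < n; ++i) { // remainder loop: runs n % UnrollFactor times
    sum += v[i];
    ++remainderIterations;
  }
  return {sum, remainderIterations};
}

// Checks, for every trip count below maxN, that the remainder loop runs
// exactly n % UnrollFactor times, which is never more than UnrollFactor - 1.
bool remainderBoundHolds(std::size_t maxN) {
  for (std::size_t n = 0; n < maxN; ++n) {
    const std::vector<int> v(n, 1);
    const SumResult r = sumUnrolled(v);
    if (r.sum != static_cast<long long>(n) ||
        r.remainderIterations != n % UnrollFactor ||
        r.remainderIterations > UnrollFactor - 1)
      return false;
  }
  return true;
}
```

Because the trip count is unknown at compile time, the remainder count is only bounded, not known, which is why the commit describes the assigned weight as artificial.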
Hiroshi Yamauchi	30c382a7c6	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality. Differential Revision: https://reviews.llvm.org/D83329	2020-07-15 11:49:36 -07:00
Sanjay Patel	d8b268680d	[InstCombine] prevent infinite looping in or-icmp fold (PR46712) I'm not sure if the test is truly minimal, but we need to induce a situation where a value becomes a constant but is not immediately folded before getting to the 'or' transform.	2020-07-15 14:12:12 -04:00
Sanjay Patel	efc30e591b	[InstCombine] update datalayout in test file; NFC We need to specify legal integer widths to trigger PR46712, so add those here. This doesn't appear to affect any existing tests, and it's not clear why a datalayout would not include any legal integer widths. While here, change some variable names that include 'tmp' to avoid warnings from the auto-generating script for CHECK lines.	2020-07-15 14:12:12 -04:00
Hiroshi Yamauchi	4a539faf74	[PGO] Extend the value profile buckets for mem op sizes. Extend the memop value profile buckets to be more flexible (could accommodate a mix of individual values and ranges) and to cover more value ranges (from 11 to 22 buckets). Disabled behind a flag (to be enabled separately) and the existing code to be removed later.	2020-07-15 10:26:15 -07:00
Arthur Eubanks	f413b53a67	[NPM][IVUsers] Rename ivusers -> iv-users LPM passes were named iv-users, which seems nicer than ivusers. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D83803	2020-07-15 09:38:21 -07:00
David Green	24cd66d219	[HardwareLoops] Add sibling loop test. This missed being part of `9e03547cab`.	2020-07-15 16:36:17 +01:00
Tim Northover	37b96d51d0	CodeGenPrep: remove AssertingVH references before deleting dead instructions. CodeGenPrepare keeps fairly close track of various instructions it's seen, particularly GEPs, in maps and vectors. However, sometimes those instructions become dead and get removed while it's still executing. This triggers AssertingVH references to them in an asserts build and could lead to miscompiles in a release build (I've only seen a later segfault though). So this patch adds a callback to RecursivelyDeleteTriviallyDeadInstructions which can make sure the instruction about to be deleted is removed from CodeGenPrepare's data structures.	2020-07-15 15:19:21 +01:00
John Brawn	20854d85e1	[DSE,MSSA] Recognise init_trampoline in getLocForWriteEx This fixes an instance where MemorySSA-using Dead Store Elimination is failing to do a transformation that the non-MemorySSA-using version does. Differential Revision: https://reviews.llvm.org/D83783	2020-07-15 12:18:58 +01:00
Tim Northover	5165b2b5fd	AArch64+ARM: make LLVM consider system registers volatile. Some of the system registers readable on AArch64 and ARM platforms return different values with each read (for example a timer counter), these shouldn't be hoisted outside loops or otherwise interfered with, but the normal @llvm.read_register intrinsic is only considered to read memory. This introduces a separate @llvm.read_volatile_register intrinsic and maps all system-registers on ARM platforms to use it for the __builtin_arm_rsr calls. Registers declared with asm("r9") or similar are unaffected.	2020-07-15 09:47:36 +01:00
Giorgis Georgakoudis	7f680be593	[OpenMPOpt][NFC] Update checks for parallel_deletion test	2020-07-14 23:36:33 -07:00
Luofan Chen	6db99d18b6	Revert "[Attributor] Track AA dependency using dependency graph" This reverts commit `8df7af560a`.	2020-07-15 11:48:08 +08:00
Johannes Doerfert	64d99a1d04	[CallGraph] Update callback call sites in RefreshCallGraph Since D82572, we keep "reference" edges for callback call sites. While not strictly necessary they can improve the traversal order. However, we did not update them properly in case a pass removed the callback call site which caused a verification error (PR46687). With this patch we update these reference edges properly during the invocation of `CallGraphSCCPass::RefreshCallGraph` in non-checking mode. Reviewed By: sdmitriev Differential Revision: https://reviews.llvm.org/D83718	2020-07-14 22:33:57 -05:00
Luofan Chen	8df7af560a	[Attributor] Track AA dependency using dependency graph Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78861	2020-07-15 10:40:21 +08:00
Luofan Chen	e21323a1e9	Revert "[Attributor] [WIP] Track AA dependency using dependency graph" This reverts commit `6b78ed6070`.	2020-07-15 10:33:55 +08:00
Luofan Chen	6b78ed6070	[Attributor] [WIP] Track AA dependency using dependency graph Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78861	2020-07-15 10:21:49 +08:00
Christopher Tetreault	9c87c55805	[SVE] Make cstfp_pred_ty and cst_pred_ty work with scalable splats Reviewers: efriedma, lebedev.ri, fhahn, c-rhodes, david-arm Reviewed By: efriedma, david-arm Subscribers: tschuett, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83001	2020-07-14 14:20:39 -07:00
Tyker	0257ba581c	Fix tests after `16f777f421`	2020-07-14 22:52:26 +02:00
Tyker	16f777f421	[NFC] Add debug and stat counters to assume queries and assume builder Summary: Add debug and stats counters to assume queries and the assume builder. Here are the stats collected on a build of check-llvm + check-clang: "assume-builder.NumAssumeBuilt": 2720879, "assume-builder.NumAssumesMerged": 761396, "assume-builder.NumAssumesRemoved": 1576212, "assume-builder.NumBundlesInAssumes": 6518809, "assume-queries.NumAssumeQueries": 85566380, "assume-queries.NumUsefullAssumeQueries": 2727360. The NumUsefullAssumeQueries stat is actually pessimistic, because in a few places queries ask to keep providing information to try to get better information, and this isn't counted as a useful query even though it can be useful. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83506	2020-07-14 21:49:14 +02:00
Teresa Johnson	6014c46c80	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `80d0a137a5`, and the follow on fix in `873c0d0786`, with a new fix for test failures after a 2-stage clang bootstrap, and a more robust fix for the Chromium build failure that an earlier version partially fixed. See also discussion on D75201. Reviewers: evgeny777 Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73242	2020-07-14 12:16:57 -07:00
Sjoerd Meijer	2b3c505d0f	[Matrix] Intrinsic descriptions This changes the matrix load/store intrinsic definitions to load/store from/to a pointer, and not from/to a pointer to a vector, as discussed in D83477. This also includes the recommit of "[Matrix] Tighten LangRef definitions and Verifier checks" which adds improved language reference descriptions of the matrix intrinsics and verifier checks. Differential Revision: https://reviews.llvm.org/D83785	2020-07-14 19:58:16 +01:00
Sanjay Patel	e6c016420c	[ValueTracking] fix library to intrinsic mapping to respect 'nobuiltin' attribute This is another problem raised in: http://bugs.llvm.org/PR46627	2020-07-14 10:04:24 -04:00
Sanjay Patel	9300de4d1c	[InstSimplify] add test with nobuiltin attribute (PR46627); NFC	2020-07-14 10:04:24 -04:00
serge-sans-paille	1cd1c1d62e	Revert "[SCEV][IndVarSimplify] insert point should not be block front." This reverts commit `f1efb8bb4b`. Reverted because it doesn't correctly update the pass return status, see http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/9441/steps/test-check-all/logs/FAIL%3A%20LLVM%3A%3Awiden-i32-i8ptr.ll	2020-07-14 14:24:26 +02:00
Sanjay Patel	34d35d4a42	[ValueTracking] fix miscompile in maxnum case of cannotBeOrderedLessThanZeroImpl (PR46627) A miscompile with -0.0 is shown in: http://bugs.llvm.org/PR46627 This is because maxnum(-0.0, +0.0) does not specify a fixed result: http://llvm.org/docs/LangRef.html#llvm-maxnum-intrinsic So we need to tighten the constraints for when it is ok to say the result of maxnum is positive (including +0.0). Differential Revision: https://reviews.llvm.org/D83601	2020-07-14 08:08:09 -04:00
Sanjay Patel	9cc669d22d	[InstCombine][InstSimplify] add tests for sign of maxnum; NFC More coverage for D83601.	2020-07-14 08:08:09 -04:00
Sam Parker	a5405a2f05	[NFC][ARM] Add SimplifyCFG tests	2020-07-14 11:10:11 +01:00
Sjoerd Meijer	959eaa50d6	[ARM][MVE] Only tail-fold integer add reductions If a vector body has live-out values, it is probably a reduction, which needs a final reduction step after the loop. MVE has a VADDV instruction to reduce integer vectors, but doesn't have an equivalent one for float vectors. A live-out value that is not recognised as a reduction later in the optimisation pipeline will result in the tail-predicated loop being reverted to a non-predicated loop, which is very expensive, i.e. it has a significant performance impact. That is what we hope to avoid by fine-tuning the ARM TTI hook preferPredicateOverEpilogue implementation. Differential Revision: https://reviews.llvm.org/D82953	2020-07-14 10:15:07 +01:00
Jameson Nash	2c7a07b59d	[GVN] teach ConstantFolding correct handling of non-integral addrspace casts Here we teach the ConstantFolding analysis pass that it is not legal to replace a load of a bitcast constant (having a non-integral addrspace) with a bitcast of the value of that constant (with a different non-integral addrspace). But also teach it that certain bit patterns are always known and convertable (a fact it already uses elsewhere). This required us to also fix a globalopt test, since, after this change, LLVM is able to realize that the test actually is a valid transform (NULL is always a known bit-pattern) and so it doesn't need to emit the failure remarks for it. Also simplify some of the negative tests for transforms by avoiding a type change in their bitcast, and add positive versions of the same tests, to show that they otherwise should work. Differential Revision: https://reviews.llvm.org/D59730	2020-07-13 21:44:17 -04:00
Jameson Nash	19f01a4847	[GVN] add early exit to ConstantFoldLoadThroughBitcast [NFC] And adds some additional test coverage to ensure later commits don't introduce regressions. Differential Revision: https://reviews.llvm.org/D59730	2020-07-13 21:44:17 -04:00
Mircea Trofin	6b109f2f05	[llvm][NFC] Removed unused CHECKs in a ml test The CHECKs are now in Inputs/test-module.ll	2020-07-13 16:59:14 -07:00
Mircea Trofin	73f02a61df	[llvm][NFC] ML InlineAdvisor: Factored CHECKs in common test The CHECKs are going to be shared with the development mode test	2020-07-13 16:31:07 -07:00
Tyker	8d09f20798	[AssumeBundles] Use operand bundles to encode alignment assumptions Summary: NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Complementary to the assumption outliner prototype in D71692, this patch shows how we could simplify the code emitted for an alignment assumption. The generated code is smaller, less fragile, and it makes it easier to recognize the additional use as an "assumption use". As mentioned in D71692 and on the mailing list, we could adopt this scheme, and similar schemes for other patterns, without adopting the assumption outlining. Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71739	2020-07-14 01:05:58 +02:00
Gui Andrade	bfa3b627c6	[InstCombine] Erase attribute lists for simplified libcalls Currently, a transformation like pow(2.0, x) -> exp2(x) copies the pow attribute list verbatim and applies it to exp2. This works out fine when the attribute list is empty, but when it isn't, clang may error due to the mismatch. The source function and destination don't necessarily have anything to do with one another, attribute-wise. So it makes sense to remove the attribute lists (this is similar to what IPO does in this situation). This was discovered after implementing the `noundef` param attribute. Differential Revision: https://reviews.llvm.org/D82820	2020-07-13 22:32:33 +00:00
Vedant Kumar	3d52b1e81b	Revert "[InstCombine] Drop debug loc in TryToSinkInstruction (reland)" This reverts commit `9649c2095f`. See discussion on the llvm-commits thread: if it's OK to preserve the location when sinking a call, it's probably OK to always preserve the location.	2020-07-13 15:17:07 -07:00
Nikita Popov	353fa4403a	[PredicateInfo] Place predicate info after assume Place the ssa.copy instructions for assumes after the assume, instead of before it. Both options are valid, but placing them afterwards prevents assumes from being replaced with assume(true). This fixes https://bugs.llvm.org/show_bug.cgi?id=37541 in NewGVN and will avoid a similar issue in SCCP when we handle more predicate infos. Differential Revision: https://reviews.llvm.org/D83631	2020-07-13 21:10:11 +02:00
Nikita Popov	4b626dd949	[NewGVN] Separate passing assume tests (NFC) Result might not be exactly the same as under GVN, but all the desired transforms are made.	2020-07-13 21:07:03 +02:00
Nikita Popov	14f738b350	[NewGVN] Rename xfail tests (NFC) Add an -xfail.ll suffix to tests marked XFAIL, so these files can be split into passing and failing parts.	2020-07-13 21:07:03 +02:00
Sanne Wouda	7b84045565	[SLPVectorizer] handle vectorizeable library functions Teaches the SLPVectorizer to use vectorized library functions for non-intrinsic calls. This already worked for intrinsics that have vectorized library functions, thanks to D75878, but schedules with library functions with a vector variant were being rejected early. - assume that there are no load/store dependencies between lib functions with a vector variant; this would otherwise prevent the bundle from becoming "ready" - check during legalization that the vector variant can be used - fix-up where we previously assumed that a call would be an intrinsic Differential Revision: https://reviews.llvm.org/D82550	2020-07-13 15:28:46 +01:00
Sanne Wouda	e909f6bc48	Pre-commit tests Prepare to land D82550	2020-07-13 15:28:46 +01:00
Sjoerd Meijer	595270ae39	[ARM][MVE] Refactor option -disable-mve-tail-predication This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133	2020-07-13 13:40:33 +01:00
Max Kazantsev	e808cab824	[InstCombine] Improve select -> phi canonicalization: consider more blocks We can try to replace select with a Phi not in its parent block alone, but also in blocks of its arguments. We benefit from it when select's argument is a Phi. Differential Revision: https://reviews.llvm.org/D83284 Reviewed By: nikic	2020-07-13 11:40:32 +07:00
Shinji Okumura	c73f425f84	[Attributor] Add AAValueSimplifyCallSiteArgument::manifest Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D82861	2020-07-13 07:01:50 +09:00
Alexey Lapshin	0a01fc96e2	Revert "[TRE] allow TRE for non-capturing calls." This reverts commit `f7907e9d22`. That commit caused error on multi-stage build.	2020-07-13 00:39:48 +03:00
Sanjay Patel	4458973347	[InstCombine] fold mul of zext/sext bools to 'and' Similar to rG40fcc42: The base case only worked because we were relying on a poison-unsafe select transform; if that is fixed, we would regress on patterns like this. The extra use tests show that the select transform can't be applied consistently. So it may be a regression to have an extra instruction on 1 test, but that result was not created safely and does not happen reliably.	2020-07-12 15:56:26 -04:00
Ayal Zaks	82a5157ff1	[LV] Fixing versioning-for-unit-stride of loops with small trip count This patch fixes D81345 and PR46652. If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis still generates runtime checks for unit strides that will version the loop. In such cases, the loop vectorizer should either re-run the analysis or bail out of vectorizing the loop, as done prior to D81345. The latter is applied for now as the former requires refactoring. Differential Revision: https://reviews.llvm.org/D83470	2020-07-12 19:51:47 +03:00
Nikita Popov	d589372704	[SCCP] Extend nonnull metadata test (NFC)	2020-07-12 17:48:32 +02:00
Nikita Popov	6634aef71f	[SCCP] Add test for predicate info condition handling (NFC)	2020-07-12 10:13:10 +02:00
kuter	4dbe82eef3	[Attributor] Introduce attribute seed allow list.	2020-07-12 02:25:33 +03:00
Nikita Popov	6792069a3f	[NewGVN] Regenerate test checks (NFC)	2020-07-11 22:51:49 +02:00
Michael Liao	0b4cf802fa	[fix-irreducible] Skip unreachable predecessors. Summary: - Skip unreachable predecessors during header detection in SCC. Those unreachable blocks would be generated in the switch lowering pass in the corner cases or other frontends. Even though they could be removed through the CFG simplification, we should skip them during header detection. Reviewers: sameerds Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83562	2020-07-11 10:08:44 -04:00
Alexey Lapshin	f7907e9d22	[TRE] allow TRE for non-capturing calls. The current implementation of Tail Recursion Elimination has a very restricted pre-requisite: AllCallsAreTailCalls. i.e. it requires that no function call receives a pointer to local stack. Generally, function calls that receive a pointer to local stack but do not capture it - should not break TRE. This fix allows us to do TRE if it is proved that no pointer to the local stack is escaped. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D82085	2020-07-11 14:01:48 +03:00
Johannes Doerfert	5b0581aedc	[OpenMP] Replace function pointer uses in GPU state machine In non-SPMD mode we create a state machine like code to identify the parallel region the GPU worker threads should execute next. The identification uses the parallel region function pointer as that allows it to work even if the kernel (=target region) and the parallel region are in separate TUs. However, taking the address of a function comes with various downsides. With this patch we will identify the most common situation and replace the function pointer use with a dummy global symbol (for identification purposes only). That means, if the parallel region is only called from a single target region (or kernel), we do not use the function pointer of the parallel region to identify it but a new global symbol. Fixes PR46450. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83271	2020-07-11 01:44:00 -05:00
Johannes Doerfert	e8039ad4de	[OpenMP] Identify GPU kernels (aka. OpenMP target regions) We now identify GPU kernels, that is entry points into the GPU code. These kernels (can) correspond to OpenMP target regions. With this patch we identify and on request print them via remarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83269	2020-07-11 01:44:00 -05:00
sstefan1	b8235d2bd8	Reland "[OpenMPOpt] ICV Tracking" This reverts commit `1d542f0ca8`. `recollectUses()` is added to prevent looking at dead uses after Attributor run. This is the first and most basic ICV Tracking implementation. For this first version, we only support deduplication within the same BB. Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku, baziotis, lebedev.ri Differential Revision: https://reviews.llvm.org/D81788	2020-07-11 02:25:57 +02:00
Sanjay Patel	351f2b3c0a	[InstSimplify] add tests for maxnum (PR46627); NFC	2020-07-10 20:20:38 -04:00

15680 Commits