llvm-project

Commit Graph

Author	SHA1	Message	Date
Florian Hahn	3b0d30ffd3	[SCEVExpander] Name temporary instructions for LCSSA insertion (NFC).	2020-07-31 18:16:46 +01:00
Hongtao Yu	d23c1d6a8d	[AutoFDO] Avoid merging inlinee samples multiple times A function call can be replicated by optimizations like loop unroll and jump threading and the replicates end up sharing the sample nested callee profile. Therefore when it comes to merging samples for uninlined callees in the sample profile inliner, a callee profile can be merged multiple times which will cause an assert to fire. This change avoids merging same callee profile for duplicate callsites by filtering out callee profiles with a non-zero head sample count. Reviewed By: wenlei, wmi Differential Revision: https://reviews.llvm.org/D84997	2020-07-31 09:30:05 -07:00
Benjamin Kramer	c6f08b14d4	Hide some internal symbols. NFC.	2020-07-31 17:28:02 +02:00
Vitaly Buka	b0eb40ca39	[NFC] Remove unused GetUnderlyingObject parameter Depends on D84617. Differential Revision: https://reviews.llvm.org/D84621	2020-07-31 02:10:03 -07:00
Juneyoung Lee	ad48367722	[JumpThreading] Let SimplifyPartiallyRedundantLoad look into freeze This patch allows SimplifyPartiallyRedundantLoad to work when the branch condition was frozen. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84944	2020-07-31 15:28:24 +09:00
Max Kazantsev	8aaeee5fb6	[SimpleLoopUnswitch] Preserve make.implicit in non-trivial unswitch if legal We can preserve make.implicit metadata in the split block if it is guaranteed that after following the branch we always reach the block where processing of null case happens, which is equivalent to "initial condition must execute if the loop is entered". Differential Revision: https://reviews.llvm.org/D84925 Reviewed By: asbirlea	2020-07-31 11:38:43 +07:00
Max Kazantsev	d889e17eca	[SimpleLoopUnswitch] Drop make.implicit metadata in case of non-trivial unswitching Non-trivial unswitching simply moves terminator being unswitched from the loop up to the switch block. It also preserves all metadata that was there. It might not be a correct thing to do for `make.implicit` metadata. Consider the case: ``` for (...) { cond = // computed in loop if (cond) return X; if (p == null) throw_npe(); !make implicit } ``` Before the unswitching, if `p` is null and we reach this check, we are guaranteed to go to `throw_npe()` block. Now we unswitch on `p == null` condition: ``` if (p == null) !make implicit { for (...) { if (cond) return X; throw_npe() } } else { for (...) { if (cond) return X; } } ``` Now, following `true` branch of `p == null` does not always lead us to `throw_npe()` because the loop has side exit. Now, if we run ImplicitNullCheck pass on this code, it may end up making the unswitch condition implicit. This may lead us to turning normal path to `return X` into signal-throwing path, which is not efficient. Note that this does not happen during trivial unswitch: it guarantees that we do not have side exits before condition being unswitched. This patch fixes this situation by unconditional dropping of `make.implicit` metadata when we perform non-trivial unswitch. We could preserve it if we could prove that the condition always executes. This can be done as a follow-up. Differential Revision: https://reviews.llvm.org/D84916 Reviewed By: asbirlea	2020-07-31 11:33:02 +07:00
Wei Mi	836991d367	Fix a crash when the sample profile uses md5 and -sample-profile-merge-inlinee is enabled. When -sample-profile-merge-inlinee is enabled, new FunctionSamples may be created during profile merge without GUIDToFuncNameMap being initialized. That will occasionally cause compiler crash. The patch fixes it. Differential Revision: https://reviews.llvm.org/D84994	2020-07-30 21:21:06 -07:00
Vitaly Buka	89051ebace	[NFC] GetUnderlyingObject -> getUnderlyingObject I am going to touch them in the next patch anyway	2020-07-30 21:08:24 -07:00
Vitaly Buka	b256cb88a7	[ValueTracking] Remove AllocaForValue parameter findAllocaForValue uses AllocaForValue to cache resolved values. The function is used only to resolve arguments of lifetime intrinsic which usually are not far from allocas. So result reuse is likely unnoticeable. In followup patches I'd like to replace the function with GetUnderlyingObjects. Depends on D84616. Differential Revision: https://reviews.llvm.org/D84617	2020-07-30 18:48:34 -07:00
Vitaly Buka	61cab352e3	[NFC] Move findAllocaForValue into ValueTracking.h Differential Revision: https://reviews.llvm.org/D84616	2020-07-30 18:22:59 -07:00
kuterd	49def10e02	[Attributor] Add time trace support. This patch adds time trace functionality to have a better understanding of the analysis times. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84980	2020-07-31 03:08:50 +03:00
Nikita Popov	4c16eafe12	[SCCP] Remove dead switch cases based on range information Determine whether switch edges are feasible based on range information, and remove non-feasible edges later on. This does not try to determine whether the default edge is dead, as we'd have to determine that the range is fully covered by the cases for that. Another limitation here is that we don't remove dead cases that have the same successor as a live case. I'm not handling this because I wanted to keep the edge removal based on feasible edges only, rather than inspecting ranges again there -- this does not seem like a particularly useful case to handle. Differential Revision: https://reviews.llvm.org/D84270	2020-07-30 21:21:08 +02:00
Simon Pilgrim	4a161bd8b3	LoopUnroll.cpp - pass std::vector by const reference to needToInsertPhisForLCSSA helper. NFCI. Avoid an unnecessary pass by value.	2020-07-30 18:17:04 +01:00
Yuanfang Chen	555cf42f38	[NewPM][PassInstrument] Add PrintPass callback to StandardInstrumentations Problem: Right now, our "Running pass" is not accurate when passes are wrapped in an adaptor because the adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for an adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order. Solution: Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever an adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder) This patch moves debug logging for passes into a PassInstrument callback. We could be sure that all running passes are logged and in the correct order. This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want. The test fixes look messy. They include these changes: - Remove PassInstrumentationAnalysis - Remove PassAdaptor - If a PassAdaptor is for a real pass, the pass is added - Pass reorder (to the correct order), related to PassAdaptor - Add missing passes (due to Debuglogging not passed down) Reviewed By: asbirlea, aeubanks Differential Revision: https://reviews.llvm.org/D84774	2020-07-30 10:07:57 -07:00
Hiroshi Yamauchi	3d6f53018f	[PGO] Include the mem ops into the function hash. To avoid hash collisions when the only difference is in mem ops.	2020-07-30 09:26:20 -07:00
Simon Pilgrim	6316b0023e	Attributor.h - remove unnecessary includes. NFCI. Fix implicit cpp include dependencies.	2020-07-30 15:26:41 +01:00
David Green	1da0c47fa2	[LoopVectorizer] Don't create unused block masks for reductions. NFC This removes some unneeded block masks when we don't have any reductions. It should not have any effect on codegen as the values created are dead anyway. Differential Revision: https://reviews.llvm.org/D81415	2020-07-30 14:28:08 +01:00
Florian Hahn	59d6e814ce	Revert "[IPConstProp] Remove and move tests to SCCP." This reverts commit `e77624a3be`. Looks like some clang tests manually invoke -ipconstprop via opt.....	2020-07-30 13:06:54 +01:00
Florian Hahn	e77624a3be	[IPConstProp] Remove and move tests to SCCP. As far as I know, ipconstprop has not been used in years and ipsccp has been used instead. This has the potential for confusion and sometimes leads people to spend time finding & reporting bugs as well as updating it to work with the latest API changes. This patch moves the tests over to SCCP. There's one functional difference I am aware of: ipconstprop propagates for each call-site individually, so for functions that are called with different constant arguments it can sometimes produce better results than ipsccp (at much higher compile-time cost). But IPSCCP can be thought to do so as well for internal functions and as mentioned earlier, the pass seems unused in practice (and there are no plans on working towards enabling it anytime). Also discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143773.html Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84447	2020-07-30 12:36:27 +01:00
Simon Pilgrim	cc529285fd	VectorUtils.h - reduce unnecessary includes. NFC. Replace TargetLibraryInfo.h include with forward declaration and fix implicit dependencies. Reduce SmallSet.h include to SmallVector.h include.	2020-07-30 12:27:49 +01:00
Max Kazantsev	3678ad88a6	[NFC] Remove unused variable	2020-07-30 13:32:15 +07:00
Juneyoung Lee	111a02decd	[JumpThreading] Fold br(freeze(undef)) This patch makes JumpThreading fold br(freeze(undef)) if the freeze instruction is only used by the branch. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84818	2020-07-30 09:38:50 +09:00
Hiroshi Yamauchi	ae7589e1f1	Revert "[PGO] Include the mem ops into the function hash." This reverts commit `120e66b341`. Due to a buildbot failure.	2020-07-29 15:04:57 -07:00
Hiroshi Yamauchi	120e66b341	[PGO] Include the mem ops into the function hash. To avoid hash collisions when the only difference is in mem ops. Differential Revision: https://reviews.llvm.org/D84782	2020-07-29 13:59:40 -07:00
Florian Hahn	f75564ad4e	Reland "[SCEVExpander] Add option to preserve LCSSA directly." This reverts the revert commit `dc28675768`. It includes a fix for Polly, which uses SCEVExpander on IR that is not in LCSSA form. Set PreserveLCSSA = false in that case, to ensure we do not introduce LCSSA phis where there were none before.	2020-07-29 20:41:53 +01:00
Matt Morehouse	e2d0b44a7c	[DFSan] Add efficient fast16labels instrumentation mode. Adds the -fast-16-labels flag, which enables efficient instrumentation for DFSan when the user needs <=16 labels. The instrumentation eliminates most branches and most calls to __dfsan_union or __dfsan_union_load. Reviewed By: vitalybuka Differential Revision: https://reviews.llvm.org/D84371	2020-07-29 18:58:47 +00:00
Florian Hahn	dc28675768	Revert "[SCEVExpander] Add option to preserve LCSSA directly." This reverts commit `99166fd4fb`, because it breaks the polly builders. polly/test/Isl/CodeGen/invariant_load_escaping_second_scop.ll fails because an apparently unnecessary LCSSA phi node is introduced. Make the bots green again, while I take a closer look.	2020-07-29 19:19:04 +01:00
Arthur Eubanks	71d0a2b8a3	[DFSan][NewPM] Port DataFlowSanitizer to NewPM Reviewed By: ychen, morehouse Differential Revision: https://reviews.llvm.org/D84707	2020-07-29 10:19:15 -07:00
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprisingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike its friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprisingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. 
As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprisingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal when optimizing for code size, so one might want to tune the pipeline by enabling sinking/hoisting when optimizing for size. 
Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
David Sherwood	9ad7c980bb	[SVE] Don't consider scalable vector types in SLPVectorizerPass::vectorizeChainsInBlock In vectorizeChainsInBlock we try to collect chains of PHI nodes that have the same element type, but the code is relying upon the implicit conversion from TypeSize -> uint64_t. For now, I have modified the code to ignore PHI nodes with scalable types. Differential Revision: https://reviews.llvm.org/D83542	2020-07-29 16:29:19 +01:00
Florian Hahn	99166fd4fb	[SCEVExpander] Add option to preserve LCSSA directly. This patch teaches SCEVExpander to directly preserve LCSSA. As it is currently, SCEV does not look through PHI nodes in loops, as it might break LCSSA form. Once SCEVExpander can preserve LCSSA form, it should be safe for SCEV to look through PHIs. To preserve LCSSA form, this patch uses formLCSSAForInstructions on operands of newly created instructions, if the definition is inside a different loop than the new instruction. The final value we return from expandCodeFor may also need LCSSA phis, depending on the insert point. As no user for it exists there yet, create a temporary instruction at the insert point, which can be passed to formLCSSAForInstructions. This temporary instruction is removed after LCSSA construction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D71538	2020-07-29 15:07:37 +01:00
David Green	60280e9818	[Analysis] TTI: Add CastContextHint for getCastInstrCost Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts incorrect as the correct cost can vary greatly based on the context in which it's used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer - one for each of the types it can produce. Original patch by Pierre van Houtryve Differential Revision: https://reviews.llvm.org/D79162	2020-07-29 13:32:53 +01:00
Yevgeny Rouban	5d6cd61904	[LoopSimplifyCFG] Delete landing pads in dead exit blocks In addition to removing phi nodes this patch removes any landing pad that the dead exit block might have. Without this fix Verifier complains that a new switch instruction jumps to a block with a landing pad. Differential Revision: https://reviews.llvm.org/D84320	2020-07-29 18:36:51 +07:00
Johannes Doerfert	450dc09d69	[SROA][Mem2Reg] Use efficient droppable use API (after D83976) Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D84804	2020-07-28 17:41:01 -05:00
Sanjay Patel	f75cf240d6	[InstCombine] avoid crashing on vector constant expression (PR46872)	2020-07-28 15:02:36 -04:00
Juneyoung Lee	4c9af6d0e0	[JumpThreading] Add a basic support for freeze instruction This patch adds a basic support for freeze instruction to JumpThreading by making ComputeValueKnownInPredecessorsImpl look into its operand. Reviewed By: efriedma, nikic Differential Revision: https://reviews.llvm.org/D84598	2020-07-29 03:12:14 +09:00
Arthur Eubanks	2ca6c422d2	[FunctionAttrs] Rename functionattrs -> function-attrs To match NewPM pass name, and also for readability. Also rename rpo-functionattrs -> rpo-function-attrs while we're here. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84694	2020-07-28 09:09:13 -07:00
Jinsong Ji	d28f86723f	Re-land "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `bf544fa1c3`. Fixed the typo in PPCInstrInfo.cpp.	2020-07-28 14:00:11 +00:00
Luofan Chen	5ee07dc53f	[Attributor] Track AA dependency using dependency graph This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D78861	2020-07-28 18:02:49 +08:00
Roman Lebedev	e40315d2b4	[GVN] Rewrite IsValueFullyAvailableInBlock(): no recursion, less false-negatives While this doesn't appear to help with the perf issue being exposed by D84108, the function as-is is very weird, convoluted, and what's worse, recursive. There was no need for `SpeculativelyAvaliableAndUsedForSpeculation`, tri-state choice is enough. We don't even ever check for that state. The basic idea here is that we need to perform a depth-first traversal of the predecessors of the basic block in question, either finding a preexisting state for the block in a map, or inserting a "placeholder" `SpeculativelyAvaliable`. If we encounter an `Unavaliable` block, then we need to give up search, and back-propagate the `Unavaliable` state to each successor of said block, more specifically to each `SpeculativelyAvaliable` we've just created. However, if we have traversed entirety of the predecessors and have not encountered an `Unavaliable` block, then it must mean the value is fully available. We could update each inserted `SpeculativelyAvaliable` into an `Avaliable`, but we don't need to, as assertion exercises, because we can assume that if we see a `SpeculativelyAvaliable` entry, it is actually `Avaliable`, because during the time we've produced it, if we would have found that it has an `Unavaliable` predecessor, we would have updated its successors, including this block, into `Unavaliable` Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D84181	2020-07-28 10:19:28 +03:00
Wei Mi	a23f62343c	Supplement instr profile with sample profile. PGO profile is usually more precise than sample profile. However, PGO profile needs to be collected from loadtest and loadtest may not be representative enough to the production workload. Sample profile collected from production can be used as a supplement -- for functions cold in loadtest but warm/hot in production, we can scale up the related function in PGO profile if the function is warm or hot in sample profile. The implementation contains changes in compiler side and llvm-profdata side. Given an instr profile and a sample profile, for a function cold in PGO profile but warm/hot in sample profile, llvm-profdata will either mark all the counters in the profile to be -1 or scale up the max count in the function to be above hot threshold, depending on the zero counter ratio in the profile. The assumption is if there are too many counters being zero in the function profile, the profile is more likely to cause harm than good, then llvm-profdata will mark all the counters to be -1 indicating the function is hot but the profile is unaccountable. In compiler side, if a function profile with all -1 counters is seen, the function entry count will be set to be above hot threshold but its internal profile will be dropped. In the long run, it may be useful to let compiler support using PGO profile and sample profile at the same time, but that requires more careful design and more substantial changes to make two profiles work seamlessly. The patch here serves as a simple intermediate solution. Differential Revision: https://reviews.llvm.org/D81981	2020-07-27 20:17:40 -07:00
Arthur Eubanks	c37bb5e2a5	[DFSan] Remove unused DataFlowSanitizer vars Reviewed By: morehouse Differential Revision: https://reviews.llvm.org/D84704	2020-07-27 14:59:07 -07:00
Jinsong Ji	bf544fa1c3	Revert "[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support" This reverts commit `adffce7153`. This is breaking test-suite, revert while investigating.	2020-07-27 21:07:00 +00:00
Roman Lebedev	351d234d86	[OpenMPOpt] Most SCC's are uninteresting, don't waste time on them (up to 16x faster) Summary: This seems obvious in hindsight, but the result is surprising. I've measured compile-time of `-openmpopt` pass standalone on RawSpeed unity build, and while there is some OpenMP stuff, most is not OpenMP. But nonetheless the pass does a lot of costly preparations before ever trying to look for OpenMP stuff in SCC. Numbers (n=25): 0.094624s -> 0.005976s, an -93.68% improvement, or ~16x Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: yaxunl, hiraditya, guansong, llvm-commits, sstefan1 Tags: #llvm Differential Revision: https://reviews.llvm.org/D84689	2020-07-27 23:36:34 +03:00
Jinsong Ji	adffce7153	[PowerPC] Remove QPX/A2Q BGQ/BGP CNK support Per RFC http://lists.llvm.org/pipermail/llvm-dev/2020-April/141295.html no one is making use of QPX/A2Q/BGQ/BGP CNK anymore. This patch removes the support of QPX/A2Q in llvm, BGQ/BGP in clang, CNK support in openmp/polly. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D83915	2020-07-27 19:24:39 +00:00
Kazu Hirata	902cbcd59e	Use llvm::is_contained where appropriate (NFC) Summary: This patch replaces std::find with llvm::is_contained where appropriate. Reviewers: efriedma, nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, jvesely, nhaehnle, hiraditya, rogfer01, kerbowa, llvm-commits, vkmr Tags: #llvm Differential Revision: https://reviews.llvm.org/D84489	2020-07-27 10:20:44 -07:00
Roman Lebedev	1da9834557	[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857) SplitBlockPredecessors() can not split blocks that have such terminators, and in two other places we already ensure that we don't end up calling SplitBlockPredecessors() on such blocks. Do so in one more place. Fixes https://bugs.llvm.org/show_bug.cgi?id=46857	2020-07-27 15:39:03 +03:00
Nathan James	d127112724	[llvm][NFC] Silence unused variable warning by using isa over dyn_cast	2020-07-27 13:37:21 +01:00
Juneyoung Lee	e1eacf27c6	[InstCombine] Fold freeze into phi if one operand is not undef This patch adds folding freeze into phi if it has only one operand to target. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84601	2020-07-27 17:07:27 +09:00
Fangrui Song	fae221e7ad	[gcov] Simplify/speed up CFG hash calculation	2020-07-26 21:15:33 -07:00
Roman Lebedev	96d74530c0	[Reduce] Argument reduction: do deal with function declarations We can happily turn function definitions into declarations, thus obscuring their argument from being elided by this pass. I don't believe there is a good reason to just ignore declarations. likely even proper llvm intrinsics ones, at worst the input becomes uninteresting. The other question here is that all these transforms are all-or-nothing. In some cases, should we be treating each use separately? The main blocker here seemed to be that llvm::CloneFunctionInto() does `&OldFunc->front()`, which inserts a nullptr into a densemap, which is not happy about it and asserts.	2020-07-26 01:31:56 +03:00
Nikita Popov	632a89e866	[SCCP] Restore the change reporting as well Reapply `5db5b4bc43`.	2020-07-25 15:11:30 +02:00
Nikita Popov	ad16e71c95	Reapply [SCCP] Directly remove non-feasible edges Reapply with DTU update moved after CFG update, which is a requirement of the API. ----- Non-feasible control-flow edges are currently removed by replacing the branch condition with a constant and then calling ConstantFoldTerminator. This happens in a rather roundabout manner, by inspecting the users (effectively: predecessors) of unreachable blocks, and further complicated by the need to explicitly materialize the condition for "forced" edges. I would like to extend SCCP to discard switch conditions that are non-feasible based on range information, but this is incompatible with the current approach (as there is no single constant we could use.) Instead, this patch explicitly removes non-feasible edges. It currently only needs to handle the case where there is a single feasible edge. The llvm_unreachable() branch will need to be implemented for the aforementioned switch improvement. Differential Revision: https://reviews.llvm.org/D84264	2020-07-25 14:52:35 +02:00
Simon Pilgrim	b5e14d78f1	SimplifyLibCalls - remove unnecessary header and forward declaration. NFC. We include TargetLibraryInfo.h so don't need to forward declare it, and we don't need to include TargetLibraryInfo.h in SimplifyLibCalls.cpp as well.	2020-07-25 12:58:39 +01:00
Florian Hahn	3c1476d26c	[IPSCCP] Drop argmemonly after replacing pointer argument. This patch updates IPSCCP to drop argmemonly and inaccessiblemem_or_argmemonly if it replaces a pointer argument. Fixes PR46717. Reviewers: efriedma, davide, nikic, jdoerfert Reviewed By: efriedma, jdoerfert Differential Revision: https://reviews.llvm.org/D84432	2020-07-25 11:52:14 +01:00
Rong Xu	1dd39b1133	[PGO] Fix incorrect function entry count Function entry count might be zero after the profile counts reset and before reentry to the function. Zero profile entry count is very bad as the profile count from BFI will be wrong. A simple fix is to set the profile entry count to 1 if there are non-zero profile counts in this function. Differential Revision: https://reviews.llvm.org/D84378	2020-07-24 17:39:55 -07:00
Rong Xu	31bd15c562	[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction Skip profile count promotion if any of the ExitBlocks contains a ret instruction. This is to prevent dumping of incomplete profile -- if the loop is a long running loop and dump is called in the middle of the loop, the result profile is incomplete. ExitBlocks containing a ret instruction is an indication of a long running loop -- early exit to error handling code. Differential Revision: https://reviews.llvm.org/D84379	2020-07-24 17:38:31 -07:00
Rong Xu	5546c2ab42	Revert "[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction" This reverts commit `6fdc6f6c7d`.	2020-07-24 17:35:44 -07:00
Rong Xu	6fdc6f6c7d	[PGO][InstrProf] Do not promote count if the exit blocks contains ret instruction Skip profile count promotion if any of the ExitBlocks contains a ret instruction. This is to prevent dumping of incomplete profile -- if the loop is a long running loop and dump is called in the middle of the loop, the result profile is incomplete. ExitBlocks containing a ret instruction is an indication of a long running loop -- early exit to error handling code. Differential Revision: https://reviews.llvm.org/D84379	2020-07-24 17:13:58 -07:00
Johannes Doerfert	aa09db495a	[SROA] Teach promote to register about droppable instructions This is the second of two patches to address PR46753. We basically allow SROA to promote allocas that are used in droppable instructions, for now that means `llvm.assume`. The (transitive) uses are replaced by `undef` in the droppable instructions. See also D83976. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D83978	2020-07-24 15:15:39 -05:00
Johannes Doerfert	ce8928f2e4	[Mem2Reg] Teach promote to register about droppable instructions This is the first of two patches to address PR46753. We basically allow mem2reg to promote allocas that are used in droppable instructions, for now that means `llvm.assume`. The uses of the alloca (or a bitcast or zero offset GEP from there) are replaced by `undef` in the droppable instructions. Reviewed By: Tyker Differential Revision: https://reviews.llvm.org/D83976	2020-07-24 15:15:38 -05:00
Johannes Doerfert	ce2d69b557	[SROA][Mem2Reg] Do not crash on alloca + addrspacecast SROA knows that it can look through addrspacecast but PromoteMemoryToRegister did not handle them. This caused an assertion error for the test case, exposed while running `Transforms/PhaseOrdering/inlining-alignment-assumptions.ll` with D83978 applied. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D84085	2020-07-24 15:15:38 -05:00
Gui Andrade	1e77b3af12	[MSAN] Allow inserting array checks Flattens arrays by ORing together all their elements. Differential Revision: https://reviews.llvm.org/D84446	2020-07-24 20:12:58 +00:00
Simon Pilgrim	0128b9505c	Revert rG5dd566b7c7b78bd- "PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI." This reverts commit `5dd566b7c7`. Causing some buildbot failures that I'm not seeing on MSVC builds.	2020-07-24 13:02:33 +01:00
Simon Pilgrim	5dd566b7c7	PassManager.h - remove unnecessary Function.h/Module.h includes. NFCI. PassManager.h is one of the top headers in the ClangBuildAnalyzer frontend worst offenders list. This exposes a large number of implicit dependencies on various forward declarations/includes in other headers that need addressing.	2020-07-24 12:40:50 +01:00
Fangrui Song	4637daa990	Revert D84264 "[SCCP] Directly remove non-feasible edges" & `5db5b4bc43` It breaks stage-2 build. Clang crashed when compiling llvm/lib/Target/Hexagon/HexagonFrameLowering.cpp llvm/Support/GenericDomTree.h eraseNode: Node is not a leaf node	2020-07-23 17:51:48 -07:00
Sidharth Baveja	38a8217931	[Loop Fusion] Integrate Loop Peeling into Loop Fusion (re-land after fixing ASAN build failures) This patch adds the ability to peel off iterations of the first loop in loop fusion. This can allow for both loops to have the same trip count, making it legal for them to be fused together. Here is a simple scenario peeling can be used in loop fusion: for (i = 0; i < 10; ++i) a[i] = a[i] + 3; for (j = 1; j < 10; ++j) b[j] = b[j] + 5; Here we can make use of peeling, and then fuse the two loops together. We can peel off the 0th iteration of the loop i, and then combine loop i and j for i = 1 to 10. a[0] = a[0] + 3; for (i = 1; i < 10; ++i) { a[i] = a[i] + 3; b[i] = b[i] + 5; } Currently peeling with loop fusion is only supported for loops with constant trip counts and a single exit point. Both unguarded and guarded loops are supported. Reviewed By: bmahjour (Bardia Mahjour), MaskRay (Fangrui Song) Differential Revision: https://reviews.llvm.org/D82927	2020-07-23 21:02:04 +00:00
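The peeling scenario in the message above can be checked with a small model. This is only an illustrative sketch in Python, not the LoopFuse.cpp implementation; the function names `run_separately` and `run_peeled_and_fused` are hypothetical and simply mirror the two code shapes quoted in the commit message.

```python
def run_separately(a, b):
    """Original shape: two loops with different trip counts (10 and 9)."""
    a, b = list(a), list(b)
    for i in range(0, 10):
        a[i] = a[i] + 3
    for j in range(1, 10):
        b[j] = b[j] + 5
    return a, b


def run_peeled_and_fused(a, b):
    """Shape after peeling iteration 0 of the first loop and fusing the rest."""
    a, b = list(a), list(b)
    a[0] = a[0] + 3  # peeled 0th iteration of loop i
    for i in range(1, 10):  # both bodies now share one trip count
        a[i] = a[i] + 3
        b[i] = b[i] + 5
    return a, b


# Arbitrary sample inputs chosen for the illustration.
a0 = list(range(10))
b0 = [x * x for x in range(10)]
```

Both functions copy their inputs, so they can be compared directly on the same starting arrays.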
Nikita Popov	5db5b4bc43	[SCCP] Add missing change reporting Forgot to actually use the return value of the function.	2020-07-23 20:58:29 +02:00
Nikita Popov	9394c3ec88	[SCCP] Directly remove non-feasible edges Non-feasible control-flow edges are currently removed by replacing the branch condition with a constant and then calling ConstantFoldTerminator. This happens in a rather roundabout manner, by inspecting the users (effectively: predecessors) of unreachable blocks, and further complicated by the need to explicitly materialize the condition for "forced" edges. I would like to extend SCCP to discard switch conditions that are non-feasible based on range information, but this is incompatible with the current approach (as there is no single constant we could use.) Instead, this patch explicitly removes non-feasible edges. It currently only needs to handle the case where there is a single feasible edge. The llvm_unreachable() branch will need to be implemented for the aforementioned switch improvement. Differential Revision: https://reviews.llvm.org/D84264	2020-07-23 20:32:57 +02:00
Nikita Popov	def48b0e88	[PredicateInfo][SCCP] Remove assertion (PR46814) As long as RenamedOp is not guaranteed to be accurate, we cannot assert here and should just return false. This was already done for the other conditions in this function. Fixes https://bugs.llvm.org/show_bug.cgi?id=46814.	2020-07-23 19:36:51 +02:00
Gui Andrade	3285b24249	[MSAN] Allow emitting checks for struct types Differential Revision: https://reviews.llvm.org/D82680	2020-07-23 16:50:59 +00:00
Gui Andrade	0025d52c0f	[MSAN] Never allow checking calls to __sanitizer_unaligned_{load,store} These functions expect the caller to always pass shadows over TLS. Differential Revision: https://reviews.llvm.org/D84351	2020-07-23 16:42:59 +00:00
Simon Pilgrim	86fd5be6fd	AggressiveInstCombine.h - remove unused includes. NFC.	2020-07-23 16:20:13 +01:00
Braedy Kuzma	24e41a34fe	[Matrix] Add asserts for mismatched element types. This patch clarifies the failing point of having input or output vectors of differing types. Before, lowering would fail elsewhere (e.g. in `fmul` creation) which may not have been immediately clear. As a side effect, the `getElementType` and `getVectorTy` functions required the `const` qualifier to be added. Reviewers: fhahn Reviewed By: fhahn Differential Revision: https://reviews.llvm.org/D84374	2020-07-23 16:02:48 +01:00
Florian Hahn	ecd3f853a8	[SCEVExpander] Use IRBuilderCallbackInserter to call rememberInstruction. Currently there are plenty of instructions that SCEVExpander creates but does not track as created. IRBuilder allows specifying a callback whenever an instruction is inserted. Use this to call rememberInstruction automatically for each created instruction. There are still a few rememberInstruction calls remaining, because in some cases Inst::Create functions are used to construct instructions. Suggested by @lebedev.ri in D75980. Reviewers: mkazantsev, reames, sanjoy.google, lebedev.ri Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D84326	2020-07-23 14:25:28 +01:00
Shinji Okumura	697c6d8907	[Attributor] Cache query results for isPotentiallyReachable in AAReachability Summary: This is the next patch of D76210 (https://reviews.llvm.org/D76210). This patch made a map in `InformationCache` for caching results. Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: hiraditya, uenoku, kuter, bbn, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83246	2020-07-23 20:49:28 +09:00
Simon Pilgrim	5b20c14525	ValueProfileCollector.h - remove unnecessary includes. NFC.	2020-07-23 12:33:13 +01:00
Hiroshi Yamauchi	557db6f8aa	Reland D84057 [PGO][PGSO] Remove a temporary flag used for gradual rollout. The revert was a misfire. Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used for the staged rollout. This is a cleanup (NFC) as it's now false by default. Differential Revision: https://reviews.llvm.org/D84057	2020-07-22 20:57:25 -07:00
Fangrui Song	27650ec554	Revert D81682 "[PGO] Extend the value profile buckets for mem op sizes." This reverts commit `4a539faf74`. There is a __llvm_profile_instrument_range related crash in PGO-instrumented clang: ``` (gdb) bt llvm::ConstantRange const&, llvm::APInt const&, unsigned int, bool) () llvm::ScalarEvolution::getRangeForAffineAR(llvm::SCEV const, llvm::SCEV const, llvm::SCEV const*, unsigned int) () ``` (The body of __llvm_profile_instrument_range is inlined, so we can only find __llvm_profile_instrument_target in the trace) ``` 23│ 0x000055555dba0961 <+65>: nopw %cs:0x0(%rax,%rax,1) 24│ 0x000055555dba096b <+75>: nopl 0x0(%rax,%rax,1) 25│ 0x000055555dba0970 <+80>: mov %rsi,%rbx 26│ 0x000055555dba0973 <+83>: mov 0x8(%rsi),%rsi # %rsi=-1 -> SIGSEGV 27│ 0x000055555dba0977 <+87>: cmp %r15,(%rbx) 28│ 0x000055555dba097a <+90>: je 0x55555dba0a76 <__llvm_profile_instrument_target+342> ```	2020-07-22 16:08:25 -07:00
Rong Xu	50da55a585	[PGO] Supporting code for always instrumenting entry block This patch includes the supporting code that enables always instrumenting the function entry block by default. This patch will NOT change the default behavior. It adds a variant bit in the profile version, adds new directives in text profile format, and changes llvm-profdata tool accordingly. This patch is a split of D83024 (https://reviews.llvm.org/D83024) Many test changes from D83024 are also included. Differential Revision: https://reviews.llvm.org/D84261	2020-07-22 15:01:53 -07:00
Fangrui Song	dbdda8232a	Revert D84057 "[PGO][PGSO] Remove a temporary flag used for gradual rollout." This reverts commit `e64afefdf8`. It caused a PGO bootstrapped clang to crash on many source files. `__llvm_profile_instrument_range` seems to trigger a null pointer dereference. Call stack: __llvm_profile_instrument_range llvm::APInt::udiv(llvm::APInt const&) const getRangeForAffineARHelper	2020-07-22 14:28:28 -07:00
Fangrui Song	5724c8ba29	Temporarily revert D83903 "[PGO] Enable the extended value profile buckets for mem op sizes." `__llvm_profile_instrument_memop` transitively calls calloc, thus calloc should not be instrumented. I saw a `calloc -> __llvm_profile_instrument_memop -> calloc -> __llvm_profile_instrument_memop -> ...` infinite loop leading to stack overflow when the malloc implementation (e.g. tcmalloc) is built and instrumented along with the application. We should figure out the library calls which may be instrumented and disable their instrumentation before rolling out this change. Reviewed By: yamauchi Differential Revision: https://reviews.llvm.org/D84358	2020-07-22 13:12:19 -07:00
Gui Andrade	33d239513c	[MSAN] Instrument libatomic load/store calls These calls are neither intercepted by compiler-rt nor is libatomic.a naturally instrumented. This patch uses the existing libcall mechanism to detect a call to atomic_load or atomic_store, and instruments them much like the preexisting instrumentation for atomics. Calls to _load are modified to have at least Acquire ordering, and calls to _store at least Release ordering. Because this needs to be converted at runtime, msan injects a LUT (implemented as a vector with extractelement). Differential Revision: https://reviews.llvm.org/D83337	2020-07-22 16:45:06 +00:00
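The "at least Acquire" and "at least Release" strengthening described above amounts to a table lookup on the runtime ordering argument. The following is a hedged Python sketch, not the MSan source: the table names `ADD_ACQUIRE` and `ADD_RELEASE` and their exact contents are assumptions made for illustration. Only the numeric values 0 to 5 follow the `__ATOMIC_*` constants that `__atomic_load` and `__atomic_store` take.

```python
# Ordering values as passed at runtime, matching the __ATOMIC_* constants.
RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST = range(6)

# Assumed lookup tables (hypothetical contents): index by the caller's
# ordering to get an ordering that is at least as strong and also carries
# the required acquire or release semantics.
ADD_ACQUIRE = [ACQUIRE, ACQUIRE, ACQUIRE, ACQ_REL, ACQ_REL, SEQ_CST]
ADD_RELEASE = [RELEASE, ACQ_REL, ACQ_REL, RELEASE, ACQ_REL, SEQ_CST]


def strengthen_load(order):
    """Ordering to use for an instrumented _load call."""
    return ADD_ACQUIRE[order]


def strengthen_store(order):
    """Ordering to use for an instrumented _store call."""
    return ADD_RELEASE[order]
```

A table is a natural fit because the ordering is an ordinary runtime integer argument of the libcall, so it cannot be rewritten at compile time.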
Sebastian Neubauer	2a6c871596	[InstCombine] Move target-specific inst combining For a long time, the InstCombine pass handled target specific intrinsics. Having target specific code in general passes was noted as an area for improvement for a long time. D81728 moves most target specific code out of the InstCombine pass. Applying the target specific combinations in an extra pass would probably result in inferior optimizations compared to the current fixed-point iteration, therefore the InstCombine pass resorts to newly introduced functions in the TargetTransformInfo when it encounters unknown intrinsics. The patch should not have any effect on generated code (under the assumption that code never uses intrinsics from a foreign target). This introduces three new functions: TargetTransformInfo::instCombineIntrinsic TargetTransformInfo::simplifyDemandedUseBitsIntrinsic TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic A few target specific parts are left in the InstCombine folder, where it makes sense to share code. The largest left-over part in InstCombineCalls.cpp is the code shared between arm and aarch64. This allows to move about 3000 lines out from InstCombine to the targets. Differential Revision: https://reviews.llvm.org/D81728	2020-07-22 15:59:49 +02:00
Sebastian Neubauer	2c659082bd	[AMDGPU] Don't combine memory intrs to v3i16 v3i16 and v3f16 currently cannot be legalized and lowered so they should not be emitted by inst combining. Moved the check down to still allow extracting 1 or 2 elements via the dmask. Fixes image intrinsics being combined to return v3x16. Differential Revision: https://reviews.llvm.org/D84223	2020-07-22 12:44:01 +02:00
Sjoerd Meijer	5567c62afa	[Matrix] Add LowerMatrixIntrinsics to the NPM Pass LowerMatrixIntrinsics wasn't yet running under the new pass manager, and this adds LowerMatrixIntrinsics to the pipeline (to the same place as where it is running in the old PM). Differential Revision: https://reviews.llvm.org/D84180	2020-07-22 09:47:53 +01:00
Max Kazantsev	360ab70712	[SimplifyCFG] Do not create unneeded PR Phi in block with convergent calls We do not thread blocks with convergent calls, but this check was missing when we decide to insert PR Phis into it (which we only do for threading). Differential Revision: https://reviews.llvm.org/D83936 Reviewed By: nikic	2020-07-22 13:53:50 +07:00
Fangrui Song	8a268bec1b	Revert D82927 "[Loop Fusion] Integrate Loop Peeling into Loop Fusion" This reverts commit `bb8850d34d`. It broke 3 check-llvm-transforms-loopfusion tests in an ASAN build. LoopFuse.cpp `for (BasicBlock *Pred : predecessors(BB)) {` may operate on a deleted BB.	2020-07-21 12:24:50 -07:00
Hiroshi Yamauchi	7bedae7dee	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality.	2020-07-21 11:16:36 -07:00
Jordan Rupprecht	1ee1da1ea5	[NFC] Fix unused var warning	2020-07-21 09:26:01 -07:00
Sidharth Baveja	bb8850d34d	[Loop Fusion] Integrate Loop Peeling into Loop Fusion Summary: This patch adds the ability to peel off iterations of the first loop in loop fusion. This can allow for both loops to have the same trip count, making it legal for them to be fused together. Here is a simple scenario peeling can be used in loop fusion: for (i = 0; i < 10; ++i) a[i] = a[i] + 3; for (j = 1; j < 10; ++j) b[j] = b[j] + 5; Here we can make use of peeling, and then fuse the two loops together. We can peel off the 0th iteration of the loop i, and then combine loop i and j for i = 1 to 10. a[0] = a[0] + 3; for (i = 1; i < 10; ++i) { a[i] = a[i] + 3; b[i] = b[i] + 5; } Currently peeling with loop fusion is only supported for loops with constant trip counts and a single exit point. Both unguarded and guarded loops are supported. Author: sidbav (Sidharth Baveja) Reviewers: kbarton, Meinersbur, bkramer, Whitney, skatkov, ashlykov, fhahn, bmahjour Reviewed By: bmahjour Subscribers: bmahjour, mgorny, hiraditya, zzheng Tags: LLVM Differential Revision: https://reviews.llvm.org/D82927	2020-07-21 15:59:14 +00:00
Jon Roelofs	dc09c65f63	LoopIdiomRecognize: use ExpandedValuesCleaner in another place This is a necessary cleanup after having expanded a SCEV. See: https://reviews.llvm.org/D84071#inline-774728 Differential Revision: https://reviews.llvm.org/D84174	2020-07-21 09:32:23 -06:00
Jon Roelofs	4d75cc4b0a	More conservatively report status from LoopIdiomRecognize Being "precise" here is getting us into trouble with one of the EXPENSIVE_CHECKS buildbots, see [1]. Rather than reporting IR additions that later get rolled back as "no change", instead we now conservatively report that there was. 1: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143509.html Differential Revision: https://reviews.llvm.org/D84071	2020-07-21 09:32:22 -06:00
Florian Hahn	752fea7c27	[SCCP] Add range metadata to call sites with known return ranges. If we inferred a range for the function return value, we can add !range at all call-sites of the function, if the range does not include undef. Reviewers: efriedma, davide, nikic Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D83952	2020-07-21 10:06:54 +01:00
Sanjay Patel	750f4c591d	[InstCombine] allow peeking through zext of shift amount to match rotate idioms (PR45701) We might want to also allow trunc of the shift amount, but that seems less likely? define i32 @src(i32 %x, i1 %y) { %0: %rem = and i1 %y, 1 %cmp = icmp eq i1 %rem, 0 %sh_prom = zext i1 %rem to i32 %sub = sub nsw nuw i1 0, %rem %sh_prom1 = zext i1 %sub to i32 %shr = lshr i32 %x, %sh_prom1 %shl = shl i32 %x, %sh_prom %or = or i32 %shl, %shr %r = select i1 %cmp, i32 %x, i32 %or ret i32 %r } => define i32 @tgt(i32 %x, i1 %y) { %0: %t = zext i1 %y to i32 %r = fshl i32 %x, i32 %x, i32 %t ret i32 %r } Transformation seems to be correct! https://alive2.llvm.org/ce/z/xgMvE3 http://bugs.llvm.org/PR45701	2020-07-20 16:18:11 -04:00
Florian Hahn	f13a59bcff	[Matrix] Use TileInfo to create tiled loop nest for matrix multiply. This patch uses the TileInfo introduced in D77550 to generate a loop nest for tiled matrix multiplication, instead of generating the unrolled code for the whole multiplication. This makes code-generation more scalable for larger matrixes. Initially loops are only used if both the number of rows and columns are divisible by the tile size. Other cases will be added as follow-up. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D81308	2020-07-20 21:11:53 +01:00
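To make the difference between unrolled and tiled code generation concrete, here is a rough Python sketch of a tiled loop nest for matrix multiply. It models the idea only and is not the LowerMatrixIntrinsics code: the function names are hypothetical, and the handling of the inner dimension (clamping the last tile) is an assumption of this sketch. The divisibility check on rows and columns follows the restriction stated in the message.

```python
def matmul_naive(a, b):
    """Reference result: plain triple loop over lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def matmul_tiled(a, b, tile):
    """Three outer loops step over tiles; the inner loops handle one tile."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if rows % tile or cols % tile:
        raise ValueError("rows and columns must be divisible by the tile size")
    c = [[0] * cols for _ in range(rows)]
    for j0 in range(0, cols, tile):            # column tiles
        for i0 in range(0, rows, tile):        # row tiles
            for k0 in range(0, inner, tile):   # inner-dimension tiles
                k1 = min(k0 + tile, inner)     # assumption: clamp last tile
                for i in range(i0, i0 + tile):
                    for j in range(j0, j0 + tile):
                        acc = c[i][j]
                        for k in range(k0, k1):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


# Sample 4x6 and 6x4 integer matrices for the illustration.
A = [[i + k for k in range(6)] for i in range(4)]
B = [[k * j + 1 for j in range(4)] for k in range(6)]
```

The amount of code in the loop body depends only on the tile size, not on the matrix dimensions, which is why the looped form scales better for large matrices than full unrolling.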
Hiroshi Yamauchi	9f5d8e8a72	[PGO] Enable the extended value profile buckets for mem op sizes. Following up D81682 and enable the new, extended value profile buckets for mem op sizes. Differential Revision: https://reviews.llvm.org/D83903	2020-07-20 12:05:09 -07:00
Hiroshi Yamauchi	e64afefdf8	[PGO][PGSO] Remove a temporary flag used for gradual rollout. Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used for the staged rollout. This is a cleanup (NFC) as it's now false by default. Differential Revision: https://reviews.llvm.org/D84057	2020-07-20 11:12:11 -07:00
Florian Hahn	e1270b16c9	[Matrix] Add TileInfo abstraction for tiled matrix code-gen. This patch adds a TileInfo abstraction and utilities to create a 3-level loop nest for tiling. Reviewers: anemet Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D77550	2020-07-20 18:49:08 +01:00
Matt Arsenault	5e999cbe8d	IR: Define byref parameter attribute This allows tracking the in-memory type of a pointer argument to a function for ABI purposes. This is essentially a stripped down version of byval to remove some of the stack-copy implications in its definition. This includes the base IR changes, and some tests for places where it should be treated similarly to byval. Codegen support will be in a future patch. My original attempt at solving some of these problems was to repurpose byval with a different address space from the stack. However, it is technically permitted for the callee to introduce a write to the argument, although nothing does this in reality. There is also talk of removing and replacing the byval attribute, so a new attribute would need to take its place anyway. This is intended to avoid some optimization issues with the current handling of aggregate arguments, as well as fix inflexibility in how frontends can specify the kernel ABI. The most honest representation of the amdgpu_kernel convention is to expose all kernel arguments as loads from constant memory. Today, these are raw, SSA Argument values and codegen is responsible for turning these into loads. Background: There currently isn't a satisfactory way to represent how arguments for the amdgpu_kernel calling convention are passed. In reality, arguments are passed in a single, flat, constant memory buffer implicitly passed to the function. It is also illegal to call this function in the IR, and this is only ever invoked by a driver of some kind. It does not make sense to have a stack passed parameter in this context as is implied by byval. It is never valid to write to the kernel arguments, as this would corrupt the inputs seen by other dispatches of the kernel. These arguments are also not in the same address space as the stack, so a copy is needed to an alloca. From a source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument memory to a mutable variable. The current clang calling convention lowering emits raw values, including aggregates into the function argument list, since using byval would not make sense. This has some unfortunate consequences for the optimizer. In the aggregate case, we end up with an aggregate store to alloca, which both SROA and instcombine turn into a store of each aggregate field. The optimizer never pieces this back together to see that this is really just a copy from constant memory, so we end up stuck with expensive stack usage. This also means the backend dictates the alignment of arguments, and arbitrarily picks the LLVM IR ABI type alignment. By allowing an explicit alignment, frontends can make better decisions. For example, there's really no advantage to an alignment higher than 4, so a frontend could choose to compact the argument layout. Similarly, there is a high penalty to using an alignment lower than 4, so a frontend could opt into more padding for small arguments. Another design consideration is when it is appropriate to expose the fact that these arguments are all really passed in adjacent memory. Currently we have a late IR optimization pass in codegen to rewrite the kernel argument values into explicit loads to enable vectorization. In most programs, unrelated argument loads can be merged together. However, exposing this property directly from the frontend has some disadvantages. We still need a way to track the original argument sizes and alignments to report to the driver. I find using some side-channel, metadata mechanism to track this unappealing. If the kernel arguments were exposed as a single buffer to begin with, alias analysis would be unaware that the padding bits between arguments are meaningless. Another family of problems is there are still some gaps in replacing all of the available parameter attributes with metadata equivalents once lowered to loads.
The immediate plan is to start using this new attribute to handle all aggregate arguments for kernels. Long term, it makes sense to migrate all kernel arguments, including scalars, to be passed indirectly in the same manner. Additional context is in D79744.	2020-07-20 10:23:09 -04:00
Benjamin Kramer	e88b6ed748	[LLE] std::inserter doesn't work with SmallSet, so don't use it.	2020-07-20 15:47:42 +02:00
Benjamin Kramer	44ab60f74d	[LoopSimplify] Use SmallPtrSet and range for loops more. NFCI.	2020-07-20 15:00:59 +02:00
Florian Hahn	dc1087d408	[Matrix] Add minimal lowering pass that only requires TTI. This patch adds a new variant of the matrix lowering pass that only does a minimal lowering and only depends on TTI. The main purpose of this pass is to have a pass with minimal dependencies to run as part of the backend pipeline. At the moment, the only difference to the regular lowering pass is that it does not support remarks. But subsequent patches will add support for tiling to the lowering pass, which will require more analysis that we do not want to run in the backend, as the lowering should happen in the middle-end in practice and running it in the backend is mostly for convenience when running llc. Reviewers: anemet, Gerolf, efriedma, hfinkel Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D76867	2020-07-20 11:16:11 +01:00
Roman Lebedev	04b729d076	[NFCI][SimplifyCFG] Guard common code hoisting with a (default-on) flag Common code sinking is already guarded with a (with default-off!) flag, so add a flag for hoisting, too. D84108 will hopefully make hoisting off-by-default too.	2020-07-20 10:29:57 +03:00
Juneyoung Lee	0a6aee5160	[ValueTracking] Add canCreateUndefOrPoison & let canCreatePoison use Operator This patch - adds `canCreateUndefOrPoison` - refactors `canCreatePoison` so it can deal with constantexprs `canCreateUndefOrPoison` will be used at D83926. Reviewed By: nikic, jdoerfert Differential Revision: https://reviews.llvm.org/D84007	2020-07-20 01:24:30 +09:00
Wenlei He	d41d952be9	Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" This reverts commit `2d6ecfa168`.	2020-07-19 08:49:04 -07:00
Wenlei He	2d6ecfa168	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks from previous inlining as input, and provides the decision as advice so current inlining can replay inline decisions of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites including call context. The change can be useful for Inliner tuning as it provides a channel to allow external input for tweaking inline decisions. Existing alternatives like alwaysinline attribute is per-function, not per-callsite. Per-callsite inline intrinsic can be another solution (not yet existing), but it's intrusive to implement and also does not differentiate call context. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decision for replay. Since SampleProfileLoader does top-down inlining, inline decision can be specialized for each call context, hence we should be able to replay inlining accurately. However with a bottom-up inliner like CGSCC inlining, the replay can be limited due to lack of specialization for different call context. Apart from that limitation, the new inline advisor can still be used by regular CGSCC inliner later if needed for tuning purpose. Subscribers: mgorny, aprantl, hiraditya, llvm-commits Tags: #llvm Resubmit for https://reviews.llvm.org/D84086	2020-07-19 08:21:05 -07:00
Roman Lebedev	2f3862eb9f	Reland "[InstCombine] Lower infinite combine loop detection thresholds" This reverts commit `4500db8c59`, which was reverted because lower thresholds exposed a new issue (PR46680). Now that it was resolved by `d12ec0f752`, we can reinstate lower limits and wait for a new bugreport before reverting this again...	2020-07-19 16:37:03 +03:00
Nikita Popov	c6e13667e7	[PredicateInfo] Add a method to interpret predicate as cmp constraint Both users of PredicateInfo (NewGVN and SCCP) are interested in getting a cmp constraint on the predicated value. They currently implement separate logic for this. This patch adds a common method for this in PredicateBase. This enables a missing bit of PredicateInfo handling in SCCP: Now the predicate on the condition itself is also used. For switches it means we know that the switched-on value is the same as the case value. For assumes/branches we know that the condition is true or false. Differential Revision: https://reviews.llvm.org/D83640	2020-07-19 15:34:32 +02:00
Roman Lebedev	fb5577d4f8	[NFCI][GVN] Make IsValueFullyAvailableInBlock() readable - use enum class instead of magic numbers This does not change any logic, it only wraps the magic 0/1/2/3 constants into an enum class.	2020-07-19 16:33:56 +03:00
Nikita Popov	d12ec0f752	[InstCombine] Fix store merge worklist management (PR46680) Fixes https://bugs.llvm.org/show_bug.cgi?id=46680. Just like insertions through IRBuilder, InsertNewInstBefore() should be using the deferred worklist mechanism, so that processing of newly added instructions is prioritized. There's one side-effect of the worklist order change which could be classified as a regression. An add op gets pushed through a select that at the time is not a umax. We could add a reverse transform that tries to push adds in the reverse direction to restore a min/max, but that seems like a sure way of getting infinite loops... Seems like something that should best wait on min/max intrinsics. Differential Revision: https://reviews.llvm.org/D84109	2020-07-19 15:05:45 +02:00
Fangrui Song	5809a32e7c	[gcov] Add __gcov_dump/__gcov_reset and delete __gcov_flush GCC r187297 (2012-05) introduced `__gcov_dump` and `__gcov_reset`. `__gcov_flush = __gcov_dump + __gcov_reset` The resolution to https://gcc.gnu.org/PR93623 ("No need to dump gcdas when forking" target GCC 11.0) removed the unuseful and undocumented __gcov_flush. Close PR38064. Reviewed By: calixte, serge-sans-paille Differential Revision: https://reviews.llvm.org/D83149	2020-07-18 15:07:46 -07:00
Roman Lebedev	9dceb32f30	[NFC][CVP] processSDiv(): pacify gcc compilers	2020-07-18 19:41:43 +03:00
Florian Hahn	4b19cccbb5	[PredicateInfo] Fold PredicateWithCondition into PredicateBase (NFC). Each concrete instance of a predicate has a condition (also noted in the original PredicateBase comment) and to me it seems like there is no clear benefit of having both PredicateBase and PredicateWithCondition and they can be folded together. Reviewers: nikic, efriedma Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84089	2020-07-18 16:21:56 +01:00
Roman Lebedev	8d487668d0	[CVP] Soften SDiv into a UDiv as long as we know domains of both of the operands. Yes, if operands are non-positive this comes at the extra cost of two extra negations. But a. division is already just ridiculously costly, two more subtractions can't hurt much :) and b. we have better/more analyses/folds for an unsigned division, we could end up narrowing its bitwidth, converting it to lshr, etc. This is essentially a take two on `0fdcca07ad`, which didn't fix the potential regression i was seeing, because ValueTracking's computeKnownBits() doesn't make use of dominating conditions in its analysis. While i could teach it that, this seems like the more general fix. This big hammer actually does catch said potential regression. Over vanilla test-suite + RawSpeed + darktable (10M IR instrs, 1M IR BB, 1M X86 ASM instrs), this fires/converts 5 more (+2%) SDiv's, the total instruction count at the end of middle-end pipeline is only +6, so out of +10 extra negations, ~half are folded away, and asm instr count is only +1, so practically speaking all extra negations are folded away and are therefore free. Sadly, all these new UDiv's remained, none folded away. But there are two fewer basic blocks. https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1	2020-07-18 17:59:56 +03:00
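The four Alive variants (v0 to v3) listed above can be brute-forced over every pair of i8 operands. The sketch below is a Python model of 8-bit arithmetic, written only to illustrate the fold; it is not the CorrelatedValuePropagation code, and all helper names are hypothetical. Pairs where `sdiv` itself is undefined (division by zero, and -128 / -1) are skipped.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(u):
    """Reinterpret an 8-bit pattern as a signed value."""
    u &= MASK
    return u - (1 << BITS) if u >= 1 << (BITS - 1) else u


def neg(u):
    """Two's-complement negation on an 8-bit pattern (sub i8 0, u)."""
    return (-u) & MASK


def udiv(u0, u1):
    return (u0 & MASK) // (u1 & MASK)


def sdiv(c0, c1):
    """Signed division that truncates toward zero, like LLVM's sdiv."""
    q = abs(c0) // abs(c1)
    return -q if (c0 < 0) != (c1 < 0) else q


def sdiv_via_udiv(c0, c1):
    """Apply the variant whose precondition matches the operand signs."""
    u0, u1 = c0 & MASK, c1 & MASK
    if c0 >= 0 and c1 >= 0:        # v0
        r = udiv(u0, u1)
    elif c0 <= 0 and c1 >= 0:      # v1
        r = neg(udiv(neg(u0), u1))
    elif c0 >= 0 and c1 <= 0:      # v2
        r = neg(udiv(u0, neg(u1)))
    else:                          # v3
        r = udiv(neg(u0), neg(u1))
    return to_signed(r)


def all_variants_agree():
    for c0 in range(-128, 128):
        for c1 in range(-128, 128):
            if c1 == 0 or (c0 == -128 and c1 == -1):
                continue  # sdiv is undefined for these inputs
            if sdiv_via_udiv(c0, c1) != sdiv(c0, c1):
                return False
    return True
```

The check also covers the edge case where an operand is -128, whose negation wraps to the same bit pattern but reads as 128 when treated as unsigned.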
Roman Lebedev	45b7388824	[NFC][CVP] Rename predicates - s/positive/non negative/ to better note that zero is ok	2020-07-18 17:59:32 +03:00
Roman Lebedev	2cde6984d8	[NFC][CVP] Refactor isPositive() out of hasPositiveOperands()	2020-07-18 17:59:32 +03:00
Gui Andrade	951584db4f	Revert "update libatomic instrumentation" This was committed mistakenly. This reverts commit `1f29171ae7`.	2020-07-18 03:53:00 +00:00
Gui Andrade	1f29171ae7	update libatomic instrumentation	2020-07-18 03:39:21 +00:00
Chen Zheng	6d247f980d	[SCEV][IndVarSimplify] insert point should not be block front. Recommit after removing the unused cast instructions. Differential Revision: https://reviews.llvm.org/D80975	2020-07-17 22:25:10 -04:00
Kuba Mracek	176a6e7abe	[asan] Use dynamic shadow memory position on Apple Silicon macOS This is needed because macOS on Apple Silicon has some reserved pages inside the "regular" shadow memory location, and mapping over that location fails. Differential Revision: https://reviews.llvm.org/D82912	2020-07-17 17:40:21 -07:00
Arthur Eubanks	0dfa4a83fa	Revert "[PGO][PGSO] Add profile guided size optimization to loop vectorization legality." This reverts commit `30c382a7c6`. See https://crbug.com/1106813.	2020-07-17 16:47:41 -07:00
Leonard Chan	cf5df40c4c	Revert "[AddressSanitizer] Don't use weak linkage for __{start,stop}_asan_globals" This reverts commit `d76e62fdb7`. Reverting since this can lead to linker errors: ``` ld.lld: error: undefined hidden symbol: __start_asan_globals ``` when using --gc-sections. The linker can discard __start_asan_globals once there are no more `asan_globals` sections left, which can lead to this error if we have external linkages to them.	2020-07-17 15:29:50 -07:00
Eric Christopher	ae08dbc673	Temporarily Revert "[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks" as it is failing the inline-replay.ll test as well as sanitizers/Werror from returning a stack local variable. This reverts commit `029946b112`.	2020-07-17 14:58:01 -07:00
Wenlei He	029946b112	[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks Summary: This change added a new inline advisor that takes optimization remarks for previous inlining as input, and provide the decision as advice so current inlining can replay inline decision of a different compilation. Dwarf inline stack with line and discriminator is used as anchor for call sites. The change can be useful for Inliner tuning. A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inliner advisor with SampleProfileLoader's inline decision for replay. The new inline advisor can also be used by regular CGSCC inliner later if needed. Reviewers: davidxl, mtrofin, wmi, hoy Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83743	2020-07-17 13:30:47 -07:00
Roman Lebedev	0fdcca07ad	[InstCombine] Fold X sdiv (-1 << C) -> -(X u>> Y) iff X is non-negative This is the one i'm seeing as missed optimization, although there are likely other possibilities, as usual. There are 4 variants of a general sdiv->udiv fold: https://rise4fun.com/Alive/VS6 Name: v0 Pre: C0 >= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 C0, C1 Name: v1 Pre: C0 <= 0 && C1 >= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 -C0, C1 %r = sub i8 0, %t0 Name: v2 Pre: C0 >= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %t0 = udiv i8 C0, -C1 %r = sub i8 0, %t0 Name: v3 Pre: C0 <= 0 && C1 <= 0 %r = sdiv i8 C0, C1 => %r = udiv i8 -C0, -C1 If we really don't like sdiv (more than udiv that is), and are okay with increasing instruction count (2 new negations), and we ensure that we don't undo the fold, then we could just implement these..	2020-07-17 22:50:09 +03:00
Stanislav Mekhanoshin	efb5040262	Fixed warning about signed/unsigned comparison I've got the report clang11 issues signed/unsigned mismatch warning here. For some reason only clang11 seems to issue this warning. Differential Revision: https://reviews.llvm.org/D83916	2020-07-17 11:03:42 -07:00
Florian Hahn	31d71c69f1	[Matrix] Only run matrix lowering early with -O0. Currently matrix lowering is run twice if OptLevel > 0. Fix that and also add a test for OptLevel > 0 with matrix lowering enabled.	2020-07-17 15:53:16 +01:00
Sidharth Baveja	11e879d4f1	[Loop Simplify] Resolve an issue where metadata is not applied to a loop latch. Summary: This patch resolves an issue where the metadata of a loop is not added to the new loop latch, and not removed from the old loop latch. This issue occurs in the SplitBlockPredecessors function, which adds a new block in a loop, and in the case that the block passed into this function is the header of the loop, the loop can be modified such that the latch of the loop is replaced. This patch applies to the Loop Simplify pass since it ensures that each loop has exit blocks which only have predecessors that are inside of the loop. In the case that this is not true, the pass will create a new exit block for the loop. This guarantees that the loop preheader/header will dominate the exit blocks. Author: sidbav (Sidharth Baveja) Reviewers: asbirlea (Alina Sbirlea), chandlerc (Chandler Carruth), Whitney (Whitney Tsang), bmahjour (Bardia Mahjour) Reviewed By: asbirlea (Alina Sbirlea) Subscribers: hiraditya (Aditya Kumar), llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D83869	2020-07-17 14:02:14 +00:00
Anna Welker	23c9534515	[LV] Enable the LoopVectorizer to create pointer inductions This patch enables the LoopVectorizer to build a phi of pointer type and provide the vector loads and stores with vector type getelementptrs built from the pointer induction variable, which produces far fewer instructions than the previous approach of creating scalar getelementptrs and gluing them together into a vector. Differential Revision: https://reviews.llvm.org/D81267	2020-07-17 13:35:07 +01:00
Benjamin Kramer	9a0689e072	Make helpers static. NFC.	2020-07-17 13:49:11 +02:00
Marco Elver	785d41a261	[TSan] Add option for emitting compound read-write instrumentation This adds option -tsan-compound-read-before-write to emit different instrumentation for the write if the read before that write is omitted from instrumentation. The default TSan runtime currently does not support the different instrumentation, and the option is disabled by default. Alternative runtimes, such as the Kernel Concurrency Sanitizer (KCSAN) can make use of the feature. Indeed, the initial motivation is for use in KCSAN as it was determined that due to the Linux kernel having a large number of unaddressed data races, it makes sense to improve performance and reporting by distinguishing compounded operations. E.g. the compounded instrumentation is typically emitted for compound operations such as ++, +=, |=, etc. By emitting different reports, such data races can easily be noticed, and also automatically bucketed differently by CI systems. Reviewed By: dvyukov, glider Tags: #llvm Differential Revision: https://reviews.llvm.org/D83867	2020-07-17 10:24:20 +02:00
Max Kazantsev	c989881078	[InstCombine] Fix replace select with Phis when branch has the same labels ``` define i32 @test(i1 %cond) { entry: br i1 %cond, label %exit, label %exit exit: %result = select i1 %cond, i32 123, i32 456 ret i32 %result } ``` In this test, after applying transformation of replacing select with Phis, the result will be: ``` define i32 @test(i1 %cond) { entry: br i1 %cond, label %exit, label %exit exit: %result = i32 phi [123, %exit], [123, %exit] ret i32 %result } ``` That is, select is transformed into an invalid Phi, which will then be reduced to 123 and the second value will be lost. But it is worth noting that this problem will arise only if the select is in the InstCombine worklist before the branch. Otherwise, InstCombine will replace the branch condition with false and the transformation will not be applied. The fix is to check the target labels in the branch condition for equality. Patch By: Kirill Polushin Differential Revision: https://reviews.llvm.org/D84003 Reviewed By: mkazantsev	2020-07-17 14:04:58 +07:00
Jon Roelofs	a0537fc35f	[SimplifyCFG] Fix crash in the EXPENSIVE_CHECKS build SimplifyCFG was incorrectly reporting to the pass manager that it had not made changes after folding away a PHI. This is detected in the EXPENSIVE_CHECKS build when the function's hash changes. Differential Revision: https://reviews.llvm.org/D83985	2020-07-16 15:34:41 -06:00
Eric Christopher	7bfaa40086	Temporarily Revert "[AssumeBundles] Use operand bundles to encode alignment assumptions" due to the performance bugs filed in https://bugs.llvm.org/show_bug.cgi?id=46753. An SROA change soon may obviate some of these problems. This reverts commit `8d09f20798`.	2020-07-16 11:54:04 -07:00
Nadav Rotem	8f0a8ed44e	[InjectTLIMappings] Use StringRef instead of std::string for FN name. https://reviews.llvm.org/D83797	2020-07-16 11:53:04 -07:00
Matt Arsenault	023883a834	IR: Rename Argument::hasPassPointeeByValueAttr to prepare for byref When the byref attribute is added, there will need to be two similar functions for the existing cases which have an associate value copy, and byref which does not. Most, but not all of the existing uses will use the existing version. The associated size function added by D82679 also needs to contextually differ, and will help eliminate a few places still relying on pointee element types.	2020-07-16 13:50:49 -04:00
Florian Hahn	569868f6b7	[SCCP] Only track returns of functions with non-void ret ty (NFC). There is no need to add functions with void return types to the set of tracked return values. This does not change functionality, because such functions do not have return values and we never update or access them.	2020-07-16 15:15:19 +01:00
Roman Lebedev	30f6c08ba3	Reland "[NFC] SimplifyCFG: refactor/deduplicate command-line settings override handling" Initially I forgot to stage the SimplifyCFGPass::SimplifyCFGPass() change to actually take the passed params.	2020-07-16 15:25:11 +03:00
Roman Lebedev	ff2f5c3e58	Revert "[NFC] SimplifyCFG: refactor/deduplicate command-line settings override handling" Seems to be breaking the bots. This reverts commit `740a1da108`.	2020-07-16 14:27:07 +03:00
Roman Lebedev	740a1da108	[NFC] SimplifyCFG: refactor/deduplicate command-line settings override handling	2020-07-16 13:40:02 +03:00
Roman Lebedev	fb432a51f4	Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `1067d3e176`, which reverted commit `b2018198c3`, because it introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding the cycle.	2020-07-16 13:40:01 +03:00
Florian Hahn	cbe0e539e7	[Matrix] Also run lowering during -O0. Currently the backends cannot lower the matrix intrinsics directly and rely on the lowering to vector instructions happening in the middle-end. At the moment, this means the backend crashes when matrix types extension code is compiled with -O0, e.g. http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/7902/ This patch also runs the lowering with -O0 in the middle-end as a temporary solution. Long term, a lightweight version of the lowering should run in the backend, on demand.	2020-07-16 10:51:31 +01:00
Max Kazantsev	90798e09e2	Re-enable "[InstCombine] Simplify boolean Phis with const inputs using CFG" This reverts commit `b893822e32`. + Clang test fixes + Insertion point fix for landing pads	2020-07-16 16:09:08 +07:00
Adrian Kuegel	1067d3e176	Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `b2018198c3`. This commit introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils, so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still depends on it, we have a cycle.	2020-07-16 10:54:10 +02:00
Max Kazantsev	b893822e32	Revert "[InstCombine] Simplify boolean Phis with const inputs using CFG" This reverts commit `00472067c3`. Need to fix failing clang tests.	2020-07-16 12:58:39 +07:00
Max Kazantsev	00472067c3	[InstCombine] Simplify boolean Phis with const inputs using CFG This patch adds simplification for pattern: ``` if (cond) / \ ... ... \ / p = phi [true] [false] ... br p, succ_1, succ_2 ``` If we can prove that top block's branches dominate respective inputs of a block that has a Phi with constant inputs, we can use the branch condition (maybe inverted) instead of Phi. This will make proofs of implication for further jump threading more transparent. Differential Revision: https://reviews.llvm.org/D81375 Reviewed By: xbolva00	2020-07-16 12:06:10 +07:00
Roman Lebedev	b2018198c3	[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions Taking so many parameters is simply unmaintainable. We don't want to include the entire llvm/Transforms/Utils/Local.h into llvm/Transforms/Scalar.h so I've split SimplifyCFGOptions into its own header.	2020-07-16 01:27:54 +03:00
Roman Lebedev	af19b1ceef	[NFCI] CFGSimplifyPass: change (the only) constructor to take SimplifyCFGOptions Taking that long list of parameters is already simply unmaintainable.	2020-07-16 01:27:53 +03:00
Roman Lebedev	2815429d08	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): after hoisting terminator, do return Changed, not just true Otherwise, if Changed was still false before that, we would not account for that hoist in NumHoistCommonCode statistic.	2020-07-16 00:32:48 +03:00
Roman Lebedev	1cfc24fd67	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instruction "blocks" hoisted I.e. out of all the times HoistThenElseCodeToIf() was called, how many times did it actually hoist something?	2020-07-16 00:21:56 +03:00
Roman Lebedev	7b53ad88d4	[NFC][SimplifyCFG] HoistThenElseCodeToIf(): count number of common instructions hoisted	2020-07-16 00:21:56 +03:00
Roman Lebedev	3fc1defc0b	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): count number of instruction "blocks" actually sunk Out of all the times the function was called, how many times did we actually sink anything?	2020-07-16 00:21:56 +03:00
Roman Lebedev	9ed65c76c0	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): add debug output when failing to actually sink instr	2020-07-16 00:21:55 +03:00
Roman Lebedev	4c79864488	[NFC][SimplifyCFG] SinkCommonCodeFromPredecessors(): early return if nothing to sink If we can't sink even one instruction, early return, to increase readability.	2020-07-16 00:21:55 +03:00
Roman Lebedev	702a3c6410	[NFC][SimplifyCFG] Rename statistic NumSinkCommons into NumSinkCommonInstrs It really counts instructions added into common block, not number of instruction groups sunk.	2020-07-16 00:21:55 +03:00
Roman Lebedev	ce4459a0db	[NFC][LoopRotate] Add a statistic for how many times rotation failed due to the header size	2020-07-16 00:21:55 +03:00
Hongtao Yu	f3731d34fa	[LoopUnroll] Update branch weight for remainder loop Unrolling a loop with compile-time unknown trip count results in a remainder loop. The remainder loop executes the remaining iterations of the original loop when the original trip count is not a multiple of the unroll factor. For better profile counts maintenance throughout the optimization pipeline, I'm assigning an artificial weight to the latch branch of the remainder loop. A remainder loop runs up to as many times as the unroll factor subtracted by 1. Therefore I'm assigning the maximum possible trip count as the back edge weight. This should be more accurate than the default non-profile weight, which assumes the back edge runs much more frequently than the exit edge. Differential Revision: https://reviews.llvm.org/D83187	2020-07-15 12:33:29 -07:00
Hiroshi Yamauchi	30c382a7c6	[PGO][PGSO] Add profile guided size optimization to loop vectorization legality. Differential Revision: https://reviews.llvm.org/D83329	2020-07-15 11:49:36 -07:00
Sanjay Patel	d8b268680d	[InstCombine] prevent infinite looping in or-icmp fold (PR46712) I'm not sure if the test is truly minimal, but we need to induce a situation where a value becomes a constant but is not immediately folded before getting to the 'or' transform.	2020-07-15 14:12:12 -04:00
Hiroshi Yamauchi	4a539faf74	[PGO] Extend the value profile buckets for mem op sizes. Extend the memop value profile buckets to be more flexible (could accommodate a mix of individual values and ranges) and to cover more value ranges (from 11 to 22 buckets). Disabled behind a flag (to be enabled separately) and the existing code to be removed later.	2020-07-15 10:26:15 -07:00
Tim Northover	37b96d51d0	CodeGenPrep: remove AssertingVH references before deleting dead instructions. CodeGenPrepare keeps fairly close track of various instructions it's seen, particularly GEPs, in maps and vectors. However, sometimes those instructions become dead and get removed while it's still executing. This triggers AssertingVH references to them in an asserts build and could lead to miscompiles in a release build (I've only seen a later segfault though). So this patch adds a callback to RecursivelyDeleteTriviallyDeadInstructions which can make sure the instruction about to be deleted is removed from CodeGenPrepare's data structures.	2020-07-15 15:19:21 +01:00
John Brawn	20854d85e1	[DSE,MSSA] Recognise init_trampoline in getLocForWriteEx This fixes an instance where MemorySSA-using Dead Store Elimination is failing to do a transformation that the non-MemorySSA-using version does. Differential Revision: https://reviews.llvm.org/D83783	2020-07-15 12:18:58 +01:00
Florian Hahn	9ea0d8c38f	[LoopRotate] Remove unnecessary verifyMemorySSA calls. The actual rotation happens in processLoop, so the second removed call to verifyMemorySSA was unnecessary. In fact, processLoop/rotateLoop already verify MemorySSA before and after transforming each loop. Hence, both calls can be removed. Pointed out by @lebedev.ri post-commit D51718.	2020-07-15 11:49:24 +01:00
Chen Zheng	c86c1e972d	[IndVarSimplify] Uniformly use emplace_back for DeadInsts, nfc	2020-07-15 02:48:09 -04:00
Giorgis Georgakoudis	694ded37b9	[OpenMPOpt] Fix preserved analyses return	2020-07-14 23:18:43 -07:00
Luofan Chen	6db99d18b6	Revert "[Attributor] Track AA dependency using dependency graph" This reverts commit `8df7af560a`.	2020-07-15 11:48:08 +08:00
Johannes Doerfert	fec1f2109f	[OpenMP] Emit remarks during GPU state machine optimization Since D83271 we can optimize the GPU state machine to avoid spurious call edges that increase the register usage of kernels. With this patch we inform the user why and if this optimization is happening and when it is not. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D83707	2020-07-14 22:33:57 -05:00
Luofan Chen	8df7af560a	[Attributor] Track AA dependency using dependency graph Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78861	2020-07-15 10:40:21 +08:00
Luofan Chen	e21323a1e9	Revert "[Attributor] [WIP] Track AA dependency using dependency graph" This reverts commit `6b78ed6070`.	2020-07-15 10:33:55 +08:00
Luofan Chen	6b78ed6070	[Attributor] [WIP] Track AA dependency using dependency graph Summary: This patch added dependency graph to the attributor so that we can dump the dependencies between AAs more easily. We can also apply general graph algorithms to the graph, making it easier for us to create deep wrappers. Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: jfb, okura, mgrang, kuter, lebedev.ri, hiraditya, uenoku, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78861	2020-07-15 10:21:49 +08:00
Luofan Chen	233af8958e	[Attributor] Create getter function for the ID of the abstract attribute Summary: The `getIdAddr()` function returns the address of the ID of the abstract attribute Reviewers: jdoerfert, sstefan1, uenoku, homerdin, baziotis Reviewed By: jdoerfert Subscribers: okura, hiraditya, uenoku, kuter, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83172	2020-07-15 09:55:18 +08:00
Tyker	16f777f421	[NFC] Add debug and stat counters to assume queries and assume builder Summary: Add debug counter and stats counter to assume queries and assume builder. Here are the collected stats on a build of check-llvm + check-clang. "assume-builder.NumAssumeBuilt": 2720879, "assume-builder.NumAssumesMerged": 761396, "assume-builder.NumAssumesRemoved": 1576212, "assume-builder.NumBundlesInAssumes": 6518809, "assume-queries.NumAssumeQueries": 85566380, "assume-queries.NumUsefullAssumeQueries": 2727360. The NumUsefullAssumeQueries stat is actually pessimistic because in a few places queries ask to keep providing information to try to get better information, and this isn't counted as a useful query even though it can be useful. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83506	2020-07-14 21:49:14 +02:00
Teresa Johnson	6014c46c80	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `80d0a137a5`, and the follow on fix in `873c0d0786`, with a new fix for test failures after a 2-stage clang bootstrap, and a more robust fix for the Chromium build failure that an earlier version partially fixed. See also discussion on D75201. Reviewers: evgeny777 Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73242	2020-07-14 12:16:57 -07:00
Logan Smith	a19461d9e1	[NFC] Add 'override' keyword where missing in include/ and lib/. This fixes warnings raised by Clang's new -Wsuggest-override, in preparation for enabling that warning in the LLVM build. This patch also removes the virtual keyword where redundant, but only in places where doing so improves consistency within a given file. It also removes a couple unnecessary virtual destructor declarations in derived classes where the destructor inherited from the base class is already virtual. Differential Revision: https://reviews.llvm.org/D83709	2020-07-14 09:47:29 -07:00
serge-sans-paille	1cd1c1d62e	Revert "[SCEV][IndVarSimplify] insert point should not be block front." This reverts commit `f1efb8bb4b`. Reverted because it doesn't correctly update the pass return status, see http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/9441/steps/test-check-all/logs/FAIL%3A%20LLVM%3A%3Awiden-i32-i8ptr.ll	2020-07-14 14:24:26 +02:00
Roman Lebedev	e2b75cafcb	[NFCI][InstCombine] Move store merging from `visitStoreInst()` into `visitUnconditionalBranchInst()` Summary: As @nikic is pointing out in https://bugs.llvm.org/show_bug.cgi?id=46680#c5, InstCombine should not have forward instruction scans, so let's move this transform into the proper place. This is pretty much NFCI. Reviewers: nikic, spatel Reviewed By: nikic Subscribers: hiraditya, llvm-commits, nikic Tags: #llvm Differential Revision: https://reviews.llvm.org/D83670	2020-07-14 10:41:51 +03:00
Djordje Todorovic	1af8c93bab	[deadargelim] Attach dbg info to the insert/extractvalue instructions Attach DbgLoc on insertvalue/extractvalue instructions created by DeadArgumentElimination. This fixes the PR46350. Differential Revision: https://reviews.llvm.org/D81939	2020-07-14 08:52:04 +02:00
Jameson Nash	2c7a07b59d	[GVN] teach ConstantFolding correct handling of non-integral addrspace casts Here we teach the ConstantFolding analysis pass that it is not legal to replace a load of a bitcast constant (having a non-integral addrspace) with a bitcast of the value of that constant (with a different non-integral addrspace). But also teach it that certain bit patterns are always known and convertible (a fact it already uses elsewhere). This required us to also fix a globalopt test, since, after this change, LLVM is able to realize that the test actually is a valid transform (NULL is always a known bit-pattern) and so it doesn't need to emit the failure remarks for it. Also simplify some of the negative tests for transforms by avoiding a type change in their bitcast, and add positive versions of the same tests, to show that they otherwise should work. Differential Revision: https://reviews.llvm.org/D59730	2020-07-13 21:44:17 -04:00
Jameson Nash	e244f86f4d	[VNCoercion] avoid creating bitcast for zero offsets [NFCI] This could previously make it more complicated for ConstantFolding later, leading to a higher likelihood it would have to reject the expression, even though zero seems like probably the common case here. Differential Revision: https://reviews.llvm.org/D59730	2020-07-13 21:44:17 -04:00
Gui Andrade	871251b2b6	[MSAN] Implement experimental vector reduction intrinsics Implement llvm.experimental.vector.{add,mul,or,and,...}. An IR test is included but no C test for lack of good way to get the compiler to emit these. Differential Revision: https://reviews.llvm.org/D82920	2020-07-14 00:12:10 +00:00
Gui Andrade	d1c7f51a9e	MemorySanitizer: If a field is marked noundef, check init at call site Adds LLVM option to control eager checking under -msan-eager-checks. This change depends on the noundef keyword to determine cases where it is sound to check these shadows, and falls back to passing shadow values by TLS. Checking at call boundaries enforces undefined behavior rules with passing uninitialized arguments by value. Differential Revision: https://reviews.llvm.org/D81699	2020-07-13 23:32:26 +00:00
Tyker	8d09f20798	[AssumeBundles] Use operand bundles to encode alignment assumptions Summary: NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html Complementary to the assumption outliner prototype in D71692, this patch shows how we could simplify the code emitted for an alignment assumption. The generated code is smaller, less fragile, and it makes it easier to recognize the additional use as an "assumption use". As mentioned in D71692 and on the mailing list, we could adopt this scheme, and similar schemes for other patterns, without adopting the assumption outlining. Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1 Reviewed By: jdoerfert Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71739	2020-07-14 01:05:58 +02:00
Gui Andrade	bfa3b627c6	[InstCombine] Erase attribute lists for simplified libcalls Currently, a transformation like pow(2.0, x) -> exp2(x) copies the pow attribute list verbatim and applies it to exp2. This works out fine when the attribute list is empty, but when it isn't clang may error due to the mismatch. The source function and destination don't necessarily have anything to do with one another, attribute-wise. So it makes sense to remove the attribute lists (this is similar to what IPO does in this situation). This was discovered after implementing the `noundef` param attribute. Differential Revision: https://reviews.llvm.org/D82820	2020-07-13 22:32:33 +00:00
Vedant Kumar	3d52b1e81b	Revert "[InstCombine] Drop debug loc in TryToSinkInstruction (reland)" This reverts commit `9649c2095f`. See discussion on the llvm-commits thread: if it's OK to preserve the location when sinking a call, it's probably OK to always preserve the location.	2020-07-13 15:17:07 -07:00
Nikita Popov	353fa4403a	[PredicateInfo] Place predicate info after assume Place the ssa.copy instructions for assumes after the assume, instead of before it. Both options are valid, but placing them afterwards prevents assumes from being replaced with assume(true). This fixes https://bugs.llvm.org/show_bug.cgi?id=37541 in NewGVN and will avoid a similar issue in SCCP when we handle more predicate infos. Differential Revision: https://reviews.llvm.org/D83631	2020-07-13 21:10:11 +02:00
Sanne Wouda	13fec93a77	[NFC] rename to reflect F is not necessarily an Intrinsic	2020-07-13 15:28:46 +01:00
Sanne Wouda	7b84045565	[SLPVectorizer] handle vectorizeable library functions Teaches the SLPVectorizer to use vectorized library functions for non-intrinsic calls. This already worked for intrinsics that have vectorized library functions, thanks to D75878, but schedules with library functions with a vector variant were being rejected early. - assume that there are no load/store dependencies between lib functions with a vector variant; this would otherwise prevent the bundle from becoming "ready" - check during legalization that the vector variant can be used - fix-up where we previously assumed that a call would be an intrinsic Differential Revision: https://reviews.llvm.org/D82550	2020-07-13 15:28:46 +01:00
Max Kazantsev	e808cab824	[InstCombine] Improve select -> phi canonicalization: consider more blocks We can try to replace select with a Phi not in its parent block alone, but also in blocks of its arguments. We benefit from it when select's argument is a Phi. Differential Revision: https://reviews.llvm.org/D83284 Reviewed By: nikic	2020-07-13 11:40:32 +07:00
Shinji Okumura	c73f425f84	[Attributor] Add AAValueSimplifyCallSiteArgument::manifest Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D82861	2020-07-13 07:01:50 +09:00
Alexey Lapshin	0a01fc96e2	Revert "[TRE] allow TRE for non-capturing calls." This reverts commit `f7907e9d22`. That commit caused error on multi-stage build.	2020-07-13 00:39:48 +03:00
Sanjay Patel	4458973347	[InstCombine] fold mul of zext/sext bools to 'and' Similar to rG40fcc42: The base case only worked because we were relying on a poison-unsafe select transform; if that is fixed, we would regress on patterns like this. The extra use tests show that the select transform can't be applied consistently. So it may be a regression to have an extra instruction on 1 test, but that result was not created safely and does not happen reliably.	2020-07-12 15:56:26 -04:00
Ayal Zaks	82a5157ff1	[LV] Fixing versioning-for-unit-stride of loops with small trip count This patch fixes D81345 and PR46652. If a loop with a small trip count is compiled w/o -Os/-Oz, Loop Access Analysis still generates runtime checks for unit strides that will version the loop. In such cases, the loop vectorizer should either re-run the analysis or bail-out from vectorizing the loop, as done prior to D81345. The latter is applied for now as the former requires refactoring. Differential Revision: https://reviews.llvm.org/D83470	2020-07-12 19:51:47 +03:00
kuter	4dbe82eef3	[Attributor] Introduce attribute seed allow list.	2020-07-12 02:25:33 +03:00
Michael Liao	81db614411	Fix `-Wunused-variable` warnings. NFC.	2020-07-11 10:09:44 -04:00
Michael Liao	0b4cf802fa	[fix-irreducible] Skip unreachable predecessors. Summary: - Skip unreachable predecessors during header detection in SCC. Those unreachable blocks would be generated in the switch lowering pass in the corner cases or other frontends. Even though they could be removed through the CFG simplification, we should skip them during header detection. Reviewers: sameerds Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83562	2020-07-11 10:08:44 -04:00
sstefan1	850b150cff	[Attributor][NFC] Add more debug output for deleted functions	2020-07-11 14:26:08 +02:00
Alexey Lapshin	f7907e9d22	[TRE] allow TRE for non-capturing calls. The current implementation of Tail Recursion Elimination has a very restricted pre-requisite: AllCallsAreTailCalls, i.e. it requires that no function call receives a pointer to local stack. Generally, function calls that receive a pointer to local stack but do not capture it should not break TRE. This fix allows us to do TRE if it is proved that no pointer to the local stack escapes. Reviewed by: efriedma Differential Revision: https://reviews.llvm.org/D82085	2020-07-11 14:01:48 +03:00
Roman Lebedev	4500db8c59	Revert "Reland "[InstCombine] Lower infinite combine loop detection thresholds""" And there's a new hit: https://bugs.llvm.org/show_bug.cgi?id=46680 This reverts commit `7103c87596`.	2020-07-11 13:53:24 +03:00
Johannes Doerfert	dce6bc18c4	[OpenMP][FIX] remove unused variable and long if-else chain MSVC throws an error if you use "too many" if-else in a row: `Frontend/OpenMP/OMPKinds.def(570): fatal error C1061: compiler limit: blocks nested too deeply` We work around it now...	2020-07-11 02:37:57 -05:00
Mehdi Amini	c44702bcdf	Remove unused variable `KMPC_KERNEL_PARALLEL_WORK_FN_PTR_ARG_NO` (NFC) This fixes a compiler warning.	2020-07-11 07:17:28 +00:00
Johannes Doerfert	5b0581aedc	[OpenMP] Replace function pointer uses in GPU state machine In non-SPMD mode we create a state machine like code to identify the parallel region the GPU worker threads should execute next. The identification uses the parallel region function pointer as that allows it to work even if the kernel (=target region) and the parallel region are in separate TUs. However, taking the address of a function comes with various downsides. With this patch we will identify the most common situation and replace the function pointer use with a dummy global symbol (for identification purposes only). That means, if the parallel region is only called from a single target region (or kernel), we do not use the function pointer of the parallel region to identify it but a new global symbol. Fixes PR46450. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83271	2020-07-11 01:44:00 -05:00
Johannes Doerfert	624d34afff	[OpenMP] Compute a proper module slice for the CGSCC pass The module slice describes which functions we can analyze and transform while working on an SCC as part of the CGSCC OpenMPOpt pass. So far, we simply restricted it to the SCC. In a follow up we will need to have a bigger scope which is why this patch introduces a proper identification of the module slice. In short, everything that has a transitive reference to a function in the SCC or is transitively referenced by one is fair game. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D83270	2020-07-11 01:44:00 -05:00
Johannes Doerfert	e8039ad4de	[OpenMP] Identify GPU kernels (aka. OpenMP target regions) We now identify GPU kernels, that is entry points into the GPU code. These kernels (can) correspond to OpenMP target regions. With this patch we identify and on request print them via remarks. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D83269	2020-07-11 01:44:00 -05:00
Johannes Doerfert	54bd3751ce	[OpenMP][NFC] Add convenient helper and early exit check	2020-07-11 00:51:51 -05:00
Johannes Doerfert	b726c55709	[OpenMP][NFC] Fix some typos	2020-07-11 00:51:51 -05:00
sstefan1	b8235d2bd8	Reland "[OpenMPOpt] ICV Tracking" This reverts commit `1d542f0ca8`. `recollectUses()` is added to prevent looking at dead uses after Attributor run. This is the first and most basic ICV Tracking implementation. For this first version, we only support deduplication within the same BB. Reviewers: jdoerfert, JonChesterfield, hamax97, jhuber6, uenoku, baziotis, lebedev.ri Differential Revision: https://reviews.llvm.org/D81788	2020-07-11 02:25:57 +02:00
Sidharth Baveja	e541e1b757	[NFC] Separate Peeling Properties into its own struct (re-land after minor fix) Summary: This patch separates the peeling specific parameters from the UnrollingPreferences, and creates a new struct called PeelingPreferences. Functions which used the UnrollingPreferences struct for peeling have been updated to use the PeelingPreferences struct. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel), anhtuyen (Anh Tuyen Tran), nikic (Nikita Popov) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-10 18:39:30 +00:00
Benjamin Kramer	b887da81cc	[CGProfile] Fix layering, IPO depends in Instrumentation.	2020-07-10 19:13:47 +02:00
Zequan Wu	1fbb719470	[LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013	2020-07-10 09:04:51 -07:00
Roman Lebedev	1d542f0ca8	Revert "[OpenMPOpt] ICV Tracking" There appears to be some kind of memory corruption/use-after-free/etc going on here. In particular, in `OpenMPOpt::deleteParallelRegions()`, in `DeleteCallCB()`, `CI` is garbage. Will post reproducer in the original review. This reverts commit `6c4a5e9257`.	2020-07-10 19:00:15 +03:00
Roman Lebedev	7103c87596	Reland "[InstCombine] Lower infinite combine loop detection thresholds" This relands commit `cd7f8051ac`, which was reverted because the lower thresholds successfully found an issue. Now that the issue is fixed, let's wait until the next one is reported. This reverts commit `caa423eef0`.	2020-07-10 17:49:16 +03:00
Roman Lebedev	2655a70a04	[InstCombine] After merging store into successor, queue prev. store to be visited (PR46661) We can have a situation with many stores eligible for the transform, but due to our visitation order (top to bottom), once we have processed the first eligible instruction, we would not try to reprocess the previous instructions that are now also eligible. So after we've successfully merged a store that was the second-to-last instruction into the successor, if the now-second-to-last instruction is also an eligible store, add it to the worklist to be revisited. Fixes https://bugs.llvm.org/show_bug.cgi?id=46661	2020-07-10 17:49:16 +03:00
Florian Hahn	264ab1e2c8	[LV] Pick vector loop body as insert point for SCEV expansion. Currently the DomTree is not kept up to date for additional blocks generated in the vector loop, for example when vectorizing with predication. SCEVExpander relies on dominance checks when looking for existing instructions to re-use and in some cases that can lead to the expander picking instructions that do not actually dominate their insert point (e.g. as in PR46525). Unfortunately keeping the DT up-to-date is a bit tricky, because the CFG is only patched up after generating code for a block. For now, we can just use the vector loop header, as this ensures the inserted instructions dominate all uses in the vector loop. There should be no noticeable impact on the generated code, as other passes should sink those instructions, if profitable. Fixes PR46525. Reviewers: Ayal, gilr, mkazantsev, dmgreen Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D83288	2020-07-10 10:37:12 +01:00
Diogo Sampaio	7bf168390f	[BDCE] SExt -> ZExt when no sign bits is used and instruction has multiple uses Summary: This allows converting any SExt to a ZExt when we know none of the extended bits are used, especially in cases where there are multiple uses of the value. Reviewers: dmgreen, eli.friedman, spatel, lebedev.ri, nikic Reviewed By: lebedev.ri, nikic Subscribers: hiraditya, dmgreen, craig.topper, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60413	2020-07-10 08:34:53 +01:00
SharmaRithik	e71c7b593a	[CodeMoverUtils] Move OrderedInstructions to CodeMoverUtils Summary: This patch moves OrderedInstructions to CodeMoverUtils, as it was the only place where OrderedInstructions is required. Authored By: RithikSharma Reviewer: Whitney, bmahjour, etiotto, fhahn, nikic Reviewed By: Whitney, nikic Subscribers: mgorny, hiraditya, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D80643	2020-07-10 11:22:43 +05:30
Chen Zheng	f1efb8bb4b	[SCEV][IndVarSimplify] insert point should not be block front. The block front may be a PHI node; inserting cast instructions like BitCast, PtrToInt, or IntToPtr among PHIs is not right. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D80975	2020-07-09 21:56:57 -04:00
Nikita Popov	c0308fd154	[PredicateInfo] Print RenamedOp (NFC) Make it easier to debug renaming issues.	2020-07-09 23:14:24 +02:00
Fangrui Song	c025bdf25a	Revert D83013 "[LPM] Port CGProfilePass from NPM to LPM" This reverts commit `c92a8c0a0f`. It breaks builds and has unaddressed review comments.	2020-07-09 13:34:04 -07:00
Roman Lebedev	caa423eef0	Revert "[InstCombine] Lower infinite combine loop detection thresholds" And just after 3 days, we have a hit in `InstCombiner::mergeStoreIntoSuccessor()`: https://bugs.llvm.org/show_bug.cgi?id=46661 To be recommitted once that is addressed. This reverts commit `cd7f8051ac`.	2020-07-09 23:10:42 +03:00
Zequan Wu	c92a8c0a0f	[LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013	2020-07-09 13:03:42 -07:00
dfukalov	167767a775	SpeculativeExecution: Fix for logic change introduced in D81730. Summary: The test case started to hoist bitcasts to the upper BB after D81730. This reverts the unintentional logic change. Some instructions may have zero cost but will not be hoisted because of a different limitation, so they should still be counted against the threshold. Reviewers: aprantl, arsenm, nhaehnle Reviewed By: aprantl Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82761	2020-07-09 15:45:23 +03:00
Florian Hahn	a86ce06faf	[SCCP] Use conditional info with AND/OR branch conditions. Currently SCCP does not combine the information of conditions joined by AND in the true branch or OR in the false branch. For branches on AND, 2 copies will be inserted for the true branch, with one being the operand of the other as in the code below. We can combine the information using intersection. Note that for the OR case, the copies are inserted in the false branch, where using intersection is safe as well. define void @foo(i32 %a) { entry: %lt = icmp ult i32 %a, 100 %gt = icmp ugt i32 %a, 20 %and = and i1 %lt, %gt ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %lt = icmp ult i32 %a, 100 Edge: [label %entry,label %true] } %a.0 = call i32 @llvm.ssa.copy.140247425954880(i32 %a) ; Has predicate info ; branch predicate info { TrueEdge: 1 Comparison: %gt = icmp ugt i32 %a, 20 Edge: [label %entry,label %false] } %a.1 = call i32 @llvm.ssa.copy.140247425954880(i32 %a.0) br i1 %and, label %true, label %false true: ; preds = %entry call void @use(i32 %a.1) %true.1 = icmp ne i32 %a.1, 20 call void @use.i1(i1 %true.1) ret void false: ; preds = %entry call void @use(i32 %a.1) ret void } Reviewers: efriedma, davide, mssimpso, nikic Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D77808	2020-07-09 12:59:24 +01:00
Benjamin Kramer	b44470547e	Make helpers static. NFC.	2020-07-09 13:48:56 +02:00
Jun Ma	f0bfad2ed9	[Coroutines] Refactor sinkLifetimeStartMarkers Differential Revision: https://reviews.llvm.org/D83379	2020-07-09 18:23:28 +08:00
Florian Hahn	b805e94477	[PredicateInfo] Add additional RenamedOp field to PB. OriginalOp of a predicate always refers to the original IR value that was renamed. So for nested predicates of the same value, it will always refer to the original IR value. For the use in SCCP however, we need to find the renamed value that is currently used in the condition associated with the predicate. This patch adds a new RenamedOp field to do exactly that. NewGVN currently relies on the existing behavior to merge instruction metadata. A test case to check for exactly that has been added in `195fa4bfae`. Reviewers: efriedma, davide, nikic Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D78133	2020-07-09 09:51:18 +01:00
Nikita Popov	0b39d2d752	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `0369dc98f9`. Many failing tests.	2020-07-08 21:43:32 +02:00
Gui Andrade	ff7900d5de	[LLVM] Accept `noundef` attribute in function definitions/calls The `noundef` attribute indicates an argument or return value which may never have an undef value representation. This patch allows LLVM to parse the attribute. Differential Revision: https://reviews.llvm.org/D83412	2020-07-08 19:02:04 +00:00
Sidharth Baveja	0369dc98f9	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:59:59 +00:00
Anh Tuyen Tran	6965af43e6	Revert "[NFC] Separate Peeling Properties into its own struct" This reverts commit `fead250b43`.	2020-07-08 18:58:05 +00:00
Anh Tuyen Tran	fead250b43	[NFC] Separate Peeling Properties into its own struct Summary: This patch makes the peeling properties of the loop accessible by other loop transformations. Author: sidbav (Sidharth Baveja) Reviewers: Whitney (Whitney Tsang), Meinersbur (Michael Kruse), skatkov (Serguei Katkov), ashlykov (Arkady Shlykov), bogner (Justin Bogner), hfinkel (Hal Finkel) Reviewed By: Meinersbur (Michael Kruse) Subscribers: fhahn (Florian Hahn), hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D80580	2020-07-08 18:56:03 +00:00
Christopher Tetreault	c444b1b904	[SVE] Remove calls to VectorType::getNumElements from Scalar Reviewers: efriedma, fhahn, reames, kmclaughlin, sdesmalen Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, psnobl, dantrushin, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D82243	2020-07-08 11:08:20 -07:00
Wei Mi	e32469a140	[SampleFDO] Enable sample-profile-top-down-load and sample-profile-merge-inlinee by default. sample-profile-top-down-load is an internal option which can enable top-down order of inlining and profile annotation in sample profile load pass. It was found to be beneficial for better profile annotation. Recently we found it could also solve some build time issues. Suppose function A has many callsites in function B. In the last release binary where the sample profile was collected, the outline copy of A is large because there are many other functions inlined into A. However, although all the callsites calling A in B are inlined, every inlined body is small (A was inlined into B before other functions were inlined into A), so there is no build time issue in the last release. In an optimized build using the sample profile collected from the last release, without top-down inlining, we saw a case where A got very large because of inlining, and then multiple callsites of A got inlined into B, which led to a huge B that caused a significant build time issue in addition to the profile annotation issue. To solve that problem, the patch enables the flag sample-profile-top-down-load by default. sample-profile-top-down-load can have better performance when it is enabled together with sample-profile-merge-inlinee, so in this patch we also enable sample-profile-merge-inlinee by default. Differential Revision: https://reviews.llvm.org/D82919	2020-07-08 09:23:18 -07:00
Nicolai Hähnle	3fa989d4fd	DomTree: remove explicit use of DomTreeNodeBase::iterator Summary: Almost all uses of these iterators, including implicit ones, really only need the const variant (as it should be). The only exception is in NewGVN, which changes the order of dominator tree child nodes. Change-Id: I4b5bd71e32d71b0c67b03d4927d93fe9413726d4 Reviewers: arsenm, RKSimon, mehdi_amini, courbet, rriddle, aartbik Subscribers: wdng, Prazek, hiraditya, kuhar, rogfer01, rriddle, jpienaar, shauheen, antiagainst, nicolasvasilache, arpith-jacob, mgester, lucyrfox, aartbik, liufengdb, stephenneuendorffer, Joonsoo, grosul1, vkmr, Kayjukh, jurahul, msifontes, cfe-commits, llvm-commits Tags: #clang, #mlir, #llvm Differential Revision: https://reviews.llvm.org/D83087	2020-07-08 18:18:49 +02:00
sstefan1	6aab27ba85	[OpenMPIRBuilder][Fix] Move llvm::omp::types to OpenMPIRBuilder. Summary: D82193 exposed a problem with global type definitions in `OMPConstants.h`. This causes a race when running in thinLTO mode. Types now live inside of OpenMPIRBuilder to prevent this from happening. Reviewers: jdoerfert Subscribers: yaxunl, hiraditya, guansong, dexonsmith, aaron.ballman, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D83176	2020-07-08 17:23:55 +02:00
Stanislav Mekhanoshin	64030099c3	SLP: honor requested max vector size merging PHIs At the moment this place does not check the maximum size set by TTI and just creates the largest possible vectors. Differential Revision: https://reviews.llvm.org/D82227	2020-07-08 08:06:15 -07:00
Florian Hahn	80970ac875	[DSE,MSSA] Eliminate stores by terminators (free,lifetime.end). This patch adds support for eliminating stores by free & lifetime.end calls. We can remove stores that are not read before calling a memory terminator and we can eliminate all stores after a memory terminator until we see a new lifetime.start. The second case seems to not really trigger much in practice though. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72410	2020-07-08 08:59:46 +01:00
Florian Hahn	04b85e2bcb	Revert "[SLP] Make sure instructions are ordered when computing spill cost." This seems to break http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/24371 This reverts commit `eb46137daa`.	2020-07-07 23:15:01 +01:00
Arthur Eubanks	2279380eab	[Inliner] Don't skip inlining alwaysinline in optnone functions Previously the NPM inliner would skip all potential inlines in an optnone function, but alwaysinline callees should be inlined regardless of optnone. Fixes inline-optnone.ll under NPM. Reviewed By: kazu Differential Revision: https://reviews.llvm.org/D83021	2020-07-07 12:54:55 -07:00
Nikita Popov	8691544a27	[SCCP] Use range metadata for loads and calls When all else fails, use range metadata to constrain the result of loads and calls. It should also be possible to use !nonnull, but that would require some general support for inequalities in SCCP first. Differential Revision: https://reviews.llvm.org/D83179	2020-07-07 21:09:21 +02:00
Nikita Popov	9dfea03517	[SCCP] Handle assume predicates Take assume predicates into account when visiting ssa.copy. The handling is the same as for branch predicates, with the difference that we're always on the true edge. Differential Revision: https://reviews.llvm.org/D83257	2020-07-07 20:22:52 +02:00
Hans Wennborg	7fc279ca3d	[GlobalOpt] Don't remove inalloca from musttail-called functions Otherwise the verifier complains about the mismatching function ABIs. Differential revision: https://reviews.llvm.org/D83300	2020-07-07 19:02:46 +02:00
SharmaRithik	082e395230	[CodeMoverUtils] Make specific analysis dependent checks optional Summary: This patch makes optional the code motion checks that depend on specific analyses, for example dominator tree, post dominator tree, and dependence info. The aim is to make the adoption of CodeMoverUtils easier for clients that don't use the analyses that were strictly required by CodeMoverUtils. This will also help in diversifying code motion checks using other analyses, for example MSSA. Authored By: RithikSharma Reviewer: Whitney, bmahjour, etiotto Reviewed By: Whitney Subscribers: Prazek, hiraditya, george.burgess.iv, asbirlea, llvm-commits Tag: LLVM Differential Revision: https://reviews.llvm.org/D82566	2020-07-07 20:11:07 +05:30
Roman Lebedev	16266e6396	[Scalarizer] When gathering scattered scalar, don't replace it with itself The (previously-crashing) test-case would cause us to seemingly-harmlessly replace some use with something else, but we can't replace it with itself, so we would crash.	2020-07-07 17:03:53 +03:00
Ayal Zaks	7bf299c8d8	[LV] Vectorize without versioning-for-unit-stride under -Os/-Oz If a loop is in a function marked OptSize, Loop Access Analysis should refrain from generating runtime checks for unit strides that will version the loop. If a loop is in a function marked OptSize and its vectorization is enabled, it should be vectorized w/o any versioning. Fixes PR46228. Differential Revision: https://reviews.llvm.org/D81345	2020-07-07 15:04:21 +03:00
Jordan Rupprecht	10c82eecbc	Revert "[LV] Enable the LoopVectorizer to create pointer inductions" This reverts commit `a8fe12065e`. It causes a crash when building gzip. Will post the detailed reduced test case to D81267.	2020-07-06 17:50:38 -07:00
Roman Lebedev	69dca6efc6	[NFCI][IR] Introduce CallBase::Create() wrapper Summary: It is reasonably common to want to clone some call with different bundles. Let's actually provide an interface to do that. Reviewers: chandlerc, jdoerfert, dblaikie, nickdesaulniers Reviewed By: nickdesaulniers Subscribers: llvm-commits, hiraditya Tags: #llvm Differential Revision: https://reviews.llvm.org/D83248	2020-07-07 01:16:36 +03:00
Roman Lebedev	db05f2e34a	[Scalarizer] Centralize instruction DCE As reported in https://reviews.llvm.org/D83101#2133062 the new visitInsertElementInst()/visitExtractElementInst() functionality is causing miscompiles (previously-crashing test added). This is due to how the Scalarizer infrastructure deals with DCE: it was not updated for, nor was it ready for, such scalar value forwarding. It always assumed that the moment we "scalarized" something, it can go away, and did so with prejudice. But that is no longer safe/okay to do. Instead, let's prevent it from ever shooting itself in the foot, and let's just accumulate the instructions-to-be-deleted in a vector, and collectively clean up those that are actually dead at the end. All existing tests are not reporting any new garbage leftovers, but maybe it's a test coverage issue.	2020-07-07 01:12:51 +03:00
Nicolai Hähnle	dfcc68c528	DomTree: Remove getRoots() accessor Summary: Avoid exposing details about how roots are stored. This enables subsequent type-erasure changes. v5: - cleanup a unit test by using EXPECT_EQ instead of EXPECT_TRUE Change-Id: I532b774cc71f2224e543bc7d79131d97f63f093d Reviewers: arsenm, RKSimon, mehdi_amini, courbet Subscribers: jvesely, wdng, hiraditya, kuhar, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D83085	2020-07-06 21:58:11 +02:00
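
The message of commit a86ce06faf ([SCCP] Use conditional info with AND/OR branch conditions) says that the information from two conditions joined by AND can be combined "using intersection" on the true branch. The following is a minimal Python sketch of that idea only, not the SCCP implementation: `URange`, `range_for_ult` and `range_for_ugt` are hypothetical names introduced here, and ranges are modelled as simple half-open unsigned intervals rather than LLVM's own range type.

```python
from dataclasses import dataclass

U32_LIMIT = 2**32  # assumption: i32 values treated as unsigned, as in the ult/ugt example


@dataclass(frozen=True)
class URange:
    """Hypothetical half-open unsigned interval [lo, hi)."""
    lo: int
    hi: int

    def intersect(self, other: "URange") -> "URange":
        # Keep only the values allowed by both constraints.
        lo = max(self.lo, other.lo)
        hi = min(self.hi, other.hi)
        return URange(lo, max(lo, hi))  # an empty range collapses to [lo, lo)

    def contains(self, value: int) -> bool:
        return self.lo <= value < self.hi


def range_for_ult(bound: int) -> URange:
    """Values satisfying `icmp ult %a, bound`."""
    return URange(0, bound)


def range_for_ugt(bound: int) -> URange:
    """Values satisfying `icmp ugt %a, bound`."""
    return URange(bound + 1, U32_LIMIT)


# On the true edge of `br (and %lt, %gt)`, both comparisons hold,
# so the range of %a is the intersection of the two ranges.
a_on_true_edge = range_for_ult(100).intersect(range_for_ugt(20))

# `icmp ne %a.1, 20` is known to be true when 20 lies outside the range.
ne_20_is_known_true = not a_on_true_edge.contains(20)
```

With the bounds from the commit message (`%a ult 100` and `%a ugt 20`), the intersection is the interval from 21 up to, but not including, 100. Because 20 is outside it, the `icmp ne i32 %a.1, 20` in the `true` block of the example can be treated as known true under this model.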

24944 Commits