llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Lebedev	1d51dc38d8	[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline I've been looking at missed vectorizations in one codebase. One particular thing that stands out is that some of the loops reach vectorizer in a rather mangled form, with weird PHI's, and some of the loops aren't even in a rotated form. After taking a more detailed look, that happened because the loop's headers were too big by then. It is evident that SimplifyCFG's common code hoisting transform is at fault there, because the pattern it handles is precisely the unrotated loop basic block structure. Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled by default, and is always run, unlike it's friend, common code sinking transform, `SinkCommonCodeFromPredecessors()`, which is not enabled by default and is only run once very late in the pipeline. I'm proposing to harmonize this, and disable common code hoisting until //late// in pipeline. Definition of //late// may vary, here currently i've picked the same one as for code sinking, but i suppose we could enable it as soon as right after loop rotation happens. Experimentation shows that this does indeed unsurprizingly help, more loops got rotated, although other issues remain elsewhere. Now, this undoubtedly seriously shakes phase ordering. This will undoubtedly be a mixed bag in terms of both compile- and run- time performance, codesize. Since we no longer aggressively hoist+deduplicate common code, we don't pay the price of said hoisting (which wasn't big). That may allow more loops to be rotated, so we pay that price. That, in turn, that may enable all the transforms that require canonical (rotated) loop form, including but not limited to vectorization, so we pay that too. And in general, no deduplication means more [duplicate] instructions going through the optimizations. But there's still late hoisting, some of them will be caught late. As per benchmarks i've run {F12360204}, this is mostly within the noise, there are some small improvements, some small regressions. One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure this will expose many more pre-existing missed optimizations, as usual :S llvm-compile-time-tracker.com thoughts on this: http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions * this does regress compile-time by +0.5% geomean (unsurprizingly) * size impact varies; for ThinLTO it's actually an improvement The largest fallout appears to be in GVN's load partial redundancy elimination, it spends much more time in `MemoryDependenceResults::getNonLocalPointerDependency()`. Non-local `MemoryDependenceResults` is widely-known to be, uh, costly. There does not appear to be a proper solution to this issue, other than silencing the compile-time performance regression by tuning cut-off thresholds in `MemoryDependenceResults`, at the cost of potentially regressing run-time performance. D84609 attempts to move in that direction, but the path is unclear and is going to take some time. If we look at stats before/after diffs, some excerpts: * RawSpeed (the target) {F12360200} * -14 (-73.68%) loops not rotated due to the header size (yay) * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer * -3937 (-64.19%) common instructions hoisted * +561 (+0.06%) x86 asm instructions * -2 basic blocks * +2418 (+0.11%) IR instructions * vanilla test-suite + RawSpeed + darktable {F12360201} * -36396 (-65.29%) common instructions hoisted * +1676 (+0.02%) x86 asm instructions * +662 (+0.06%) basic blocks * +4395 (+0.04%) IR instructions It is likely to be sub-optimal for when optimizing for code size, so one might want to change tune pipeline by enabling sinking/hoisting when optimizing for size. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D84108	2020-07-29 20:05:30 +03:00
Sjoerd Meijer	5567c62afa	[Matrix] Add LowerMatrixIntrinsics to the NPM Pass LowerMatrixIntrinsics wasn't running yet running under the new pass manager, and this adds LowerMatrixIntrinsics to the pipeline (to the same place as where it is running in the old PM). Differential Revision: https://reviews.llvm.org/D84180	2020-07-22 09:47:53 +01:00
Florian Hahn	dc1087d408	[Matrix] Add minimal lowering pass that only requires TTI. This patch adds a new variant of the matrix lowering pass that only does a minimal lowering and only depends on TTI. The main purpose of this pass is to have a pass with minimal dependencies to run as part of the backend pipeline. At the moment, the only difference to the regular lowering pass is that it does not support remarks. But in subsequent patches add support for tiling to the lowering pass which will require more analysis, which we do not want to run in the backend, as the lowering should happen in the middle-end in practice and running it in the backend is mostly for convenience when running llc. Reviewers: anemet, Gerolf, efriedma, hfinkel Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D76867	2020-07-20 11:16:11 +01:00
Florian Hahn	31d71c69f1	[Matrix] Only run matrix lowering early with -O0. Currently matrix lowering is run twice if OptLevel > 0. Fix that and also add a test for OptLevel > 0 with matrix lowering enabled.	2020-07-17 15:53:16 +01:00
Roman Lebedev	fb432a51f4	Reland "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `1067d3e176`, which reverted commit `b2018198c3`, because it introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. So let's just move SimplifyCFGOptions.h into Utils/, thus avoiding the cycle.	2020-07-16 13:40:01 +03:00
Florian Hahn	cbe0e539e7	[Matrix] Also run lowering during -O0. Currently the backends cannot lower the matrix intrinsics directly and rely on the lowering to vector instructions happening in the middle-end. At the moment, this means the backend crashes when matrix types extension code is compiled with -O0, e.g. http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-aarch64-O0-g/7902/ This patch enables also runs the lowering with -O0 in the middle-end as a temporary solution. Long term, a lightweight version of the lowering should run in the backend, on demand.	2020-07-16 10:51:31 +01:00
Adrian Kuegel	1067d3e176	Revert "[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions" This reverts commit `b2018198c3`. This commit introduced a Dependency Cycle between Transforms/Scalar and Transforms/Utils. Transforms/Scalar already depends on Transforms/Utils, so if SimplifyCFGOptions.h is moved to Scalar, and Utils/Local.h still depends on it, we have a cycle.	2020-07-16 10:54:10 +02:00
Roman Lebedev	b2018198c3	[NFCI] createCFGSimplificationPass(): migrate to also take SimplifyCFGOptions Taking so many parameters is simply unmaintainable. We don't want to include the entire llvm/Transforms/Utils/Local.h into llvm/Transforms/Scalar.h so i've split SimplifyCFGOptions into it's own header.	2020-07-16 01:27:54 +03:00
Teresa Johnson	6014c46c80	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `80d0a137a5`, and the follow on fix in `873c0d0786`, with a new fix for test failures after a 2-stage clang bootstrap, and a more robust fix for the Chromium build failure that an earlier version partially fixed. See also discussion on D75201. Reviewers: evgeny777 Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73242	2020-07-14 12:16:57 -07:00
Zequan Wu	1fbb719470	[LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013	2020-07-10 09:04:51 -07:00
Fangrui Song	c025bdf25a	Revert D83013 "[LPM] Port CGProfilePass from NPM to LPM" This reverts commit `c92a8c0a0f`. It breaks builds and has unaddressed review comments.	2020-07-09 13:34:04 -07:00
Zequan Wu	c92a8c0a0f	[LPM] Port CGProfilePass from NPM to LPM Reviewers: hans, chandlerc!, asbirlea, nikic Reviewed By: hans, nikic Subscribers: steven_wu, dexonsmith, nikic, echristo, void, zhizhouy, cfe-commits, aeubanks, MaskRay, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits Tags: #llvm, #clang Differential Revision: https://reviews.llvm.org/D83013	2020-07-09 13:03:42 -07:00
Sanjay Patel	098e48a6a1	[PassManager] restore early-cse to vector cleanup As noted in D80236 - the early-cse pass was included here before: D75145 / rG71a316883d50 But it got moved outside of the "extra" option there, then it got dropped while adjusting -vector-combine: rG6438ea45e053 rG57bb4787d72f So this is restoring the behavior and adding a test to prevent accidental changes again. I don't see an equivalent option for the new pass manager.	2020-06-14 10:04:53 -04:00
AK	96458fc510	Add cl::ZeroOrMore to get around build system issues It is quite common to get multiple instances of optimization flags while building. The following optimizations does not have cl::ZeroOrMore which causes errors during the build. Reviewers: alexbdv,spop Differential Revision: https://reviews.llvm.org/D81187	2020-06-07 10:15:18 -07:00
Sanjay Patel	57bb4787d7	[Pass Manager] remove EarlyCSE as clean-up for VectorCombine EarlyCSE was added with D75145, but the motivating test is not regressed by removing the extra pass now. That might be because VectorCombine altered the way it processes instructions, or it might be from (re)moving VectorCombine in the pipeline. The extra round of EarlyCSE appears to cost approximately 0.26% in compile-time as discussed in D80236, so we need some evidence to justify its inclusion here, but we do not have that (yet). I suspect that between SLP and VectorCombine, we are creating patterns that InstCombine and/or codegen are not prepared for, but we will need to reduce those examples and include them as PhaseOrdering and/or test-suite benchmarks.	2020-05-24 12:36:21 -04:00
Sanjay Patel	6438ea45e0	[VectorCombine] position pass after SLP in the optimization pipeline rather than before There are 2 known problem patterns shown in the test diffs here: vector horizontal ops (an x86 specialization) and vector reductions. SLP has greater ability to match and fold those than vector-combine, so let SLP have first chance at that. This is a quick fix while we continue to improve vector-combine and possibly canonicalize to reduction intrinsics. In the longer term, we should improve matching of these patterns because if they were created in the "bad" forms shown here, then we would miss optimizing them. I'm not sure what is happening with alias analysis on the addsub test. The old pass manager now shows an extra line for that, and we see an improvement that comes from SLP vectorizing a store. I don't know what's missing with the new pass manager to make that happen. Strangely, I can't reproduce the behavior if I compile from C++ with clang and invoke the new PM with "-fexperimental-new-pass-manager". Differential Revision: https://reviews.llvm.org/D80236	2020-05-22 12:22:44 -04:00
OCHyams	da100de0a6	[NFC][DwarfDebug] Add test for variables with a single location which don't span their entire scope. The previous commit (`6d1c40c171`) is an older version of the test. Reviewed By: aprantl, vsk Differential Revision: https://reviews.llvm.org/D79573	2020-05-11 11:49:11 +02:00
Johannes Doerfert	c5794f77eb	[Attributor][PM] Introduce `-attributor-enable={none,cgscc,module,all}` The old command line option `-attributor-disable` was too coarse grained as we want to measure the effects of the module or cgscc pass without the other as well. Since `none` is the default there is no real functional change. Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D78571	2020-04-21 15:22:10 -05:00
Teresa Johnson	33ffb62e23	Allow disabling of vectorization using internal options Summary: Currently, the internal options -vectorize-loops, -vectorize-slp, and -interleave-loops do not have much practical effect. This is because they are used to initialize the corresponding flags in the pass managers, and those flags are then unconditionally overwritten when compiling via clang or via LTO from the linkers. The only exception was -vectorize-loops via opt because of some special hackery there. While vectorization could still be disabled when compiling via clang, using -fno-[slp-]vectorize, this meant that there was no way to disable it when compiling in LTO mode via the linkers. This only affected ThinLTO, since for regular LTO vectorization is done during the compile step for scalability reasons. For ThinLTO it is invoked in the LTO backends. See also the discussion on PR45434. This patch makes it so the internal options can actually be used to disable these optimizations. Ultimately, the best long term solution is to mark the loops with metadata (similar to the approach used to fix -fno-unroll-loops in D77058), but this enables a shorter term workaround, and actually makes these internal options useful. I constant propagated the initial values of these internal flags into the pass manager flags (for some reasons vectorize-loops and interleave-loops were initialized to true, while vectorize-slp was initialized to false). As mentioned above, they are overwritten unconditionally so this doesn't have any real impact, and these initial values aren't particularly meaningful. I then changed the passes to check the internl values and return without performing the associated optimization when false (I changed the default of -vectorize-slp to true so the options behave similarly). I was able to remove the hackery in opt used to get -vectorize-loops=false to work, as well as a special option there used to disable SLP vectorization. Finally, I changed thinlto-slp-vectorize-pm.c to: a) Only test SLP (moved the loop vectorization checking to a new test). b) Use code that is slp vectorized when it is enabled, and check that instead of whether the pass is enabled. c) Test the new behavior of -vectorize-slp. d) Test both pass managers. The loop vectorization (and associated interleaving) testing I moved to a new thinlto-loop-vectorize-pm.c test, with several changes: a) Changed the flags on the interleaving testing so that it will actually interleave, and check that. b) Test the new behavior of -vectorize-loops and -interleave-loops. c) Test both pass managers. Reviewers: fhahn, wmi Subscribers: hiraditya, steven_wu, dexonsmith, cfe-commits, davezarzycki, llvm-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D77989	2020-04-14 18:09:10 -07:00
Tarindu Jayatilaka	b43b59fcc0	Expose `attributor-disable` to the new and old pass managers The new and old pass managers (PassManagerBuilder.cpp and PassBuilder.cpp) are exposed to an `extern` declaration of `attributor-disable` option which will guard the addition of the attributor passes to the pass pipelines. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D76871	2020-04-05 22:29:34 -05:00
Nikita Popov	dc81923659	[InstCombine] Remove ExpensiveCombines option D75801 removed the last and only user of this option, so we can drop it now. The original idea behind this was to only run expensive transforms under -O3, but apart from the one known bits transform, this has never really taken off. I believe nowadays the recommendation is to put expensive transforms in AggressiveInstCombine instead, though that isn't terribly popular either :) Differential Revision: https://reviews.llvm.org/D76540	2020-03-22 16:56:28 +01:00
Sanjay Patel	71a316883d	[PassManager] adjust VectorCombine placement The initial placement of vector-combine in the opt pipeline revealed phase ordering bugs: https://bugs.llvm.org/show_bug.cgi?id=45015 https://bugs.llvm.org/show_bug.cgi?id=42022 This patch contains a few independent changes: 1. Move the pass up in the pipeline, so it happens just after loop-vectorization. This is only to keep vectorization passes together in the pipeline at the moment. I don't have evidence of interaction between these yet. 2. Add an -early-cse pass after -vector-combine to clean up redundant ops. This was partly proposed as far back as rL219644 (which is why it's effectively being moved in the old PM code). This is important because the subsequent -instcombine doesn't work as well without EarlyCSE. With the CSE, -instcombine is able to squash shuffles together in 1 of the tests (because those are simple "select" shuffles). 3. Remove the -vector-combine pass that was running after SLP. We may want to do that eventually, but I don't have a test case to support it yet. Differential Revision: https://reviews.llvm.org/D75145	2020-03-04 11:10:49 -05:00
Teresa Johnson	80bf137fa1	Revert "Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP"" This reverts commit `80d0a137a5`, and the follow on fix in `873c0d0786`. It is causing test failures after a multi-stage clang bootstrap. See discussion on D73242 and D75201.	2020-03-02 14:02:13 -08:00
Ayke van Laethem	2a7a989c3e	[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder. Differential Revision: https://reviews.llvm.org/D51642	2020-02-24 20:15:51 +01:00
Teresa Johnson	80d0a137a5	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `748bb5a0f1`, along with a fix for a Chromium test suite build issue (and a new test for that case). Differential Revision: https://reviews.llvm.org/D73242	2020-02-11 10:48:05 -08:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00
Johannes Doerfert	b0c77c36d2	[Attributor] Add an Attributor CGSCC pass and run it In addition to the module pass, this patch introduces a CGSCC pass that runs the Attributor on a strongly connected component of the call graph (both old and new PM). The Attributor was always design to be used on a subset of functions which makes this patch mostly mechanical. The one change is that we give up `norecurse` deduction in the module pass in favor of doing it during the CGSCC pass. This makes the interfaces simpler but can be revisited if needed. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D70767	2020-02-08 21:27:34 -06:00
Johannes Doerfert	9548b74a83	[OpenMP] Introduce the OpenMPOpt transformation pass The OpenMPOpt pass is a CGSCC pass in which OpenMP specific optimizations can reside. The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime calls and their uses. This allows targeted transformations and eases their implementation. This initial patch deduplicates `__kmpc_global_thread_num` and `omp_get_thread_num` calls. We can also identify arguments that are equivalent to such a call result and use it instead. Later we can determine "gtid" arguments based on the use in kernel functions etc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69930	2020-02-08 14:47:03 -06:00
Teresa Johnson	25aa2eef99	Revert "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This reverts commit `748bb5a0f1`. Due to Chromium CFI+ThinLTO test crashes reported on patch.	2020-02-05 19:27:32 -08:00
Teresa Johnson	748bb5a0f1	[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP Summary: Currently type test assume sequences inserted for devirtualization are removed during WPD. This patch delays their removal until later in the optimization pipeline. This is an enabler for upcoming enhancements to indirect call promotion, for example streamlined promotion guard sequences that compare against vtable address instead of the target function, when there are small number of possible vtables (either determined via WPD or by in-progress type profiling). We need the type tests to correlate the callsites with the address point offset needed in the compare sequence, and optionally to associated type summary info computed during WPD. This depends on work in D71913 to enable invocation of LowerTypeTests to drop type test assume sequences, which will now be invoked following ICP in the ThinLTO post-LTO link pipelines, and also after the existing export phase LowerTypeTests invocation in regular LTO (which is already after ICP). We cannot simply move the existing import phase LowerTypeTests pass later in the ThinLTO post link pipelines, as the comment in PassBuilder.cpp notes (it must run early because when performing CFI other passes may disturb the sequences it looks for). This necessitated adding a new type test resolution "Unknown" that we can use on the type test assume sequences previously removed by WPD, that we now want LTT to ignore. Depends on D71913. Reviewers: pcc, evgeny777 Subscribers: mehdi_amini, Prazek, hiraditya, steven_wu, dexonsmith, arphaman, davidxl, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D73242	2020-02-05 08:59:48 -08:00
Elia Geretto	ab2300bc15	[PassManagerBuilder] Remove global extension when a plugin is unloaded This commit fixes PR39321. GlobalExtensions is not guaranteed to be destroyed when optimizer plugins are unloaded. If it is indeed destroyed after a plugin is dlclose-d, the destructor of the corresponding ExtensionFn is not mapped anymore, causing a call to unmapped memory during destruction. This commit guarantees that extensions coming from external plugins are removed from GlobalExtensions when the plugin is unloaded if GlobalExtensions has not been destroyed yet. Differential Revision: https://reviews.llvm.org/D71959	2020-01-29 16:15:45 +00:00
Florian Hahn	526244b187	[Matrix] Add first set of matrix intrinsics and initial lowering pass. This is the first patch adding an initial set of matrix intrinsics and a corresponding lowering pass. This has been discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html The first patch introduces four new intrinsics (transpose, multiply, columnwise load and store) and a LowerMatrixIntrinsics pass, that lowers those intrinsics to vector operations. Matrixes are embedded in a 'flat' vector (e.g. a 4 x 4 float matrix embedded in a <16 x float> vector) and the intrinsics take the dimension information as parameters. Those parameters need to be ConstantInt. For the memory layout, we initially assume column-major, but in the RFC we also described how to extend the intrinsics to support row-major as well. For the initial lowering, we split the input of the intrinsics into a set of column vectors, transform those column vectors and concatenate the result columns to a flat result vector. This allows us to lower the intrinsics without any shape propagation, as mentioned in the RFC. In follow-up patches, we plan to submit the following improvements: * Shape propagation to eliminate the embedding/splitting for each intrinsic. * Fused & tiled lowering of multiply and other operations. * Optimization remarks highlighting matrix expressions and costs. * Generate loops for operations on large matrixes. * More general block processing for operation on large vectors, exploiting shape information. We would like to add dedicated transpose, columnwise load and store intrinsics, even though they are not strictly necessary. For example, we could instead emit a large shufflevector instruction instead of the transpose. But we expect that to (1) become unwieldy for larger matrixes (even for 16x16 matrixes, the resulting shufflevector masks would be huge), (2) risk instcombine making small changes, causing us to fail to detect the transpose, preventing better lowerings For the load/store, we are additionally planning on exploiting the intrinsics for better alias analysis. Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor, efriedma, rengolin Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70456	2019-12-12 15:42:18 +00:00
Dávid Bolvanský	40963b2bf0	Revert "[Attributor] Move pass after InstCombine to futher eliminate null pointer checks" This reverts commit `7ca7d62c6e`. Commited accidentally.	2019-11-27 22:45:47 +01:00
Dávid Bolvanský	7ca7d62c6e	[Attributor] Move pass after InstCombine to futher eliminate null pointer checks Summary: PR44149 Reviewers: jdoerfert Subscribers: mehdi_amini, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70737	2019-11-27 22:36:51 +01:00
Eric Christopher	fd39b1bb20	Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there."" This reapplies: `8ff85ed905` Original commit message: As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410 This reverts commit `c9ddb02659`.	2019-11-26 20:28:52 -08:00
Muhammad Omair Javaid	c9ddb02659	Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there." This reverts commit `8ff85ed905`. This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu Following tests were failing: lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410	2019-11-26 09:32:13 +05:00
Eric Christopher	8ff85ed905	As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there. This change doesn't include any change to move from selection dag to fast isel and that will come with other numbers that should help inform that decision. There also haven't been any real debuggability studies with this pipeline yet, this is just the initial start done so that people could see it and we could start tweaking after. Test updates: Outside of the newpm tests most of the updates are coming from either optimization passes not run anymore (and without a compelling argument at the moment) that were largely used for canonicalization in clang. Original post: http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html Tags: #llvm Differential Revision: https://reviews.llvm.org/D65410	2019-11-25 17:16:46 -08:00
Joerg Sonnenberger	9681ea9560	Reapply r374743 with a fix for the ocaml binding Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374784	2019-10-14 16:15:14 +00:00
Dmitri Gribenko	1a21f98ac3	Revert "Add a pass to lower is.constant and objectsize intrinsics" This reverts commit r374743. It broke the build with Ocaml enabled: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218 llvm-svn: 374768	2019-10-14 12:22:48 +00:00
Joerg Sonnenberger	e4300c392d	Add a pass to lower is.constant and objectsize intrinsics This pass lowers is.constant and objectsize intrinsics not simplified by earlier constant folding, i.e. if the object given is not constant or if not using the optimized pass chain. The result is recursively simplified and constant conditionals are pruned, so that dead blocks are removed even for -O0. This allows inline asm blocks with operand constraints to work all the time. The new pass replaces the existing lowering in the codegen-prepare pass and fallbacks in SDAG/GlobalISEL and FastISel. The latter now assert on the intrinsics. Differential Revision: https://reviews.llvm.org/D65280 llvm-svn: 374743	2019-10-13 23:00:15 +00:00
Dmitri Gribenko	2bf8d77453	Revert "Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."" This reverts commit r371502, it broke tests (clang/test/CodeGenCXX/auto-var-init.cpp). llvm-svn: 371507	2019-09-10 10:39:09 +00:00
Clement Courbet	612c260ec3	Reland "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." With a fix for sanitizer breakage (see explanation in D60318). llvm-svn: 371502	2019-09-10 09:18:00 +00:00
Serge Guelton	4137aeb4bf	Provide basic Full LTO extension points Differential Revision: https://reviews.llvm.org/D61738 llvm-svn: 364937	2019-07-02 15:52:39 +00:00
Clement Courbet	2851248fa1	Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline." Breaks sanitizers: libFuzzer :: cxxstring.test libFuzzer :: memcmp.test libFuzzer :: recommended-dictionary.test libFuzzer :: strcmp.test libFuzzer :: value-profile-mem.test libFuzzer :: value-profile-strcmp.test llvm-svn: 364416	2019-06-26 12:13:13 +00:00
Clement Courbet	7b3a5f0e6d	[ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline. This allows later passes (in particular InstCombine) to optimize more cases. One that's important to us is `memcmp(p, q, constant) < 0` and memcmp(p, q, constant) > 0. llvm-svn: 364412	2019-06-26 11:50:18 +00:00
Johannes Doerfert	aade782a98	[Attributor] Pass infrastructure and fixpoint framework NOTE: Note that no attributes are derived yet. This patch will not go in alone but only with others that derive attributes. The framework is split for review purposes. This commit introduces the Attributor pass infrastructure and fixpoint iteration framework. Further patches will introduce abstract attributes into this framework. In a nutshell, the Attributor will update instances of abstract arguments until a fixpoint, or a "timeout", is reached. Communication between the Attributor and the abstract attributes that are derived is restricted to the AbstractState and AbstractAttribute interfaces. Please see the file comment in Attributor.h for detailed information including design decisions and typical use case. Also consider the class documentation for Attributor, AbstractState, and AbstractAttribute. Reviewers: chandlerc, homerdin, hfinkel, fedor.sergeev, sanjoy, spatel, nlopes, nicholas, reames Subscribers: mehdi_amini, mgorny, hiraditya, bollu, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59918 llvm-svn: 362578	2019-06-05 03:02:24 +00:00
Alina Sbirlea	d82ddfa7c3	[NewPassManager] Add tuning option: ForgetAllSCEVInLoopUnroll [NFC]. Summary: Mirror tuning option from old pass manager in new pass manager. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, zzheng, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61612 llvm-svn: 361560	2019-05-23 21:52:59 +00:00
Alina Sbirlea	458c7339e1	[NewPassManager] Add tuning option: SLPVectorization [NFC]. Summary: Mirror tuning option from old pass manager in new pass manager. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61616 llvm-svn: 360276	2019-05-08 17:58:35 +00:00
Alina Sbirlea	4e1ac95cf5	[PassManagerBuilder] Add option for interleaved loops, for loop vectorize. Summary: Match NewPassManager behavior: add option for interleaved loops in the old pass manager, and use that instead of the flag used to disable loop unroll. No changes in the defaults. Reviewers: chandlerc Subscribers: mehdi_amini, jlebar, dmgreen, hsaito, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61030 llvm-svn: 359615	2019-04-30 21:29:20 +00:00
Eric Christopher	dfebd84eb3	Remove the EnableEarlyCSEMemSSA set of options from the legacy and new pass managers. They were default to true and not being used. Differential Revision: https://reviews.llvm.org/D60747 llvm-svn: 358789	2019-04-19 22:18:53 +00:00
Alina Sbirlea	43709f7233	[LICM & MemorySSA] Make limit flags pass tuning options. Summary: Make the flags in LICM + MemorySSA tuning options in the old and new pass managers. Subscribers: mehdi_amini, jlebar, Prazek, george.burgess.iv, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60490 llvm-svn: 358772	2019-04-19 17:46:50 +00:00
Alina Sbirlea	0499a2f961	[NewPassManager] Adding pass tuning options: loop vectorize. Summary: Trying to add the plumbing necessary to add tuning options to the new pass manager. Testing with the flags for loop vectorize. Reviewers: chandlerc Subscribers: sanjoy, mehdi_amini, jlebar, steven_wu, dexonsmith, dang, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59723 llvm-svn: 358763	2019-04-19 16:11:59 +00:00
Eric Christopher	eff3b6fe7f	Elaborate why we have an option on by default for enabling chr. llvm-svn: 358641	2019-04-18 06:17:40 +00:00
Eric Christopher	0ebbf72a63	Remove the run-slp-after-loop-vectorization option. It's been on by default for 4 years and cleans up the pass hierarchy. llvm-svn: 358548	2019-04-17 02:26:27 +00:00
Alina Sbirlea	2312a06c87	[SCEV] Add option to forget everything in SCEV. Summary: Create a method to forget everything in SCEV. Add a cl::opt and PassManagerBuilder option to use this in LoopUnroll. Motivation: Certain Halide applications spend a very long time compiling in forgetLoop, and prefer to forget everything and rebuild SCEV from scratch. Sample difference in compile time reduction: 21.04 to 14.78 using current ToT release build. Testcase showcasing this cannot be opensourced and is fairly large. The option disabled by default, but it may be desirable to enable by default. Evidence in favor (two difference runs on different days/ToT state): File Before (s) After (s) clang-9.bc 7267.91 6639.14 llvm-as.bc 194.12 194.12 llvm-dis.bc 62.50 62.50 opt.bc 1855.85 1857.53 File Before (s) After (s) clang-9.bc 8588.70 7812.83 llvm-as.bc 196.20 194.78 llvm-dis.bc 61.55 61.97 opt.bc 1739.78 1886.26 Reviewers: sanjoy Subscribers: mehdi_amini, jlebar, zzheng, javed.absar, dmgreen, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60144 llvm-svn: 358304	2019-04-12 19:16:07 +00:00
Robert Lougher	f2158a8ef0	Resubmit r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines" Failing LLD tests have been fixed in r356593. llvm-svn: 356594	2019-03-20 19:08:18 +00:00
Robert Lougher	c67a759c99	Revert r356511 "[TailCallElim] Add tailcall elimination pass to LTO pipelines" Due to buildbot failures (LLD tests). llvm-svn: 356516	2019-03-19 20:54:20 +00:00
Robert Lougher	de548ccab9	[TailCallElim] Add tailcall elimination pass to LTO pipelines LTO provides additional opportunities for tailcall elimination due to link-time inlining and visibility of nocapture attribute. Testing showed negligible impact on compilation times. Differential Revision: https://reviews.llvm.org/D58391 llvm-svn: 356511	2019-03-19 20:24:28 +00:00
Rong Xu	db29a3a438	[PGO] Context sensitive PGO (part 3) Part 3 of CSPGO changes (mostly related to PassMananger). Differential Revision: https://reviews.llvm.org/D54175 llvm-svn: 355330	2019-03-04 20:21:27 +00:00
Manman Ren	1829512dd3	Add a module pass for order file instrumentation The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. We use a 8-byte hash to log the function symbol name. In this pass, we add three global variables: (1) an order file buffer: a circular buffer at its own llvm section. (2) a bitmap for each module: one byte for each function to say if the function is already executed. (3) a global index to the order file buffer. At the function prologue, if the function has not been executed (by checking the bitmap), log the function hash, then atomically increase the index. Differential Revision: https://reviews.llvm.org/D57463 llvm-svn: 355133	2019-02-28 20:13:38 +00:00
Vedant Kumar	47a0c9b69c	[HotColdSplit] Schedule splitting late to fix perf regression With or without PGO data applied, splitting early in the pipeline (either before the inliner or shortly after it) regresses performance across SPEC variants. The cause appears to be that splitting hides context for subsequent optimizations. Schedule splitting late again, in effect reversing r352080, which scheduled the splitting pass early for code size benefits (documented in https://reviews.llvm.org/D57082). Differential Revision: https://reviews.llvm.org/D58258 llvm-svn: 354158	2019-02-15 18:46:44 +00:00
Teresa Johnson	716abbeb43	[HotColdSplit] Move splitting after instrumented PGO use Summary: Follow up to D57082 which moved splitting earlier in the pipeline, in order to perform it before inlining. However, it was moved too early, before the IR is annotated with instrumented PGO data. This caused the splitting to incorrectly determine cold functions. Move it to just after PGO annotation (still before inlining), in both pass managers. Reviewers: vsk, hiraditya, sebpop Subscribers: mehdi_amini, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57805 llvm-svn: 353270	2019-02-06 04:29:39 +00:00
Vedant Kumar	ef1ebed1c6	[HotColdSplit] Move splitting earlier in the pipeline Performing splitting early has several advantages: - Inhibiting inlining of cold code early improves code size. Compared to scheduling splitting at the end of the pipeline, this cuts code size growth in half within the iOS shared cache (0.69% to 0.34%). - Inhibiting inlining of cold code improves compile time. There's no need to inline split cold functions, or to inline as much within those split functions as they are marked `minsize`. - During LTO, extra work is only done in the pre-link step. Less code must be inlined during cross-module inlining. An additional motivation here is that the most common cold regions identified by the static/conservative splitting heuristic can (a) be found before inlining and (b) do not grow after inlining. E.g. __assert_fail, os_log_error. The disadvantages are: - Some opportunities for splitting out cold code may be missed. This gap can potentially be narrowed by adding a worklist algorithm to the splitting pass. - Some opportunities to reduce code size may be lost (e.g. store sinking, when one side of the CFG diamond is split). This does not outweigh the code size benefits of splitting earlier. On net, splitting early in the pipeline has substantial code size benefits, and no major effects on memory locality or performance. We measured memory locality using ktrace data, and consistently found that 10% fewer pages were needed to capture 95% of text page faults in key iOS benchmarks. We measured performance on frequency-stabilized iOS devices using LNT+externals. This reverses course on the decision made to schedule splitting late in r344869 (D53437). Differential Revision: https://reviews.llvm.org/D57082 llvm-svn: 352080	2019-01-24 18:55:49 +00:00
Chandler Carruth	2946cd7010	Update the file headers across all of the LLVM projects in the monorepo to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636	2019-01-19 08:50:56 +00:00
Wei Mi	3bcccdfe38	[SampleFDO] Skip profile reading when flattened profile used in ThinLTO postlink If the sample profile has no inlining hierachy information included, we call the sample profile is flattened. For flattened profile, in ThinLTO postlink phase, SampleProfileLoader's hot function inlining and profile annotation will do nothing, so it is better to save the effort to read in the profile and run the sample profile loader pass. It is helpful for reducing compile time when the flattened profile is huge. Differential Revision: https://reviews.llvm.org/D54819 llvm-svn: 351476	2019-01-17 20:48:34 +00:00
Wei Mi	79c4408aa2	Fix a mistake in rL351392. PGOInstrGen should be initialized to "" instead of false. llvm-svn: 351397	2019-01-16 23:31:40 +00:00
Wei Mi	c876e3d42b	[PGO] Make pgo related options in opt more consistent. Currently we have pgo options defined in PassManagerBuilder.cpp only for instrument pgo, but not for sample pgo. We also have pgo options defined in NewPMDriver.cpp in opt only for new pass manager and for all kinds of pgo. They have some inconsistency. To make the options more consistent and make tests writing easier, the patch let old pass manager to share the same pgo options with new pass manager in opt, and removes the options in PassManagerBuilder.cpp. Differential Revision: https://reviews.llvm.org/D56749 llvm-svn: 351392	2019-01-16 23:19:02 +00:00
Teresa Johnson	853b962416	[ThinLTO] Handle chains of aliases At -O0, globalopt is not run during the compile step, and we can have a chain of an alias having an immediate aliasee of another alias. The summaries are constructed assuming aliases in a canonical form (flattened chains), and as a result only the base object but no intermediate aliases were preserved. Fix by adding a pass that canonicalize aliases, which ensures each alias is a direct alias of the base object. Reviewers: pcc, davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, arphaman, llvm-commits Differential Revision: https://reviews.llvm.org/D54507 llvm-svn: 350423	2019-01-04 19:04:54 +00:00
Michael Kruse	d4eb13c880	[LoopVectorize] Rename pass options. NFC. Rename: NoUnrolling to InterleaveOnlyWhenForced and AlwaysVectorize to !VectorizeOnlyWhenForced Contrary to what the name 'AlwaysVectorize' suggests, it does not unconditionally vectorize all loops, but applies a cost model to determine whether vectorization is profitable to all loops. Hence, passing false will disable the cost model, except when a loop is marked with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested by @hfinkel in D55716) better matches this behavior. Similarly, 'NoUnrolling' disables the profitability cost model for interleaving (a term to distinguish it from unrolling by the LoopUnrollPass); rename it for consistency. Differential Revision: https://reviews.llvm.org/D55785 llvm-svn: 349513	2018-12-18 17:46:09 +00:00
Michael Kruse	3284775b70	[LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops. When using clang with `-fno-unroll-loops` (implicitly added with `-O1`), the LoopUnrollPass is not not added to the (legacy) pass pipeline. This also means that it will not process any loop metadata such as llvm.loop.unroll.enable (which is generated by #pragma unroll or WarnMissedTransformationsPass emits a warning that a forced transformation has not been applied (see https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html). Such explicit transformations should take precedence over disabling heuristics. This patch unconditionally adds LoopUnrollPass to the optimizing pipeline (that is, it is still not added with `-O0`), but passes a flag indicating whether automatic unrolling is dis-/enabled. This is the same approach as LoopVectorize uses. The new pass manager's pipeline builder has no option to disable unrolling, hence the problem does not apply. Differential Revision: https://reviews.llvm.org/D55716 llvm-svn: 349509	2018-12-18 17:16:05 +00:00
Michael Kruse	7244852557	[Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes. When multiple loop transformation are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For instance, e.g. #pragma clang loop unroll_and_jam(enable) #pragma clang loop distribute(enable) is the same as #pragma clang loop distribute(enable) #pragma clang loop unroll_and_jam(enable) and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled after UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also,t the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used. This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance, !0 = !{!0, !1, !2} !1 = !{!"llvm.loop.unroll_and_jam.enable"} !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3} !3 = !{!"llvm.loop.distribute.enable"} defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop. Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account. For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. It is not possible that the responsible pass emits such a warning because the transformation might be 'hidden' in a followup attribute when it is executed, or it is not present in the pipeline at all. For this reason, this patche introduces a WarnMissedTransformations pass, to warn about orphaned transformations. Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated. To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied. With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling). Reviewed By: hfinkel, dmgreen Differential Revision: https://reviews.llvm.org/D49281 Differential Revision: https://reviews.llvm.org/D55288 llvm-svn: 348944	2018-12-12 17:32:52 +00:00
Xin Tong	642c8d3575	[LTO] Load sample profile in LTO link step. Summary: Load sample profile in LTO link step. ThinLTO calls populateModulePassManager to load the profile Reviewers: tejohnson, davidxl, danielcdh Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54564 llvm-svn: 346971	2018-11-15 18:06:42 +00:00
Aditya Kumar	d9e2e383a9	Schedule Hot Cold Splitting pass after most optimization passes Summary: In the new+old pass manager, hot cold splitting was schedule too early. Thanks to Vedant for pointing this out. Reviewers: sebpop, vsk Reviewed By: sebpop, vsk Subscribers: mehdi_amini, llvm-commits Differential Revision: https://reviews.llvm.org/D53437 llvm-svn: 344869	2018-10-21 18:11:56 +00:00
Eric Christopher	dcf1d97c5c	Temporarily revert "[GVNHoist] Re-enable GVNHoist by default" This reverts commit r342387 as it's showing significant performance regressions in a number of benchmarks. Followed up with the committer and original thread with an example and will get performance numbers before recommitting. llvm-svn: 343522	2018-10-01 18:57:08 +00:00
Florian Hahn	8600fee52e	Recommit r343308: [LoopInterchange] Turn into a loop pass. llvm-svn: 343450	2018-10-01 09:59:48 +00:00
whitequark	29b2980159	Revert "[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints" This reverts commit c4baf7c2f06ff5459c4f5998ce980346e72bff97. Broke the bots, and should really be in Transforms/Coroutines instead. llvm-svn: 343337	2018-09-28 16:45:18 +00:00
whitequark	937afbc365	[LLVM-C] Add bindings for addCoroutinePassesToExtensionPoints Summary: This patch adds bindings to C and Go for addCoroutinePassesToExtensionPoints, which is used to add coroutine passes to the correct locations in PassManagerBuilder. Reviewers: whitequark, deadalnix Reviewed By: whitequark Subscribers: mehdi_amini, modocache, llvm-commits Differential Revision: https://reviews.llvm.org/D51642 llvm-svn: 343336	2018-09-28 16:38:11 +00:00
Florian Hahn	8d72ecc36f	Revert r343308: [LoopInterchange] Turn into a loop pass. llvm-svn: 343310	2018-09-28 10:20:07 +00:00
Florian Hahn	0694c159f7	[LoopInterchange] Turn into a loop pass. This patch turns LoopInterchange into a loop pass. It now only considers top-level loops and tries to move the innermost loop to the optimal position within the loop nest. By only looking at top-level loops, we might miss a few opportunities the function pass would get (e.g. if we have a loop nest of 3 loops, in the function pass we might process loops at level 1 and 2 and move the inner most loop to level 1, and then we process loops at levels 0, 1, 2 and interchange again, because we now have a different inner loop). But I think it would be better to handle such cases by picking the best inner loop from the start and avoid re-visiting the same loops again. The biggest advantage of it being a function pass is that it interacts nicely with the other loop passes. Without this patch, there are some performance regressions on AArch64 with loop interchanging enabled, where no loops were interchanged, but we missed out on some other loop optimizations. It also removes the SimplifyCFG run. We are just changing branches, so the CFG should not be more complicated, besides the additional 'unique' preheaders this pass might create. Reviewers: chandlerc, efriedma, mcrosier, javed.absar, xbolva00 Reviewed By: xbolva00 Differential Revision: https://reviews.llvm.org/D51702 llvm-svn: 343308	2018-09-28 09:45:50 +00:00
Alexandros Lamprineas	8a1c374b2e	[GVNHoist] Re-enable GVNHoist by default Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912 has been fixed by rL342055. Precommit testing performed: * Overnight runs of csmith comparing the output between programs compiled with gvn-hoist enabled/disabled. * Bootstrap builds of clang with UbSan/ASan configurations. llvm-svn: 342387	2018-09-17 12:24:55 +00:00
Alexandros Lamprineas	fe0512d575	Revert "[GVNHoist] Re-enable GVNHoist by default" This reverts rL341954. The builder `sanitizer-x86_64-linux-bootstrap-ubsan` has been failing with timeouts at stage2 clang/ubsan: [3065/3073] Linking CXX executable bin/lld command timed out: 1200 seconds without output running python ../sanitizer_buildbot/sanitizers/buildbot_selector.py, attempting to kill llvm-svn: 342001	2018-09-11 22:10:57 +00:00
Alexandros Lamprineas	db18e972d7	[GVNHoist] Re-enable GVNHoist by default Rebase rL340922 since https://bugs.llvm.org/show_bug.cgi?id=38807 has been fixed by rL341947. llvm-svn: 341954	2018-09-11 15:55:45 +00:00
Aditya Kumar	801394a3d7	Hot cold splitting pass Find cold blocks based on profile information (or optionally with static analysis). Forward propagate profile information to all cold-blocks. Outline a cold region. Set calling conv and prof hint for the callsite of the outlined function. Worked in collaboration with: Sebastian Pop <s.pop@samsung.com> Differential Revision: https://reviews.llvm.org/D50658 llvm-svn: 341669	2018-09-07 15:03:49 +00:00
Hiroshi Yamauchi	9775a620b0	[PGO] Control Height Reduction Summary: Control height reduction merges conditional blocks of code and reduces the number of conditional branches in the hot path based on profiles. if (hot_cond1) { // Likely true. do_stg_hot1(); } if (hot_cond2) { // Likely true. do_stg_hot2(); } -> if (hot_cond1 && hot_cond2) { // Hot path. do_stg_hot1(); do_stg_hot2(); } else { // Cold path. if (hot_cond1) { do_stg_hot1(); } if (hot_cond2) { do_stg_hot2(); } } This speeds up some internal benchmarks up to ~30%. Reviewers: davidxl Reviewed By: davidxl Subscribers: xbolva00, dmgreen, mehdi_amini, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D50591 llvm-svn: 341386	2018-09-04 17:19:13 +00:00
Alexandros Lamprineas	f6db5bcd38	Revert r340922 "[GVNHoist] Re-enable GVNHoist by default" Another sanitizer buildbot failed this time at bootstrap when compiling SemaTemplateInstantiate.cpp with this assertion: `dominates(MD, U) && "Memory Def does not dominate it's uses"'. http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/15047 llvm-svn: 340925	2018-08-29 13:00:55 +00:00
Alexandros Lamprineas	c03b9b8854	[GVNHoist] Re-enable GVNHoist by default Rebase rL338240 since the excessive memory usage observed when using GVNHoist with UBSan has been fixed by rL340818. Differential Revision: https://reviews.llvm.org/D49858 llvm-svn: 340922	2018-08-29 11:58:34 +00:00
Vlad Tsyrklevich	1c7160e85f	Revert "[GVNHoist] Re-enable GVNHoist by default" This reverts commit r338240 because it was causing OOMs on the UBSan buildbot when building clang/lib/Sema/SemaChecking.cpp llvm-svn: 338297	2018-07-30 20:07:33 +00:00
Alexandros Lamprineas	de3ca964c1	[GVNHoist] Re-enable GVNHoist by default My initial motivation for this came from https://reviews.llvm.org/D48122, where it was pointed out that my change didn't fit well in SimplifyCFG and therefore using GVNHoist was a better way to go. GVNHoist has been disabled for a while as there was a list of bugs related to it. I have fixed the following bugs: https://bugs.llvm.org/show_bug.cgi?id=37808 -> https://reviews.llvm.org/D48372 (rL337149) https://bugs.llvm.org/show_bug.cgi?id=36787 -> https://reviews.llvm.org/D49555 (rL337674) https://bugs.llvm.org/show_bug.cgi?id=37445 -> https://reviews.llvm.org/D49425 (rL337680) The next two bugs no longer occur, and it's unclear which commit fixed them: https://bugs.llvm.org/show_bug.cgi?id=36635 https://bugs.llvm.org/show_bug.cgi?id=37791 I investigated this one and proved to be unrelated to GVNHoist, but a genuine bug in NewGvn: https://bugs.llvm.org/show_bug.cgi?id=37660 To convince myself GVNHoist is in a good state I made a successful bootstrap build of LLVM. Merging this change now in order to make it to the LLVM 7.0.0 branch. Differential Revision: https://reviews.llvm.org/D49858 llvm-svn: 338240	2018-07-30 10:50:18 +00:00
Teresa Johnson	e214fdeb69	[ThinLTO] Ensure the TargetLibraryInfo is constructed early enough Summary: Without this change, the WholeProgramDevirt pass, which requires the TargetLibraryInfo, will construct one from the default triple. Fixes PR38139. Reviewers: pcc Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49278 llvm-svn: 337750	2018-07-23 21:58:19 +00:00
David Green	963401d2be	[UnrollAndJam] New Unroll and Jam pass This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder Loop So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 336062	2018-07-01 12:47:30 +00:00
Chandler Carruth	7c557f804d	[instsimplify] Move the instsimplify pass to use more obvious file names and diretory. Also cleans up all the associated naming to be consistent and removes the public access to the pass ID which was unused in LLVM. Also runs clang-format over parts that changed, which generally cleans up a bunch of formatting. This is in preparation for doing some internal cleanups to the pass. Differential Revision: https://reviews.llvm.org/D47352 llvm-svn: 336028	2018-06-29 23:36:03 +00:00
Tobias Edler von Koch	7609cb83e6	Re-land "[LTO] Enable module summary emission by default for regular LTO" Since we are now producing a summary also for regular LTO builds, we need to run the NameAnonGlobals pass in those cases as well (the summary cannot handle anonymous globals). See https://reviews.llvm.org/D34156 for details on the original change. This reverts commit 6c9ee4a4a438a8059aacc809b2dd57128fccd6b3. llvm-svn: 335385	2018-06-22 20:23:21 +00:00
Chandler Carruth	aa5f4d2e23	Revert r335306 (and r335314) - the Call Graph Profile pass. This is the first pass in the main pipeline to use the legacy PM's ability to run function analyses "on demand". Unfortunately, it turns out there are bugs in that somewhat-hacky approach. At the very least, it leaks memory and doesn't support -debug-pass=Structure. Unclear if there are larger issues or not, but this should get the sanitizer bots back to green by fixing the memory leaks. llvm-svn: 335320	2018-06-22 05:33:57 +00:00
Michael J. Spencer	fc93dd8e18	[Instrumentation] Add Call Graph Profile pass This patch adds support for generating a call graph profile from Branch Frequency Info. The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight. After scanning all the functions, it generates an appending module flag containing the data. The format looks like: !llvm.module.flags = !{!0} !0 = !{i32 5, !"CG Profile", !1} !1 = !{!2, !3, !4} ; List of edges !2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32 !3 = !{void (i1)* @freq, void ()* @a, i64 11} !4 = !{void (i1)* @freq, void ()* @b, i64 20} Differential Revision: https://reviews.llvm.org/D48105 llvm-svn: 335306	2018-06-21 23:31:10 +00:00
Chandler Carruth	71fd27043e	[PM/LoopUnswitch] When using the new SimpleLoopUnswitch pass, schedule loop-cleanup passes at the beginning of the loop pass pipeline, and re-enqueue loops after even trivial unswitching. This will allow us to much more consistently avoid simplifying code while doing trivial unswitching. I've also added a test case that specifically shows effective iteration using this technique. I've unconditionally updated the new PM as that is always using the SimpleLoopUnswitch pass, and I've made the pipeline changes for the old PM conditional on using this new unswitch pass. I added a bunch of comments to the loop pass pipeline in the old PM to make it more clear what is going on when reviewing. Hopefully this will unblock doing partial unswitching instead of just full unswitching. Differential Revision: https://reviews.llvm.org/D47408 llvm-svn: 333493	2018-05-30 02:46:45 +00:00
David Green	aee7ad0cde	Revert 333358 as it's failing on some builders. I'm guessing the tests reply on the ARM backend being built. llvm-svn: 333359	2018-05-27 12:54:33 +00:00
David Green	3034281b43	[UnrollAndJam] Add a new Unroll and Jam pass This is a simple implementation of the unroll-and-jam classical loop optimisation. The basic idea is that we take an outer loop of the form: for i.. ForeBlocks(i) for j.. SubLoopBlocks(i, j) AftBlocks(i) Instead of doing normal inner or outer unrolling, we unroll as follows: for i... i+=2 ForeBlocks(i) ForeBlocks(i+1) for j.. SubLoopBlocks(i, j) SubLoopBlocks(i+1, j) AftBlocks(i) AftBlocks(i+1) Remainder So we have unrolled the outer loop, then jammed the two inner loops into one. This can lead to a simpler inner loop if memory accesses can be shared between the now-jammed loops. To do this we have to prove that this is all safe, both for the memory accesses (using dependence analysis) and that ForeBlocks(i+1) can move before AftBlocks(i) and SubLoopBlocks(i, j). Differential Revision: https://reviews.llvm.org/D41953 llvm-svn: 333358	2018-05-27 12:11:21 +00:00
Vlad Tsyrklevich	b768d235a9	Revert "Enable EliminateAvailableExternally pass for -O1" This reverts commit r330961 because it breaks a handful of clang tests. llvm-svn: 330964	2018-04-26 17:54:53 +00:00
Vlad Tsyrklevich	42c5a9c29a	Enable EliminateAvailableExternally pass for -O1 Summary: Follow-up to D43690, the EliminateAvailableExternally pass currently runs under -O0 and -O2 and up. Under -O1 we would still want to drop available_externally symbols to reduce space without inlining having run. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: mehdi_amini, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D46093 llvm-svn: 330961	2018-04-26 17:33:24 +00:00
David Blaikie	ba47dd16c5	Fix some layering in AggressiveInstCombine (avoiding inclusion of Scalar.h) llvm-svn: 330726	2018-04-24 15:40:07 +00:00

1 2 3 4 5 ...

445 Commits