llvm-project

Commit Graph

Author	SHA1	Message	Date
Yi Jiang	79eb0aa8cb	Reapply: Add slp vectorization to LTO passes. The bug it exposed has been fixed by r207983. <radar://16641956> llvm-svn: 208013	2014-05-05 23:14:46 +00:00
Yi Jiang	e2d5f29c2f	Revert r207571 - Add slp vectorization to LTO passes llvm-svn: 207693	2014-04-30 19:27:24 +00:00
Yi Jiang	4e234aa790	Add slp vectorization to LTO passes llvm-svn: 207571	2014-04-29 19:35:39 +00:00
Craig Topper	f40110f4d8	[C++] Use 'nullptr'. Transforms edition. llvm-svn: 207196	2014-04-25 05:29:35 +00:00
Duncan P. N. Exon Smith	49f3ec80c2	PMBuilder: Expose an option to disable tail calls Adds API to allow frontends to disable tail calls in PassManagerBuilder. <rdar://problem/16050591> llvm-svn: 206542	2014-04-18 01:05:15 +00:00
Duncan P. N. Exon Smith	2b69189c9c	LTO: Add more loop simplification passes to LTO Similar to r202051, add missing loop simplification passes to the LTO optimization pipeline. Patch by Rafael Espindola. llvm-svn: 206306	2014-04-15 17:48:15 +00:00
Hal Finkel	86b3064f2b	Move partial/runtime unrolling late in the pipeline The generic (concatenation) loop unroller is currently placed early in the standard optimization pipeline. This is a good place to perform full unrolling, but not the right place to perform partial/runtime unrolling. However, most targets don't enable partial/runtime unrolling, so this never mattered. However, even some x86 cores benefit from partial/runtime unrolling of very small loops, and follow-up commits will enable this. First, we need to move partial/runtime unrolling late in the optimization pipeline (importantly, this is after SLP and loop vectorization, as vectorization can drastically change the size of a loop), while keeping the full unrolling where it is now. This change does just that. llvm-svn: 205264	2014-03-31 23:23:51 +00:00
Arnold Schwaighofer	6ccda923e5	LTO: Add the loop vectorizer to the LTO pipeline. During the LTO phase LICM will move loop invariant global variables out of loops (informed by GlobalModRef). This makes more loops countable presenting opportunity for the loop vectorizer. Adding the loop vectorizer improves some TSVC benchmarks and twolf/ref dataset (5%) on x86-64. radar://15970632 llvm-svn: 202051	2014-02-24 18:19:31 +00:00
Chandler Carruth	5ad5f15cff	[cleanup] Move the Dominators.h and Verifier.h headers into the IR directory. These passes are already defined in the IR library, and it doesn't make any sense to have the headers in Analysis. Long term, I think there is going to be a much better way to divide these matters. The dominators code should be fully separated into the abstract graph algorithm and have that put in Support where it becomes obvious that even Clang's CFGBlock's can use it. Then the verifier can manually construct dominance information from the Support-driven interface while the Analysis library can provide a pass which both caches, reconstructs, and supports a nice update API. But those are very long term, and so I don't want to leave the really confusing structure until that day arrives. llvm-svn: 199082	2014-01-13 09:26:24 +00:00
Renato Golin	729a3ae90a	Add #pragma vectorize enable/disable to LLVM The intended behaviour is to force vectorization on the presence of the flag (either turn on or off), and to continue the behaviour as expected in its absence. Tests were added to make sure that all cases are covered in opt. No tests were added in other tools with the assumption that they should use the PassManagerBuilder in the same way. This patch also removes the outdated -late-vectorize flag, which was on by default and not helping much. The pragma metadata is being attached to the same place as other loop metadata, but nothing forbids one from attaching it to a function (to enable #pragma optimize) or basic blocks (to hint the basic-block vectorizers), etc. The logic should be the same all around. Patches to Clang to produce the metadata will be produced after the initial implementation is agreed upon and committed. Patches to other vectorizers (such as SLP and BB) will be added once we're happy with the pass manager changes. llvm-svn: 196537	2013-12-05 21:20:02 +00:00
Hal Finkel	29aeb20518	Add a loop rerolling flag to the PassManagerBuilder This adds a boolean member variable to the PassManagerBuilder to control loop rerolling (just like we have for unrolling and the various vectorization options). This is necessary for control by the frontend. Loop rerolling remains disabled by default at all optimization levels. llvm-svn: 194966	2013-11-17 16:02:50 +00:00
Hal Finkel	bf45efde2d	Add a loop rerolling pass This adds a loop rerolling pass: the opposite of (partial) loop unrolling. The transformation aims to take loops like this: for (int i = 0; i < 3200; i += 5) { a[i] += alpha * b[i]; a[i + 1] += alpha * b[i + 1]; a[i + 2] += alpha * b[i + 2]; a[i + 3] += alpha * b[i + 3]; a[i + 4] += alpha * b[i + 4]; } and turn them into this: for (int i = 0; i < 3200; ++i) { a[i] += alpha * b[i]; } and loops like this: for (int i = 0; i < 500; ++i) { x[3*i] = foo(0); x[3*i+1] = foo(0); x[3*i+2] = foo(0); } and turn them into this: for (int i = 0; i < 1500; ++i) { x[i] = foo(0); } There are two motivations for this transformation: 1. Code-size reduction (especially relevant, obviously, when compiling for code size). 2. Providing greater choice to the loop vectorizer (and generic unroller) to choose the unrolling factor (and a better ability to vectorize). The loop vectorizer can take vector lengths and register pressure into account when choosing an unrolling factor, for example, and a pre-unrolled loop limits that choice. This is especially problematic if the manual unrolling was optimized for a machine different from the current target. The current implementation is limited to single basic-block loops only. The rerolling recognition should work regardless of how the loop iterations are intermixed within the loop body (subject to dependency and side-effect constraints), but the significant restriction is that the order of the instructions in each iteration must be identical. This seems sufficient to capture all current use cases. This pass is not currently enabled by default at any optimization level. llvm-svn: 194939	2013-11-16 23:59:05 +00:00
Rafael Espindola	282a47037b	Use LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN instead of the "dso list". There are two ways one could implement hiding of linkonce_odr symbols in LTO: * LLVM tells the linker which symbols can be hidden if not used from native files. * The linker tells LLVM which symbols are not used from other object files, but will be put in the dso symbol table if present. GOLD's API is the second option. It was implemented almost 1:1 in llvm by passing the list down to internalize. LLVM already had partial support for the first option. It is also very similar to how ld64 handles hiding these symbols when not doing LTO. This patch then * removes the APIs for the DSO list. * marks LTO_SYMBOL_SCOPE_DEFAULT_CAN_BE_HIDDEN all linkonce_odr unnamed_addr global values and other linkonce_odr whose address is not used. * makes the gold plugin responsible for handling the API mismatch. llvm-svn: 193800	2013-10-31 20:51:58 +00:00
Nadav Rotem	7f27e0b0ce	Mark some command line flags as hidden llvm-svn: 193013	2013-10-18 23:38:13 +00:00
Rafael Espindola	cda2911caa	Optimize linkonce_odr unnamed_addr functions during LTO. Generalize the API so we can distinguish symbols that are needed just for a DSO symbol table from those that are used from some native .o. The symbols that are only wanted for the dso symbol table can be dropped if llvm can prove every other dso has a copy (linkonce_odr) and the address is not important (unnamed_addr). llvm-svn: 191922	2013-10-03 18:29:09 +00:00
Nadav Rotem	5d78dba6d9	Enable late-vectorization by default. This patch changes the default setting for the LateVectorization flag that controls where the loop-vectorizer is ran. Perf gains: SingleSource/Benchmarks/Shootout/matrix -37.33% MultiSource/Benchmarks/PAQ8p/paq8p -22.83% SingleSource/Benchmarks/Linpack/linpack-pc -16.22% SingleSource/Benchmarks/Shootout-C++/ary3 -15.16% MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt -10.34% MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl -7.12% Regressions: SingleSource/Benchmarks/Misc/lowercase 15.10% MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt 13.18% SingleSource/Benchmarks/Shootout-C++/matrix 8.27% SingleSource/Benchmarks/CoyoteBench/lpbench 7.30% llvm-svn: 189858	2013-09-03 21:33:17 +00:00
Bill Wendling	4c0d9adecb	Random cleanup: No need to use a std::vector here, since createInternalizePass uses an ArrayRef. llvm-svn: 189632	2013-08-30 00:48:37 +00:00
Nadav Rotem	4c459bcd47	Vectorizer/PassManager: I am working on moving the vectorizer out of the SCC passes. This patch moves the SLP-vectorizer and BB-vectorizer back into SCC passes for two reasons: 1. They are a kind of canonicalization. 2. The performance measurements show that it is better to keep them in. There should be no functional change if you are not enabling the LateVectorization mode. llvm-svn: 189539	2013-08-28 23:40:29 +00:00
Hal Finkel	6d09904cc9	Disable unrolling in the loop vectorizer when disabled in the pass manager When unrolling is disabled in the pass manager, the loop vectorizer should also not unroll loops. This will allow the -fno-unroll-loops option in Clang to behave as expected (even for vectorizable loops). The loop vectorizer's -force-vector-unroll option will (continue to) override the pass-manager setting (including -force-vector-unroll=0 to force use of the internal auto-selection logic). In order to test this, I added a flag to opt (-disable-loop-unrolling) to force disable unrolling through opt (the analog of -fno-unroll-loops in Clang). Also, this fixes a small bug in opt where the loop vectorizer was enabled only after the pass manager populated the queue of passes (the global_alias.ll test needed a slight update to the RUN line as a result of this fix). llvm-svn: 189499	2013-08-28 18:33:10 +00:00
Arnold Schwaighofer	124ccf3ad1	Also remove logic in LateVectorize llvm-svn: 188285	2013-08-13 16:12:04 +00:00
Arnold Schwaighofer	c14b59d1a1	Remove logic that decides whether to vectorize or not depending on O-levels I have moved this logic into clang and opt. llvm-svn: 188281	2013-08-13 15:51:25 +00:00
Tom Stellard	aa664d9b92	Factor FlattenCFG out from SimplifyCFG Patch by: Mei Ye llvm-svn: 187764	2013-08-06 02:43:45 +00:00
Nadav Rotem	e4e6e9ed47	Move the optlevel check to the frontend. llvm-svn: 187628	2013-08-01 22:41:58 +00:00
Nadav Rotem	9153b3871d	Only enable SLP-vectorization on O3 builds. llvm-svn: 187595	2013-08-01 18:28:15 +00:00
Tom Stellard	8b1e021e85	SimplifyCFG: Use parallel-and and parallel-or mode to consolidate branch conditions Merge consecutive if-regions if they contain identical statements. Both transformations reduce number of branches. The transformation is guarded by a target-hook, and is currently enabled only for +R600, but the correctness has been tested on X86 target using a variety of CPU benchmarks. Patch by: Mei Ye llvm-svn: 187278	2013-07-27 00:01:07 +00:00
Chandler Carruth	08e1b8742b	Add a flag to defer vectorization into a phase after the inliner and its CGSCC pass manager. This should insulate the inlining decisions from the vectorization decisions, however it may have both compile time and code size problems so it is just an experimental option right now. Adding this based on a discussion with Arnold and it seems at least worth having this flag for us to both run some experiments to see if this strategy is workable. It may solve some of the regressions seen with the loop vectorizer. llvm-svn: 184698	2013-06-24 07:21:47 +00:00
Meador Inge	dfb08a2cb8	Remove the simplify-libcalls pass (finally) This commit completely removes what is left of the simplify-libcalls pass. All of the functionality has now been migrated to the instcombine and functionattrs passes. The following C API functions are now NOPs: 1. LLVMAddSimplifyLibCallsPass 2. LLVMPassManagerBuilderSetDisableSimplifyLibCalls llvm-svn: 184459	2013-06-20 19:48:07 +00:00
Nadav Rotem	cde24ef389	Disable vectorization for -Oz. llvm-svn: 184089	2013-06-17 17:22:40 +00:00
Nadav Rotem	7dd8210b71	Enable the loop vectorizer by default for -Os and -O2. llvm-svn: 184084	2013-06-17 16:23:34 +00:00
Nadav Rotem	99e529ea3c	Jeffrey Yasskin volunteered to benchmark the vectorizer on -O2 or -Os when compiling chrome. This patch adds a new flag to enable vectorization on all levels and not only on -O3. It should go away once we make a decision. llvm-svn: 183456	2013-06-06 22:35:47 +00:00
Filip Pizlo	dec20e43c0	This patch breaks up Wrap.h so that it does not have to include all of the things, and renames it to CBindingWrapping.h. I also moved CBindingWrapping.h into Support/. This new file just contains the macros for defining different wrap/unwrap methods. The calls to those macros, as well as any custom wrap/unwrap definitions (like for array of Values for example), are put into corresponding C++ headers. Doing this required some #include surgery, since some .cpp files relied on the fact that including Wrap.h implicitly caused the inclusion of a bunch of other things. This also now means that the C++ headers will include their corresponding C API headers; for example Value.h must include llvm-c/Core.h. I think this is harmless, since the C API headers contain just external function declarations and some C types, so I don't believe there should be any nasty dependency issues here. llvm-svn: 180881	2013-05-01 20:59:00 +00:00
Eric Christopher	04d4e9312c	Move C++ code out of the C headers and into either C++ headers or the C++ files themselves. This enables people to use just a C compiler to interoperate with LLVM. llvm-svn: 180063	2013-04-22 22:47:22 +00:00
Nadav Rotem	b9116e6966	SLPVectorizer: Make it a function pass and add code for hoisting the vector-gather sequence out of loops. llvm-svn: 179562	2013-04-15 22:00:26 +00:00
Nadav Rotem	d4dcc003df	Add an option -vectorize-slp-aggressive for running the BB vectorizer. Make -fslp-vectorize run the slp-vectorizer. llvm-svn: 179508	2013-04-15 05:39:58 +00:00
Nadav Rotem	a1e5e44eb3	Rename the slp-vectorizer clang/llvm flags. No functionality change. llvm-svn: 179505	2013-04-15 04:54:42 +00:00
Nick Lewycky	5f50854186	Use LLVMBool instead of 'bool' in the C API. Based on a patch by Peter Zotov! llvm-svn: 176793	2013-03-10 21:58:22 +00:00
Andrew Trick	fcb37243f9	Generalize my previous fix for -print-options. Always print options that differ from their implicit default. At least for simple option types. llvm-svn: 176572	2013-03-06 19:04:56 +00:00
Andrew Trick	946c2b32e6	Give -loop-vectorize an explicit default. This way, clang -mllvm -print-options shows that the driver is overriding it. llvm-svn: 176569	2013-03-06 18:22:22 +00:00
Hal Finkel	bf4db4fe11	Unroll again after running BBVectorize Because BBVectorize may significantly shorten a loop body, unroll again after vectorization. This is especially important when using runtime or partial unrolling. llvm-svn: 173730	2013-01-29 00:22:49 +00:00
Chandler Carruth	683ff2d7f9	Remove the long defunct 'DefaultPasses' header. We have a pass manager builder these days, and this thing hasn't seen updates for a very long time. llvm-svn: 171741	2013-01-07 15:16:50 +00:00
Nadav Rotem	be6570d429	Move the loop vectorizer from O2 to O3. It looks like the increase in code size actually hurts the performance on many programs. llvm-svn: 171471	2013-01-04 17:57:44 +00:00
Roman Divacky	a229186a82	Remove duplicate includes. llvm-svn: 170902	2012-12-21 17:06:44 +00:00
Nadav Rotem	9aee065e3c	Enable the loop vectorizer in clang and not in the pass manager, so that we can disable it in clang. llvm-svn: 170470	2012-12-18 23:09:44 +00:00
Nadav Rotem	c0699854dd	Enable the loop vectorizer. llvm-svn: 170416	2012-12-18 06:37:12 +00:00
NAKAMURA Takumi	8f45b6c709	Revert r170246, "Enable the loop vectorizer by default." llvm-svn: 170267	2012-12-15 06:11:13 +00:00
Nadav Rotem	acde77481d	Enable the loop vectorizer by default. llvm-svn: 170246	2012-12-14 21:30:23 +00:00
Nadav Rotem	d3a3c9fdd5	revert r170166 - disable the loop vectorizer. llvm-svn: 170172	2012-12-14 01:57:00 +00:00
Nadav Rotem	3b606d6fd5	Enable the loop vectorizer. llvm-svn: 170166	2012-12-14 00:30:34 +00:00
Nadav Rotem	b4ea4b3751	Disable the loop vectorizer. llvm-svn: 170162	2012-12-14 00:02:07 +00:00
Nadav Rotem	e5e28b48c8	Enable the Loop Vectorizer by default for O2 and O3. Disable if-conversion by default. I plan to revert this patch later today. llvm-svn: 170157	2012-12-13 23:11:54 +00:00
Nadav Rotem	d0bb22bba3	LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to increase the function size. llvm-svn: 170004	2012-12-12 19:29:45 +00:00
Nadav Rotem	aeb17df802	LoopVectorizer: When -Os is used, vectorize only loops that don't require a tail loop. There is no testcase because I don't know of a way to initialize the loop vectorizer pass without adding an additional hidden flag. llvm-svn: 169950	2012-12-12 01:11:46 +00:00
Nadav Rotem	36cdd82627	Enable the loop vectorizer only on O2 and above. (Still disabled by default) llvm-svn: 169774	2012-12-10 21:45:01 +00:00
Chandler Carruth	ed0881b2a6	Use the new script to sort the includes of every file under lib. Sooooo many of these had incorrect or strange main module includes. I have manually inspected all of these, and fixed the main module include to be the nearest plausible thing I could find. If you own or care about any of these source files, I encourage you to take some time and check that these edits were sensible. I can't have broken anything (I strictly added headers, and reordered them, never removed), but they may not be the headers you'd really like to identify as containing the API being implemented. Many forward declarations and missing includes were added to a header files to allow them to parse cleanly when included first. The main module rule does in fact have its merits. =] llvm-svn: 169131	2012-12-03 16:50:05 +00:00
Nadav Rotem	ec739205cc	No need to run LICM after loop vectorization because we don't generate invariant code any more. llvm-svn: 168928	2012-11-29 19:28:29 +00:00
Dmitri Gribenko	0011bbf985	Use empty parens for empty function parameter list instead of '(void)'. llvm-svn: 168049	2012-11-15 16:51:49 +00:00
Nadav Rotem	d3df665140	80-col llvm-svn: 167036	2012-10-30 18:37:43 +00:00
Nadav Rotem	39aab03be3	Rename the BB-vectorize flag to match the dragonegg name llvm-svn: 166948	2012-10-29 18:01:14 +00:00
Nadav Rotem	c59ae207ef	Change the PassManagerBuilder (used by -O3) loop vectorizer flag from -vectorize to -vectorize-loops because we don't want to share the same flag as the bb-vectorizer. llvm-svn: 166937	2012-10-29 16:36:25 +00:00
Rafael Espindola	4253bd8faf	Change the internalize pass to internalize all symbols when given an empty list of externals. This makes sense since a shared library with no symbols can still be useful if it has static constructors. llvm-svn: 166795	2012-10-26 18:47:48 +00:00
Nadav Rotem	086ea5c1f5	revert accidental change llvm-svn: 166643	2012-10-24 23:48:57 +00:00
Nadav Rotem	4a87683a41	Implement a basic cost model for vector and scalar instructions. llvm-svn: 166642	2012-10-24 23:47:38 +00:00
Chandler Carruth	e8479e15f5	Introduce a BarrierNoop pass, a hack designed to allow some control over the implicitly-formed-and-nesting CGSCC pass manager and function pass managers, especially when using them on the opt commandline or using extension points in the module builder. The '-barrier' opt flag (or the pass itself) will create a no-op module pass in the pipeline, resetting the pass manager stack, and allowing the creation of a new pipeline of function passes or CGSCC passes to be created that is independent from any previous pipelines. For example, this can be used to test running two CGSCC passes in independent CGSCC pass managers as opposed to in the same CGSCC pass manager. It also allows us to introduce a further hack into the PassManagerBuilder to separate the O0 pipeline extension passes from the always-inliner's CGSCC pass manager, which they likely do not want to participate in... At the very least none of the Sanitizer passes want this behavior. This fixes a bug with ASan at O0 currently, and I'll commit the ASan test which covers this pass. I'm happy to add a test case that this pass exists and works, but not sure how much time folks would like me to spend adding test cases for the details of its behavior of partition pass managers.... The whole thing is just vile, and mostly intended to unblock ASan, so I'm hoping to rip this all out in a brave new pass manager world. llvm-svn: 166172	2012-10-18 08:05:46 +00:00
Nadav Rotem	6b94c2a09b	Add a loop vectorizer. llvm-svn: 166112	2012-10-17 18:25:06 +00:00
Chandler Carruth	4e4359935b	Turn the new SROA pass back on. Let's see if it sticks this time. =] Again, let me know if anything breaks due to this! llvm-svn: 164986	2012-10-02 04:24:01 +00:00
Evan Cheng	8c6b06d4a0	GlobalDCE should be run at -O2 / -Os to eliminate unused dtor, etc. rdar://9142819 llvm-svn: 164850	2012-09-28 21:23:26 +00:00
Nick Lewycky	2e646236fb	Disable the new SROA pass to get the tree back in working order. We don't yet have testcases for the current problems. llvm-svn: 164731	2012-09-26 22:43:04 +00:00
Chandler Carruth	8232bf53c6	Enable the new SROA pass by default. Queue the fallout. ;] llvm-svn: 164480	2012-09-24 01:10:25 +00:00
Benjamin Kramer	9bc3efc81c	LNT builders have picked up new SROA, disable it to get the remaining builders green again. llvm-svn: 164124	2012-09-18 13:43:00 +00:00
Chandler Carruth	42cb9cb14f	Add a major missing piece to the new SROA pass: aggressive splitting of FCAs. This is essential in order to promote allocas that are used in struct returns by frontends like Clang. The FCA load would block the rest of the pass from firing, resulting in significant regressions with the bullet benchmark in the nightly test suite. Thanks to Duncan for repeated discussions about how best to do this, and to both him and Benjamin for review. This appears to have blocked many places where the pass tries to fire, and so I expect somewhat different results with this fix added. As with the last big patch, I'm including a change to enable the SROA by default temporarily. Ben is going to remove this as soon as the LNT bots pick up the patch. I'm just trying to get a round of LNT numbers from the stable machines in the lab. NOTE: Four clang tests are expected to fail in the brief window where this is enabled. Sorry for the noise! llvm-svn: 164119	2012-09-18 12:57:43 +00:00
Benjamin Kramer	ed11e35e57	Disable new sroa now that all buildbots have tested it. What we have so far: - Some clang test failures (these were known already) - Perf results are mixed, some big regressions http://llvm.org/perf/db_default/v4/nts/3844 http://llvm.org/perf/db_default/v4/nts/3845 bullet suffers a lot. matmul is interesting: slower scalar code, faster with -vectorize. - Some dragonegg selfhost bots crash in SROA during selfhost now http://lab.llvm.org:8011/builders/dragonegg-x86_64-linux-gcc-4.6-self-host-checks/builds/1632 http://lab.llvm.org:8011/builders/dragonegg-x86_64-linux-gcc-4.5-self-host/builds/1891 llvm-svn: 163968	2012-09-15 15:11:10 +00:00
Chandler Carruth	70b44c5ccf	Port the SSAUpdater-based promotion logic from the old SROA pass to the new one, and add support for running the new pass in that mode and in that slot of the pass manager. With this the new pass can completely replace the old one within the pipeline. The strategy for enabling or disabling the SSAUpdater logic is to do it by making the requirement of the domtree analysis optional. By default, it is required and we get the standard mem2reg approach. This is usually the desired strategy when run in stand-alone situations. Within the CGSCC pass manager, we disable requiring of the domtree analysis and consequentially trigger fallback to the SSAUpdater promotion. In theory this would allow the pass to re-use a domtree if one happened to be available even when run in a mode that doesn't require it. In practice, it lets us have a single pass rather than two which was simpler for me to wrap my head around. There is a hidden flag to force the use of the SSAUpdater code path for the purpose of testing. The primary testing strategy is just to run the existing tests through that path. One notable difference is that it has custom code to handle lifetime markers, and one of the tests has been enhanced to exercise that code. This has survived a bootstrap and the test suite without serious correctness issues, however my run of the test suite produced very alarming performance numbers. I don't entirely understand or trust them though, so more investigation is on-going. To aid my understanding of the performance impact of the new SROA now that it runs throughout the optimization pipeline, I'm enabling it by default in this commit, and will disable it again once the LNT bots have picked up one iteration with it. I want to get those bots (which are much more stable) to evaluate the impact of the change before I jump to any conclusions. NOTE: Several Clang tests will fail because they run -O3 and check the result's order of output. They'll go back to passing once I disable it again. llvm-svn: 163965	2012-09-15 11:43:14 +00:00
Chandler Carruth	6ba9824c2b	Actually keep the flag default-off for now. =/ That's what I get for being busy testing this... llvm-svn: 163890	2012-09-14 10:18:54 +00:00
Chandler Carruth	1b398ae0ae	Introduce a new SROA implementation. This is essentially a ground up re-think of the SROA pass in LLVM. It was initially inspired by a few problems with the existing pass: - It is subject to the bane of my existence in optimizations: arbitrary thresholds. - It is overly conservative about which constructs can be split and promoted. - The vector value replacement aspect is separated from the splitting logic, missing many opportunities where splitting and vector value formation can work together. - The splitting is entirely based around the underlying type of the alloca, despite this type often having little to do with the reality of how that memory is used. This is especially prevalent with unions and base classes where we tail-pack derived members. - When splitting fails (often due to the thresholds), the vector value replacement (again because it is separate) can kick in for preposterous cases where we simply should have split the value. This results in forming i1024 and i2048 integer "bit vectors" that tremendously slow down subsequent IR optimizations (due to large APInts) and impede the backend's lowering. The new design takes an approach that fundamentally is not susceptible to many of these problems. It is the result of a discussion between myself and Duncan Sands over IRC about how to preemptively avoid these types of problems and how to do SROA in a more principled way. Since then, it has evolved and grown, but this remains an important aspect: it fixes real world problems with the SROA process today. First, the transform of SROA actually has little to do with replacement. It has more to do with splitting. The goal is to take an aggregate alloca and form a composition of scalar allocas which can replace it and will be most suitable to the eventual replacement by scalar SSA values. The actual replacement is performed by mem2reg (and in the future SSAUpdater). The splitting is divided into four phases. The first phase is an analysis of the uses of the alloca. This phase recursively walks uses, building up a dense datastructure representing the ranges of the alloca's memory actually used and checking for uses which inhibit any aspects of the transform such as the escape of a pointer. Once we have a mapping of the ranges of the alloca used by individual operations, we compute a partitioning of the used ranges. Some uses are inherently splittable (such as memcpy and memset), while scalar uses are not splittable. The goal is to build a partitioning that has the minimum number of splits while placing each unsplittable use in its own partition. Overlapping unsplittable uses belong to the same partition. This is the target split of the aggregate alloca, and it maximizes the number of scalar accesses which become accesses to their own alloca and candidates for promotion. Third, we re-walk the uses of the alloca and assign each specific memory access to all the partitions touched so that we have dense use-lists for each partition. Finally, we build a new, smaller alloca for each partition and rewrite each use of that partition to use the new alloca. During this phase the pass will also work very hard to transform uses of an alloca into a form suitable for promotion, including forming vector operations, speculating loads through PHI nodes and selects, etc. After splitting is complete, each newly refined alloca that is a candidate for promotion to a scalar SSA value is run through mem2reg. There are lots of reasonably detailed comments in the source code about the design and algorithms, and I'm going to be trying to improve them in subsequent commits to ensure this is well documented, as the new pass is in many ways more complex than the old one. Some of this is still a WIP, but the current state is reasonably stable. It has passed bootstrap, the nightly test suite, and Duncan has run it successfully through the ACATS and DragonEgg test suites. That said, it remains behind a default-off flag until the last few pieces are in place, and full testing can be done. Specific areas I'm looking at next: - Improved comments and some code cleanup from reviews. - SSAUpdater and enabling this pass inside the CGSCC pass manager. - Some datastructure tuning and compile-time measurements. - More aggressive FCA splitting and vector formation. Many thanks to Duncan Sands for the thorough final review, as well as Benjamin Kramer for lots of review during the process of writing this pass, and Daniel Berlin for reviewing the data structures and algorithms and general theory of the pass. Also, several other people on IRC, over lunch tables, etc for lots of feedback and advice. llvm-svn: 163883	2012-09-14 09:22:59 +00:00
Hal Finkel	204bf5352a	By default, use Early-CSE instead of GVN for vectorization cleanup. As has been suggested by Duncan and others, Early-CSE and GVN should do similar redundancy elimination, but Early-CSE is much less expensive. Most of my autovectorization benchmarks show a performance regression, but all of these are < 0.1%, and so I think that it is still worth using the less expensive pass. llvm-svn: 154673	2012-04-13 17:15:33 +00:00
Bill Wendling	932b992888	Add an option to turn off the expensive GVN load PRE part of GVN. llvm-svn: 153902	2012-04-02 22:16:50 +00:00
Kostya Serebryany	e505a5abe9	add EP_OptimizerLast extension point llvm-svn: 153353	2012-03-23 23:22:59 +00:00
Hal Finkel	c34e51132c	Add a basic-block autovectorization pass. This is the initial checkin of the basic-block autovectorization pass along with some supporting vectorization infrastructure. Special thanks to everyone who helped review this code over the last several months (especially Tobias Grosser). llvm-svn: 149468	2012-02-01 03:51:43 +00:00
Dan Gohman	b9936296d3	Add a new PassManagerBuilder customization point, EP_ModuleOptimizerEarly, to allow passes to be added before the main ModulePass optimizers. llvm-svn: 148329	2012-01-17 20:51:32 +00:00
Duncan Sands	8fa0b6927d	Remove unused include. llvm-svn: 146037	2011-12-07 17:18:31 +00:00
Kostya Serebryany	dc436f95d2	make asan work at -O0, llvm part. Patch by glider@google.com llvm-svn: 145530	2011-11-30 22:19:26 +00:00
David Chisnall	719a72f34c	Add a mechanism for optimisation plugins to register passes that all front ends can use without needing to be aware of the plugin (or the plugin be aware of the front end). Before 3.0, I'd like to add a mechanism for automatically loading a set of plugins from a config file. API suggestions welcome... llvm-svn: 137717	2011-08-16 13:58:41 +00:00
Rafael Espindola	07f6091527	Add a C interface to PassManagerBuilder. It is missing the addExtension functionality since in the C api a pass is created and added to a pass manager in a single call. llvm-svn: 137159	2011-08-09 22:17:34 +00:00
Rafael Espindola	3ea478b7ac	Move methods in PassManagerBuilder offline. llvm-svn: 136727	2011-08-02 21:50:27 +00:00


134 Commits