llvm-project

Commit Graph

Author	SHA1	Message	Date
Tobias Grosser	ca7f5bb767	Full/partial tile separation for vectorization We isolate full tiles from partial tiles to be able to, for example, vectorize loops with parametric lower and/or upper bounds. If we use -polly-vectorizer=stripmine, we see execution-time improvements: correlation from 1.7361s to 0.5720s (-67.05 %), covariance from 1.5561s to 0.5680s (-63.50 %), ary3 from 2.3201s to 1.2361s (-46.72 %), CrystalMk from 8.5565s to 7.4285s (-13.18 %). The current full/partial tile separation increases compile time more than necessary. As a result, we see compile-time regressions, for example, for 3mm from 0.6320s to 0.9881s (+56.34 %). Some of this compile-time increase is expected, as we generate more IR and consequently more time is spent in the LLVM backends. However, a first investigation has shown that a larger portion of compile time is unnecessarily spent inside Polly's parallelism detection and could be eliminated by propagating existing knowledge about vector loop parallelism. Before enabling -polly-vectorizer=stripmine by default, it is necessary to address this compile-time issue. Contributed-by: Roman Gareev <gareevroman@gmail.com> Reviewers: jdoerfert, grosser Subscribers: grosser, #polly Differential Revision: http://reviews.llvm.org/D13779 llvm-svn: 250809	2015-10-20 09:12:21 +00:00
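A minimal sketch of full/partial tile separation, assuming a 1-D loop with parametric bounds `lb` and `ub` and a tile size of 4 standing in for the vector width. All function names are hypothetical and this is not the code Polly generates; for simplicity the tiles start at `lb` rather than at multiples of the tile size. Full tiles get an inner loop with a constant trip count and no bound checks, which is what makes them easy to vectorize; the leftover iterations form a single partial tile that keeps the original bound.

```cpp
#include <cassert>
#include <vector>

// Hypothetical statement body: double one array element.
static inline void stmt(std::vector<int> &a, long i) { a[i] *= 2; }

// Reference: the original loop with parametric bounds lb and ub.
void original(std::vector<int> &a, long lb, long ub) {
  for (long i = lb; i < ub; ++i)
    stmt(a, i);
}

// Tiled loop (tile size T = 4) with full tiles separated from the partial tile.
void separated(std::vector<int> &a, long lb, long ub) {
  constexpr long T = 4;
  long i = lb;
  // Full tiles: each runs exactly T iterations, so the inner loop has a
  // constant trip count and needs no min/max bound computation.
  for (; i + T <= ub; i += T)
    for (long j = 0; j < T; ++j)
      stmt(a, i + j);
  // Partial tile: the remaining 0 to T - 1 iterations keep the original bound.
  for (; i < ub; ++i)
    stmt(a, i);
}
```

Both functions perform the same updates in the same order; only the separated version exposes a fixed-length inner loop.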
Johannes Doerfert	01978cfa0c	Remove independent blocks pass Polly can now be used as an analysis-only tool as long as code generation is disabled. However, we do not yet have an alternative to the independent blocks pass in place, though in the relevant cases this does not seem to impact performance much. Nevertheless, a virtual alternative that allows the same transformations without changing the input region will follow shortly. llvm-svn: 250652	2015-10-18 12:28:00 +00:00
Tobias Grosser	0e3a6b13a4	Sort includes using 'clang-format -sort-includes' llvm-svn: 250392	2015-10-15 12:17:36 +00:00
Tobias Grosser	f30be2f370	RegisterPasses: Optionally run inliner before Polly This will allow us to optimize C++ template code with Polly. This support is mostly for debugging purposes and individual experiments. The ultimate goal is still to run Polly later in the pass manager, when inlining has already happened. llvm-svn: 250092	2015-10-12 20:03:44 +00:00
Johannes Doerfert	f363ed9804	[NFC] Move helper functions to ScopHelper Helper functions in the BlockGenerators.h/cpp introduce dependences from the frontend to the backend of Polly. As they are used in ScopDetection, ScopInfo, etc. we move them to the ScopHelper file. llvm-svn: 249919	2015-10-09 23:40:24 +00:00
Johannes Doerfert	45be64464b	[NFC] Consistently use commented and annotated ScopPass functions The changes affect methods that are part of the Pass interface and include: - Comments that describe the methods' purpose. - A consistent use of the keywords override and virtual. Additionally, the printScop method is now optional and removed from SCoP passes that do not implement it. llvm-svn: 248685	2015-09-27 15:43:29 +00:00
Johannes Doerfert	0f37630849	[NFC] Use releaseMemory to release internal memory llvm-svn: 248684	2015-09-27 15:42:28 +00:00
Chandler Carruth	66ef16b289	[PM] Update Polly for the new AA infrastructure landed in r247167. llvm-svn: 247198	2015-09-09 22:13:56 +00:00
Tobias Grosser	fa57e9b7e6	Make our data-locality schedule tree transforms externally accessible Other passes which perform different optimizations might be interested in also applying data-locality transformations as part of their overall transformation. llvm-svn: 245824	2015-08-24 06:01:47 +00:00
Tobias Grosser	1ac884d73a	Use marker nodes to annotate the different levels of tiling Currently, marker nodes are ignored during AST generation, but visible in the -debug-only=polly-ast output. llvm-svn: 245809	2015-08-23 09:11:00 +00:00
Tobias Grosser	fc490a99f5	Really do not unroll the vector loop in combination with register tiling The previous commit lacked a test case for register tiling + pre-vectorization, and we obviously got it wrong immediately. llvm-svn: 245599	2015-08-20 19:08:16 +00:00
Tobias Grosser	42e2489553	Add experimental support for trivial register tiling Register tiling in Polly is for now just an additional level of tiling which is fully unrolled. It is disabled by default. To make this useful for more than experiments, we still need a cost function as well as possibly further optimizations that teach LLVM to actually put some of the values we got into scalar registers. llvm-svn: 245564	2015-08-20 13:45:05 +00:00
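As a rough illustration of what register tiling aims for, the sketch below tiles the `(i, j)` loops of a hypothetical matrix multiply by 2x2 and fully unrolls that tile level. It goes one step further than the commit by hand and keeps the four `C` values in local scalars, which is the step the commit hopes LLVM will eventually perform. It assumes row-major `n x n` matrices with even `n`, so no partial tiles occur; all names are illustrative.

```cpp
#include <cassert>
#include <vector>

using Matrix = std::vector<double>; // row-major n x n

// Reference matrix multiply: C += A * B.
void matmul_ref(const Matrix &A, const Matrix &B, Matrix &C, int n) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k)
        C[i * n + j] += A[i * n + k] * B[k * n + j];
}

// Same computation with a 2x2 register tile on (i, j). The tile loops are
// fully unrolled, so the four C values can live in scalar locals across the
// whole k loop. Assumes n is even (no partial tiles).
void matmul_regtile(const Matrix &A, const Matrix &B, Matrix &C, int n) {
  for (int i = 0; i < n; i += 2)
    for (int j = 0; j < n; j += 2) {
      double c00 = C[i * n + j], c01 = C[i * n + j + 1];
      double c10 = C[(i + 1) * n + j], c11 = C[(i + 1) * n + j + 1];
      for (int k = 0; k < n; ++k) {
        double a0 = A[i * n + k], a1 = A[(i + 1) * n + k];
        double b0 = B[k * n + j], b1 = B[k * n + j + 1];
        c00 += a0 * b0; c01 += a0 * b1;
        c10 += a1 * b0; c11 += a1 * b1;
      }
      C[i * n + j] = c00; C[i * n + j + 1] = c01;
      C[(i + 1) * n + j] = c10; C[(i + 1) * n + j + 1] = c11;
    }
}
```

Each `C` element still accumulates over `k` in increasing order, so the results match the reference exactly; without a cost function, the tile size (2x2 here) is an arbitrary choice.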
Tobias Grosser	0483271662	Add support for two-level tiling By default we only use one level of tiling for loops, but in general tiling for multiple levels is trivial for us. Hence, we add a set of options that allow people to play with a second level of tiling. If this is profitable for some cases we can work on heuristics that allow us to identify these cases and use two-level tiling for them. llvm-svn: 245563	2015-08-20 13:45:02 +00:00
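A sketch of two-level tiling on a hypothetical 2-D iteration space: first-level tiles of size `T1 x T1` are split again into second-level tiles of size `T2 x T2`, followed by the point loops. The function only records the visit order, so it can be checked that every point is still visited exactly once. Tile sizes and names are assumptions for illustration, not the defaults or option names Polly uses.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Enumerate the points of an n x m iteration space with two levels of tiling
// and return the order in which (i, j) is visited.
std::vector<std::pair<int, int>> two_level_tiled(int n, int m, int T1, int T2) {
  std::vector<std::pair<int, int>> order;
  for (int ii = 0; ii < n; ii += T1)            // first-level tile loops
    for (int jj = 0; jj < m; jj += T1) {
      const int iEnd1 = std::min(ii + T1, n), jEnd1 = std::min(jj + T1, m);
      for (int i2 = ii; i2 < iEnd1; i2 += T2)   // second-level tile loops
        for (int j2 = jj; j2 < jEnd1; j2 += T2) {
          const int iEnd2 = std::min(i2 + T2, iEnd1);
          const int jEnd2 = std::min(j2 + T2, jEnd1);
          for (int i = i2; i < iEnd2; ++i)      // point loops
            for (int j = j2; j < jEnd2; ++j)
              order.emplace_back(i, j);
        }
    }
  return order;
}
```

The `std::min` clamps handle partial tiles at both levels, so the sketch also works when `n`, `m`, or `T1` are not multiples of the smaller tile size.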
Tobias Grosser	862b9b5239	Factor out check for tileable band node. llvm-svn: 245559	2015-08-20 12:32:45 +00:00
Tobias Grosser	9bdea573bd	Introduce tileBand function to simplify code llvm-svn: 245558	2015-08-20 12:22:37 +00:00
Tobias Grosser	d891b54132	Add some forgotten isl memory annotations llvm-svn: 245557	2015-08-20 12:16:23 +00:00
Tobias Grosser	07c1c2fcc9	Make prevectorization width configurable Polly uses 'prevectorization' to enable outer loop vectorization. When vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop iterations, which are then interchanged to the innermost level such that LLVM's inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this loop. The number of loop iterations to strip-mine is now configurable with the option -polly-prevect-width=<number-of-prevec-dims>. This is mostly a debugging option. We should probably add a heuristic that derives the number of prevectorization dimensions from the target data and the data types used. llvm-svn: 245424	2015-08-19 08:46:11 +00:00
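The strip-mine-and-interchange step can be sketched on a hypothetical nest in which the outer loop `i` is parallel with unit-stride accesses, while the inner loop `j` is a reduction into `s[i]` with stride-`n` accesses. The template parameter `W` plays the role of `-polly-prevect-width`; the names, the row-major `m x n` layout of `A`, and the assumption that `n` is a multiple of `W` are all choices made for this sketch.

```cpp
#include <cassert>
#include <vector>

// Original nest: i is parallel; the innermost loop j accumulates into s[i]
// and walks A with stride n.
void column_sums(const std::vector<int> &A, std::vector<int> &s, int n, int m) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
      s[i] += A[j * n + i];
}

// Prevectorized nest: strip-mine i by W and interchange the resulting point
// loop to the innermost level. The innermost loop now covers W independent
// i values with unit stride, a natural target for an inner-loop vectorizer.
// Assumes n % W == 0.
template <int W>
void column_sums_prevec(const std::vector<int> &A, std::vector<int> &s, int n, int m) {
  for (int ii = 0; ii < n; ii += W)
    for (int j = 0; j < m; ++j)
      for (int i = ii; i < ii + W; ++i)
        s[i] += A[j * n + i];
}
```

The interchange is legal because different `i` iterations are independent; each `s[i]` still sees its `j` contributions in the original order.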
Tobias Grosser	161c9081e5	Do not use negative option name Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling. llvm-svn: 245423	2015-08-19 08:22:06 +00:00
Tobias Grosser	f10f4636ff	Simplify tiling code a bit We only need to allocate the tile size vector if we actually want to perform a tiling. llvm-svn: 245422	2015-08-19 08:03:37 +00:00
Tobias Grosser	77c0f5a3b7	Drop dead and disabled code from IndependentBlocks Since Polly now has support for the code generation of scalar and PHI dependences, this code was unused and is now dropped. llvm-svn: 245284	2015-08-18 09:30:28 +00:00
Tobias Grosser	c5bcf246d1	Fix Polly after SCEV port to new pass manager This fixes compilation after LLVM commit r245193. llvm-svn: 245211	2015-08-17 10:57:08 +00:00
Tobias Grosser	234a48270e	AST Generation Paper published in TOPLAS The July issue of TOPLAS contains a 50-page discussion of the AST generation techniques used in Polly. This discussion gives not only an in-depth description of how we (re)generate an imperative AST from our polyhedral-based mathematical program description, but also gives interesting insights about: - Schedule trees: A tree-based mathematical program description that enables us to perform loop transformations on an abstract level, while issues like the generation of the correct loop structure and loop bounds will be taken care of by our AST generator. - Polyhedral unrolling: We discuss techniques that allow the unrolling of non-trivial loops in the context of parametric loop bounds, complex tile shapes and conditionally executed statements. Such unrolling support enables the generation of predicated code, e.g. in the context of GPGPU computing. - Isolation for full/partial tile separation: We discuss native support for handling full/partial tile separation and, in general, native support for isolation of boundary cases to enable smooth code generation for core computations. - AST generation with modulo constraints: We discuss how modulo mappings are lowered to efficient C/LLVM code. - User-defined constraint sets for run-time checks: We discuss how arbitrary sets of constraints can be used to automatically create run-time checks that ensure a set of constraints actually holds. This feature is very useful to verify at run time various assumptions that have been taken during program optimization. Polyhedral AST generation is more than scanning polyhedra Tobias Grosser, Sven Verdoolaege, Albert Cohen ACM Transactions on Programming Languages and Systems (TOPLAS), 37(4), July 2015 llvm-svn: 245157	2015-08-15 09:34:33 +00:00
Michael Kruse	1d3c9b54fb	Remove leftover comment The function to which this commit applies has been removed in a previous commit. llvm-svn: 244450	2015-08-10 15:07:16 +00:00
Michael Kruse	fd613545cb	[Polly] Remove dead code in IndependentBlocks Summary: The splitExitBlock function is never called. Going to replace its functionality in successive patches that do not modify the IR. Reviewers: grosser Subscribers: pollydev Projects: #polly Differential Revision: http://reviews.llvm.org/D11865 llvm-svn: 244404	2015-08-08 20:31:20 +00:00
Tobias Grosser	b241d928bd	Rewrite getPrevectorMap using schedule tree operations Schedule trees are a lot easier to work with, for both humans and machines. For humans, the more structured schedule representation is easier to reason about. Together with the more abstract isl programming interface, this can result in a lot cleaner code (see this changeset). For machines, the structured schedule and the fact that we now use explicit piecewise affine expressions instead of integer maps make it easier to generate code from this schedule tree. As a result, we can already see a slight compile-time improvement: for 3mm from 0m0.593s to 0m0.551s (-7 %). More importantly, future optimizations such as full/partial tile separation will most likely result in more streamlined generated code. Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243458	2015-07-28 18:03:36 +00:00
Tobias Grosser	2764794ba4	Simplify some isl expressions we use Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org> llvm-svn: 243254	2015-07-26 19:22:35 +00:00
Tobias Grosser	3b10c94062	Prevectorize the schedule of the band (or the point loop in case of tiling) Contributed-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 243214	2015-07-25 12:28:56 +00:00
Tobias Grosser	808cd69a92	Use schedule trees to represent execution order of statements Instead of flat schedules, we now use so-called schedule trees to represent the execution order of the statements in a SCoP. Schedule trees make it a lot easier to analyze, understand and modify properties of a schedule, as specific nodes in the tree can be chosen and possibly replaced. This patch does not yet fully move our DependenceInfo pass to schedule trees, as some additional performance analysis is needed here. (In general, schedule trees should be faster in compile time, as the more structured representation is generally easier to analyze and work with.) We also cannot yet perform the reduction analysis on schedule trees. For more information regarding schedule trees, please see Section 6 of https://lirias.kuleuven.be/handle/123456789/497238 llvm-svn: 242130	2015-07-14 09:33:13 +00:00
Tobias Grosser	af4e809ca6	Remove code for scalar and PHI to array translation This removes old code that has been disabled for several weeks and was hidden behind the flags -disable-polly-intra-scop-scalar-to-array=false and -polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and PHI nodes to single element arrays, as this avoided the need for their special handling in Polly. With Johannes' patches adding native support for such scalar references to Polly, this code is not needed any more. After this commit, both -polly-prepare and -polly-independent are mostly no-ops. Only a couple of simple transformations still remain, but they are scheduled for removal too. Thanks again to Johannes Doerfert for his nice work in making all this code obsolete. llvm-svn: 240766	2015-06-26 07:31:18 +00:00
Michael Kruse	c59f22c556	Update ISL to isl-0.15-3-g532568a This version adds a small integer optimization, but it is not active by default. It will be enabled in a later commit. The schedule-fuse=min/max option has been replaced by the serialize-sccs option. Polly had to be adapted, but keeps the option name polly-opt-fusion=min/max. Differential Revision: http://reviews.llvm.org/D10505 Reviewers: grosser llvm-svn: 240027	2015-06-18 16:45:40 +00:00
Tobias Grosser	97d8745087	Dump YAML schedule tree as properly indented tree in DEBUG output llvm-svn: 238645	2015-05-30 06:46:59 +00:00
Tobias Grosser	3e77d14563	Add indvar pass to canonicalization sequence Running indvar before Polly is useful as this eliminates zexts as they commonly appear when a 32 bit induction variable (type int) was used on a 64 bit system. These zexts confuse our delinearization and prevent for example the successful delinearization of the nussinov kernel in polybench-c-4.1. This fixes http://llvm.org/PR23426 Suggested-by: Xing Su <xsu.llvm@outlook.com> llvm-svn: 238643	2015-05-30 06:16:41 +00:00
Tobias Grosser	b2f399264d	Update isl to 93b8e43d This update brings mostly interface cleanups, but also fixes two bugs in imath (a memory leak, some undefined behavior). llvm-svn: 238422	2015-05-28 13:32:11 +00:00
Tobias Grosser	7c3bad52dd	Use value semantics for list of ScopStmt(s) instead of std::owningptr David Blaikie suggested this as an alternative to the use of owningptr(s) for our memory management, as value semantics allow us to avoid the additional interface complexity caused by owningptr while still providing similar memory consistency guarantees. We could also have used a std::vector, but the use of std::vector would yield possibly changing pointers, which currently causes problems as, for example, the memory accesses carry pointers to their parent statements. Such pointers should not change. Reviewer: jblaikie, jdoerfert Differential Revision: http://reviews.llvm.org/D10041 llvm-svn: 238290	2015-05-27 05:16:57 +00:00
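A minimal sketch of why stable addresses matter, using hypothetical `StmtSketch` and `AccessSketch` types in place of `ScopStmt` and its memory accesses. The commit does not name the container it switched to; `std::list` is used here only as one standard container that stores elements by value and never relocates them.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

struct StmtSketch; // hypothetical stand-in for ScopStmt

// Hypothetical memory access that stores a raw pointer to its parent statement.
struct AccessSketch {
  StmtSketch *Parent;
};

struct StmtSketch {
  std::string Name;
  AccessSketch Access{nullptr};

  explicit StmtSketch(std::string N) : Name(std::move(N)) { Access.Parent = this; }
  // Copying (or moving) would leave Access.Parent pointing at the old object.
  StmtSketch(const StmtSketch &) = delete;
  StmtSketch &operator=(const StmtSketch &) = delete;
};

// Statements are held by value in a node-based container. std::list never
// relocates existing elements, so the Parent pointers stay valid while more
// statements are appended. A std::vector would need to move elements when it
// grows (which this type forbids); for a movable statement type, the old
// Parent pointers would then dangle.
std::list<StmtSketch> buildStmts(int n) {
  std::list<StmtSketch> Stmts;
  for (int i = 0; i < n; ++i)
    Stmts.emplace_back("Stmt" + std::to_string(i));
  return Stmts;
}
```

Moving the whole list (as in the return) transfers its nodes without relocating them, so the back-pointers also survive being returned.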
Tobias Grosser	679dfafd33	Use unique_ptr to clarify ownership of ScopStmt llvm-svn: 238090	2015-05-23 05:14:09 +00:00
Tobias Grosser	ac60f4594f	Enable scalar and PHI code generation for Polly The feature itself has been committed by Johannes in r238070. As this is the way forward, we now enable it to ensure we get test coverage. Thank you Johannes for this nice work! llvm-svn: 238088	2015-05-23 03:34:41 +00:00
Tobias Grosser	1b6ea573f2	Replace low-level constraint building with higher level functions Instead of explicitly building constraints and adding them to our maps we now use functions like map_order_le to add the relevant information to the maps. llvm-svn: 237934	2015-05-21 19:02:44 +00:00
Tobias Grosser	cd524dc51d	Add explicit #includes for used isl features llvm-svn: 236931	2015-05-09 09:36:38 +00:00
Tobias Grosser	ba0d09227c	Sort include directives Upcoming revisions of isl require us to explicitly include header files that were previously included transitively. Before we add them, we sort the existing includes. Thanks to Chandler for sort_includes.py. A simple, but very convenient script. llvm-svn: 236930	2015-05-09 09:13:42 +00:00
Tobias Grosser	5483931117	Rename 'scattering' to 'schedule' In Polly we used both the term 'scattering' and the term 'schedule' to describe the execution order of a statement without actually distinguishing between them. We now uniformly use the term 'schedule' for the execution order. This corresponds to the terminology of isl. History: CLooG introduced the term scattering as the generated code can be used as a sequential execution order (schedule) or as a parallel dimension enumerating different threads of execution (placement). In Polly and/or isl the term placement was never used; we uniformly refer to an execution order as a schedule and only later introduce parallelism. When doing so, we do not talk about specific placement dimensions. llvm-svn: 235380	2015-04-21 11:37:25 +00:00
Tobias Grosser	02cf69a6ed	Make -polly-no-tiling work again llvm-svn: 234125	2015-04-05 21:52:21 +00:00
Tobias Grosser	4f6bceface	Do not scale tile loops We now generate tile loops as: for (int c1 = 0; c1 <= 47; c1 += 1) for (int c2 = 0; c2 <= 47; c2 += 1) for (int c3 = 0; c3 <= 31; c3 += 1) for (int c4 = 0; c4 <= 31; c4 += 4) #pragma simd for (int c5 = c4; c5 <= c4 + 3; c5 += 1) Stmt_for_body3(32 * c1 + c3, 32 * c2 + c5); instead of for (int c1 = 0; c1 <= 1535; c1 += 32) for (int c2 = 0; c2 <= 1535; c2 += 32) for (int c3 = 0; c3 <= 31; c3 += 1) for (int c4 = 0; c4 <= 31; c4 += 4) #pragma simd for (int c5 = c4; c5 <= c4 + 3; c5 += 1) Stmt_for_body3(c1 + c3, c2 + c5); Run-time performance-wise this makes little difference, but this gives a large reduction in compile time (10-30% on 17 LNT benchmarks). Apparently the isl AST generator is not yet very efficient in generating the latter. llvm-svn: 233675	2015-03-31 07:52:36 +00:00
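The two loop nests from the commit message, written as C++ functions that record each `Stmt_for_body3(x, y)` instance as a pair instead of executing a statement. The number of tiles is a parameter (the commit's code corresponds to `tiles = 48`), and `#pragma simd` is left as a comment since it is not standard C++. Because both nests enumerate the same instances in the same order, their recorded sequences compare equal; the change only affects how the tile loop bounds and the statement subscripts are expressed.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Each statement instance Stmt_for_body3(x, y) is recorded as a pair (x, y).
using Instances = std::vector<std::pair<int, int>>;

// New form: the tile loops c1/c2 count tiles with step 1, and the tile size
// 32 appears in the statement arguments.
Instances unscaled(int tiles) {
  Instances out;
  for (int c1 = 0; c1 <= tiles - 1; c1 += 1)
    for (int c2 = 0; c2 <= tiles - 1; c2 += 1)
      for (int c3 = 0; c3 <= 31; c3 += 1)
        for (int c4 = 0; c4 <= 31; c4 += 4)
          for (int c5 = c4; c5 <= c4 + 3; c5 += 1) // the '#pragma simd' loop
            out.emplace_back(32 * c1 + c3, 32 * c2 + c5);
  return out;
}

// Old form: the tile loops step by 32 over the scaled tile origins.
Instances scaled(int tiles) {
  Instances out;
  for (int c1 = 0; c1 <= 32 * tiles - 1; c1 += 32)
    for (int c2 = 0; c2 <= 32 * tiles - 1; c2 += 32)
      for (int c3 = 0; c3 <= 31; c3 += 1)
        for (int c4 = 0; c4 <= 31; c4 += 4)
          for (int c5 = c4; c5 <= c4 + 3; c5 += 1) // the '#pragma simd' loop
            out.emplace_back(c1 + c3, c2 + c5);
  return out;
}
```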
Tobias Grosser	378e003748	Drop libpluto support We do not have buildbots or anything that tests this functionality, hence it most likely bitrots. People interested in using this functionality can always recover it from svn history. llvm-svn: 233570	2015-03-30 17:54:01 +00:00
Tobias Grosser	bbb4cec2e8	Use schedule trees to perform post-scheduling transformations Replacing the old band_tree based code with code that is based on the new schedule tree [1] interface makes applying complex schedule transformations a lot more straightforward. We now do not need to reason about the meaning of flat schedules, but can use a more straightforward tree structure. We do not yet exploit this a lot in the current code, but hopefully we will be able to do so soon. This change also allows us to drop some code, as isl now provides some higher level interfaces to apply loop transformations such as tiling. This change causes some small test case changes as isl uses a slightly different way to perform loop tiling, but no significant functional changes are intended. [1] http://impact.gforge.inria.fr/impact2014/papers/impact2014-verdoolaege.pdf llvm-svn: 232911	2015-03-22 12:06:39 +00:00
Tobias Grosser	442c6ccb8c	Add some missing __isl_give/__isl_keep annotations llvm-svn: 232711	2015-03-19 07:43:35 +00:00
Johannes Doerfert	7e6424ba5a	Create a dependence struct to hold dependence information for a SCoP. The new Dependences struct in the DependenceInfo holds all information that was formerly part of the DependenceInfo. It also provides the same interface for the user to access this information. This is another step to a more general ScopPass interface that does allow multiple SCoPs to be "in flight". llvm-svn: 231327	2015-03-05 00:43:48 +00:00
Johannes Doerfert	f6557f98a2	Rename the Dependences pass to DependenceInfo [NFC] We rename the Dependences pass to DependenceInfo as a first step to a caching pass policy. The new DependenceInfo pass will later provide "Dependences" for a SCoP. To keep consistency the test folder is renamed too. llvm-svn: 231308	2015-03-04 22:43:40 +00:00
Johannes Doerfert	909a3bf21d	[Refactor] Use virtual and override appropriately + Add override for overridden methods. + Remove virtual for methods we do not want to be overridden. llvm-svn: 230898	2015-03-01 18:42:08 +00:00
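A minimal sketch of the convention, using a hypothetical pass hierarchy rather than Polly's actual classes: `override` marks functions that replace a base-class virtual, and functions that subclasses are not meant to replace are left non-virtual.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class, illustrating the convention only.
struct PassSketch {
  virtual ~PassSketch() = default;
  virtual std::string getName() const { return "PassSketch"; }
  // Non-virtual: subclasses are not meant to override this.
  std::string describe() const { return "pass " + getName(); }
};

struct ScopPassSketch : PassSketch {
  // 'override' makes the compiler reject this declaration if it does not
  // match a virtual function in the base class (e.g. a missing 'const').
  std::string getName() const override { return "ScopPassSketch"; }
};
```

Calls through a base-class reference still dispatch to the subclass's `getName`, while `describe` keeps one fixed implementation.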
Johannes Doerfert	3fe584d64f	[Refactor] Add a Scop & as argument to printScop This is the first step in the interface simplification. llvm-svn: 230897	2015-03-01 18:40:25 +00:00
Johannes Doerfert	5079200510	Do some preparation even with scalar and phi modeling enabled llvm-svn: 230790	2015-02-27 20:38:51 +00:00


95 Commits