llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kruse	fa7c8cdfc6	[DeLICM] Make Knowledge::Written an isl::union_map. NFC. The map will later point to a ValInst that is written. llvm-svn: 300208	2017-04-13 16:32:25 +00:00
Tobias Grosser	7b5a4dfd46	Exploit BasicBlock::getModule to shorten code Suggested-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 299914	2017-04-11 04:59:13 +00:00
Tobias Grosser	67726b3260	SAdjust to recent change in constructor definition of AllocaInst llvm-svn: 299913	2017-04-11 04:23:38 +00:00
Matt Arsenault	b3e30c32ce	Update for alloca construction changes llvm-svn: 299905	2017-04-11 00:12:58 +00:00
Roman Gareev	9d4d91ca6a	[FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPattern Use new values of the dimensions during their permutation. llvm-svn: 299663	2017-04-06 17:25:08 +00:00
Roman Gareev	e0d466342b	Restore the initial ordering of dimensions before applying the pattern matching Dimensions of band nodes can be implicitly permuted by the algorithm applied during the schedule generation. For example, in case of the following matrix-matrix multiplication, for (i = 0; i < 1024; i++) for (k = 0; k < 1024; k++) for (j = 0; j < 1024; j++) C[i][j] += A[i][k] * B[k][j]; it can produce the following schedule tree domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }" child: schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] }, { Stmt_for_body6[i0, i1, i2] -> [(i1)] }, { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]" permutable: 1 coincident: [ 1, 1, 0 ] The current implementation of the pattern matching optimizations relies on the initial ordering of dimensions. Otherwise, it can produce the miscompilation (e.g., [1]). This patch helps to restore the initial ordering of dimensions by recreating the band node when the corresponding conditions are satisfied. Refs.: [1] - https://bugs.llvm.org/show_bug.cgi?id=32500 Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D31741 llvm-svn: 299662	2017-04-06 17:09:54 +00:00
Siddharth Bhat	5eeb1dd42e	[Polly] [ScheduleOptimizer] Prevent incorrect tile size computation Because Polly exposes parameters that directly influence tile size calculations, one can setup situations like divide-by-zero. Check against a possible divide-by-zero in getMacroKernelParams and return early. Also assert at the end of getMacroKernelParams that the block sizes computed for matrices are positive (>= 1). Tags: #polly Differential Revision: https://reviews.llvm.org/D31708 llvm-svn: 299633	2017-04-06 08:20:22 +00:00
Tobias Grosser	0d622a4bf9	Update to isl-0.18-417-gb9e7334 This is a regular maintenance update. llvm-svn: 299617	2017-04-06 03:41:47 +00:00
Michael Kruse	895f5d8080	Remove llvm.lifetime.start/end in original region. The current StackColoring algorithm does not correctly handle the situation when some, but not all paths from a BB to the entry node cross a llvm.lifetime.start. According to an interpretation of the language reference at http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic this might be correct, but it would cost too much effort to handle in StackColoring. To be on the safe side, remove all lifetime markers even in the original code version (they have never been copied to the optimized version) to ensure that no path to the entry block will cross a llvm.lifetime.start. The same principle applies to paths the a function return and the llvm.lifetime.end marker, so we remove them as well. This fixes llvm.org/PR32251. Also see the discussion at http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html llvm-svn: 299585	2017-04-05 20:09:59 +00:00
Siddharth Bhat	bcbfdade41	[Polly] [DependenceInfo] change WAR, WAW generation to correct semantics = Change of WAR, WAW generation: = - `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form `sink <- may source <- must source` as a may dependence. - we used to call: ```lang=cpp, name=old-flow-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This caused some WAW dependences to be treated as WAR dependences. - Incorrect semantics. - Now, we call WAR and WAW correctly. == Correct WAW: == ```lang=cpp, name=new-waw-call.cpp Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow); ``` == Correct WAR: == ```lang=cpp, name=new-war-call.cpp Flow = buildFlow(Write, Read, MustaWrite, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow); ``` - We want the "shortest" WAR possible (exact dependences). - We mark all the must-writes as may-source, reads as must-souce. - Then, we ask for must dependence. - This removes all the reads that flow through a must-write before reaching a sink. - Note that we only block ealier writes with must-writes. This is intuitively correct, as we do not want may-writes to block must-writes. - Leaves us with direct (R -> W). - This affects reduction generation since RED is built using WAW and WAR. = New StrictWAW for Reductions: = - We used to call: ```lang=cpp,name=old-waw-war-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This is the right model of WAW we need for reductions, just not in general. - Reductions need to track only strict WAW, without any interfering reductions. = Explanation: Why the new WAR dependences in tests are correct: = - We no longer set WAR = WAR - WAW - Hence, we will have WAR dependences that were originally removed. - These may look incorrect, but in fact make sense. == Code: == ```lang=llvm, name=new-war-dependence.ll ; void manyreductions(long A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: A += 42; ; ``` === WAR dependence: === { S0[1023, 1023] -> S1[0, 0] } - Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences: ```lang=cpp, name=dependence-incorrect, counterexample S0[1023, 1023]: -- tmp = A (load0)-- WAR 2 add = tmp + 42 \| -> A = add (store0) \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = tmp + 42 \| A = add (store1)<- ``` - One may assume that WAR2 hides WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores. - Hence, according to Polly, the code may have looked like this: ```lang=cpp, name=dependence-correct S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = A + 42 \| A = add (store1) <-* ``` - So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model. Reviewers: grosser, Meinersbur tags: #polly Differential revision: https://reviews.llvm.org/D31386 llvm-svn: 299429	2017-04-04 13:08:23 +00:00
Philip Pfaffe	447f175eb5	Fix formatting in LoopGenerators llvm-svn: 299424	2017-04-04 10:22:17 +00:00
Philip Pfaffe	2d950f36ee	[Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers Summary: A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly. This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however. Reviewers: grosser, Meinersbur Reviewed By: grosser Subscribers: nemanjai, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D31653 llvm-svn: 299423	2017-04-04 10:01:53 +00:00
Tobias Grosser	637be04b77	[PerfMonitor] Use Intrinsics::getDeclaration Instead of creating the declaration ourselves, we obtain it directly from the LLVM intrinsic definitions. This addresses a post-review comment for r299359. Suggested-by: Hongzing Zheng <etherzhhb@gmail.com> llvm-svn: 299360	2017-04-03 15:23:08 +00:00
Tobias Grosser	65371af2e1	[CodeGen] Add Performance Monitor Add support for -polly-codegen-perf-monitoring. When performance monitoring is enabled, we emit performance monitoring code during code generation that prints after program exit statistics about the total number of cycles executed as well as the number of cycles spent in scops. This gives an estimate on how useful polyhedral optimizations might be for a given program. Example output: Polly runtime information ------------------------- Total: 783110081637 Scops: 663718949365 In the future, we might also add functionality to measure how much time is spent in optimized scops and how many cycles are spent in the fallback code. Reviewers: bollu,sebpop Tags: #polly Differential Revision: https://reviews.llvm.org/D31599 llvm-svn: 299359	2017-04-03 14:55:37 +00:00
Michael Kruse	6e7854a560	[ScopInfo] Fix typos in option description. llvm-svn: 299356	2017-04-03 12:03:38 +00:00
Tobias Grosser	696a1ee99d	[PollyIRBuilder] Bound size of alias metadata No-alias metadata grows quadratic in the size of arrays involved, which can become very costly for large programs. This commit bounds the number of arrays for which we construct no-alias information to ten. This is conservatively correct, as we just provide less information to LLVM and speeds up the compile time of one of my internal test cases from 'does-not-terminate' to 'finishes-in-less-than-a-minute'. In the future we might try to be more clever here, but this change should provide a good baseline. llvm-svn: 299352	2017-04-03 07:42:50 +00:00
Tobias Grosser	af940ae280	Update to isl-0.18-410-gc253447 This is a regular maintenance update to ensure latest isl changes are tested in our buildbots. llvm-svn: 299350	2017-04-03 06:46:16 +00:00
Huihui Zhang	d6d6a3f2ee	revert test commit r299024 llvm-svn: 299026	2017-03-29 20:23:56 +00:00
Huihui Zhang	9d19e9d232	test commit, add blank line llvm-svn: 299024	2017-03-29 20:10:45 +00:00
Michael Kruse	c3e9c1442d	[ScopInfo] Introduce ScopStmt::contains(BB*). NFC. Provide an common way for testing if a statement contains something for region and block statements. First user is RegionGenerator::addOperandToPHI. Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 298617	2017-03-23 16:12:21 +00:00
Tobias Grosser	1f7e7d3d93	Update to isl-0.18-402-ga30c537 This is a regular maintenance update. llvm-svn: 298595	2017-03-23 13:38:24 +00:00
Michael Kruse	9e4e7b467f	[DeLICM] Add const qualifiers. NFC. llvm-svn: 298546	2017-03-22 20:09:58 +00:00
Michael Kruse	174f483990	[Support] Add functions to ISLTools. Add shiftDim and convertZoneToTimepoints overloads for isl maps. Add distributeDomain, liftDomains and applyDomainRange functions. These are going to be used in https://reviews.llvm.org/D31247 (Add known array contents to Knowledge) llvm-svn: 298543	2017-03-22 19:31:06 +00:00
Michael Kruse	d07d155ebb	[DeLICM] Remove overloaded Knowledge constructor. NFC. The isl C++ bindings now has implicit conversions from isl::set to isl::union_set. Therefore the additional overload accepting isl::set is not required anymore. llvm-svn: 298529	2017-03-22 18:01:23 +00:00
Michael Kruse	29143ec3f7	[DeLICM] Remove AllElements. NFC. It is not used and will not be used (anymore) in future commits. llvm-svn: 298522	2017-03-22 17:18:39 +00:00
Roman Gareev	cdfb57dc46	Introduce another level of metadata to distinguish non-aliasing accesses Introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. It can be used to, for example, distinguish different stores (loads) produced by unrolling of the innermost loops and, subsequently, sink (hoist) them by LICM. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30606 llvm-svn: 298510	2017-03-22 14:25:24 +00:00
Roman Gareev	23df27682a	Map the new load to the base pointer of the invariant load hoisted load Map the new load to the base pointer of the invariant load hoisted load to be able to find the alias information for it. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30605 llvm-svn: 298507	2017-03-22 13:57:53 +00:00
Siddharth Bhat	44b6cb4e63	[DependenceInfo] change name Write to MustWrite to remove ambiguity [NFC] "Write" is an overloaded term. In collectInfo() till buildFlow(), it is used to mean "must writes". However, within the memory based analysis, it is used to mean "both may and must writes". Renaming the Write variable helps clarify this difference. Reviewers: grosser Tags: #polly Differential Revision: https://reviews.llvm.org/D31181 llvm-svn: 298361	2017-03-21 11:54:08 +00:00
Tobias Grosser	29eaa16b7e	Update isl to isl-0.18-395-g77701b3 This is a normal maintenance update. llvm-svn: 298352	2017-03-21 09:12:11 +00:00
Tobias Grosser	b28f86e9e6	[CodeGen] Remove need for all parameters to be in scop context for load hoisting. When not adding constraints on parameters using -polly-ignore-parameter-bounds, the context may not necessarily list all parameter dimensions. To support code generation in this situation, we now always iterate over the actual parameter list, rather than relying on the context to list all parameter dimensions. llvm-svn: 298197	2017-03-18 23:12:49 +00:00
Tobias Grosser	1be726a40d	[IslExprBuilder] Print accessed memory locations with RuntimeDebugBuilder After this change, enabling -polly-codegen-add-debug-printing in combination with -polly-codegen-generate-expressions allows us to instrument the compiled binaries to not only print the values stored and loaded to a given memory access, but also to print the accessed location with array name and per-dimension offset: MemRef_A[3][2] Store to 6299784: 5.000000 MemRef_A[3][3] Load from 6299788: 0.000000 MemRef_A[3][3] Store to 6299788: 6.000000 This can be very helpful for debugging. llvm-svn: 298194	2017-03-18 20:54:43 +00:00
Tobias Grosser	7693b116a1	[OpenMP] Do not emit lifetime markers for context In commit r219005 lifetime markers have been introduced to mark the lifetime of the OpenMP context data structure. However, their use seems incorrect and recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was not at all related to r298053. r298053 only caused a change in the loop order, as this change resulted in a different isl internal representation which caused the scheduler to derive a different schedule. This change then caused the IR to change, which apparently created a pattern in which LLVM exploites the lifetime markers. It seems we are using the OpenMP context outside of the lifetime markers. Even though CrystalMk could probably be fixed by expanding the scope of the lifetime markers, it is not clear what happens in case the OpenMP function call is in a loop which will cause a sequence of starting and ending lifetimes. As it is unlikely that the lifetime markers give any performance benefit, we just drop them to remove complexity. llvm-svn: 298192	2017-03-18 20:10:07 +00:00
Siddharth Bhat	3e4a7d38ab	[ScheduleOptimiser] fix typos in top comment [NFC] coice -> choice Transations -> Transactions llvm-svn: 298095	2017-03-17 14:52:19 +00:00
Michael Kruse	89b1f94e64	Revert "Remove references to AssumptionCache. NFC." The AssumptionCache removal of r289756 has been reverted in r290086/r290087. A different solution has been implemented in r291671 which keeps the AssumptionCache. We can therefore use it again in Polly. This reverts r289791. llvm-svn: 298089	2017-03-17 13:56:53 +00:00
Siddharth Bhat	4fe11cf95f	[DependenceInfo] Remove idempotent union: must-writes with may-writes [NFC] Since may-writes are always a superset of the must-writes, there is no point in taking a union of one with the other. llvm-svn: 298085	2017-03-17 13:26:10 +00:00
Michael Kruse	9b91c62e3a	[ScopInfo/PruneUnprofitable] Move default profitability check. In the previous default ScopInfo applied the profitability heuristic for scalar accesses (-polly-unprofitable-scalar-accs=true) and the -polly-prune-unprofitable was disabled by default (-polly-enable-prune-unprofitable=false) as that pruning was already done. This changes switches the defaults to -polly-unprofitable-scalar-accs=true -polly-enable-prune-unprofitable=false such that the scalar access heuristic check is done by the pass. This allows passes between ScopInfo and PruneUnprofitable to optimize away scalar accesses. Without enabling such intermediate passes, there is no change in behaviour of profitability checks in a PassManagerBuilder built pass chain, but it allows us to cover this configuration with the buildbots. Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 298081	2017-03-17 13:10:05 +00:00
Michael Kruse	f3091bf4cf	[PruneUnprofitable] Add -polly-prune-unprofitable pass. ScopInfo's normal profitability heuristic considers SCoPs where all statements have scalar writes as not profitably optimizable and invalidate the SCoP in that case. However, -polly-delicm and -polly-simplify may be able to remove some of the scalar writes such that the flag -polly-unprofitable-scalar-accs=false allows disabling that part of the heuristic. In cases where DeLICM (or other passes after ScopInfo) are not successful in removing scalar writes, the SCoP is still not profitably optimizable. The schedule optimizer would again try computing another schedule, resulting in slower compilation. The -polly-prune-unprofitable pass applies the profitability heuristic again before the schedule optimizer Polly can still bail out even with -polly-unprofitable-scalar-accs=false. Differential Revision: https://reviews.llvm.org/D31033 llvm-svn: 298080	2017-03-17 13:09:52 +00:00
Tobias Grosser	5842dee251	[ScopInfo] Add option to not add parameter bounds to context [NFC] For experiments it is sometimes helpful to provide parameter bound information to polly and to not use these parameter bounds for simplification. Add a new option "-polly-ignore-parameter-bounds" which does precisely this. llvm-svn: 298077	2017-03-17 13:00:53 +00:00
Siddharth Bhat	db5dd14cbb	[DependenceInfo] Replace use of deprecated isl_dim_n_out [NFC] Change isl_dim_n_out to isl_map_dim(*, isl_dim_out) llvm-svn: 298075	2017-03-17 12:59:01 +00:00
Siddharth Bhat	65f3d5201e	[DependenceInfo] Track may-writes and build flow information in Dependences::calculateDependences. This ensures that we handle may-writes correctly when building dependence information. Also add a test case checking correctness of may-write information. Not handling it before was an oversight. Differential Revision: https://reviews.llvm.org/D31075 llvm-svn: 298074	2017-03-17 12:31:28 +00:00
Tobias Grosser	8a6e605e96	[ScopInfo] Do not take inbounds assumptions [NFC] For experiments it is sometimes helpful to not take any inbounds assumptions. Add a new option "-polly-ignore-inbounds" which does precisely this. llvm-svn: 298073	2017-03-17 12:26:58 +00:00
Tobias Grosser	b58ed8d3cd	[ScopInfo] Do not try to eliminate parameter dimensions that do not exist In subsequent changes we will make Polly a little bit more lazy in adding parameter dimensions to different sets. As a result, not all parameters will always be part of the parameter space. This change ensures that we do not use the '-1' returned when a parameter dimension cannot be found, but instead just do not try to eliminate the anyhow non-existing dimension. llvm-svn: 298054	2017-03-17 09:02:53 +00:00
Tobias Grosser	941cb7d979	[ScopInfo] Do not expand getDomains() to full parameter space. Since several years, isl can perform most operations on sets with differing parameter spaces, by expanding the parameter space on demand relying using named isl ids to distinguish different parameter dimensions. By not always expanding to full dimensionality the set remain smaller and can likely be operated on faster. This change by itself did not yet result in measurable performance benefits, but it is a step into the right direction needed to ensure that subsequent changes indeed can work with lower-dimensional sets and these sets do not get blown up by accident when later intersected with the domain context. llvm-svn: 298053	2017-03-17 09:02:50 +00:00
Tobias Grosser	f4fe34bfb8	Update to isl-0.18-387-g3fa6191 This is a normal / regular maintenance update. llvm-svn: 297999	2017-03-16 21:33:20 +00:00
Siddharth Bhat	65c4026992	Set Dependences::RED to be non-null once Dependences::calculateDependences() occurs, even if there is no actual reduction. This ensures correctness with isl operations. llvm-svn: 297981	2017-03-16 20:06:49 +00:00
Michael Kruse	5545407fa4	[ScopInfo] Introduce ScopStmt::getSurroundingLoop(). NFC. Introduce ScopStmt::getSurroundingLoop() to replace getFirstNonBoxedLoopFor. getSurroundingLoop() returns the precomputed surrounding/first non-boxed loop. Except in ScopDetection, the list of boxed loops is only used to get the surrounding loop. getFirstNonBoxedLoopFor also requires LoopInfo at every use which is not necessarily available everywhere where we may want to use it. Differential Revision: https://reviews.llvm.org/D30985 llvm-svn: 297899	2017-03-15 22:16:43 +00:00
Tobias Grosser	d614b3e6bd	Preserve the isl-noexceptions.h C++ bindings when updating isl The bindings currently need to be generated manually, as they are not yet part of the official isl distribution. Hence, we keep them across updates assuming they only need to be updated when new functions or functionality should be exposed. llvm-svn: 297710	2017-03-14 07:46:28 +00:00
Tobias Grosser	9c19a0e16a	Add back header file that was accidentally dropped in previous update llvm-svn: 297709	2017-03-14 07:39:05 +00:00
Tobias Grosser	593ebdfbd1	Update to isl-0.18-369-g5e613c6 This is a regular maintenance update. llvm-svn: 297708	2017-03-14 07:33:26 +00:00
Tobias Grosser	c9d4cb2f42	[ScheduleOptimizer] Allow tiling after fusion In ScheduleOptimizer::isTileableBand(), allow the case in which the band node's child is an isl_schedule_sequence_node and its grandchildren isl_schedule_leaf_nodes. This case can arise when two or more statements are fused by the isl scheduler. The tile_after_fusion.ll test has two statements in separate loop nests and checks whether they are tiled after being fused when polly-opt-fusion equals "max". Reviewers: grosser Subscribers: gareevroman, pollydev Tags: #polly Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch> Differential Revision: https://reviews.llvm.org/D30815 llvm-svn: 297587	2017-03-12 19:02:31 +00:00
Tobias Grosser	de244eb450	Possible error in doc comment If a SCoP is most probably sequential, then it's better to run it on a CPU. Hence, there's no point in running it on a GPU. Reviewers: grosser Subscribers: nemanjai Tags: #polly Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com> Differential Revision: https://reviews.llvm.org/D30864 llvm-svn: 297578	2017-03-12 08:19:01 +00:00
Tobias Grosser	b2347dc241	[isl++] Add missing /* implicit */ marker llvm-svn: 297577	2017-03-12 08:17:50 +00:00
Tobias Grosser	5ac963743f	[isl++] Add last set of missing isl:: prefixes to increase consistency [NFC] llvm-svn: 297558	2017-03-11 07:58:12 +00:00
Tobias Grosser	d67d368e12	[isl++] Add namespace prefixes to isl::ctx and isl::stat These were missed in r297478. We add them for consistency. llvm-svn: 297520	2017-03-10 22:10:19 +00:00
Tobias Grosser	30a06dce68	[isl++] Drop warning about experimental status As most discussions about these bindings have concluded and only the final patch review on the isl mailing list is missing, we drop the experimental warning tag to match the patchset we will submit to isl, which is expected to not change notably any more. llvm-svn: 297519	2017-03-10 22:10:15 +00:00
Tobias Grosser	9839774e5d	[isl++] Do not use enum prefix Instead of declaring a function as: inline val plain_get_val_if_fixed(enum dim type, unsigned int pos) const; we use: inline isl::val plain_get_val_if_fixed(isl::dim type, unsigned int pos) const; The first argument caused the following compile time error on windows: "error C3431: 'dim': a scoped enumeration cannot be redeclared as an unscoped enumeration" In some cases it is sufficient to just drop the 'enum' prefix, but for example for isl::set the 'enum class dim' type collides with the function name isl::set::dim and can consequently not be referenced. To avoid such kind of ambiguities in the future we add the isl:: prefix consistently to all types used. Reported-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 297478	2017-03-10 17:01:30 +00:00
Michael Kruse	0446d81e2d	[Simplify] Add -polly-simplify pass. This new pass removes unnecessary accesses and writes. It currently supports 2 simplifications, but more are planned. It removes write accesses that write a loaded value back to the location it was loaded from. It is a typical artifact from DeLICM. Removing it will get rid of bogus dependencies later in dependency analysis. It also removes statements without side-effects. ScopInfo already removes these, but the removal of unnecessary writes can result in more side-effect free statements. Differential Revision: https://reviews.llvm.org/D30820 llvm-svn: 297473	2017-03-10 16:05:24 +00:00
Tobias Grosser	3e618c33fe	[DeadCodeElimination] Translate to C++ bindings This pass is a small and self-contained example of a piece of code that was written with the isl C interface. The diff of this change nicely shows how the C++ bindings can improve the readability of the code by avoiding the long C function names and by avoiding any need for memory management. As you will see, no calls to isl__copy or isl__free are needed anymore. Instead the C++ interface takes care of automatically managing the objects. This may introduce internally additional copies, but due to the isl reference counting, such copies are expected to be cheap. For performance critical operations, we will later exploit move semantics to eliminate unnecessary copies that have shown to be costly. Below we give a set of examples that shows the benefit of the C++ interface vs. the pure C interface. Check properties ---------------- Before: if (isl_aff_is_zero(aff) \|\| isl_aff_is_one(aff)) return true; After: if (Aff.is_zero() \|\| Aff.is_one()) return true; Type conversion --------------- Before: isl_union_pw_multi_aff UPMA = isl_union_pw_multi_aff_from_union_map(umap); After: isl::union_pw_multi_aff UPMA = UMap; Type construction ----------------- Before: auto Empty = isl_union_map_empty(space); After: auto Empty = isl::union_map::empty(Space); Operations ---------- Before: set = isl_union_set_intersect(set, set2); After: Set = Set.intersect(Set2); The use of isl::boolean in return types also adds an increases the robustness of Polly, as on conversion to true or false, we verify that no isl_bool_error has been returned and assert in case an error was returned. Before this change we would have just ignored the error and proceeded with (some) exection path. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30619 llvm-svn: 297466	2017-03-10 15:05:38 +00:00
Tobias Grosser	51ebda8c9d	[FlattenAlgo] Translate to C++ bindings Translate the full algorithm to use the new isl C++ bindings This is a large piece of code that has been written with the Polly IslPtr<> memory management tool, which only performed memory management, but did not provide a method interface. As such the code was littered with calls to give(), copy(), keep(), and take(). The diff of this change should give a good example how the new method interface simplifies the code by removing the need for switching between managed types and C functions all the time and consequently also the need to use the long C function names. These are a couple of examples comparing the old IslPtr memory management interface with the complete method interface. Check properties ---------------- Before: if (isl_aff_is_zero(Aff.get()) \|\| isl_aff_is_one(Aff.get())) return true; After: if (Aff.is_zero() \|\| Aff.is_one()) return true; Type conversion --------------- Before: isl_union_pw_multi_aff *UPMA = give(isl_union_pw_multi_aff_from_union_map(UMap.copy()); After: isl::union_pw_multi_aff UPMA = UMap; Type construction ----------------- Before: auto Empty = give(isl_union_map_empty(Space.copy()); After: auto Empty = isl::union_map::empty(Space); Operations ---------- Before: Set = give(isl_union_set_intersect(Set.copy(), Set2.copy()); After: Set = Set.intersect(Set2); Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30617 llvm-svn: 297463	2017-03-10 14:55:58 +00:00
Tobias Grosser	4c24e57965	Add method interface to isl C++ bindings The isl C++ binding method interface introduces a thin C++ layer that allows to call isl methods directly on the memory managed C++ objects. This makes the relevant methods directly available via code-completion interfaces, allows for the use of overloading, conversion constructors, and many other nice C++ features that make using isl a lot easier. The individual features will be highlighted in the subsequent commits. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30616 llvm-svn: 297462	2017-03-10 14:53:00 +00:00
Tobias Grosser	deaef15f52	Introduce isl C++ bindings, Part 1: value_ptr style interface Over the last couple of months several authors of independent isl C++ bindings worked together to jointly design an official set of isl C++ bindings which combines their experience in developing isl C++ bindings. The new bindings have been designed around a value pointer style interface and remove the need for explicit pointer managenent and instead use C++ language features to manage isl objects. This commit introduces the smart-pointer part of the isl C++ bindings and replaces the current IslPtr<T> classes, which served the very same purpose, but had to be manually maintained. Instead, we now rely on automatically generated classes for each isl object, which provide value_ptr semantics. An isl object has the following smart pointer interface: inline set manage(__isl_take isl_set ptr); class set { friend inline set manage(__isl_take isl_set ptr); isl_set ptr = nullptr; inline explicit set(__isl_take isl_set ptr); public: inline set(); inline set(const set &obj); inline set &operator=(set obj); inline ~set(); inline __isl_give isl_set copy() const &; inline __isl_give isl_set copy() && = delete; inline __isl_keep isl_set get() const; inline __isl_give isl_set release(); inline bool is_null() const; } The interface and behavior of the new value pointer style classes is inspired by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which proposes a std::value_ptr, a smart pointer that applies value semantics to its pointee. We currently only provide a limited set of public constructors and instead require provide a global overloaded type constructor method "isl::obj isl::manage(isl_obj )", which allows to convert an isl_set to an isl::set by calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor for unique pointers. The next two functions isl::obj::get() and isl::obj::release() are taken directly from the std::value_ptr proposal: S.get() extracts the raw pointer of the object managed by S. S.release() extracts the raw pointer of the object managed by S and sets the object in S to null. We additionally add std::obj::copy(). S.copy() returns a raw pointer refering to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a functionality commonly needed when interacting directly with the isl C interface where all methods marked with __isl_take require consumable raw pointers. S.is_null() checks if S manages a pointer or if the managed object is currently null. We add this function to provide a more explicit way to check if the pointer is empty compared to a direct conversion to bool. This commit also introduces a couple of polly-specific extensions that cover features currently not handled by the official isl C++ bindings draft, but which have been provided by IslPtr<T> and are consequently added to avoid code churn. These extensions include: - operator bool() : Conversion from objects to bool - construction from nullptr_t - get_ctx() method - take/keep/give methods, which match the currently used naming convention of IslPtr<T> in Polly. They just forward to (release/get/manage). - raw_ostream printers We expect that these extensions are over time either removed or upstreamed to the official isl bindings. We also export a couple of classes that have not yet been exported in isl (e.g., isl::space) As part of the code review, the following two questions were asked: - Why do we not use a standard smart pointer? std::value_ptr was a proposal that has not been accepted. It is consequently not available in the standard library. Even if it would be available, we want to expand this interface with a complete method interface that is conveniently available from each managed pointer. The most direct way to achieve this is to generate a specialiced value style pointer class for each isl object type and add any additional methods to this class. The relevant changes follow in subsequent commits. - Why do we not use templates or macros to avoid code duplication? It is certainly possible to use templates or macros, but as this code is auto-generated there is no need to make writing this code more efficient. Also, most of these classes will be specialized with individual member functions in subsequent commits, such that there will be little code reuse to exploit. Hence, we decided to do so at the moment. These bindings are not yet officially part of isl, but the draft is already very stable. The smart pointer interface itself did not change since serveral months. Adding this code to Polly is against our normal policy of only importing official isl code. In this case however, we make an exception to showcase a non-trivial use case of these bindings which should increase confidence in these bindings and will help upstreaming them to isl. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30325 llvm-svn: 297452	2017-03-10 11:41:03 +00:00
Tobias Grosser	e5671e54c0	Update to isl-0.18-356-g0b05d01 This is a regular maintenance update. llvm-svn: 297449	2017-03-10 09:17:55 +00:00
Michael Kruse	e4292bf086	[Support] Add -polly-dump-module pass. This pass allows writing the LLVM-IR just before and after the Polly passes to a file. Dumping the IR before Polly helps reproducing bugs that occur in code generated by clang. It is the only reliable way to get the IR that triggers a bug. The alternative is to emit the IR with clang -c -emit-llvm -S -o dump.ll then pass it through all optimization passes opt dump.ll -basicaa -sroa ... -S -o optdump.ll to then reproduce the error with opt optdump.ll -polly-opt-isl -polly-codegen -analyze However, the IR is not the same. -O3 uses a PassBuilder than creates passes with different parameters than the default. Dumping the IR after Polly is useful to compare a miscompilation with a known-good configuration. Differential Revision: https://reviews.llvm.org/D30788 llvm-svn: 297415	2017-03-09 22:29:58 +00:00
Tobias Grosser	8bd7f3c0a5	[ScopDetect/Info] Allow unconditional hoisting of loads from dereferenceable ptrs In case LLVM pointers are annotated with !dereferencable attributes/metadata or LLVM can look at the allocation from which a pointer is derived, we can know that dereferencing pointers is safe and can be done unconditionally. We use this information to proof certain pointers as save to hoist and then hoist them unconditionally. llvm-svn: 297375	2017-03-09 11:36:00 +00:00
Michael Kruse	9fb3ab1b19	[DeLICM] Add -polly-delicm-overapproximate-writes option. One of the current limitations of DeLICM is that it only creates PHI WRITEs that it knows are read by some PHI. Such writes may not span all instances of a statement. Polly's code generator currently does not support MemoryAccesses that are not executed in all instances ('partial accesses') and so has to give up on a possible mapping. This workaround has once been suggested by Tobias Grosser: Try to interpolate an arbitrary expansion to all instances. It will be checked for possible conflicts with the existing Knowledge and can be applied if the conflict checking result is that no semantics are changed. Expansion is done by simplifying the mapping by coalescing with the hope that coalescing will find a polyhedral 'rule' of the relevant map. It is then 'gist'-ed using the domain of the relevant instances such that the rule is expanded to the universe and finally intersected with the domain of all statement instances. The expansion makes conflicts become more likely, the found rule may still not encompass all statement instances and the found rule exposes internals of isl's implementation of coalesce and gist. The latter means that the result depends on how much effort the implementation invests into finding a rule which may change between versions of isl. Trivial implementations of gist and coalesce just return the input arguments. A patch that makes codegen support partial accesses is in preparation as well. Differential Revision: https://reviews.llvm.org/D30763 llvm-svn: 297373	2017-03-09 11:23:22 +00:00
Michael Kruse	935b2a3654	[DeadCodeElim] Put -polly-dce-precise-steps into the Polly category. llvm-svn: 297318	2017-03-08 23:25:35 +00:00
Michael Kruse	6744efa8d8	[ScopDetection] Only allow SCoP-wide available base pointers. Simplify ScopDetection::isInvariant(). Essentially deny everything that is defined within the SCoP and is not load-hoisted. The previous understanding of "invariant" has a few holes: - Expressions without side-effects with only invariant arguments, but are defined withing the SCoP's region with the exception of selects and PHIs. These should be part of the index expression derived by ScalarEvolution and not of the base pointer. - Function calls with that are !mayHaveSideEffects() (typically functions with "readnone nounwind" attributes). An example is given below. @C = external global i32 declare float* @getNextBasePtr(float) readnone nounwind ... %ptr = call float @getNextBasePtr(float* %A, float %B) The call might return: * %A, so %ptr aliases with it in the SCoP * %B, so %ptr aliases with it in the SCoP * @C, so %ptr aliases with it in the SCoP * a new pointer everytime it is called, such as malloc() * a pointer into the allocated block of one of the aforementioned * any of the above, at random at each call Hence and contrast to a comment in the base_pointer.ll regression test, %ptr is not necessarily the same all the time. It might also alias with anything and no AliasAnalysis can tell otherwise if the definition is external. It is hence not suitable in the role of a base pointer. The practical problem with base pointers defined in SCoP statements is that it is not available globally in the SCoP. The statement instance must be executed first before the base pointer can be used. This is no problem if the base pointer is transferred as a scalar value between statements. Uses of MemoryAccess::setNewAccessRelation may add a use of the base pointer anywhere in the array. setNewAccessRelation is used by JSONImporter, DeLICM and D28518. Indeed, BlockGenerator currently assumes that base pointers are available globally and generates invalid code for new access relation (referring to the base pointer of the original code) if not, even if the base pointer would be available in the statement. This could be fixed with some added complexity and restrictions. The ExprBuilder must lookup the local BBMap and code that call setNewAccessRelation must check whether the base pointer is available first. The code would still be incorrect in the presence of aliasing. There is the switch -polly-ignore-aliasing to explicitly allow this, but it is hardly a justification for the additional complexity. It would still be mostly useless because in most cases either getNextBasePtr() has external linkage in which case the readnone nounwind attributes cannot be derived in the translation unit itself, or is defined in the same translation unit and gets inlined. Reviewed By: grosser Differential Revision: https://reviews.llvm.org/D30695 llvm-svn: 297281	2017-03-08 15:14:46 +00:00
Michael Kruse	5a4ec5c42b	[ScopDetection] Require LoadInst base pointers to be hoisted. Only when load-hoisted we can be sure the base pointer is invariant during the SCoP's execution. Most of the time it would be added to the required hoists for the alias checks anyway, except with -polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if AliasAnalysis is already sure it doesn't alias with anything (for instance if there is no other pointer to alias with). Two more parts in Polly assume that this load-hoisting took place: - setNewAccessRelation() which contains an assert which tests this. - BlockGenerator which would use to the base ptr from the original code if not load-hoisted (if the access expression is regenerated) Differential Revision: https://reviews.llvm.org/D30694 llvm-svn: 297195	2017-03-07 20:28:43 +00:00
Tobias Grosser	a0b85963ba	Update isl to isl-0.18-336-g1e193d9 This is a regular maintenance update llvm-svn: 297169	2017-03-07 17:53:34 +00:00
Tobias Grosser	ce69e7b593	[ScopInfo] Avoid infinite loop during schedule construction Our current scop modeling enters an infinite loop when trying to model code that has unreachable instructions (e.g., test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks returned by the LLVM Loop* does not include unreachable basic blocks that branch off from the core loop body. This arises for example in the following piece of code: for (i = 0; i < N; i++) { if (i > 1024) abort(); <- this abort might be translated to an unreachable A[i] = ... } This patch adds these unreachable basic blocks in our per loop basic block count to ensure that the schedule construction does not assume a loop has been processed completely, despite certain unreachable basic blocks still remaining. The infinite loop is only observable in combination with https://reviews.llvm.org/D12676 or a similar patch. llvm-svn: 297156	2017-03-07 16:17:55 +00:00
Tobias Grosser	134a572951	[ScopDetection] Do not detect scops that exit to an unreachable Scops that exit with an unreachable are today still permitted, but make little sense to optimize. We therefore can already skip them during scop detection. This speeds up scop detection in certain cases and also ensures that bugpoint does not introduce unreachables when reducing test cases. In practice this change should have little impact, as the performance of unreachable code is unlikely to matter. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297151	2017-03-07 15:50:43 +00:00
Tobias Grosser	1c787e0b49	[ScopDetection] Do not allow required-invariant loads in non-affine region These loads cannot be savely hoisted as the condition guarding the non-affine region cannot be duplicated to also protect the hoisted load later on. Today they are dropped in ScopInfo. By checking for this early, we do not even try to model them and possibly can still optimize smaller regions not containing this specific required-invariant load. llvm-svn: 296744	2017-03-02 12:15:37 +00:00
Tobias Grosser	c2f151084d	[ScopInfo] Disable memory folding in case it results in multi-disjunct relations Multi-disjunct access maps can easily result in inbound assumptions which explode in case of many memory accesses and many parameters. This change reduces compilation time of some larger kernel from over 15 minutes to less than 16 seconds. Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll which has a memory access [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] } which requires folding, but where only a single disjunct remains. We can still model this test case even when only using limited memory folding. For people only reading commit messages, here the comment that explains what memory folding is: To recover memory accesses with array size parameters in the subscript expression we post-process the delinearization results. We would normally recover from an access A[exp0(i) * N + exp1(i)] into an array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the range of exp1(i) - may be preferrable. Specifically, for cases where we know exp1(i) is negative, we want to choose the latter expression. As we commonly do not have any information about the range of exp1(i), we do not choose one of the two options, but instead create a piecewise access function that adds the (-1, N) offsets as soon as exp1(i) becomes negative. For a 2D array such an access function is created by applying the piecewise map: [i,j] -> [i, j] : j >= 0 [i,j] -> [i-1, j+N] : j < 0 After this patch we generate only the first case, except for situations where we can proove the first case to be invalid and can consequently select the second without introducing disjuncts. llvm-svn: 296679	2017-03-01 21:11:27 +00:00
Tobias Grosser	24222c7357	Fix namespaces after clang-format update llvm-svn: 296635	2017-03-01 15:54:27 +00:00
Tobias Grosser	d7c4975349	[ScopInfo] Simplify inbounds assumptions under domain constraints Without this simplification for a loop nest: void foo(long n1_a, long n1_b, long n1_c, long n1_d, long p1_b, long p1_c, long p1_d, float A_1[][p1_b][p1_c][p1_d]) { for (long i = 0; i < n1_a; i++) for (long j = 0; j < n1_b; j++) for (long k = 0; k < n1_c; k++) for (long l = 0; l < n1_d; l++) A_1[i][j][k][l] += i + j + k + l; } the assumption: n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or (n1_a > 0 and n1_b > 0 and n1_c <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d) is taken rather than the simpler assumption: p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d. The former is less strict, as it allows arbitrary values of p1_* in case, the loop is not executed at all. However, in practice these precise constraints explode when combined across different accesses and loops. For now it seems to make more sense to take less precise, but more scalable constraints by default. In case we find a practical example where more precise constraints are needed, we can think about allowing such precise constraints in specific situations where they help. This change speeds up the new test case from taking very long (waited at least a minute, but it probably takes a lot more) to below a second. llvm-svn: 296456	2017-02-28 09:45:54 +00:00
Tobias Grosser	cf66ea3845	Update isl to isl-0.18-304-g1efe43d This is a normal maintenance update. llvm-svn: 296441	2017-02-28 07:06:06 +00:00
Michael Kruse	6469380daa	[Cmake] Optionally use a system isl version. This patch adds an option to build against a version of libisl already installed on the system. The installation is autodetected using the pkg-config file shipped with isl. The detection of the library is in the FindISL.cmake module that creates an imported target. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D30043 llvm-svn: 296361	2017-02-27 17:54:25 +00:00
Michael Kruse	b295c37a15	[DeLICM] Statistics for use in regression tests. Print some measurements of the DeLICM transformation at -analyze to be used in regression tests. llvm-svn: 296347	2017-02-27 15:53:13 +00:00
Roman Gareev	bc3fbe49c5	Disable the parallel code generation in case of extension nodes We can not perform the dependence analysis and, consequently, the parallel code generation in case the schedule tree contains extension nodes. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30394 llvm-svn: 296325	2017-02-27 08:03:11 +00:00
Michael Kruse	e199f285b0	[DeLICM] Fortify against exceeding isl's max operations counter. Control flow would flow-through after the check whether the operations quota exceeded, with the intention that it would later be caught by Knowledge::isUsable(). However, the Knowledge constructor has its own assertions to check consistency which would fail if its fields have only been initialized partially because some sets have been computed correctly before the operations quota takes effect. Fix by erroring-out early instead of falling-throught into the code that might expect that everything has been computed correctly. For robustness, also bail-out if any of the fields contain nullptr values instead of relying on isl always setting exactly this error code if something went wrong. This should fix the perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable (-polly-process-unprofitable -polly-position=before-vectorizer -polly-enable-delicm) buildbot. llvm-svn: 296022	2017-02-23 21:58:20 +00:00
Michael Kruse	f4e201e09f	[Support] Remove NonowningIslPtr. NFC. NonowningIslPtr<isl_X> was used as types of function parameters when the function does not consume the isl object, i.e. an __isl_keep parameter. The alternatives are: 1. IslPtr<isl_X> This has additional calls to isl_X_copy and isl_X_free to increase/decrease the reference counter even though not needed. The caller already owns a reference to the isl object. 2. const IslPtr<isl_X>& This does not change the reference counter, but requires an additional load to get the pointer to the isl object (instead of just passing the pointer itself). Moreover, the compiler cannot rely on the constness of the pointer and has to reload the pointer every time it writes to memory (unless alias analysis such as TBAA says it is not possible). The isl C++ bindings currently in development do not have an equivalent to NonowningIslPtr and adding one would make the binding more complicated and its advantage in performance is small. In order to simplify the transition to these C++ bindings, remove NonowningIslPtr. Change every former use of it to alternative 2 mentioned aboce (const IslPtr<isl_X>&). llvm-svn: 295998	2017-02-23 17:57:27 +00:00
Michael Kruse	2c7169d00c	[DependenceInfo] Remove unused variable. NFC. llvm-svn: 295987	2017-02-23 15:41:01 +00:00
Michael Kruse	dd6f29375b	[DependenceInfo] Use references instead of double pointers. NFC. Non-const references are the more C++-ish way to modify a variable passed by the caller. llvm-svn: 295986	2017-02-23 15:40:56 +00:00
Michael Kruse	ec8fc32160	[DependenceInfo] Rename StmtScheduleDomain -> TaggedStmtDomain. NFC. llvm-svn: 295985	2017-02-23 15:40:52 +00:00
Michael Kruse	00c38e0df2	[DependenceInfo] Simplify use of StmtSchedule's domain [NFC] Once a StmtSchedule is created, only its domain is used anywhere within DependenceInfo::calculateDependences. So, we choose to return the wrapped domain of the union_map rather than the entire union_map. However, we still build the union_map first within collectInfo(). It is cleaner to first build the entire union_map and then pull the domain out in one shot, rather than repeatedly extracting the domain in bits and pieces from accdom. Contributed-by: Siddharth Bhat <siddu.druid@gmail.com> Differential Revision: https://reviews.llvm.org/D30208 llvm-svn: 295984	2017-02-23 15:40:46 +00:00
Michael Kruse	52ab4943b4	Remove all references to PostDominators. NFC. Marking a pass as preserved is necessary if any Polly pass uses it, even if it is not preserved within the generated code. Not marking it would cause the the Polly pass chain to be interrupted. It is not used by any Polly pass anymore, hence we can remove all references to it. llvm-svn: 295983	2017-02-23 15:16:22 +00:00
Michael Kruse	9f519714b3	[DeLICM] Add missing Doxygen comment. NFC. llvm-svn: 295978	2017-02-23 14:51:50 +00:00
Michael Kruse	311ecb00dc	[DeLICM] Capitalize parameter name. NFC. llvm-svn: 295977	2017-02-23 14:51:45 +00:00
Tobias Grosser	59d23bbdc6	Update isl to isl-0.18-282-g12465a5 Besides a variety of smaller cleanups, this update also contains a correctness fix to isl coalesce which resolves a crash in Polly. llvm-svn: 295966	2017-02-23 12:48:42 +00:00
Roman Gareev	96e1119a96	Make optimizations based on pattern matching be enabled by default Currently, pattern based optimizations of Polly can identify matrix multiplication and optimize it according to BLIS matmul optimization pattern (see ScheduleTreeOptimizer for details). This patch makes optimizations based on pattern matching be enabled by default. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30293 llvm-svn: 295958	2017-02-23 11:44:12 +00:00
Michael Kruse	d8d32bb3d1	[DeLICM] Regression test for skipping map targets. Add optimization-remarks-missed for when mapping targets have been skipped and add regression tests for them. llvm-svn: 295953	2017-02-23 10:25:20 +00:00
Michael Kruse	deb30e8278	[DeLICM] Add regression tests for DeLICM reject cases. These tests were not included in the main DeLICM commit. These check the cases where zone analysis cannot be successful because of assumption violations. We use the LLVM optimization remark infrastructure as it seems to be the best fit for this kind of messages. I tried to make use if the OptimizationRemarkEmitter. However, it would insert additional function passes into the pass manager to get the hotness information. The pass manager would insert them between the flatten pass and delicm, causing the ScopInfo with the flattened schedule being thrown away. Differential Revision: https://reviews.llvm.org/D30253 llvm-svn: 295846	2017-02-22 15:14:08 +00:00
Michael Kruse	8474470500	[DeLICM] Fix wrong comment. NFC. Correct a comment that claimed that a store after load was detected when the code checks a load after a store. llvm-svn: 295835	2017-02-22 14:14:40 +00:00
Michael Kruse	43ed25f1d9	[DeLICM] Print message when zone analysis is not available on -analysis. This is to distinguish the cases that analysis has failed from the case where not transformation was performed. llvm-svn: 295833	2017-02-22 13:48:35 +00:00
Michael Kruse	91cdafb86f	[DeLICM] Use opt<int>. There is no template specialization for cl::parser<unsigned long> such that parsing an cl::opt<unsigned long> command line argument will fail. Use opt<int> instead which has an associated parser. llvm-svn: 295832	2017-02-22 13:48:18 +00:00
Tobias Grosser	cc43087afc	[DependenceInfo] Simplify creation and subsequent use of AccessSchedule [NFC] We only ever use the wrapped domain of AccessSchedule, so stop creating an entire union_map and then pulling the domain out. Reviewers: grosser Tags: #polly Contributed-by: Siddharth Bhat <siddu.druid@gmail.com> Differential Revision: https://reviews.llvm.org/D30179 llvm-svn: 295726	2017-02-21 15:38:31 +00:00
Michael Kruse	9e52c39f0a	[DeLICM] Map values hoisted by LICM back to the array. Implement the -polly-delicm pass. The pass intends to undo the effects of LoopInvariantCodeMotion (LICM) which adds additional scalar dependencies into SCoPs. DeLICM will try to map those scalars back to the array elements they were promoted from, as long as the array element is unused. The is the main patch from the DeLICM/DePRE patch series. It does not yet undo GVN PRE for which additional information about known values is needed and does not handle PHI write accesses that have have no target. As such its usefulness is limited. Patches for these issues including regression tests for error situatons will follow. Reviewers: grosser Differential Revision: https://reviews.llvm.org/D24716 llvm-svn: 295713	2017-02-21 10:20:54 +00:00
Michael Kruse	5ab24fdb73	[Cmake] Install the isl headers into the install tree. isl headers are currently missing in a Polly installation. Because the Polly headers depend on those, code can't be compiled against an installed Polly. This patch installs the isl headers. I left a TODO, as optionally it should be possible to use a system version of isl instead of the one shipped with Polly. When compiling, clients of the installation need to add -I${PREFIX}/include/polly/ to there include path right now, because there currently is no way to export this path automatically. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D29931 llvm-svn: 295671	2017-02-20 16:57:14 +00:00
Tobias Grosser	079d511891	[ScopInfo] Count read-only arrays when computing complexity of alias check Instead of counting the number of read-only accesses, we now count the number of distinct read-only array references when checking if a run-time alias check may be too complex. The run-time alias check is quadratic in the number of base pointers, not the number of accesses. Before this change we accidentally skipped SPEC's lbm test case. llvm-svn: 295567	2017-02-18 20:51:29 +00:00
Tobias Grosser	28492b85e2	[DependenceInfo] Pull out statement [NFC] This simplifies the code slightly. llvm-svn: 295551	2017-02-18 16:41:28 +00:00
Tobias Grosser	8ee46985d2	[Dependences] Compute reduction dependences on schedule tree [NFC] This change gets rid of the need for zero padding, makes the reduction computation code more similar to the normal dependence computation, and also better documents what we do at the moment. Making the dependence computation for reductions a little bit easier to understand will hopefully help us to further reduce code duplication. This reduces the time spent only in the reduction dependence pass from 260ms to 150ms for test/DependenceInfo/reduction_sequence.ll. This is a reduction of over 40% in dependence computation time. This change was inspired by discussions with Michael Kruse, Utpal Bora, Siddharth Bhat, and Johannes Doerfert. It can hopefully lay the base for further cleanups of the reduction code. llvm-svn: 295550	2017-02-18 16:39:04 +00:00
Tobias Grosser	2461021150	Drop leftover debug statement llvm-svn: 295444	2017-02-17 13:39:45 +00:00
Tobias Grosser	cd01a363d6	[ScopInfo] Add statistics to count loops after scop modeling llvm-svn: 295431	2017-02-17 08:12:36 +00:00
Tobias Grosser	65ce9362b8	[ScopDetection] Compute the maximal loop depth correctly Before this change, we obtained loop depth numbers that were deeper then the actual loop depth. llvm-svn: 295430	2017-02-17 08:08:54 +00:00
Tobias Grosser	72745c2ef5	Updated isl to isl-0.18-254-g6bc184d This update includes a couple more coalescing changes as well as a large number of isl-internal code cleanups (dead assigments, ...). llvm-svn: 295419	2017-02-17 05:11:16 +00:00
Tobias Grosser	ca2cfd0bd8	[ScopInfo] Do not try to fold array dimensions of size zero Trying to fold such kind of dimensions will result in a division by zero, which crashes the compiler. As such arrays are likely to invalidate the scop anyhow (but are not illegal in LLVM-IR), there is no point in trying to optimize the array layout. Hence, we just avoid the folding of constant dimensions of size zero. llvm-svn: 295415	2017-02-17 04:48:52 +00:00
Tobias Grosser	90411a967b	[ScopInfo] Rename MaxDisjunctions -> MaxDisjuncts [NFC] There is only a single disjunction. However, we bound the number of 'disjuncts' in this disjunction. Name the variable accordingly. llvm-svn: 295362	2017-02-16 19:11:33 +00:00
Tobias Grosser	c8a8276710	[ScopInfo] Bound the number of disjuncts in context Before this change wrapping range metadata resulted in exponential growth of the context, which made context construction of large scops very slow. Instead, we now just do not model the range information precisely, in case the number of disjuncts in the context has already reached a certain limit. llvm-svn: 295360	2017-02-16 19:11:25 +00:00
Tobias Grosser	98a3aa4f19	[ScopInfo] Use uppercase variable name [NFC] llvm-svn: 295350	2017-02-16 18:39:18 +00:00
Tobias Grosser	3281f601bb	[ScopInfo] Always derive upper and lower bounds for parameters Commit r230230 introduced the use of range metadata to derive bounds for parameters, instead of just looking at the type of the parameter. As part of this commit support for wrapping ranges was added, where the lower bound of a parameter is larger than the upper bound: { 255 < p \|\| p < 0 } However, at the same time, for wrapping ranges support for adding bounds given by the size of the containing type has acidentally been dropped. As a result, the range of the parameters was not guaranteed to be bounded any more. This change makes sure we always add the bounds given by the size of the type and then additionally add bounds based on signed wrapping, if available. For a parameter p with a type size of 32 bit, the valid range is then: { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) } llvm-svn: 295349	2017-02-16 18:39:14 +00:00
Roman Gareev	4eb07e481e	[FIX] Fix the typo in ScheduleOptimizer.cpp. llvm-svn: 295292	2017-02-16 07:04:41 +00:00
Michael Kruse	e23e94a08d	[DeLICM] Add Knowledge class. NFC. The Knowledge class remembers the state of data at any timepoint of a SCoP's execution. Currently, it tracks whether an array element is unused or is occupied by some value, and the writes to it. A future addition will be to also remember which value it contains. Objects are used to determine whether two Knowledge contain conflicting information, i.e. two states cannot be true a the same time. This commit was extracted from the DeLICM algorithm at https://reviews.llvm.org/D24716. llvm-svn: 295197	2017-02-15 16:59:10 +00:00
Tobias Grosser	288c450cf6	[ScopDetectDiagnostics] Do not format unnamed array names Formatting unnamed array names is expensive in LLVM as the this requires deriving the numbered virtual instruction name (e.g., %12) for an llvm::Value, which is currently not implemented efficiently. As instruction numberes anyhow do not really carry a lot of information for the user, we just print 'unknown' instead. This change reduces the scop detection time from 24 to 19 seconds, for one of our large-scale inputs. This is a reduction by 21%. llvm-svn: 294894	2017-02-12 10:53:02 +00:00
Tobias Grosser	9fe37df27c	[ScopDetection] Add statistics to count the maximal number of scops in loop llvm-svn: 294893	2017-02-12 10:52:57 +00:00
Tobias Grosser	b3a85884f7	Do not use wrapping ranges to bound non-affine accesses When deriving the range of valid values of a scalar evolution expression might be a range [12, 8), where the upper bound is smaller than the lower bound and where the range is expected to possibly wrap around. We theoretically could model such a range as a union of two non-wrapping ranges, but do not do this as of yet. Instead, we just do not derive any bounds. Before this change, we could have obtained bounds where the maximal possible value is strictly smaller than the minimal possible value, which is incorrect and also caused assertions during scop modeling. llvm-svn: 294891	2017-02-12 08:11:12 +00:00
Roman Gareev	b196055c0c	Check reduction dependencies in case of the matrix multiplication optimization To determine parameters of the matrix multiplication, we check RAW dependencies that can be expressed using only reduction dependencies. Consequently, we should check the reduction dependencies, if this is the case. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Sven Verdoolaege <skimo-polly@kotnet.org> Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29814 llvm-svn: 294836	2017-02-11 09:59:09 +00:00
Roman Gareev	de69293b01	[FIX] Fix the potential issue of containsOnlyMatMulDep. llvm-svn: 294835	2017-02-11 09:48:09 +00:00
Roman Gareev	5ef7e210c0	[NFC] Fix the style issue of lib/Transform/ScheduleOptimizer.cpp. llvm-svn: 294834	2017-02-11 08:43:41 +00:00
Roman Gareev	afcf026d81	[NFC] Fix style issues of lib/Transform/ScheduleOptimizer.cpp. llvm-svn: 294831	2017-02-11 07:14:37 +00:00
Roman Gareev	3d4eae31ea	Use the size of the widest type of the matrix multiplication operands The size of the operands type is the one of the parameters required to determine the BLIS micro-kernel. We get the size of the widest type of the matrix multiplication operands in case there are several different types. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29269 llvm-svn: 294828	2017-02-11 07:00:05 +00:00
Tobias Grosser	296fe2e2ad	[ScopInfo] Use original base address when building ScopArrayInfo [NFC] This change clarfies that we want to indeed use the original base address when creating the ScopArrayInfo that corresponds to a given memory access. This change prepares for https://reviews.llvm.org/D28518. llvm-svn: 294734	2017-02-10 10:09:46 +00:00
Tobias Grosser	5db171a9da	[ScopInfo] Use getAccessValue to obtain the accessed value This replaces the use of getOriginalAddrPtr, a value that is stored in ScopArrayInfo and might at some point not be unique any more. However, the access value is defined to be unique. This change is an update on r294576, which only clarified that we need the original memory access, but where we still remained dependent to have one base pointer per scop. This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294733	2017-02-10 10:09:44 +00:00
Tobias Grosser	583be06fb2	[BlockGenerator] Use MemoryAccess::getAccessValue to get load instruction When generating code in the BlockGenerator we copy all (interesting) instructions and keep track of the new values in a basic block map. To obtain the original llvm::Value that belongs to a load memory access, we use getAccessValue() instead of getOriginalBaseAddr(). The former always references the instruction we use to load values from. The latter, on the other hand, is obtaine from the corresponding ScopArrayInfo and would not be unique in case ScopArrayInfo objects at some point allow memory accesses with different base addresses. This change is an update on r294566, which only clarified that we need the original memory access, but where we still remained dependent to have one base pointer per scop. This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294669	2017-02-09 23:54:23 +00:00
Tobias Grosser	e24b7b929d	[ScopInfo] Use MemoryAccess::getScopArrayInfo() interface to access Array [NFC] By using the public interface MemoryAccess::getScopArrayInfo() we avoid the direct access to the ScopArrayInfoMap and as a result also do not need to use the BasePtr as key. This change makes the code cleaner. The const-cast we introduce is a little ugly. We may consider to drop const correctness for getScopArrayInfo() at some point. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294655	2017-02-09 23:24:57 +00:00
Tobias Grosser	9c7d181c92	[ScopInfo] Use types instead of 'auto' and use more descriptive variable names [NFC] LLVM's coding conventions suggest to use auto only in obvious cases. Hence, we move this code to actually declare the types used. We also replace the variable name 'SAI', with the name 'Array', as this improves readability. llvm-svn: 294654	2017-02-09 23:24:54 +00:00
Tobias Grosser	889830b1c5	[ScopInfo] Use ScopArrayInfo instead of base address When building alias groups, we sort different ScopArrays into unrelated groups. Historically we identified arrays through their base pointer, as no ScopArrayInfo class was yet available. This change changes the alias group construction to reference arrays through their ScopArrayInfo object. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294649	2017-02-09 23:12:22 +00:00
Tobias Grosser	be372d5a04	[ScopInfo] Expect the OriginalBaseAddr when looking at underlying instructions [NFC] During SCoP construction we sometimes inspect the underlying IR by looking at the base address of a MemoryAccess. In such cases, we always want the original base address. Make this clear by calling getOriginalBaseAddr(). This is a non-functional change as getBaseAddr maps to getOriginalBaseAddr at the moment. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294576	2017-02-09 10:11:58 +00:00
Tobias Grosser	e0e0e4d4f6	[ScopInfo] Remove unnecessary indirection through SCEV [NFC] The base address of a memory access is already an llvm::Value. Hence, there is no need to go through SCEV, but we can directly work with the llvm::Value. Also use 'Value *' instead of 'auto' for cases where the type is not obvious. llvm-svn: 294575	2017-02-09 09:34:46 +00:00
Tobias Grosser	4553463be4	[IRBuilder] Extract base pointers directly from ScopArray Instead of iterating over statements and their memory accesses to extract the set of available base pointers, just directly iterate over all ScopArray objects. This reflects more the actual intend of the code: collect all arrays (and their base pointers) to emit alias information that specifies that accesses to different arrays cannot alias. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294574	2017-02-09 09:34:42 +00:00
Tobias Grosser	26fb7d7517	[IslAst] Print the ScopArray name to mark reductions Before this change we used the name of the base pointer to mark reductions. This is imprecise as the canonical reference is the ScopArray itself and not the basepointer of a reduction. Using the base pointer of reductions is problematic in cases where a single ScopArray is referenced through two different base pointers. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294568	2017-02-09 08:06:15 +00:00
Tobias Grosser	114f6d6ff5	[DependenceInfo] Use ScopArrayInfo to keep track of arrays [NFC] When computing reduction dependences we first identify all ScopArrays which are part of reductions and then only compute for these ScopArrays the more detailed data dependences that allow us to identify reductions and optimize across them. Instead of using the base pointer as identifier of a ScopArray, it is clearer and more understandable to directly use the ScopArray as identifier. This change implements such a switch. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294567	2017-02-09 08:06:05 +00:00
Tobias Grosser	02400a0e0c	[BlockGenerator] BBMap uses original BaseAddress for scalar loads [NFC] When regenerating code in the BlockGenerator we copy instructions that may references scalar values, for which the new value of a given scalar is looked up in BBMap using the original scalar llvm::Value as index. It is consequently necessary that (re)loaded scalar values are made available in BBMap using the original llvm::Value as key independently if the llvm::Value was (re)loaded from the original scalar or a new access function has been specified that caused the value to be reloaded from an array with a differnet base address. We make this clear by using MemoryAccess::getOriginalBaseAddr() instead of MemoryAccess::getBaseAddr() as index to BBMap. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294566	2017-02-09 08:05:50 +00:00
Roman Gareev	9989088ee9	Isolate a set of partial tile prefixes in case of the matrix multiplication optimization Isolate a set of partial tile prefixes to allow hoisting and sinking out of the unrolled innermost loops produced by the optimization of the matrix multiplication. In case it cannot be proved that the number of loop iterations can be evenly divided by tile sizes and we tile and unroll the point loop, the isl generates conditional expressions. Subsequently, the conditional expressions can prevent stores and loads of the unrolled loops from being sunk and hoisted. The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr iterations of the two innermost loops, the result of the loop tiling performed by the matrix multiplication optimization, where Mr and Mr are parameters of the micro-kernel. This helps to get rid of the conditional expressions of the unrolled innermost loops. Probably this approach can be replaced with padding in future. In case of, for example, the gemm from Polybench/C 3.2 and parametric loop bounds, it helps to increase the performance from 7.98 GFlops (27.71% of theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we get the same performance as in case of scalar loops bounds. It also cause compile time regression. The compile-time is increased from 0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222 seconds to 1.490 seconds in case of parametric loops bounds. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29244 llvm-svn: 294564	2017-02-09 07:10:01 +00:00
Roman Gareev	772498dc68	[NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate option, because standardBandOpts could try to tile a band node with anchored subtree and get the error, since the use of the isolate option causes any tree containing the node to be considered anchored. Furthermore, it is not intended to apply standard optimizations, when the matrix multiplication has been detected. llvm-svn: 294444	2017-02-08 13:29:06 +00:00
Michael Kruse	49c21222a0	[External] Move lib/JSON to lib/External/JSON. NFC. For consistency with isl and ppcg which are already in lib/External. llvm-svn: 294126	2017-02-05 15:26:56 +00:00
Michael Kruse	acb08aaed5	[Support] Add convertZoneToTimepoints. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). In contrast to computeReachingWrite and computeArrayUnused, convertZoneToTimepoints implies a format for zones (ranges between timepoints). Zones at the moment are unique to DeLICM, but convertZoneToTimepoints makes most sense in conjunction with the previous two functions. llvm-svn: 294094	2017-02-04 15:42:17 +00:00
Michael Kruse	ec67d36493	[Support] Add computeArrayUnused. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). llvm-svn: 294093	2017-02-04 15:42:10 +00:00
Michael Kruse	f4dc133e69	[Support] Add computeReachingWrite. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). llvm-svn: 294092	2017-02-04 15:42:01 +00:00
Michael Kruse	eeadf31de1	[Support] Remove unused function hasInvokeEdge. NFC. llvm-svn: 294062	2017-02-03 22:53:10 +00:00
Roman Gareev	98075fe181	A new algorithm for identification of a SCoP statement that implement a matrix multiplication The current identification of a SCoP statement that implement a matrix multiplication does not help to identify different permutations of loops that contain it and check for dependencies, which can prevent it from being optimized. It also requires external determination of the operands of the matrix multiplication. This patch contains the implementation of a new algorithm that helps to avoid these issues. It also modifies the test cases that generate matrix multiplications with linearized accesses, because the new algorithm does not support them. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28357 llvm-svn: 293890	2017-02-02 14:23:14 +00:00
Tobias Grosser	ff40087a6a	Update to recent formatting changes llvm-svn: 293756	2017-02-01 10:12:09 +00:00
Daniel Jasper	baaa152294	Fix format after recent clang-format change. llvm-svn: 293753	2017-02-01 09:31:42 +00:00
Roman Gareev	7758a2af53	Update the documentation on how the packing transformation is implemented Add a simple example to update the documentation on how the packing transformation is implemented. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D28021 llvm-svn: 293429	2017-01-29 10:37:50 +00:00
Tobias Grosser	682c51143d	[BlockGenerator] Comment corretions for r293374 [NFC] This addresses some additional comments from Michael Kruse for commit r293374 as expressed in https://reviews.llvm.org/D28901. llvm-svn: 293378	2017-01-28 11:39:02 +00:00
Tobias Grosser	587f1f57ad	[Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap Instead of keeping two separate maps from Value to Allocas, one for MemoryType::Value and the other for MemoryType::PHI, we introduce a single map from ScopArrayInfo to the corresponding Alloca. This change is intended, both as a general simplification and cleanup, but also to reduce our use of MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure we have only a single place where the array (and its base pointer) for which we generate code for is specified, which means we can more easily introduce new access functions that use a different ScopArrayInfo as base. We already today experiment with modifiable access functions, so this change does not address a specific bug, but it just reduces the scope one needs to reason about. Another motivation for this patch is https://reviews.llvm.org/D28518, where memory accesses with different base pointers could possibly be mapped to a single ScopArrayInfo object. Such a mapping is currently not possible, as we currently generate alloca instructions according to the base addresses of the memory accesses, not according to the ScopArrayInfo object they belong to. By making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo object will automatically mean that the same stack slot is used for these arrays. For D28518 this is not a problem, as only MemoryType::Array objects are mapping, but resolving this inconsistency will hopefully avoid confusion. llvm-svn: 293374	2017-01-28 07:42:10 +00:00
Michael Kruse	d1508812f5	[Support] Add general isl tools for DeLICM. NFC. Add some generally useful isl tools into a their own new ISLTools.cpp. These are the helpers were extracted from and will be use by the DeLICM algorithm (https://reviews.llvm.org/D24716). Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 293340	2017-01-27 22:51:36 +00:00
Michael Kruse	33dc454700	[CodePrepa] Remove unused declaration. NFC. llvm-svn: 293304	2017-01-27 16:59:09 +00:00
Tobias Grosser	77363965c0	[ScopDetectionDiagnostic] Add meaningfull enduser message for regions with entry block Before this change the user only saw "Unspecified Error", when a region contained the entry block. Now we report: "Scop contains function entry (not yet supported)." llvm-svn: 293169	2017-01-26 10:41:37 +00:00
Tobias Grosser	64bbb1357f	ScopDetectionDiagnostics: Also emit diagnostics in case no debug info is available In this case, we just use the start of the scop as the debug location. llvm-svn: 293165	2017-01-26 10:30:55 +00:00
Tobias Grosser	75dfaa1dbe	BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts Before this change we created an additional reload in the copy of the incoming block of a PHI node to reload the incoming value, even though the necessary value has already been made available by the normally generated scalar loads. In this change, we drop the code that generates this redundant reload and instead just reuse the scalar value already available. Besides making the generated code slightly cleaner, this change also makes sure that scalar loads go through the normal logic, which means they can be remapped (e.g. to array slots) and corresponding code is generated to load from the remapped location. Without this change, the original scalar load at the beginning of the non-affine region would have been remapped, but the redundant scalar load would continue to load from the old PHI slot location. It might be possible to further simplify the code in addOperandToPHI, but this would not only mean to pull out getNewValue, but to also change the insertion point update logic. As this did not work when trying it the first time, this change is likely not trivial. To not introduce bugs last minute, we postpone further simplications to a subsequent commit. We also document the current behavior a little bit better. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28892 llvm-svn: 292486	2017-01-19 14:12:45 +00:00
Tobias Grosser	943c369c60	BlockGenerator: remove obfuscating const and const casts Making certain values 'const' to just cast it away a little later mainly obfuscates the code. Hence, we just drop the 'const' parts. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 292480	2017-01-19 13:25:52 +00:00
Tobias Grosser	97b8490982	Use range-based for loop [NFC] llvm-svn: 292471	2017-01-19 05:09:23 +00:00
Tobias Grosser	e1ff0cf2eb	Relax assert when setting access functions with invariant base pointers Summary: Instead of forbidding such access functions completely, we verify that their base pointer has been hoisted and only assert in case the base pointer was not hoisted. I was trying for a little while to get a test case that ensures the assert is correctly fired in case of invariant load hoisting being disabled, but I could not find a good way to do so, as llvm-lit immediately aborts if a command yields a non-zero return value. As we do not generally test our asserts, not having a test case here seems OK. This resolves http://llvm.org/PR31494 Suggested-by: Michael Kruse <llvm@meinersbur.de> Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28798 llvm-svn: 292213	2017-01-17 12:00:42 +00:00
Eli Friedman	71329901ea	Tidy up getFirstNonBoxedLoopFor [NFC] Move the function getFirstNonBoxedLoopFor which is used in ScopBuilder and in ScopInfo to Support/ScopHelpers to make it reusable in other locations. No functionality change. Patch by Sameer Abu Asal. Differential Revision: https://reviews.llvm.org/D28754 llvm-svn: 292168	2017-01-16 22:54:29 +00:00
Tobias Grosser	0032d87337	ScopInfo: document base pointers in alias-checks must be invariant [NFC] Before this change, this code has been mixed with a check for non-affine loops (and when originally introduce was also duplicated). By creating a separate loop and explicitly documenting this property, the current behavior becomes a lot more clear. llvm-svn: 292140	2017-01-16 15:49:14 +00:00
Tobias Grosser	f3c145f2ab	ScopInfo: Improve comments in buildAliasGroup [NFC] llvm-svn: 292139	2017-01-16 15:49:09 +00:00
Tobias Grosser	77f3257b41	ScopInfo: split out construction of a single alias group [NFC] The loop body in buildAliasGroups is still too large to easily scan it. Hence, we split the loop body out into a separate function to improve readability. llvm-svn: 292138	2017-01-16 15:49:07 +00:00
Tobias Grosser	e95222343c	ScopInfo: Do not modify the original alias group [NFC] Instead of modifying the original alias group and repurposing it as read-write access group when splitting accesses in read-only and read-write accesses, we just keep all three groups: the original alias group, the set of read-only accesses and the set of read-write accesses. This allows us to remove some complicated iterator handling and also allows for more code-reuse in calculateMinMaxAccess. llvm-svn: 292137	2017-01-16 15:49:04 +00:00
Tobias Grosser	457eb579dd	ScopInfo: No need to keep ReadOnlyAccesses in an additional map [NFC] It seems over time we added an additional map that maps from the base address of a read-only access to the actual access. However this map is never used. Drop the creation and use of this map to simplify our alias check generation code. llvm-svn: 292126	2017-01-16 14:24:48 +00:00
Tobias Grosser	dba2206b65	ScopInfo: no need to clear alias group explicitly The alias group will anyhow be cleared at the end of this function and is not used afterwards. We avoid an explicit clear() call at multiple places to improve readability of this code. llvm-svn: 292125	2017-01-16 14:13:01 +00:00
Tobias Grosser	21a059af09	Adjust formatting to commit r292110 [NFC] llvm-svn: 292123	2017-01-16 14:08:10 +00:00
Tobias Grosser	92fd612c84	ScopInfo: Fold SmallVectors used in alias check generation back into loop [NFC] Hoisting small vectors out of a loop seems to be a pure performance optimization, which is unlikely to have great impact in practice. As this hoisting just increases code-complexity, we fold the SmallVectors back into the loop. In subsequent commits, we will further simplify and structure this code, but we committed this change separately to provide an explanation to make clear that we purposefully reverted this optimization. llvm-svn: 292122	2017-01-16 14:08:02 +00:00
Tobias Grosser	e39f9127f9	ScopInfo: Extract out splitAliasGroupsByDomain [NFC] The function buildAliasGroups got very large. We extract out the splitting of alias groups to reduce its size and to better document the current behavior. llvm-svn: 292121	2017-01-16 14:08:00 +00:00
Tobias Grosser	9edcf07e83	ScopInfo: Extract out buildAliasGroupsForAccesses [NFC] The function buildAliasGroups got very large. We extract out the actual construction of alias groups to reduce its size and to better document the current behavior. llvm-svn: 292120	2017-01-16 14:07:57 +00:00
Tobias Grosser	2ef3887d28	Do not track the isl PDF manual in SVN There is no point in regularly committing a binary file to the repository, as this just unnecessarily increases the repository size. Interested people can find the isl manual for example at isl.gforge.inria.fr/manual.pdf. llvm-svn: 292105	2017-01-16 11:48:03 +00:00
Hongbin Zheng	6aded2a0e4	Fix compilation on MSVC, NFC Differential Revision: https://reviews.llvm.org/D28739 llvm-svn: 292067	2017-01-15 16:47:26 +00:00
Tobias Grosser	4d5a917287	Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo To benefit of the type safety guarantees of C++11 typed enums, which would have caught the type mismatch fixed in r291960, we make MemoryKind a typed enum. This change also allows us to drop the 'MK_' prefix and to instead use the more descriptive full name of the enum as prefix. To reduce the amount of typing needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a global scope, which means the ScopArrayInfo:: prefix is not needed. This move also makes historically sense. In the beginning of Polly we had different MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later canonicalized to one. During this canonicalization we just choose the enum in ScopArrayInfo, but did not consider to move this shared enum to global scope. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 292030	2017-01-14 20:25:44 +00:00
Tobias Grosser	67e94fb435	ScheduleOptimizer: Allow to set register width in command line We use this option to set a fixed register width in our test cases to make sure the results are identical accross platforms. llvm-svn: 292002	2017-01-14 07:14:54 +00:00
Tobias Grosser	e29db2173b	Update to recent clang-format changes llvm-svn: 291810	2017-01-12 21:05:19 +00:00
Eli Friedman	c6e3b6f156	Delete stray isl_map_dump call. llvm-svn: 291521	2017-01-10 01:08:11 +00:00
Tobias Grosser	cdbe5c9d6c	Fix some typos in comments llvm-svn: 291247	2017-01-06 17:30:34 +00:00
Tobias Grosser	94e5371dde	Update to isl-0.18-43-g0b4256f Even more isl coalesce changes. llvm-svn: 290783	2016-12-31 07:46:11 +00:00
Tobias Grosser	ba3ea97689	Update to isl-0.18-28-gccb9f33 Another set of isl coalesce changes. llvm-svn: 290681	2016-12-28 19:35:49 +00:00
Tobias Grosser	600941351e	Update to isl-0.18-17-g2844ebf This update improves isl's ability to coalesce different convex sets/maps, especially when the contain existentially quantified variables. llvm-svn: 290538	2016-12-26 12:11:40 +00:00
Roman Gareev	1c2927b209	Specify the default values of the cache parameters If the parameters of the target cache (i.e., cache level sizes, cache level associativities) are not specified or have wrong values, we use ones for parameters of the macro-kernel and do not perform data-layout optimizations of the matrix multiplication. In this patch we specify the default values of the cache parameters to be able to apply the pattern matching optimizations even in this case. Since there is no typical values of this parameters, we use the parameters of Intel Core i7-3820 SandyBridge that also help to attain the high-performance on IBM POWER System S822 and IBM Power 730 Express server. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 290518	2016-12-25 16:32:28 +00:00
Tobias Grosser	0791d5f5aa	ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma' througput -> throughput llvm-svn: 290418	2016-12-23 07:33:39 +00:00
Tobias Grosser	ccae1ee4df	Update isl to isl-0.18-9-gd4734f3 llvm-svn: 290389	2016-12-22 23:08:57 +00:00
Roman Gareev	be5299af0b	Change the determination of parameters of macro-kernel Typically processor architectures do not include an L3 cache, which means that Nc, the parameter of the micro-kernel, is, for all practical purposes, redundant ([1]). However, its small values can cause the redundant packing of the same elements of the matrix A, the first operand of the matrix multiplication. At the same time, big values of the parameter Nc can cause segmentation faults in case the available stack is exceeded. This patch adds an option to specify the parameter Nc as a multiple of the parameter of the micro-kernel Nr. In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39,247% of theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak). Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28019 llvm-svn: 290256	2016-12-21 12:51:12 +00:00
Roman Gareev	bd5c6039c6	Align newly created arrays to the first level cache line boundary Aligning data to cache lines boundaries helps to avoid overheads related to an access to it ([1]). This patch aligns newly created arrays and adds an option to specify the first level cache line size. By default we use 64 bytes, which is a typical cache-line size ([2]). In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39,247% of theoretical peak) to 12.63 GFlops/sec (43,8542% of theoretical peak). Refs.: [1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access [2] - http://igoro.com/archive/gallery-of-processor-cache-effects/ Differential Revision: https://reviews.llvm.org/D28020 Reviewed-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 290253	2016-12-21 12:37:36 +00:00
Roman Gareev	92c446016a	[Polly] Use three-dimensional arrays to store packed operands of the matrix multiplication Previously we had two-dimensional accesses to store packed operands of the matrix multiplication for the sake of simplicity of the packed arrays. However, addition of the third dimension helps to simplify the corresponding memory access, reduce the execution time of isl operations applied to it, and consequently reduce the compile-time of Polly. For example, in case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=7 it helps to reduce the compile-time from about 361.456 seconds to about 0.816 seconds. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D27878 llvm-svn: 290251	2016-12-21 11:18:42 +00:00
Tobias Grosser	b6945e3301	Fix clang-format llvm-svn: 290103	2016-12-19 14:06:40 +00:00
Daniel Jasper	e5f3eba9c3	Fix format after recent clang-format change. llvm-svn: 290085	2016-12-19 07:54:15 +00:00
Alexandre Isoard	cbed3ce39f	Add isl_multi_pw_aff to GICHelper Add isl_multi_pw_aff* to GICHelper and add some missing isl_pw_multi_aff* handlers. llvm-svn: 290007	2016-12-16 23:41:26 +00:00
Roman Gareev	2606c48a1d	Restrict ranges of extension maps To prevent copy statements from accessing arrays out of bounds, ranges of their extension maps are restricted, according to the constraints of domains. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D25655 llvm-svn: 289815	2016-12-15 12:35:59 +00:00
Roman Gareev	15db81ef71	[NFC] Fix typos in getMacroKernelParams. llvm-svn: 289808	2016-12-15 12:00:57 +00:00
Roman Gareev	8babe1a216	The order of the loops defines the data reused in the BLIS implementation of gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used. In our case matrices are stored in row-major order instead of column-major order used in the BLIS implementation ([1]). One of the ways to address it is to accordingly change the order of the loops of the loop nest. However, it makes elements of the matrix A to be reused in the innermost loop and, consequently, requires to load elements of the matrix B first. Since the LLVM vectorizer always generates loads from the matrix A before loads from the matrix B and we can not provide it. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A . Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D25653 llvm-svn: 289806	2016-12-15 11:47:38 +00:00
Michael Kruse	7037fde427	Remove references to AssumptionCache. NFC. The AssumptionCache was removed in r289756 after being replaced by the an addtional operand list of affected values in r289755. The absence of that cache means that we have now have to manually search for llvm.assume intrinsics as now done by other passes (LazyValueInfo, CodeMetrics) do not take into account an llvm::Instruction's user lists (ScalarEvolution). llvm-svn: 289791	2016-12-15 09:25:14 +00:00
Tobias Grosser	b02b6a8404	Adjust clang-format formatting to r289531 clang-format has been updated in r289531 to keep labels and values on the same line. This change updates Polly to the new formatting style. llvm-svn: 289533	2016-12-13 12:44:00 +00:00
Michael Kruse	79c0173f53	[ScheduleOptimizer] Fix memory leak. NFC. llvm-svn: 289434	2016-12-12 14:51:06 +00:00
Michael Kruse	b9a683d75d	Add more ISL foreachElt functions. NFC. Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are used by an out-of-tree patch which is in process of being upstreamed. llvm-svn: 288924	2016-12-07 17:47:57 +00:00
Michael Kruse	2ead2bfc12	Add IslPtr type traits. NFC. Add traits for isl_id and isl_multi_aff, required by out-of-tree patches currently in progress of upstreaming. isl_union_pw_aff_dump has been added to ISL during one of the last ISL updates, such that we can also enable its dump() trait. llvm-svn: 288915	2016-12-07 16:17:59 +00:00
Michael Kruse	1b8eb4104b	Update to isl-0.17.1-314-g3106e8d This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes the compilation under windows, which does not know ssize_t. In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source directory. It now generates a file isl_srcdir.c at configure-time, containing the source path, to not require setting the environment variable "srcdir" at test-time. The cmake build system had to be modified to also generate that file. llvm-svn: 288811	2016-12-06 14:37:39 +00:00
Johannes Doerfert	bda814350a	Allow to disable unsigned operations (zext, icmp ugt, ...) Unsigned operations are often useful to support but the heuristics are not yet tuned. This options allows to disable them if necessary. llvm-svn: 288521	2016-12-02 17:55:41 +00:00
Johannes Doerfert	a94ae1aede	Do not allow multiple possibly aliasing ptrs in an expression Relational comparisons should not involve multiple potentially aliasing pointers. Similarly this should hold for switch conditions and the two conditions involved in equality comparisons (separately!). This is a heuristic based on the C semantics that does only allow such operations when the base pointers do point into the same object. Since this makes aliasing likely we will bail out early instead of producing a probably failing runtime check. llvm-svn: 288516	2016-12-02 17:49:52 +00:00
Johannes Doerfert	2df9963fe3	Rerun mem2reg after the inliner It did happen that after the inliner finished we end up with promotable allocas in a function. We now run mem2reg to make sure everything is promoted if possible. llvm-svn: 288514	2016-12-02 17:43:57 +00:00
Tobias Grosser	bedef00e2c	[ScopInfo] Fold constant coefficients in array dimensions to the right This allows us to delinearize code such as the one below, where the array sizes are A[][2 * n] as there are n times two elements in the innermost dimension. Alternatively, we could try to generate another dimension for the struct in the innermost dimension, but as the struct has constant size, recovering this dimension is easy. struct com { double Real; double Img; }; void foo(long n, struct com A[][n]) { for (long i = 0; i < 100; i++) for (long j = 0; j < 1000; j++) A[i][j].Real += A[i][j].Img; } int main() { struct com A[100][1000]; foo(1000, A); llvm-svn: 288489	2016-12-02 08:10:56 +00:00
Tobias Grosser	491b799a4d	[ScopInfo] Separate construction and finalization of memory accesses [NFC] After having built memory accesses we perform some additional transformations on them to increase the chances that our delinearization guesses the right shape. Only after these transformations, we take the assumptions that the array shape we predict is such that no out-of-bounds memory accesses arise. Before this change, the construction of the memory access, the access folding that improves the represenation for certain parametric subscripts, and taking the assumption was all done right after a memory access was created. In this change we split this now into three separate iterations over all memory accesses. This means only after all memory accesses have been built, we start to canonicalize accesses, and to take assumptions. This split prepares for future canonicalizations that must consider all memory accesses for deriving additional beneficial transformations. llvm-svn: 288479	2016-12-02 05:21:22 +00:00
Johannes Doerfert	b1d6608430	[NFC] Check for feasibility prior to the profitability check Feasibility is checked late on its own but early it is hidden behind the "PollyProcessUnprofitable" guard. This change will make sure we opt out early if the runtime context is infeasible anyway. llvm-svn: 288329	2016-12-01 11:12:14 +00:00
Johannes Doerfert	b6c5a5dd01	[FIX] Do not try to hoist obviously overwritten loads llvm-svn: 288328	2016-12-01 11:10:45 +00:00
Tobias Grosser	dc6b87c56e	Add newline at end of debug print In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed without a newline at the end of each diagnostic. We add such a newline to improve readability. llvm-svn: 288323	2016-12-01 08:08:47 +00:00

... 2 3 4 5 6 ...

2333 Commits