llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kruse	8080011ca1	[unittests/DeLICM] Add test for Occupied vs Occupied. The interpretation of multiple known ValInsts for the same element and timepoint is that these are alternative names for the same value, for instance a PHINode and the incoming value when knowing it was the last executed block. That means that known values do not conflict if there is at least one (but not necessarily all) common ValInst. Add a case to test this principle. llvm-svn: 301480	2017-04-26 21:52:51 +00:00
Michael Kruse	3e519b949b	[DeLICM] Use Known information when comparing Occupied and Written. Do not conflict if a write writes the same value as already known. This change only affects unit tests, but no functional changes are expected on LLVM-IR, as no Known information is yet extracted and consequently this functionality is only triggered through unit tests. Differential Revision: https://reviews.llvm.org/D32026 llvm-svn: 301460	2017-04-26 20:35:07 +00:00
Tobias Grosser	1c3eebac08	Update to isl-0.18-423-g30331fe This is just a general maintenance update. llvm-svn: 301433	2017-04-26 17:08:02 +00:00
Michael Kruse	cd2be66bf0	[DeLICM] Use Known information when comparing Existing.Occupied and Proposed.Occupied. Do not conflict if the value of Existing and Proposed are the same. This change only affects unit tests, but no functional changes are expected on LLVM-IR, as no Known information is yet extracted and consequently this functionality is only triggered through unit tests. Differential Revision: https://reviews.llvm.org/D32025 llvm-svn: 301301	2017-04-25 10:57:32 +00:00
Siddharth Bhat	d277feda91	[PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility Added a small change to the way pointer arguments are set in the kernel code generation. The way the pointer is retrieved now, specifically requests global address space to be annotated. This is necessary, if the IR should be run through NVPTX to generate OpenCL compatible PTX. The changes do not affect the PTX Strings generated for the CUDA target (nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl). Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends. Contributed-by: Philipp Schaad Reviewers: Meinersbur, grosser, bollu Reviewed By: grosser, bollu Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32215 llvm-svn: 301299	2017-04-25 08:08:29 +00:00
Michael Kruse	a8b0be819a	[unittests] Derive Occupied from Unused when given. When both, OccupiedAndKnown and Unused are given, use the former only for the Known values. The relation Unused \union Occupied must always hold. This allows us to specify Known independently of Occupied. It is needed for an artificial test case in https://reviews.llvm.org/D32025. llvm-svn: 301284	2017-04-25 00:30:42 +00:00
Michael Kruse	b745b740f9	[unittests] Add postcondition to completeLifetime. llvm-svn: 301283	2017-04-25 00:30:32 +00:00
Siddharth Bhat	729377f063	[Polly] [DependenceInfo] change WAR generation, Read will not block Read Earlier, the call to buildFlow was: WAR = buildFlow(Write, Read, MustWrite, Schedule). This meant that Read could block another Read, since must-sources can block each other. Fixed the call to buildFlow to correctly compute Read. The resulting code needs to do some ISL juggling to get the output we want. Bug report: https://bugs.llvm.org/show_bug.cgi?id=32623 Reviewers: Meinersbur Tags: #polly Differential Revision: https://reviews.llvm.org/D32011 llvm-svn: 301266	2017-04-24 22:23:12 +00:00
Tobias Grosser	9b34a08b19	[isl C++ bindings] Add explicit const casts for foreach bindings This avoids a compiler warning about lost 'const' attributes. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 301108	2017-04-23 07:54:12 +00:00
Michael Kruse	abf05b18db	[CMake] Fix polly-isl-test execution in out-of-LLVM-tree builds. The isl unittest modified its PATH variable to point to the LLVM bin dir. When building out-of-LLVM-tree, it does not contain the polly-isl-test executable, hence the test fails. Ensure that the polly-isl-test is written to a bin directory in the build root, just like it would happen in an inside-LLVM build. Then, change PATH to include that dir such that the executable in it is prioritized before any other location. llvm-svn: 301096	2017-04-22 23:02:53 +00:00
Michael Kruse	9c19d1f3aa	[CMake] Fix unittests in out-of-LLVM-tree builds. Unittests are linked against a subset of LLVM libraries and its transitive dependencies resolved by CMake. The information about indirect library dependency is not available when building separately from LLVM, which results in missing symbol errors while linking. Resolve this issue by querying llvm-config about the available LLVM libraries and link against all of them, since dependence information is still not available. llvm-svn: 301095	2017-04-22 23:02:46 +00:00
Michael Kruse	ab6b47d2e7	[CMake] Link unittests only against libLLVM.so, if available. We can only link against libLLVM.so or the individual libLLVM*.so components, but not both of them. Doing so results in these components existing twice in the program's address space, since they are already contained in libLLVM.so. The observable effect of this is that command line switches are registered multiple times (once for each instance), which is an error. This fixes llvm.org/PR32735. Reported-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com> llvm-svn: 301020	2017-04-21 19:03:51 +00:00
Tobias Grosser	9e6c00194f	GICHelper: remove forgotten isl foreach declarations These should have been dropped in r300323. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 300965	2017-04-21 10:50:33 +00:00
Michael Kruse	8431e996d3	[DeLICM] Use Known information when comparing Existing.Written and Proposed.Written. This change only affects unit tests, but no functional changes are expected on LLVM-IR, as no Known information is yet extracted and consequently this functionality is only triggered through unit tests. Differential Revision: https://reviews.llvm.org/D32027 llvm-svn: 300874	2017-04-20 19:16:39 +00:00
Tobias Grosser	1f8b84094f	Update isl bindings to latest version (+ Polly extensions) After the isl C++ binding generator is now close to being upstreamed to isl, we synchronize the latest changes to Polly. These are mostly formatting changes plus a small interface change for the foreach callback function and some naming changes in isl::boolean. llvm-svn: 300398	2017-04-15 08:15:54 +00:00
Tobias Grosser	75aa1a9a49	Use isl C++ foreach implementation This commit switches Polly over to the isl::obj::foreach_* implementation, which is part of the new isl bindings and follows the foreach pattern established in Polly by Michael Kruse. The original isl C function: isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset, isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user); which required the user to define a static callback function to which all interesting parameters are passed via a 'void *' user-pointer, is on the C++ side available as a function that takes a std::function<>, which can carry any additional arguments without the need for a user pointer: stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const; The following code illustrates the use of the new C++ interface: auto Lambda = [=, &Result](isl::set Set) -> isl::stat { auto Shifted = shiftDimension(Set, Pos, Amount); Result = Result.add(Shifted); return isl::stat::ok; }; UnionSet.foreach_set(Lambda); Polly had some specialized foreach functions which did not require the lambdas to return a status flag. We remove these functions in this commit to move Polly completely over to the new isl interface. We may in the future discuss if functors without return values can be supported easily. Another extension proposed by Michael Kruse is the use of C++ iterators to allow the use of normal for loops to iterate over these sets. Such an extension would allow us to further simplify the code. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D30620 llvm-svn: 300323	2017-04-14 13:39:40 +00:00
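The difference between the two callback styles described in 75aa1a9a49 can be sketched without isl. Everything below is a hypothetical stand-in (`Stat`, `Elements`, `foreachC`, `foreachCpp`, `sumShifted` are not isl or Polly names); it only illustrates, under those assumptions, how a `std::function` wrapper can sit on top of a C-style `void *` user-pointer interface.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for isl_stat and a collection of sets; not the isl API.
enum class Stat { Ok, Error };
using Elements = std::vector<int>;

// C-style iteration: extra state travels through an untyped user pointer.
Stat foreachC(const Elements &Elts, Stat (*Fn)(int Elt, void *User),
              void *User) {
  for (int Elt : Elts)
    if (Fn(Elt, User) == Stat::Error)
      return Stat::Error;
  return Stat::Ok;
}

// C++-style iteration: the callable captures whatever state it needs.
Stat foreachCpp(const Elements &Elts, const std::function<Stat(int)> &Fn) {
  // A captureless lambda converts to a plain function pointer, so it can
  // serve as the trampoline that unpacks the user pointer.
  auto Trampoline = [](int Elt, void *User) -> Stat {
    auto *Callable = static_cast<const std::function<Stat(int)> *>(User);
    return (*Callable)(Elt);
  };
  // The C interface takes a non-const void *, hence the explicit const cast.
  return foreachC(Elts, Trampoline,
                  const_cast<std::function<Stat(int)> *>(&Fn));
}

// Sums all elements shifted by Amount, in the shape of the lambda shown in
// the commit message.
int sumShifted(const Elements &Elts, int Amount) {
  int Result = 0;
  Stat S = foreachCpp(Elts, [=, &Result](int Elt) -> Stat {
    Result += Elt + Amount;
    return Stat::Ok;
  });
  assert(S == Stat::Ok && "callback never reports an error");
  (void)S;
  return Result;
}
```

With this sketch, `sumShifted({1, 2, 3}, 10)` accumulates into `Result` through the capture rather than through a hand-written user struct.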
Michael Kruse	a8e885d87c	[DeLICM] Introduce unittesting infrastructure for Known and Written. NFC. llvm-svn: 300212	2017-04-13 16:32:46 +00:00
Michael Kruse	72f3922534	[DeLICM] Export Known and Written to DeLICMTests. NFC. This will allow unittesting of new functionality based on Known and Written. llvm-svn: 300211	2017-04-13 16:32:39 +00:00
Michael Kruse	a2acc11949	[DeLICM] Add Knowledge::Known. NFC. This field will later contain a ValInst that is known to be stored in an occupied array element. llvm-svn: 300210	2017-04-13 16:32:31 +00:00
Michael Kruse	fa7c8cdfc6	[DeLICM] Make Knowledge::Written an isl::union_map. NFC. The map will later point to a ValInst that is written. llvm-svn: 300208	2017-04-13 16:32:25 +00:00
Michael Kruse	5e6456979b	[DeLICM] Rename Knowledge to KnowledgeStr. NFC. Some debuggers get confused by different class of the same name defined independently in different translation units. llvm-svn: 300207	2017-04-13 16:32:16 +00:00
Tobias Grosser	7b5a4dfd46	Exploit BasicBlock::getModule to shorten code Suggested-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 299914	2017-04-11 04:59:13 +00:00
Tobias Grosser	67726b3260	Adjust to recent change in constructor definition of AllocaInst llvm-svn: 299913	2017-04-11 04:23:38 +00:00
Matt Arsenault	b3e30c32ce	Update for alloca construction changes llvm-svn: 299905	2017-04-11 00:12:58 +00:00
Philip Pfaffe	78265cd237	Fix missing %loadPolly in ensure-correct-tile-sizes testcase llvm-svn: 299765	2017-04-07 12:55:26 +00:00
Roman Gareev	9d4d91ca6a	[FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPattern Use new values of the dimensions during their permutation. llvm-svn: 299663	2017-04-06 17:25:08 +00:00
Roman Gareev	e0d466342b	Restore the initial ordering of dimensions before applying the pattern matching Dimensions of band nodes can be implicitly permuted by the algorithm applied during the schedule generation. For example, in case of the following matrix-matrix multiplication, for (i = 0; i < 1024; i++) for (k = 0; k < 1024; k++) for (j = 0; j < 1024; j++) C[i][j] += A[i][k] * B[k][j]; it can produce the following schedule tree domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }" child: schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] }, { Stmt_for_body6[i0, i1, i2] -> [(i1)] }, { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]" permutable: 1 coincident: [ 1, 1, 0 ] The current implementation of the pattern matching optimizations relies on the initial ordering of dimensions. Otherwise, it can produce the miscompilation (e.g., [1]). This patch helps to restore the initial ordering of dimensions by recreating the band node when the corresponding conditions are satisfied. Refs.: [1] - https://bugs.llvm.org/show_bug.cgi?id=32500 Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D31741 llvm-svn: 299662	2017-04-06 17:09:54 +00:00
Siddharth Bhat	5eeb1dd42e	[Polly] [ScheduleOptimizer] Prevent incorrect tile size computation Because Polly exposes parameters that directly influence tile size calculations, one can setup situations like divide-by-zero. Check against a possible divide-by-zero in getMacroKernelParams and return early. Also assert at the end of getMacroKernelParams that the block sizes computed for matrices are positive (>= 1). Tags: #polly Differential Revision: https://reviews.llvm.org/D31708 llvm-svn: 299633	2017-04-06 08:20:22 +00:00
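The guard pattern described in 5eeb1dd42e (return early before a possible division by zero, then assert that the computed sizes are positive) might look roughly like the sketch below. The struct, the function name and the formula are illustrative assumptions only; Polly's actual getMacroKernelParams uses different parameters and a different model.

```cpp
#include <cassert>

// Hypothetical result type; Polly's real macro-kernel parameters differ.
struct BlockSizes {
  int Rows;
  int Cols;
};

// Illustrative formula only: splits a cache budget by a user-controlled
// divisor. Returns the trivial sizes {1, 1} early when the divisor is not
// positive or would produce an empty block.
BlockSizes computeBlockSizes(int CacheElements, int RegisterTile) {
  if (RegisterTile <= 0 || CacheElements < RegisterTile)
    return {1, 1};
  BlockSizes Sizes{CacheElements / RegisterTile, RegisterTile};
  assert(Sizes.Rows >= 1 && Sizes.Cols >= 1 && "block sizes must be positive");
  return Sizes;
}
```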
Tobias Grosser	0d622a4bf9	Update to isl-0.18-417-gb9e7334 This is a regular maintenance update. llvm-svn: 299617	2017-04-06 03:41:47 +00:00
Michael Kruse	895f5d8080	Remove llvm.lifetime.start/end in original region. The current StackColoring algorithm does not correctly handle the situation when some, but not all paths from a BB to the entry node cross a llvm.lifetime.start. According to an interpretation of the language reference at http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic this might be correct, but it would cost too much effort to handle in StackColoring. To be on the safe side, remove all lifetime markers even in the original code version (they have never been copied to the optimized version) to ensure that no path to the entry block will cross a llvm.lifetime.start. The same principle applies to paths to a function return and the llvm.lifetime.end marker, so we remove them as well. This fixes llvm.org/PR32251. Also see the discussion at http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html llvm-svn: 299585	2017-04-05 20:09:59 +00:00
Tobias Grosser	59e42b8f96	Add two Polly images llvm-svn: 299534	2017-04-05 11:50:31 +00:00
Siddharth Bhat	bcbfdade41	[Polly] [DependenceInfo] change WAR, WAW generation to correct semantics = Change of WAR, WAW generation: = - `buildFlow(Sink, MustSource, MaySource, Schedule)` treats any flow of the form `sink <- may source <- must source` as a may dependence. - we used to call: ```lang=cpp, name=old-flow-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This caused some WAW dependences to be treated as WAR dependences. - Incorrect semantics. - Now, we call WAR and WAW correctly. == Correct WAW: == ```lang=cpp, name=new-waw-call.cpp Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow); ``` == Correct WAR: == ```lang=cpp, name=new-war-call.cpp Flow = buildFlow(Write, Read, MustWrite, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow); ``` - We want the "shortest" WAR possible (exact dependences). - We mark all the must-writes as may-source, reads as must-source. - Then, we ask for must dependence. - This removes all the reads that flow through a must-write before reaching a sink. - Note that we only block earlier writes with must-writes. This is intuitively correct, as we do not want may-writes to block must-writes. - Leaves us with direct (R -> W). - This affects reduction generation since RED is built using WAW and WAR. = New StrictWAW for Reductions: = - We used to call: ```lang=cpp,name=old-waw-war-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This is the right model of WAW we need for reductions, just not in general. - Reductions need to track only strict WAW, without any interfering reductions. 
= Explanation: Why the new WAR dependences in tests are correct: = - We no longer set WAR = WAR - WAW - Hence, we will have WAR dependences that were originally removed. - These may look incorrect, but in fact make sense. == Code: == ```lang=llvm, name=new-war-dependence.ll ; void manyreductions(long A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: A += 42; ; ``` === WAR dependence: === { S0[1023, 1023] -> S1[0, 0] } - Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences: ```lang=cpp, name=dependence-incorrect, counterexample S0[1023, 1023]: -- tmp = A (load0)-- WAR 2 add = tmp + 42 \| -> A = add (store0) \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = tmp + 42 \| A = add (store1)<- ``` - One may assume that WAR2 hides WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores. - Hence, according to Polly, the code may have looked like this: ```lang=cpp, name=dependence-correct S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = A + 42 \| A = add (store1) <-* ``` - So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model. Reviewers: grosser, Meinersbur tags: #polly Differential revision: https://reviews.llvm.org/D31386 llvm-svn: 299429	2017-04-04 13:08:23 +00:00
Philip Pfaffe	447f175eb5	Fix formatting in LoopGenerators llvm-svn: 299424	2017-04-04 10:22:17 +00:00
Philip Pfaffe	2d950f36ee	[Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers Summary: A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly. This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however. Reviewers: grosser, Meinersbur Reviewed By: grosser Subscribers: nemanjai, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D31653 llvm-svn: 299423	2017-04-04 10:01:53 +00:00
Tobias Grosser	637be04b77	[PerfMonitor] Use Intrinsics::getDeclaration Instead of creating the declaration ourselves, we obtain it directly from the LLVM intrinsic definitions. This addresses a post-review comment for r299359. Suggested-by: Hongbin Zheng <etherzhhb@gmail.com> llvm-svn: 299360	2017-04-03 15:23:08 +00:00
Tobias Grosser	65371af2e1	[CodeGen] Add Performance Monitor Add support for -polly-codegen-perf-monitoring. When performance monitoring is enabled, we emit performance monitoring code during code generation that prints after program exit statistics about the total number of cycles executed as well as the number of cycles spent in scops. This gives an estimate on how useful polyhedral optimizations might be for a given program. Example output: Polly runtime information ------------------------- Total: 783110081637 Scops: 663718949365 In the future, we might also add functionality to measure how much time is spent in optimized scops and how many cycles are spent in the fallback code. Reviewers: bollu,sebpop Tags: #polly Differential Revision: https://reviews.llvm.org/D31599 llvm-svn: 299359	2017-04-03 14:55:37 +00:00
Michael Kruse	0b8949e6ed	[test] Fix two testcases. NFC. Trivial fix for two testcases. When Polly isn't linked into opt, independent of whether it's built in-tree or not, these testcases forget to load the appropriate library. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D31596 llvm-svn: 299357	2017-04-03 12:37:10 +00:00
Michael Kruse	6e7854a560	[ScopInfo] Fix typos in option description. llvm-svn: 299356	2017-04-03 12:03:38 +00:00
Tobias Grosser	bd96c73a1a	Add test case for r299352. llvm-svn: 299353	2017-04-03 07:44:23 +00:00
Tobias Grosser	696a1ee99d	[PollyIRBuilder] Bound size of alias metadata No-alias metadata grows quadratic in the size of arrays involved, which can become very costly for large programs. This commit bounds the number of arrays for which we construct no-alias information to ten. This is conservatively correct, as we just provide less information to LLVM and speeds up the compile time of one of my internal test cases from 'does-not-terminate' to 'finishes-in-less-than-a-minute'. In the future we might try to be more clever here, but this change should provide a good baseline. llvm-svn: 299352	2017-04-03 07:42:50 +00:00
Tobias Grosser	af940ae280	Update to isl-0.18-410-gc253447 This is a regular maintenance update to ensure latest isl changes are tested in our buildbots. llvm-svn: 299350	2017-04-03 06:46:16 +00:00
Huihui Zhang	d6d6a3f2ee	revert test commit r299024 llvm-svn: 299026	2017-03-29 20:23:56 +00:00
Huihui Zhang	9d19e9d232	test commit, add blank line llvm-svn: 299024	2017-03-29 20:10:45 +00:00
Michael Kruse	c3e9c1442d	[ScopInfo] Introduce ScopStmt::contains(BB*). NFC. Provide a common way for testing if a statement contains something for region and block statements. First user is RegionGenerator::addOperandToPHI. Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 298617	2017-03-23 16:12:21 +00:00
Tobias Grosser	1f7e7d3d93	Update to isl-0.18-402-ga30c537 This is a regular maintenance update. llvm-svn: 298595	2017-03-23 13:38:24 +00:00
Michael Kruse	9e4e7b467f	[DeLICM] Add const qualifiers. NFC. llvm-svn: 298546	2017-03-22 20:09:58 +00:00
Michael Kruse	174f483990	[Support] Add functions to ISLTools. Add shiftDim and convertZoneToTimepoints overloads for isl maps. Add distributeDomain, liftDomains and applyDomainRange functions. These are going to be used in https://reviews.llvm.org/D31247 (Add known array contents to Knowledge) llvm-svn: 298543	2017-03-22 19:31:06 +00:00
Michael Kruse	d07d155ebb	[DeLICM] Remove overloaded Knowledge constructor. NFC. The isl C++ bindings now has implicit conversions from isl::set to isl::union_set. Therefore the additional overload accepting isl::set is not required anymore. llvm-svn: 298529	2017-03-22 18:01:23 +00:00
Michael Kruse	29143ec3f7	[DeLICM] Remove AllElements. NFC. It is not used and will not be used (anymore) in future commits. llvm-svn: 298522	2017-03-22 17:18:39 +00:00
Roman Gareev	cdfb57dc46	Introduce another level of metadata to distinguish non-aliasing accesses Introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. It can be used to, for example, distinguish different stores (loads) produced by unrolling of the innermost loops and, subsequently, sink (hoist) them by LICM. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30606 llvm-svn: 298510	2017-03-22 14:25:24 +00:00
Roman Gareev	23df27682a	Map the new load to the base pointer of the invariant load hoisted load Map the new load to the base pointer of the invariant load hoisted load to be able to find the alias information for it. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30605 llvm-svn: 298507	2017-03-22 13:57:53 +00:00
Siddharth Bhat	44b6cb4e63	[DependenceInfo] change name Write to MustWrite to remove ambiguity [NFC] "Write" is an overloaded term. In collectInfo() till buildFlow(), it is used to mean "must writes". However, within the memory based analysis, it is used to mean "both may and must writes". Renaming the Write variable helps clarify this difference. Reviewers: grosser Tags: #polly Differential Revision: https://reviews.llvm.org/D31181 llvm-svn: 298361	2017-03-21 11:54:08 +00:00
Tobias Grosser	29eaa16b7e	Update isl to isl-0.18-395-g77701b3 This is a normal maintenance update. llvm-svn: 298352	2017-03-21 09:12:11 +00:00
Michael Kruse	0d10696693	[DeLICM] Refactor out parseSetOrNull. NFC. Note that the isl::union_set(isl_ctx,std::string) constructor will auto-convert the char* to an std::string. Converting a nullptr to std::string is undefined in C++11 (sect. 21.4.2.9). llvm-svn: 298259	2017-03-20 15:37:32 +00:00
Michael Kruse	d75d56e9bf	[DeLICM] Add forgotten isl_space_set_tuple_id in unittests. Otherwise the isl_id NewId which ensures uniqueness of the created space is unused. None of the tests currently uses a nameless tuple, so there is no change in what is tested. llvm-svn: 298258	2017-03-20 15:24:45 +00:00
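The hazard mentioned in 0d10696693 is that a `const char *` silently converts to `std::string`, and doing so with a null pointer is undefined. A minimal sketch of a null-checking helper follows, assuming a hypothetical `ParsedSet` type in place of isl::union_set and omitting the isl context argument; it is not the code from the commit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for isl::union_set; it only stores the text it was
// built from.
struct ParsedSet {
  std::string Text;
  explicit ParsedSet(std::string Str) : Text(std::move(Str)) {}
};

// Returns nullptr for a null input instead of constructing a std::string
// from a null pointer, which would be undefined behaviour.
std::unique_ptr<ParsedSet> parseSetOrNull(const char *Str) {
  if (!Str)
    return nullptr;
  auto Result = std::make_unique<ParsedSet>(Str);
  assert(Result->Text == Str && "text is copied verbatim");
  return Result;
}
```

Checking for null in one helper keeps every call site free of the implicit `char *` to `std::string` conversion.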
Tobias Grosser	b28f86e9e6	[CodeGen] Remove need for all parameters to be in scop context for load hoisting. When not adding constraints on parameters using -polly-ignore-parameter-bounds, the context may not necessarily list all parameter dimensions. To support code generation in this situation, we now always iterate over the actual parameter list, rather than relying on the context to list all parameter dimensions. llvm-svn: 298197	2017-03-18 23:12:49 +00:00
Tobias Grosser	1be726a40d	[IslExprBuilder] Print accessed memory locations with RuntimeDebugBuilder After this change, enabling -polly-codegen-add-debug-printing in combination with -polly-codegen-generate-expressions allows us to instrument the compiled binaries to not only print the values stored and loaded to a given memory access, but also to print the accessed location with array name and per-dimension offset: MemRef_A[3][2] Store to 6299784: 5.000000 MemRef_A[3][3] Load from 6299788: 0.000000 MemRef_A[3][3] Store to 6299788: 6.000000 This can be very helpful for debugging. llvm-svn: 298194	2017-03-18 20:54:43 +00:00
Tobias Grosser	7693b116a1	[OpenMP] Do not emit lifetime markers for context In commit r219005 lifetime markers have been introduced to mark the lifetime of the OpenMP context data structure. However, their use seems incorrect and recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was not at all related to r298053. r298053 only caused a change in the loop order, as this change resulted in a different isl internal representation which caused the scheduler to derive a different schedule. This change then caused the IR to change, which apparently created a pattern in which LLVM exploits the lifetime markers. It seems we are using the OpenMP context outside of the lifetime markers. Even though CrystalMk could probably be fixed by expanding the scope of the lifetime markers, it is not clear what happens in case the OpenMP function call is in a loop which will cause a sequence of starting and ending lifetimes. As it is unlikely that the lifetime markers give any performance benefit, we just drop them to remove complexity. llvm-svn: 298192	2017-03-18 20:10:07 +00:00
Siddharth Bhat	3e4a7d38ab	[ScheduleOptimiser] fix typos in top comment [NFC] coice -> choice Transations -> Transactions llvm-svn: 298095	2017-03-17 14:52:19 +00:00
Michael Kruse	89b1f94e64	Revert "Remove references to AssumptionCache. NFC." The AssumptionCache removal of r289756 has been reverted in r290086/r290087. A different solution has been implemented in r291671 which keeps the AssumptionCache. We can therefore use it again in Polly. This reverts r289791. llvm-svn: 298089	2017-03-17 13:56:53 +00:00
Siddharth Bhat	4fe11cf95f	[DependenceInfo] Remove idempotent union: must-writes with may-writes [NFC] Since may-writes are always a superset of the must-writes, there is no point in taking a union of one with the other. llvm-svn: 298085	2017-03-17 13:26:10 +00:00
Michael Kruse	9b91c62e3a	[ScopInfo/PruneUnprofitable] Move default profitability check. Previously, ScopInfo applied the profitability heuristic for scalar accesses by default (-polly-unprofitable-scalar-accs=true), and the -polly-prune-unprofitable pass was disabled by default (-polly-enable-prune-unprofitable=false) as that pruning was already done. This change switches the defaults to -polly-unprofitable-scalar-accs=false -polly-enable-prune-unprofitable=true such that the scalar access heuristic check is done by the pass. This allows passes between ScopInfo and PruneUnprofitable to optimize away scalar accesses. Without enabling such intermediate passes, there is no change in behaviour of profitability checks in a PassManagerBuilder built pass chain, but it allows us to cover this configuration with the buildbots. Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 298081	2017-03-17 13:10:05 +00:00
Michael Kruse	f3091bf4cf	[PruneUnprofitable] Add -polly-prune-unprofitable pass. ScopInfo's normal profitability heuristic considers SCoPs where all statements have scalar writes as not profitably optimizable and invalidates the SCoP in that case. However, -polly-delicm and -polly-simplify may be able to remove some of the scalar writes such that the flag -polly-unprofitable-scalar-accs=false allows disabling that part of the heuristic. In cases where DeLICM (or other passes after ScopInfo) are not successful in removing scalar writes, the SCoP is still not profitably optimizable. The schedule optimizer would again try computing another schedule, resulting in slower compilation. The -polly-prune-unprofitable pass applies the profitability heuristic again before the schedule optimizer, so Polly can still bail out even with -polly-unprofitable-scalar-accs=false. Differential Revision: https://reviews.llvm.org/D31033 llvm-svn: 298080	2017-03-17 13:09:52 +00:00
Tobias Grosser	5842dee251	[ScopInfo] Add option to not add parameter bounds to context [NFC] For experiments it is sometimes helpful to provide parameter bound information to polly and to not use these parameter bounds for simplification. Add a new option "-polly-ignore-parameter-bounds" which does precisely this. llvm-svn: 298077	2017-03-17 13:00:53 +00:00
Siddharth Bhat	db5dd14cbb	[DependenceInfo] Replace use of deprecated isl_dim_n_out [NFC] Change isl_dim_n_out to isl_map_dim(*, isl_dim_out) llvm-svn: 298075	2017-03-17 12:59:01 +00:00
Siddharth Bhat	65f3d5201e	[DependenceInfo] Track may-writes and build flow information in Dependences::calculateDependences. This ensures that we handle may-writes correctly when building dependence information. Also add a test case checking correctness of may-write information. Not handling it before was an oversight. Differential Revision: https://reviews.llvm.org/D31075 llvm-svn: 298074	2017-03-17 12:31:28 +00:00
Tobias Grosser	8a6e605e96	[ScopInfo] Do not take inbounds assumptions [NFC] For experiments it is sometimes helpful to not take any inbounds assumptions. Add a new option "-polly-ignore-inbounds" which does precisely this. llvm-svn: 298073	2017-03-17 12:26:58 +00:00
Tobias Grosser	b58ed8d3cd	[ScopInfo] Do not try to eliminate parameter dimensions that do not exist In subsequent changes we will make Polly a little bit more lazy in adding parameter dimensions to different sets. As a result, not all parameters will always be part of the parameter space. This change ensures that we do not use the '-1' returned when a parameter dimension cannot be found, but instead just do not try to eliminate the anyhow non-existing dimension. llvm-svn: 298054	2017-03-17 09:02:53 +00:00
Tobias Grosser	941cb7d979	[ScopInfo] Do not expand getDomains() to full parameter space. For several years, isl has been able to perform most operations on sets with differing parameter spaces, by expanding the parameter space on demand, relying on named isl ids to distinguish different parameter dimensions. By not always expanding to full dimensionality, the sets remain smaller and can likely be operated on faster. This change by itself did not yet result in measurable performance benefits, but it is a step in the right direction needed to ensure that subsequent changes indeed can work with lower-dimensional sets and these sets do not get blown up by accident when later intersected with the domain context. llvm-svn: 298053	2017-03-17 09:02:50 +00:00
Tobias Grosser	f4fe34bfb8	Update to isl-0.18-387-g3fa6191 This is a normal / regular maintenance update. llvm-svn: 297999	2017-03-16 21:33:20 +00:00
Siddharth Bhat	65c4026992	Set Dependences::RED to be non-null once Dependences::calculateDependences() occurs, even if there is no actual reduction. This ensures correctness with isl operations. llvm-svn: 297981	2017-03-16 20:06:49 +00:00
Michael Kruse	5545407fa4	[ScopInfo] Introduce ScopStmt::getSurroundingLoop(). NFC. Introduce ScopStmt::getSurroundingLoop() to replace getFirstNonBoxedLoopFor. getSurroundingLoop() returns the precomputed surrounding/first non-boxed loop. Except in ScopDetection, the list of boxed loops is only used to get the surrounding loop. getFirstNonBoxedLoopFor also requires LoopInfo at every use which is not necessarily available everywhere where we may want to use it. Differential Revision: https://reviews.llvm.org/D30985 llvm-svn: 297899	2017-03-15 22:16:43 +00:00
Tobias Grosser	d614b3e6bd	Preserve the isl-noexceptions.h C++ bindings when updating isl The bindings currently need to be generated manually, as they are not yet part of the official isl distribution. Hence, we keep them across updates assuming they only need to be updated when new functions or functionality should be exposed. llvm-svn: 297710	2017-03-14 07:46:28 +00:00
Tobias Grosser	9c19a0e16a	Add back header file that was accidentally dropped in previous update llvm-svn: 297709	2017-03-14 07:39:05 +00:00
Tobias Grosser	593ebdfbd1	Update to isl-0.18-369-g5e613c6 This is a regular maintenance update. llvm-svn: 297708	2017-03-14 07:33:26 +00:00
Tobias Grosser	c9d4cb2f42	[ScheduleOptimizer] Allow tiling after fusion In ScheduleOptimizer::isTileableBand(), allow the case in which the band node's child is an isl_schedule_sequence_node and its grandchildren isl_schedule_leaf_nodes. This case can arise when two or more statements are fused by the isl scheduler. The tile_after_fusion.ll test has two statements in separate loop nests and checks whether they are tiled after being fused when polly-opt-fusion equals "max". Reviewers: grosser Subscribers: gareevroman, pollydev Tags: #polly Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch> Differential Revision: https://reviews.llvm.org/D30815 llvm-svn: 297587	2017-03-12 19:02:31 +00:00
Tobias Grosser	de244eb450	Possible error in doc comment If a SCoP is most probably sequential, then it's better to run it on a CPU. Hence, there's no point in running it on a GPU. Reviewers: grosser Subscribers: nemanjai Tags: #polly Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com> Differential Revision: https://reviews.llvm.org/D30864 llvm-svn: 297578	2017-03-12 08:19:01 +00:00
Tobias Grosser	b2347dc241	[isl++] Add missing /* implicit */ marker llvm-svn: 297577	2017-03-12 08:17:50 +00:00
Tobias Grosser	5ac963743f	[isl++] Add last set of missing isl:: prefixes to increase consistency [NFC] llvm-svn: 297558	2017-03-11 07:58:12 +00:00
Tobias Grosser	9cc7e3561d	[unittest] Do not convert large unsigned long to isl::val Currently the isl::val constructor only takes a signed long as parameter, which on Windows is only 32 bits wide and consequently cannot be used to obtain the same result when loading from the expression '(1ull << 32) - 1' that we get when loading this value via isl_val_int_from_ui or when loading the value on Linux systems with 64-bit long data types. We avoid this issue by performing the shift and subtraction within the isl::val. It would be nice to teach the isl++ bindings to construct isl::val from other integer types, but the current interface is not sufficient to do so. If constructors from both signed long and unsigned long are exposed, then integer literals that are of type 'int' and which must be converted to 'long' to match the constructor have two ambiguous constructors to choose from, which results in a compiler error. The right solution is likely to additionally expose constructors from signed and unsigned int, but these are not yet available in the isl C interface and adding those ad hoc to our bindings is something I would like to avoid for now. We should address this issue with a proper discussion on the isl side. llvm-svn: 297522	2017-03-10 22:25:39 +00:00
Tobias Grosser	d67d368e12	[isl++] Add namespace prefixes to isl::ctx and isl::stat These were missed in r297478. We add them for consistency. llvm-svn: 297520	2017-03-10 22:10:19 +00:00
Tobias Grosser	30a06dce68	[isl++] Drop warning about experimental status As most discussions about these bindings have concluded and only the final patch review on the isl mailing list is missing, we drop the experimental warning tag to match the patchset we will submit to isl, which is expected to not change notably any more. llvm-svn: 297519	2017-03-10 22:10:15 +00:00
Tobias Grosser	9839774e5d	[isl++] Do not use enum prefix Instead of declaring a function as: inline val plain_get_val_if_fixed(enum dim type, unsigned int pos) const; we use: inline isl::val plain_get_val_if_fixed(isl::dim type, unsigned int pos) const; The first argument caused the following compile time error on windows: "error C3431: 'dim': a scoped enumeration cannot be redeclared as an unscoped enumeration" In some cases it is sufficient to just drop the 'enum' prefix, but for example for isl::set the 'enum class dim' type collides with the function name isl::set::dim and can consequently not be referenced. To avoid such kind of ambiguities in the future we add the isl:: prefix consistently to all types used. Reported-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 297478	2017-03-10 17:01:30 +00:00
Michael Kruse	0446d81e2d	[Simplify] Add -polly-simplify pass. This new pass removes unnecessary accesses and writes. It currently supports 2 simplifications, but more are planned. It removes write accesses that write a loaded value back to the location it was loaded from. It is a typical artifact from DeLICM. Removing it will get rid of bogus dependencies later in dependency analysis. It also removes statements without side-effects. ScopInfo already removes these, but the removal of unnecessary writes can result in more side-effect free statements. Differential Revision: https://reviews.llvm.org/D30820 llvm-svn: 297473	2017-03-10 16:05:24 +00:00
Tobias Grosser	3e618c33fe	[DeadCodeElimination] Translate to C++ bindings This pass is a small and self-contained example of a piece of code that was written with the isl C interface. The diff of this change nicely shows how the C++ bindings can improve the readability of the code by avoiding the long C function names and by avoiding any need for memory management. As you will see, no calls to isl_*_copy or isl_*_free are needed anymore. Instead the C++ interface takes care of automatically managing the objects. This may internally introduce additional copies, but due to the isl reference counting, such copies are expected to be cheap. For performance-critical operations, we will later exploit move semantics to eliminate unnecessary copies that have been shown to be costly. Below we give a set of examples that show the benefit of the C++ interface vs. the pure C interface. Check properties ---------------- Before: if (isl_aff_is_zero(aff) || isl_aff_is_one(aff)) return true; After: if (Aff.is_zero() || Aff.is_one()) return true; Type conversion --------------- Before: isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap); After: isl::union_pw_multi_aff UPMA = UMap; Type construction ----------------- Before: auto Empty = isl_union_map_empty(space); After: auto Empty = isl::union_map::empty(Space); Operations ---------- Before: set = isl_union_set_intersect(set, set2); After: Set = Set.intersect(Set2); The use of isl::boolean in return types also increases the robustness of Polly, as on conversion to true or false, we verify that no isl_bool_error has been returned and assert in case an error was returned. Before this change we would have just ignored the error and proceeded with (some) execution path. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30619 llvm-svn: 297466	2017-03-10 15:05:38 +00:00
Tobias Grosser	3cc57fa1e7	[unittest] Translate isl tests to C++ bindings For this translation we introduce two functions, valFromAPInt and APIntFromVal, to convert between isl::val and APInt. For now these are just proxies, but in the future they will replace the current isl_val* based conversion functions. The isl unit test cases benefit most from the new isl::boolean (from Michael Kruse), which can be explicitly cast to bool and which -- as part of this cast -- emits a check that no error condition has been triggered so far. This allows us to simplify EXPECT_EQ(isl_bool_true, isl_val_is_zero(IslZero)); to EXPECT_TRUE(IslZero.is_zero()); This simplification also becomes very clear in operator==, which changes from auto IsEqual = isl_set_is_equal(LHS.keep(), RHS.keep()); EXPECT_NE(isl_bool_error, IsEqual); return IsEqual; to just return bool(LHS.is_equal(RHS)); Some background for non-isl users. The isl C interface has an isl_bool type, which can be either true, false, or error. Hence, whenever a function returns a value of type isl_bool, an explicit error check should be considered. By using isl::boolean, we can just cast the isl::boolean to 'bool' or simply use the isl::boolean in a context where it will automatically be cast to bool (e.g., in an if-condition). When doing so, the C++ bindings automatically add code that verifies that the return value is not an error code. If it is, the program will warn about this and abort. For cases where errors are expected, isl::boolean provides checks such as boolean::is_true_or_error() or boolean::is_true_no_error() to explicitly control program behavior in case of error conditions. Thanks to the new automatic memory management, we can also avoid many calls to isl_*_free. For code that had previously been managed by IslPtr<>, many calls to give/take/copy are eliminated. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30618 llvm-svn: 297464	2017-03-10 14:58:50 +00:00
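The tri-state check described in 3cc57fa1e7 can be sketched as follows. This is a minimal illustration, not the isl::boolean implementation: the names `tri` and `checked_bool` are hypothetical, and the semantics given to `is_true_or_error()` and `is_true_no_error()` are an assumption read off their names.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical tri-state result, mirroring the true/false/error idea of isl_bool.
enum class tri { error = -1, no = 0, yes = 1 };

// Hypothetical wrapper: converting to bool checks for the error state first.
class checked_bool {
  tri val;

public:
  explicit checked_bool(tri v) : val(v) {}

  bool is_error() const { return val == tri::error; }
  // Assumed semantics: an error counts as "true" here.
  bool is_true_or_error() const { return val != tri::no; }
  // Assumed semantics: only a clean "yes" counts as true.
  bool is_true_no_error() const { return val == tri::yes; }

  // Explicit conversion: warn and abort instead of silently ignoring an error.
  explicit operator bool() const {
    if (is_error()) {
      std::fprintf(stderr, "error state converted to bool\n");
      std::abort();
    }
    return val == tri::yes;
  }
};
```

With such a wrapper, `if (checked_bool(tri::yes))` works without an explicit error check, while code that expects errors can query the state through the `is_*` methods and never trigger the abort.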
Tobias Grosser	51ebda8c9d	[FlattenAlgo] Translate to C++ bindings Translate the full algorithm to use the new isl C++ bindings. This is a large piece of code that has been written with the Polly IslPtr<> memory management tool, which only performed memory management, but did not provide a method interface. As such the code was littered with calls to give(), copy(), keep(), and take(). The diff of this change should give a good example of how the new method interface simplifies the code by removing the need for switching between managed types and C functions all the time and consequently also the need to use the long C function names. These are a couple of examples comparing the old IslPtr memory management interface with the complete method interface. Check properties ---------------- Before: if (isl_aff_is_zero(Aff.get()) || isl_aff_is_one(Aff.get())) return true; After: if (Aff.is_zero() || Aff.is_one()) return true; Type conversion --------------- Before: isl_union_pw_multi_aff *UPMA = give(isl_union_pw_multi_aff_from_union_map(UMap.copy())); After: isl::union_pw_multi_aff UPMA = UMap; Type construction ----------------- Before: auto Empty = give(isl_union_map_empty(Space.copy())); After: auto Empty = isl::union_map::empty(Space); Operations ---------- Before: Set = give(isl_union_set_intersect(Set.copy(), Set2.copy())); After: Set = Set.intersect(Set2); Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30617 llvm-svn: 297463	2017-03-10 14:55:58 +00:00
Tobias Grosser	4c24e57965	Add method interface to isl C++ bindings The isl C++ binding method interface introduces a thin C++ layer that allows to call isl methods directly on the memory managed C++ objects. This makes the relevant methods directly available via code-completion interfaces, allows for the use of overloading, conversion constructors, and many other nice C++ features that make using isl a lot easier. The individual features will be highlighted in the subsequent commits. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30616 llvm-svn: 297462	2017-03-10 14:53:00 +00:00
Tobias Grosser	deaef15f52	Introduce isl C++ bindings, Part 1: value_ptr style interface Over the last couple of months several authors of independent isl C++ bindings worked together to jointly design an official set of isl C++ bindings which combines their experience in developing isl C++ bindings. The new bindings have been designed around a value pointer style interface and remove the need for explicit pointer management and instead use C++ language features to manage isl objects. This commit introduces the smart-pointer part of the isl C++ bindings and replaces the current IslPtr<T> classes, which served the very same purpose, but had to be manually maintained. Instead, we now rely on automatically generated classes for each isl object, which provide value_ptr semantics. An isl object has the following smart pointer interface: inline set manage(__isl_take isl_set *ptr); class set { friend inline set manage(__isl_take isl_set *ptr); isl_set *ptr = nullptr; inline explicit set(__isl_take isl_set *ptr); public: inline set(); inline set(const set &obj); inline set &operator=(set obj); inline ~set(); inline __isl_give isl_set *copy() const &; inline __isl_give isl_set *copy() && = delete; inline __isl_keep isl_set *get() const; inline __isl_give isl_set *release(); inline bool is_null() const; } The interface and behavior of the new value pointer style classes is inspired by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which proposes a std::value_ptr, a smart pointer that applies value semantics to its pointee. We currently only provide a limited set of public constructors and instead provide a global overloaded type constructor method "isl::obj isl::manage(isl_obj *)", which allows converting an isl_set* to an isl::set by calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor for unique pointers. 
The next two functions isl::obj::get() and isl::obj::release() are taken directly from the std::value_ptr proposal: S.get() extracts the raw pointer of the object managed by S. S.release() extracts the raw pointer of the object managed by S and sets the object in S to null. We additionally add isl::obj::copy(). S.copy() returns a raw pointer referring to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a functionality commonly needed when interacting directly with the isl C interface where all methods marked with __isl_take require consumable raw pointers. S.is_null() checks if S manages a pointer or if the managed object is currently null. We add this function to provide a more explicit way to check if the pointer is empty compared to a direct conversion to bool. This commit also introduces a couple of polly-specific extensions that cover features currently not handled by the official isl C++ bindings draft, but which have been provided by IslPtr<T> and are consequently added to avoid code churn. These extensions include: - operator bool() : Conversion from objects to bool - construction from nullptr_t - get_ctx() method - take/keep/give methods, which match the currently used naming convention of IslPtr<T> in Polly. They just forward to (release/get/manage). - raw_ostream printers We expect that these extensions are over time either removed or upstreamed to the official isl bindings. We also export a couple of classes that have not yet been exported in isl (e.g., isl::space). As part of the code review, the following two questions were asked: - Why do we not use a standard smart pointer? std::value_ptr was a proposal that has not been accepted. It is consequently not available in the standard library. Even if it were available, we want to expand this interface with a complete method interface that is conveniently available from each managed pointer. 
The most direct way to achieve this is to generate a specialized value style pointer class for each isl object type and add any additional methods to this class. The relevant changes follow in subsequent commits. - Why do we not use templates or macros to avoid code duplication? It is certainly possible to use templates or macros, but as this code is auto-generated there is no need to make writing this code more efficient. Also, most of these classes will be specialized with individual member functions in subsequent commits, such that there will be little code reuse to exploit. Hence, we decided not to use them at the moment. These bindings are not yet officially part of isl, but the draft is already very stable. The smart pointer interface itself has not changed for several months. Adding this code to Polly is against our normal policy of only importing official isl code. In this case however, we make an exception to showcase a non-trivial use case of these bindings which should increase confidence in these bindings and will help upstreaming them to isl. Tags: #polly Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D30325 llvm-svn: 297452	2017-03-10 11:41:03 +00:00
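The value_ptr style interface listed in deaef15f52 can be illustrated with a small sketch. It assumes a made-up reference-counted C object, `toy_set`, with `toy_set_copy`/`toy_set_free` standing in for the isl C functions; none of these names exist in isl, and the wrapper only mirrors the shape of the interface quoted in the commit message.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a reference-counted isl C object (NOT the real isl_set).
struct toy_set { int refs; int value; };

static int live_objects = 0; // number of toy_set objects currently allocated

toy_set *toy_set_alloc(int value) { ++live_objects; return new toy_set{1, value}; }
toy_set *toy_set_copy(toy_set *s) { ++s->refs; return s; } // cheap: bumps the count
void toy_set_free(toy_set *s) {
  if (s && --s->refs == 0) { --live_objects; delete s; }
}

namespace toy {
class set;
inline set manage(toy_set *ptr); // takes ownership of ptr

class set {
  friend set manage(toy_set *ptr);
  toy_set *ptr = nullptr;
  explicit set(toy_set *p) : ptr(p) {}

public:
  set() = default;
  set(const set &obj) : ptr(obj.ptr ? toy_set_copy(obj.ptr) : nullptr) {}
  set &operator=(set obj) { std::swap(ptr, obj.ptr); return *this; }
  ~set() { toy_set_free(ptr); }

  // Returns a new owning raw pointer; deleted on temporaries, as in the commit.
  toy_set *copy() const & { return ptr ? toy_set_copy(ptr) : nullptr; }
  toy_set *copy() && = delete;
  toy_set *get() const { return ptr; }  // borrowed pointer, ownership stays here
  toy_set *release() {                  // hands ownership to the caller
    toy_set *tmp = ptr;
    ptr = nullptr;
    return tmp;
  }
  bool is_null() const { return ptr == nullptr; }
};

inline set manage(toy_set *ptr) { return set(ptr); }
} // namespace toy
```

A typical use would be `toy::set S = toy::manage(toy_set_alloc(7));`, after which copies of `S` share the underlying object and the last owner frees it, so no explicit copy or free calls appear in user code.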
Tobias Grosser	e5671e54c0	Update to isl-0.18-356-g0b05d01 This is a regular maintenance update. llvm-svn: 297449	2017-03-10 09:17:55 +00:00
Michael Kruse	0666a76aac	[Support] Correct filename in file head comment. NFC. llvm-svn: 297430	2017-03-10 00:36:54 +00:00
Michael Kruse	e4292bf086	[Support] Add -polly-dump-module pass. This pass allows writing the LLVM-IR just before and after the Polly passes to a file. Dumping the IR before Polly helps reproduce bugs that occur in code generated by clang. It is the only reliable way to get the IR that triggers a bug. The alternative is to emit the IR with clang -c -emit-llvm -S -o dump.ll then pass it through all optimization passes opt dump.ll -basicaa -sroa ... -S -o optdump.ll to then reproduce the error with opt optdump.ll -polly-opt-isl -polly-codegen -analyze However, the IR is not the same. -O3 uses a PassBuilder that creates passes with different parameters than the default. Dumping the IR after Polly is useful to compare a miscompilation with a known-good configuration. Differential Revision: https://reviews.llvm.org/D30788 llvm-svn: 297415	2017-03-09 22:29:58 +00:00
Michael Kruse	a9520b94d5	[Cmake] Generate a PollyConfig.cmake. Generate a PollyConfig.cmake for use with Cmake's find_package in out-of-tree projects. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D30495 llvm-svn: 297395	2017-03-09 17:58:20 +00:00
Tobias Grosser	8bd7f3c0a5	[ScopDetect/Info] Allow unconditional hoisting of loads from dereferenceable ptrs In case LLVM pointers are annotated with !dereferenceable attributes/metadata or LLVM can look at the allocation from which a pointer is derived, we can know that dereferencing pointers is safe and can be done unconditionally. We use this information to prove certain pointers safe to hoist and then hoist them unconditionally. llvm-svn: 297375	2017-03-09 11:36:00 +00:00
Michael Kruse	9fb3ab1b19	[DeLICM] Add -polly-delicm-overapproximate-writes option. One of the current limitations of DeLICM is that it only creates PHI WRITEs that it knows are read by some PHI. Such writes may not span all instances of a statement. Polly's code generator currently does not support MemoryAccesses that are not executed in all instances ('partial accesses') and so has to give up on a possible mapping. This workaround has once been suggested by Tobias Grosser: Try to interpolate an arbitrary expansion to all instances. It will be checked for possible conflicts with the existing Knowledge and can be applied if the conflict checking result is that no semantics are changed. Expansion is done by simplifying the mapping by coalescing with the hope that coalescing will find a polyhedral 'rule' of the relevant map. It is then 'gist'-ed using the domain of the relevant instances such that the rule is expanded to the universe and finally intersected with the domain of all statement instances. The expansion makes conflicts become more likely, the found rule may still not encompass all statement instances and the found rule exposes internals of isl's implementation of coalesce and gist. The latter means that the result depends on how much effort the implementation invests into finding a rule which may change between versions of isl. Trivial implementations of gist and coalesce just return the input arguments. A patch that makes codegen support partial accesses is in preparation as well. Differential Revision: https://reviews.llvm.org/D30763 llvm-svn: 297373	2017-03-09 11:23:22 +00:00
Michael Kruse	935b2a3654	[DeadCodeElim] Put -polly-dce-precise-steps into the Polly category. llvm-svn: 297318	2017-03-08 23:25:35 +00:00
Michael Kruse	6744efa8d8	[ScopDetection] Only allow SCoP-wide available base pointers. Simplify ScopDetection::isInvariant(). Essentially deny everything that is defined within the SCoP and is not load-hoisted. The previous understanding of "invariant" has a few holes: - Expressions without side-effects with only invariant arguments that are defined within the SCoP's region, with the exception of selects and PHIs. These should be part of the index expression derived by ScalarEvolution and not of the base pointer. - Function calls that are !mayHaveSideEffects() (typically functions with "readnone nounwind" attributes). An example is given below. @C = external global i32 declare float* @getNextBasePtr(float) readnone nounwind ... %ptr = call float @getNextBasePtr(float* %A, float %B) The call might return: * %A, so %ptr aliases with it in the SCoP * %B, so %ptr aliases with it in the SCoP * @C, so %ptr aliases with it in the SCoP * a new pointer every time it is called, such as malloc() * a pointer into the allocated block of one of the aforementioned * any of the above, at random at each call Hence, and in contrast to a comment in the base_pointer.ll regression test, %ptr is not necessarily the same all the time. It might also alias with anything and no AliasAnalysis can tell otherwise if the definition is external. It is hence not suitable in the role of a base pointer. The practical problem with base pointers defined in SCoP statements is that they are not available globally in the SCoP. The statement instance must be executed first before the base pointer can be used. This is no problem if the base pointer is transferred as a scalar value between statements. Uses of MemoryAccess::setNewAccessRelation may add a use of the base pointer anywhere in the array. setNewAccessRelation is used by JSONImporter, DeLICM and D28518. 
Indeed, BlockGenerator currently assumes that base pointers are available globally and generates invalid code for new access relations (referring to the base pointer of the original code) if not, even if the base pointer would be available in the statement. This could be fixed with some added complexity and restrictions. The ExprBuilder must look up the local BBMap and code that calls setNewAccessRelation must check whether the base pointer is available first. The code would still be incorrect in the presence of aliasing. There is the switch -polly-ignore-aliasing to explicitly allow this, but it is hardly a justification for the additional complexity. It would still be mostly useless because in most cases either getNextBasePtr() has external linkage in which case the readnone nounwind attributes cannot be derived in the translation unit itself, or is defined in the same translation unit and gets inlined. Reviewed By: grosser Differential Revision: https://reviews.llvm.org/D30695 llvm-svn: 297281	2017-03-08 15:14:46 +00:00
Michael Kruse	5a4ec5c42b	[ScopDetection] Require LoadInst base pointers to be hoisted. Only when it is load-hoisted can we be sure the base pointer is invariant during the SCoP's execution. Most of the time it would be added to the required hoists for the alias checks anyway, except with -polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if AliasAnalysis is already sure it doesn't alias with anything (for instance if there is no other pointer to alias with). Two more parts in Polly assume that this load-hoisting took place: - setNewAccessRelation() which contains an assert which tests this. - BlockGenerator which would use the base ptr from the original code if not load-hoisted (if the access expression is regenerated) Differential Revision: https://reviews.llvm.org/D30694 llvm-svn: 297195	2017-03-07 20:28:43 +00:00
Tobias Grosser	a0b85963ba	Update isl to isl-0.18-336-g1e193d9 This is a regular maintenance update llvm-svn: 297169	2017-03-07 17:53:34 +00:00
Tobias Grosser	6c9958e0b3	[tests] Make sure tests do not end in 'unreachable' - Part III There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297158	2017-03-07 16:28:53 +00:00
Tobias Grosser	2d233fb35d	[tests] Update bounds-check elimination test cases These test cases should work in combination with https://reviews.llvm.org/D12676, but became outdated over time. Update them in preparation of discussions with Daniel Berlin on how to represent unreachable in the post-dominator tree. llvm-svn: 297157	2017-03-07 16:17:58 +00:00
Tobias Grosser	ce69e7b593	[ScopInfo] Avoid infinite loop during schedule construction Our current scop modeling enters an infinite loop when trying to model code that has unreachable instructions (e.g., test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks returned by the LLVM Loop* does not include unreachable basic blocks that branch off from the core loop body. This arises for example in the following piece of code: for (i = 0; i < N; i++) { if (i > 1024) abort(); <- this abort might be translated to an unreachable A[i] = ... } This patch adds these unreachable basic blocks in our per loop basic block count to ensure that the schedule construction does not assume a loop has been processed completely, despite certain unreachable basic blocks still remaining. The infinite loop is only observable in combination with https://reviews.llvm.org/D12676 or a similar patch. llvm-svn: 297156	2017-03-07 16:17:55 +00:00
Tobias Grosser	134a572951	[ScopDetection] Do not detect scops that exit to an unreachable Scops that exit with an unreachable are today still permitted, but make little sense to optimize. We therefore can already skip them during scop detection. This speeds up scop detection in certain cases and also ensures that bugpoint does not introduce unreachables when reducing test cases. In practice this change should have little impact, as the performance of unreachable code is unlikely to matter. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297151	2017-03-07 15:50:43 +00:00
Tobias Grosser	87dcd46aa7	[tests] Make sure tests do not end in 'unreachable' - Part II There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297150	2017-03-07 15:23:30 +00:00
Tobias Grosser	2dc1f547ae	[tests] Make sure tests do not end in 'unreachable' There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297147	2017-03-07 15:17:23 +00:00
Sanjoy Das	b641a90529	Adapt to llvm change r296992 to unbreak the bots r296992 made ScalarEvolution's CompareValueComplexity less aggressive, and that broke the polly test being fixed in this change. This change explicitly bumps CompareValueComplexity in said test case to make it pass. Can someone from the polly team please give me an idea of whether this case is important enough to have scalar-evolution-max-value-compare-depth be 3 by default? llvm-svn: 296994	2017-03-06 01:12:16 +00:00
Tobias Grosser	7d136d952e	[tests] Specify the dependence to NVPTX backend for Polly ACC test cases Some Polly ACC test cases fail without a working NVPTX backend. We explicitly specify this dependence in REQUIRES. Alternatively, we could have only marked polly-acc as supported in case the NVPTX backend is available, but as we might use other backends in the future, this does not seem to be the best choice. For this to work, we also need to make the 'targets_to_build' information available. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 296853	2017-03-03 03:38:50 +00:00
Tobias Grosser	9d551da5c1	[test] Do not emit binary data to output Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 296852	2017-03-03 03:24:34 +00:00
Tobias Grosser	7a93d94a8f	Revert "Currently broken by recent LLVM upstream changes" This reverts commit r296579, which is not needed anymore as the relevant changes in trunk have been reverted. llvm-svn: 296817	2017-03-02 21:43:50 +00:00
Tobias Grosser	1c787e0b49	[ScopDetection] Do not allow required-invariant loads in non-affine region These loads cannot be safely hoisted as the condition guarding the non-affine region cannot be duplicated to also protect the hoisted load later on. Today they are dropped in ScopInfo. By checking for this early, we do not even try to model them and possibly can still optimize smaller regions not containing this specific required-invariant load. llvm-svn: 296744	2017-03-02 12:15:37 +00:00
Tobias Grosser	c2f151084d	[ScopInfo] Disable memory folding in case it results in multi-disjunct relations Multi-disjunct access maps can easily result in inbounds assumptions which explode in case of many memory accesses and many parameters. This change reduces compilation time of some larger kernel from over 15 minutes to less than 16 seconds. Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll which has a memory access [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] } which requires folding, but where only a single disjunct remains. We can still model this test case even when only using limited memory folding. For people only reading commit messages, here is the comment that explains what memory folding is: To recover memory accesses with array size parameters in the subscript expression we post-process the delinearization results. We would normally recover from an access A[exp0(i) * N + exp1(i)] into an array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the range of exp1(i) - may be preferable. Specifically, for cases where we know exp1(i) is negative, we want to choose the latter expression. As we commonly do not have any information about the range of exp1(i), we do not choose one of the two options, but instead create a piecewise access function that adds the (-1, N) offsets as soon as exp1(i) becomes negative. For a 2D array such an access function is created by applying the piecewise map: [i,j] -> [i, j] : j >= 0 [i,j] -> [i-1, j+N] : j < 0 After this patch we generate only the first case, except for situations where we can prove the first case to be invalid and can consequently select the second without introducing disjuncts. llvm-svn: 296679	2017-03-01 21:11:27 +00:00
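The piecewise map from c2f151084d can be checked on concrete numbers. The sketch below is only an illustration with hypothetical helper names (`foldSubscript`, `linearize`); Polly expresses the map with isl relations rather than scalar code.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not Polly code): apply the piecewise map
//   [i, j] -> [i, j]         : j >= 0
//   [i, j] -> [i - 1, j + N] : j < 0
std::pair<long, long> foldSubscript(long i, long j, long N) {
  if (j >= 0)
    return {i, j};
  return {i - 1, j + N};
}

// Row-major offset of element [idx.first][idx.second] in an array A[][N].
long linearize(std::pair<long, long> idx, long N) {
  return idx.first * N + idx.second;
}
```

For N = 10, the subscript pair (3, -2) folds to (2, 8). Both pairs linearize to offset 28, so the fold changes only how the access is written, not which element it touches.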
Tobias Grosser	24222c7357	Fix namespaces after clang-format update llvm-svn: 296635	2017-03-01 15:54:27 +00:00
Tobias Grosser	6f9b60cf38	Currently broken by recent LLVM upstream changes We mark it as XFAIL to get buildbots back to green, until the upstream changes have been addressed. llvm-svn: 296579	2017-03-01 04:34:44 +00:00
Tobias Grosser	d7c4975349	[ScopInfo] Simplify inbounds assumptions under domain constraints Without this simplification for a loop nest: void foo(long n1_a, long n1_b, long n1_c, long n1_d, long p1_b, long p1_c, long p1_d, float A_1[][p1_b][p1_c][p1_d]) { for (long i = 0; i < n1_a; i++) for (long j = 0; j < n1_b; j++) for (long k = 0; k < n1_c; k++) for (long l = 0; l < n1_d; l++) A_1[i][j][k][l] += i + j + k + l; } the assumption: n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or (n1_a > 0 and n1_b > 0 and n1_c <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d) is taken rather than the simpler assumption: p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d. The former is less strict, as it allows arbitrary values of p1_* in case the loop is not executed at all. However, in practice these precise constraints explode when combined across different accesses and loops. For now it seems to make more sense to take less precise, but more scalable constraints by default. In case we find a practical example where more precise constraints are needed, we can think about allowing such precise constraints in specific situations where they help. This change speeds up the new test case from taking very long (waited at least a minute, but it probably takes a lot more) to below a second. llvm-svn: 296456	2017-02-28 09:45:54 +00:00
Tobias Grosser	cf66ea3845	Update isl to isl-0.18-304-g1efe43d This is a normal maintenance update. llvm-svn: 296441	2017-02-28 07:06:06 +00:00
Michael Kruse	6469380daa	[Cmake] Optionally use a system isl version. This patch adds an option to build against a version of libisl already installed on the system. The installation is autodetected using the pkg-config file shipped with isl. The detection of the library is in the FindISL.cmake module that creates an imported target. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D30043 llvm-svn: 296361	2017-02-27 17:54:25 +00:00
Michael Kruse	c4f61d2346	[DeLICM] Add nomap regressions tests. NFC. These verify that some scalars are not mapped because it would be incorrect to do so. For these checks we verify that no transformation has been executed from output of the pass's '-analyze'. Adding optimization remarks is not useful as it would result in too many messages, even repeated ones. I avoided checking the '-debug-only=polly-delicm' output which is an antipattern. llvm-svn: 296348	2017-02-27 15:53:18 +00:00
Michael Kruse	b295c37a15	[DeLICM] Statistics for use in regression tests. Print some measurements of the DeLICM transformation at -analyze to be used in regression tests. llvm-svn: 296347	2017-02-27 15:53:13 +00:00
Roman Gareev	bc3fbe49c5	Disable the parallel code generation in case of extension nodes We can not perform the dependence analysis and, consequently, the parallel code generation in case the schedule tree contains extension nodes. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30394 llvm-svn: 296325	2017-02-27 08:03:11 +00:00
Michael Kruse	e199f285b0	[DeLICM] Fortify against exceeding isl's max operations counter. Control flow would flow-through after the check whether the operations quota was exceeded, with the intention that it would later be caught by Knowledge::isUsable(). However, the Knowledge constructor has its own assertions to check consistency which would fail if its fields have only been initialized partially because some sets have been computed correctly before the operations quota takes effect. Fix by erroring-out early instead of falling through into the code that might expect that everything has been computed correctly. For robustness, also bail-out if any of the fields contain nullptr values instead of relying on isl always setting exactly this error code if something went wrong. This should fix the perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable (-polly-process-unprofitable -polly-position=before-vectorizer -polly-enable-delicm) buildbot. llvm-svn: 296022	2017-02-23 21:58:20 +00:00
Michael Kruse	f4e201e09f	[Support] Remove NonowningIslPtr. NFC. NonowningIslPtr<isl_X> was used as types of function parameters when the function does not consume the isl object, i.e. an __isl_keep parameter. The alternatives are: 1. IslPtr<isl_X> This has additional calls to isl_X_copy and isl_X_free to increase/decrease the reference counter even though not needed. The caller already owns a reference to the isl object. 2. const IslPtr<isl_X>& This does not change the reference counter, but requires an additional load to get the pointer to the isl object (instead of just passing the pointer itself). Moreover, the compiler cannot rely on the constness of the pointer and has to reload the pointer every time it writes to memory (unless alias analysis such as TBAA says it is not possible). The isl C++ bindings currently in development do not have an equivalent to NonowningIslPtr and adding one would make the binding more complicated and its advantage in performance is small. In order to simplify the transition to these C++ bindings, remove NonowningIslPtr. Change every former use of it to alternative 2 mentioned above (const IslPtr<isl_X>&). llvm-svn: 295998	2017-02-23 17:57:27 +00:00
Michael Kruse	2c7169d00c	[DependenceInfo] Remove unused variable. NFC. llvm-svn: 295987	2017-02-23 15:41:01 +00:00
Michael Kruse	dd6f29375b	[DependenceInfo] Use references instead of double pointers. NFC. Non-const references are the more C++-ish way to modify a variable passed by the caller. llvm-svn: 295986	2017-02-23 15:40:56 +00:00
Michael Kruse	ec8fc32160	[DependenceInfo] Rename StmtScheduleDomain -> TaggedStmtDomain. NFC. llvm-svn: 295985	2017-02-23 15:40:52 +00:00
Michael Kruse	00c38e0df2	[DependenceInfo] Simplify use of StmtSchedule's domain [NFC] Once a StmtSchedule is created, only its domain is used anywhere within DependenceInfo::calculateDependences. So, we choose to return the wrapped domain of the union_map rather than the entire union_map. However, we still build the union_map first within collectInfo(). It is cleaner to first build the entire union_map and then pull the domain out in one shot, rather than repeatedly extracting the domain in bits and pieces from accdom. Contributed-by: Siddharth Bhat <siddu.druid@gmail.com> Differential Revision: https://reviews.llvm.org/D30208 llvm-svn: 295984	2017-02-23 15:40:46 +00:00
Michael Kruse	52ab4943b4	Remove all references to PostDominators. NFC. Marking a pass as preserved is necessary if any Polly pass uses it, even if it is not preserved within the generated code. Not marking it would cause the Polly pass chain to be interrupted. It is not used by any Polly pass anymore, hence we can remove all references to it. llvm-svn: 295983	2017-02-23 15:16:22 +00:00
Michael Kruse	9f519714b3	[DeLICM] Add missing Doxygen comment. NFC. llvm-svn: 295978	2017-02-23 14:51:50 +00:00
Michael Kruse	311ecb00dc	[DeLICM] Capitalize parameter name. NFC. llvm-svn: 295977	2017-02-23 14:51:45 +00:00
Tobias Grosser	59d23bbdc6	Update isl to isl-0.18-282-g12465a5 Besides a variety of smaller cleanups, this update also contains a correctness fix to isl coalesce which resolves a crash in Polly. llvm-svn: 295966	2017-02-23 12:48:42 +00:00
Roman Gareev	96e1119a96	Make optimizations based on pattern matching be enabled by default Currently, pattern based optimizations of Polly can identify matrix multiplication and optimize it according to BLIS matmul optimization pattern (see ScheduleTreeOptimizer for details). This patch makes optimizations based on pattern matching be enabled by default. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30293 llvm-svn: 295958	2017-02-23 11:44:12 +00:00
Michael Kruse	d8d32bb3d1	[DeLICM] Regression test for skipping map targets. Add optimization-remarks-missed for when mapping targets have been skipped and add regression tests for them. llvm-svn: 295953	2017-02-23 10:25:20 +00:00
Michael Kruse	deb30e8278	[DeLICM] Add regression tests for DeLICM reject cases. These tests were not included in the main DeLICM commit. These check the cases where zone analysis cannot be successful because of assumption violations. We use the LLVM optimization remark infrastructure as it seems to be the best fit for this kind of messages. I tried to make use of the OptimizationRemarkEmitter. However, it would insert additional function passes into the pass manager to get the hotness information. The pass manager would insert them between the flatten pass and delicm, causing the ScopInfo with the flattened schedule being thrown away. Differential Revision: https://reviews.llvm.org/D30253 llvm-svn: 295846	2017-02-22 15:14:08 +00:00
Michael Kruse	8474470500	[DeLICM] Fix wrong comment. NFC. Correct a comment that claimed that a store after load was detected when the code checks a load after a store. llvm-svn: 295835	2017-02-22 14:14:40 +00:00
Michael Kruse	43ed25f1d9	[DeLICM] Print message when zone analysis is not available on -analysis. This is to distinguish the cases that analysis has failed from the case where no transformation was performed. llvm-svn: 295833	2017-02-22 13:48:35 +00:00
Michael Kruse	91cdafb86f	[DeLICM] Use opt<int>. There is no template specialization for cl::parser<unsigned long> such that parsing a cl::opt<unsigned long> command line argument will fail. Use opt<int> instead which has an associated parser. llvm-svn: 295832	2017-02-22 13:48:18 +00:00
Tobias Grosser	cc43087afc	[DependenceInfo] Simplify creation and subsequent use of AccessSchedule [NFC] We only ever use the wrapped domain of AccessSchedule, so stop creating an entire union_map and then pulling the domain out. Reviewers: grosser Tags: #polly Contributed-by: Siddharth Bhat <siddu.druid@gmail.com> Differential Revision: https://reviews.llvm.org/D30179 llvm-svn: 295726	2017-02-21 15:38:31 +00:00
Michael Kruse	9e52c39f0a	[DeLICM] Map values hoisted by LICM back to the array. Implement the -polly-delicm pass. The pass intends to undo the effects of LoopInvariantCodeMotion (LICM) which adds additional scalar dependencies into SCoPs. DeLICM will try to map those scalars back to the array elements they were promoted from, as long as the array element is unused. This is the main patch from the DeLICM/DePRE patch series. It does not yet undo GVN PRE for which additional information about known values is needed and does not handle PHI write accesses that have no target. As such its usefulness is limited. Patches for these issues including regression tests for error situations will follow. Reviewers: grosser Differential Revision: https://reviews.llvm.org/D24716 llvm-svn: 295713	2017-02-21 10:20:54 +00:00
Michael Kruse	d9cdeb453d	[Cmake] Bump required cmake version to 3.4.3. This is currently the minimum required version by LLVM. Since LLVM is needed to build Polly, we also require at least that version. Suggested-by: Philip Pfaffe <philip.pfaffe@gmail.com> llvm-svn: 295672	2017-02-20 17:06:31 +00:00
Michael Kruse	5ab24fdb73	[Cmake] Install the isl headers into the install tree. isl headers are currently missing in a Polly installation. Because the Polly headers depend on those, code can't be compiled against an installed Polly. This patch installs the isl headers. I left a TODO, as optionally it should be possible to use a system version of isl instead of the one shipped with Polly. When compiling, clients of the installation need to add -I${PREFIX}/include/polly/ to their include path right now, because there currently is no way to export this path automatically. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D29931 llvm-svn: 295671	2017-02-20 16:57:14 +00:00
Tobias Grosser	079d511891	[ScopInfo] Count read-only arrays when computing complexity of alias check Instead of counting the number of read-only accesses, we now count the number of distinct read-only array references when checking if a run-time alias check may be too complex. The run-time alias check is quadratic in the number of base pointers, not the number of accesses. Before this change we accidentally skipped SPEC's lbm test case. llvm-svn: 295567	2017-02-18 20:51:29 +00:00
Tobias Grosser	28492b85e2	[DependenceInfo] Pull out statement [NFC] This simplifies the code slightly. llvm-svn: 295551	2017-02-18 16:41:28 +00:00
Tobias Grosser	8ee46985d2	[Dependences] Compute reduction dependences on schedule tree [NFC] This change gets rid of the need for zero padding, makes the reduction computation code more similar to the normal dependence computation, and also better documents what we do at the moment. Making the dependence computation for reductions a little bit easier to understand will hopefully help us to further reduce code duplication. This reduces the time spent only in the reduction dependence pass from 260ms to 150ms for test/DependenceInfo/reduction_sequence.ll. This is a reduction of over 40% in dependence computation time. This change was inspired by discussions with Michael Kruse, Utpal Bora, Siddharth Bhat, and Johannes Doerfert. It can hopefully lay the base for further cleanups of the reduction code. llvm-svn: 295550	2017-02-18 16:39:04 +00:00
Tobias Grosser	41f0d81b31	[test] Add reduction sequence test case [NFC] This test case is a mini performance test case that shows the time needed for a couple of simple reductions. It takes today about 325ms on my machine to run this test case through 'opt' with scop construction and reduction detection. It can be used as mini-proxy for further tuning of the reduction code. Generally we do not commit performance test cases, but as this is very small and also very fast it seems OK to keep it in the lit test suite. This test case will also help to verify that future changes to the reduction code will not affect the ordering of the reduction sets and will consequently not cause spurious performance changes that only result from reordering of dependences in the reduction set. llvm-svn: 295549	2017-02-18 16:38:58 +00:00
Tobias Grosser	2461021150	Drop leftover debug statement llvm-svn: 295444	2017-02-17 13:39:45 +00:00
Tobias Grosser	cd01a363d6	[ScopInfo] Add statistics to count loops after scop modeling llvm-svn: 295431	2017-02-17 08:12:36 +00:00
Tobias Grosser	65ce9362b8	[ScopDetection] Compute the maximal loop depth correctly Before this change, we obtained loop depth numbers that were deeper than the actual loop depth. llvm-svn: 295430	2017-02-17 08:08:54 +00:00
Tobias Grosser	72745c2ef5	Updated isl to isl-0.18-254-g6bc184d This update includes a couple more coalescing changes as well as a large number of isl-internal code cleanups (dead assignments, ...). llvm-svn: 295419	2017-02-17 05:11:16 +00:00
Tobias Grosser	ca2cfd0bd8	[ScopInfo] Do not try to fold array dimensions of size zero Trying to fold such kind of dimensions will result in a division by zero, which crashes the compiler. As such arrays are likely to invalidate the scop anyhow (but are not illegal in LLVM-IR), there is no point in trying to optimize the array layout. Hence, we just avoid the folding of constant dimensions of size zero. llvm-svn: 295415	2017-02-17 04:48:52 +00:00
Tobias Grosser	90411a967b	[ScopInfo] Rename MaxDisjunctions -> MaxDisjuncts [NFC] There is only a single disjunction. However, we bound the number of 'disjuncts' in this disjunction. Name the variable accordingly. llvm-svn: 295362	2017-02-16 19:11:33 +00:00
Tobias Grosser	76ec194951	[tests] Fix some misspellings [NFC] llvm-svn: 295361	2017-02-16 19:11:29 +00:00
Tobias Grosser	c8a8276710	[ScopInfo] Bound the number of disjuncts in context Before this change wrapping range metadata resulted in exponential growth of the context, which made context construction of large scops very slow. Instead, we now just do not model the range information precisely, in case the number of disjuncts in the context has already reached a certain limit. llvm-svn: 295360	2017-02-16 19:11:25 +00:00
Tobias Grosser	98a3aa4f19	[ScopInfo] Use uppercase variable name [NFC] llvm-svn: 295350	2017-02-16 18:39:18 +00:00
Tobias Grosser	3281f601bb	[ScopInfo] Always derive upper and lower bounds for parameters Commit r230230 introduced the use of range metadata to derive bounds for parameters, instead of just looking at the type of the parameter. As part of this commit support for wrapping ranges was added, where the lower bound of a parameter is larger than the upper bound: { 255 < p || p < 0 } However, at the same time, for wrapping ranges support for adding bounds given by the size of the containing type has accidentally been dropped. As a result, the range of the parameters was not guaranteed to be bounded any more. This change makes sure we always add the bounds given by the size of the type and then additionally add bounds based on signed wrapping, if available. For a parameter p with a type size of 32 bit, the valid range is then: { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) } llvm-svn: 295349	2017-02-16 18:39:14 +00:00
Roman Gareev	4eb07e481e	[FIX] Fix the typo in ScheduleOptimizer.cpp. llvm-svn: 295292	2017-02-16 07:04:41 +00:00
Michael Kruse	c28c584604	[DeLICM] Add forgotten unittests in previous commit. NFC. llvm-svn: 295204	2017-02-15 17:19:22 +00:00
Michael Kruse	e23e94a08d	[DeLICM] Add Knowledge class. NFC. The Knowledge class remembers the state of data at any timepoint of a SCoP's execution. Currently, it tracks whether an array element is unused or is occupied by some value, and the writes to it. A future addition will be to also remember which value it contains. Objects are used to determine whether two Knowledge contain conflicting information, i.e. two states cannot be true at the same time. This commit was extracted from the DeLICM algorithm at https://reviews.llvm.org/D24716. llvm-svn: 295197	2017-02-15 16:59:10 +00:00
Tobias Grosser	288c450cf6	[ScopDetectDiagnostics] Do not format unnamed array names Formatting unnamed array names is expensive in LLVM as this requires deriving the numbered virtual instruction name (e.g., %12) for an llvm::Value, which is currently not implemented efficiently. As instruction numbers anyhow do not really carry a lot of information for the user, we just print 'unknown' instead. This change reduces the scop detection time from 24 to 19 seconds, for one of our large-scale inputs. This is a reduction by 21%. llvm-svn: 294894	2017-02-12 10:53:02 +00:00
Tobias Grosser	9fe37df27c	[ScopDetection] Add statistics to count the maximal number of scops in loop llvm-svn: 294893	2017-02-12 10:52:57 +00:00
Tobias Grosser	b3a85884f7	Do not use wrapping ranges to bound non-affine accesses The range of valid values derived for a scalar evolution expression might be a range [12, 8), where the upper bound is smaller than the lower bound and where the range is expected to possibly wrap around. We theoretically could model such a range as a union of two non-wrapping ranges, but do not do this as of yet. Instead, we just do not derive any bounds. Before this change, we could have obtained bounds where the maximal possible value is strictly smaller than the minimal possible value, which is incorrect and also caused assertions during scop modeling. llvm-svn: 294891	2017-02-12 08:11:12 +00:00
Roman Gareev	b196055c0c	Check reduction dependencies in case of the matrix multiplication optimization To determine parameters of the matrix multiplication, we check RAW dependencies that can be expressed using only reduction dependencies. Consequently, we should check the reduction dependencies, if this is the case. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Sven Verdoolaege <skimo-polly@kotnet.org> Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29814 llvm-svn: 294836	2017-02-11 09:59:09 +00:00
Roman Gareev	de69293b01	[FIX] Fix the potential issue of containsOnlyMatMulDep. llvm-svn: 294835	2017-02-11 09:48:09 +00:00
Roman Gareev	5ef7e210c0	[NFC] Fix the style issue of lib/Transform/ScheduleOptimizer.cpp. llvm-svn: 294834	2017-02-11 08:43:41 +00:00
Roman Gareev	afcf026d81	[NFC] Fix style issues of lib/Transform/ScheduleOptimizer.cpp. llvm-svn: 294831	2017-02-11 07:14:37 +00:00
Roman Gareev	3d4eae31ea	Use the size of the widest type of the matrix multiplication operands The size of the operands type is one of the parameters required to determine the BLIS micro-kernel. We get the size of the widest type of the matrix multiplication operands in case there are several different types. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29269 llvm-svn: 294828	2017-02-11 07:00:05 +00:00
Tobias Grosser	30a02088c0	Porting the example illustrating Polly from HTML to reStructuredText http://polly.llvm.org/example_manual_matmul.html which illustrates individual passes of Polly, has been ported to reStructuredText and necessary changes have been made to the configuration files used by SPHINX to include the new source as a part of the documentation. Contributed-by: Singapuram Sanjay Srivallabh <singapuram.sanjay@gmail.com> Differential Revision: https://reviews.llvm.org/D25163 llvm-svn: 294735	2017-02-10 11:46:57 +00:00
Tobias Grosser	296fe2e2ad	[ScopInfo] Use original base address when building ScopArrayInfo [NFC] This change clarifies that we want to indeed use the original base address when creating the ScopArrayInfo that corresponds to a given memory access. This change prepares for https://reviews.llvm.org/D28518. llvm-svn: 294734	2017-02-10 10:09:46 +00:00
Tobias Grosser	5db171a9da	[ScopInfo] Use getAccessValue to obtain the accessed value This replaces the use of getOriginalAddrPtr, a value that is stored in ScopArrayInfo and might at some point not be unique any more. However, the access value is defined to be unique. This change is an update on r294576, which only clarified that we need the original memory access, but where we still remained dependent to have one base pointer per scop. This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294733	2017-02-10 10:09:44 +00:00
Tobias Grosser	583be06fb2	[BlockGenerator] Use MemoryAccess::getAccessValue to get load instruction When generating code in the BlockGenerator we copy all (interesting) instructions and keep track of the new values in a basic block map. To obtain the original llvm::Value that belongs to a load memory access, we use getAccessValue() instead of getOriginalBaseAddr(). The former always references the instruction we use to load values from. The latter, on the other hand, is obtained from the corresponding ScopArrayInfo and would not be unique in case ScopArrayInfo objects at some point allow memory accesses with different base addresses. This change is an update on r294566, which only clarified that we need the original memory access, but where we still remained dependent to have one base pointer per scop. This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294669	2017-02-09 23:54:23 +00:00
Tobias Grosser	e24b7b929d	[ScopInfo] Use MemoryAccess::getScopArrayInfo() interface to access Array [NFC] By using the public interface MemoryAccess::getScopArrayInfo() we avoid the direct access to the ScopArrayInfoMap and as a result also do not need to use the BasePtr as key. This change makes the code cleaner. The const-cast we introduce is a little ugly. We may consider to drop const correctness for getScopArrayInfo() at some point. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294655	2017-02-09 23:24:57 +00:00
Tobias Grosser	9c7d181c92	[ScopInfo] Use types instead of 'auto' and use more descriptive variable names [NFC] LLVM's coding conventions suggest to use auto only in obvious cases. Hence, we move this code to actually declare the types used. We also replace the variable name 'SAI', with the name 'Array', as this improves readability. llvm-svn: 294654	2017-02-09 23:24:54 +00:00
Tobias Grosser	889830b1c5	[ScopInfo] Use ScopArrayInfo instead of base address When building alias groups, we sort different ScopArrays into unrelated groups. Historically we identified arrays through their base pointer, as no ScopArrayInfo class was yet available. This change changes the alias group construction to reference arrays through their ScopArrayInfo object. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294649	2017-02-09 23:12:22 +00:00
Tobias Grosser	be372d5a04	[ScopInfo] Expect the OriginalBaseAddr when looking at underlying instructions [NFC] During SCoP construction we sometimes inspect the underlying IR by looking at the base address of a MemoryAccess. In such cases, we always want the original base address. Make this clear by calling getOriginalBaseAddr(). This is a non-functional change as getBaseAddr maps to getOriginalBaseAddr at the moment. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294576	2017-02-09 10:11:58 +00:00
Tobias Grosser	e0e0e4d4f6	[ScopInfo] Remove unnecessary indirection through SCEV [NFC] The base address of a memory access is already an llvm::Value. Hence, there is no need to go through SCEV, but we can directly work with the llvm::Value. Also use 'Value *' instead of 'auto' for cases where the type is not obvious. llvm-svn: 294575	2017-02-09 09:34:46 +00:00
Tobias Grosser	4553463be4	[IRBuilder] Extract base pointers directly from ScopArray Instead of iterating over statements and their memory accesses to extract the set of available base pointers, just directly iterate over all ScopArray objects. This reflects more the actual intent of the code: collect all arrays (and their base pointers) to emit alias information that specifies that accesses to different arrays cannot alias. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294574	2017-02-09 09:34:42 +00:00
Roman Gareev	028ba3702c	[FIX] Disable the problematic run lines There are problems with using the machine information to derive the precise vector size on polly-amd64-linux and polly-arm-linux. We temporarily disable the problematic run lines. llvm-svn: 294571	2017-02-09 09:03:13 +00:00
Roman Gareev	2d0d294e3c	[FIX] Specify the CPU to overwrite the machine info and set a fixed vector size. llvm-svn: 294569	2017-02-09 08:29:55 +00:00
Tobias Grosser	26fb7d7517	[IslAst] Print the ScopArray name to mark reductions Before this change we used the name of the base pointer to mark reductions. This is imprecise as the canonical reference is the ScopArray itself and not the basepointer of a reduction. Using the base pointer of reductions is problematic in cases where a single ScopArray is referenced through two different base pointers. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294568	2017-02-09 08:06:15 +00:00
Tobias Grosser	114f6d6ff5	[DependenceInfo] Use ScopArrayInfo to keep track of arrays [NFC] When computing reduction dependences we first identify all ScopArrays which are part of reductions and then only compute for these ScopArrays the more detailed data dependences that allow us to identify reductions and optimize across them. Instead of using the base pointer as identifier of a ScopArray, it is clearer and more understandable to directly use the ScopArray as identifier. This change implements such a switch. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294567	2017-02-09 08:06:05 +00:00
Tobias Grosser	02400a0e0c	[BlockGenerator] BBMap uses original BaseAddress for scalar loads [NFC] When regenerating code in the BlockGenerator we copy instructions that may reference scalar values, for which the new value of a given scalar is looked up in BBMap using the original scalar llvm::Value as index. It is consequently necessary that (re)loaded scalar values are made available in BBMap using the original llvm::Value as key independently if the llvm::Value was (re)loaded from the original scalar or a new access function has been specified that caused the value to be reloaded from an array with a different base address. We make this clear by using MemoryAccess::getOriginalBaseAddr() instead of MemoryAccess::getBaseAddr() as index to BBMap. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294566	2017-02-09 08:05:50 +00:00
Roman Gareev	9989088ee9	Isolate a set of partial tile prefixes in case of the matrix multiplication optimization Isolate a set of partial tile prefixes to allow hoisting and sinking out of the unrolled innermost loops produced by the optimization of the matrix multiplication. In case it cannot be proved that the number of loop iterations can be evenly divided by tile sizes and we tile and unroll the point loop, the isl generates conditional expressions. Subsequently, the conditional expressions can prevent stores and loads of the unrolled loops from being sunk and hoisted. The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr iterations of the two innermost loops, the result of the loop tiling performed by the matrix multiplication optimization, where Mr and Nr are parameters of the micro-kernel. This helps to get rid of the conditional expressions of the unrolled innermost loops. Probably this approach can be replaced with padding in future. In case of, for example, the gemm from Polybench/C 3.2 and parametric loop bounds, it helps to increase the performance from 7.98 GFlops (27.71% of theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we get the same performance as in case of scalar loop bounds. It also causes a compile-time regression. The compile time is increased from 0.795 seconds to 0.837 seconds in case of scalar loop bounds and from 1.222 seconds to 1.490 seconds in case of parametric loop bounds. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29244 llvm-svn: 294564	2017-02-09 07:10:01 +00:00
Roman Gareev	772498dc68	[NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate option, because standardBandOpts could try to tile a band node with anchored subtree and get the error, since the use of the isolate option causes any tree containing the node to be considered anchored. Furthermore, it is not intended to apply standard optimizations, when the matrix multiplication has been detected. llvm-svn: 294444	2017-02-08 13:29:06 +00:00
Michael Kruse	49c21222a0	[External] Move lib/JSON to lib/External/JSON. NFC. For consistency with isl and ppcg which are already in lib/External. llvm-svn: 294126	2017-02-05 15:26:56 +00:00
Michael Kruse	acb08aaed5	[Support] Add convertZoneToTimepoints. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). In contrast to computeReachingWrite and computeArrayUnused, convertZoneToTimepoints implies a format for zones (ranges between timepoints). Zones at the moment are unique to DeLICM, but convertZoneToTimepoints makes most sense in conjunction with the previous two functions. llvm-svn: 294094	2017-02-04 15:42:17 +00:00
Michael Kruse	ec67d36493	[Support] Add computeArrayUnused. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). llvm-svn: 294093	2017-02-04 15:42:10 +00:00
Michael Kruse	f4dc133e69	[Support] Add computeReachingWrite. NFC. This function has been extracted from the upcoming DeLICM patch (https://reviews.llvm.org/D24716). llvm-svn: 294092	2017-02-04 15:42:01 +00:00
Michael Kruse	eeadf31de1	[Support] Remove unused function hasInvokeEdge. NFC. llvm-svn: 294062	2017-02-03 22:53:10 +00:00
Roman Gareev	98075fe181	A new algorithm for identification of a SCoP statement that implements a matrix multiplication The current identification of a SCoP statement that implements a matrix multiplication does not help to identify different permutations of loops that contain it and check for dependencies, which can prevent it from being optimized. It also requires external determination of the operands of the matrix multiplication. This patch contains the implementation of a new algorithm that helps to avoid these issues. It also modifies the test cases that generate matrix multiplications with linearized accesses, because the new algorithm does not support them. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28357 llvm-svn: 293890	2017-02-02 14:23:14 +00:00
Tobias Grosser	ff40087a6a	Update to recent formatting changes llvm-svn: 293756	2017-02-01 10:12:09 +00:00
Daniel Jasper	baaa152294	Fix format after recent clang-format change. llvm-svn: 293753	2017-02-01 09:31:42 +00:00
Roman Gareev	7758a2af53	Update the documentation on how the packing transformation is implemented Add a simple example to update the documentation on how the packing transformation is implemented. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D28021 llvm-svn: 293429	2017-01-29 10:37:50 +00:00
Tobias Grosser	f58469ad45	Add forgotten test case for r293169 llvm-svn: 293383	2017-01-28 14:32:45 +00:00
Tobias Grosser	682c51143d	[BlockGenerator] Comment corrections for r293374 [NFC] This addresses some additional comments from Michael Kruse for commit r293374 as expressed in https://reviews.llvm.org/D28901. llvm-svn: 293378	2017-01-28 11:39:02 +00:00
Tobias Grosser	587f1f57ad	[Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap Instead of keeping two separate maps from Value to Allocas, one for MemoryType::Value and the other for MemoryType::PHI, we introduce a single map from ScopArrayInfo to the corresponding Alloca. This change is intended both as a general simplification and cleanup, and also to reduce our use of MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure we have only a single place where the array (and its base pointer) for which we generate code is specified, which means we can more easily introduce new access functions that use a different ScopArrayInfo as base. We already today experiment with modifiable access functions, so this change does not address a specific bug, but it just reduces the scope one needs to reason about. Another motivation for this patch is https://reviews.llvm.org/D28518, where memory accesses with different base pointers could possibly be mapped to a single ScopArrayInfo object. Such a mapping is currently not possible, as we currently generate alloca instructions according to the base addresses of the memory accesses, not according to the ScopArrayInfo object they belong to. By making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo object will automatically mean that the same stack slot is used for these arrays. For D28518 this is not a problem, as only MemoryType::Array objects are mapped, but resolving this inconsistency will hopefully avoid confusion. llvm-svn: 293374	2017-01-28 07:42:10 +00:00
Michael Kruse	d1508812f5	[Support] Add general isl tools for DeLICM. NFC. Add some generally useful isl tools into their own new ISLTools.cpp. These helpers were extracted from and will be used by the DeLICM algorithm (https://reviews.llvm.org/D24716). Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 293340	2017-01-27 22:51:36 +00:00
Michael Kruse	33dc454700	[CodePrepa] Remove unused declaration. NFC. llvm-svn: 293304	2017-01-27 16:59:09 +00:00
Tobias Grosser	77363965c0	[ScopDetectionDiagnostic] Add meaningful end-user message for regions with entry block Before this change the user only saw "Unspecified Error" when a region contained the entry block. Now we report: "Scop contains function entry (not yet supported)." llvm-svn: 293169	2017-01-26 10:41:37 +00:00
Tobias Grosser	64bbb1357f	ScopDetectionDiagnostics: Also emit diagnostics in case no debug info is available In this case, we just use the start of the scop as the debug location. llvm-svn: 293165	2017-01-26 10:30:55 +00:00
Tobias Grosser	75dfaa1dbe	BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts Before this change we created an additional reload in the copy of the incoming block of a PHI node to reload the incoming value, even though the necessary value has already been made available by the normally generated scalar loads. In this change, we drop the code that generates this redundant reload and instead just reuse the scalar value already available. Besides making the generated code slightly cleaner, this change also makes sure that scalar loads go through the normal logic, which means they can be remapped (e.g. to array slots) and corresponding code is generated to load from the remapped location. Without this change, the original scalar load at the beginning of the non-affine region would have been remapped, but the redundant scalar load would continue to load from the old PHI slot location. It might be possible to further simplify the code in addOperandToPHI, but this would not only mean to pull out getNewValue, but to also change the insertion point update logic. As this did not work when trying it the first time, this change is likely not trivial. To not introduce bugs last minute, we postpone further simplifications to a subsequent commit. We also document the current behavior a little bit better. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28892 llvm-svn: 292486	2017-01-19 14:12:45 +00:00
Tobias Grosser	943c369c60	BlockGenerator: remove obfuscating const and const casts Making certain values 'const' to just cast it away a little later mainly obfuscates the code. Hence, we just drop the 'const' parts. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 292480	2017-01-19 13:25:52 +00:00
Tobias Grosser	97b8490982	Use range-based for loop [NFC] llvm-svn: 292471	2017-01-19 05:09:23 +00:00

3192 Commits