llvm-project

Commit Graph

Author	SHA1	Message	Date
Roman Gareev	15db81ef71	[NFC] Fix typos in getMacroKernelParams. llvm-svn: 289808	2016-12-15 12:00:57 +00:00
Roman Gareev	8babe1a216	The order of the loops defines the data reused in the BLIS implementation of gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used. In our case matrices are stored in row-major order instead of column-major order used in the BLIS implementation ([1]). One of the ways to address it is to accordingly change the order of the loops of the loop nest. However, it makes elements of the matrix A to be reused in the innermost loop and, consequently, requires to load elements of the matrix B first. Since the LLVM vectorizer always generates loads from the matrix A before loads from the matrix B and we can not provide it. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A . Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D25653 llvm-svn: 289806	2016-12-15 11:47:38 +00:00
Michael Kruse	7037fde427	Remove references to AssumptionCache. NFC. The AssumptionCache was removed in r289756 after being replaced by the an addtional operand list of affected values in r289755. The absence of that cache means that we have now have to manually search for llvm.assume intrinsics as now done by other passes (LazyValueInfo, CodeMetrics) do not take into account an llvm::Instruction's user lists (ScalarEvolution). llvm-svn: 289791	2016-12-15 09:25:14 +00:00
Tobias Grosser	b02b6a8404	Adjust clang-format formatting to r289531 clang-format has been updated in r289531 to keep labels and values on the same line. This change updates Polly to the new formatting style. llvm-svn: 289533	2016-12-13 12:44:00 +00:00
Michael Kruse	79c0173f53	[ScheduleOptimizer] Fix memory leak. NFC. llvm-svn: 289434	2016-12-12 14:51:06 +00:00
Michael Kruse	b9a683d75d	Add more ISL foreachElt functions. NFC. Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are used by an out-of-tree patch which is in process of being upstreamed. llvm-svn: 288924	2016-12-07 17:47:57 +00:00
Michael Kruse	2ead2bfc12	Add IslPtr type traits. NFC. Add traits for isl_id and isl_multi_aff, required by out-of-tree patches currently in progress of upstreaming. isl_union_pw_aff_dump has been added to ISL during one of the last ISL updates, such that we can also enable its dump() trait. llvm-svn: 288915	2016-12-07 16:17:59 +00:00
Michael Kruse	1b8eb4104b	Update to isl-0.17.1-314-g3106e8d This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes the compilation under windows, which does not know ssize_t. In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source directory. It now generates a file isl_srcdir.c at configure-time, containing the source path, to not require setting the environment variable "srcdir" at test-time. The cmake build system had to be modified to also generate that file. llvm-svn: 288811	2016-12-06 14:37:39 +00:00
Johannes Doerfert	bda814350a	Allow to disable unsigned operations (zext, icmp ugt, ...) Unsigned operations are often useful to support but the heuristics are not yet tuned. This options allows to disable them if necessary. llvm-svn: 288521	2016-12-02 17:55:41 +00:00
Johannes Doerfert	a94ae1aede	Do not allow multiple possibly aliasing ptrs in an expression Relational comparisons should not involve multiple potentially aliasing pointers. Similarly this should hold for switch conditions and the two conditions involved in equality comparisons (separately!). This is a heuristic based on the C semantics that does only allow such operations when the base pointers do point into the same object. Since this makes aliasing likely we will bail out early instead of producing a probably failing runtime check. llvm-svn: 288516	2016-12-02 17:49:52 +00:00
Johannes Doerfert	2df9963fe3	Rerun mem2reg after the inliner It did happen that after the inliner finished we end up with promotable allocas in a function. We now run mem2reg to make sure everything is promoted if possible. llvm-svn: 288514	2016-12-02 17:43:57 +00:00
Tobias Grosser	bedef00e2c	[ScopInfo] Fold constant coefficients in array dimensions to the right This allows us to delinearize code such as the one below, where the array sizes are A[][2 * n] as there are n times two elements in the innermost dimension. Alternatively, we could try to generate another dimension for the struct in the innermost dimension, but as the struct has constant size, recovering this dimension is easy. struct com { double Real; double Img; }; void foo(long n, struct com A[][n]) { for (long i = 0; i < 100; i++) for (long j = 0; j < 1000; j++) A[i][j].Real += A[i][j].Img; } int main() { struct com A[100][1000]; foo(1000, A); llvm-svn: 288489	2016-12-02 08:10:56 +00:00
Tobias Grosser	491b799a4d	[ScopInfo] Separate construction and finalization of memory accesses [NFC] After having built memory accesses we perform some additional transformations on them to increase the chances that our delinearization guesses the right shape. Only after these transformations, we take the assumptions that the array shape we predict is such that no out-of-bounds memory accesses arise. Before this change, the construction of the memory access, the access folding that improves the represenation for certain parametric subscripts, and taking the assumption was all done right after a memory access was created. In this change we split this now into three separate iterations over all memory accesses. This means only after all memory accesses have been built, we start to canonicalize accesses, and to take assumptions. This split prepares for future canonicalizations that must consider all memory accesses for deriving additional beneficial transformations. llvm-svn: 288479	2016-12-02 05:21:22 +00:00
Johannes Doerfert	b1d6608430	[NFC] Check for feasibility prior to the profitability check Feasibility is checked late on its own but early it is hidden behind the "PollyProcessUnprofitable" guard. This change will make sure we opt out early if the runtime context is infeasible anyway. llvm-svn: 288329	2016-12-01 11:12:14 +00:00
Johannes Doerfert	b6c5a5dd01	[FIX] Do not try to hoist obviously overwritten loads llvm-svn: 288328	2016-12-01 11:10:45 +00:00
Tobias Grosser	dc6b87c56e	Add newline at end of debug print In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed without a newline at the end of each diagnostic. We add such a newline to improve readability. llvm-svn: 288323	2016-12-01 08:08:47 +00:00
Michael Kruse	36e79ecaec	[DeLICM] Add pass boilerplate code. Add an empty DeLICM pass, without any functional parts. Extracting the boilerplate from the the functional part reduces the size of the code to review (https://reviews.llvm.org/D24716) Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 288160	2016-11-29 16:41:21 +00:00
Michael Kruse	11c5e07925	canSynthesize: Remove unused argument LI. NFC. The helper function polly::canSynthesize() does not directly use the LoopInfo analysis, hence remove it from its argument list. llvm-svn: 288144	2016-11-29 15:11:04 +00:00
Tobias Grosser	df8f35b7b8	Update for clang-format change in r288119 llvm-svn: 288134	2016-11-29 12:52:08 +00:00
Tobias Grosser	278f9e7d27	[ScopInfo] Use SCEVRewriteVisitor to simplify SCEVSensitiveParameterRewriter [NFC] llvm-svn: 287984	2016-11-26 17:58:40 +00:00
Tobias Grosser	b45ae5601b	[ScopDetect] Expand statistics of the detected scops We now collect: Number of total loops Number of loops in scops Number of scops Number of scops with maximal loop depth 1 Number of scops with maximal loop depth 2 Number of scops with maximal loop depth 3 Number of scops with maximal loop depth 4 Number of scops with maximal loop depth 5 Number of scops with maximal loop depth 6 and larger Number of loops in scops (profitable scops only) Number of scops (profitable scops only) Number of scops with maximal loop depth 1 (profitable scops only) Number of scops with maximal loop depth 2 (profitable scops only) Number of scops with maximal loop depth 3 (profitable scops only) Number of scops with maximal loop depth 4 (profitable scops only) Number of scops with maximal loop depth 5 (profitable scops only) Number of scops with maximal loop depth 6 and larger (profitable scops only) These statistics are certainly completely accurate as we might drop scops when building up their polyhedral representation, but they should give a good indication of the number of scops we detect. llvm-svn: 287973	2016-11-26 07:37:46 +00:00
Tobias Grosser	5c00b0dc74	[ScopDetectionDiagnostic] Collect statistics for each diagnostic type Our original statistics were added before we introduced a more fine-grained diagnostic system, but the granularity of our statistics has never been increased accordingly. This change introduces now one statistic counter per diagnostic to enable us to collect fine-grained statistics about who certain scops are not detected. In case coarser grained statistics are needed, the user is expected to combine counters manually. llvm-svn: 287968	2016-11-26 05:53:09 +00:00
Tobias Grosser	c64269ea1b	[ScopDectionDiagnostic] Use scoped enums instead three letter prefix [NFC] This improves readability of the code. llvm-svn: 287963	2016-11-26 03:44:31 +00:00
Hongbin Zheng	4ff1c3983d	Fix typo. llvm-svn: 287819	2016-11-23 21:59:33 +00:00
Tobias Grosser	eab0943ec0	Update to isl-0.17.1-284-gbb38638 Regular maintenance update with only minor changes. llvm-svn: 287703	2016-11-22 21:31:59 +00:00
Tobias Grosser	b3c3d149b9	[CodeGen] Add flag to code-generate most memory access expressions Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue. This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working. llvm-svn: 287694	2016-11-22 20:21:16 +00:00
Hongbin Zheng	a8fb73fc0b	Split ScopInfo::addScopStmt into two versions. NFC One for adding statement for region, another one for BB llvm-svn: 287566	2016-11-21 20:09:40 +00:00
Hongbin Zheng	3ffa6f40b0	Minor change llvm-svn: 287562	2016-11-21 19:26:10 +00:00
Tobias Grosser	1f0236d8e5	[ScopDetect] Use mayReadOrWriteMemory to shorten condition llvm-svn: 287525	2016-11-21 09:07:30 +00:00
Tobias Grosser	b94e9b31d0	[ScopDetect] Remove unnecessary namespace qualifier llvm-svn: 287524	2016-11-21 09:04:45 +00:00
Johannes Doerfert	81aa6e882f	[NFC] Adjust naming scheme of statistic variables Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 287347	2016-11-18 14:37:08 +00:00
Johannes Doerfert	6cd59e9076	Probably overwritten loads should not be considered hoistable Do not assume a load to be hoistable/invariant if the pointer is used by another instruction in the SCoP that might write to memory and that is always executed. llvm-svn: 287272	2016-11-17 22:25:17 +00:00
Johannes Doerfert	50dfbc572a	[NFC] Add flag to disable error block assumptions The declaration as an "error block" is currently aggressive and not very smart. This patch allows to disable error blocks completely. This might be useful to prevent SCoP expansion to a point where the assumed context becomes infeasible, thus the SCoP has to be discarded. llvm-svn: 287271	2016-11-17 22:16:35 +00:00
Johannes Doerfert	c97654681e	[FIX] Do not try to hoist memory intrinsic Since we do not necessarily treat memory intrinsics as non-affine anymore, we have to check for them explicitly before we try to hoist an access. llvm-svn: 287270	2016-11-17 22:11:56 +00:00
Johannes Doerfert	b3265a3612	[NFC] Skip over trivial assumptions Filter trivial assumptions, thus assume { : } or restrict { : 0 = 1 }, as they clutter the user output as well as the statistics. llvm-svn: 287269	2016-11-17 22:08:40 +00:00
Johannes Doerfert	dae2e9287d	[DBG] Collect statistics about actually versioned SCoPs llvm-svn: 287267	2016-11-17 21:55:43 +00:00
Johannes Doerfert	8c5464a715	[DBG] Allow to emit the RTC value at runtime The new command line flag "polly-codegen-emit-rtc-print" can be used to place a "printf" in the generated code that will print the RTC value and the overflow state. llvm-svn: 287265	2016-11-17 21:49:19 +00:00
Johannes Doerfert	cfadb2293f	[DBG] Collect statistics about statically infeasible SCoPs llvm-svn: 287263	2016-11-17 21:44:47 +00:00
Johannes Doerfert	cd195326bf	[DBG] Collect statistics about taken assumptions llvm-svn: 287261	2016-11-17 21:41:08 +00:00
Tobias Grosser	06e1592663	Update to isl-0.17.1-267-gbf9723d This update corrects an incorrect generation of min/max expressions in the isl AST generator and a problematic nullptr dereference. llvm-svn: 287098	2016-11-16 11:06:47 +00:00
Tobias Grosser	26be8e99b6	[ScopBuilder] Drop unnecessary namespace identifiers [NFC] llvm-svn: 286781	2016-11-13 21:28:13 +00:00
Tobias Grosser	5743e8de86	[SCEVAffinator] Do not scan redundantly for parameters In r286430 "SCEVValidator: add new parameters resulting from constant extraction" we added functionality to scan for parameters after constant extraction has taken place to ensure newly created parameters are correctly registered. This addition made the already existing registration of parameters redundant. Hence, we remove the corresponding call in this commit. An alternative solution would have been to also perform constant extraction when validating SCEV expressions and to then scan for parameters when validating a SCEV expression. However, as SCEV validation is used during SCoP detection where we want to be especially fast, adding additional functionality on this hot path should be avoided if good alternatives exist. In this case, we can choose to continue to only transform SCEV expression when actually modeling them. As all transformations we perform are expected to not change the validity of the SCEV expressions, this solution seems preferable. Suggested-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286780	2016-11-13 21:28:07 +00:00
Tobias Grosser	70d2709b1a	[ScopDetect] Conservatively handle inaccessible memory alias attributes Commit r286294 introduced support for inaccessiblememonly and inaccessiblemem_or_argmemonly attributes to BasicAA, which we need to support to avoid undefined behavior. This change just refuses all calls which are annotated with these attributes, which is conservatively correct. In the future we may consider to model and support such function calls in Polly. llvm-svn: 286771	2016-11-13 19:27:24 +00:00
Tobias Grosser	a2f8fa33aa	[ScopDetect] Evaluate and verify branches at branch condition, not icmp The validity of a branch condition must be verified at the location of the branch (the branch instruction), not the location of the icmp that is used in the branch instruction. When verifying at the wrong location, we may accept an icmp that is defined within a loop which itself dominates, but does not contain the branch instruction. Such loops cannot be modeled as we only introduce domain dimensions for surrounding loops. To address this problem we change the scop detection to evaluate and verify SCEV expressions at the right location. This issue has been around since at least r179148 "scop detection: properly instantiate SCEVs to the place where they are used", where we explicitly set the scope to the wrong location. Before this commit the scope was not explicitly set, which probably also resulted in the scope around the ICmp to be choosen. This resolves http://llvm.org/PR30989 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286769	2016-11-13 19:27:04 +00:00
Tobias Grosser	f67433abd9	SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr Assumptions can either be added for a given basic block, in which case the set describing the assumptions is expected to match the dimensions of its domain. In case no basic block is provided a parameter-only set is expected to describe the assumption. The piecewise expressions that are generated by the SCEVAffinator sometimes have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }), which looks similar to a parameter-only domain, but is still a set domain. This change adds an assert that checks that we always pass parameter domains to addAssumptions if BB is empty to make mismatches here fail early. We also change visitTruncExpr to always convert to parameter sets, if BB is null. This change resolves http://llvm.org/PR30941 Another alternative to this change would have been to inspect all code to make sure we directly generate in the SCEV affinator parameter sets in case of empty domains. However, this would likely complicate the code which combines parameter and non-parameter domains when constructing a statement domain. We might still consider doing this at some point, but as this likely requires several non-local changes this should probably be done as a separate refactoring. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286444	2016-11-10 11:44:10 +00:00
Tobias Grosser	d0b9173caa	IslAst: always use the context during ast generation Providing the context to the ast generator allows for additional simplifcations and -- more importantly -- allows to generate loops with only partially bounded domains, assuming the domains are bounded for all parameter configurations that are valid as defined by the context. This change fixes the crash reported in http://llvm.org/PR30956 The original reason why we did not include the context when generating an AST was that CLooG and later isl used to sometimes transfer some of the constraints that bound the size of parameters from the context into the generated AST. This resulted in operations with very large constants, which sometimes introduced problematic integer overflows. The latest versions of the isl AST generator are careful to not introduce such constants. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286442	2016-11-10 09:39:58 +00:00
Tobias Grosser	4d543d654a	SCEVValidator: add new parameters resulting from constant extraction When extracting constant expressions out of SCEVs, new parameters may be introduced, which have not been registered before. This change scans SCEV expressions after constant extraction again to make sure newly introduced parameters are registered. We may for example extract the constant '8' from the expression '((8 * ((%a * %b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'. The new expression has a new parameter '(-1 + %b) * %a)', which was not registered before, but must be registered to not crash. This closes http://llvm.org/PR30953 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286430	2016-11-10 06:45:28 +00:00
Tobias Grosser	bbaeda3fe5	Do not allow switch statements in loop latches In r248701 "Allow switch instructions in SCoPs" support for switch statements has been introduced, but support for switch statements in loop latches was incomplete. This change completely disables switch statements in loop latches. The original commit changed addLoopBoundsToHeaderDomain to support non-branch terminator instructions, but this change was incorrect: it added a check for BI != null to the if-branch of a condition, but BI was used in the else branch es well. As a result, when a non-branch terminator instruction is encounted a nullptr dereference is triggered. Due to missing test coverage, this bug was overlooked. r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow switch statements for non-affine loops, if they appear in either a loop latch or a loop exit. We adapt this code to now prohibit switch statements in loop latches even if the control condition is affine. We could possibly add support for switch statements in loop latches, but such support should be evaluated and tested separately. This fixes llvm.org/PR30952 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286426	2016-11-10 05:20:29 +00:00
Tobias Grosser	eba86a1208	ScopInfo: only run code needed for ASSERT in DEBUG mode Suggested-by: Johannes Doerfert llvm-svn: 286338	2016-11-09 04:24:49 +00:00
Tobias Grosser	a8ca3ed06a	SCEVValidator: reduce indentation to increase readability [NFC] llvm-svn: 286217	2016-11-08 07:17:48 +00:00
Tobias Grosser	16480186f8	IslNodeBuilder: Ensure newly generated memory accesses are well-defined Add some additional asserts that ensure newly code-generated memory accesses are defined on all domain and schedule domain instances. llvm-svn: 286050	2016-11-05 21:46:01 +00:00
Tobias Grosser	744740ad91	ScopInfo: Ensure copy statement memory accesses are correct Add asserts that verify that the memory accesses of a new copy statement are defined for all domain instances the copy statement is defined for. llvm-svn: 286047	2016-11-05 21:02:43 +00:00
Tobias Grosser	9321e208e8	Update isl to isl-0.17.1-243-g24c0339 This introduces big-endian support in imath and resolves http://llvm.org/PR24632. llvm-svn: 285993	2016-11-04 11:56:48 +00:00
Michael Kruse	e1dc387731	[ScopInfo] Fix isl object leak. Fix return from function without releasing isl objects, which was introduced in r269055. llvm-svn: 285924	2016-11-03 15:19:41 +00:00
Eli Friedman	acf8006471	[Polly CodeGen] Break critical edge from RTC to original loop. This makes polly generate a CFG which is closer to what we want in LLVM IR, with a loop preheader for the original loop. This is just a cleanup, but it exposes some fragile assumptions. I'm not completely happy with the changes related to expandCodeFor; RTCBB->getTerminator() is basically a random insertion point which happens to work due to the way we generate runtime checks. I'm not sure what the right answer looks like, though. Differential Revision: https://reviews.llvm.org/D26053 llvm-svn: 285864	2016-11-02 22:32:23 +00:00
Eli Friedman	b9c6f01a81	[ScopInfo] Make memset etc. affine where possible. We don't actually check whether a MemoryAccess is affine in very many places, but one important one is in checks for aliasing. Differential Revision: https://reviews.llvm.org/D25706 llvm-svn: 285746	2016-11-01 20:53:11 +00:00
Tobias Grosser	ebb626e4b7	[ScopDetect] Use SCEVRewriteVisitor to simplify SCEVRemoveSMax rewriter ScalarEvolution got at some pointer a SCEVRewriteVisitor. Use it to simplify our SCEVRemoveSMax visitor. llvm-svn: 285491	2016-10-29 06:19:34 +00:00
Michael Kruse	426e6f71f8	[ScopInfo] Fix: use raw source pointer. When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw source and target pointers which preserve bitcasts. MemAccInst::getPointerOperand() also returns the raw target pointers, but Scop::buildAliasGroups() did not for the source pointer. This lead to mismatches between AliasSetTracker and ScopInfo on which pointer to use. Fixed by also using raw pointers in Scop::buildAliasGroups(). llvm-svn: 285071	2016-10-25 13:37:43 +00:00
Eli Friedman	286c5a76ba	[SCEVAffinator] Make precise modular math more correct. Integer math in LLVM IR is modular. Integer math in isl is arbitrary-precision. Modeling LLVM IR math correctly in isl requires either adding assumptions that math doesn't actually overflow, or explicitly wrapping the math. However, expressions with the "nsw" flag are special; we can pretend they're arbitrary-precision because it's undefined behavior if the result wraps. SCEV expressions based on IR instructions with an nsw flag also carry an nsw flag (roughly; actually, the real rule is a bit more complicated, but the details don't matter here). Before this patch, SCEV flags were also overloaded with an additional function: the ZExt code was mutating SCEV expressions as a hack to indicate to checkForWrapping that we don't need to add assumptions to the operand of a ZExt; it'll add explicit wrapping itself. This kind of works... the problem is that if anything else ever touches that SCEV expression, it'll get confused by the incorrect flags. Instead, with this patch, we make the decision about whether to explicitly wrap the math a bit earlier, basing the decision purely on the SCEV expression itself, and not its users. Differential Revision: https://reviews.llvm.org/D25287 llvm-svn: 284848	2016-10-21 18:08:02 +00:00
Mandeep Singh Grang	48e7add80f	[polly] Change SmallPtrSet which are being iterated into SmallSetVector Summary: Otherwise the lack of an iteration order results in non-determinism in codegen. Reviewers: _jdoerfert, zinob, grosser Tags: #polly Differential Revision: https://reviews.llvm.org/D25863 llvm-svn: 284845	2016-10-21 17:29:10 +00:00
Michael Kruse	3d0266b664	[cmake] Avoid warnings in feature tests. NFC. Apply the __attribute__((unused)) before the function to unambiguously apply to the function declaration. Add more casts-to-void to mark return values unused as intended. Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk> llvm-svn: 284718	2016-10-20 11:16:19 +00:00
Tobias Grosser	69ed8542f5	Update isl to isl-0.17.1-236-ga9c6cc7 This includes isl_id_to_str, which is used in Michael's upcoming DeLICM patch. llvm-svn: 284689	2016-10-20 01:59:24 +00:00
Mandeep Singh Grang	5b1abfc88e	[polly] Fix non-determinism in polly BlockGenerators Summary: Iterating over SeenBlocks which is a SmallPtrSet results in non-determinism in codegen Reviewers: jdoerfert, zinob, grosser Tags: #polly Differential Revision: https://reviews.llvm.org/D25778 llvm-svn: 284622	2016-10-19 17:56:49 +00:00
Eli Friedman	3c1a75bf9c	Handle multi-dimensional invariant load. If the address of a load depends on another load, make sure to emit the loads in the right order. llvm-svn: 284426	2016-10-17 21:04:26 +00:00
Michael Kruse	6a19d592da	[ScopDetect] Depend transitively on ScalarEvolution. ScopDetection might be queried by -dot-scops or -view-scops passes for which it accesses ScalarEvolution. llvm-svn: 284385	2016-10-17 13:29:20 +00:00
Tobias Grosser	4b5e24df9a	cmake: avoid "zero-length gnu_printf format string" warning in gcc 6.1.1 Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk> llvm-svn: 284302	2016-10-15 05:08:12 +00:00
Michael Kruse	fa53c86dc1	[ScopInfo/CodeGen] ExitPHI reads are implicit. Under some conditions MK_Value read accessed where converted to MK_ExitPHI read accessed. This is unexpected because MK_ExitPHI read accesses are implicit after the scop execution. This behaviour was introduced in r265261, which fixed a failed assertion/crash in CodeGen. Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(), despite its name, also handles accesses of kind MK_Value, only to skip them because they access values that are usually not PHI nodes in the SCoP region's exit block. Except in the situation observed in r265261. Do not convert value accessed to ExitPHI accesses and do not handle value accesses like ExitPHI accessed in CodeGen anymore. llvm-svn: 284023	2016-10-12 16:31:09 +00:00
Michael Kruse	c9edc2ee8d	[DepInfo] Print -debug output outside of max-operations scope. ISL tries to simplify the polyhedral operations before printing its objects. This increases the operations counter and therefore can contribute to hitting the operations limit. Therefore the result could be different when -debug output is enabled, making debugging harder. llvm-svn: 283745	2016-10-10 11:45:59 +00:00
Michael Kruse	8bfba1ff46	[Support/DepInfo] Introduce IslMaxOperationsGuard and make DepInfo use it. NFC. IslMaxOperationsGuard defines a scope where ISL may abort operations because if it takes too many operations. Replace the call to the raw ISL interface by a use of the guard. IslMaxOperationsGuard provides a uniform way to define a maximal computation time for a code region in C++ using RAII. llvm-svn: 283744	2016-10-10 11:45:54 +00:00
Tobias Grosser	b270288752	Fix formatting after recent cl:: changes This fixes 'make check-polly' llvm-svn: 283693	2016-10-09 08:31:35 +00:00
Mehdi Amini	732afdd09a	Turn cl::values() (for enum) from a vararg function to using C++ variadic template The core of the change is supposed to be NFC, however it also fixes what I believe was an undefined behavior when calling: va_start(ValueArgs, Desc); with Desc being a StringRef. Differential Revision: https://reviews.llvm.org/D25342 llvm-svn: 283671	2016-10-08 19:41:06 +00:00
Hongbin Zheng	5860aef675	Define PATH_MAX on windows Differential Revision: https://reviews.llvm.org/D25372 llvm-svn: 283600	2016-10-07 20:58:20 +00:00
Michael Kruse	de42b43de0	[cmake] Unify disabling upstream project warnings. Handle MSVC, ISL and PPCG in one place. The only functional change is that warnings are also disabled for MSVC compiling PPCG (Which currently fails anyway). llvm-svn: 283547	2016-10-07 12:38:32 +00:00
Michael Kruse	4b5f6af2dc	[cmake] Move isl_test artifacts to Polly folder. Folders in Visual Studio solutions help organize the build artifacts from all LLVM projects. There is a folder to keep Polly-built files in. llvm-svn: 283546	2016-10-07 12:38:24 +00:00
Tobias Grosser	e84ee850d1	Build and run isl_test as part of check-polly Running isl tests is important to gain confidence that the isl build we created works as expected. Besides the actual isl tests, there are also isl AST generation tests shipped with isl. This change only adds support for the isl unit tests. AST generation test support is left for a later commit. There is a choice to run tests directly through the build system or in the context of lit. We choose to run tests as part of lit to as this allows us to easily set environment variables, print output only on error and generally run the tests directly from the lit command. Reviewers: brad.king, Meinersbur Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25155 llvm-svn: 283245	2016-10-04 19:48:40 +00:00
Michael Kruse	6ab4476835	[ScopInfo] Add -polly-unprofitable-scalar-accs option. With this option one can disable the heuristic that assumes that statements with a scalar write access cannot be profitably optimized. Such a statement instances necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be changed to array accesses, which can avoid these WAW-dependence. llvm-svn: 283233	2016-10-04 17:33:39 +00:00
Michael Kruse	ca7cbcca37	[ScopInfo] Scalar access do not have indirect base pointers. ScopArrayInfo used to determine base pointer origins by looking up whether the base pointer is a load. The "base pointer" for scalar accesses is the llvm::Value being accessed. This is only a symbolic base pointer, it represents the alloca variable (.s2a or .phiops) generated for it at code generation. This patch disables determining base pointer origin for scalars. A test case where this caused a crash will be added in the next commit. In that test SAI tried to get the origin base pointer that was only declared later, therefore not existing. This is probably only possible for scalars used in PHINode incoming blocks. llvm-svn: 283232	2016-10-04 17:33:34 +00:00
Tobias Grosser	5652c9c830	isl: update to isl-0.17.1-233-gc911e6a llvm-svn: 283049	2016-10-01 19:46:51 +00:00
Michael Kruse	4b0c5aea78	[CodeGen] Add assertion for indirect array index expression generation. NFC. Currently Polly cannot generate code for index expressions if the base pointer is computed within the scop. The base pointer must be generated as well, but there is no code that triggers that. Add an assertion to detect when this would occur and miscompile. The IR verifier should catch it as well. llvm-svn: 282893	2016-09-30 18:29:37 +00:00
Michael Kruse	51f514d853	[Support] Compile fix for gcc. NFC. gcc 5.4 insists on template specialization to be in a namespace polly { ... } block, instead of being prefixed with 'polly::'. Error message: root/src/llvm/tools/polly/lib/Support/GICHelper.cpp:203:54: error: specialization of ‘template<class T> void polly::IslPtr<T>::dump() const’ in different namespace [-fpermissive] template <> void polly::IslPtr<isl_##TYPE>::dump() const { \ ^ msvc14 and clang 3.8 did not complain. llvm-svn: 282874	2016-09-30 16:47:43 +00:00
Michael Kruse	55519dad62	[Support] Add (Nonowning-)IslPtr::dump(). NFC. The dump() methods can be called from a debugger instead of e.g. isl_*_dump(Var.Obj) where Var is a variable of type IslPtr/NonowningIslPtr. To ensure that the existence of the function pointers do not depdend on whether the methods are used somwhere, they are declared with external linkage. llvm-svn: 282870	2016-09-30 16:10:19 +00:00
Michael Kruse	888ab55140	[CodeGen] Change 'Scalar' to 'Array' in method names. NFC. generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array) memory accesses, therefore the method names were misleading. The names also were similar to generateScalarLoads() and generateScalarStores() (plural forms) which indeed handle scalar accesses. Presumbly, they were originally named to contrast VectorBlockGenerator::generateLoad(). Rename the two methods to generateArrayLoad(), respectively generateArrayStore(). llvm-svn: 282861	2016-09-30 14:34:05 +00:00
Michael Kruse	77394f1394	[CodeGen] Add assertion for partial scalar accesses. NFC. The code generator always adds unconditional LoadInst and StoreInst, hence the MemoryAccess must be defined over all statement instances. llvm-svn: 282853	2016-09-30 14:01:46 +00:00
Tobias Grosser	349d1c3368	[ScopDetection] Remove redundant checks for endless loops Summary: Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks to reject endless loops which are now removed and replaced by a single check in `isValidLoop()`. For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is renamed to `ReportLoopHasNoExit`. The test case `ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well. The schedule generation in `buildSchedule()` is based on the following assumption: Given some block B that is contained in a loop L and a SESE region R, we assume that L is contained in R or the other way around. However, this assumption is broken in the presence of endless loops that are nested inside other loops. Therefore, in order to prevent erroneous behavior in `buildSchedule()`, r265280 introduced a corresponding check in `canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding another check to reject endless loops in `allowOverApproximatedRegion()` in r273905. Hence there existed two separate locations that handled this case. Thank you Johannes Doerfert for helping to provide the above background information. Reviewers: Meinersbur, grosser Subscribers: _jdoerfert, pollydev Differential Revision: https://reviews.llvm.org/D24560 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 281987	2016-09-20 17:05:22 +00:00
Tobias Grosser	bc653f2031	GPGPU: Do not run mostly sequential kernels in GPU In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849	2016-09-18 08:31:09 +00:00
Tobias Grosser	82f2af3508	GPGPU: Dynamically ensure 'sufficient compute' Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small number of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848	2016-09-18 06:50:35 +00:00
Tobias Grosser	51dfc27589	GPGPU: Store back non-read-only scalars We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that backs them up. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838	2016-09-17 19:22:31 +00:00
Tobias Grosser	fe74a7a1f5	GPGPU: Detect read-only scalar arrays ... and pass these by value rather than by reference. llvm-svn: 281837	2016-09-17 19:22:18 +00:00
Tobias Grosser	8f86a47461	Update CFGPrinter -> CFGPrinterLegacyPass .. to match recent changes in LLVM that broke the Polly compilation. llvm-svn: 281705	2016-09-16 05:48:09 +00:00
Tobias Grosser	aaabbbf886	GPGPU: Do not assume arrays start at 0 Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case array are accessed with negative subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611	2016-09-15 14:05:58 +00:00
Roman Gareev	b3224adfb6	Perform copying to created arrays according to the packing transformation This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23260 llvm-svn: 281441	2016-09-14 06:26:09 +00:00
Tobias Grosser	e8c69bbabd	cmake: PollyPPCG depends on PollyISL This line makes BUILD_SHARED_LIBS=ON work for Polly-ACC. Without it, ld complains about missing isl symbols when constructing the shared library. llvm-svn: 281396	2016-09-13 21:09:35 +00:00
Tobias Grosser	0a893f7df4	GPGPU: Use const_cast to avoid compiler warning [NFC] llvm-svn: 281333	2016-09-13 13:22:27 +00:00
Michael Kruse	19c9d99f45	Use value directly instead of reference. NFC. The alias to the array element is read-only and a primitive type (pointer), therefore use the value directly instead of a reference to it. llvm-svn: 281311	2016-09-13 09:56:05 +00:00
Tobias Grosser	a82c4b5df8	GPGPU: Allow region statements llvm-svn: 281305	2016-09-13 08:42:10 +00:00
Tobias Grosser	b79f4d3970	GPGPU: Extend types when array sizes have smaller types This prevents a compiler crash. llvm-svn: 281303	2016-09-13 08:02:14 +00:00
Michael Kruse	e5e752a28b	Remove -fvisibility=hidden and FORCE_STATIC. The flag -fvisibility=hidden flag was used for the integrated Integer Set Library (and PPCG) to keep their definitions local to Polly. The motivation was the be loaded into a DragonEgg-powered GCC, where GCC might itself use ISL for its Graphite extension. The symbols of Polly's ISL and GCC's ISL would clash. The DragonEgg project is not actively developed anymore, but Polly's unittests need to call ISL functions to set up a testing environment. Unfortunately, the -fvisibility=hidden flag means that the ISL symbols are not available to the gtest executable as it resides outside of libPolly when linked dynamically. Currently, CMake links a second copy of ISL into the unittests which leads to subtle bugs. What got observed is that two isl_ids for isl_id_none exist, one for each library instance. Because isl_id's are compared by address, isl_id_none could happen to be different from isl_id_none, depending on which library instance set the address and does the comparison. Also remove the FORCE_STATIC flag which was introduced to keep the ISL symbols visible inside the same libPolly shared object, even when build with BUILD_SHARED_LIBS. Differential Revision: https://reviews.llvm.org/D24460 llvm-svn: 281242	2016-09-12 18:25:00 +00:00
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	5857b701a3	GPGPU: Bail out gracefully in case of invalid IR Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out. llvm-svn: 281193	2016-09-12 06:06:31 +00:00
Tobias Grosser	02293ed755	GPGPU: Do not fail in case of arrays never accessed If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension. llvm-svn: 281165	2016-09-11 13:30:12 +00:00

1 2 3 4 5 ...

2049 Commits