llvm-project

Commit Graph

Author	SHA1	Message	Date
Tobias Grosser	94e5371dde	Update to isl-0.18-43-g0b4256f Even more isl coalesce changes. llvm-svn: 290783	2016-12-31 07:46:11 +00:00
Tobias Grosser	ba3ea97689	Update to isl-0.18-28-gccb9f33 Another set of isl coalesce changes. llvm-svn: 290681	2016-12-28 19:35:49 +00:00
Tobias Grosser	600941351e	Update to isl-0.18-17-g2844ebf This update improves isl's ability to coalesce different convex sets/maps, especially when they contain existentially quantified variables. llvm-svn: 290538	2016-12-26 12:11:40 +00:00
Roman Gareev	1c2927b209	Specify the default values of the cache parameters If the parameters of the target cache (i.e., cache level sizes, cache level associativities) are not specified or have wrong values, we use ones for parameters of the macro-kernel and do not perform data-layout optimizations of the matrix multiplication. In this patch we specify the default values of the cache parameters to be able to apply the pattern matching optimizations even in this case. Since there are no typical values of these parameters, we use the parameters of Intel Core i7-3820 SandyBridge, which also help to attain high performance on IBM POWER System S822 and IBM Power 730 Express server. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 290518	2016-12-25 16:32:28 +00:00
Tobias Grosser	0791d5f5aa	ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma' througput -> throughput llvm-svn: 290418	2016-12-23 07:33:39 +00:00
Tobias Grosser	ccae1ee4df	Update isl to isl-0.18-9-gd4734f3 llvm-svn: 290389	2016-12-22 23:08:57 +00:00
Roman Gareev	be5299af0b	Change the determination of parameters of macro-kernel Typically processor architectures do not include an L3 cache, which means that Nc, the parameter of the micro-kernel, is, for all practical purposes, redundant ([1]). However, its small values can cause the redundant packing of the same elements of the matrix A, the first operand of the matrix multiplication. At the same time, big values of the parameter Nc can cause segmentation faults in case the available stack is exceeded. This patch adds an option to specify the parameter Nc as a multiple of the parameter of the micro-kernel Nr. In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak). Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28019 llvm-svn: 290256	2016-12-21 12:51:12 +00:00
Roman Gareev	bd5c6039c6	Align newly created arrays to the first level cache line boundary Aligning data to cache line boundaries helps to avoid overheads related to an access to it ([1]). This patch aligns newly created arrays and adds an option to specify the first level cache line size. By default we use 64 bytes, which is a typical cache-line size ([2]). In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 12.63 GFlops/sec (43.8542% of theoretical peak). Refs.: [1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access [2] - http://igoro.com/archive/gallery-of-processor-cache-effects/ Differential Revision: https://reviews.llvm.org/D28020 Reviewed-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 290253	2016-12-21 12:37:36 +00:00
Roman Gareev	92c446016a	[Polly] Use three-dimensional arrays to store packed operands of the matrix multiplication Previously we had two-dimensional accesses to store packed operands of the matrix multiplication for the sake of simplicity of the packed arrays. However, addition of the third dimension helps to simplify the corresponding memory access, reduce the execution time of isl operations applied to it, and consequently reduce the compile-time of Polly. For example, in case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=7 it helps to reduce the compile-time from about 361.456 seconds to about 0.816 seconds. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D27878 llvm-svn: 290251	2016-12-21 11:18:42 +00:00
Tobias Grosser	b6945e3301	Fix clang-format llvm-svn: 290103	2016-12-19 14:06:40 +00:00
Daniel Jasper	e5f3eba9c3	Fix format after recent clang-format change. llvm-svn: 290085	2016-12-19 07:54:15 +00:00
Alexandre Isoard	cbed3ce39f	Add isl_multi_pw_aff to GICHelper Add isl_multi_pw_aff* to GICHelper and add some missing isl_pw_multi_aff* handlers. llvm-svn: 290007	2016-12-16 23:41:26 +00:00
Roman Gareev	2606c48a1d	Restrict ranges of extension maps To prevent copy statements from accessing arrays out of bounds, ranges of their extension maps are restricted, according to the constraints of domains. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D25655 llvm-svn: 289815	2016-12-15 12:35:59 +00:00
Roman Gareev	15db81ef71	[NFC] Fix typos in getMacroKernelParams. llvm-svn: 289808	2016-12-15 12:00:57 +00:00
Roman Gareev	8babe1a216	The order of the loops defines the data reused in the BLIS implementation of gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used. In our case matrices are stored in row-major order instead of the column-major order used in the BLIS implementation ([1]). One of the ways to address it is to accordingly change the order of the loops of the loop nest. However, it causes elements of the matrix A to be reused in the innermost loop and, consequently, requires loading elements of the matrix B first. Since the LLVM vectorizer always generates loads from the matrix A before loads from the matrix B, we cannot provide it. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A. Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D25653 llvm-svn: 289806	2016-12-15 11:47:38 +00:00
Michael Kruse	7037fde427	Remove references to AssumptionCache. NFC. The AssumptionCache was removed in r289756 after being replaced by an additional operand list of affected values in r289755. The absence of that cache means that we now have to manually search for llvm.assume intrinsics as now done by other passes (LazyValueInfo, CodeMetrics) do not take into account an llvm::Instruction's user lists (ScalarEvolution). llvm-svn: 289791	2016-12-15 09:25:14 +00:00
Tobias Grosser	b02b6a8404	Adjust clang-format formatting to r289531 clang-format has been updated in r289531 to keep labels and values on the same line. This change updates Polly to the new formatting style. llvm-svn: 289533	2016-12-13 12:44:00 +00:00
Michael Kruse	79c0173f53	[ScheduleOptimizer] Fix memory leak. NFC. llvm-svn: 289434	2016-12-12 14:51:06 +00:00
Michael Kruse	b9a683d75d	Add more ISL foreachElt functions. NFC. Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are used by an out-of-tree patch which is in process of being upstreamed. llvm-svn: 288924	2016-12-07 17:47:57 +00:00
Michael Kruse	2ead2bfc12	Add IslPtr type traits. NFC. Add traits for isl_id and isl_multi_aff, required by out-of-tree patches currently in progress of upstreaming. isl_union_pw_aff_dump has been added to ISL during one of the last ISL updates, such that we can also enable its dump() trait. llvm-svn: 288915	2016-12-07 16:17:59 +00:00
Michael Kruse	1b8eb4104b	Update to isl-0.17.1-314-g3106e8d This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes the compilation under windows, which does not know ssize_t. In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source directory. It now generates a file isl_srcdir.c at configure-time, containing the source path, to not require setting the environment variable "srcdir" at test-time. The cmake build system had to be modified to also generate that file. llvm-svn: 288811	2016-12-06 14:37:39 +00:00
Johannes Doerfert	bda814350a	Allow to disable unsigned operations (zext, icmp ugt, ...) Unsigned operations are often useful to support but the heuristics are not yet tuned. This option allows disabling them if necessary. llvm-svn: 288521	2016-12-02 17:55:41 +00:00
Johannes Doerfert	a94ae1aede	Do not allow multiple possibly aliasing ptrs in an expression Relational comparisons should not involve multiple potentially aliasing pointers. Similarly this should hold for switch conditions and the two conditions involved in equality comparisons (separately!). This is a heuristic based on the C semantics that does only allow such operations when the base pointers do point into the same object. Since this makes aliasing likely we will bail out early instead of producing a probably failing runtime check. llvm-svn: 288516	2016-12-02 17:49:52 +00:00
Johannes Doerfert	2df9963fe3	Rerun mem2reg after the inliner It did happen that after the inliner finished we end up with promotable allocas in a function. We now run mem2reg to make sure everything is promoted if possible. llvm-svn: 288514	2016-12-02 17:43:57 +00:00
Tobias Grosser	bedef00e2c	[ScopInfo] Fold constant coefficients in array dimensions to the right This allows us to delinearize code such as the one below, where the array sizes are A[][2 * n] as there are n times two elements in the innermost dimension. Alternatively, we could try to generate another dimension for the struct in the innermost dimension, but as the struct has constant size, recovering this dimension is easy. struct com { double Real; double Img; }; void foo(long n, struct com A[][n]) { for (long i = 0; i < 100; i++) for (long j = 0; j < 1000; j++) A[i][j].Real += A[i][j].Img; } int main() { struct com A[100][1000]; foo(1000, A); llvm-svn: 288489	2016-12-02 08:10:56 +00:00
Tobias Grosser	491b799a4d	[ScopInfo] Separate construction and finalization of memory accesses [NFC] After having built memory accesses we perform some additional transformations on them to increase the chances that our delinearization guesses the right shape. Only after these transformations, we take the assumptions that the array shape we predict is such that no out-of-bounds memory accesses arise. Before this change, the construction of the memory access, the access folding that improves the representation for certain parametric subscripts, and taking the assumption were all done right after a memory access was created. In this change we split this now into three separate iterations over all memory accesses. This means only after all memory accesses have been built, we start to canonicalize accesses, and to take assumptions. This split prepares for future canonicalizations that must consider all memory accesses for deriving additional beneficial transformations. llvm-svn: 288479	2016-12-02 05:21:22 +00:00
Johannes Doerfert	b1d6608430	[NFC] Check for feasibility prior to the profitability check Feasibility is checked late on its own but early it is hidden behind the "PollyProcessUnprofitable" guard. This change will make sure we opt out early if the runtime context is infeasible anyway. llvm-svn: 288329	2016-12-01 11:12:14 +00:00
Johannes Doerfert	b6c5a5dd01	[FIX] Do not try to hoist obviously overwritten loads llvm-svn: 288328	2016-12-01 11:10:45 +00:00
Tobias Grosser	dc6b87c56e	Add newline at end of debug print In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed without a newline at the end of each diagnostic. We add such a newline to improve readability. llvm-svn: 288323	2016-12-01 08:08:47 +00:00
Michael Kruse	36e79ecaec	[DeLICM] Add pass boilerplate code. Add an empty DeLICM pass, without any functional parts. Extracting the boilerplate from the functional part reduces the size of the code to review (https://reviews.llvm.org/D24716) Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 288160	2016-11-29 16:41:21 +00:00
Michael Kruse	11c5e07925	canSynthesize: Remove unused argument LI. NFC. The helper function polly::canSynthesize() does not directly use the LoopInfo analysis, hence remove it from its argument list. llvm-svn: 288144	2016-11-29 15:11:04 +00:00
Tobias Grosser	df8f35b7b8	Update for clang-format change in r288119 llvm-svn: 288134	2016-11-29 12:52:08 +00:00
Tobias Grosser	278f9e7d27	[ScopInfo] Use SCEVRewriteVisitor to simplify SCEVSensitiveParameterRewriter [NFC] llvm-svn: 287984	2016-11-26 17:58:40 +00:00
Tobias Grosser	b45ae5601b	[ScopDetect] Expand statistics of the detected scops We now collect: Number of total loops Number of loops in scops Number of scops Number of scops with maximal loop depth 1 Number of scops with maximal loop depth 2 Number of scops with maximal loop depth 3 Number of scops with maximal loop depth 4 Number of scops with maximal loop depth 5 Number of scops with maximal loop depth 6 and larger Number of loops in scops (profitable scops only) Number of scops (profitable scops only) Number of scops with maximal loop depth 1 (profitable scops only) Number of scops with maximal loop depth 2 (profitable scops only) Number of scops with maximal loop depth 3 (profitable scops only) Number of scops with maximal loop depth 4 (profitable scops only) Number of scops with maximal loop depth 5 (profitable scops only) Number of scops with maximal loop depth 6 and larger (profitable scops only) These statistics are certainly not completely accurate as we might drop scops when building up their polyhedral representation, but they should give a good indication of the number of scops we detect. llvm-svn: 287973	2016-11-26 07:37:46 +00:00
Tobias Grosser	5c00b0dc74	[ScopDetectionDiagnostic] Collect statistics for each diagnostic type Our original statistics were added before we introduced a more fine-grained diagnostic system, but the granularity of our statistics has never been increased accordingly. This change now introduces one statistic counter per diagnostic to enable us to collect fine-grained statistics about why certain scops are not detected. In case coarser grained statistics are needed, the user is expected to combine counters manually. llvm-svn: 287968	2016-11-26 05:53:09 +00:00
Tobias Grosser	c64269ea1b	[ScopDetectionDiagnostic] Use scoped enums instead of three-letter prefix [NFC] This improves readability of the code. llvm-svn: 287963	2016-11-26 03:44:31 +00:00
Hongbin Zheng	4ff1c3983d	Fix typo. llvm-svn: 287819	2016-11-23 21:59:33 +00:00
Tobias Grosser	eab0943ec0	Update to isl-0.17.1-284-gbb38638 Regular maintenance update with only minor changes. llvm-svn: 287703	2016-11-22 21:31:59 +00:00
Tobias Grosser	b3c3d149b9	[CodeGen] Add flag to code-generate most memory access expressions Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue. This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working. llvm-svn: 287694	2016-11-22 20:21:16 +00:00
Hongbin Zheng	a8fb73fc0b	Split ScopInfo::addScopStmt into two versions. NFC One for adding statement for region, another one for BB llvm-svn: 287566	2016-11-21 20:09:40 +00:00
Hongbin Zheng	3ffa6f40b0	Minor change llvm-svn: 287562	2016-11-21 19:26:10 +00:00
Tobias Grosser	1f0236d8e5	[ScopDetect] Use mayReadOrWriteMemory to shorten condition llvm-svn: 287525	2016-11-21 09:07:30 +00:00
Tobias Grosser	b94e9b31d0	[ScopDetect] Remove unnecessary namespace qualifier llvm-svn: 287524	2016-11-21 09:04:45 +00:00
Johannes Doerfert	81aa6e882f	[NFC] Adjust naming scheme of statistic variables Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 287347	2016-11-18 14:37:08 +00:00
Johannes Doerfert	6cd59e9076	Probably overwritten loads should not be considered hoistable Do not assume a load to be hoistable/invariant if the pointer is used by another instruction in the SCoP that might write to memory and that is always executed. llvm-svn: 287272	2016-11-17 22:25:17 +00:00
Johannes Doerfert	50dfbc572a	[NFC] Add flag to disable error block assumptions The declaration as an "error block" is currently aggressive and not very smart. This patch allows to disable error blocks completely. This might be useful to prevent SCoP expansion to a point where the assumed context becomes infeasible, thus the SCoP has to be discarded. llvm-svn: 287271	2016-11-17 22:16:35 +00:00
Johannes Doerfert	c97654681e	[FIX] Do not try to hoist memory intrinsic Since we do not necessarily treat memory intrinsics as non-affine anymore, we have to check for them explicitly before we try to hoist an access. llvm-svn: 287270	2016-11-17 22:11:56 +00:00
Johannes Doerfert	b3265a3612	[NFC] Skip over trivial assumptions Filter trivial assumptions, thus assume { : } or restrict { : 0 = 1 }, as they clutter the user output as well as the statistics. llvm-svn: 287269	2016-11-17 22:08:40 +00:00
Johannes Doerfert	dae2e9287d	[DBG] Collect statistics about actually versioned SCoPs llvm-svn: 287267	2016-11-17 21:55:43 +00:00
Johannes Doerfert	8c5464a715	[DBG] Allow to emit the RTC value at runtime The new command line flag "polly-codegen-emit-rtc-print" can be used to place a "printf" in the generated code that will print the RTC value and the overflow state. llvm-svn: 287265	2016-11-17 21:49:19 +00:00


2012 Commits