llvm-project

Commit Graph

Author	SHA1	Message	Date
Tobias Grosser	696a1ee99d	[PollyIRBuilder] Bound size of alias metadata No-alias metadata grows quadratically in the size of arrays involved, which can become very costly for large programs. This commit bounds the number of arrays for which we construct no-alias information to ten. This is conservatively correct, as we just provide less information to LLVM and speeds up the compile time of one of my internal test cases from 'does-not-terminate' to 'finishes-in-less-than-a-minute'. In the future we might try to be more clever here, but this change should provide a good baseline. llvm-svn: 299352	2017-04-03 07:42:50 +00:00
Michael Kruse	c3e9c1442d	[ScopInfo] Introduce ScopStmt::contains(BB*). NFC. Provide a common way for testing if a statement contains something for region and block statements. First user is RegionGenerator::addOperandToPHI. Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 298617	2017-03-23 16:12:21 +00:00
Roman Gareev	cdfb57dc46	Introduce another level of metadata to distinguish non-aliasing accesses Introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. It can be used to, for example, distinguish different stores (loads) produced by unrolling of the innermost loops and, subsequently, sink (hoist) them by LICM. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30606 llvm-svn: 298510	2017-03-22 14:25:24 +00:00
Roman Gareev	23df27682a	Map the new load to the base pointer of the invariant load hoisted load Map the new load to the base pointer of the invariant load hoisted load to be able to find the alias information for it. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30605 llvm-svn: 298507	2017-03-22 13:57:53 +00:00
Tobias Grosser	b28f86e9e6	[CodeGen] Remove need for all parameters to be in scop context for load hoisting. When not adding constraints on parameters using -polly-ignore-parameter-bounds, the context may not necessarily list all parameter dimensions. To support code generation in this situation, we now always iterate over the actual parameter list, rather than relying on the context to list all parameter dimensions. llvm-svn: 298197	2017-03-18 23:12:49 +00:00
Tobias Grosser	1be726a40d	[IslExprBuilder] Print accessed memory locations with RuntimeDebugBuilder After this change, enabling -polly-codegen-add-debug-printing in combination with -polly-codegen-generate-expressions allows us to instrument the compiled binaries to not only print the values stored and loaded to a given memory access, but also to print the accessed location with array name and per-dimension offset: MemRef_A[3][2] Store to 6299784: 5.000000 MemRef_A[3][3] Load from 6299788: 0.000000 MemRef_A[3][3] Store to 6299788: 6.000000 This can be very helpful for debugging. llvm-svn: 298194	2017-03-18 20:54:43 +00:00
Tobias Grosser	7693b116a1	[OpenMP] Do not emit lifetime markers for context In commit r219005 lifetime markers have been introduced to mark the lifetime of the OpenMP context data structure. However, their use seems incorrect and recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was not at all related to r298053. r298053 only caused a change in the loop order, as this change resulted in a different isl internal representation which caused the scheduler to derive a different schedule. This change then caused the IR to change, which apparently created a pattern in which LLVM exploits the lifetime markers. It seems we are using the OpenMP context outside of the lifetime markers. Even though CrystalMk could probably be fixed by expanding the scope of the lifetime markers, it is not clear what happens in case the OpenMP function call is in a loop which will cause a sequence of starting and ending lifetimes. As it is unlikely that the lifetime markers give any performance benefit, we just drop them to remove complexity. llvm-svn: 298192	2017-03-18 20:10:07 +00:00
Tobias Grosser	de244eb450	Possible error in doc comment If a SCoP is most probably sequential, then it's better to run it on a CPU. Hence, there's no point in running it on a GPU. Reviewers: grosser Subscribers: nemanjai Tags: #polly Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com> Differential Revision: https://reviews.llvm.org/D30864 llvm-svn: 297578	2017-03-12 08:19:01 +00:00
Michael Kruse	0446d81e2d	[Simplify] Add -polly-simplify pass. This new pass removes unnecessary accesses and writes. It currently supports 2 simplifications, but more are planned. It removes write accesses that write a loaded value back to the location it was loaded from. It is a typical artifact from DeLICM. Removing it will get rid of bogus dependencies later in dependency analysis. It also removes statements without side-effects. ScopInfo already removes these, but the removal of unnecessary writes can result in more side-effect free statements. Differential Revision: https://reviews.llvm.org/D30820 llvm-svn: 297473	2017-03-10 16:05:24 +00:00
Tobias Grosser	24222c7357	Fix namespaces after clang-format update llvm-svn: 296635	2017-03-01 15:54:27 +00:00
Roman Gareev	bc3fbe49c5	Disable the parallel code generation in case of extension nodes We can not perform the dependence analysis and, consequently, the parallel code generation in case the schedule tree contains extension nodes. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30394 llvm-svn: 296325	2017-02-27 08:03:11 +00:00
Michael Kruse	52ab4943b4	Remove all references to PostDominators. NFC. Marking a pass as preserved is necessary if any Polly pass uses it, even if it is not preserved within the generated code. Not marking it would cause the Polly pass chain to be interrupted. It is not used by any Polly pass anymore, hence we can remove all references to it. llvm-svn: 295983	2017-02-23 15:16:22 +00:00
Tobias Grosser	583be06fb2	[BlockGenerator] Use MemoryAccess::getAccessValue to get load instruction When generating code in the BlockGenerator we copy all (interesting) instructions and keep track of the new values in a basic block map. To obtain the original llvm::Value that belongs to a load memory access, we use getAccessValue() instead of getOriginalBaseAddr(). The former always references the instruction we use to load values from. The latter, on the other hand, is obtained from the corresponding ScopArrayInfo and would not be unique in case ScopArrayInfo objects at some point allow memory accesses with different base addresses. This change is an update on r294566, which only clarified that we need the original memory access, but where we still remained dependent to have one base pointer per scop. This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294669	2017-02-09 23:54:23 +00:00
Tobias Grosser	4553463be4	[IRBuilder] Extract base pointers directly from ScopArray Instead of iterating over statements and their memory accesses to extract the set of available base pointers, just directly iterate over all ScopArray objects. This reflects more the actual intent of the code: collect all arrays (and their base pointers) to emit alias information that specifies that accesses to different arrays cannot alias. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294574	2017-02-09 09:34:42 +00:00
Tobias Grosser	26fb7d7517	[IslAst] Print the ScopArray name to mark reductions Before this change we used the name of the base pointer to mark reductions. This is imprecise as the canonical reference is the ScopArray itself and not the basepointer of a reduction. Using the base pointer of reductions is problematic in cases where a single ScopArray is referenced through two different base pointers. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294568	2017-02-09 08:06:15 +00:00
Tobias Grosser	02400a0e0c	[BlockGenerator] BBMap uses original BaseAddress for scalar loads [NFC] When regenerating code in the BlockGenerator we copy instructions that may reference scalar values, for which the new value of a given scalar is looked up in BBMap using the original scalar llvm::Value as index. It is consequently necessary that (re)loaded scalar values are made available in BBMap using the original llvm::Value as key independently if the llvm::Value was (re)loaded from the original scalar or a new access function has been specified that caused the value to be reloaded from an array with a different base address. We make this clear by using MemoryAccess::getOriginalBaseAddr() instead of MemoryAccess::getBaseAddr() as index to BBMap. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294566	2017-02-09 08:05:50 +00:00
Tobias Grosser	ff40087a6a	Update to recent formatting changes llvm-svn: 293756	2017-02-01 10:12:09 +00:00
Tobias Grosser	682c51143d	[BlockGenerator] Comment corrections for r293374 [NFC] This addresses some additional comments from Michael Kruse for commit r293374 as expressed in https://reviews.llvm.org/D28901. llvm-svn: 293378	2017-01-28 11:39:02 +00:00
Tobias Grosser	587f1f57ad	[Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap Instead of keeping two separate maps from Value to Allocas, one for MemoryType::Value and the other for MemoryType::PHI, we introduce a single map from ScopArrayInfo to the corresponding Alloca. This change is intended, both as a general simplification and cleanup, but also to reduce our use of MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure we have only a single place where the array (and its base pointer) for which we generate code for is specified, which means we can more easily introduce new access functions that use a different ScopArrayInfo as base. We already today experiment with modifiable access functions, so this change does not address a specific bug, but it just reduces the scope one needs to reason about. Another motivation for this patch is https://reviews.llvm.org/D28518, where memory accesses with different base pointers could possibly be mapped to a single ScopArrayInfo object. Such a mapping is currently not possible, as we currently generate alloca instructions according to the base addresses of the memory accesses, not according to the ScopArrayInfo object they belong to. By making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo object will automatically mean that the same stack slot is used for these arrays. For D28518 this is not a problem, as only MemoryType::Array objects are mapping, but resolving this inconsistency will hopefully avoid confusion. llvm-svn: 293374	2017-01-28 07:42:10 +00:00
Tobias Grosser	75dfaa1dbe	BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts Before this change we created an additional reload in the copy of the incoming block of a PHI node to reload the incoming value, even though the necessary value has already been made available by the normally generated scalar loads. In this change, we drop the code that generates this redundant reload and instead just reuse the scalar value already available. Besides making the generated code slightly cleaner, this change also makes sure that scalar loads go through the normal logic, which means they can be remapped (e.g. to array slots) and corresponding code is generated to load from the remapped location. Without this change, the original scalar load at the beginning of the non-affine region would have been remapped, but the redundant scalar load would continue to load from the old PHI slot location. It might be possible to further simplify the code in addOperandToPHI, but this would not only mean to pull out getNewValue, but to also change the insertion point update logic. As this did not work when trying it the first time, this change is likely not trivial. To not introduce bugs last minute, we postpone further simplifications to a subsequent commit. We also document the current behavior a little bit better. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28892 llvm-svn: 292486	2017-01-19 14:12:45 +00:00
Tobias Grosser	943c369c60	BlockGenerator: remove obfuscating const and const casts Making certain values 'const' to just cast it away a little later mainly obfuscates the code. Hence, we just drop the 'const' parts. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 292480	2017-01-19 13:25:52 +00:00
Tobias Grosser	97b8490982	Use range-based for loop [NFC] llvm-svn: 292471	2017-01-19 05:09:23 +00:00
Tobias Grosser	e1ff0cf2eb	Relax assert when setting access functions with invariant base pointers Summary: Instead of forbidding such access functions completely, we verify that their base pointer has been hoisted and only assert in case the base pointer was not hoisted. I was trying for a little while to get a test case that ensures the assert is correctly fired in case of invariant load hoisting being disabled, but I could not find a good way to do so, as llvm-lit immediately aborts if a command yields a non-zero return value. As we do not generally test our asserts, not having a test case here seems OK. This resolves http://llvm.org/PR31494 Suggested-by: Michael Kruse <llvm@meinersbur.de> Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28798 llvm-svn: 292213	2017-01-17 12:00:42 +00:00
Tobias Grosser	21a059af09	Adjust formatting to commit r292110 [NFC] llvm-svn: 292123	2017-01-16 14:08:10 +00:00
Tobias Grosser	4d5a917287	Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo To benefit from the type safety guarantees of C++11 typed enums, which would have caught the type mismatch fixed in r291960, we make MemoryKind a typed enum. This change also allows us to drop the 'MK_' prefix and to instead use the more descriptive full name of the enum as prefix. To reduce the amount of typing needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a global scope, which means the ScopArrayInfo:: prefix is not needed. This move also makes sense historically. In the beginning of Polly we had different MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later canonicalized to one. During this canonicalization we just chose the enum in ScopArrayInfo, but did not consider moving this shared enum to global scope. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 292030	2017-01-14 20:25:44 +00:00
Tobias Grosser	e29db2173b	Update to recent clang-format changes llvm-svn: 291810	2017-01-12 21:05:19 +00:00
Roman Gareev	bd5c6039c6	Align newly created arrays to the first level cache line boundary Aligning data to cache lines boundaries helps to avoid overheads related to an access to it ([1]). This patch aligns newly created arrays and adds an option to specify the first level cache line size. By default we use 64 bytes, which is a typical cache-line size ([2]). In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39,247% of theoretical peak) to 12.63 GFlops/sec (43,8542% of theoretical peak). Refs.: [1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access [2] - http://igoro.com/archive/gallery-of-processor-cache-effects/ Differential Revision: https://reviews.llvm.org/D28020 Reviewed-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 290253	2016-12-21 12:37:36 +00:00
Tobias Grosser	b6945e3301	Fix clang-format llvm-svn: 290103	2016-12-19 14:06:40 +00:00
Tobias Grosser	dc6b87c56e	Add newline at end of debug print In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed without a newline at the end of each diagnostic. We add such a newline to improve readability. llvm-svn: 288323	2016-12-01 08:08:47 +00:00
Michael Kruse	11c5e07925	canSynthesize: Remove unused argument LI. NFC. The helper function polly::canSynthesize() does not directly use the LoopInfo analysis, hence remove it from its argument list. llvm-svn: 288144	2016-11-29 15:11:04 +00:00
Tobias Grosser	df8f35b7b8	Update for clang-format change in r288119 llvm-svn: 288134	2016-11-29 12:52:08 +00:00
Tobias Grosser	b3c3d149b9	[CodeGen] Add flag to code-generate most memory access expressions Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue. This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working. llvm-svn: 287694	2016-11-22 20:21:16 +00:00
Johannes Doerfert	81aa6e882f	[NFC] Adjust naming scheme of statistic variables Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 287347	2016-11-18 14:37:08 +00:00
Johannes Doerfert	dae2e9287d	[DBG] Collect statistics about actually versioned SCoPs llvm-svn: 287267	2016-11-17 21:55:43 +00:00
Johannes Doerfert	8c5464a715	[DBG] Allow to emit the RTC value at runtime The new command line flag "polly-codegen-emit-rtc-print" can be used to place a "printf" in the generated code that will print the RTC value and the overflow state. llvm-svn: 287265	2016-11-17 21:49:19 +00:00
Tobias Grosser	d0b9173caa	IslAst: always use the context during ast generation Providing the context to the ast generator allows for additional simplifications and -- more importantly -- allows to generate loops with only partially bounded domains, assuming the domains are bounded for all parameter configurations that are valid as defined by the context. This change fixes the crash reported in http://llvm.org/PR30956 The original reason why we did not include the context when generating an AST was that CLooG and later isl used to sometimes transfer some of the constraints that bound the size of parameters from the context into the generated AST. This resulted in operations with very large constants, which sometimes introduced problematic integer overflows. The latest versions of the isl AST generator are careful to not introduce such constants. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286442	2016-11-10 09:39:58 +00:00
Tobias Grosser	16480186f8	IslNodeBuilder: Ensure newly generated memory accesses are well-defined Add some additional asserts that ensure newly code-generated memory accesses are defined on all domain and schedule domain instances. llvm-svn: 286050	2016-11-05 21:46:01 +00:00
Eli Friedman	acf8006471	[Polly CodeGen] Break critical edge from RTC to original loop. This makes polly generate a CFG which is closer to what we want in LLVM IR, with a loop preheader for the original loop. This is just a cleanup, but it exposes some fragile assumptions. I'm not completely happy with the changes related to expandCodeFor; RTCBB->getTerminator() is basically a random insertion point which happens to work due to the way we generate runtime checks. I'm not sure what the right answer looks like, though. Differential Revision: https://reviews.llvm.org/D26053 llvm-svn: 285864	2016-11-02 22:32:23 +00:00
Mandeep Singh Grang	5b1abfc88e	[polly] Fix non-determinism in polly BlockGenerators Summary: Iterating over SeenBlocks which is a SmallPtrSet results in non-determinism in codegen Reviewers: jdoerfert, zinob, grosser Tags: #polly Differential Revision: https://reviews.llvm.org/D25778 llvm-svn: 284622	2016-10-19 17:56:49 +00:00
Eli Friedman	3c1a75bf9c	Handle multi-dimensional invariant load. If the address of a load depends on another load, make sure to emit the loads in the right order. llvm-svn: 284426	2016-10-17 21:04:26 +00:00
Michael Kruse	fa53c86dc1	[ScopInfo/CodeGen] ExitPHI reads are implicit. Under some conditions MK_Value read accesses were converted to MK_ExitPHI read accesses. This is unexpected because MK_ExitPHI read accesses are implicit after the scop execution. This behaviour was introduced in r265261, which fixed a failed assertion/crash in CodeGen. Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(), despite its name, also handles accesses of kind MK_Value, only to skip them because they access values that are usually not PHI nodes in the SCoP region's exit block. Except in the situation observed in r265261. Do not convert value accesses to ExitPHI accesses and do not handle value accesses like ExitPHI accesses in CodeGen anymore. llvm-svn: 284023	2016-10-12 16:31:09 +00:00
Mehdi Amini	732afdd09a	Turn cl::values() (for enum) from a vararg function to using C++ variadic template The core of the change is supposed to be NFC, however it also fixes what I believe was an undefined behavior when calling: va_start(ValueArgs, Desc); with Desc being a StringRef. Differential Revision: https://reviews.llvm.org/D25342 llvm-svn: 283671	2016-10-08 19:41:06 +00:00
Michael Kruse	4b0c5aea78	[CodeGen] Add assertion for indirect array index expression generation. NFC. Currently Polly cannot generate code for index expressions if the base pointer is computed within the scop. The base pointer must be generated as well, but there is no code that triggers that. Add an assertion to detect when this would occur and miscompile. The IR verifier should catch it as well. llvm-svn: 282893	2016-09-30 18:29:37 +00:00
Michael Kruse	888ab55140	[CodeGen] Change 'Scalar' to 'Array' in method names. NFC. generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array) memory accesses, therefore the method names were misleading. The names also were similar to generateScalarLoads() and generateScalarStores() (plural forms) which indeed handle scalar accesses. Presumably, they were originally named to contrast VectorBlockGenerator::generateLoad(). Rename the two methods to generateArrayLoad(), respectively generateArrayStore(). llvm-svn: 282861	2016-09-30 14:34:05 +00:00
Michael Kruse	77394f1394	[CodeGen] Add assertion for partial scalar accesses. NFC. The code generator always adds unconditional LoadInst and StoreInst, hence the MemoryAccess must be defined over all statement instances. llvm-svn: 282853	2016-09-30 14:01:46 +00:00
Tobias Grosser	bc653f2031	GPGPU: Do not run mostly sequential kernels in GPU In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849	2016-09-18 08:31:09 +00:00
Tobias Grosser	82f2af3508	GPGPU: Dynamically ensure 'sufficient compute' Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small number of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848	2016-09-18 06:50:35 +00:00
Tobias Grosser	51dfc27589	GPGPU: Store back non-read-only scalars We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that backs them up. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838	2016-09-17 19:22:31 +00:00
Tobias Grosser	fe74a7a1f5	GPGPU: Detect read-only scalar arrays ... and pass these by value rather than by reference. llvm-svn: 281837	2016-09-17 19:22:18 +00:00
Tobias Grosser	aaabbbf886	GPGPU: Do not assume arrays start at 0 Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case arrays are accessed with negative subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611	2016-09-15 14:05:58 +00:00
Roman Gareev	b3224adfb6	Perform copying to created arrays according to the packing transformation This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23260 llvm-svn: 281441	2016-09-14 06:26:09 +00:00
Tobias Grosser	0a893f7df4	GPGPU: Use const_cast to avoid compiler warning [NFC] llvm-svn: 281333	2016-09-13 13:22:27 +00:00
Tobias Grosser	a82c4b5df8	GPGPU: Allow region statements llvm-svn: 281305	2016-09-13 08:42:10 +00:00
Tobias Grosser	b79f4d3970	GPGPU: Extend types when array sizes have smaller types This prevents a compiler crash. llvm-svn: 281303	2016-09-13 08:02:14 +00:00
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	5857b701a3	GPGPU: Bail out gracefully in case of invalid IR Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out. llvm-svn: 281193	2016-09-12 06:06:31 +00:00
Tobias Grosser	02293ed755	GPGPU: Do not fail in case of arrays never accessed If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension. llvm-svn: 281165	2016-09-11 13:30:12 +00:00
Tobias Grosser	a3afe44d6c	IslNodeBuilder: Add missing __isl_take annotation llvm-svn: 281034	2016-09-09 11:16:50 +00:00
Tobias Grosser	f3600dfa2d	IslNodeBuilder: Add missing __isl_take annotations llvm-svn: 280936	2016-09-08 13:48:55 +00:00
Tobias Grosser	c80d6979bd	Drop '@brief' from doxygen comments LLVM's coding guideline suggests to not use @brief for one-sentence doxygen comments to improve readability. Switch this once and for all to ensure people do not copy @brief comments from other parts of Polly, when writing new code. llvm-svn: 280468	2016-09-02 06:33:33 +00:00
Michael Kruse	2fa3519463	Allow mapping scalar MemoryAccesses to array elements. Change the code around setNewAccessRelation to allow to use an existing array element for memory instead of an ad-hoc alloca. This facility will be used for DeLICM/DeGVN to convert scalar dependencies into regular ones. The changes necessary include: - Make the code generator use the implicit locations instead of the alloca ones. - A test case - Make the JScop importer accept changes of scalar accesses for that test case. - Adapt the MemoryAccess interface to the fact that the MemoryKind can change. They are named (get\|is)OriginalXXX() to get the status of the memory access before any change by setNewAccessRelation() (some properties such as getIncoming() do not change even if the kind is changed and are still required). To get the modified properties, there is (get\|is)LatestXXX(). The old accessors without Original\|Latest become synonyms of the (get\|is)OriginalXXX() to not make functional changes in unrelated code. Differential Revision: https://reviews.llvm.org/D23962 llvm-svn: 280408	2016-09-01 19:53:31 +00:00
Tobias Grosser	1c18440958	[BlockGenerator] Invalidate SCEV values for instructions in scop We already invalidated a couple of critical values earlier on, but we now invalidate all instructions contained in a scop after the scop has been code generated. This is necessary as later scops may otherwise obtain SCEV expressions that reference values in the earlier scop that before dominated the later scop, but which had been moved into the conditional branch and consequently do not dominate the later scop any more. If these very values are then used during code generation of the later scop, we generate uses that are not dominated by the values they use. This fixes: http://llvm.org/PR28984 llvm-svn: 279047	2016-08-18 10:45:57 +00:00
Tobias Grosser	d58acf866a	[GPGPU] Ensure arrays where only parts are modified are copied to GPU To do so we change the way array extents are computed. Instead of the precise set of memory locations accessed, we now compute the extent as the range between minimal and maximal address in the first dimension and the full extent defined by the sizes of the inner array dimensions. We also move the computation of the may_persist region after the construction of the arrays, as it relies on array information. Without arrays being constructed no useful information is computed at all. llvm-svn: 278212	2016-08-10 10:58:19 +00:00
Tobias Grosser	b06ff4574e	[GPGPU] Support PHI nodes used in GPU kernel Ensure the right scalar allocations are used as the host location of data transfers. For the device code, we clear the allocation cache before device code generation to be able to generate new device-specific allocation and we need to make sure to add back the old host allocations as soon as the device code generation is finished. llvm-svn: 278126	2016-08-09 15:35:06 +00:00
Tobias Grosser	750160e260	[GPGPU] Use separate basic block for GPU initialization code This increases the readability of the IR and also clarifies that the GPU initialization is executed _after_ the scalar initialization which needs to happen before the code of the transformed scop is executed. Besides increased readability, the IR should not change. Specifically, I do not expect any changes in program semantics due to this patch. llvm-svn: 278125	2016-08-09 15:35:03 +00:00
Tobias Grosser	776700d0b7	[BlockGenerator] Insert initializations at beginning of start block In case some code -- not guarded by control flow -- would be emitted directly in the start block, it may happen that this code would use uninitialized scalar values if the scalar initialization is only emitted at the end of the start block. This is not a problem today in normal Polly, as all statements are emitted in their own basic blocks, but Polly-ACC emits host-to-device copy statements into the start block. Additional Polly-ACC test coverage will be added in subsequent changes that improve the handling of PHI nodes in Polly-ACC. llvm-svn: 278124	2016-08-09 15:34:59 +00:00
Tobias Grosser	c59b3ce044	[BlockGenerator] Also eliminate dead code not originating from BB After having generated the code for a ScopStmt, we run a simple dead-code elimination that drops all instructions that are known to be and remain unused. Until this change, we only considered instructions for dead-code elimination, if they have a corresponding instruction in the original BB that belongs to ScopStmt. However, when generating code we do not only copy code from the BB belonging to a ScopStmt, but also generate code for operands referenced from BB. After this change, we now also consider code for dead code elimination, which does not have a corresponding instruction in BB. This fixes a bug in Polly-ACC where such dead-code referenced CPU code from within a GPU kernel, which is possible as we do not guarantee that all variables that are used in known-dead-code are moved to the GPU. llvm-svn: 278103	2016-08-09 08:59:05 +00:00
Tobias Grosser	cf66ef26f3	[GPGPU] Pass parameters always by using their own type llvm-svn: 278100	2016-08-09 07:22:08 +00:00
Tobias Grosser	124534038a	[GPGPU] Support Values referenced from both isl expr and llvm instructions When adding code that avoids passing values used in isl expressions and LLVM instructions twice, we forgot to make the single variable passed to the kernel available in the ValueMap that makes it usable for instructions that are not replaced with isl ast expressions. This change adds the variable that is passed to the kernel to the ValueMap to ensure it is available for such use cases as well. llvm-svn: 278039	2016-08-08 19:22:19 +00:00
Tobias Grosser	cb1aef8de4	[GPGPU] Create code to verify run-time conditions llvm-svn: 278026	2016-08-08 17:35:55 +00:00
Tobias Grosser	fa9abd1f03	Fix compilation in 'asserts' mode llvm-svn: 278025	2016-08-08 17:35:52 +00:00
Tobias Grosser	0aa29532b7	[IslNodeBuilder] Move run-time check generation to NodeBuilder [NFC] This improves the structure of the code and allows us to reuse the runtime code generation in the PPCGCodeGeneration. llvm-svn: 278017	2016-08-08 15:41:52 +00:00
Tobias Grosser	219feac456	[CodeGeneration] Do not set insert position redundantly There is no need to reset the position of the builder, as we can just continue to insert code at the current position of the IRBuilder, which happens to be precisely the location we reset the builder to. llvm-svn: 278014	2016-08-08 15:25:50 +00:00
Tobias Grosser	000db70754	[IslNodeBuilder] Directly use the insert location of our Builder ... instead of adding instructions at the end of the basic block the builder is currently at. This makes it easier to reason about where IR is generated, as with the IRBuilder there is just a single location that specifies where IR is generated. llvm-svn: 278013	2016-08-08 15:25:46 +00:00
Michael Kruse	fbde435517	[CodeGen] Use MapVector instead of DenseMap. The map is iterated over when generating the values escaping the SCoP. The non-deterministic iteration order of DenseMap causes the output IR to change at every compilation, adding noise to comparisons. Replace DenseMap by a MapVector to ensure the same iteration order at every compilation. llvm-svn: 277832	2016-08-05 16:45:51 +00:00
Tobias Grosser	928d7573dd	GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly Before this commit we generated the array type in reverse order and we also added the outermost dimension size to the new array declaration, which is incorrect as Polly additionally assumed an additional unsized outermost dimension, such that we had an off-by-one error in the linearization of access expressions. llvm-svn: 277802	2016-08-05 08:27:24 +00:00
Tobias Grosser	c1c6a2a61b	GPGPU: Add cuda annotations to specify maximal number of threads per block These annotations ensure that the NVIDIA PTX assembler limits the number of registers used such that we can be certain the resulting kernel can be executed for the number of threads in a thread block that we are planning to use. llvm-svn: 277799	2016-08-05 06:47:43 +00:00
Tobias Grosser	f919d8b360	GPGPU: Support scalars that are mapped to shared memory llvm-svn: 277726	2016-08-04 13:57:29 +00:00
Tobias Grosser	8950cead7f	GPGPU: Disable verbose debug output llvm-svn: 277724	2016-08-04 12:44:03 +00:00
Tobias Grosser	b0dd95bcd2	Remove leftover debug output llvm-svn: 277723	2016-08-04 12:41:28 +00:00
Tobias Grosser	130ca30f92	GPGPU: Add private memory support llvm-svn: 277722	2016-08-04 12:39:03 +00:00
Tobias Grosser	b513b4916b	GPGPU: Add support for shared memory llvm-svn: 277721	2016-08-04 12:18:14 +00:00
Tobias Grosser	00bb5a99f5	GPGPU: Handle scalar array references Pass the content of scalar array references to the alloca on the kernel side and do not pass them additionally as normal LLVM scalar values. llvm-svn: 277699	2016-08-04 06:55:59 +00:00
Tobias Grosser	3216f8546c	BlockGenerator: Assert that we do not get alloca of array access llvm-svn: 277698	2016-08-04 06:55:53 +00:00
Tobias Grosser	576932728d	GPGPU: Pass subtree values correctly to the kernel llvm-svn: 277697	2016-08-04 06:55:49 +00:00
Tobias Grosser	629109b633	GPGPU: Mark kernel functions as polly.skip Otherwise, we would try to re-optimize them with Polly-ACC and possibly even generate kernels that try to offload themselves, which does not work as the GPURuntime is not available on the accelerator and also does not make any sense. llvm-svn: 277589	2016-08-03 12:00:07 +00:00
Tobias Grosser	2219d15748	Fix a couple of spelling mistakes llvm-svn: 277569	2016-08-03 05:28:09 +00:00
Roman Gareev	d7754a1245	Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions Extend the jscop interface to allow the user to export arrays. It is required that already existing arrays of the list of arrays correspond to arrays of the SCoP. Each array that is appended to the list will be newly created. Furthermore, we allow the user to modify access expressions to reference any array in case it has the same element type. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D22828 llvm-svn: 277263	2016-07-30 09:25:51 +00:00
Tobias Grosser	d8b94bcac1	GPGPU: Pass context parameters to GPU kernel llvm-svn: 276963	2016-07-28 06:47:59 +00:00
Tobias Grosser	a490147c90	GPGPU: Pass host iterators to kernel llvm-svn: 276962	2016-07-28 06:47:56 +00:00
Tobias Grosser	44143bb927	GPGPU: use current 'Index' to find slot in parameter array Before this change we used the array index, which would result in us accessing the parameter array out-of-bounds. This bug was visible for test cases where not all arrays in a scop are passed to a given kernel. llvm-svn: 276961	2016-07-28 06:47:53 +00:00
Tobias Grosser	4e18d71c71	GPGPU: Generate kernel parameter allocation with right size Before this change we miscounted the number of function parameters. llvm-svn: 276960	2016-07-28 06:47:50 +00:00
Tobias Grosser	79a947c233	GPGPU: Add basic support for kernel launches llvm-svn: 276863	2016-07-27 13:20:16 +00:00
Tobias Grosser	5779359624	GPGPU: Load GPU kernels We embed the PTX code into the host IR as a global variable and compile it at run-time into a GPU kernel. llvm-svn: 276645	2016-07-25 16:31:21 +00:00
Tobias Grosser	13c78e4d51	GPGPU: Emit data-transfer code Also factor out getArraySize() to avoid code duplication and reorder some function arguments to indicate the direction into which data is transferred. llvm-svn: 276636	2016-07-25 12:47:39 +00:00
Tobias Grosser	7287aeddf1	GPGPU: Complete code to allocate and free device arrays At the beginning of each SCoP, we allocate device arrays for all arrays used on the GPU and we free such arrays after the SCoP has been executed. llvm-svn: 276635	2016-07-25 12:47:33 +00:00
Tobias Grosser	fa7b080218	GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface. There is no need to expose the selected device at the moment. We also pass back pointers as return values, as this simplifies the interface. llvm-svn: 276623	2016-07-25 09:16:01 +00:00
Tobias Grosser	8ed5e5999f	IslNodeBuilder: Make finalize() virtual This allows the finalization routine of the IslNodeBuilder to be overwritten by derived classes. Being here, we also drop the unnecessary 'Scop' postfix and the unnecessary 'Scop' parameter. llvm-svn: 276622	2016-07-25 09:15:57 +00:00
Tobias Grosser	9a18d55947	GPGPU: Optimize kernel IR before generating assembly code We optimize the kernel _after_ dumping the IR we generate to make the IR we dump easier to read and independent of possible changes in the general purpose LLVM optimizers. llvm-svn: 276551	2016-07-24 06:43:21 +00:00
Tobias Grosser	e1a98343a1	GPGPU: Verify kernel IR before generating assembly llvm-svn: 276550	2016-07-24 06:43:17 +00:00
Tobias Grosser	74dc3cb431	GPGPU: Generate PTX assembly code for the kernel modules Run the NVPTX backend over the GPUModule IR and write the resulting assembly code in a string. To work correctly, it is important to invalidate analysis results that still reference the IR in the kernel module. Hence, this change clears all references to dominators, loop info, and scalar evolution. Finally, the NVPTX backend has troubles to generate code for various special floating point types (not surprising), but also for uncommon integer types. This commit does not resolve these issues, but pulls out problematic test cases into separate files to XFAIL them individually and resolve them in future (not immediate) changes one by one. llvm-svn: 276396	2016-07-22 07:11:12 +00:00
Tobias Grosser	edb885cb12	GPGPU: generate code for ScopStatements This change introduces the actual compute code in the GPU kernels. To ensure all values referenced from the statements in the GPU kernel are indeed available we scan all ScopStmts in the GPU kernel for references to llvm::Values that are not yet covered by already modeled outer loop iterators, parameters, or array base pointers and also pass these additional llvm::Values to the GPU kernel. For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which is referenced by the newly generated access functions within the GPU kernel and which is used to help with code generation. llvm-svn: 276270	2016-07-21 13:15:59 +00:00
Tobias Grosser	86083da0ec	IslNodeBuilder: expose addReferencesFromStmt [NFC] This will be used by Polly GPGPU to determine the values that need to be passed to GPU kernels. llvm-svn: 276269	2016-07-21 13:15:55 +00:00
Tobias Grosser	04b909fcca	IslExprBuilder: allow to specify an external isl_id to ScopArrayInfo mapping This is useful for external users using IslExprBuilder, in case they cannot embed ScopArrayInfo data into their isl_ids, because the isl_ids either already carry other information or the isl_ids have been created and their user pointers cannot be updated any more. llvm-svn: 276268	2016-07-21 13:15:51 +00:00
Tobias Grosser	9d12d8ade3	BlockGenerator: remove dead instructions in normal statements This ensures that no trivially dead code is generated. This is not only cleaner, but also avoids troubles in case code is generated in a separate function and some of this dead code contains references to values that are not available. This issue may happen, in case the memory access functions have been updated and old getelementptr instructions remain in the code. With normal Polly, a test case is difficult to draft, but the upcoming GPU code generation can possibly trigger such problems. We will later extend this dead-code elimination to region and vector statements. llvm-svn: 276263	2016-07-21 11:48:36 +00:00
Tobias Grosser	2d58a64e7f	GPGPU: Bail out of scops with hoisted invariant loads This is currently not supported and will only be added later. Also update the test cases to ensure no invariant code hoisting is applied. llvm-svn: 275987	2016-07-19 15:56:25 +00:00
Tobias Grosser	5260c041ea	GPGPU: Emit in-kernel synchronization statements We use this opportunity to further classify the different user statements that can arise and add TODOs for the ones not yet implemented. llvm-svn: 275957	2016-07-19 07:33:16 +00:00
Tobias Grosser	59ab070523	GPGPU: generate control flow within the kernel llvm-svn: 275956	2016-07-19 07:33:11 +00:00
Tobias Grosser	c84a1995fe	GPGPU: add scop parameters to kernel arguments llvm-svn: 275955	2016-07-19 07:33:06 +00:00
Tobias Grosser	f6044bd0ef	GPGPU: add host iterators to kernel arguments llvm-svn: 275954	2016-07-19 07:32:55 +00:00
Tobias Grosser	472f9654c8	GPGPU: add intrinsic functions to obtain a kernel's thread and block ids llvm-svn: 275953	2016-07-19 07:32:44 +00:00
Tobias Grosser	32837fe313	GPGPU: create kernel function skeleton Create for each kernel a separate LLVM-IR module containing a single function marked as kernel function and taking one pointer for each array referenced by this kernel. Add debugging output to verify the kernels are generated correctly. llvm-svn: 275952	2016-07-19 07:32:38 +00:00
Tobias Grosser	b9fc860a57	GPGPU: collect array references Initialize the list of references to a GPU array to ensure that the arrays that need to be passed to kernel calls are computed correctly. Furthermore, the very same information is also necessary to compute synchronization correctly. As the functionality to compute these references is already available, what is left for us to do is only to connect the necessary functionality to compute array reference information. llvm-svn: 275798	2016-07-18 15:44:32 +00:00
Tobias Grosser	1fb9b64dc0	GPGPU: Pull implementation out of class definition This will allow us to see the full class definition even after we add non-trivial implementations of the different member functions. llvm-svn: 275797	2016-07-18 15:44:25 +00:00
Tobias Grosser	38fc0aed08	GPGPU: Create host control flow Create LLVM-IR for all host-side control flow of a given GPU AST. We implement this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The IslNodeBuilder will take care of generating all general-purpose ast nodes, but we provide our own createUser implementation to handle the different GPU specific user statements. For now, we just skip any user statement and only generate a host-code skeleton, but in subsequent commits we will add handling of normal ScopStmt's performing computations, kernel calls, as well as host-device data transfers. We will also introduce run-time check generation and LICM in subsequent commits. llvm-svn: 275783	2016-07-18 11:56:39 +00:00
Tobias Grosser	2025173494	GPGPU: Format statements scheduled on the host ourselves Otherwise ppcg would try to call into pet functionality that is not available, which obviously will cause trouble. As we can easily print these statements ourselves, we just do so. llvm-svn: 275579	2016-07-15 17:12:41 +00:00
Tobias Grosser	2341fe9e76	GPGPU: Use schedule whole components for scheduler This option increases the scalability of the scheduler and allows us to remove the 'gisting' workaround we introduced in r275565 to handle a more complicated test case. Another benefit of using this option is also that the generated code looks a lot more streamlined. Thanks to Sven Verdoolaege for reminding me of this option. llvm-svn: 275573	2016-07-15 16:15:47 +00:00
Tobias Grosser	e4725437e8	GPGPU: Drop domain constraints from flow dependences This works around a shortcoming of the isl scheduler, which even for some smaller test cases does not terminate in case domain constraints are part of the flow dependences. llvm-svn: 275565	2016-07-15 14:43:04 +00:00
Tobias Grosser	6293ba6973	GPGPU: Add memory reference tag ids to tagged accesses It seems we forgot to actually add the memory access ids to the tagged accesses, but instead just tagged the accesses with empty isl_ids. This issue was found by inspection and without code generation it is difficult to test just by itself. We fix it for now without test case and expect our code generation tests to cover this later on. llvm-svn: 275557	2016-07-15 12:44:27 +00:00
Tobias Grosser	2d010daf85	GPGPU: Make sure scops with more than one array work We use this opportunity to add a test case containing a scalar parameter. llvm-svn: 275547	2016-07-15 10:51:14 +00:00
Tobias Grosser	b307ed4d08	GPGPU: Free options to avoid memory leak ppcg does not free the option structs for us. To avoid a memory leak we do this ourselves. llvm-svn: 275546	2016-07-15 10:32:22 +00:00
Tobias Grosser	a56f8f8e58	GPGPU: Shorten ppcg include paths to avoid conflict with cuda.h Instead of directly linking to ppcg's main source directory, we link to the parent directory. This allows us to access ppcg's include files with 'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header. Also drop an include directory that is currently not used. llvm-svn: 275536	2016-07-15 07:50:36 +00:00
Tobias Grosser	60f63b49f2	GPGPU: Model array access information This allows us to derive host-device and device-host data-transfers. llvm-svn: 275535	2016-07-15 07:05:54 +00:00
Tobias Grosser	69b4675180	GPGPU: Generate an AST for the GPU-mapped schedule For this we need to provide an explicit list of statements as they occur in the polly::Scop to ppcg. We also set up basic AST printing facilities to facilitate debugging. To allow code reuse some (minor) changes in ppcg have been necessary. llvm-svn: 275436	2016-07-14 15:51:37 +00:00
Tobias Grosser	60c6002570	GPGPU: Add dummy implementation for ast expression construction Instead of calling to a pet function that does not return anything, we pass our own dummy implementation to ppcg that always returns a nullptr. This ensures that the list of ast expressions always contains a nullptr and we do not accidentally free a random (uninitialized) pointer. This resolves the last valgrind warning we see. We provide an implementation for this function, when the generated AST expressions can be used and consequently can be tested. llvm-svn: 275435	2016-07-14 15:51:32 +00:00
Tobias Grosser	4eaedde530	GPGPU: Use a tile size of 32 by default The tile size was previously uninitialized. As a result, it was often zero (aka. no tiling), which is not what we want in general. More importantly, there was the risk for arbitrary tile sizes to be chosen, which we did not observe, but which still is highly problematic. llvm-svn: 275418	2016-07-14 14:14:02 +00:00
Tobias Grosser	bd81a7eebc	Fix formatting llvm-svn: 275397	2016-07-14 10:53:00 +00:00
Tobias Grosser	aef5196f75	GPGPU: Map initial schedule to GPU schedule This change now applies ppcg's GPU mapping on our initial schedule. For this to work, we need to also initialize the set of all names (isl_ids) used in the scop as well as the program context. llvm-svn: 275396	2016-07-14 10:51:52 +00:00
Tobias Grosser	681bd5688f	GPGPU: Do not dump schedule by default llvm-svn: 275395	2016-07-14 10:51:47 +00:00
Tobias Grosser	f384594d5e	GPGPU: compute new schedule from polly scop To do so we copy the necessary information to compute an initial schedule from polly::Scop to ppcg's scop. Most of the necessary information is directly available and only needs to be passed on to ppcg, with the exception of 'tagged' access relations, access relations that additionally carry information about which memory access an access relation originates from. We could possibly perform the construction of tagged accesses as part of ScopInfo, but as this format is currently specific to ppcg we do not do this yet, but keep this functionality local to our GPU code generation. After the scop has been initialized, we compute data dependences and ask ppcg to compute an initial schedule. Some of this functionality is already available in polly::DependenceInfo and polly::ScheduleOptimizer, but to keep differences to ppcg small we use ppcg's functionality here. We may later investigate if a closer integration of these tools makes sense. llvm-svn: 275390	2016-07-14 10:22:25 +00:00
Tobias Grosser	e938517e37	GPGPU: create default initialized PPCG scop and gpu program At this stage, we do not yet modify the IR but just generate a default initialized ppcg_scop and gpu_prog and free both immediately. Both will later be filled with data from the polly::Scop and are needed to use PPCG for GPU schedule generation. This commit does not yet perform any GPU code generation, but ensures that the basic infrastructure has been put in place. We also add a simple test case to ensure the new code is run and use this opportunity to verify that GPU_CODEGEN tests are only run if GPU code generation has been enabled in cmake. llvm-svn: 275389	2016-07-14 10:22:19 +00:00
Tobias Grosser	9dfe4e7c05	Add accelerator code generation pass skeleton Add a new pass to serve as basis for automatic accelerator mapping in Polly. The pass structure and the analyses preserved are copied from CodeGeneration.cpp, as we will rely on IslNodeBuilder and IslExprBuilder for LLVM-IR code generation. Polly's accelerator code generation is enabled with -polly-target=gpu. I would like to use this commit as an opportunity to thank Yabin Hu for his work in the context of two Google Summer of Code projects during which he implemented initial prototypes of the Polly accelerator code generation -- in parts this code is already available in today's Polly (e.g., tools/GPURuntime). More will come as part of the upcoming Polly ACC changes. Reviewers: Meinersbur Subscribers: pollydev, llvm-commits Differential Revision: http://reviews.llvm.org/D22036 llvm-svn: 275275	2016-07-13 15:54:58 +00:00
Tobias Grosser	faef9a7667	Fix gcc compile failure Commit r275056 introduced a gcc compile failure due to us using two types named 'Type', the first being the newly introduced member variable 'Type' the second being llvm::Type. We resolve this issue by renaming the newly introduced member variable to AccessType. llvm-svn: 275057	2016-07-11 12:27:04 +00:00
Tobias Grosser	4e2d9c45b9	InvariantEquivClassTy: Use struct instead of 4-tuple to increase readability Summary: With a struct we can use named accessors instead of generic std::get<3>() calls. This increases readability of the source code. Reviewers: jdoerfert Subscribers: pollydev, llvm-commits Differential Revision: http://reviews.llvm.org/D21955 llvm-svn: 275056	2016-07-11 12:15:10 +00:00
Justin Bogner	e2467baba8	Update for llvm r274769 llvm-svn: 274777	2016-07-07 18:03:30 +00:00
Tobias Grosser	932ec01328	isl: isl-0.17.1-164-gcbba1b6 This is a regular maintenance update to ensure the latest version of isl is tested. Interesting Changes: - AST nodes and expressions are now printed as YAML llvm-svn: 274614	2016-07-06 09:11:00 +00:00
George Burgess IV	1a046de897	Try to fix polly buildbots. Broken by r274589. llvm-svn: 274595	2016-07-06 02:21:00 +00:00
Tobias Grosser	29a4dd92b7	CodegenCleanup: Drop CFLAA pass from codegen cleanup sequence Since r274197 -polly-position=before-vectorizer caused various LNT failures for example in SingleSource/Benchmarks/Linpack. These failures seem to only occur when the CFLAA pass is scheduled in our codegen-cleanup passes, which suggests that the way we call this AA pass is somehow problematic. As this pass is not of high importance, we drop the pass for now to prevent these failures from happening. At a later point, we might investigate more in-depth why this specific usage scenario caused correctness issues. llvm-svn: 274427	2016-07-02 07:58:13 +00:00
Tobias Grosser	522478d2c0	clang-tidy: Add llvm namespace comments llvm commonly adds a comment to the closing brace of a namespace to indicate which namespace is closed. clang-tidy provides with llvm-namespace-comment a handy tool to check for this habit. We use it to ensure we consistently use namespace comments in Polly. There are slightly different styles in how namespaces are closed in LLVM. As there is no large difference between the different comment styles we go for the style clang-tidy suggests by default. To reproduce this fix run: for i in `ls tools/polly/lib/*/*.cpp`; \ clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \ -header-filter=".*"; \ done This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273621	2016-06-23 22:17:27 +00:00
Tobias Grosser	616449df6d	Add missing copyright header This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273436	2016-06-22 16:29:28 +00:00
Tobias Grosser	8dd653d983	clang-tidy: apply modern-use-nullptr fixes Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null pointers. This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in http://reviews.llvm.org/D21488 and was split out to increase readability. llvm-svn: 273435	2016-06-22 16:22:00 +00:00
Michael Kruse	6b4e928285	Replace ScalarReplAggregatesPass by SROAPass. ScalarReplAggregatesPass was deprecated and replaced by SROAPass. ScalarReplAggregatesPass got finally removed in LLVM commit r272737, hence this patch is also a compile fix. llvm-svn: 272783	2016-06-15 13:21:28 +00:00
Tobias Grosser	43de17872a	Recommit: "Simplify min/max expression generation" As part of this simplification we pull complex logic out of the loop body and skip the previously redundantly executed first loop iteration. This is a partial recommit of r271514 and r271535 which were in conflict with the revert in r272483 and consequently also had to be reverted temporarily. The original patch was contributed by Johannes Doerfert. This patch is mostly a NFC, but dropping the first loop iteration can sometimes result in slightly simpler code. llvm-svn: 272502	2016-06-12 04:49:41 +00:00
Tobias Grosser	3717aa5ddb	This reverts recent expression type changes The recent expression type changes still need more discussion, which will happen on phabricator or on the mailing list. The precise list of commits reverted are: - "Refactor division generation code" - "[NFC] Generate runtime checks after the SCoP" - "[FIX] Determine insertion point during SCEV expansion" - "Look through IntToPtr & PtrToInt instructions" - "Use minimal types for generated expressions" - "Temporarily promote values to i64 again" - "[NFC] Avoid unnecessary comparison for min/max expressions" - "[Polly] Fix -Wunused-variable warnings (NFC)" - "[NFC] Simplify min/max expression generation" - "Simplify the type adjustment in the IslExprBuilder" Some of them are just reverted as we would otherwise get conflicts. I will try to re-commit them if possible. llvm-svn: 272483	2016-06-11 19:17:15 +00:00
Johannes Doerfert	8448071d3e	Refactor division generation code This patch refactors the code generation for divisions. This allows to always generate a shift for a power-of-two division and to utilize information about constant divisors in order to truncate the result type. llvm-svn: 271898	2016-06-06 14:56:17 +00:00
Johannes Doerfert	c0ece9b67e	[NFC] Generate runtime checks after the SCoP We now generate runtime checks __after__ the SCoP code generation and not before, though they are still inserted at the same position in the code. This allows us to modify the runtime check during SCoP code generation. llvm-svn: 271894	2016-06-06 13:32:52 +00:00
Johannes Doerfert	6a6a671c72	[NFC] Simplify code llvm-svn: 271889	2016-06-06 12:13:24 +00:00
Johannes Doerfert	0767a511ba	Use minimal types for generated expressions We now use the minimal necessary bit width for the generated code. If operations might overflow (add/sub/mul) we will try to adjust the types in order to ensure a non-wrapping computation. If the type adjustment is not possible, thus the necessary type is bigger than the type value of --polly-max-expr-bit-width, we will use assumptions to verify the computation will not wrap. However, for run-time checks we cannot build assumptions but instead utilize overflow tracking intrinsics. llvm-svn: 271878	2016-06-06 09:57:41 +00:00
Michael Kruse	5c527f9963	Fix modulo compared to zero. In case of modulo compared to zero, we need to do signed modulo operation as unsigned can give different results based on whether the dividend is negative or not. This addresses llvm.org/PR27707 Contributed-by: Chris Jenneisch <chrisj@codeaurora.org> Reviewers: _jdoerfert, grosser, Meinersbur Differential Revision: http://reviews.llvm.org/D20145 llvm-svn: 271707	2016-06-03 18:51:48 +00:00
Johannes Doerfert	6393ef135c	Temporarily promote values to i64 again Operands of binary operations that might overflow will be temporarily promoted to i64 again, though that is not a sound solution for the problem. llvm-svn: 271538	2016-06-02 17:09:22 +00:00
Johannes Doerfert	4cf79d4ca4	[NFC] Avoid unnecessary comparison for min/max expressions llvm-svn: 271535	2016-06-02 16:58:12 +00:00
Matthew Simpson	acae9e3b30	[Polly] Fix -Wunused-variable warnings (NFC) llvm-svn: 271518	2016-06-02 14:26:38 +00:00
Johannes Doerfert	47f15f6d7e	[NFC] Simplify min/max expression generation llvm-svn: 271514	2016-06-02 11:20:52 +00:00
Johannes Doerfert	d36553753e	Simplify the type adjustment in the IslExprBuilder We now have a simple function to adjust/unify the types of two (or three) operands before an operation that requires the same type for all operands. Due to this change we will not promote parameters that are added to i64 anymore if that is not needed. llvm-svn: 271513	2016-06-02 11:15:57 +00:00
Johannes Doerfert	a91c85a5b9	[FIX] Ensure wrapping checks for unary expressions llvm-svn: 271512	2016-06-02 11:08:43 +00:00
Johannes Doerfert	99191c78c2	Decouple SCoP building logic from pass Created a new pass ScopInfoRegionPass. As name suggests, it is a region pass and it is there to preserve compatibility with our existing Polly passes. ScopInfoRegionPass will return a SCoP object for a valid region while the creation of the SCoP stays in the ScopInfo class. Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in> Reviewed-by: Tobias Grosser <tobias@grosser.es>, Johannes Doerfert <doerfert@cs.uni-saarland.de> Differential Revision: http://reviews.llvm.org/D20770 llvm-svn: 271259	2016-05-31 09:41:04 +00:00
Sanjoy Das	03bcb910de	[Polly] Remove usage of the `apply` function Summary: API-wise `apply` is a somewhat unidiomatic one-off function, and removing the only(?) use in polly will let me remove it from SCEV's exposed interface. Reviewers: jdoerfert, Meinersbur, grosser Subscribers: grosser, mcrosier, pollydev Differential Revision: http://reviews.llvm.org/D20779 llvm-svn: 271177	2016-05-29 07:33:16 +00:00
Michael Kruse	996fb611b3	Remove some unused local variables. NFC. Found by clang static analyzer (http://llvm.org/reports/scan-build/) and Visual Studio. llvm-svn: 270432	2016-05-23 13:00:41 +00:00
Johannes Doerfert	0f0d209bec	Use the SCoP directly for canSynthesize [NFC] llvm-svn: 270429	2016-05-23 12:47:09 +00:00
Johannes Doerfert	ef74443c97	Duplicate part of the Region interface in the Scop class [NFC] This allows to use the SCoP directly for various queries, thus to hide the underlying region more often. llvm-svn: 270426	2016-05-23 12:42:38 +00:00
Johannes Doerfert	952b5304bc	Add and use Scop::contains(Loop/BasicBlock/Instruction) [NFC] llvm-svn: 270424	2016-05-23 12:40:48 +00:00
Johannes Doerfert	3f52e35471	Directly access information through the Scop class [NFC] llvm-svn: 270421	2016-05-23 12:38:05 +00:00
Johannes Doerfert	38a012c46b	Simplify BlockGenerator::handleOutsideUsers interface [NFC] llvm-svn: 270411	2016-05-23 09:14:07 +00:00
Johannes Doerfert	a61eda7698	[FIX] Let ScalarEvolution forget hoisted values We have to rethink the handling of escaping values in order to make this kind of "fixes" go away. llvm-svn: 270409	2016-05-23 09:02:54 +00:00
Johannes Doerfert	404a0f81ea	Check overflows in RTCs and bail accordingly We utilize assumptions on the input to model IR in polyhedral world. To verify these assumptions we version the code and guard it with a runtime-check (RTC). However, since the RTCs are themselves generated from the polyhedral representation we generate them under the same assumptions that they should verify. In other words, the guarantees that we try to provide with the RTCs do not hold for the RTCs themselves. To this end it is necessary to employ a different check for the RTCs that will verify the assumptions did hold for them too. Differential Revision: http://reviews.llvm.org/D20165 llvm-svn: 269299	2016-05-12 15:12:43 +00:00
Johannes Doerfert	e243753a4d	Simplify access relation for invariant loads early [NFC] llvm-svn: 269046	2016-05-10 11:59:59 +00:00
Johannes Doerfert	5f173d414e	Prevent complex access ranges with low number of pieces. Previously we checked the number of pieces to decide whether or not an invariant load was too complex to be generated. However, there are cases when e.g., divisions cause the complexity to spike regardless of the number of pieces. To this end we now check the number of totally involved dimensions which will increase with the number of pieces but also the number of divisions. llvm-svn: 269045	2016-05-10 11:46:57 +00:00
Tobias Grosser	1022ca5646	Codegen: Enable the detection of min/max expressions Min/max expressions are easier to read and can in some cases also result in more concise IR that is generated as the min/max --- when lowered to a cmp+select pattern -- commonly has a simpler condition than the ternary condition isl would normally generate. llvm-svn: 268855	2016-05-07 08:03:44 +00:00
Michael Kruse	bc150127ae	Rename Conjuncts -> Disjunctions. NFC. The check for complexity compares the number of polyhedra in a set, which are combined by disjunctions (union, "OR"), not conjunctions (intersection, "AND"). llvm-svn: 268223	2016-05-02 12:25:18 +00:00
Tobias Grosser	2e27a0f5fd	BlockGenerator: Drop leftover debug statement llvm-svn: 267874	2016-04-28 12:31:05 +00:00
Johannes Doerfert	8ab2803b63	[FIX] Propagate execution domain of invariant loads If the base pointer of an invariant load is loaded conditionally, that condition needs to hold for the invariant load too. The structure of the program will imply this for domain constraints but not for imprecisions in the modeling. To this end we will propagate the execution context of base pointers during code generation and thus ensure the derived pointer does not access an invalid base pointer. llvm-svn: 267707	2016-04-27 12:49:11 +00:00
Johannes Doerfert	517d8d2f94	Check only loop control of loops that are part of the region This also removes a duplicated line of code in the region generator that caused a SPEC benchmark to fail with the new SCoPs. llvm-svn: 267404	2016-04-25 13:37:24 +00:00
Johannes Doerfert	561d36b320	Allow pointer expressions in SCEVs again. In r247147 we disabled pointer expressions because the IslExprBuilder did not fully support them. This patch reintroduces them by simply treating them as integers. The only special handling for pointers that is left detects the comparison of two address_of operands and uses an unsigned compare. llvm-svn: 265894	2016-04-10 09:50:10 +00:00
Johannes Doerfert	3c6a99b818	Add __isl_give annotations to return types [NFC] llvm-svn: 265882	2016-04-09 21:55:23 +00:00
Johannes Doerfert	a9dc529442	Collect and verify generated parallel subfunctions We verify the optimized function now for a long time and it helped to track down bugs early. This will now also happen for all parallel subfunctions we generate. llvm-svn: 265823	2016-04-08 18:16:02 +00:00
Johannes Doerfert	7b81103589	[FIX] Look through div & srem instructions in SCEVs The findValues() function did not look through div & srem instructions that were part of the argument SCEV. However, in different other places we already look through it. This mismatch caused us to preload values in the wrong order. llvm-svn: 265775	2016-04-08 10:25:58 +00:00
Johannes Doerfert	6ba927148d	[FIX] Adjust the insert point for non-affine region PHIs If a non-affine region PHI is generated we should not move the insert point prior to the synthesized value in the same block as we might split that block at the insert point later on. Only if the incoming value should be placed in a different block we should change the insertion point. llvm-svn: 265132	2016-04-01 11:25:47 +00:00
Tobias Grosser	b339594f5d	CodegenCleanup: Drop -load-combine pass This pass is not enabled in the default tool chain and currently can run into an infinite loop, due to other parts of LLVM generating incorrect IR (http://llvm.org/PR27065) -- which is not executed and consequently does not seem to disturb other passes. As this pass is not really needed, we can just drop it to get our build clean. This fixes the timeout issues in MultiSource/Benchmarks/MiBench/consumer-jpeg and MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg for -polly-position=before-vectorizer -polly-process-unprofitable. Unfortunately, we are still left with a miscompile in cjpeg. llvm-svn: 264396	2016-03-25 12:11:06 +00:00
Johannes Doerfert	47197fe3f3	Add namespace for struct [NFC] This will clean up the doxygen documentation. llvm-svn: 264272	2016-03-24 13:20:52 +00:00
Tobias Grosser	bfb6a9683b	Codegen: Do not invalidate dominator tree when bailing out during code generation When codegenerating invariant loads in some rare cases we cannot generate code and bail out. This change ensures that we maintain a valid dominator tree in these situations. This fixes llvm.org/PR26736 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 264142	2016-03-23 06:57:51 +00:00
Michael Kruse	faedfcbf6d	[BlockGenerator] Fix PHI merges for MK_Arrays. Value merging is only necessary for scalars when they are used outside of the scop. While an array's base pointer can be used after the scop, it gets an extra ScopArrayInfo of type MK_Value. We used to generate phi's for both of them, where one was assuming the result of the other phi would be the original value, because it has already been replaced by the previous phi. This resulted in IR that the current IR verifier allows, but is probably illegal. This reduces the number of LNT test-suite fails with -polly-position=before-vectorizer -polly-process-unprofitable from 16 to 10. Also see llvm.org/PR26718. llvm-svn: 262629	2016-03-03 17:20:43 +00:00
Hongbin Zheng	2a798853f8	Allow the client of DependenceInfo to obtain dependences at different granularities. llvm-svn: 262591	2016-03-03 08:15:33 +00:00
Michael Kruse	c7e0d9c216	Fix non-synthesizable loop exit values. Polly recognizes affine loops that ScalarEvolution does not, in particular those with loop conditions that depend on hoisted invariant loads. Check for SCEVAddRec dependencies on such loops and do not consider their exit values as synthesizable because SCEVExpander would generate them as expressions that depend on the original induction variables. These are not available in generated code. llvm-svn: 262404	2016-03-01 21:44:06 +00:00
Johannes Doerfert	066dbf3f8e	Track assumptions and restrictions separately In order to speed up compile time and to avoid random timeouts we now separately track assumptions and restrictions. In this context assumptions describe parameter valuations we need and restrictions describe parameter valuations we do not allow. During AST generation we create a runtime check for both, whereas the one for the restrictions is negated before a conjunction is built. Except the In-Bounds assumptions we currently only track restrictions. Differential Revision: http://reviews.llvm.org/D17247 llvm-svn: 262328	2016-03-01 13:06:28 +00:00
Johannes Doerfert	abadd71da1	[FIX] Prevent compile time problems due to complex invariant loads This cures the symptoms we see in h264 of SPEC2006 but not the cause. llvm-svn: 262327	2016-03-01 13:05:14 +00:00
Tobias Grosser	64ca00c344	IslAst: Expose run-time check generation as individual function This allows to construct run-time checks for a scop without having to generate a full AST. This is currently not taken advantage of in Polly itself, but external users may benefit from this feature. llvm-svn: 262009	2016-02-26 12:59:38 +00:00
Hongbin Zheng	defd098612	Adapt to LLVM head, again llvm-svn: 261905	2016-02-25 17:54:42 +00:00
Hongbin Zheng	566c614525	Revert "Adapt to LLVM head. NFC" This reverts commit 4d3753b9646a69c00d234ccd6e91dc3d0ea5d643. llvm-svn: 261892	2016-02-25 16:46:17 +00:00
Hongbin Zheng	f4e35f9cb9	Adapt to LLVM head. NFC llvm-svn: 261886	2016-02-25 16:36:09 +00:00
Michael Kruse	8f25b0cb4d	Use inline local variable declaration. NFC. llvm-svn: 261876	2016-02-25 15:52:43 +00:00
Johannes Doerfert	a792098047	Support calls with known ModRef function behaviour Check the ModRefBehaviour of functions in order to decide whether or not a call instruction might be acceptable. Differential Revision: http://reviews.llvm.org/D5227 llvm-svn: 261866	2016-02-25 14:08:48 +00:00
Michael Kruse	f33c125dd2	Fix DomTree preservation for generated subregions. The generated dedicated subregion exit block was assumed to have the same dominance relation as the original exit block. This is incorrect if the exit block receives other edges than only from the subregion, which results in that e.g. the subregion's entry block does not dominate the exit block. llvm-svn: 261865	2016-02-25 14:08:48 +00:00
Michael Kruse	375cb5fe0a	Introduce ScopStmt::getEntryBlock(). NFC. This replaces an ugly inline ternary operator pattern. llvm-svn: 261792	2016-02-24 22:08:24 +00:00
Michael Kruse	6f7721f02b	Introduce Scop::getStmtFor. NFC. Replace Scop::getStmtForBasicBlock and Scop::getStmtForRegionNode, and add overloads for llvm::Instruction and llvm::RegionNode. getStmtFor and overloads become the common interface to get the Stmt that contains something. Named after LoopInfo::getLoopFor and RegionInfo::getRegionFor. llvm-svn: 261791	2016-02-24 22:08:19 +00:00
Michael Kruse	eac9726e8c	Add assertions checking def dominates use. NFC. This is also caught by the function verifier, but disconnected from the place that produced it. Catch it already at creation to be able to reason more directly about the cause. llvm-svn: 261790	2016-02-24 22:08:14 +00:00
Roman Gareev	11001e1534	Annotation of SIMD loops Use 'mark' nodes to annotate a SIMD loop during ScheduleTransformation and skip parallelism checks. The buildbot shows the following compile/execution time changes: Compile time: Improvements Δ Previous Current σ …/gesummv -6.06% 0.2640 0.2480 0.0055 …/gemver -4.46% 0.4480 0.4280 0.0044 …/covariance -4.31% 0.8360 0.8000 0.0065 …/adi -3.23% 0.9920 0.9600 0.0065 …/doitgen -2.53% 0.9480 0.9240 0.0090 …/3mm -2.33% 1.0320 1.0080 0.0087 Execution time: Regressions Δ Previous Current σ …/viterbi 1.70% 5.1840 5.2720 0.0074 …/smallpt 1.06% 12.4920 12.6240 0.0040 Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D14491 llvm-svn: 261620	2016-02-23 09:00:13 +00:00
Tobias Grosser	820cf20a98	IslAst: Expose IslAst class in header file [NFC] This allows other passes and transformations to use some of the existing AST building infrastructure. This is not yet used in Polly itself. llvm-svn: 261496	2016-02-21 20:01:28 +00:00
Tobias Grosser	2b809d1390	BlockGenerator: Drop unnecessary return value llvm-svn: 261473	2016-02-21 15:44:34 +00:00
Tobias Grosser	58e585444a	Codegen: Print error in Polly code verification and allow to disable verification. We now always print the reason why the code did not pass the LLVM verifier and we also allow to disable verification with -polly-codegen-verify=false. Before this change the first assertion had generally no information why or what might have gone wrong and it was also impossible to -view-cfg without recompile. This change makes debugging bugs that result in incorrect IR a lot easier. llvm-svn: 261320	2016-02-19 11:07:12 +00:00
Hongbin Zheng	8831eb7db4	[Refactor] Move isl_ctx into Scop. After we moved isl_ctx into Scop, we need to free the isl_ctx after freeing all isl objects, which requires the ScopInfo pass to be freed at last. But this is not guaranteed by the PassManager, and we need extra code to free the isl_ctx at the right time. We introduced a shared pointer to manage the isl_ctx, and distribute it to all analyses that create isl objects. As such, whenever we free an analysis with the shared_ptr (and also free the isl objects which are created by the analysis), we decrease the (shared) reference counter of the shared_ptr by 1. Whenever the reference counter reaches 0 in the releaseMemory function of an analysis, that analysis will be the last one that holds any isl objects, and we can safely free the isl_ctx with that analysis. Differential Revision: http://reviews.llvm.org/D17241 llvm-svn: 261100	2016-02-17 15:49:21 +00:00

816 Commits