llvm-project

Commit Graph

Author	SHA1	Message	Date
Siddharth Bhat	1fc7b76a2b	[NFC] [PPCGCodeGeneration] Add test for simple invariant load hoisting. - This already works, but add this to ensure that there are no regressions when I expand the invariant load hoisting ability of `PPCGCodeGeneration`. llvm-svn: 307398	2017-07-07 13:44:22 +00:00
Tobias Grosser	41f02a9960	Make create_ll work with latest LLVM [NFC] - Instead of running with -O0, we enable the highest optimization level, but then disable optimizations. This ensures that possibly important metadata is still emitted. - Update the code for attribute removal to work with latest LLVM - Do not cut an arbitrary number of lines from the LL file. It is undocumented why this was needed in the first place, and such a feature is likely to break with trivial IR changes that may come in the future. llvm-svn: 307355	2017-07-07 04:20:55 +00:00
Siddharth Bhat	761e5b9310	[Polly] [PPCGCodeGeneration] Teach `must_kills` to kill scalars that are local to the scop. - By definition, we can pass something as a `kill` to PPCG if we know that no data can flow across a kill. - This is useful for more complex examples where we have scalars that are local to a scop. - If the local is only used within a scop, we are free to kill it. Differential Revision: https://reviews.llvm.org/D35045 llvm-svn: 307260	2017-07-06 13:42:42 +00:00
Singapuram Sanjay Srivallabh	79f13b9a80	Prefix the name of the calling host function in the name of callee GPU kernel Summary: Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`. Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC. Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton! Reviewed By: grosser Subscribers: nemanjai, pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D33985 llvm-svn: 307173	2017-07-05 16:48:21 +00:00
Siddharth Bhat	de0a534c75	[NFC] Fix breaking build by adding REQUIRES: pollyacc llvm-svn: 307165	2017-07-05 15:20:28 +00:00
Siddharth Bhat	a82f2d264a	[PPCGCodeGeneration] Teach Polly to start using live range reordering. Polly did not use PPCG's live range reordering feature. Teach PPCGCodeGeneration to use this. Documentation on this is sparse, so much of the code is conservative. We currently kill all phi nodes in a Scop by appending them to the must_kill map we pass to PPCG. I do not have a proof of correctness, but it seems to be intuitively correct. We also do not handle `array_order`, which, quoting PPCG, is: PPCG/gpu.h: "Order dependences on non-scalars." It seems to consist of RAW dependences between arrays. We need to pass this information for more complex privatization cases. Differential Revision: https://reviews.llvm.org/D34941 llvm-svn: 307163	2017-07-05 14:57:04 +00:00
Tobias Grosser	5e41458985	Bump isl to isl-0.18-768-g033b61ae Summary: This is a general maintenance update Reviewers: grosser Subscribers: srhines, fedor.sergeev, pollydev, llvm-commits Contributed-by: Maximilian Falkenstein <falkensm@student.ethz.ch> Differential Revision: https://reviews.llvm.org/D34903 llvm-svn: 307090	2017-07-04 15:54:11 +00:00
Michael Kruse	b738ffa845	Heap allocation for new arrays. This patch aims to implement the option of allocating new arrays created by Polly on the heap instead of the stack. To enable this option, a key named 'allocation' must be written in the imported json file with the value 'heap'. We need such a feature because in a next iteration, we will implement a mechanism of maximal static expansion which will need a way to allocate arrays on heap. Indeed, the expansion is very costly in terms of memory and doing the allocation on stack is not worth considering. The malloc and the free are added respectively at polly.start and polly.exiting such that there is no use-after-free (for instance in case of Scop in a loop) and such that all memory cells allocated with a malloc are free'd when we don't need them anymore. We also add: - In the class ScopArrayInfo, we add a boolean as member called IsOnHeap which represents whether the array is allocated on heap or not. - A new branch in the method allocateNewArrays in the IslNodeBuilder for the case of heap allocation. allocateNewArrays now takes a BBPair containing polly.start and polly.exiting. allocateNewArrays takes these two blocks and adds the malloc and free calls respectively to polly.start and polly.exiting. - As IntPtrTy for the malloc call, we use the DataLayout one. To do that, we have modified: - createScopArrayInfo and getOrCreateScopArrayInfo such that they return a non-const SAI, in order to be able to call setIsOnHeap in the JSONImporter. - executeScopConditionally such that it returns both the start block and the end block of the scop, because we need these two blocks to be able to add the malloc and the free calls at the right position. Differential Revision: https://reviews.llvm.org/D33688 llvm-svn: 306540	2017-06-28 13:02:43 +00:00
Andreas Simbuerger	6d08ec7233	[JSONImport] Check if the size of an imported array is positive llvm-svn: 306479	2017-06-27 22:30:44 +00:00
Andreas Simbuerger	4e6eed8566	[FIX] Add %loadPolly to test This test fails if Polly is not linked into LLVM's tools. Our lit site-config already deals with this by not adding the -load option if Polly is linked into LLVM's tools. llvm-svn: 306395	2017-06-27 10:47:55 +00:00
Siddharth Bhat	65d7f72f2c	[PPCGCodeGeneration] Add flag to allow Polly to fail if GPU kernel generation fails. - This is useful for debugging GPU code. llvm-svn: 306290	2017-06-26 14:56:56 +00:00
Siddharth Bhat	f291c8d510	[PPCGCodeGeneration] Allow intrinsics within kernels. - In D33414, if any function call was found within a kernel, we would bail out. - This is an over-approximation. This patch changes this by allowing the `llvm.sqrt.*` family of intrinsics. - This introduces an additional step when creating a separate llvm::Module for a kernel (GPUModule). We now copy function declarations from the original module to the new module. - We also populate IslNodeBuilder::ValueMap so that it replaces references to functions in the old module with the corresponding ones in the new module (GPUModule). Differential Revision: https://reviews.llvm.org/D34145 llvm-svn: 306284	2017-06-26 13:12:06 +00:00
Tobias Grosser	2927cb7520	[tests] Add forgotten pollyacc REQUIRES line llvm-svn: 306273	2017-06-26 06:07:40 +00:00
Siddharth Bhat	a12f807f33	[PPCGCodeGeneration] Enable GPU code generation with invariant loads. The condition that disallowed code generation in PPCGCodeGeneration with invariant loads is not required. I haven't been able to construct a counterexample where this generates invalid code. Differential Revision: https://reviews.llvm.org/D34604 llvm-svn: 306245	2017-06-25 14:48:24 +00:00
Tobias Grosser	1b9d1bcc6d	[ScopInfo] Bound the number of array disjuncts in run-time bounds checks This reduces the compilation time of one reduced test case from Android from 16 seconds to 100 milliseconds (we bail out), without negatively impacting any other test case we currently have. We still occasionally saw compilation timeouts on the AOSP buildbot. Hopefully, those will go away with this change. llvm-svn: 306235	2017-06-25 06:32:00 +00:00
Roman Gareev	c4a4d04717	[FIX] A small addition to r305675. llvm-svn: 306234	2017-06-25 06:30:11 +00:00
Eli Friedman	5e589ea4b1	[ScopInfo] Fix crash with sum of invariant load and AddRec. r303971 added an assertion that SCEV addition involving an AddRec and a SCEVUnknown must involve a dominance relation: either the SCEVUnknown value dominates the AddRec's loop, or the AddRec's loop header dominates the SCEVUnknown. This is generally fine for most usage of SCEV because it isn't possible to write an expression in IR which would violate it, but it's a bit inconvenient here for polly. To solve the issue, just avoid creating a SCEV expression which triggers the assertion. I'm not really happy with this solution, but I don't have any better ideas. Fixes https://bugs.llvm.org/show_bug.cgi?id=33464. Differential Revision: https://reviews.llvm.org/D34259 llvm-svn: 305864	2017-06-20 22:53:02 +00:00
Michael Kruse	214deb7960	[CodeGen] Emit aliasing metadata for new arrays. Ensure that all array base pointers are assigned before generating aliasing metadata by allocating new arrays beforehand. Before this patch, getBasePtr() returned nullptr for new arrays because the arrays were created at a later point. Once the base pointers of the created arrays had been assigned and the loads/stores were generated, nullptr no longer matched any array. llvm-svn: 305675	2017-06-19 10:19:29 +00:00
Eli Friedman	127e0cd21b	Don't check side effects for functions outside of SCoP In r304074 we introduced a patch to accept results from side effect free functions into SCEV modeling. This causes rejection of cases where the call is happening outside the SCoP. This patch checks if the call is outside the Region and treats the results as a parameter (SCEVType::PARAM) to the SCoP instead of returning SCEVType::INVALID. Patch by Sameer Abu Asal. llvm-svn: 305423	2017-06-14 22:43:28 +00:00
Siddharth Bhat	bccaea57c0	[Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers. In `PPCGCodeGeneration`, we try to take the references of every `Value` that is used within a Scop to offload to the kernel. This occurs in `GPUNodeBuilder::createLaunchParameters`. This breaks if one of the values is a function pointer, since one of these cases will trigger: 1. We try to take the reference of an intrinsic function, and this breaks at `verifyModule`, since it is illegal to take the reference of an intrinsic. 2. We manage to take the reference to a function, but this fails at `verifyModule` since the function will not be present in the module that is created in the kernel. 3. Even if `verifyModule` succeeds (which should not occur), we would then try to call a host function from the device, which is illegal runtime behaviour. So, we disable this entire range of possibilities by simply not allowing function references within a `Scop` which corresponds to a kernel. However, note that this is too conservative. We can allow intrinsics within kernels if the backend can lower the intrinsic correctly. For example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX` backend. We will now gradually whitelist intrinsics which are known to be safe. Differential Revision: https://reviews.llvm.org/D33414 llvm-svn: 305185	2017-06-12 11:41:09 +00:00
Siddharth Bhat	286c916dde	[Polly] [ScopDetection] Allow passing multiple functions to `-polly-only-func`. - This is useful to run optimisations on only certain functions. Differential Revision: https://reviews.llvm.org/D33990 llvm-svn: 305060	2017-06-09 08:23:40 +00:00
Michael Kruse	ad7a1805be	[Simplify] Use execution order of memory accesses. Iterate through memory accesses in execution order (first all implicit reads, then explicit accesses, then implicit writes). In the test case, the previous iteration order caused an implicit load to be handled as if it was loaded after the write. That is, the value was treated as written before it was available. This fixes llvm.org/PR33323 llvm-svn: 304810	2017-06-06 17:46:42 +00:00
Tobias Grosser	deefbced96	[Polly] [BlockGen] Support partial writes in regions Summary: The RegionGenerator traditionally kept a BlockMap that mapped from original basic blocks to newly generated basic blocks. With the introduction of partial writes such a 1:1 mapping is not possible any more, as a single basic block can be code generated into multiple basic blocks. Hence, depending on the use case we need to either use the first basic block or the last basic block. This is intended to address the last four cases of incorrect code generation in our AOSP buildbot and hopefully should turn it green. Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg Reviewed By: Meinersbur Subscribers: pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D33767 llvm-svn: 304808	2017-06-06 17:17:30 +00:00
Tobias Grosser	22be8a18f3	Add test coverage for regions with non-affine loops This adds test coverage for regions with non-affine loops, which we unfortunately missed when committing this feature years ago. We will add more test coverage over time. llvm-svn: 304672	2017-06-03 23:39:02 +00:00
Siddharth Bhat	726c28f8c4	[CodeGen] Track trip counts per-scop for performance measurement. - Add a counter that is incremented once on exit from a scop. - Test cases got split into two: one to test the cycles, and another one to test trip counts. - Sample output: ```name=sample-output.txt scop function, entry block name, exit block name, total time, trip count warmup, %entry.split, %polly.merge_new_and_old, 5180, 1 f, %entry.split, %polly.merge_new_and_old, 409944, 500 g, %entry.split, %polly.merge_new_and_old, 1226, 1 ``` Differential Revision: https://reviews.llvm.org/D33822 llvm-svn: 304543	2017-06-02 11:36:52 +00:00
Siddharth Bhat	a4dea6bb05	[CodeGen] Print performance counter information in CSV. This ensures that tools can parse performance information which Polly generates easily. - Sample output: ```name=out.csv scop function, entry block name, exit block name, total time warmup, %entry.split, %polly.merge_new_and_old, 1960 f, %entry.split, %polly.merge_new_and_old, 1238 g, %entry.split, %polly.merge_new_and_old, 1218 ``` - Example code to parse output: ```lang=python, name=example-parse.py import asciitable import sys table = asciitable.read('out.csv', delimiter=',') asciitable.write(table, sys.stdout, delimiter=',') ``` llvm-svn: 304533	2017-06-02 09:20:02 +00:00
Siddharth Bhat	07bee290de	[CodeGen] Extend Performance Counter to track per-scop information. Previously, we would generate one performance counter for all scops. Now, we generate both the old information, as well as a per-scop performance counter to generate finer grained information. This patch needed a way to generate a unique name for a `Scop`. The start region, end region, and function name combined provide a unique `Scop` name. So, `Scop` has a new public API to provide its start and end region names. Differential Revision: https://reviews.llvm.org/D33723 llvm-svn: 304528	2017-06-02 08:01:22 +00:00
Michael Kruse	678aa336fa	[ScopBuilder] Exclude ignored intrinsics from explicit instruction list. Ignored intrinsics are ignored at code generation, therefore do not need to be part of the instruction list. Specifically, llvm.lifetime.* intrinsics are removed before code generation; referencing them would cause a use-after-free error. Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in> Differential Revision: https://reviews.llvm.org/D33768 llvm-svn: 304483	2017-06-01 21:46:27 +00:00
Tobias Grosser	f51decb5fe	[BlockGenerator] Take context into account when identifying partial writes A partial write is a write where the domain of the values written is a subset of the execution domain of the parent statement containing the write. Originally, we directly checked this subset relation whereas it is indeed only important that the subset relation holds for the parameter values that are known to be valid in the execution context of the scop. We update our check to avoid the unnecessary introduction of partial writes in situations where the write appears to be partial without context information, but where context information allows us to understand that a full write can be generated. This change fixes (hides) a recent regression introduced in r303517, which broke our AOSP builds. The part that is correctly fixed in this change is that we no longer unnecessarily generate a partial write. This is good performance wise and, as we currently do not yet explicitly introduce partial writes in the default configuration, this also hides possible bugs in the partial writes implementation. The crashes that we originally saw were caused by such a bug, where partial writes were incorrectly generated in region statements. An additional patch in a subsequent commit is needed to address this problem. Reported-by: Eli Friedman <efriedma@codeaurora.org> Differential Revision: https://reviews.llvm.org/D33759 llvm-svn: 304398	2017-06-01 09:34:20 +00:00
Tobias Grosser	5863087ad3	[test] Add a short explanation to test llvm-svn: 304279	2017-05-31 05:03:11 +00:00
Michael Kruse	ed0c2f7e90	[ScopInfo] Do not add terminator & synthesizable instructions to the output instructions. Such instructions are generated on-demand by the CodeGenerator and thus do not need representation in a statement. Differential Revision: https://reviews.llvm.org/D33642 llvm-svn: 304151	2017-05-29 12:27:38 +00:00
Tobias Grosser	1e55db30d5	Delinearize memory accesses that reference parameters coming from function calls Certain affine memory accesses which we model today might contain products of parameters which we might combine into a new parameter to be able to create an affine expression that represents these memory accesses. Especially in the context of OpenCL, this approach loses information as memory accesses such as A[get_global_id(0) * N + get_global_id(1)] are assumed to be linear. We correctly recover their multi-dimensional structure by assuming that parameters that are the result of a function call at IR level likely are not parameters, but indeed induction variables. The resulting access is now A[get_global_id(0)][get_global_id(1)] for an array A[][N]. llvm-svn: 304075	2017-05-27 15:18:53 +00:00
Tobias Grosser	f5e7e60bc8	Allow side-effect free function calls in valid affine SCEVs Side-effect free function calls with only constant parameters can be easily re-generated and consequently do not prevent us from modeling a SCEV. This change allows array subscripts to reference function calls such as 'get_global_id()' as used in OpenCL. We use the function name plus the constant operands to name the parameter. This is possible as the function name is required and is not dropped in release builds the same way names of llvm::Values are dropped. We also provide more readable names for common OpenCL functions, to make it easy to understand the polyhedral model we generate. llvm-svn: 304074	2017-05-27 15:18:46 +00:00
Tobias Grosser	7aa22859b6	Update some tests to changes in isl's internal representation This was forgotten as part of r304069. llvm-svn: 304070	2017-05-27 11:33:05 +00:00
Tobias Grosser	d5fcbef8ee	[Polly] Added the list of Instructions to output in ScopInfo pass Summary: This patch outputs the list of all instructions in BlockStmts. Reviewers: Meinersbur, grosser, bollu Subscribers: bollu, llvm-commits, pollydev Differential Revision: https://reviews.llvm.org/D33163 llvm-svn: 304062	2017-05-27 04:40:18 +00:00
Philip Pfaffe	1a0128faaa	[Polly] Add handling of Top Level Regions Summary: My goal is to make the newly added `AllowWholeFunctions` option more usable/powerful. The changes to ScopBuilder.cpp are exclusively checks to prevent `Region.getExit()` from being dereferenced, since Top Level Regions (TLRs) don't have an exit block. In ScopDetection's `isValidCFG`, I removed a check that disallowed ReturnInstructions to have return values. This might of course have been intentional, so I would welcome your feedback on this and maybe a small explanation why return values are forbidden. Maybe it can be done but needs more changes elsewhere? The remaining changes in ScopDetection are simply to consider the AllowWholeFunctions option in more places, i.e. allow TLRs when it is set and once again avoid dereferencing `getExit()` if it doesn't exist. Finally, in ScopHelper.cpp I extended `polly::isErrorBlock` to handle regions without exit blocks as well: The original check was if a given BasicBlock dominates all predecessors of the exit block. Therefore I do the same for TLRs by regarding all BasicBlocks terminating with a ReturnInst as predecessors of a "virtual" function exit block. Patch by: Lukas Boehm Reviewers: philip.pfaffe, grosser, Meinersbur Reviewed By: grosser Subscribers: pollydev, llvm-commits, bollu Tags: #polly Differential Revision: https://reviews.llvm.org/D33411 llvm-svn: 303790	2017-05-24 18:39:39 +00:00
Michael Kruse	5f16986271	[DeLICM] Partial writes for PHIs. Enable the use of partial writes for PHI write accesses with a switch. This simply skips the test for whether a PHI write would be partial. The analogous test for partial value writes also protects against partial reads, which we do not support (yet). It is possible to test for partial reads separately such that we could skip the partial write check as well. In case this turns out to be useful, I can implement it as well. Differential Revision: https://reviews.llvm.org/D33487 llvm-svn: 303762	2017-05-24 15:23:06 +00:00
Michael Kruse	cb58bd6ccd	[JSONImporter] misses checks whether the data it imports makes sense. Without this patch, the JSONImporter did not verify if the data it loads were correct or not (Bug llvm.org/PR32543). I added some checks in the JSONImporter class and some test cases. Here are the checks (and test cases) I added: JSONImporter::importContext - The "context" key does not exist. - The context was not parsed successfully by ISL. - The isl_set has the wrong number of parameters. - The isl_set is not a parameter set. JSONImporter::importSchedule - The "statements" key does not exist. - There is not the right number of statements in the file. - The "schedule" key does not exist. - The schedule was not parsed successfully by ISL. JSONImporter::importAccesses - The "statements" key does not exist. - There is not the right number of statements in the file. - The "accesses" key does not exist. - There is not the right number of memory accesses in the file. - The "relation" key does not exist. - The memory access was not parsed successfully by ISL. JSONImporter::areArraysEqual - The "type" key does not exist. - The "sizes" key does not exist. - The "name" key does not exist. JSONImporter::importArrays /!\ Do not check if there is a key named "arrays" because it is not considered as an error. All checks are already in place or implemented in JSONImporter::areArraysEqual. Contributed-by: Nicolas Bonfante <nicolas.bonfante@insa-lyon.fr> Differential Revision: https://reviews.llvm.org/D32739 llvm-svn: 303759	2017-05-24 15:09:35 +00:00
Tobias Grosser	6d459c5d3d	[ScopInfo] Simplify domains early This speeds up scop modeling for scops with many redundant existentially quantified constraints. For the attached test case, this change reduces scop modeling time from minutes (hours?) to 0.15 seconds. This change resolves a compilation timeout on the AOSP build. Thanks Eli for reporting _and_ reducing the test case! Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 303600	2017-05-23 04:26:28 +00:00
Michael Kruse	1aad76c18f	[CodeGen] Add invalidation of the loop SCEVs after merge block generation. The SCEVs of loops surrounding the escape users of a merge block are forgotten, so that loop trip counts based on old values can be revoked. This fixes llvm.org/PR32536 Contributed-by: Baranidharan Mohan <mbdharan@gmail.com> Differential Revision: https://reviews.llvm.org/D33195 llvm-svn: 303561	2017-05-22 15:36:53 +00:00
Michael Kruse	706f79ab14	[CodeGen] Support partial write accesses. Allow the BlockGenerator to generate memory writes that are not defined over the complete statement domain, but only over a subset of it. It generates a condition that evaluates to 1 if executing the subdomain, and only then executes the access. Only write accesses are supported. Read accesses would require a PHINode which has a value if the access is not executed. Partial writes make DeLICM able to apply mappings that are not defined over the entire domain (for instance, a branch that leaves a loop with a PHINode in its header; a MemoryKind::PHI write when leaving is never read by its PHI read). Differential Revision: https://reviews.llvm.org/D33255 llvm-svn: 303517	2017-05-21 22:46:57 +00:00
Tobias Grosser	ee61ebb134	Fix buildbots after r303429 A test case with a GPU runline was added without setting 'REQUIRES: pollyacc'. We drop the GPU run line, as the basic functionality can already be tested with the normal code generation. llvm-svn: 303485	2017-05-20 04:22:26 +00:00
Siddharth Bhat	b7f68b8c9e	[Fortran Support] Materialize outermost dimension for Fortran array. - We use the outermost dimension of arrays since we need this information to generate GPU transfers. - In general, if we do not know the outermost dimension of the array (because the indexing expression is non-affine, for example) then we simply cannot generate transfer code. - However, for Fortran arrays, we can use the Fortran array representation which stores the dimensions of all arrays. - This patch uses the Fortran array representation to generate code that computes the outermost dimension size. Differential Revision: https://reviews.llvm.org/D32967 llvm-svn: 303429	2017-05-19 15:07:45 +00:00
Tobias Grosser	d8945baa0a	[ScopDetection] Allow detection of full functions This is useful when only analyzing functions. llvm-svn: 303420	2017-05-19 12:13:02 +00:00
Tobias Grosser	45e9fd1810	[ScopInfo] Gracefully handle long compile times The following test case tried to compute the lexicographic minimum of the following set during alias analysis, which caused very long compile time: [p_0, p_1, p_2, p_3, p_4, p_5] -> { MemRef0[i0] : (517p_3 >= 70944 - 298p_2 and 256i0 >= -71199 + 298p_2 + 517p_3 and 256i0 <= -70944 + 298p_2 + 517p_3) or (409p_4 >= 57120 - 298p_2 and 256i0 >= -57375 + 298p_2 + 409p_4 and 256i0 <= -57120 + 298p_2 + 409p_4) or (104p_4 >= 17329 + 149p_2 - 50p_3 and 128i0 >= 17328 + 149p_2 - 50p_3 - 104p_4 and 128i0 <= 17455 + 149p_2 - 50p_3 - 104p_4) or (104p_4 <= 17328 + 149p_2 - 50p_3 and 128i0 >= 17201 + 149p_2 - 50p_3 - 104p_4 and 128i0 <= 17328 + 149p_2 - 50p_3 - 104p_4) or (409p_4 <= 57119 - 298p_2 and 256i0 >= -57120 + 298p_2 + 409p_4 and 256i0 <= -56865 + 298p_2 + 409p_4) or (517p_3 <= 70943 - 298p_2 and 256i0 >= -70944 + 298p_2 + 517p_3 and 256i0 <= -70689 + 298p_2 + 517p_3) or (p_1 >= 2 + 2p_0 and 298p_5 >= 70944 - 517p_3 and 256i0 >= -71199 + 517p_3 + 298p_5 and 256i0 <= -70944 + 517p_3 + 298p_5) or (p_1 >= 2 + 2p_0 and 298p_5 >= 57120 - 409p_4 and 256i0 >= -57375 + 409p_4 + 298p_5 and 256i0 <= -57120 + 409p_4 + 298p_5) or (p_1 >= 2 + 2p_0 and 149p_5 <= -17329 + 50p_3 + 104p_4 and 128i0 >= 17328 - 50p_3 - 104p_4 + 149p_5 and 128i0 <= 17455 - 50p_3 - 104p_4 + 149p_5) or (p_1 >= 2 + 2p_0 and 149p_5 >= -17328 + 50p_3 + 104p_4 and 128i0 >= 17201 - 50p_3 - 104p_4 + 149p_5 and 128i0 <= 17328 - 50p_3 - 104p_4 + 149p_5) or (p_1 >= 2 + 2p_0 and 298p_5 <= 57119 - 409p_4 and 256i0 >= -57120 + 409p_4 + 298p_5 and 256i0 <= -56865 + 409p_4 + 298p_5) or (p_1 >= 2 + 2p_0 and 298p_5 <= 70943 - 517p_3 and 256i0 >= -70944 + 517p_3 + 298p_5 and 256i0 <= -70689 + 517p_3 + 298p_5) } We now guard the potentially expensive functions in Polly's scop analysis to gracefully bail out in case of overly long compilation times. llvm-svn: 303404	2017-05-19 03:45:00 +00:00
Siddharth Bhat	06e3c74d83	[Fortran Support] Change "global" pattern match to work for params Summary: - Rename global / local naming convention that did not make much sense to Visible / Invisible, where the visible refers to whether the ALLOCATE call to the Fortran array is present in the current module or not. - This match now works on both cross fortran module globals and on parameters to functions since neither of them are necessarily allocated at the point of their usage. - Add testcase that matches against both a load and a store against function parameters. Differential Revision: https://reviews.llvm.org/D33190 llvm-svn: 303356	2017-05-18 16:47:13 +00:00
Philip Pfaffe	3030bf0c81	[Polly][Fortran Support] Fix two testcases for the loadable-library use-case llvm-svn: 303057	2017-05-15 12:58:31 +00:00
Siddharth Bhat	0fe7231a2f	[Fortran Support] Add pattern match for Fortran Arrays that are parameters. - This breaks the previous assumption that Fortran Arrays are `GlobalValue`. - The names of functions were getting unwieldy. So, I renamed the Fortran related functions. Differential Revision: https://reviews.llvm.org/D33075 llvm-svn: 303040	2017-05-15 08:41:30 +00:00
Tobias Grosser	b693f42b71	[Polly] Fix code generation of llvm.expect intrinsic At the time of code generation, an instruction with an llvm intrinsic is ignored in copyBB. However, if the value of the instruction is used later in the program, the value needs to be synthesized. This was causing some issues with the instructions being generated in a hoisted basic block. Removing llvm.expect from the list of ignored intrinsics fixes this bug. This resolves http://llvm.org/PR32324. Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in> Tags: #polly Differential Revision: https://reviews.llvm.org/D32992 llvm-svn: 303006	2017-05-14 09:09:54 +00:00
Michael Kruse	fa7be88378	[Simplify] Remove identical write removal. NFC. Removal of overwritten writes currently encompasses all the cases of the identical write removal. There is an observable behavioral change in that the last, instead of the first, MemoryAccess is kept. This should not affect the generated code, however. Differential Revision: https://reviews.llvm.org/D33143 llvm-svn: 302987	2017-05-13 12:20:57 +00:00
Michael Kruse	f263610b82	[Simplify] Remove writes that are overwritten. Remove memory writes that are overwritten by later writes. This works for StoreInsts (e.g. `store double 21.0, double* %A` followed by `store double 42.0, double* %A`), scalar writes at the end of a statement, and mixes of these. Multiple writes can be the result of DeLICM, which might map multiple writes to the same location when it knows that these do not conflict (for instance because they write the same value). Such writes interfere with pattern-matched optimization such as gemm and may not get removed by other LLVM passes after code generation. Differential Revision: https://reviews.llvm.org/D33142 llvm-svn: 302986	2017-05-13 11:49:34 +00:00
Siddharth Bhat	d0d29addf9	[NFC] [Fortran Support] Run -instnamer on testcases llvm-svn: 302892	2017-05-12 12:36:04 +00:00
Siddharth Bhat	f16db04cd5	[FIX] Fix regression caused by `c29f4ed`, testcase matches output - Commit changed codegen for induction variables - Updated testcase llvm-svn: 302891	2017-05-12 11:34:51 +00:00
Siddharth Bhat	c05fcc0d9e	[NFC] [Fortran Support] Cleanup Fortran Array pattern match testcases - Move the testcases to ScopInfo/ since the processing takes place in ScopBuilder. - Cleanup testcases, run -polly-canonicalize on them, find minimal set of opt parameters. llvm-svn: 302886	2017-05-12 09:37:39 +00:00
Hongbin Zheng	4fe342cb75	[Polly] Generate more 'canonical' induction variable Today Polly generates induction variables in this way: polly.indvar = phi 0, polly.indvar.next ... polly.indvar.next = polly.indvar + stride polly.loop_cond = predicate polly.indvar, (UB - stride) Instead of: polly.indvar = phi 0, polly.indvar.next ... polly.indvar.next = polly.indvar + stride polly.loop_cond = predicate polly.indvar.next, UB The way Polly generates induction variables causes some problems in the indvar simplify pass. This patch makes Polly generate the latter form, by assuming the induction variable never overflows. Differential Revision: https://reviews.llvm.org/D33089 llvm-svn: 302866	2017-05-12 02:17:15 +00:00
Michael Kruse	07e315e780	[Simplify] Remove identical scalar writes. After DeLICM, it is possible to have two writes of the same value to the same location in the same statement when it determined that those writes do not conflict (write the same value). Teach -polly-simplify to remove one of the writes. It interferes with the pattern matching of matrix-multiplication kernels and also seems not to be optimized away by LLVM. The algorithm is simple, has O(n^2) behaviour (n = max number of MemoryAccesses in a statement) and only matches the most obvious cases, but seems to be enough to pattern-match Boost ublas gemm. Not handled cases include: - StoreInst instructions (a.k.a. explicit writes), since the value might be loaded or overwritten between the two stores. - PHINode, especially LCSSA, when the PHI value matches the other's. - Partial writes (in preparation) llvm-svn: 302805	2017-05-11 15:07:38 +00:00
Siddharth Bhat	abea18feba	[NFC] [Fortran Support] move Fortran array detection testcases move these testcases to where they belong: ScopDetect llvm-svn: 302735	2017-05-10 21:35:14 +00:00
Siddharth Bhat	f5c81fb199	[Fix][Fortran Support] Don't use -debug-only in pattern matching test cases -debug-only is unnecessary and causes the tests to break in Release mode. Remove the option to opt in the test cases. llvm-svn: 302722	2017-05-10 20:10:17 +00:00
Michael Kruse	f69a7c306b	[DeLICM] Always normalize domain. NFC. Some isl functions can simplify their __isl_keep arguments. The argument object after the call uses different constraints to represent the same set. Different constraints can result in different outputs when printed to a string. In assert builds additional isl functions are called (in assert()), and these can change the internal representation of their read-only arguments such that printed strings are different in debug and non-debug builds. What happened here is that a call to isl_set_is_equal inside an assert in getScatterFor normalizes one of its arguments such that one redundant constraint is removed. The redundant constraint therefore does not appear in the string representing the domain, which FileCheck notices as a regression test failure compared to a build with assertions disabled. This fix removes the redundant constraints from the domain from the start such that the redundant constraint is removed in both assert and non-assert builds. Isl adds a flag to such sets such that the removal of redundancies is not done multiple times (here: by isl_set_is_equal). Thanks to Tobias Grosser for reporting and hinting at the cause. llvm-svn: 302711	2017-05-10 19:50:45 +00:00
Siddharth Bhat	f2dbba8183	[Fortran Support] Detect Fortran arrays & metadata from dragonegg output Add the ability to tag certain memory accesses as those belonging to Fortran arrays. We do this by pattern matching against known patterns of Dragonegg's LLVM IR output from Fortran code. Fortran arrays have metadata stored with them in a struct. This struct is called the "Fortran array descriptor", and a reference to this is stored in each MemoryAccess. Differential Revision: https://reviews.llvm.org/D32639 llvm-svn: 302653	2017-05-10 13:11:20 +00:00
Tobias Grosser	f3adab4c20	[Polly] Canonicalize arrays according to base-ptr equivalence class Summary: In case two arrays share base pointers in the same invariant load equivalence class, we canonicalize all memory accesses to the first of these arrays (according to their order in the equivalence class). This enables us to optimize kernels such as boost::ublas by ensuring that different references to the C array are interpreted as accesses to the same array. Before this change the runtime alias check for ublas would fail, as it would assume models of the C array with differing (but identically valued) base pointers would reference distinct regions of memory whereas the referenced memory regions were indeed identical. As part of this change we remove most of the MemoryAccess::getBaseAddr interface. We already removed all references to getBaseAddr in previous commits to ensure that no code relies on matching base pointers between memory accesses and scop arrays -- except for three remaining uses where we need the original base pointer. We document for these situations that MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct from the base pointer of the scop array referenced by this memory access. Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert Reviewed By: Meinersbur Subscribers: etherzhhb Tags: #polly Differential Revision: https://reviews.llvm.org/D28518 llvm-svn: 302636	2017-05-10 10:59:58 +00:00
Siddharth Bhat	a90be207c6	[Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameter list is twice as long. This has been accounted for in the corresponding test cases). Reviewers: grosser, Meinersbur, bollu Reviewed By: bollu Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D32961 llvm-svn: 302515	2017-05-09 10:45:52 +00:00
Siddharth Bhat	17f01968f1	[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen Summary: When compiling for GPU, one can now choose to compile for OpenCL or CUDA, with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library for that purpose, correctly choosing the corresponding library calls to the option chosen when compiling (via different initialization calls). Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far). Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay Reviewed By: grosser, Meinersbur Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32431 llvm-svn: 302379	2017-05-07 21:03:46 +00:00
Tobias Grosser	c6ad42165f	Really disable test as intended in the previous commit llvm-svn: 302360	2017-05-06 19:18:19 +00:00
Tobias Grosser	0f4e94673d	Disable test to avoid buildbot noise This test was introduced in r302339. It works on my system, but breaks on the buildbots. llvm-svn: 302358	2017-05-06 18:50:28 +00:00
Michael Kruse	5ae08c0ebb	[DeLICM] Known knowledge. Extend the Knowledge class to store information about the contents of array elements and which values are written. Two knowledges do not conflict if the known content is the same. The content information is computed from writes to and loads from the array elements, and is represented by "ValInst": isl spaces that compare equal if the value represented is the same. Differential Revision: https://reviews.llvm.org/D31247 llvm-svn: 302339	2017-05-06 14:03:58 +00:00
Tobias Grosser	c1ddedc657	Fix typo llvm-svn: 302244	2017-05-05 15:46:01 +00:00
Michael Kruse	f1052ceb5e	[ScopBuilder] Do not verify unfeasible SCoPs. SCoPs with unfeasible runtime context are thrown away and therefore do not need their uses verified. The added test case requires a complexity limit to be exceeded. Normally, error statements are removed from the SCoP and for that reason are skipped during the verification. If there is an unfeasible runtime context (here: because of the complexity limit being reached), the removal of error statements and other SCoP construction steps are skipped so as not to waste time. Error statements are not modeled in SCoPs and therefore have no requirements on whether the scalars used in them are available. llvm-svn: 302234	2017-05-05 13:38:35 +00:00
Tobias Grosser	d5727c5011	Fix handling of signWrappedSets in access relations Since r294891, in MemoryAccess::computeBoundsOnAccessRelation(), we skip manually bounding the access relation in case the parameter of the load instruction is already a wrapped set. Later on we assume that the lower bound on the set is always smaller or equal to the upper bound on the set. Bug 32715 manages to construct a sign wrapped set, in which case the assertion does not necessarily hold. Fix this by handling a sign wrapped set similar to a normal wrapped set, that is skipping the computation. Contributed-by: Maximilian Falkenstein <falkensm@student.ethz.ch> Reviewers: grosser Subscribers: pollydev, llvm-commits Tags: #Polly Differential Revision: https://reviews.llvm.org/D32893 llvm-svn: 302231	2017-05-05 13:20:47 +00:00
Siddharth Bhat	c1267b9baa	Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen" This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933. Patches should have been submitted in the order of: 1. D32852 2. D32854 3. D32431 I mistakenly pushed D32431(3) first. Reverting to push in the correct order. llvm-svn: 302217	2017-05-05 09:02:08 +00:00
Siddharth Bhat	51904ae35a	[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen Summary: When compiling for GPU, one can now choose to compile for OpenCL or CUDA, with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library for that purpose, correctly choosing the corresponding library calls to the option chosen when compiling (via different initialization calls). Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far). Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay Reviewed By: grosser, Meinersbur Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32431 llvm-svn: 302215	2017-05-05 07:54:49 +00:00
Michael Kruse	45d5cf47bf	[CMake] Remove POLLY_TEST_DIRECTORIES. The test subdirectory POLLY_TEST_DIRECTORIES was heavily outdated and only used in out-of-LLVM-tree builds (to generate polly-test-${subdir} targets). llvm-svn: 302142	2017-05-04 12:21:25 +00:00
Tobias Grosser	1859463876	Adjust test case to not trigger the SCEV optimization committed in r302096 This makes sure we still test the case that a PHI-NODE cannot be analyzed by scalar evolution and consequently must be code generated explicitly. As Michael's optimization triggers only on a very specific "add %iv, %step" pattern, just changing 'add' to 'mul' adds back test coverage. llvm-svn: 302132	2017-05-04 08:56:54 +00:00
Tobias Grosser	e2ccc3fb33	[ScopInfo] Do not use LLVM names to identify statements, arrays, and parameters LLVM-IR names are commonly available in debug builds, but often not in release builds. Hence, using LLVM-IR names to identify statements or memory reference results makes the behavior of Polly depend on the compile mode. This is undesirable. Hence, we now just number the statements instead of using LLVM-IR names to identify them (this issue has previously been brought up by Zino Benaissa). However, as LLVM-IR names help in making test cases more readable, we add an option '-polly-use-llvm-names' to still use LLVM-IR names. This flag is by default set in the polly tests to make test cases more readable. This change reduces the time in ScopInfo from 32 seconds to 2 seconds for the following test case provided by Eli Friedman <efriedma@codeaurora.org> (already used in one of the previous commits): struct X { int x; }; void a(); #define SIG (int x, X y, X z) typedef void (fn)SIG; #define FN { for (int i = 0; i < x; ++i) { (y)[i].x += (*z)[i].x; } a(); } #define FN5 FN FN FN FN FN #define FN25 FN5 FN5 FN5 FN5 #define FN125 FN25 FN25 FN25 FN25 FN25 #define FN250 FN125 FN125 #define FN1250 FN250 FN250 FN250 FN250 FN250 void x SIG { FN1250 } For a larger benchmark I have on hand (10000 loops), this reduces the time for running -polly-scops from 5 minutes to 4 minutes, a reduction of 20%. The reason for this large speedup is that our previous use of printAsOperand had a quadratic cost, as for each printed and unnamed operand the full function was scanned to find the instruction number that identifies the operand. We do not need to adjust the way memory reference ids are constructed, as they do not use LLVM values. Reviewed by: efriedma Tags: #polly Differential Revision: https://reviews.llvm.org/D32789 llvm-svn: 302072	2017-05-03 20:08:52 +00:00
Siddharth Bhat	88619946b6	[CUDA Managed Memory] Fix regression introduced by Managed Memory - Fixes breakage from commit 5536f. - Interference with commit 764f3 caused testcase to fail. Reverting 764f3 allows commit 5536f to succeed. - Generated kernel code was slightly different due to 764f3, which caused testcase to fail. llvm-svn: 302021	2017-05-03 13:15:27 +00:00
Tobias Grosser	8133128c17	[ScopInfo] Do not add array name into memory reference ids Before this change a memory reference identifier had the form: <STMT>_<ACCESSTYPE><ID>_<MEMREF>, e.g., Stmt_bb9_Write0_MemRef_tmp11 After this change, we use the format: <STMT>_<ACCESSTYPE><ID>, e.g., Stmt_bb9_Write0 The name of the array that is accessed through a memory reference is not necessary to uniquely identify a memory reference, but was only added to provide additional information for debugging. We drop this information now for the following two reasons: 1) This shortens the names and consequently improves readability 2) This removes a second location where we decide on the name of a scop array, leaving us only with the location where the actual scop array is created. Having after 2) only a single location to name scop arrays will allow us to change the naming convention of scop arrays more easily, which we will do in a future commit to reduce compilation time. llvm-svn: 302004	2017-05-03 07:57:35 +00:00
Tobias Grosser	3d76f2ccd3	[tests] Ensure all test cases use named variables This makes it easier to read and possibly even modify the test cases, as there is no need to keep the variable increment in steps of one. More importantly, by using explicit variable names we do not need to rely on the implicit numbering of statements when dumping the scop information. In a future commit, this implicit numbering will likely not be used any more to refer to LLVM-IR values as it is very expensive to construct. llvm-svn: 301689	2017-04-28 21:16:29 +00:00
Siddharth Bhat	abed49699b	[Polly] [PPCGCodeGeneration] Add managed memory support to GPU code generation. This needs changes to GPURuntime to expose synchronization between host and device. 1. Needs better function naming, I want a better name than "getOrCreateManagedDeviceArray" 2. DeviceAllocations is used by both the managed memory and the non-managed memory path. This exploits the fact that the two code paths are never run together. I'm not sure if this is the best design decision Reviewed by: PhilippSchaad Tags: #polly Differential Revision: https://reviews.llvm.org/D32215 llvm-svn: 301640	2017-04-28 11:16:30 +00:00
Tobias Grosser	c96c1d8c87	[ScopInfo] Consider only write-free dereferenceable loads as invariant When we introduced support in r297375 for hoisting loads that are known to be dereferenceable without any conditional guard, we forgot to keep the check to verify that no other write into the very same location exists. This change now ensures that dereferenceable loads are allowed to access everything, but can only be hoisted in case no conflicting write exists. This resolves llvm.org/PR32778 Reported-by: Huihui Zhang <huihuiz@codeaurora.org> llvm-svn: 301582	2017-04-27 20:08:16 +00:00
Hongbin Zheng	0f8f177682	[Polly] Do not introduce address space cast Do not introduce address space cast in IslNodeBuilder::preloadUnconditionally. Differential Revision: https://reviews.llvm.org/D32581 llvm-svn: 301519	2017-04-27 06:42:14 +00:00
Siddharth Bhat	d277feda91	[PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility Added a small change to the way pointer arguments are set in the kernel code generation. The way the pointer is retrieved now, specifically requests global address space to be annotated. This is necessary, if the IR should be run through NVPTX to generate OpenCL compatible PTX. The changes do not affect the PTX Strings generated for the CUDA target (nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl). Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends. Contributed-by: Philipp Schaad Reviewers: Meinersbur, grosser, bollu Reviewed By: grosser, bollu Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32215 llvm-svn: 301299	2017-04-25 08:08:29 +00:00
Siddharth Bhat	729377f063	[Polly] [DependenceInfo] change WAR generation, Read will not block Read Earlier, the call to buildFlow was: WAR = buildFlow(Write, Read, MustWrite, Schedule). This meant that Read could block another Read, since must-sources can block each other. Fixed the call to buildFlow to correctly compute Read. The resulting code needs to do some ISL juggling to get the output we want. Bug report: https://bugs.llvm.org/show_bug.cgi?id=32623 Reviewers: Meinersbur Tags: #polly Differential Revision: https://reviews.llvm.org/D32011 llvm-svn: 301266	2017-04-24 22:23:12 +00:00
Michael Kruse	abf05b18db	[CMake] Fix polly-isl-test execution in out-of-LLVM-tree builds. The isl unittest modified its PATH variable to point to the LLVM bin dir. When building out-of-LLVM-tree, it does not contain the polly-isl-test executable, hence the test fails. Ensure that the polly-isl-test is written to a bin directory in the build root, just like it would happen in an inside-LLVM build. Then, change PATH to include that dir such that the executable in it is prioritized before any other location. llvm-svn: 301096	2017-04-22 23:02:53 +00:00
Philip Pfaffe	78265cd237	Fix missing loadPolly in ensure-correct-tile-sizes testcase llvm-svn: 299765	2017-04-07 12:55:26 +00:00
Roman Gareev	e0d466342b	Restore the initial ordering of dimensions before applying the pattern matching Dimensions of band nodes can be implicitly permuted by the algorithm applied during the schedule generation. For example, in case of the following matrix-matrix multiplication, for (i = 0; i < 1024; i++) for (k = 0; k < 1024; k++) for (j = 0; j < 1024; j++) C[i][j] += A[i][k] * B[k][j]; it can produce the following schedule tree domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }" child: schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] }, { Stmt_for_body6[i0, i1, i2] -> [(i1)] }, { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]" permutable: 1 coincident: [ 1, 1, 0 ] The current implementation of the pattern matching optimizations relies on the initial ordering of dimensions. Otherwise, it can produce a miscompilation (e.g., [1]). This patch helps to restore the initial ordering of dimensions by recreating the band node when the corresponding conditions are satisfied. Refs.: [1] - https://bugs.llvm.org/show_bug.cgi?id=32500 Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D31741 llvm-svn: 299662	2017-04-06 17:09:54 +00:00
Siddharth Bhat	5eeb1dd42e	[Polly] [ScheduleOptimizer] Prevent incorrect tile size computation Because Polly exposes parameters that directly influence tile size calculations, one can set up situations like divide-by-zero. Check against a possible divide-by-zero in getMacroKernelParams and return early. Also assert at the end of getMacroKernelParams that the block sizes computed for matrices are positive (>= 1). Tags: #polly Differential Revision: https://reviews.llvm.org/D31708 llvm-svn: 299633	2017-04-06 08:20:22 +00:00
Michael Kruse	895f5d8080	Remove llvm.lifetime.start/end in original region. The current StackColoring algorithm does not correctly handle the situation when some, but not all, paths from a BB to the entry node cross an llvm.lifetime.start. According to an interpretation of the language reference at http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic this might be correct, but it would cost too much effort to handle in StackColoring. To be on the safe side, remove all lifetime markers even in the original code version (they have never been copied to the optimized version) to ensure that no path to the entry block will cross an llvm.lifetime.start. The same principle applies to paths to a function return and the llvm.lifetime.end marker, so we remove them as well. This fixes llvm.org/PR32251. Also see the discussion at http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html llvm-svn: 299585	2017-04-05 20:09:59 +00:00
Siddharth Bhat	bcbfdade41	[Polly] [DependenceInfo] change WAR, WAW generation to correct semantics = Change of WAR, WAW generation: = - `buildFlow(Sink, MustSource, MaySource, Schedule)` treats any flow of the form `sink <- may source <- must source` as a may dependence. - we used to call: ```lang=cpp, name=old-flow-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This caused some WAW dependences to be treated as WAR dependences. - Incorrect semantics. - Now, we compute WAR and WAW correctly. == Correct WAW: == ```lang=cpp, name=new-waw-call.cpp Flow = buildFlow(Write, MustWrite, MayWrite, Schedule); WAW = isl_union_flow_get_may_dependence(Flow); isl_union_flow_free(Flow); ``` == Correct WAR: == ```lang=cpp, name=new-war-call.cpp Flow = buildFlow(Write, Read, MustWrite, Schedule); WAR = isl_union_flow_get_must_dependence(Flow); isl_union_flow_free(Flow); ``` - We want the "shortest" WAR possible (exact dependences). - We mark all the must-writes as may-source, reads as must-source. - Then, we ask for must dependence. - This removes all the reads that flow through a must-write before reaching a sink. - Note that we only block earlier writes with must-writes. This is intuitively correct, as we do not want may-writes to block must-writes. - Leaves us with direct (R -> W). - This affects reduction generation since RED is built using WAW and WAR. = New StrictWAW for Reductions: = - We used to call: ```lang=cpp,name=old-waw-war-call.cpp Flow = buildFlow(MustWrite, MustWrite, Read, Schedule); WAW = isl_union_flow_get_must_dependence(Flow); WAR = isl_union_flow_get_may_dependence(Flow); ``` - This is the right model of WAW we need for reductions, just not in general. - Reductions need to track only strict WAW, without any interfering reductions. 
= Explanation: Why the new WAR dependences in tests are correct: = - We no longer set WAR = WAR - WAW - Hence, we will have WAR dependences that were originally removed. - These may look incorrect, but in fact make sense. == Code: == ```lang=llvm, name=new-war-dependence.ll ; void manyreductions(long A) { ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S0: A += 42; ; ; for (long i = 0; i < 1024; i++) ; for (long j = 0; j < 1024; j++) ; S1: A += 42; ; ``` === WAR dependence: === { S0[1023, 1023] -> S1[0, 0] } - Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences: ```lang=cpp, name=dependence-incorrect, counterexample S0[1023, 1023]: -- tmp = A (load0)-- WAR 2 add = tmp + 42 \| -> A = add (store0) \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = tmp + 42 \| A = add (store1)<- ``` - One may assume that WAR2 hides WAR1 (since store0 happens before store1). However, within a statement, Polly has no idea about the ordering of loads and stores. - Hence, according to Polly, the code may have looked like this: ```lang=cpp, name=dependence-correct S0[1023, 1023]: A = add (store0) tmp = A (load0) ---* add = A + 42 \| WAR 1 S1[0, 0]: \| tmp = A (load1) \| add = A + 42 \| A = add (store1) <-* ``` - So, Polly generates (correct) WAR dependences. It does not make sense to remove these dependences, since they are correct with respect to Polly's model. Reviewers: grosser, Meinersbur tags: #polly Differential revision: https://reviews.llvm.org/D31386 llvm-svn: 299429	2017-04-04 13:08:23 +00:00
Tobias Grosser	65371af2e1	[CodeGen] Add Performance Monitor Add support for -polly-codegen-perf-monitoring. When performance monitoring is enabled, we emit performance monitoring code during code generation that, after program exit, prints statistics about the total number of cycles executed as well as the number of cycles spent in scops. This gives an estimate of how useful polyhedral optimizations might be for a given program. Example output: Polly runtime information ------------------------- Total: 783110081637 Scops: 663718949365 In the future, we might also add functionality to measure how much time is spent in optimized scops and how many cycles are spent in the fallback code. Reviewers: bollu,sebpop Tags: #polly Differential Revision: https://reviews.llvm.org/D31599 llvm-svn: 299359	2017-04-03 14:55:37 +00:00
Michael Kruse	0b8949e6ed	[test] Fix two testcases. NFC. Trivial fix for two testcases. When Polly isn't linked into opt, independent of whether it's built in-tree or not, these testcases forget to load the appropriate library. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D31596 llvm-svn: 299357	2017-04-03 12:37:10 +00:00
Tobias Grosser	bd96c73a1a	Add test case for r299352. llvm-svn: 299353	2017-04-03 07:44:23 +00:00
Roman Gareev	cdfb57dc46	Introduce another level of metadata to distinguish non-aliasing accesses Introduce another level of alias metadata to distinguish the individual non-aliasing accesses that have inter iteration alias-free base pointers marked with "Inter iteration alias-free" mark nodes. It can be used to, for example, distinguish different stores (loads) produced by unrolling of the innermost loops and, subsequently, sink (hoist) them by LICM. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30606 llvm-svn: 298510	2017-03-22 14:25:24 +00:00
Roman Gareev	23df27682a	Map the new load to the base pointer of the invariant load hoisted load Map the new load to the base pointer of the invariant load hoisted load to be able to find the alias information for it. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30605 llvm-svn: 298507	2017-03-22 13:57:53 +00:00
Tobias Grosser	b28f86e9e6	[CodeGen] Remove need for all parameters to be in scop context for load hoisting. When not adding constraints on parameters using -polly-ignore-parameter-bounds, the context may not necessarily list all parameter dimensions. To support code generation in this situation, we now always iterate over the actual parameter list, rather than relying on the context to list all parameter dimensions. llvm-svn: 298197	2017-03-18 23:12:49 +00:00
Tobias Grosser	7693b116a1	[OpenMP] Do not emit lifetime markers for context In commit r219005 lifetime markers have been introduced to mark the lifetime of the OpenMP context data structure. However, their use seems incorrect and recently caused a miscompile in ASC_Sequoia/CrystalMk after r298053 which was not at all related to r298053. r298053 only caused a change in the loop order, as this change resulted in a different isl internal representation which caused the scheduler to derive a different schedule. This change then caused the IR to change, which apparently created a pattern in which LLVM exploits the lifetime markers. It seems we are using the OpenMP context outside of the lifetime markers. Even though CrystalMk could probably be fixed by expanding the scope of the lifetime markers, it is not clear what happens in case the OpenMP function call is in a loop which will cause a sequence of starting and ending lifetimes. As it is unlikely that the lifetime markers give any performance benefit, we just drop them to remove complexity. llvm-svn: 298192	2017-03-18 20:10:07 +00:00
Michael Kruse	f3091bf4cf	[PruneUnprofitable] Add -polly-prune-unprofitable pass. ScopInfo's normal profitability heuristic considers SCoPs where all statements have scalar writes as not profitably optimizable and invalidates the SCoP in that case. However, -polly-delicm and -polly-simplify may be able to remove some of the scalar writes such that the flag -polly-unprofitable-scalar-accs=false allows disabling that part of the heuristic. In cases where DeLICM (or other passes after ScopInfo) are not successful in removing scalar writes, the SCoP is still not profitably optimizable. The schedule optimizer would again try computing another schedule, resulting in slower compilation. The -polly-prune-unprofitable pass applies the profitability heuristic again before the schedule optimizer, so that Polly can still bail out even with -polly-unprofitable-scalar-accs=false. Differential Revision: https://reviews.llvm.org/D31033 llvm-svn: 298080	2017-03-17 13:09:52 +00:00
Siddharth Bhat	65f3d5201e	[DependenceInfo] Track may-writes and build flow information in Dependences::calculateDependences. This ensures that we handle may-writes correctly when building dependence information. Also add a test case checking correctness of may-write information. Not handling it before was an oversight. Differential Revision: https://reviews.llvm.org/D31075 llvm-svn: 298074	2017-03-17 12:31:28 +00:00
Siddharth Bhat	65c4026992	Set Dependences::RED to be non-null once Dependences::calculateDependences() occurs, even if there is no actual reduction. This ensures correctness with isl operations. llvm-svn: 297981	2017-03-16 20:06:49 +00:00
Tobias Grosser	c9d4cb2f42	[ScheduleOptimizer] Allow tiling after fusion In ScheduleOptimizer::isTileableBand(), allow the case in which the band node's child is an isl_schedule_sequence_node and its grandchildren are isl_schedule_leaf_nodes. This case can arise when two or more statements are fused by the isl scheduler. The tile_after_fusion.ll test has two statements in separate loop nests and checks whether they are tiled after being fused when polly-opt-fusion equals "max". Reviewers: grosser Subscribers: gareevroman, pollydev Tags: #polly Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch> Differential Revision: https://reviews.llvm.org/D30815 llvm-svn: 297587	2017-03-12 19:02:31 +00:00
Michael Kruse	0446d81e2d	[Simplify] Add -polly-simplify pass. This new pass removes unnecessary accesses and writes. It currently supports 2 simplifications, but more are planned. It removes write accesses that write a loaded value back to the location it was loaded from. It is a typical artifact from DeLICM. Removing it will get rid of bogus dependencies later in dependency analysis. It also removes statements without side-effects. ScopInfo already removes these, but the removal of unnecessary writes can result in more side-effect free statements. Differential Revision: https://reviews.llvm.org/D30820 llvm-svn: 297473	2017-03-10 16:05:24 +00:00
Tobias Grosser	8bd7f3c0a5	[ScopDetect/Info] Allow unconditional hoisting of loads from dereferenceable ptrs In case LLVM pointers are annotated with !dereferenceable attributes/metadata or LLVM can look at the allocation from which a pointer is derived, we can know that dereferencing pointers is safe and can be done unconditionally. We use this information to prove certain pointers safe to hoist and then hoist them unconditionally. llvm-svn: 297375	2017-03-09 11:36:00 +00:00
Michael Kruse	9fb3ab1b19	[DeLICM] Add -polly-delicm-overapproximate-writes option. One of the current limitations of DeLICM is that it only creates PHI WRITEs that it knows are read by some PHI. Such writes may not span all instances of a statement. Polly's code generator currently does not support MemoryAccesses that are not executed in all instances ('partial accesses') and so has to give up on a possible mapping. This workaround was once suggested by Tobias Grosser: Try to interpolate an arbitrary expansion to all instances. It will be checked for possible conflicts with the existing Knowledge and can be applied if the conflict checking result is that no semantics are changed. Expansion is done by simplifying the mapping by coalescing with the hope that coalescing will find a polyhedral 'rule' of the relevant map. It is then 'gist'-ed using the domain of the relevant instances such that the rule is expanded to the universe and finally intersected with the domain of all statement instances. The expansion makes conflicts more likely, the found rule may still not encompass all statement instances, and the found rule exposes internals of isl's implementation of coalesce and gist. The latter means that the result depends on how much effort the implementation invests into finding a rule, which may change between versions of isl. Trivial implementations of gist and coalesce just return the input arguments. A patch that makes codegen support partial accesses is in preparation as well. Differential Revision: https://reviews.llvm.org/D30763 llvm-svn: 297373	2017-03-09 11:23:22 +00:00
Michael Kruse	6744efa8d8	[ScopDetection] Only allow SCoP-wide available base pointers. Simplify ScopDetection::isInvariant(). Essentially deny everything that is defined within the SCoP and is not load-hoisted. The previous understanding of "invariant" has a few holes: - Expressions without side effects and with only invariant arguments that are defined within the SCoP's region, with the exception of selects and PHIs. These should be part of the index expression derived by ScalarEvolution and not of the base pointer. - Function calls that are !mayHaveSideEffects() (typically functions with "readnone nounwind" attributes). An example is given below. @C = external global i32 declare float* @getNextBasePtr(float*, float*) readnone nounwind ... %ptr = call float* @getNextBasePtr(float* %A, float* %B) The call might return: * %A, so %ptr aliases with it in the SCoP * %B, so %ptr aliases with it in the SCoP * @C, so %ptr aliases with it in the SCoP * a new pointer every time it is called, such as malloc() * a pointer into the allocated block of one of the aforementioned * any of the above, at random at each call Hence, and contrary to a comment in the base_pointer.ll regression test, %ptr is not necessarily the same all the time. It might also alias with anything, and no AliasAnalysis can tell otherwise if the definition is external. It is hence not suitable in the role of a base pointer. The practical problem with base pointers defined in SCoP statements is that they are not available globally in the SCoP. The statement instance must be executed before the base pointer can be used. This is no problem if the base pointer is transferred as a scalar value between statements. Uses of MemoryAccess::setNewAccessRelation may add a use of the base pointer anywhere in the array. setNewAccessRelation is used by JSONImporter, DeLICM and D28518. 
Indeed, BlockGenerator currently assumes that base pointers are available globally and, if not, generates invalid code for new access relations (referring to the base pointer of the original code), even if the base pointer would be available in the statement. This could be fixed with some added complexity and restrictions. The ExprBuilder must look up the local BBMap, and code that calls setNewAccessRelation must check whether the base pointer is available first. The code would still be incorrect in the presence of aliasing. There is the switch -polly-ignore-aliasing to explicitly allow this, but it is hardly a justification for the additional complexity. It would still be mostly useless because in most cases either getNextBasePtr() has external linkage, in which case the readnone nounwind attributes cannot be derived in the translation unit itself, or it is defined in the same translation unit and gets inlined. Reviewed By: grosser Differential Revision: https://reviews.llvm.org/D30695 llvm-svn: 297281	2017-03-08 15:14:46 +00:00
Michael Kruse	5a4ec5c42b	[ScopDetection] Require LoadInst base pointers to be hoisted. Only when load-hoisted can we be sure the base pointer is invariant during the SCoP's execution. Most of the time it would be added to the required hoists for the alias checks anyway, except with -polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if AliasAnalysis is already sure it doesn't alias with anything (for instance if there is no other pointer to alias with). Two more parts in Polly assume that this load-hoisting took place: - setNewAccessRelation(), which contains an assert that tests this. - BlockGenerator, which would use the base ptr from the original code if not load-hoisted (if the access expression is regenerated). Differential Revision: https://reviews.llvm.org/D30694 llvm-svn: 297195	2017-03-07 20:28:43 +00:00
Tobias Grosser	6c9958e0b3	[tests] Make sure tests do not end in 'unreachable' - Part III There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297158	2017-03-07 16:28:53 +00:00
Tobias Grosser	2d233fb35d	[tests] Update bounds-check elimination test cases These test cases should work in combination with https://reviews.llvm.org/D12676, but became outdated over time. Update them in preparation for discussions with Daniel Berlin on how to represent unreachable in the post-dominator tree. llvm-svn: 297157	2017-03-07 16:17:58 +00:00
Tobias Grosser	134a572951	[ScopDetection] Do not detect scops that exit to an unreachable Scops that exit with an unreachable are today still permitted, but make little sense to optimize. We therefore can already skip them during scop detection. This speeds up scop detection in certain cases and also ensures that bugpoint does not introduce unreachables when reducing test cases. In practice this change should have little impact, as the performance of unreachable code is unlikely to matter. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297151	2017-03-07 15:50:43 +00:00
Tobias Grosser	87dcd46aa7	[tests] Make sure tests do not end in 'unreachable' - Part II There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297150	2017-03-07 15:23:30 +00:00
Tobias Grosser	2dc1f547ae	[tests] Make sure tests do not end in 'unreachable' There is no point in optimizing unreachable code, hence our test cases should always return. This commit is part of a series that makes Polly more robust in the presence of unreachables. llvm-svn: 297147	2017-03-07 15:17:23 +00:00
Sanjoy Das	b641a90529	Adapt to llvm change r296992 to unbreak the bots r296992 made ScalarEvolution's CompareValueComplexity less aggressive, and that broke the polly test being fixed in this change. This change explicitly bumps CompareValueComplexity in said test case to make it pass. Can someone from the Polly team please give me an idea of whether this case is important enough to have scalar-evolution-max-value-compare-depth be 3 by default? llvm-svn: 296994	2017-03-06 01:12:16 +00:00
Tobias Grosser	7d136d952e	[tests] Specify the dependence to NVPTX backend for Polly ACC test cases Some Polly ACC test cases fail without a working NVPTX backend. We explicitly specify this dependence in REQUIRES. Alternatively, we could have only marked polly-acc as supported in case the NVPTX backend is available, but as we might use other backends in the future, this does not seem to be the best choice. For this to work, we also need to make the 'targets_to_build' information available. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 296853	2017-03-03 03:38:50 +00:00
Tobias Grosser	9d551da5c1	[test] Do not emit binary data to output Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 296852	2017-03-03 03:24:34 +00:00
Tobias Grosser	7a93d94a8f	Revert "Currently broken by recent LLVM upstream changes" This reverts commit r296579, which is not needed anymore as the relevant changes in trunk have been reverted. llvm-svn: 296817	2017-03-02 21:43:50 +00:00
Tobias Grosser	1c787e0b49	[ScopDetection] Do not allow required-invariant loads in non-affine region These loads cannot be safely hoisted, as the condition guarding the non-affine region cannot be duplicated to also protect the hoisted load later on. Today they are dropped in ScopInfo. By checking for this early, we do not even try to model them and possibly can still optimize smaller regions not containing this specific required-invariant load. llvm-svn: 296744	2017-03-02 12:15:37 +00:00
Tobias Grosser	c2f151084d	[ScopInfo] Disable memory folding in case it results in multi-disjunct relations Multi-disjunct access maps can easily result in inbounds assumptions that explode in case of many memory accesses and many parameters. This change reduces compilation time of some larger kernels from over 15 minutes to less than 16 seconds. An interesting case is the test case test/ScopInfo/multidim_param_in_subscript.ll, which has a memory access [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] } that requires folding, but where only a single disjunct remains. We can still model this test case even when only using limited memory folding. For people only reading commit messages, here is the comment that explains what memory folding is: To recover memory accesses with array size parameters in the subscript expression we post-process the delinearization results. We would normally recover from an access A[exp0(i) * N + exp1(i)] into an array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the range of exp1(i) - may be preferable. Specifically, for cases where we know exp1(i) is negative, we want to choose the latter expression. As we commonly do not have any information about the range of exp1(i), we do not choose one of the two options, but instead create a piecewise access function that adds the (-1, N) offsets as soon as exp1(i) becomes negative. For a 2D array such an access function is created by applying the piecewise map: [i,j] -> [i, j] : j >= 0 [i,j] -> [i-1, j+N] : j < 0 After this patch we generate only the first case, except for situations where we can prove the first case to be invalid and can consequently select the second without introducing disjuncts. llvm-svn: 296679	2017-03-01 21:11:27 +00:00
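A minimal C sketch of the piecewise folding map quoted above. It assumes a 2D array A[][N] and a column subscript j with -N <= j < N; the names Subscript, fold_subscript and linearize are hypothetical and not part of Polly. Both pieces of the map address the same linearized element, which is why either delinearization is valid.

```c
#include <assert.h>

/* Hypothetical 2D subscript [i, j] into an array A[][N]. */
typedef struct {
    long i;
    long j;
} Subscript;

/* Piecewise map from the commit message:
 *   [i, j] -> [i, j]         : j >= 0
 *   [i, j] -> [i - 1, j + N] : j < 0
 * Assumes the column subscript undershoots by at most one row. */
static Subscript fold_subscript(Subscript s, long n) {
    assert(n > 0 && s.j >= -n && s.j < n);
    if (s.j >= 0)
        return s;
    Subscript folded = { s.i - 1, s.j + n };
    return folded;
}

/* Row-major offset i * N + j, which folding must preserve. */
static long linearize(Subscript s, long n) {
    return s.i * n + s.j;
}
```

The sketch models the full two-piece map; after this patch Polly emits only the first piece unless it can prove that piece invalid.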
Tobias Grosser	6f9b60cf38	Currently broken by recent LLVM upstream changes We mark it as XFAIL to get buildbots back to green, until the upstream changes have been addressed. llvm-svn: 296579	2017-03-01 04:34:44 +00:00
Tobias Grosser	d7c4975349	[ScopInfo] Simplify inbounds assumptions under domain constraints Without this simplification for a loop nest: void foo(long n1_a, long n1_b, long n1_c, long n1_d, long p1_b, long p1_c, long p1_d, float A_1[][p1_b][p1_c][p1_d]) { for (long i = 0; i < n1_a; i++) for (long j = 0; j < n1_b; j++) for (long k = 0; k < n1_c; k++) for (long l = 0; l < n1_d; l++) A_1[i][j][k][l] += i + j + k + l; } the assumption: n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or (n1_a > 0 and n1_b > 0 and n1_c <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d) is taken rather than the simpler assumption: p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d. The former is less strict, as it allows arbitrary values of p1_* in case the loop is not executed at all. However, in practice these precise constraints explode when combined across different accesses and loops. For now it seems to make more sense to take less precise, but more scalable constraints by default. In case we find a practical example where more precise constraints are needed, we can think about allowing such precise constraints in specific situations where they help. This change speeds up the new test case from taking very long (waited at least a minute, but it probably takes a lot more) to below a second. llvm-svn: 296456	2017-02-28 09:45:54 +00:00
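The two assumptions for foo() can be written as C predicates to make "less strict" concrete. This is a sketch only; Params and the function names are hypothetical. The four "n <= 0" disjuncts of the precise assumption together mean "some loop is empty", so the precise form reduces to "some loop is empty, or all p1_* >= n1_*".

```c
#include <assert.h>
#include <stdbool.h>

/* Loop bounds and inner array sizes of the foo() example. */
typedef struct {
    long n1_a, n1_b, n1_c, n1_d;
    long p1_b, p1_c, p1_d;
} Params;

/* Precise assumption: p1_* is only constrained if every loop executes. */
static bool precise_assumption(Params p) {
    bool some_loop_empty =
        p.n1_a <= 0 || p.n1_b <= 0 || p.n1_c <= 0 || p.n1_d <= 0;
    return some_loop_empty ||
           (p.p1_b >= p.n1_b && p.p1_c >= p.n1_c && p.p1_d >= p.n1_d);
}

/* Simpler assumption taken after the patch. */
static bool simplified_assumption(Params p) {
    return p.p1_b >= p.n1_b && p.p1_c >= p.n1_c && p.p1_d >= p.n1_d;
}

/* The simpler form is stricter: whenever it holds, the precise one holds. */
static void check_implication(Params p) {
    assert(!simplified_assumption(p) || precise_assumption(p));
}
```

For n1_a = 0 the precise assumption accepts any p1_*, while the simplified one may reject them; this lost precision is what buys the scalability described above.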
Michael Kruse	6469380daa	[Cmake] Optionally use a system isl version. This patch adds an option to build against a version of libisl already installed on the system. The installation is autodetected using the pkg-config file shipped with isl. The detection of the library is in the FindISL.cmake module that creates an imported target. Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com> Differential Revision: https://reviews.llvm.org/D30043 llvm-svn: 296361	2017-02-27 17:54:25 +00:00
Michael Kruse	c4f61d2346	[DeLICM] Add nomap regressions tests. NFC. These verify that some scalars are not mapped because it would be incorrect to do so. For these checks, we verify from the output of the pass's '-analyze' that no transformation has been executed. Adding optimization remarks is not useful as it would result in too many messages, even repeated ones. I avoided checking the '-debug-only=polly-delicm' output, which is an antipattern. llvm-svn: 296348	2017-02-27 15:53:18 +00:00
Roman Gareev	96e1119a96	Make optimizations based on pattern matching be enabled by default Currently, pattern based optimizations of Polly can identify matrix multiplication and optimize it according to BLIS matmul optimization pattern (see ScheduleTreeOptimizer for details). This patch makes optimizations based on pattern matching be enabled by default. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D30293 llvm-svn: 295958	2017-02-23 11:44:12 +00:00
Michael Kruse	d8d32bb3d1	[DeLICM] Regression test for skipping map targets. Add optimization-remarks-missed for when mapping targets have been skipped and add regression tests for them. llvm-svn: 295953	2017-02-23 10:25:20 +00:00
Michael Kruse	deb30e8278	[DeLICM] Add regression tests for DeLICM reject cases. These tests were not included in the main DeLICM commit. These check the cases where zone analysis cannot be successful because of assumption violations. We use the LLVM optimization remark infrastructure as it seems to be the best fit for this kind of message. I tried to make use of the OptimizationRemarkEmitter. However, it would insert additional function passes into the pass manager to get the hotness information. The pass manager would insert them between the flatten pass and delicm, causing the ScopInfo with the flattened schedule to be thrown away. Differential Revision: https://reviews.llvm.org/D30253 llvm-svn: 295846	2017-02-22 15:14:08 +00:00
Michael Kruse	9e52c39f0a	[DeLICM] Map values hoisted by LICM back to the array. Implement the -polly-delicm pass. The pass intends to undo the effects of LoopInvariantCodeMotion (LICM), which adds additional scalar dependencies into SCoPs. DeLICM will try to map those scalars back to the array elements they were promoted from, as long as the array element is unused. This is the main patch from the DeLICM/DePRE patch series. It does not yet undo GVN PRE, for which additional information about known values is needed, and does not handle PHI write accesses that have no target. As such, its usefulness is limited. Patches for these issues, including regression tests for error situations, will follow. Reviewers: grosser Differential Revision: https://reviews.llvm.org/D24716 llvm-svn: 295713	2017-02-21 10:20:54 +00:00
Tobias Grosser	079d511891	[ScopInfo] Count read-only arrays when computing complexity of alias check Instead of counting the number of read-only accesses, we now count the number of distinct read-only array references when checking if a run-time alias check may be too complex. The run-time alias check is quadratic in the number of base pointers, not the number of accesses. Before this change we accidentally skipped SPEC's lbm test case. llvm-svn: 295567	2017-02-18 20:51:29 +00:00
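To show why counting accesses overstates the cost, here is a simplified C model (an assumption, not Polly's code): each access carries a hypothetical integer array id, distinct arrays are counted, and a check that compares every pair of base pointers costs n*(n-1)/2 comparisons. Polly's real check distinguishes read-only from read-write arrays, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Counts distinct array ids among n accesses (quadratic, fine for a sketch). */
static size_t count_distinct_arrays(const int *array_ids, size_t n) {
    assert(array_ids != NULL || n == 0);
    size_t distinct = 0;
    for (size_t k = 0; k < n; ++k) {
        size_t m = 0;
        while (m < k && array_ids[m] != array_ids[k])
            ++m;
        if (m == k) /* first occurrence of this id */
            ++distinct;
    }
    return distinct;
}

/* Number of pairwise comparisons among n base pointers. */
static size_t pairwise_checks(size_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

In this model, five accesses to three arrays need 3 comparisons, not the 10 an access-based count would suggest.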
Tobias Grosser	41f0d81b31	[test] Add reduction sequence test case [NFC] This test case is a mini performance test case that shows the time needed for a couple of simple reductions. It takes today about 325ms on my machine to run this test case through 'opt' with scop construction and reduction detection. It can be used as mini-proxy for further tuning of the reduction code. Generally we do not commit performance test cases, but as this is very small and also very fast it seems OK to keep it in the lit test suite. This test case will also help to verify that future changes to the reduction code will not affect the ordering of the reduction sets and will consequently not cause spurious performance changes that only result from reordering of dependences in the reduction set. llvm-svn: 295549	2017-02-18 16:38:58 +00:00
Tobias Grosser	cd01a363d6	[ScopInfo] Add statistics to count loops after scop modeling llvm-svn: 295431	2017-02-17 08:12:36 +00:00
Tobias Grosser	65ce9362b8	[ScopDetection] Compute the maximal loop depth correctly Before this change, we obtained loop depth numbers that were deeper than the actual loop depth. llvm-svn: 295430	2017-02-17 08:08:54 +00:00
Tobias Grosser	72745c2ef5	Updated isl to isl-0.18-254-g6bc184d This update includes a couple more coalescing changes as well as a large number of isl-internal code cleanups (dead assignments, ...). llvm-svn: 295419	2017-02-17 05:11:16 +00:00
Tobias Grosser	ca2cfd0bd8	[ScopInfo] Do not try to fold array dimensions of size zero Trying to fold such dimensions will result in a division by zero, which crashes the compiler. As such arrays are likely to invalidate the scop anyhow (but are not illegal in LLVM-IR), there is no point in trying to optimize the array layout. Hence, we just avoid the folding of constant dimensions of size zero. llvm-svn: 295415	2017-02-17 04:48:52 +00:00
Tobias Grosser	76ec194951	[tests] Fix some misspellings [NFC] llvm-svn: 295361	2017-02-16 19:11:29 +00:00
Tobias Grosser	c8a8276710	[ScopInfo] Bound the number of disjuncts in context Before this change wrapping range metadata resulted in exponential growth of the context, which made context construction of large scops very slow. Instead, we now just do not model the range information precisely, in case the number of disjuncts in the context has already reached a certain limit. llvm-svn: 295360	2017-02-16 19:11:25 +00:00
Tobias Grosser	3281f601bb	[ScopInfo] Always derive upper and lower bounds for parameters Commit r230230 introduced the use of range metadata to derive bounds for parameters, instead of just looking at the type of the parameter. As part of this commit, support for wrapping ranges was added, where the lower bound of a parameter is larger than the upper bound: { 255 < p || p < 0 } However, at the same time, support for adding bounds given by the size of the containing type was accidentally dropped for wrapping ranges. As a result, the range of the parameters was not guaranteed to be bounded any more. This change makes sure we always add the bounds given by the size of the type and then additionally add bounds based on signed wrapping, if available. For a parameter p with a type size of 32 bits, the valid range is then: { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) } llvm-svn: 295349	2017-02-16 18:39:14 +00:00
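The resulting context can be written as a plain C predicate. This sketch assumes a signed 32-bit parameter held in an int64_t so that out-of-type values can be tested; param_in_context is a hypothetical name. Without the type-bound conjunct, the wrapping range alone would admit arbitrarily large values, which is the unboundedness this commit fixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Context for a 32-bit parameter p whose wrapping range metadata excludes
 * [0, 255]: the type bounds are always added, then the wrapping range. */
static bool param_in_context(int64_t p) {
    bool in_type_bounds = p >= INT32_MIN && p <= INT32_MAX;
    bool in_wrapping_range = p > 255 || p < 0;
    return in_type_bounds && in_wrapping_range;
}
```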
Tobias Grosser	b3a85884f7	Do not use wrapping ranges to bound non-affine accesses When deriving the range of valid values of a scalar evolution expression, we might obtain a range such as [12, 8), where the upper bound is smaller than the lower bound and the range is expected to possibly wrap around. We theoretically could model such a range as a union of two non-wrapping ranges, but do not do this as of yet. Instead, we just do not derive any bounds. Before this change, we could have obtained bounds where the maximal possible value is strictly smaller than the minimal possible value, which is incorrect and also caused assertion failures during scop modeling. llvm-svn: 294891	2017-02-12 08:11:12 +00:00
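A wrapping range such as [12, 8) is easy to misread as empty. This minimal C sketch assumes half-open bounds in the style of LLVM's ConstantRange, without using its actual API. It shows the membership test for the union of the two non-wrapping pieces and the conservative policy of deriving no bounds when the range wraps. in_range and derive_bounds are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Half-open range [lo, hi). If hi < lo, the range wraps and denotes the
 * union of [lo, +inf) and (-inf, hi), clipped to the type in practice.
 * lo == hi (full or empty set) is excluded from this sketch. */
static bool in_range(long lo, long hi, long x) {
    assert(lo != hi);
    if (lo < hi)
        return x >= lo && x < hi; /* ordinary range */
    return x >= lo || x < hi;     /* wrapping range */
}

/* Derive [min, max] only for non-wrapping ranges, so max < min can never
 * be reported. Returns false when no bounds are derived. */
static bool derive_bounds(long lo, long hi, long *min, long *max) {
    assert(lo != hi);
    if (lo > hi)
        return false;
    *min = lo;
    *max = hi - 1;
    return true;
}
```

Naively reading [12, 8) as min = 12, max = 7 yields exactly the inverted bounds the commit describes.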
Roman Gareev	b196055c0c	Check reduction dependencies in case of the matrix multiplication optimization To determine parameters of the matrix multiplication, we check RAW dependencies that can be expressed using only reduction dependencies. Consequently, we should check the reduction dependencies, if this is the case. Reviewed-by: Tobias Grosser <tobias@grosser.es>, Sven Verdoolaege <skimo-polly@kotnet.org> Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29814 llvm-svn: 294836	2017-02-11 09:59:09 +00:00
Roman Gareev	3d4eae31ea	Use the size of the widest type of the matrix multiplication operands The size of the operands' type is one of the parameters required to determine the BLIS micro-kernel. We get the size of the widest type of the matrix multiplication operands in case there are several different types. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29269 llvm-svn: 294828	2017-02-11 07:00:05 +00:00
Tobias Grosser	4553463be4	[IRBuilder] Extract base pointers directly from ScopArray Instead of iterating over statements and their memory accesses to extract the set of available base pointers, just directly iterate over all ScopArray objects. This better reflects the actual intent of the code: collect all arrays (and their base pointers) to emit alias information that specifies that accesses to different arrays cannot alias. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294574	2017-02-09 09:34:42 +00:00
Roman Gareev	028ba3702c	[FIX] Disable the problematic run lines There are problems with using the machine information to derive the precise vector size on polly-amd64-linux and polly-arm-linux. We temporarily disable the problematic run lines. llvm-svn: 294571	2017-02-09 09:03:13 +00:00
Roman Gareev	2d0d294e3c	[FIX] Specify the CPU to overwrite the machine info and set a fixed vector size. llvm-svn: 294569	2017-02-09 08:29:55 +00:00
Tobias Grosser	26fb7d7517	[IslAst] Print the ScopArray name to mark reductions Before this change we used the name of the base pointer to mark reductions. This is imprecise as the canonical reference is the ScopArray itself and not the basepointer of a reduction. Using the base pointer of reductions is problematic in cases where a single ScopArray is referenced through two different base pointers. This change removes unnecessary uses of MemoryAddress::getBaseAddr() in preparation for https://reviews.llvm.org/D28518. llvm-svn: 294568	2017-02-09 08:06:15 +00:00
Roman Gareev	9989088ee9	Isolate a set of partial tile prefixes in case of the matrix multiplication optimization Isolate a set of partial tile prefixes to allow hoisting and sinking out of the unrolled innermost loops produced by the optimization of the matrix multiplication. In case it cannot be proved that the number of loop iterations is evenly divisible by the tile sizes and we tile and unroll the point loop, isl generates conditional expressions. Subsequently, the conditional expressions can prevent stores and loads of the unrolled loops from being sunk and hoisted. The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr iterations of the two innermost loops, the result of the loop tiling performed by the matrix multiplication optimization, where Mr and Nr are parameters of the micro-kernel. This helps to get rid of the conditional expressions of the unrolled innermost loops. This approach can probably be replaced with padding in the future. For example, in case of the gemm from Polybench/C 3.2 and parametric loop bounds, it helps to increase the performance from 7.98 GFlops (27.71% of theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we get the same performance as in the case of scalar loop bounds. It also causes a compile-time regression: the compile time increases from 0.795 seconds to 0.837 seconds in case of scalar loop bounds and from 1.222 seconds to 1.490 seconds in case of parametric loop bounds. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D29244 llvm-svn: 294564	2017-02-09 07:10:01 +00:00
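The idea of isolating full tiles can be shown in one dimension. In this simplified C sketch (a single loop with an arbitrary tile size of 4, not Polly's Mr x Nr micro-kernel), iterations that form complete tiles run an unrolled body with no bounds checks, and the leftover partial tile runs in a separate loop. TILE and tiled_sum are hypothetical names.

```c
#include <assert.h>

enum { TILE = 4 }; /* arbitrary tile size for the sketch */

/* Sums a[0..n) by splitting the iterations into full tiles and one
 * partial tile, so the unrolled body needs no per-iteration condition. */
static long tiled_sum(const long *a, long n) {
    assert(n >= 0);
    long full = n - n % TILE; /* iterations covered by full tiles */
    long sum = 0;
    for (long i = 0; i < full; i += TILE) {
        /* Full tile: exactly TILE iterations, unrolled, no conditionals. */
        sum += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    }
    for (long i = full; i < n; ++i) /* isolated partial tile */
        sum += a[i];
    return sum;
}
```

Because the unrolled body never sees a partial tile, loads and stores inside it are free to be hoisted or sunk, which is the effect the commit describes for the matmul kernel.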
Roman Gareev	772498dc68	[NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate option, because standardBandOpts could try to tile a band node with an anchored subtree and fail with an error, since the use of the isolate option causes any tree containing the node to be considered anchored. Furthermore, it is not intended to apply standard optimizations when the matrix multiplication has been detected. llvm-svn: 294444	2017-02-08 13:29:06 +00:00
Roman Gareev	98075fe181	A new algorithm for identification of a SCoP statement that implements a matrix multiplication The current identification of a SCoP statement that implements a matrix multiplication does not help to identify different permutations of loops that contain it and check for dependencies, which can prevent it from being optimized. It also requires external determination of the operands of the matrix multiplication. This patch contains the implementation of a new algorithm that helps to avoid these issues. It also modifies the test cases that generate matrix multiplications with linearized accesses, because the new algorithm does not support them. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28357 llvm-svn: 293890	2017-02-02 14:23:14 +00:00
Tobias Grosser	f58469ad45	Add forgotten test case for r293169 llvm-svn: 293383	2017-01-28 14:32:45 +00:00
Tobias Grosser	64bbb1357f	ScopDetectionDiagnostics: Also emit diagnostics in case no debug info is available In this case, we just use the start of the scop as the debug location. llvm-svn: 293165	2017-01-26 10:30:55 +00:00
Tobias Grosser	75dfaa1dbe	BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts Before this change we created an additional reload in the copy of the incoming block of a PHI node to reload the incoming value, even though the necessary value has already been made available by the normally generated scalar loads. In this change, we drop the code that generates this redundant reload and instead just reuse the scalar value already available. Besides making the generated code slightly cleaner, this change also makes sure that scalar loads go through the normal logic, which means they can be remapped (e.g. to array slots) and corresponding code is generated to load from the remapped location. Without this change, the original scalar load at the beginning of the non-affine region would have been remapped, but the redundant scalar load would continue to load from the old PHI slot location. It might be possible to further simplify the code in addOperandToPHI, but this would mean not only pulling out getNewValue, but also changing the insertion point update logic. As this did not work when trying it the first time, this change is likely not trivial. To not introduce bugs last minute, we postpone further simplifications to a subsequent commit. We also document the current behavior a little bit better. Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28892 llvm-svn: 292486	2017-01-19 14:12:45 +00:00
Tobias Grosser	a989a8b84c	Improve test coverage in test/Isl/CodeGen/loop_partially_in_scop.ll [NFC] We rename the test case with -metarenamer to make the variable names easier to read and add additional check lines that verify the code we currently generate for PHI nodes. This code is interesting as it contains a PHI node in a non-affine sub-region, where some incoming blocks are within the non-affine sub-region and others are outside of the non-affine subregion. As can be seen in the check lines we currently load the PHI-node value twice. This commit documents this behavior. In a subsequent patch we will try to improve this. llvm-svn: 292470	2017-01-19 04:54:45 +00:00
Tobias Grosser	e1ff0cf2eb	Relax assert when setting access functions with invariant base pointers Summary: Instead of forbidding such access functions completely, we verify that their base pointer has been hoisted and only assert in case the base pointer was not hoisted. I was trying for a little while to get a test case that ensures the assert is correctly fired in case of invariant load hoisting being disabled, but I could not find a good way to do so, as llvm-lit immediately aborts if a command yields a non-zero return value. As we do not generally test our asserts, not having a test case here seems OK. This resolves http://llvm.org/PR31494 Suggested-by: Michael Kruse <llvm@meinersbur.de> Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev Reviewed By: Meinersbur Differential Revision: https://reviews.llvm.org/D28798 llvm-svn: 292213	2017-01-17 12:00:42 +00:00
Tobias Grosser	7fcb689ea8	test: harden test case to fail even in non-asserts build The original test case was added in r292147. Suggested-by: Michael Kruse <llvm@meinersbur.de> llvm-svn: 292202	2017-01-17 07:03:25 +00:00
Tobias Grosser	eec7f6daa1	Add test showing the update of access functions with in-scop defined base ptrs This feature is currently not supported and an explicit assert to prevent the introduction of such accesses has been added in r282893. This test case allows to reproduce the assert (and without the assert the miscompile) added in r282893. It will help when adding such support at some point. llvm-svn: 292147	2017-01-16 17:51:28 +00:00
Tobias Grosser	93eb7e321f	Un-XFAIL test case after half support was added to PTX backend in r291956 llvm-svn: 292124	2017-01-16 14:08:14 +00:00
Tobias Grosser	67e94fb435	ScheduleOptimizer: Allow setting the register width on the command line We use this option to set a fixed register width in our test cases to make sure the results are identical across platforms. llvm-svn: 292002	2017-01-14 07:14:54 +00:00
Tobias Grosser	bb1e386c4d	Update tests to more precise analysis results in LLVM core LLVM's range analysis became a little tighter, which means Polly can derive tighter bounds as well. llvm-svn: 291718	2017-01-11 22:53:34 +00:00
Tobias Grosser	94e5371dde	Update to isl-0.18-43-g0b4256f Even more isl coalesce changes. llvm-svn: 290783	2016-12-31 07:46:11 +00:00
Roman Gareev	1c2927b209	Specify the default values of the cache parameters If the parameters of the target cache (i.e., cache level sizes, cache level associativities) are not specified or have wrong values, we use ones for parameters of the macro-kernel and do not perform data-layout optimizations of the matrix multiplication. In this patch we specify the default values of the cache parameters to be able to apply the pattern matching optimizations even in this case. Since there are no typical values of these parameters, we use the parameters of Intel Core i7-3820 SandyBridge, which also help to attain high performance on IBM POWER System S822 and IBM Power 730 Express servers. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 290518	2016-12-25 16:32:28 +00:00
Tobias Grosser	0791d5f5aa	ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma' througput -> throughput llvm-svn: 290418	2016-12-23 07:33:39 +00:00
Roman Gareev	be5299af0b	Change the determination of parameters of macro-kernel Typically, processor architectures do not include an L3 cache, which means that Nc, the parameter of the micro-kernel, is, for all practical purposes, redundant ([1]). However, small values of Nc can cause the redundant packing of the same elements of the matrix A, the first operand of the matrix multiplication. At the same time, big values of the parameter Nc can cause segmentation faults in case the available stack is exceeded. This patch adds an option to specify the parameter Nc as a multiple of the parameter of the micro-kernel Nr. In case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=8 it helps to improve the performance from 11.303 GFlops/sec (39.247% of theoretical peak) to 17.896 GFlops/sec (62.14% of theoretical peak). Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D28019 llvm-svn: 290256	2016-12-21 12:51:12 +00:00
Roman Gareev	92c446016a	[Polly] Use three-dimensional arrays to store packed operands of the matrix multiplication Previously we had two-dimensional accesses to store packed operands of the matrix multiplication for the sake of simplicity of the packed arrays. However, addition of the third dimension helps to simplify the corresponding memory access, reduce the execution time of isl operations applied to it, and consequently reduce the compile-time of Polly. For example, in case of Intel Core i7-3820 SandyBridge and the following options, clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME -march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true -DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8 -mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm -polly-target-latency-vector-fma=7 it helps to reduce the compile-time from about 361.456 seconds to about 0.816 seconds. Reviewed-by: Michael Kruse <llvm@meinersbur.de>, Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D27878 llvm-svn: 290251	2016-12-21 11:18:42 +00:00
Adrian Prantl	abd69332e2	Fix debug info metadata for upstream change in LLVM. llvm-svn: 290154	2016-12-20 02:09:59 +00:00
Adrian Prantl	80d13b4545	Revert "Fix debug info metadata for upstream change in LLVM." llvm-svn: 289983	2016-12-16 19:39:18 +00:00
Adrian Prantl	e0a1bdad3f	Fix debug info metadata for upstream change in LLVM. llvm-svn: 289953	2016-12-16 16:17:24 +00:00
Roman Gareev	2606c48a1d	Restrict ranges of extension maps To prevent copy statements from accessing arrays out of bounds, ranges of their extension maps are restricted, according to the constraints of domains. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D25655 llvm-svn: 289815	2016-12-15 12:35:59 +00:00
Roman Gareev	8babe1a216	The order of the loops defines the data reused in the BLIS implementation of gemm ([1]). In particular, elements of the matrix B, the second operand of matrix multiplication, are reused between iterations of the innermost loop. To keep the reused data in cache, only elements of matrix A, the first operand of matrix multiplication, should be evicted during an iteration of the innermost loop. To provide such a cache replacement policy, elements of the matrix A can, in particular, be loaded first and, consequently, be least-recently-used. In our case matrices are stored in row-major order instead of the column-major order used in the BLIS implementation ([1]). One of the ways to address it is to change the order of the loops of the loop nest accordingly. However, this causes elements of the matrix A to be reused in the innermost loop and, consequently, requires elements of the matrix B to be loaded first. Since the LLVM vectorizer always generates loads from the matrix A before loads from the matrix B, we cannot provide this order. Consequently, we only change the BLIS micro kernel and the computation of its parameters instead. In particular, reused elements of the matrix B are successively multiplied by specific elements of the matrix A. Refs.: [1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D25653 llvm-svn: 289806	2016-12-15 11:47:38 +00:00
Tobias Grosser	bedef00e2c	[ScopInfo] Fold constant coefficients in array dimensions to the right This allows us to delinearize code such as the one below, where the array sizes are A[][2 * n] as there are n times two elements in the innermost dimension. Alternatively, we could try to generate another dimension for the struct in the innermost dimension, but as the struct has constant size, recovering this dimension is easy. struct com { double Real; double Img; }; void foo(long n, struct com A[][n]) { for (long i = 0; i < 100; i++) for (long j = 0; j < 1000; j++) A[i][j].Real += A[i][j].Img; } int main() { struct com A[100][1000]; foo(1000, A); } llvm-svn: 288489	2016-12-02 08:10:56 +00:00
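A small sketch can check the folding described above. It is an illustration only, not Polly's delinearization code. Once the struct's constant size of two doubles is folded into the innermost dimension, A[i][j].Real and A[i][j].Img sit at linear double offsets i * (2 * n) + 2 * j and i * (2 * n) + 2 * j + 1. The helper name flat_index is hypothetical, and the sketch assumes struct com has no padding.

```c
#include <assert.h>

struct com { double Real; double Img; };

/* Hypothetical helper: linear offset, counted in doubles, of a field of
   A[i][j] for an array declared as struct com A[][n]. field is 0 for Real
   and 1 for Img. The constant factor 2 (doubles per struct) is folded into
   the size of the innermost dimension, giving A[][2 * n]. */
long flat_index(long n, long i, long j, int field) {
  assert(field == 0 || field == 1);
  return i * (2 * n) + (2 * j + field);
}
```

For example, with n = 5, A[2][3].Img is at double offset 2 * 10 + 7 = 27, which is byte offset 216.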
Johannes Doerfert	b6c5a5dd01	[FIX] Do not try to hoist obviously overwritten loads llvm-svn: 288328	2016-12-01 11:10:45 +00:00
Michael Kruse	36e79ecaec	[DeLICM] Add pass boilerplate code. Add an empty DeLICM pass, without any functional parts. Extracting the boilerplate from the functional part reduces the size of the code to review (https://reviews.llvm.org/D24716) Suggested-by: Tobias Grosser <tobias@grosser.es> llvm-svn: 288160	2016-11-29 16:41:21 +00:00
Tobias Grosser	b45ae5601b	[ScopDetect] Expand statistics of the detected scops We now collect: Number of total loops; Number of loops in scops; Number of scops; Number of scops with maximal loop depth 1; Number of scops with maximal loop depth 2; Number of scops with maximal loop depth 3; Number of scops with maximal loop depth 4; Number of scops with maximal loop depth 5; Number of scops with maximal loop depth 6 and larger; Number of loops in scops (profitable scops only); Number of scops (profitable scops only); Number of scops with maximal loop depth 1 (profitable scops only); Number of scops with maximal loop depth 2 (profitable scops only); Number of scops with maximal loop depth 3 (profitable scops only); Number of scops with maximal loop depth 4 (profitable scops only); Number of scops with maximal loop depth 5 (profitable scops only); Number of scops with maximal loop depth 6 and larger (profitable scops only). These statistics are certainly not completely accurate, as we might drop scops when building up their polyhedral representation, but they should give a good indication of the number of scops we detect. llvm-svn: 287973	2016-11-26 07:37:46 +00:00
Tobias Grosser	5c00b0dc74	[ScopDetectionDiagnostic] Collect statistics for each diagnostic type Our original statistics were added before we introduced a more fine-grained diagnostic system, but the granularity of our statistics has never been increased accordingly. This change now introduces one statistic counter per diagnostic to enable us to collect fine-grained statistics about why certain scops are not detected. In case coarser grained statistics are needed, the user is expected to combine counters manually. llvm-svn: 287968	2016-11-26 05:53:09 +00:00
Tobias Grosser	b3c3d149b9	[CodeGen] Add flag to code-generate most memory access expressions Introduce the new flag -polly-codegen-generate-expressions which forces Polly to code generate AST expressions instead of using our SCEV based access expression generation even for cases where the original memory access relation was not changed and the SCEV based access expression could be code generated without any issue. This is an experimental option for better testing the isl ast expression generation. The default behavior of Polly remains unchanged. We also exclude a couple of cases for which the AST expression is not yet working. llvm-svn: 287694	2016-11-22 20:21:16 +00:00
Tobias Grosser	d51a945f38	[test] Simplify test case by removing unreferenced instructions [NFC] Drop instructions that do not influence the memory impact of a basic block. They are not needed to reproduce the original bug (verified) and would cause random test noise if we decided to only model the instructions that have visible side-effects. llvm-svn: 287626	2016-11-22 07:18:57 +00:00
Tobias Grosser	88c025e82e	[test] Ensure important basic blocks in test case have side effects Add two store instructions at the end of basic blocks that are required to reproduce the original bug to ensure we always process and model these basic blocks. This makes this test case stable even if we decide to bail out early on basic blocks which do not modify the global state. Also add additional check lines to verify how we model the basic block. llvm-svn: 287625	2016-11-22 07:06:59 +00:00
Tobias Grosser	07ce9a0bcc	test: add more details to non-affine test case We add CHECK lines to this test case to make it easier to see the difference between affine and non-affine memory accesses. We also change the test case to use a parametric index expression as otherwise our range analysis will understand that the non-affine memory access can only access input[1], which makes it difficult to see that the memory access is in fact modeled as non-affine access. llvm-svn: 287623	2016-11-22 06:28:08 +00:00
Johannes Doerfert	6cd59e9076	Probably overwritten loads should not be considered hoistable Do not assume a load to be hoistable/invariant if the pointer is used by another instruction in the SCoP that might write to memory and that is always executed. llvm-svn: 287272	2016-11-17 22:25:17 +00:00
Tobias Grosser	a9cac6a732	[tests] Adjust test output to recently changed SCEV canonicalization [NFC] LLVM recently changed the SCEV canonicalization, which changed the output of one of our GPGPU test cases. llvm-svn: 286770	2016-11-13 19:27:17 +00:00
Tobias Grosser	a2f8fa33aa	[ScopDetect] Evaluate and verify branches at branch condition, not icmp The validity of a branch condition must be verified at the location of the branch (the branch instruction), not the location of the icmp that is used in the branch instruction. When verifying at the wrong location, we may accept an icmp that is defined within a loop which itself dominates, but does not contain the branch instruction. Such loops cannot be modeled as we only introduce domain dimensions for surrounding loops. To address this problem we change the scop detection to evaluate and verify SCEV expressions at the right location. This issue has been around since at least r179148 "scop detection: properly instantiate SCEVs to the place where they are used", where we explicitly set the scope to the wrong location. Before this commit the scope was not explicitly set, which probably also resulted in the scope around the ICmp being chosen. This resolves http://llvm.org/PR30989 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286769	2016-11-13 19:27:04 +00:00
Tobias Grosser	f67433abd9	SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr Assumptions can either be added for a given basic block, in which case the set describing the assumptions is expected to match the dimensions of its domain. In case no basic block is provided, a parameter-only set is expected to describe the assumption. The piecewise expressions that are generated by the SCEVAffinator sometimes have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }), which looks similar to a parameter-only domain, but is still a set domain. This change adds an assert that checks that we always pass parameter domains to addAssumptions if BB is empty to make mismatches here fail early. We also change visitTruncExpr to always convert to parameter sets, if BB is null. This change resolves http://llvm.org/PR30941 Another alternative to this change would have been to inspect all code to make sure the SCEV affinator directly generates parameter sets in case of empty domains. However, this would likely complicate the code which combines parameter and non-parameter domains when constructing a statement domain. We might still consider doing this at some point, but as this likely requires several non-local changes this should probably be done as a separate refactoring. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286444	2016-11-10 11:44:10 +00:00
Tobias Grosser	d0b9173caa	IslAst: always use the context during ast generation Providing the context to the ast generator allows for additional simplifications and -- more importantly -- allows to generate loops with only partially bounded domains, assuming the domains are bounded for all parameter configurations that are valid as defined by the context. This change fixes the crash reported in http://llvm.org/PR30956 The original reason why we did not include the context when generating an AST was that CLooG and later isl used to sometimes transfer some of the constraints that bound the size of parameters from the context into the generated AST. This resulted in operations with very large constants, which sometimes introduced problematic integer overflows. The latest versions of the isl AST generator are careful to not introduce such constants. Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286442	2016-11-10 09:39:58 +00:00
Tobias Grosser	4d543d654a	SCEVValidator: add new parameters resulting from constant extraction When extracting constant expressions out of SCEVs, new parameters may be introduced, which have not been registered before. This change scans SCEV expressions after constant extraction again to make sure newly introduced parameters are registered. We may for example extract the constant '8' from the expression '((8 * ((%a * %b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'. The new expression has a new parameter '((-1 + %b) * %a)', which was not registered before, but must be registered to not crash. This closes http://llvm.org/PR30953 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286430	2016-11-10 06:45:28 +00:00
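The constant extraction in this example can be checked term by term. The expansion below is our own, with a, b and c standing for %a, %b and %c:

$$8\,\big((a \cdot b) + c\big) + (-8 \cdot a) = 8ab + 8c - 8a = 8\,\big(((-1 + b) \cdot a) + c\big)$$

The product (-1 + b) · a exists only after extraction, which is why it must be registered as a new parameter.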
Tobias Grosser	bbaeda3fe5	Do not allow switch statements in loop latches In r248701 "Allow switch instructions in SCoPs" support for switch statements has been introduced, but support for switch statements in loop latches was incomplete. This change completely disables switch statements in loop latches. The original commit changed addLoopBoundsToHeaderDomain to support non-branch terminator instructions, but this change was incorrect: it added a check for BI != null to the if-branch of a condition, but BI was used in the else branch as well. As a result, when a non-branch terminator instruction is encountered, a nullptr dereference is triggered. Due to missing test coverage, this bug was overlooked. r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow switch statements for non-affine loops, if they appear in either a loop latch or a loop exit. We adapt this code to now prohibit switch statements in loop latches even if the control condition is affine. We could possibly add support for switch statements in loop latches, but such support should be evaluated and tested separately. This fixes llvm.org/PR30952 Reported-by: Eli Friedman <efriedma@codeaurora.org> llvm-svn: 286426	2016-11-10 05:20:29 +00:00
Eli Friedman	b9c6f01a81	[ScopInfo] Make memset etc. affine where possible. We don't actually check whether a MemoryAccess is affine in very many places, but one important one is in checks for aliasing. Differential Revision: https://reviews.llvm.org/D25706 llvm-svn: 285746	2016-11-01 20:53:11 +00:00
Eli Friedman	6768285dcc	Add missing test from r284848. Original commit title: [SCEVAffinator] Make precise modular math more correct. llvm-svn: 285745	2016-11-01 20:45:28 +00:00
Michael Kruse	426e6f71f8	[ScopInfo] Fix: use raw source pointer. When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw source and target pointers which preserve bitcasts. MemAccInst::getPointerOperand() also returns the raw target pointers, but Scop::buildAliasGroups() did not for the source pointer. This led to mismatches between AliasSetTracker and ScopInfo on which pointer to use. Fixed by also using raw pointers in Scop::buildAliasGroups(). llvm-svn: 285071	2016-10-25 13:37:43 +00:00
Eli Friedman	286c5a76ba	[SCEVAffinator] Make precise modular math more correct. Integer math in LLVM IR is modular. Integer math in isl is arbitrary-precision. Modeling LLVM IR math correctly in isl requires either adding assumptions that math doesn't actually overflow, or explicitly wrapping the math. However, expressions with the "nsw" flag are special; we can pretend they're arbitrary-precision because it's undefined behavior if the result wraps. SCEV expressions based on IR instructions with an nsw flag also carry an nsw flag (roughly; actually, the real rule is a bit more complicated, but the details don't matter here). Before this patch, SCEV flags were also overloaded with an additional function: the ZExt code was mutating SCEV expressions as a hack to indicate to checkForWrapping that we don't need to add assumptions to the operand of a ZExt; it'll add explicit wrapping itself. This kind of works... the problem is that if anything else ever touches that SCEV expression, it'll get confused by the incorrect flags. Instead, with this patch, we make the decision about whether to explicitly wrap the math a bit earlier, basing the decision purely on the SCEV expression itself, and not its users. Differential Revision: https://reviews.llvm.org/D25287 llvm-svn: 284848	2016-10-21 18:08:02 +00:00
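The following C sketch makes "explicitly wrapping the math" concrete. It is not Polly's SCEVAffinator code. It shows how an exact, arbitrary-precision result maps back to the value an n-bit two's-complement LLVM integer holds, using ((x + 2^(n-1)) mod 2^n) - 2^(n-1). A 64-bit integer stands in for isl's arbitrary-precision integers, so the sketch is limited to n <= 32. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: wrap an exact integer result x to the value an
   n-bit two's-complement integer would hold after modular arithmetic.
   int64_t stands in for arbitrary precision; n <= 32 keeps x + 2^(n-1)
   from overflowing for the small x used here. */
int64_t wrap_signed(int64_t x, unsigned n) {
  assert(n >= 1 && n <= 32);
  int64_t m = (int64_t)1 << n; /* 2^n */
  int64_t h = m / 2;           /* 2^(n-1) */
  int64_t r = (x + h) % m;     /* C's % may return a negative remainder... */
  if (r < 0)
    r += m;                    /* ...so normalize it into [0, 2^n) */
  return r - h;                /* shift back into [-2^(n-1), 2^(n-1)) */
}
```

For i8, wrapping is only observable outside [-128, 127]. That matches the [p] -> { [] : p <= -129 or p >= 128 } domain quoted in r286444. With the nsw flag, the commit argues that no wrapping needs to be modeled at all.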
Michael Kruse	6b87504973	[test] Fix buildbot after SCEV change. Update test after commit r284501: [SCEV] Make CompareValueComplexity a little bit smarter Contributed-by: Sanjoy Das <sanjoy@playingwithpointers.com> llvm-svn: 284543	2016-10-18 22:58:09 +00:00
Eli Friedman	3c1a75bf9c	Handle multi-dimensional invariant load. If the address of a load depends on another load, make sure to emit the loads in the right order. llvm-svn: 284426	2016-10-17 21:04:26 +00:00
Michael Kruse	6a19d592da	[ScopDetect] Depend transitively on ScalarEvolution. ScopDetection might be queried by -dot-scops or -view-scops passes for which it accesses ScalarEvolution. llvm-svn: 284385	2016-10-17 13:29:20 +00:00
Michael Kruse	2ddb279a39	[test] Add missing colon. llvm-svn: 284349	2016-10-16 22:05:51 +00:00
Michael Kruse	17d5090532	[cmake] Add polly-isl-test dependency to lit tests. Also handle the in-llvm-tree case forgotten in r284339. llvm-svn: 284347	2016-10-16 21:35:57 +00:00
Michael Kruse	8dee3427f7	[cmake] Add polly-isl-test dependency to lit tests. lit recursively iterates through the test subdirectories and finds the ISL unittest. For this test to work, the polly-isl-test executable needs to be compiled. Add the polly-isl-test dependency to POLLY_TEST_DEPS. This makes check-polly and check-polly-tests work from a fresh build directory. llvm-svn: 284339	2016-10-16 18:22:02 +00:00
Michael Kruse	f0c06900ed	[test] Add -polly-unprofitable-scalar-accs to test that needs it. The test non_affine_loop_used_later.ll also tests the profitability heuristic. Add the option -polly-unprofitable-scalar-accs explicitly to ensure that the test succeeds if the default value is changed. llvm-svn: 284338	2016-10-16 18:13:01 +00:00
Michael Kruse	fa53c86dc1	[ScopInfo/CodeGen] ExitPHI reads are implicit. Under some conditions, MK_Value read accesses were converted to MK_ExitPHI read accesses. This is unexpected because MK_ExitPHI read accesses are implicit after the scop execution. This behaviour was introduced in r265261, which fixed a failed assertion/crash in CodeGen. Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(), despite its name, also handles accesses of kind MK_Value, only to skip them because they access values that are usually not PHI nodes in the SCoP region's exit block, except in the situation observed in r265261. Do not convert value accesses to ExitPHI accesses and do not handle value accesses like ExitPHI accesses in CodeGen anymore. llvm-svn: 284023	2016-10-12 16:31:09 +00:00
Michael Kruse	4b5f6af2dc	[cmake] Move isl_test artifacts to Polly folder. Folders in Visual Studio solutions help organize the build artifacts from all LLVM projects. There is a folder to keep Polly-built files in. llvm-svn: 283546	2016-10-07 12:38:24 +00:00
Tobias Grosser	e84ee850d1	Build and run isl_test as part of check-polly Running isl tests is important to gain confidence that the isl build we created works as expected. Besides the actual isl tests, there are also isl AST generation tests shipped with isl. This change only adds support for the isl unit tests. AST generation test support is left for a later commit. There is a choice to run tests directly through the build system or in the context of lit. We choose to run tests as part of lit, as this allows us to easily set environment variables, print output only on error and generally run the tests directly from the lit command. Reviewers: brad.king, Meinersbur Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny Differential Revision: https://reviews.llvm.org/D25155 llvm-svn: 283245	2016-10-04 19:48:40 +00:00
Michael Kruse	6ab4476835	[ScopInfo] Add -polly-unprofitable-scalar-accs option. With this option one can disable the heuristic that assumes that statements with a scalar write access cannot be profitably optimized. Instances of such a statement necessarily have WAW-dependences on each other. With DeLICM, scalar accesses can be changed to array accesses, which can avoid these WAW-dependences. llvm-svn: 283233	2016-10-04 17:33:39 +00:00
Michael Kruse	ca7cbcca37	[ScopInfo] Scalar accesses do not have indirect base pointers. ScopArrayInfo used to determine base pointer origins by looking up whether the base pointer is a load. The "base pointer" for scalar accesses is the llvm::Value being accessed. This is only a symbolic base pointer; it represents the alloca variable (.s2a or .phiops) generated for it at code generation. This patch disables determining base pointer origin for scalars. A test case where this caused a crash will be added in the next commit. In that test SAI tried to get the origin base pointer that was only declared later, and therefore did not exist yet. This is probably only possible for scalars used in PHINode incoming blocks. llvm-svn: 283232	2016-10-04 17:33:34 +00:00
Tobias Grosser	349d1c3368	[ScopDetection] Remove redundant checks for endless loops Summary: Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks to reject endless loops which are now removed and replaced by a single check in `isValidLoop()`. For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is renamed to `ReportLoopHasNoExit`. The test case `ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well. The schedule generation in `buildSchedule()` is based on the following assumption: Given some block B that is contained in a loop L and a SESE region R, we assume that L is contained in R or the other way around. However, this assumption is broken in the presence of endless loops that are nested inside other loops. Therefore, in order to prevent erroneous behavior in `buildSchedule()`, r265280 introduced a corresponding check in `canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding another check to reject endless loops in `allowOverApproximatedRegion()` in r273905. Hence there existed two separate locations that handled this case. Thank you Johannes Doerfert for helping to provide the above background information. Reviewers: Meinersbur, grosser Subscribers: _jdoerfert, pollydev Differential Revision: https://reviews.llvm.org/D24560 Contributed-by: Matthias Reisinger <d412vv1n@gmail.com> llvm-svn: 281987	2016-09-20 17:05:22 +00:00
Tobias Grosser	122d6d74f6	Fix spelling in CMakeLists llvm-svn: 281897	2016-09-19 10:55:31 +00:00
Tobias Grosser	05ee64e67a	GPGPU: add missing REQUIRES line to test case llvm-svn: 281850	2016-09-18 08:57:38 +00:00
Tobias Grosser	bc653f2031	GPGPU: Do not run mostly sequential kernels on the GPU In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849	2016-09-18 08:31:09 +00:00
Tobias Grosser	82f2af3508	GPGPU: Dynamically ensure 'sufficient compute' Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small amount of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848	2016-09-18 06:50:35 +00:00
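Here is a minimal sketch of the check described above. It assumes each statement's iteration space has already been over-approximated by a box with concrete bounds. In the real run-time check, the bounds depend on parameter values known only at run time. The struct, the function names and the threshold are hypothetical, not Polly's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-statement summary: the rectangular hull of the iteration
   space (inclusive bounds per dimension) and an instruction count per
   statement instance. */
struct stmt_hull {
  int dims;
  const long *lb;
  const long *ub;
  long insts_per_instance;
};

/* Volume of the rectangular hull: product of per-dimension extents.
   Over-approximates the number of statement instances; 0 if empty. */
long hull_volume(const struct stmt_hull *s) {
  assert(s->dims >= 0);
  long vol = 1;
  for (int d = 0; d < s->dims; d++) {
    long extent = s->ub[d] - s->lb[d] + 1;
    if (extent <= 0)
      return 0;
    vol *= extent;
  }
  return vol;
}

/* Returns 1 if the over-approximated dynamic instruction count reaches the
   (hypothetical) offloading threshold, 0 if we should stay on the CPU. */
int sufficient_compute(const struct stmt_hull *stmts, size_t n, long threshold) {
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += hull_volume(&stmts[i]) * stmts[i].insts_per_instance;
  return total >= threshold;
}
```

A box is cheap to evaluate at run time, because each extent is just ub - lb + 1. For non-rectangular domains, such as triangular loops, it overestimates, which errs toward offloading.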
Tobias Grosser	cfdee6582b	GPGPU: Make test cases independent of register numbering [NFC] llvm-svn: 281847	2016-09-18 06:50:28 +00:00
Tobias Grosser	51dfc27589	GPGPU: Store back non-read-only scalars We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that back them up. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838	2016-09-17 19:22:31 +00:00
Tobias Grosser	fe74a7a1f5	GPGPU: Detect read-only scalar arrays ... and pass these by value rather than by reference. llvm-svn: 281837	2016-09-17 19:22:18 +00:00
Tobias Grosser	aaabbbf886	GPGPU: Do not assume arrays start at 0 Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case arrays are accessed with negative subscripts, e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611	2016-09-15 14:05:58 +00:00
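Here is a minimal sketch of the bounds computation described above for a one-dimensional array. It assumes the smallest and largest accessed indices are already known. The struct and function names are hypothetical. Returning an empty transfer for an array that is never accessed is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical summary of what to allocate and copy for one array. */
struct transfer {
  long offset; /* first element to copy: the smallest accessed index */
  long count;  /* number of elements to allocate and copy */
};

/* Copy exactly [min_idx, max_idx], so the transfer never touches memory
   outside the range covered by the alias check. */
struct transfer transfer_range(long min_idx, long max_idx, int accessed) {
  struct transfer t = {0, 0};
  if (!accessed)
    return t; /* no accesses: nothing to allocate or copy */
  assert(min_idx <= max_idx);
  t.offset = min_idx;
  t.count = max_idx - min_idx + 1;
  return t;
}
```

For an array accessed as A[-100] through A[99], this yields offset -100 and count 200. The host-side copy would then start at the host pointer plus offset, rather than at element 0.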
Roman Gareev	b3224adfb6	Perform copying to created arrays according to the packing transformation This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23260 llvm-svn: 281441	2016-09-14 06:26:09 +00:00
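The loop structure described above can be sketched in plain C. This is a generic illustration of the BLIS scheme, not the code Polly generates. In particular, it ignores the row-major micro-kernel changes described in r289806. The block sizes are tiny hypothetical values, chosen so that a small test exercises every loop. Real values are derived from cache and latency parameters. All matrix dimensions are assumed to be multiples of the block sizes.

```c
#include <assert.h>

/* Hypothetical block sizes (MC and NC must be multiples of MR and NR). */
enum { MC = 2, NC = 4, KC = 2, MR = 2, NR = 2 };

/* C[M][N] += A[M][K] * B[K][N], all row-major. */
void blis_gemm(int M, int N, int K, const double *A, const double *B, double *C) {
  double Ac[MC * KC]; /* packed block of A, stored as MR-tall micro-panels */
  double Bc[KC * NC]; /* packed panel of B, stored as NR-wide micro-panels */
  assert(M % MC == 0 && N % NC == 0 && K % KC == 0);
  for (int jc = 0; jc < N; jc += NC)       /* loop 5 */
    for (int pc = 0; pc < K; pc += KC) {   /* loop 4 */
      /* Packing routine 1: B[pc..pc+KC)[jc..jc+NC) -> Bc */
      for (int jr = 0; jr < NC; jr += NR)
        for (int p = 0; p < KC; p++)
          for (int j = 0; j < NR; j++)
            Bc[jr * KC + p * NR + j] = B[(pc + p) * N + jc + jr + j];
      for (int ic = 0; ic < M; ic += MC) { /* loop 3 */
        /* Packing routine 2: A[ic..ic+MC)[pc..pc+KC) -> Ac */
        for (int ir = 0; ir < MC; ir += MR)
          for (int p = 0; p < KC; p++)
            for (int i = 0; i < MR; i++)
              Ac[ir * KC + p * MR + i] = A[(ic + ir + i) * K + pc + p];
        /* Macro-kernel: loops 2 and 1 around the micro-kernel */
        for (int jr = 0; jr < NC; jr += NR)
          for (int ir = 0; ir < MC; ir += MR)
            /* Micro-kernel: KC rank-1 updates of an MR x NR block of C */
            for (int p = 0; p < KC; p++)
              for (int i = 0; i < MR; i++)
                for (int j = 0; j < NR; j++)
                  C[(ic + ir + i) * N + jc + jr + j] +=
                      Ac[ir * KC + p * MR + i] * Bc[jr * KC + p * NR + j];
      }
    }
}
```

Packing places consecutive micro-kernel operands next to each other in memory. This is the "copying to created arrays" step that this commit adds.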
Tobias Grosser	a82c4b5df8	GPGPU: Allow region statements llvm-svn: 281305	2016-09-13 08:42:10 +00:00
Tobias Grosser	b79f4d3970	GPGPU: Extend types when array sizes have smaller types This prevents a compiler crash. llvm-svn: 281303	2016-09-13 08:02:14 +00:00
Tobias Grosser	b51d507c74	Adapt test case to recent change in Global Variable Definition llvm-svn: 281295	2016-09-13 05:19:26 +00:00
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	5857b701a3	GPGPU: Bail out gracefully in case of invalid IR Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out. llvm-svn: 281193	2016-09-12 06:06:31 +00:00
Tobias Grosser	0bf4cc6499	Add missing 'REQUIRES' line llvm-svn: 281166	2016-09-11 13:42:42 +00:00
Tobias Grosser	02293ed755	GPGPU: Do not fail in case of arrays never accessed If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension. llvm-svn: 281165	2016-09-11 13:30:12 +00:00
Michael Kruse	7886bd7ca5	Add -polly-flatten-schedule pass. The -polly-flatten-schedule pass reduces the number of scattering dimensions in its isl_union_map form to make them easier to understand. It is not meant to be used in production, only for debugging and regression tests. To illustrate how it can make sets simpler, here is a lifetime set computed by the proposed DeLICM pass without flattening: { Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0; Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5; Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0; Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3; Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 } And here the same lifetime for a semantically identical one-dimensional schedule: { Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 } Differential Revision: https://reviews.llvm.org/D24310 llvm-svn: 280948	2016-09-08 15:02:36 +00:00
Michael Kruse	564579726a	Add check-polly-tests build target. The check-polly-tests target runs regression/unit tests but without checking formatting. This is useful to avoid having to reload a file in an open editor (which e.g. clears the undo buffer, moves cursor/window position) when running polly-update-format. After this change, the following test targets exist: - check-polly-unittests to run unittests only - check-polly-tests to run unit and regression tests - polly-check-format to check formatting using clang-format - check-polly to run them all As a side-effect, check-polly now runs polly-check-format in parallel with the tests (instead of running polly-check-format first). Differential Revision: https://reviews.llvm.org/D24191 llvm-svn: 280654	2016-09-05 10:54:16 +00:00
Michael Kruse	2fa3519463	Allow mapping scalar MemoryAccesses to array elements. Change the code around setNewAccessRelation to allow using an existing array element for memory instead of an ad-hoc alloca. This facility will be used for DeLICM/DeGVN to convert scalar dependencies into regular ones. The changes necessary include: - Make the code generator use the implicit locations instead of the alloca ones. - A test case - Make the JScop importer accept changes of scalar accesses for that test case. - Adapt the MemoryAccess interface to the fact that the MemoryKind can change. They are named (get|is)OriginalXXX() to get the status of the memory access before any change by setNewAccessRelation() (some properties such as getIncoming() do not change even if the kind is changed and are still required). To get the modified properties, there is (get|is)LatestXXX(). The old accessors without Original|Latest become synonyms of the (get|is)OriginalXXX() to not make functional changes in unrelated code. Differential Revision: https://reviews.llvm.org/D23962 llvm-svn: 280408	2016-09-01 19:53:31 +00:00
Michael Kruse	d262feff80	Add space between access string and follow-up. llvm-svn: 279826	2016-08-26 15:43:52 +00:00
Michael Kruse	6b6e38d9b1	Add "New access function" to update_check.py classifier. Lines with this prefix are printed by JSONImporter. llvm-svn: 279825	2016-08-26 15:43:43 +00:00
Roman Gareev	44aeef7ecf	[FIX] Access dimensions should correspond to the number of dimensions of the accessed array. llvm-svn: 279821	2016-08-26 13:41:53 +00:00
Michael Kruse	05cf9c22f1	Introduce unittests. Add the infrastructure for unittests to Polly and two simple tests for conversion between isl_val and APInt. In addition, a build target check-polly-unittests is added to run only the unittests but not the regression tests. Clang's unittest mechanism served as a blueprint which then was adapted to Polly. Differential Revision: https://reviews.llvm.org/D23833 llvm-svn: 279734	2016-08-25 12:36:15 +00:00
Michael Kruse	0e63ab4243	Use configure_lit_site_cfg instead of configure_file. configure_lit_site_cfg defines some more parameters that are used in lit.site.cfg.in. configure_file would leave those empty. These additional definitions seem to be unimportant for regression tests, but unittests do not work without them. In case of out-of-tree builds, define the additional parameters with default values. These may not take all configuration parameters into account, as configure_lit_site_cfg would. llvm-svn: 279733	2016-08-25 12:03:33 +00:00
Michael Kruse	4a080de057	Add %loadPolly to test command line. Required for out-of-tree builds of Polly. llvm-svn: 279657	2016-08-24 19:12:48 +00:00
Roman Gareev	5f99f8656e	Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass Dump polyhedral descriptions of Scops optimized with the isl scheduling optimizer and the set of post-scheduling transformations applied on the schedule tree to be able to check the work of the IslScheduleOptimizer pass at the polyhedral level. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23740 llvm-svn: 279395	2016-08-21 11:20:39 +00:00
Eli Friedman	28671c83d6	[SCEVValidator] Don't reorder multiplies in extractConstantFactor. The existing code would add the operands in the wrong order, and eventually crash because the SCEV expression doesn't exactly match the parameter SCEV expression in SCEVAffinator::visit. (SCEV doesn't sort the operands to getMulExpr in general.) Differential Revision: https://reviews.llvm.org/D23592 llvm-svn: 279087	2016-08-18 16:30:42 +00:00
Tobias Grosser	1c18440958	[BlockGenerator] Invalidate SCEV values for instructions in scop We already invalidated a couple of critical values earlier on, but we now invalidate all instructions contained in a scop after the scop has been code generated. This is necessary as later scops may otherwise obtain SCEV expressions that reference values in the earlier scop that previously dominated the later scop, but which had been moved into the conditional branch and consequently do not dominate the later scop any more. If these very values are then used during code generation of the later scop, we generate uses that are not dominated by the values they use. This fixes: http://llvm.org/PR28984 llvm-svn: 279047	2016-08-18 10:45:57 +00:00
Tobias Grosser	b143e31164	[ScopInfo] Make scalars used by PHIs in non-affine regions available Normally this is ensured when adding PHI nodes, but as PHI node dependences do not need to be added in case all incoming blocks are within the same non-affine region, this was missed. This corrects an issue visible in LNT's sqlite3, in case invariant load hoisting was disabled. llvm-svn: 278792	2016-08-16 11:44:48 +00:00
Tobias Grosser	c80c15bd50	[ScopDetect] Do not assert in case of AddRecs with non-constant start expression llvm-svn: 278738	2016-08-15 20:59:30 +00:00
Tobias Grosser	13e55a32fd	[test] Force invariant load hoisting one last time Without invariant load hoisting an (unrelated) bug is exposed in this test case: http://llvm.org/PR28984 llvm-svn: 278680	2016-08-15 16:43:33 +00:00
Tobias Grosser	7cb809983d	[tests] Force invariant load hoisting for test cases that need it -- III llvm-svn: 278673	2016-08-15 15:56:24 +00:00
Tobias Grosser	ad61c170d5	[tests] Force invariant load hoisting for test cases that need it II llvm-svn: 278669	2016-08-15 13:58:16 +00:00
Tobias Grosser	75b9c7df4d	[test] Correct spelling in test case and explicitly enable invariant load hoisting for this test case. llvm-svn: 278668	2016-08-15 13:58:04 +00:00
Tobias Grosser	6e6264c142	[tests] Force invariant load hoisting for test cases that need it This will make it easier to switch the default of Polly's invariant load hoisting strategy and also makes it very clear that these test cases indeed require invariant code hoisting to work. llvm-svn: 278667	2016-08-15 13:27:49 +00:00
Roman Gareev	1c892e91e3	Perform replacement of access relations and creation of new arrays according to the packing transformation This is the third patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform replacement of the access relations and create empty arrays, which are steps to implement the packing transformation. In subsequent changes we will implement copying to created arrays. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: http://reviews.llvm.org/D22187 llvm-svn: 278666	2016-08-15 12:22:54 +00:00
Tobias Grosser	d58acf866a	[GPGPU] Ensure arrays where only parts are modified are copied to GPU To do so we change the way array extents are computed. Instead of the precise set of memory locations accessed, we now compute the extent as the range between minimal and maximal address in the first dimension and the full extent defined by the sizes of the inner array dimensions. We also move the computation of the may_persist region after the construction of the arrays, as it relies on array information. Without arrays being constructed no useful information is computed at all. llvm-svn: 278212	2016-08-10 10:58:19 +00:00
Tobias Grosser	b06ff4574e	[GPGPU] Support PHI nodes used in GPU kernel Ensure the right scalar allocations are used as the host location of data transfers. For the device code, we clear the allocation cache before device code generation to be able to generate new device-specific allocations, and we need to make sure to add back the old host allocations as soon as the device code generation is finished. llvm-svn: 278126	2016-08-09 15:35:06 +00:00
Tobias Grosser	750160e260	[GPGPU] Use separate basic block for GPU initialization code This increases the readability of the IR and also clarifies that the GPU initialization is executed _after_ the scalar initialization, which needs to happen before the code of the transformed scop is executed. Besides increased readability, the IR should not change. Specifically, I do not expect any changes in program semantics due to this patch. llvm-svn: 278125	2016-08-09 15:35:03 +00:00
Tobias Grosser	776700d0b7	[BlockGenerator] Insert initializations at beginning of start block In case some code -- not guarded by control flow -- would be emitted directly in the start block, it may happen that this code would use uninitialized scalar values if the scalar initialization is only emitted at the end of the start block. This is not a problem today in normal Polly, as all statements are emitted in their own basic blocks, but Polly-ACC emits host-to-device copy statements into the start block. Additional Polly-ACC test coverage will be added in subsequent changes that improve the handling of PHI nodes in Polly-ACC. llvm-svn: 278124	2016-08-09 15:34:59 +00:00
Tobias Grosser	77f76788dc	[tests] Add two missing 'REQUIRES' lines llvm-svn: 278104	2016-08-09 09:11:39 +00:00
Tobias Grosser	c59b3ce044	[BlockGenerator] Also eliminate dead code not originating from BB After having generated the code for a ScopStmt, we run a simple dead-code elimination that drops all instructions that are known to be and remain unused. Until this change, we only considered instructions for dead-code elimination if they had a corresponding instruction in the original BB that belongs to the ScopStmt. However, when generating code we do not only copy code from the BB belonging to a ScopStmt, but also generate code for operands referenced from BB. After this change, we also consider code that does not have a corresponding instruction in BB for dead-code elimination. This fixes a bug in Polly-ACC where such dead code referenced CPU code from within a GPU kernel, which is possible as we do not guarantee that all variables that are used in known-dead code are moved to the GPU. llvm-svn: 278103	2016-08-09 08:59:05 +00:00
Tobias Grosser	cf66ef26f3	[GPGPU] Pass parameters always by using their own type llvm-svn: 278100	2016-08-09 07:22:08 +00:00
Tobias Grosser	124534038a	[GPGPU] Support Values referenced from both isl expr and llvm instructions When adding code that avoids passing values used in both isl expressions and LLVM instructions twice, we forgot to make the single variable passed to the kernel available in the ValueMap, which makes it usable for instructions that are not replaced with isl ast expressions. This change adds the variable that is passed to the kernel to the ValueMap to ensure it is available for such use cases as well. llvm-svn: 278039	2016-08-08 19:22:19 +00:00
Tobias Grosser	cb1aef8de4	[GPGPU] Create code to verify run-time conditions llvm-svn: 278026	2016-08-08 17:35:55 +00:00
Tobias Grosser	928d7573dd	GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly Before this commit we generated the array type in reverse order and we also added the outermost dimension size to the new array declaration, which is incorrect as Polly also assumed an additional unsized outermost dimension, such that we had an off-by-one error in the linearization of access expressions. llvm-svn: 277802	2016-08-05 08:27:24 +00:00
Tobias Grosser	470608e3e4	Add missing 'REQUIRES' line llvm-svn: 277800	2016-08-05 07:08:45 +00:00
Tobias Grosser	c1c6a2a61b	GPGPU: Add cuda annotations to specify maximal number of threads per block These annotations ensure that the NVIDIA PTX assembler limits the number of registers used such that we can be certain the resulting kernel can be executed for the number of threads in a thread block that we are planning to use. llvm-svn: 277799	2016-08-05 06:47:43 +00:00
Tobias Grosser	f919d8b360	GPGPU: Support scalars that are mapped to shared memory llvm-svn: 277726	2016-08-04 13:57:29 +00:00
Tobias Grosser	130ca30f92	GPGPU: Add private memory support llvm-svn: 277722	2016-08-04 12:39:03 +00:00
Tobias Grosser	b513b4916b	GPGPU: Add support for shared memory llvm-svn: 277721	2016-08-04 12:18:14 +00:00
Tobias Grosser	00bb5a99f5	GPGPU: Handle scalar array references Pass the content of scalar array references to the alloca on the kernel side and do not additionally pass them as normal LLVM scalar values. llvm-svn: 277699	2016-08-04 06:55:59 +00:00
Tobias Grosser	576932728d	GPGPU: Pass subtree values correctly to the kernel llvm-svn: 277697	2016-08-04 06:55:49 +00:00
Tobias Grosser	629109b633	GPGPU: Mark kernel functions as polly.skip Otherwise, we would try to re-optimize them with Polly-ACC and possibly even generate kernels that try to offload themselves, which does not work as the GPURuntime is not available on the accelerator and also does not make any sense. llvm-svn: 277589	2016-08-03 12:00:07 +00:00
Roman Gareev	0c09a3af00	Add missing prefixes. llvm-svn: 277264	2016-07-30 11:15:00 +00:00

1345 Commits