llvm-project

Commit Graph

Author	SHA1	Message	Date
Michael Kruse	4b0c5aea78	[CodeGen] Add assertion for indirect array index expression generation. NFC. Currently Polly cannot generate code for index expressions if the base pointer is computed within the scop. The base pointer must be generated as well, but there is no code that triggers that. Add an assertion to detect when this would occur and miscompile. The IR verifier should catch it as well. llvm-svn: 282893	2016-09-30 18:29:37 +00:00
Michael Kruse	888ab55140	[CodeGen] Change 'Scalar' to 'Array' in method names. NFC. generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array) memory accesses, therefore the method names were misleading. The names also were similar to generateScalarLoads() and generateScalarStores() (plural forms) which indeed handle scalar accesses. Presumbly, they were originally named to contrast VectorBlockGenerator::generateLoad(). Rename the two methods to generateArrayLoad(), respectively generateArrayStore(). llvm-svn: 282861	2016-09-30 14:34:05 +00:00
Michael Kruse	77394f1394	[CodeGen] Add assertion for partial scalar accesses. NFC. The code generator always adds unconditional LoadInst and StoreInst, hence the MemoryAccess must be defined over all statement instances. llvm-svn: 282853	2016-09-30 14:01:46 +00:00
Tobias Grosser	bc653f2031	GPGPU: Do not run mostly sequential kernels in GPU In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849	2016-09-18 08:31:09 +00:00
Tobias Grosser	82f2af3508	GPGPU: Dynamically ensure 'sufficient compute' Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small number of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848	2016-09-18 06:50:35 +00:00
Tobias Grosser	51dfc27589	GPGPU: Store back non-read-only scalars We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that backs them up. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838	2016-09-17 19:22:31 +00:00
Tobias Grosser	fe74a7a1f5	GPGPU: Detect read-only scalar arrays ... and pass these by value rather than by reference. llvm-svn: 281837	2016-09-17 19:22:18 +00:00
Tobias Grosser	aaabbbf886	GPGPU: Do not assume arrays start at 0 Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case array are accessed with negative subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611	2016-09-15 14:05:58 +00:00
Roman Gareev	b3224adfb6	Perform copying to created arrays according to the packing transformation This is the fourth patch to apply the BLIS matmul optimization pattern on matmul kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf). BLIS implements gemm as three nested loops around a macro-kernel, plus two packing routines. The macro-kernel is implemented in terms of two additional loops around a micro-kernel. The micro-kernel is a loop around a rank-1 (i.e., outer product) update. In this change we perform copying to created arrays, which is the last step to implement the packing transformation. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D23260 llvm-svn: 281441	2016-09-14 06:26:09 +00:00
Tobias Grosser	0a893f7df4	GPGPU: Use const_cast to avoid compiler warning [NFC] llvm-svn: 281333	2016-09-13 13:22:27 +00:00
Tobias Grosser	a82c4b5df8	GPGPU: Allow region statements llvm-svn: 281305	2016-09-13 08:42:10 +00:00
Tobias Grosser	b79f4d3970	GPGPU: Extend types when array sizes have smaller types This prevents a compiler crash. llvm-svn: 281303	2016-09-13 08:02:14 +00:00
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	5857b701a3	GPGPU: Bail out gracefully in case of invalid IR Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out. llvm-svn: 281193	2016-09-12 06:06:31 +00:00
Tobias Grosser	02293ed755	GPGPU: Do not fail in case of arrays never accessed If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension. llvm-svn: 281165	2016-09-11 13:30:12 +00:00
Tobias Grosser	a3afe44d6c	IslNodeBuilder: Add missing __isl_take annotation llvm-svn: 281034	2016-09-09 11:16:50 +00:00
Tobias Grosser	f3600dfa2d	IslNodeBuilder: Add missing __isl_take annotations llvm-svn: 280936	2016-09-08 13:48:55 +00:00
Tobias Grosser	c80d6979bd	Drop '@brief' from doxygen comments LLVM's coding guideline suggests to not use @brief for one-sentence doxygen comments to improve readability. Switch this once and for all to ensure people do not copy @brief comments from other parts of Polly, when writing new code. llvm-svn: 280468	2016-09-02 06:33:33 +00:00
Michael Kruse	2fa3519463	Allow mapping scalar MemoryAccesses to array elements. Change the code around setNewAccessRelation to allow to use a an existing array element for memory instead of an ad-hoc alloca. This facility will be used for DeLICM/DeGVN to convert scalar dependencies into regular ones. The changes necessary include: - Make the code generator use the implicit locations instead of the alloca ones. - A test case - Make the JScop importer accept changes of scalar accesses for that test case. - Adapt the MemoryAccess interface to the fact that the MemoryKind can change. They are named (get\|is)OriginalXXX() to get the status of the memory access before any change by setNewAccessRelation() (some properties such as getIncoming() do not change even if the kind is changed and are still required). To get the modified properties, there is (get\|is)LatestXXX(). The old accessors without Original\|Latest become synonyms of the (get\|is)OriginalXXX() to not make functional changes in unrelated code. Differential Revision: https://reviews.llvm.org/D23962 llvm-svn: 280408	2016-09-01 19:53:31 +00:00
Tobias Grosser	1c18440958	[BlockGenerator] Invalidate SCEV values for instructions in scop We already invalidated a couple of critical values earlier on, but we now invalidate all instructions contained in a scop after the scop has been code generated. This is necessary as later scops may otherwise obtain SCEV expressions that reference values in the earlier scop that before dominated the later scop, but which had been moved into the conditional branch and consequently do not dominate the later scop any more. If these very values are then used during code generation of the later scop, we generate used that are dominated by the values they use. This fixes: http://llvm.org/PR28984 llvm-svn: 279047	2016-08-18 10:45:57 +00:00
Tobias Grosser	d58acf866a	[GPGPU] Ensure arrays where only parts are modified are copied to GPU To do so we change the way array exents are computed. Instead of the precise set of memory locations accessed, we now compute the extent as the range between minimal and maximal address in the first dimension and the full extent defined by the sizes of the inner array dimensions. We also move the computation of the may_persist region after the construction of the arrays, as it relies on array information. Without arrays being constructed no useful information is computed at all. llvm-svn: 278212	2016-08-10 10:58:19 +00:00
Tobias Grosser	b06ff4574e	[GPGPU] Support PHI nodes used in GPU kernel Ensure the right scalar allocations are used as the host location of data transfers. For the device code, we clear the allocation cache before device code generation to be able to generate new device-specific allocation and we need to make sure to add back the old host allocations as soon as the device code generation is finished. llvm-svn: 278126	2016-08-09 15:35:06 +00:00
Tobias Grosser	750160e260	[GPGPU] Use separate basic block for GPU initialization code This increases the readability of the IR and also clarifies that the GPU inititialization is executed _after_ the scalar initialization which needs to before the code of the transformed scop is executed. Besides increased readability, the IR should not change. Specifically, I do not expect any changes in program semantics due to this patch. llvm-svn: 278125	2016-08-09 15:35:03 +00:00
Tobias Grosser	776700d0b7	[BlockGenerator] Insert initializations at beginning of start block In case some code -- not guarded by control flow -- would be emitted directly in the start block, it may happen that this code would use uninitalized scalar values if the scalar initialization is only emitted at the end of the start block. This is not a problem today in normal Polly, as all statements are emitted in their own basic blocks, but Polly-ACC emits host-to-device copy statements into the start block. Additional Polly-ACC test coverage will be added in subsequent changes that improve the handling of PHI nodes in Polly-ACC. llvm-svn: 278124	2016-08-09 15:34:59 +00:00
Tobias Grosser	c59b3ce044	[BlockGenerator] Also eliminate dead code not originating from BB After having generated the code for a ScopStmt, we run a simple dead-code elimination that drops all instructions that are known to be and remain unused. Until this change, we only considered instructions for dead-code elimination, if they have a corresponding instruction in the original BB that belongs to ScopStmt. However, when generating code we do not only copy code from the BB belonging to a ScopStmt, but also generate code for operands referenced from BB. After this change, we now also considers code for dead code elimination, which does not have a corresponding instruction in BB. This fixes a bug in Polly-ACC where such dead-code referenced CPU code from within a GPU kernel, which is possible as we do not guarantee that all variables that are used in known-dead-code are moved to the GPU. llvm-svn: 278103	2016-08-09 08:59:05 +00:00
Tobias Grosser	cf66ef26f3	[GPGPU] Pass parameters always by using their own type llvm-svn: 278100	2016-08-09 07:22:08 +00:00
Tobias Grosser	124534038a	[GPGPU] Support Values referenced from both isl expr and llvm instructions When adding code that avoids to pass values used in isl expressions and LLVM instructions twice, we forgot to make single variable passed to the kernel available in the ValueMap that makes it usable for instructions that are not replaced with isl ast expressions. This change adds the variable that is passed to the kernel to the ValueMap to ensure it is available for such use cases as well. llvm-svn: 278039	2016-08-08 19:22:19 +00:00
Tobias Grosser	cb1aef8de4	[GPGPU] Create code to verify run-time conditions llvm-svn: 278026	2016-08-08 17:35:55 +00:00
Tobias Grosser	fa9abd1f03	Fix compilation in 'asserts' mode llvm-svn: 278025	2016-08-08 17:35:52 +00:00
Tobias Grosser	0aa29532b7	[IslNodeBuilder] Move run-time check generation to NodeBuilder [NFC] This improves the structure of the code and allows us to reuse the runtime code generation in the PPCGCodeGeneration. llvm-svn: 278017	2016-08-08 15:41:52 +00:00
Tobias Grosser	219feac456	[CodeGeneration] Do not set insert position redundantly There is no need to reset the position of the builder, as we can just continue to insert code at the current position of the IRBuilder, which happens to be precisely the location we reset the builder to. llvm-svn: 278014	2016-08-08 15:25:50 +00:00
Tobias Grosser	000db70754	[IslNodeBuilder] Directly use the insert location of our Builder ... instead of adding instructions at the end of the basic block the builder is currently at. This makes it easier to reason about where IR is generated, as with the IRBuilder there is just a single location that specificies where IR is generated. llvm-svn: 278013	2016-08-08 15:25:46 +00:00
Michael Kruse	fbde435517	[CodeGen] Use MapVector instead of DenseMap. The map is iterated over when generating the values escaping the SCoP. The indeterministic iteration order of DenseMap causes the output IR to change at every compilation, adding noise to comparisons. Replace DenseMap by a MapVector to ensure the same iteration order at every compilation. llvm-svn: 277832	2016-08-05 16:45:51 +00:00
Tobias Grosser	928d7573dd	GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly Before this commit we generated the array type in reverse order and we also added the outermost dimension size to the new array declaration, which is incorrect as Polly additionally assumed an additional unsized outermost dimension, such that we had an off-by-one error in the linearization of access expressions. llvm-svn: 277802	2016-08-05 08:27:24 +00:00
Tobias Grosser	c1c6a2a61b	GPGPU: Add cuda annotations to specify maximal number of threads per block These annotations ensure that the NVIDIA PTX assembler limits the number of registers used such that we can be certain the resulting kernel can be executed for the number of threads in a thread block that we are planning to use. llvm-svn: 277799	2016-08-05 06:47:43 +00:00
Tobias Grosser	f919d8b360	GPGPU: Support scalars that are mapped to shared memory llvm-svn: 277726	2016-08-04 13:57:29 +00:00
Tobias Grosser	8950cead7f	GPGPU: Disable verbose debug output llvm-svn: 277724	2016-08-04 12:44:03 +00:00
Tobias Grosser	b0dd95bcd2	Remove leftover debug output llvm-svn: 277723	2016-08-04 12:41:28 +00:00
Tobias Grosser	130ca30f92	GPGPU: Add private memory support llvm-svn: 277722	2016-08-04 12:39:03 +00:00
Tobias Grosser	b513b4916b	GPGPU: Add support for shared memory llvm-svn: 277721	2016-08-04 12:18:14 +00:00
Tobias Grosser	00bb5a99f5	GPGPU: Handle scalar array references Pass the content of scalar array references to the alloca on the kernel side and do not pass them additional as normal LLVM scalar value. llvm-svn: 277699	2016-08-04 06:55:59 +00:00
Tobias Grosser	3216f8546c	BlockGenerator: Assert that we do not get alloca of array access llvm-svn: 277698	2016-08-04 06:55:53 +00:00
Tobias Grosser	576932728d	GPGPU: Pass subtree values correctly to the kernel llvm-svn: 277697	2016-08-04 06:55:49 +00:00
Tobias Grosser	629109b633	GPGPU: Mark kernel functions as polly.skip Otherwise, we would try to re-optimize them with Polly-ACC and possibly even generate kernels that try to offload themselves, which does not work as the GPURuntime is not available on the accelerator and also does not make any sense. llvm-svn: 277589	2016-08-03 12:00:07 +00:00
Tobias Grosser	2219d15748	Fix a couple of spelling mistakes llvm-svn: 277569	2016-08-03 05:28:09 +00:00
Roman Gareev	d7754a1245	Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions Extend the jscop interface to allow the user to export arrays. It is required that already existing arrays of the list of arrays correspond to arrays of the SCoP. Each array that is appended to the list will be newly created. Furthermore, we allow the user to modify access expressions to reference any array in case it has the same element type. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D22828 llvm-svn: 277263	2016-07-30 09:25:51 +00:00
Tobias Grosser	d8b94bcac1	GPGPU: Pass context parameters to GPU kernel llvm-svn: 276963	2016-07-28 06:47:59 +00:00
Tobias Grosser	a490147c90	GPGPU: Pass host iterators to kernel llvm-svn: 276962	2016-07-28 06:47:56 +00:00
Tobias Grosser	44143bb927	GPGPU: use current 'Index' to find slot in parameter array Before this change we used the array index, which would result in us accessing the parameter array out-of-bounds. This bug was visible for test cases where not all arrays in a scop are passed to a given kernel. llvm-svn: 276961	2016-07-28 06:47:53 +00:00
Tobias Grosser	4e18d71c71	GPGPU: Generate kernel parameter allocation with right size Before this change we miscounted the number of function parameters. llvm-svn: 276960	2016-07-28 06:47:50 +00:00

1 2 3 4 5 ...

624 Commits