llvm-project

Commit Graph

Author	SHA1	Message	Date
Siddharth Bhat	a1b2086a33	[Invariant Loads] Do not consider invariant loads to have dependences. We need to relax constraints on invariant loads so that they do not create fake RAW dependences. So, we do not consider invariant loads as scalar dependences in a region. During these changes, it turned out that we do not consider `llvm::Value` replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`. The replacements dictated by `ValueMap` were not being followed in all places. This was fixed in this commit. There is no clean way to decouple this change because this bug only seems to arise when the relaxed version of invariant load hoisting was enabled. Differential Revision: https://reviews.llvm.org/D35120 llvm-svn: 307907	2017-07-13 12:18:56 +00:00
Singapuram Sanjay Srivallabh	1abd9ffa37	[PPCGCodeGen] Differentiate kernels based on their parent Scop Summary: Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops. Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well. Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ \| kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately. Reviewers: grosser, bollu Reviewed By: grosser Subscribers: mehdi_amini, nemanjai, pollydev, kbarton Tags: #polly Differential Revision: https://reviews.llvm.org/D35176 llvm-svn: 307814	2017-07-12 16:46:19 +00:00
Siddharth Bhat	761e5b9310	[Polly] [PPCGCodeGeneration] Teach `must_kills` to kill scalars that are local to the scop. - By definition, we can pass something as a `kill` to PPCG if we know that no data can flow across a kill. - This is useful for more complex examples where we have scalars that are local to a scop. - If the local is only used within a scop, we are free to kill it. Differential Revision: https://reviews.llvm.org/D35045 llvm-svn: 307260	2017-07-06 13:42:42 +00:00
Singapuram Sanjay Srivallabh	79f13b9a80	Prefix the name of the calling host function in the name of callee GPU kernel Summary: Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`. Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC. Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton! Reviewed By: grosser Subscribers: nemanjai, pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D33985 llvm-svn: 307173	2017-07-05 16:48:21 +00:00
Siddharth Bhat	a82f2d264a	[PPCGCodeGeneration] Teach Polly to start using live range reordering. Polly did not use PPCG's live range reordering feature. Teach PPCGCodeGeneration to use this. Documentation on this is sparse, so much of the code is conservative. We currently kill all phi nodes in a Scop by appending them to the must_kill map we pass to PPCG. I do not have a proof of correctness, but it seems to be intuitively correct. We also do not handle `array_order`, which, quoting PPCG, is: PPCG/gpu.h: "Order dependences on non-scalars." It seems to consist of RAW dependences between arrays. We need to pass this information for more complex privatization cases. Differential Revision: https://reviews.llvm.org/D34941 llvm-svn: 307163	2017-07-05 14:57:04 +00:00
Singapuram Sanjay Srivallabh	02ca346e48	Introduce a hybrid target to generate code for either the GPU or CPU Summary: Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU. When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration. In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes. Reviewers: grosser, Meinersbur, bollu Reviewed By: grosser Subscribers: kbarton, nemanjai, pollydev Tags: #polly Differential Revision: https://reviews.llvm.org/D34054 llvm-svn: 306863	2017-06-30 19:42:21 +00:00
Siddharth Bhat	65d7f72f2c	[PPCGCodeGeneration] Add flag to allow polly to fail in GPU kernel fails. - This is useful for debugging GPU code. llvm-svn: 306290	2017-06-26 14:56:56 +00:00
Siddharth Bhat	f291c8d510	[PPCGCodeGeneration] Allow intrinsics within kernels. - In D33414, if any function call was found within a kernel, we would bail out. - This is an over-approximation. This patch changes this by allowing the `llvm.sqrt.*` family of intrinsics. - This introduces an additional step when creating a separate llvm::Module for a kernel (GPUModule). We now copy function declarations from the original module to new module. - We also populate IslNodeBuilder::ValueMap so it replaces the function references to the old module to the ones in the new module (GPUModule). Differential Revision: https://reviews.llvm.org/D34145 llvm-svn: 306284	2017-06-26 13:12:06 +00:00
Andreas Simbuerger	256070d85c	[NFC] Return both polly.start and polly.exiting from executeScopConditionally. This commit returns both the start and the exit block that are created by executeScopConditionally. In a future commit we will make use of the exit block. Before we would have to use the implicit property that there won't be any code generated between polly.start and polly.exiting at the time of use to find the correct block ('polly.exiting'). All usage location are semantically unchanged. llvm-svn: 306283	2017-06-26 12:17:11 +00:00
Siddharth Bhat	a12f807f33	[PPCGCodeGeneration] Enable GPU code generation with invariant loads. The condition that disallowed code generation in PPCGCodeGeneration with invariant loads is not required. I haven't been able to construct a counterexample where this generates invalid code. Differential Revision: https://reviews.llvm.org/D34604 llvm-svn: 306245	2017-06-25 14:48:24 +00:00
Siddharth Bhat	bccaea57c0	[Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers. In `PPCGCodeGeneration`, we try to take the references of every `Value` that is used within a Scop to offload to the kernel. This occurs in `GPUNodeBuilder::createLaunchParameters`. This breaks if one of the values is a function pointer, since one of these cases will trigger: 1. We try to to take the references of an intrinsic function, and this breaks at `verifyModule`, since it is illegal to take the reference of an intrinsic. 2. We manage to take the reference to a function, but this fails at `verifyModule` since the function will not be present in the module that is created in the kernel. 3. Even if `verifyModule` succeeds (which should not occur), we would then try to call a host function from the device, which is illegal runtime behaviour. So, we disable this entire range of possibilities by simply not allowing function references within a `Scop` which corresponds to a kernel. However, note that this is too conservative. We can allow intrinsics within kernels if the backend can lower the intrinsic correctly. For example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX` backend. We will now gradually whitelist intrinsics which are known to be safe. Differential Revision: https://reviews.llvm.org/D33414 llvm-svn: 305185	2017-06-12 11:41:09 +00:00
Michael Kruse	a6d48f59a1	Fix a lot of typos. NFC. llvm-svn: 304974	2017-06-08 12:06:15 +00:00
Philip Pfaffe	2b852e2e42	[Polly][NewPM] Port IslAst to the new ScopPassManager Summary: This patch ports IslAst to the new PM. The change is mostly straightforward. The only major modification required is making IslAst move-only, to correctly manage the isl resources it owns. Reviewers: grosser, Meinersbur Reviewed By: grosser Subscribers: nemanjai, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D33422 llvm-svn: 303622	2017-05-23 10:12:56 +00:00
Siddharth Bhat	b7f68b8c9e	[Fortran Support] Materialize outermost dimension for Fortran array. - We use the outermost dimension of arrays since we need this information to generate GPU transfers. - In general, if we do not know the outermost dimension of the array (because the indexing expression is non-affine, for example) then we simply cannot generate transfer code. - However, for Fortran arrays, we can use the Fortran array representation which stores the dimensions of all arrays. - This patch uses the Fortran array representation to generate code that computes the outermost dimension size. Differential Revision: https://reviews.llvm.org/D32967 llvm-svn: 303429	2017-05-19 15:07:45 +00:00
Philip Pfaffe	5cc87e3ab3	[Polly][NewPM] Port ScopDetection to the new PassManager Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture. This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now. Reviewers: Meinersbur, grosser Reviewed By: grosser Subscribers: pollydev, sanjoy, nemanjai, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D31459 llvm-svn: 302902	2017-05-12 14:37:29 +00:00
Siddharth Bhat	a90be207c6	[Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases). Reviewers: grosser, Meinersbur, bollu Reviewed By: bollu Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D32961 llvm-svn: 302515	2017-05-09 10:45:52 +00:00
Siddharth Bhat	17f01968f1	[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen Summary: When compiling for GPU, one can now choose to compile for OpenCL or CUDA, with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library for that purpose, correctly choosing the corresponding library calls to the option chosen when compiling (via different initialization calls). Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far). Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay Reviewed By: grosser, Meinersbur Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32431 llvm-svn: 302379	2017-05-07 21:03:46 +00:00
Siddharth Bhat	c1267b9baa	Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen" This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933. Patches should have been submitted in the order of: 1. D32852 2. D32854 3. D32431 I mistakenly pushed D32431(3) first. Reverting to push in the correct order. llvm-svn: 302217	2017-05-05 09:02:08 +00:00
Siddharth Bhat	51904ae35a	[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen Summary: When compiling for GPU, one can now choose to compile for OpenCL or CUDA, with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library for that purpose, correctly choosing the corresponding library calls to the option chosen when compiling (via different initialization calls). Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far). Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay Reviewed By: grosser, Meinersbur Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32431 llvm-svn: 302215	2017-05-05 07:54:49 +00:00
Siddharth Bhat	abed49699b	[Polly] [PPCGCodeGeneration] Add managed memory support to GPU code generation. This needs changes to GPURuntime to expose synchronization between host and device. 1. Needs better function naming, I want a better name than "getOrCreateManagedDeviceArray" 2. DeviceAllocations is used by both the managed memory and the non-managed memory path. This exploits the fact that the two code paths are never run together. I'm not sure if this is the best design decision Reviewed by: PhilippSchaad Tags: #polly Differential Revision: https://reviews.llvm.org/D32215 llvm-svn: 301640	2017-04-28 11:16:30 +00:00
Siddharth Bhat	d277feda91	[PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility Added a small change to the way pointer arguments are set in the kernel code generation. The way the pointer is retrieved now, specifically requests global address space to be annotated. This is necessary, if the IR should be run through NVPTX to generate OpenCL compatible PTX. The changes do not affect the PTX Strings generated for the CUDA target (nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl). Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends. Contributed-by: Philipp Schaad Reviewers: Meinersbur, grosser, bollu Reviewed By: grosser, bollu Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia Tags: #polly Differential Revision: https://reviews.llvm.org/D32215 llvm-svn: 301299	2017-04-25 08:08:29 +00:00
Tobias Grosser	7b5a4dfd46	Exploit BasicBlock::getModule to shorten code Suggested-by: Roman Gareev <gareevroman@gmail.com> llvm-svn: 299914	2017-04-11 04:59:13 +00:00
Tobias Grosser	67726b3260	SAdjust to recent change in constructor definition of AllocaInst llvm-svn: 299913	2017-04-11 04:23:38 +00:00
Philip Pfaffe	2d950f36ee	[Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers Summary: A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly. This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however. Reviewers: grosser, Meinersbur Reviewed By: grosser Subscribers: nemanjai, pollydev, llvm-commits Tags: #polly Differential Revision: https://reviews.llvm.org/D31653 llvm-svn: 299423	2017-04-04 10:01:53 +00:00
Tobias Grosser	de244eb450	Possible error in doc comment If a SCoP is most probably sequential, then it's better to run it on a CPU. Hence, there's no point in running it on a GPU. Reviewers: grosser Subscribers: nemanjai Tags: #polly Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com> Differential Revision: https://reviews.llvm.org/D30864 llvm-svn: 297578	2017-03-12 08:19:01 +00:00
Tobias Grosser	24222c7357	Fix namespaces after clang-format update llvm-svn: 296635	2017-03-01 15:54:27 +00:00
Michael Kruse	52ab4943b4	Remove all references to PostDominators. NFC. Marking a pass as preserved is necessary if any Polly pass uses it, even if it is not preserved within the generated code. Not marking it would cause the the Polly pass chain to be interrupted. It is not used by any Polly pass anymore, hence we can remove all references to it. llvm-svn: 295983	2017-02-23 15:16:22 +00:00
Tobias Grosser	ff40087a6a	Update to recent formatting changes llvm-svn: 293756	2017-02-01 10:12:09 +00:00
Tobias Grosser	587f1f57ad	[Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap Instead of keeping two separate maps from Value to Allocas, one for MemoryType::Value and the other for MemoryType::PHI, we introduce a single map from ScopArrayInfo to the corresponding Alloca. This change is intended, both as a general simplification and cleanup, but also to reduce our use of MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure we have only a single place where the array (and its base pointer) for which we generate code for is specified, which means we can more easily introduce new access functions that use a different ScopArrayInfo as base. We already today experiment with modifiable access functions, so this change does not address a specific bug, but it just reduces the scope one needs to reason about. Another motivation for this patch is https://reviews.llvm.org/D28518, where memory accesses with different base pointers could possibly be mapped to a single ScopArrayInfo object. Such a mapping is currently not possible, as we currently generate alloca instructions according to the base addresses of the memory accesses, not according to the ScopArrayInfo object they belong to. By making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo object will automatically mean that the same stack slot is used for these arrays. For D28518 this is not a problem, as only MemoryType::Array objects are mapping, but resolving this inconsistency will hopefully avoid confusion. llvm-svn: 293374	2017-01-28 07:42:10 +00:00
Tobias Grosser	4d5a917287	Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo To benefit of the type safety guarantees of C++11 typed enums, which would have caught the type mismatch fixed in r291960, we make MemoryKind a typed enum. This change also allows us to drop the 'MK_' prefix and to instead use the more descriptive full name of the enum as prefix. To reduce the amount of typing needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a global scope, which means the ScopArrayInfo:: prefix is not needed. This move also makes historically sense. In the beginning of Polly we had different MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later canonicalized to one. During this canonicalization we just choose the enum in ScopArrayInfo, but did not consider to move this shared enum to global scope. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D28090 llvm-svn: 292030	2017-01-14 20:25:44 +00:00
Tobias Grosser	e29db2173b	Update to recent clang-format changes llvm-svn: 291810	2017-01-12 21:05:19 +00:00
Tobias Grosser	df8f35b7b8	Update for clang-format change in r288119 llvm-svn: 288134	2016-11-29 12:52:08 +00:00
Eli Friedman	acf8006471	[Polly CodeGen] Break critical edge from RTC to original loop. This makes polly generate a CFG which is closer to what we want in LLVM IR, with a loop preheader for the original loop. This is just a cleanup, but it exposes some fragile assumptions. I'm not completely happy with the changes related to expandCodeFor; RTCBB->getTerminator() is basically a random insertion point which happens to work due to the way we generate runtime checks. I'm not sure what the right answer looks like, though. Differential Revision: https://reviews.llvm.org/D26053 llvm-svn: 285864	2016-11-02 22:32:23 +00:00
Tobias Grosser	bc653f2031	GPGPU: Do not run mostly sequential kernels in GPU In case sequential kernels are found deeper in the loop tree than any parallel kernel, the overall scop is probably mostly sequential. Hence, run it on the CPU. llvm-svn: 281849	2016-09-18 08:31:09 +00:00
Tobias Grosser	82f2af3508	GPGPU: Dynamically ensure 'sufficient compute' Offloading to a GPU is only beneficial if there is a sufficient amount of compute that can be accelerated. Many kernels just have a very small number of dynamic compute, which means GPU acceleration is not beneficial. We compute at run-time an approximation of how many dynamic instructions will be executed and fall back to CPU code in case this number is not sufficiently large. To keep the run-time checking code simple, we over-approximate the number of instructions executed in each statement by computing the volume of the rectangular hull of its iteration space. llvm-svn: 281848	2016-09-18 06:50:35 +00:00
Tobias Grosser	51dfc27589	GPGPU: Store back non-read-only scalars We may generate GPU kernels that store into scalars in case we run some sequential code on the GPU because the remaining data is expected to already be on the GPU. For these kernels it is important to not keep the scalar values in thread-local registers, but to store them back to the corresponding device memory objects that backs them up. We currently only store scalars back at the end of a kernel. This is only correct if precisely one thread is executed. In case more than one thread may be run, we currently invalidate the scop. To support such cases correctly, we would need to always load and store back from a corresponding global memory slot instead of a thread-local alloca slot. llvm-svn: 281838	2016-09-17 19:22:31 +00:00
Tobias Grosser	fe74a7a1f5	GPGPU: Detect read-only scalar arrays ... and pass these by value rather than by reference. llvm-svn: 281837	2016-09-17 19:22:18 +00:00
Tobias Grosser	aaabbbf886	GPGPU: Do not assume arrays start at 0 Our alias checks precisely check that the minimal and maximal accessed elements do not overlap in a kernel. Hence, we must ensure that our host <-> device transfers do not touch additional memory locations that are not covered in the alias check. To ensure this, we make sure that the data we copy for a given array is only the data from the smallest element accessed to the largest element accessed. We also adjust the size of the array according to the offset at which the array is actually accessed. An interesting result of this is: In case array are accessed with negative subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to cover the full array. This is important as such code indeed exists in the wild. llvm-svn: 281611	2016-09-15 14:05:58 +00:00
Tobias Grosser	0a893f7df4	GPGPU: Use const_cast to avoid compiler warning [NFC] llvm-svn: 281333	2016-09-13 13:22:27 +00:00
Tobias Grosser	a82c4b5df8	GPGPU: Allow region statements llvm-svn: 281305	2016-09-13 08:42:10 +00:00
Tobias Grosser	b79f4d3970	GPGPU: Extend types when array sizes have smaller types This prevents a compiler crash. llvm-svn: 281303	2016-09-13 08:02:14 +00:00
Roman Gareev	f5aff70405	Store the size of the outermost dimension in case of newly created arrays that require memory allocation. We do not need the size of the outermost dimension in most cases, but if we allocate memory for newly created arrays, that size is needed. Reviewed-by: Michael Kruse <llvm@meinersbur.de> Differential Revision: https://reviews.llvm.org/D23991 llvm-svn: 281234	2016-09-12 17:08:31 +00:00
Tobias Grosser	5857b701a3	GPGPU: Bail out gracefully in case of invalid IR Instead of aborting, we now bail out gracefully in case the kernel IR we generate is invalid. This can currently happen in case the SCoP stores pointer values, which we model as arrays, as data values into other arrays. In this case, the original pointer value is not available on the device and can consequently not be stored. As detecting this ahead of time is not so easy, we detect these situations after the invalid IR has been generated and bail out. llvm-svn: 281193	2016-09-12 06:06:31 +00:00
Tobias Grosser	02293ed755	GPGPU: Do not fail in case of arrays never accessed If these arrays have never been accessed we failed to derive an upper bound of the accesses and consequently a size for the outermost dimension. We now explicitly check for empty access sets and then just use zero as size for the outermost dimension. llvm-svn: 281165	2016-09-11 13:30:12 +00:00
Tobias Grosser	d58acf866a	[GPGPU] Ensure arrays where only parts are modified are copied to GPU To do so we change the way array exents are computed. Instead of the precise set of memory locations accessed, we now compute the extent as the range between minimal and maximal address in the first dimension and the full extent defined by the sizes of the inner array dimensions. We also move the computation of the may_persist region after the construction of the arrays, as it relies on array information. Without arrays being constructed no useful information is computed at all. llvm-svn: 278212	2016-08-10 10:58:19 +00:00
Tobias Grosser	b06ff4574e	[GPGPU] Support PHI nodes used in GPU kernel Ensure the right scalar allocations are used as the host location of data transfers. For the device code, we clear the allocation cache before device code generation to be able to generate new device-specific allocation and we need to make sure to add back the old host allocations as soon as the device code generation is finished. llvm-svn: 278126	2016-08-09 15:35:06 +00:00
Tobias Grosser	750160e260	[GPGPU] Use separate basic block for GPU initialization code This increases the readability of the IR and also clarifies that the GPU inititialization is executed _after_ the scalar initialization which needs to before the code of the transformed scop is executed. Besides increased readability, the IR should not change. Specifically, I do not expect any changes in program semantics due to this patch. llvm-svn: 278125	2016-08-09 15:35:03 +00:00
Tobias Grosser	cf66ef26f3	[GPGPU] Pass parameters always by using their own type llvm-svn: 278100	2016-08-09 07:22:08 +00:00
Tobias Grosser	124534038a	[GPGPU] Support Values referenced from both isl expr and llvm instructions When adding code that avoids to pass values used in isl expressions and LLVM instructions twice, we forgot to make single variable passed to the kernel available in the ValueMap that makes it usable for instructions that are not replaced with isl ast expressions. This change adds the variable that is passed to the kernel to the ValueMap to ensure it is available for such use cases as well. llvm-svn: 278039	2016-08-08 19:22:19 +00:00
Tobias Grosser	cb1aef8de4	[GPGPU] Create code to verify run-time conditions llvm-svn: 278026	2016-08-08 17:35:55 +00:00
Tobias Grosser	928d7573dd	GPGPU: Sort dimension sizes of multi-dimensional shared memory arrays correctly Before this commit we generated the array type in reverse order and we also added the outermost dimension size to the new array declaration, which is incorrect as Polly additionally assumed an additional unsized outermost dimension, such that we had an off-by-one error in the linearization of access expressions. llvm-svn: 277802	2016-08-05 08:27:24 +00:00
Tobias Grosser	c1c6a2a61b	GPGPU: Add cuda annotations to specify maximal number of threads per block These annotations ensure that the NVIDIA PTX assembler limits the number of registers used such that we can be certain the resulting kernel can be executed for the number of threads in a thread block that we are planning to use. llvm-svn: 277799	2016-08-05 06:47:43 +00:00
Tobias Grosser	f919d8b360	GPGPU: Support scalars that are mapped to shared memory llvm-svn: 277726	2016-08-04 13:57:29 +00:00
Tobias Grosser	8950cead7f	GPGPU: Disable verbose debug output llvm-svn: 277724	2016-08-04 12:44:03 +00:00
Tobias Grosser	b0dd95bcd2	Remove leftover debug output llvm-svn: 277723	2016-08-04 12:41:28 +00:00
Tobias Grosser	130ca30f92	GPGPU: Add private memory support llvm-svn: 277722	2016-08-04 12:39:03 +00:00
Tobias Grosser	b513b4916b	GPGPU: Add support for shared memory llvm-svn: 277721	2016-08-04 12:18:14 +00:00
Tobias Grosser	00bb5a99f5	GPGPU: Handle scalar array references Pass the content of scalar array references to the alloca on the kernel side and do not pass them additional as normal LLVM scalar value. llvm-svn: 277699	2016-08-04 06:55:59 +00:00
Tobias Grosser	576932728d	GPGPU: Pass subtree values correctly to the kernel llvm-svn: 277697	2016-08-04 06:55:49 +00:00
Tobias Grosser	629109b633	GPGPU: Mark kernel functions as polly.skip Otherwise, we would try to re-optimize them with Polly-ACC and possibly even generate kernels that try to offload themselves, which does not work as the GPURuntime is not available on the accelerator and also does not make any sense. llvm-svn: 277589	2016-08-03 12:00:07 +00:00
Roman Gareev	d7754a1245	Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions Extend the jscop interface to allow the user to export arrays. It is required that already existing arrays of the list of arrays correspond to arrays of the SCoP. Each array that is appended to the list will be newly created. Furthermore, we allow the user to modify access expressions to reference any array in case it has the same element type. Reviewed-by: Tobias Grosser <tobias@grosser.es> Differential Revision: https://reviews.llvm.org/D22828 llvm-svn: 277263	2016-07-30 09:25:51 +00:00
Tobias Grosser	d8b94bcac1	GPGPU: Pass context parameters to GPU kernel llvm-svn: 276963	2016-07-28 06:47:59 +00:00
Tobias Grosser	a490147c90	GPGPU: Pass host iterators to kernel llvm-svn: 276962	2016-07-28 06:47:56 +00:00
Tobias Grosser	44143bb927	GPGPU: use current 'Index' to find slot in parameter array Before this change we used the array index, which would result in us accessing the parameter array out-of-bounds. This bug was visible for test cases where not all arrays in a scop are passed to a given kernel. llvm-svn: 276961	2016-07-28 06:47:53 +00:00
Tobias Grosser	4e18d71c71	GPGPU: Generate kernel parameter allocation with right size Before this change we miscounted the number of function parameters. llvm-svn: 276960	2016-07-28 06:47:50 +00:00
Tobias Grosser	79a947c233	GPGPU: Add basic support for kernel launches llvm-svn: 276863	2016-07-27 13:20:16 +00:00
Tobias Grosser	5779359624	GPGPU: Load GPU kernels We embed the PTX code into the host IR as a global variable and compile it at run-time into a GPU kernel. llvm-svn: 276645	2016-07-25 16:31:21 +00:00
Tobias Grosser	13c78e4d51	GPGPU: Emit data-transfer code Also factor out getArraySize() to avoid code dupliciation and reorder some function arguments to indicate the direction into which data is transferred. llvm-svn: 276636	2016-07-25 12:47:39 +00:00
Tobias Grosser	7287aeddf1	GPGPU: Complete code to allocate and free device arrays At the beginning of each SCoP, we allocate device arrays for all arrays used on the GPU and we free such arrays after the SCoP has been executed. llvm-svn: 276635	2016-07-25 12:47:33 +00:00
Tobias Grosser	fa7b080218	GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface. There is no need to expose the selected device at the moment. We also pass back pointers as return values, as this simplifies the interface. llvm-svn: 276623	2016-07-25 09:16:01 +00:00
Tobias Grosser	8ed5e5999f	IslNodeBuilder: Make finalize() virtual This allows the finalization routine of the IslNodeBuilder to be overwritten by derived classes. Being here, we also drop the unnecessary 'Scop' postfix and the unnecessary 'Scop' parameter. llvm-svn: 276622	2016-07-25 09:15:57 +00:00
Tobias Grosser	9a18d55947	GPGPU: Optimize kernel IR before generating assembly code We optimize the kernel _after_ dumping the IR we generate to make the IR we dump easier readable and independent of possible changes in the general purpose LLVM optimizers. llvm-svn: 276551	2016-07-24 06:43:21 +00:00
Tobias Grosser	e1a98343a1	GPGPU: Verify kernel IR before generating assembly llvm-svn: 276550	2016-07-24 06:43:17 +00:00
Tobias Grosser	74dc3cb431	GPGPU: Generate PTX assembly code for the kernel modules Run the NVPTX backend over the GPUModule IR and write the resulting assembly code in a string. To work correctly, it is important to invalidate analysis results that still reference the IR in the kernel module. Hence, this change clears all references to dominators, loop info, and scalar evolution. Finally, the NVPTX backend has troubles to generate code for various special floating point types (not surprising), but also for uncommon integer types. This commit does not resolve these issues, but pulls out problematic test cases into separate files to XFAIL them individually and resolve them in future (not immediate) changes one by one. llvm-svn: 276396	2016-07-22 07:11:12 +00:00
Tobias Grosser	edb885cb12	GPGPU: generate code for ScopStatements This change introduces the actual compute code in the GPU kernels. To ensure all values referenced from the statements in the GPU kernel are indeed available we scan all ScopStmts in the GPU kernel for references to llvm::Values that are not yet covered by already modeled outer loop iterators, parameters, or array base pointers and also pass these additional llvm::Values to the GPU kernel. For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which is referenced by the newly generated access functions within the GPU kernel and which is used to help with code generation. llvm-svn: 276270	2016-07-21 13:15:59 +00:00
Tobias Grosser	2d58a64e7f	GPGPU: Bail out of scops with hoisted invariant loads This is currently not supported and will only be added later. Also update the test cases to ensure no invariant code hoisting is applied. llvm-svn: 275987	2016-07-19 15:56:25 +00:00
Tobias Grosser	5260c041ea	GPGPU: Emit in-kernel synchronization statements We use this opportunity to further classify the different user statements that can arise and add TODOs for the ones not yet implemented. llvm-svn: 275957	2016-07-19 07:33:16 +00:00
Tobias Grosser	59ab070523	GPGPU: generate control flow within the kernel llvm-svn: 275956	2016-07-19 07:33:11 +00:00
Tobias Grosser	c84a1995fe	GPGPU: add scop parameters to kernel arguments llvm-svn: 275955	2016-07-19 07:33:06 +00:00
Tobias Grosser	f6044bd0ef	GPGPU: add host iterators to kernel arguments llvm-svn: 275954	2016-07-19 07:32:55 +00:00
Tobias Grosser	472f9654c8	GPGPU: add intrinsic functions to obtain a kernels thread and block ids llvm-svn: 275953	2016-07-19 07:32:44 +00:00
Tobias Grosser	32837fe313	GPGPU: create kernel function skeleton Create for each kernel a separate LLVM-IR module containing a single function marked as kernel function and taking one pointer for each array referenced by this kernel. Add debugging output to verify the kernels are generated correctly. llvm-svn: 275952	2016-07-19 07:32:38 +00:00
Tobias Grosser	b9fc860a57	GPGPU: collect array references Initialize the list of references to a GPU array to ensure that the arrays that need to be passed to kernel calls are computed correctly. Furthermore, the very same information is also necessary to compute synchronization correctly. As the functionality to compute these references is already available, what is left for us to do is only to connect the necessary functionality to compute array reference information. llvm-svn: 275798	2016-07-18 15:44:32 +00:00
Tobias Grosser	1fb9b64dc0	GPGPU: Pull implementation out of class definition This will allow us to see the full class definition even after we add non-trivial implementations of the different member functions. llvm-svn: 275797	2016-07-18 15:44:25 +00:00
Tobias Grosser	38fc0aed08	GPGPU: Create host control flow Create LLVM-IR for all host-side control flow of a given GPU AST. We implement this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder. The IslNodeBuilder will take care of generating all general-purpose ast nodes, but we provide our own createUser implementation to handle the different GPU specific user statements. For now, we just skip any user statement and only generate a host-code sceleton, but in subsequent commits we will add handling of normal ScopStmt's performing computations, kernel calls, as well as host-device data transfers. We will also introduce run-time check generation and LICM in subsequent commits. llvm-svn: 275783	2016-07-18 11:56:39 +00:00
Tobias Grosser	2025173494	GPGPU: Format statements scheduled on the host ourselves Otherwise ppcg would try to call into pet functionality that this not available, which obviously will cause trouble. As we can easily print these statements ourselves, we just do so. llvm-svn: 275579	2016-07-15 17:12:41 +00:00
Tobias Grosser	2341fe9e76	GPGPU: Use schedule whole components for scheduler This option increases the scalability of the scheduler and allows us to remove the 'gisting' workaround we introduced in r275565 to handle a more complicated test case. Another benefit of using this option is also that the generated code looks a lot more streamlined. Thanks to Sven Verdoolaege for reminding me of this option. llvm-svn: 275573	2016-07-15 16:15:47 +00:00
Tobias Grosser	e4725437e8	GPGPU: Drop domain constraints from flow dependences This works around a shortcoming of the isl scheduler, which even for some smaller test cases does not terminate in case domain constraints are part of the flow dependences. llvm-svn: 275565	2016-07-15 14:43:04 +00:00
Tobias Grosser	6293ba6973	GPGPU: Add memory reference tag ids to tagged accesses It seems we forgot to actually add the memory access ids to the tagged accesses, but instead just tagged the accesses with empty isl_ids. This issue was found by inspection and without code generation it is difficult to test just by itself. We fix it for now without test case and expect our code generation tests to cover this later on. llvm-svn: 275557	2016-07-15 12:44:27 +00:00
Tobias Grosser	2d010daf85	GPGPU: Make sure scops with more than one array work We use this opportunity to add a test case containing a scalar parameter. llvm-svn: 275547	2016-07-15 10:51:14 +00:00
Tobias Grosser	b307ed4d08	GPGPU: Free options to avoid memory leak ppcg does not free the option structs for us. To avoid a memory leak we do this ourselves. llvm-svn: 275546	2016-07-15 10:32:22 +00:00
Tobias Grosser	a56f8f8e58	GPGPU: Shorten ppcg include paths to avoid conflict with cuda.h Instead of directly linking to ppcg's main source directory, we link to the parent director. This allows us to access ppcg's include files with 'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header. Also drop an include directory that is currently not used. llvm-svn: 275536	2016-07-15 07:50:36 +00:00
Tobias Grosser	60f63b49f2	GPGPU: Model array access information This allows us to derive host-device and device-host data-transfers. llvm-svn: 275535	2016-07-15 07:05:54 +00:00
Tobias Grosser	69b4675180	GPGPU: Generate an AST for the GPU-mapped schedule For this we need to provide an explicit list of statements as they occur in the polly::Scop to ppcg. We also setup basic AST printing facilities to facilitate debugging. To allow code reuse some (minor) changes in ppcg are have been necessary. llvm-svn: 275436	2016-07-14 15:51:37 +00:00
Tobias Grosser	60c6002570	GPGPU: Add dummy implementation for ast expression construction Instead of calling to a pet function that does not return anything, we pass our own dummy implementation to ppcg that always returns a nullptr. This ensures that the list of ast expressions always contains a nullptr and we do not accidentally free a random (uninitalized) pointer. This resolves the last valgrind warning we see. We provide an implementation for this function, when the generated AST expressions can be used and consequently can be tested. llvm-svn: 275435	2016-07-14 15:51:32 +00:00
Tobias Grosser	4eaedde530	GPGPU: Use a tile size of 32 by default The tile size was previously uninitialized. As a result, it was often zero (aka. no tiling), which is not what we want in general. More importantly, there was the risk for arbitrary tile sizes to be choosen, which we did not observe, but which still is highly problematic. llvm-svn: 275418	2016-07-14 14:14:02 +00:00
Tobias Grosser	bd81a7eebc	Fix formatting llvm-svn: 275397	2016-07-14 10:53:00 +00:00
Tobias Grosser	aef5196f75	GPGPU: Map initial schedule to GPU schedule This change now applies ppcg's GPU mapping on our initial schedule. For this to work, we need to also initialize the set of all names (isl_ids) used in the scop as well as the program context. llvm-svn: 275396	2016-07-14 10:51:52 +00:00
Tobias Grosser	681bd5688f	GPGPU: Do not dump schedule by default llvm-svn: 275395	2016-07-14 10:51:47 +00:00
Tobias Grosser	f384594d5e	GPGPU: compute new schedule from polly scop To do so we copy the necessary information to compute an initial schedule from polly::Scop to ppcg's scop. Most of the necessary information is directly available and only needs to be passed on to ppcg, with the exception of 'tagged' access relations, access relations that additionally carry information about which memory access an access relation originates from. We could possibly perform the construction of tagged accesses as part of ScopInfo, but as this format is currently specific to ppcg we do not do this yet, but keep this functionality local to our GPU code generation. After the scop has been initialized, we compute data dependences and ask ppcg to compute an initial schedule. Some of this functionality is already available in polly::DependenceInfo and polly::ScheduleOptimizer, but to keep differences to ppcg small we use ppcg's functionality here. We may later investiage if a closer integration of these tools makes sense. llvm-svn: 275390	2016-07-14 10:22:25 +00:00
Tobias Grosser	e938517e37	GPGPU: create default initialized PPCG scop and gpu program At this stage, we do not yet modify the IR but just generate a default initialized ppcg_scop and gpu_prog and free both immediately. Both will later be filled with data from the polly::Scop and are needed to use PPCG for GPU schedule generation. This commit does not yet perform any GPU code generation, but ensures that the basic infrastructure has been put in place. We also add a simple test case to ensure the new code is run and use this opportunity to verify that GPU_CODEGEN tests are only run if GPU code generation has been enabled in cmake. llvm-svn: 275389	2016-07-14 10:22:19 +00:00
Tobias Grosser	9dfe4e7c05	Add accelerator code generation pass skeleton Add a new pass to serve as basis for automatic accelerator mapping in Polly. The pass structure and the analyses preserved are copied from CodeGeneration.cpp, as we will rely on IslNodeBuilder and IslExprBuilder for LLVM-IR code generation. Polly's accelerator code generation is enabled with -polly-target=gpu I would like to use this commit as opportunity to thank Yabin Hu for his work in the context of two Google summer of code projects during which he implemented initial prototypes of the Polly accelerator code generation -- in parts this code is already available in todays Polly (e.g., tools/GPURuntime). More will come as part of the upcoming Polly ACC changes. Reviewers: Meinersbur Subscribers: pollydev, llvm-commits Differential Revision: http://reviews.llvm.org/D22036 llvm-svn: 275275	2016-07-13 15:54:58 +00:00

1 2 3 4 5

202 Commits