Commit Graph

1839 Commits

Author SHA1 Message Date
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev d7754a1245 Extend the jscop interface to allow the user to declare new arrays and to reference these arrays from access expressions
Extend the jscop interface to allow the user to export arrays. It is required
that already existing arrays of the list of arrays correspond to arrays
of the SCoP. Each array that is appended to the list will be newly created.
Furthermore, we allow the user to modify access expressions to reference
any array in case it has the same element type.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D22828

llvm-svn: 277263
2016-07-30 09:25:51 +00:00
Tobias Grosser d8b94bcac1 GPGPU: Pass context parameters to GPU kernel
llvm-svn: 276963
2016-07-28 06:47:59 +00:00
Tobias Grosser a490147c90 GPGPU: Pass host iterators to kernel
llvm-svn: 276962
2016-07-28 06:47:56 +00:00
Tobias Grosser 44143bb927 GPGPU: use current 'Index' to find slot in parameter array
Before this change we used the array index, which would result in us accessing
the parameter array out-of-bounds. This bug was visible for test cases where not
all arrays in a scop are passed to a given kernel.

llvm-svn: 276961
2016-07-28 06:47:53 +00:00
Tobias Grosser 4e18d71c71 GPGPU: Generate kernel parameter allocation with right size
Before this change we miscounted the number of function parameters.

llvm-svn: 276960
2016-07-28 06:47:50 +00:00
Tobias Grosser 79a947c233 GPGPU: Add basic support for kernel launches
llvm-svn: 276863
2016-07-27 13:20:16 +00:00
Tobias Grosser 5779359624 GPGPU: Load GPU kernels
We embed the PTX code into the host IR as a global variable and compile it
at run-time into a GPU kernel.

llvm-svn: 276645
2016-07-25 16:31:21 +00:00
Johannes Doerfert 8031238017 [GSoC] Add PolyhedralInfo pass - new interface to polly analysis
Adding a new pass PolyhedralInfo. This pass will be the interface to Polly.
  Initially, we will provide the following interface:
    - #IsParallel(Loop *L) - return a bool depending on whether the loop is
                             parallel or not for the given program order.

Patch by Utpal Bora <cs14mtech11017@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D21486

llvm-svn: 276637
2016-07-25 12:48:45 +00:00
Tobias Grosser 13c78e4d51 GPGPU: Emit data-transfer code
Also factor out getArraySize() to avoid code duplication and reorder some
function arguments to indicate the direction in which data is transferred.

llvm-svn: 276636
2016-07-25 12:47:39 +00:00
Tobias Grosser 7287aeddf1 GPGPU: Complete code to allocate and free device arrays
At the beginning of each SCoP, we allocate device arrays for all arrays
used on the GPU and we free such arrays after the SCoP has been executed.

llvm-svn: 276635
2016-07-25 12:47:33 +00:00
Johannes Doerfert 3b7ac0a691 [GSoC] Do not process SCoPs with infeasible runtime context
Do not process SCoPs with infeasible runtime context in the new
  ScopInfoWrapperPass. Do not compute dependences for such SCoPs in the new
  DependenceInfoWrapperPass.

Patch by Utpal Bora <cs14mtech11017@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D22402

llvm-svn: 276631
2016-07-25 12:40:59 +00:00
Roman Gareev 3a18a931a8 Apply all necessary tilings and interchangings to get a macro-kernel
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21491

llvm-svn: 276627
2016-07-25 09:42:53 +00:00
Tobias Grosser fa7b080218 GPGPU: initialize GPU context and simplify the corresponding GPURuntime interface.
There is no need to expose the selected device at the moment. We also pass back
pointers as return values, as this simplifies the interface.

llvm-svn: 276623
2016-07-25 09:16:01 +00:00
Tobias Grosser 8ed5e5999f IslNodeBuilder: Make finalize() virtual
This allows the finalization routine of the IslNodeBuilder to be overridden
by derived classes. While here, we also drop the unnecessary 'Scop' suffix
and the unnecessary 'Scop' parameter.

llvm-svn: 276622
2016-07-25 09:15:57 +00:00
Roman Gareev 2cb4d133f5 [NFC] Refactor creation of the BLIS micro-kernel and improve documentation
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 276616
2016-07-25 07:27:59 +00:00
Tobias Grosser 9a18d55947 GPGPU: Optimize kernel IR before generating assembly code
We optimize the kernel _after_ dumping the IR we generate, so that the dumped
IR is easier to read and independent of possible changes in the
general-purpose LLVM optimizers.

llvm-svn: 276551
2016-07-24 06:43:21 +00:00
Tobias Grosser e1a98343a1 GPGPU: Verify kernel IR before generating assembly
llvm-svn: 276550
2016-07-24 06:43:17 +00:00
Michael Kruse 977d38bd87 Remove unused parameters from simplifySCoP(). NFC.
llvm-svn: 276444
2016-07-22 17:31:17 +00:00
Tobias Grosser 74dc3cb431 GPGPU: Generate PTX assembly code for the kernel modules
Run the NVPTX backend over the GPUModule IR and write the resulting assembly
code in a string.

To work correctly, it is important to invalidate analysis results that still
reference the IR in the kernel module. Hence, this change clears all references
to dominators, loop info, and scalar evolution.

Finally, the NVPTX backend has trouble generating code for various special
floating point types (not surprising), but also for uncommon integer types. This
commit does not resolve these issues, but pulls out problematic test cases into
separate files to XFAIL them individually and resolve them one by one in future
(not immediate) changes.

llvm-svn: 276396
2016-07-22 07:11:12 +00:00
Tobias Grosser edb885cb12 GPGPU: generate code for ScopStatements
This change introduces the actual compute code in the GPU kernels. To ensure
all values referenced from the statements in the GPU kernel are indeed available
we scan all ScopStmts in the GPU kernel for references to llvm::Values that
are not yet covered by already modeled outer loop iterators, parameters, or
array base pointers and also pass these additional llvm::Values to the
GPU kernel.

For arrays used in the GPU kernel we introduce a new ScopArrayInfo object, which
is referenced by the newly generated access functions within the GPU kernel and
which is used to help with code generation.

llvm-svn: 276270
2016-07-21 13:15:59 +00:00
Tobias Grosser 86083da0ec IslNodeBuilder: expose addReferencesFromStmt [NFC]
This will be used by Polly GPGPU to determine the values that need to be
passed to GPU kernels.

llvm-svn: 276269
2016-07-21 13:15:55 +00:00
Tobias Grosser 04b909fcca IslExprBuilder: allow specifying an external isl_id to ScopArrayInfo mapping
This is useful for external users using IslExprBuilder, in case they cannot
embed ScopArrayInfo data into their isl_ids, because the isl_ids either already
carry other information or the isl_ids have been created and their user pointers
cannot be updated any more.

llvm-svn: 276268
2016-07-21 13:15:51 +00:00
Tobias Grosser 9d12d8ade3 BlockGenerator: remove dead instructions in normal statements
This ensures that no trivially dead code is generated. This is not only cleaner,
but also avoids trouble in case code is generated in a separate function and
some of this dead code contains references to values that are not available.
This issue may happen if the memory access functions have been updated
and old getelementptr instructions remain in the code. With normal Polly,
a test case is difficult to draft, but the upcoming GPU code generation can
possibly trigger such problems. We will later extend this dead-code elimination
to region and vector statements.

llvm-svn: 276263
2016-07-21 11:48:36 +00:00
Tobias Grosser 9ea152714a JScop: Factor out importContext [NFC]
This makes the structure of the code clearer and reduces the size of runOnScop.

We also adjust the coding style to the latest LLVM style guide.

llvm-svn: 276246
2016-07-21 06:56:33 +00:00
Tobias Grosser dbe34f7c58 JScop: Factor out importContext [NFC]
This makes the structure of the code clearer and reduces the size of runOnScop.

We also adjust the coding style to the latest LLVM style guide.

llvm-svn: 276245
2016-07-21 06:56:31 +00:00
Tobias Grosser c602d3bc84 JScop: Factor out importSchedule [NFC]
This makes the structure of the code clearer and reduces the size of runOnScop.

We also adjust the coding style to the latest LLVM style guide.

llvm-svn: 276244
2016-07-21 06:56:28 +00:00
Tobias Grosser 9ec4f95234 Update isl to isl-0.17.1-191-g540b2fd
This update resolves a bug in computing lexicographic minima/maxima.

llvm-svn: 276138
2016-07-20 16:53:07 +00:00
Tobias Grosser f533571fd2 Update isl to isl-0.17.1-171-g233f589
This fixes an issue with equality detection that resulted in an assertion
being triggered during coalescing.

llvm-svn: 276094
2016-07-20 07:52:42 +00:00
Tobias Grosser 2d58a64e7f GPGPU: Bail out of scops with hoisted invariant loads
This is currently not supported and will only be added later. Also update the
test cases to ensure no invariant code hoisting is applied.

llvm-svn: 275987
2016-07-19 15:56:25 +00:00
Tobias Grosser 22117a8913 GPGPU: Disable invariant load hoisting for GPU code generation
This simplifies the upcoming patches to add code generation for ScopStmts. Load
hoisting support will later be added in a separate commit. This commit will
be implicitly tested by the subsequent GPGPU changes.

llvm-svn: 275969
2016-07-19 11:13:58 +00:00
Tobias Grosser 5260c041ea GPGPU: Emit in-kernel synchronization statements
We use this opportunity to further classify the different user statements that
can arise and add TODOs for the ones not yet implemented.

llvm-svn: 275957
2016-07-19 07:33:16 +00:00
Tobias Grosser 59ab070523 GPGPU: generate control flow within the kernel
llvm-svn: 275956
2016-07-19 07:33:11 +00:00
Tobias Grosser c84a1995fe GPGPU: add scop parameters to kernel arguments
llvm-svn: 275955
2016-07-19 07:33:06 +00:00
Tobias Grosser f6044bd0ef GPGPU: add host iterators to kernel arguments
llvm-svn: 275954
2016-07-19 07:32:55 +00:00
Tobias Grosser 472f9654c8 GPGPU: add intrinsic functions to obtain a kernel's thread and block ids
llvm-svn: 275953
2016-07-19 07:32:44 +00:00
Tobias Grosser 32837fe313 GPGPU: create kernel function skeleton
Create for each kernel a separate LLVM-IR module containing a single function
marked as kernel function and taking one pointer for each array referenced
by this kernel. Add debugging output to verify the kernels are generated
correctly.

llvm-svn: 275952
2016-07-19 07:32:38 +00:00
Tobias Grosser b9fc860a57 GPGPU: collect array references
Initialize the list of references to a GPU array to ensure that the arrays that
need to be passed to kernel calls are computed correctly.  Furthermore, the very
same information is also necessary to compute synchronization correctly. As the
functionality to compute these references is already available, all that is left
for us to do is to connect it so that array reference information is actually
computed.

llvm-svn: 275798
2016-07-18 15:44:32 +00:00
Tobias Grosser 1fb9b64dc0 GPGPU: Pull implementation out of class definition
This will allow us to see the full class definition even after we add
non-trivial implementations of the different member functions.

llvm-svn: 275797
2016-07-18 15:44:25 +00:00
Tobias Grosser 38fc0aed08 GPGPU: Create host control flow
Create LLVM-IR for all host-side control flow of a given GPU AST. We implement
this by introducing a new GPUNodeBuilder class derived from IslNodeBuilder.  The
IslNodeBuilder will take care of generating all general-purpose ast nodes, but
we provide our own createUser implementation to handle the different GPU
specific user statements. For now, we just skip any user statement and only
generate a host-code skeleton, but in subsequent commits we will add handling of
normal ScopStmts performing computations, kernel calls, as well as host-device
data transfers. We will also introduce run-time check generation and LICM in
subsequent commits.

llvm-svn: 275783
2016-07-18 11:56:39 +00:00
Tobias Grosser cda19c230c GPGPU: Abort if any dummy function is called
This ensures that accidental calls to these functions will break loudly instead
of corrupting the stack with invalid return values.

These functions have been introduced earlier as replacement of pet and parts of
ppcg which we will never use and consequently have not been imported or compiled
into Polly.

llvm-svn: 275680
2016-07-16 07:30:27 +00:00
Tobias Grosser 2025173494 GPGPU: Format statements scheduled on the host ourselves
Otherwise ppcg would try to call into pet functionality that is not available,
which obviously will cause trouble. As we can easily print these statements
ourselves, we just do so.

llvm-svn: 275579
2016-07-15 17:12:41 +00:00
Tobias Grosser 2341fe9e76 GPGPU: Use schedule whole components for scheduler
This option increases the scalability of the scheduler and allows us to remove
the 'gisting' workaround we introduced in r275565 to handle a more complicated
test case. Another benefit of using this option is also that the generated
code looks a lot more streamlined.

Thanks to Sven Verdoolaege for reminding me of this option.

llvm-svn: 275573
2016-07-15 16:15:47 +00:00
Tobias Grosser e4725437e8 GPGPU: Drop domain constraints from flow dependences
This works around a shortcoming of the isl scheduler, which even for some
smaller test cases does not terminate in case domain constraints are part
of the flow dependences.

llvm-svn: 275565
2016-07-15 14:43:04 +00:00
Tobias Grosser 6293ba6973 GPGPU: Add memory reference tag ids to tagged accesses
It seems we forgot to actually add the memory access ids to the tagged accesses,
but instead just tagged the accesses with empty isl_ids. This issue was found
by inspection and without code generation it is difficult to test just by
itself. We fix it for now without test case and expect our code generation
tests to cover this later on.

llvm-svn: 275557
2016-07-15 12:44:27 +00:00
Tobias Grosser cfa0361d35 GPGPU: Do not check for hidden declarations
We do not have them in Polly and the code to check for them is directly
referring to pet data structures which we do not have available.

This commit avoids undefined behavior. As such issues are difficult to
reproduce, this commit comes without a test case.

llvm-svn: 275553
2016-07-15 11:42:53 +00:00
Tobias Grosser 2d010daf85 GPGPU: Make sure scops with more than one array work
We use this opportunity to add a test case containing a scalar parameter.

llvm-svn: 275547
2016-07-15 10:51:14 +00:00
Tobias Grosser b307ed4d08 GPGPU: Free options to avoid memory leak
ppcg does not free the option structs for us. To avoid a memory leak we do this
ourselves.

llvm-svn: 275546
2016-07-15 10:32:22 +00:00
Tobias Grosser a56f8f8e58 GPGPU: Shorten ppcg include paths to avoid conflict with cuda.h
Instead of directly linking to ppcg's main source directory, we link to the
parent directory. This allows us to access ppcg's include files with
'ppcg/cuda.h' and avoids a conflict with NVIDIA's cuda.h header.

Also drop an include directory that is currently not used.

llvm-svn: 275536
2016-07-15 07:50:36 +00:00
Tobias Grosser 60f63b49f2 GPGPU: Model array access information
This allows us to derive host-device and device-host data-transfers.

llvm-svn: 275535
2016-07-15 07:05:54 +00:00