Commit Graph

1208 Commits

Author SHA1 Message Date
Tobias Grosser c1cfe0a828 Add missing REQUIRES line
llvm-svn: 309943
2017-08-03 14:46:53 +00:00
Tobias Grosser b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Michael Kruse 291fd8074e [test] Fix test case without Polly-ACC.
llvm-svn: 309938
2017-08-03 13:44:31 +00:00
Siddharth Bhat eadf76d34a [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Singapuram Sanjay Srivallabh 188053af5e Remove debug metadata from copied instruction to prevent GPUModule verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36161

llvm-svn: 309822
2017-08-02 15:20:07 +00:00
Michael Kruse bc88a78cb4 [Simplify] Rewrite redundant write detection algorithm.
The previous algorithm was to search a writes and the sours of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).

The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.

We now also support if the write is a partial write of the read's
domain, and to some extent non-affine subregions.

Differential Revision: https://reviews.llvm.org/D36137

llvm-svn: 309734
2017-08-01 20:01:34 +00:00
Michael Kruse 693ef99935 [Simplify] Improve scalability.
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between access can become
complex such that isl takes more considerable time and memory
for operations on them.

The problems identified were:

- is_subset() takes considerable time with many disjoints in the
  arguments. We limit the number of disjoints to 4, any additional
  information is thrown away.

- subtract() can lead to many disjoints. We instead assume that any
  array element is possibly accessed, which removes all disjoints.

- subtract_domain() may lead to considerable processing, even if all
  elements are are to be removed. Instead, we remove determine and
  remove the affected spaces manually. No behaviour is changed.

llvm-svn: 309728
2017-08-01 19:39:11 +00:00
Siddharth Bhat 1ec9cba4e3 [NFC] Add 'REQUIRES: pollyacc' on 'test/GPGPU/invariant-load-hoisting-of-array.ll'
- Should fix broken build due to `r309681`.

llvm-svn: 309686
2017-08-01 14:52:18 +00:00
Siddharth Bhat edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Michael Kruse 9f6e41cdba [ForwardOpTree] Support synthesizable values.
This allows -polly-optree to move instructions that depend on
synthesizable values.

The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.

At the moment we cannot forward PHI nodes such that crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks: for entering the loop and the
backedge, such any instruction to be forwarded must have a phi between
use and definition).

A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.

Differential Revision: https://reviews.llvm.org/D36102

llvm-svn: 309609
2017-07-31 19:46:21 +00:00
Michael Kruse 57cc92b790 [Simplify] Remove all kinds of redundant scalar writes.
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).

llvm-svn: 309582
2017-07-31 17:04:55 +00:00
Tobias Grosser 8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Tobias Grosser 39977e4e76 Revert "Remove Debug metadata from copied instruction to prevent Module verification failure"
This reverts commit r309490 as it triggers on our AOSP buildbut error messages
of the form:

inlinable function call in a function with debug info must have a !dbg location

llvm-svn: 309556
2017-07-31 11:43:38 +00:00
Singapuram Sanjay Srivallabh cf9a813368 Remove Debug metadata from copied instruction to prevent Module verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

Reviewers: grosser, bollu, Meinersbur

Reviewed By: grosser, bollu

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35630

llvm-svn: 309490
2017-07-29 18:03:49 +00:00
Michael Kruse ce9617f4fe [Simplify] Implement write accesses coalescing.
Write coalescing combines write accesses that

- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
  writes), write to the same element.
- There is no other access between them that accesses the same element.

This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.

Differential Revision: https://reviews.llvm.org/D36010

llvm-svn: 309489
2017-07-29 16:21:16 +00:00
Michael Kruse 4335c3992a [test] Add test case for -polly-simplify. NFC.
llvm-svn: 309458
2017-07-29 00:06:06 +00:00
Michael Kruse 8e41d2baab [Simplify] Do not remove dependencies of phis within region stmts.
These were wrongly assumed to be phi nodes that require
MemoryKind::PHI accesses.

llvm-svn: 309454
2017-07-28 23:22:32 +00:00
Adrian Prantl 99c4a5fb8e Remove offset parameter from llvm.dbg.value intrinsics in testcase
llvm-svn: 309433
2017-07-28 21:08:53 +00:00
Michael Kruse c99209b4b2 [test] Fix typo in filename. NFC.
llvm-svn: 309403
2017-07-28 16:57:56 +00:00
Michael Kruse 6c8f91b908 [Simplify] Fix typo in statistics output. NFC.
llvm-svn: 309402
2017-07-28 16:57:51 +00:00
Michael Kruse 34a77780c5 [Simplify] Remove empty partial accesses first. NFC.
So follow-up cleanup do not need special handling for such accesses.

llvm-svn: 309401
2017-07-28 16:57:45 +00:00
Siddharth Bhat 0a1177b58e [ScopDetect] add `-polly-ignore-func` flag to ignore functions by name.
Ignore all functions whose name match a regex. Useful because creating a
regex that does *not* match a string is somewhat hard.

Example:
https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex

llvm-svn: 309377
2017-07-28 11:47:24 +00:00
Tobias Grosser 25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser 30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Michael Kruse 8a8aca4299 [Simplify] Count PHINodes in simplifiable exit nodes as escaping use.
After region exit simplification, the incoming block of a phi node in
the SCoP region's exit block lands outside of the region. Since we
treat SCoPs as if this already happened, we need to account for that
when looking for outside uses of scalars (i.e. escaping scalars).

llvm-svn: 309271
2017-07-27 14:09:31 +00:00
Michael Kruse 95b39da8ae [Simplify] Fix invalid removal write for escaping values.
A PHI node's incoming block is the user of its operand, not the PHI's parent.

Assuming the PHINode's parent being the user lead to the removal of a
MemoryAccesses because its use was assumed to be inside of the SCoP.

llvm-svn: 309164
2017-07-26 19:58:15 +00:00
Michael Kruse 11ed062258 [SCEVValidator] Loop exit values of loops before the SCoP are synthesizable.
In the following loop:

   int i;
   for (i = 0; i < func(); i+=1)
     ;
SCoP:
   for (int j = 0; j<n; j+=1)
     S(i, j)

The value i is synthesizable in the SCoP that includes only the j-loop.
This is because i is fixed within the SCoP, it is irrelevant whether
it originates from another loop.

This fixes a strange case where a PHI was synthesiable in a SCoP,
but not its incoming value, triggering an assertion.

This should fix MultiSource/Applications/sgefa/sgefa of the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable buildbot.

llvm-svn: 309109
2017-07-26 13:05:45 +00:00
Philip Pfaffe 85cc5687df [IslAst] Untangle IslAst lit-testcases from specifics of the legacy-PM
Summary:
This consists instances of two changes:

- Accept any order of checks for a specific loop form, that appear in different order in the new vs legacy-PM.
- Remove checks for specific regions.

Reviewers: grosser

Reviewed By: grosser

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35837

llvm-svn: 308976
2017-07-25 15:07:42 +00:00
Michael Kruse b6317007b4 [ScopInfo] Fix assertion for PHIs not in a region stmts entry.
A PHI node within a region statement is legal, but does not have
a MemoryKind::PHI access.

llvm-svn: 308973
2017-07-25 13:28:39 +00:00
Siddharth Bhat 43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Michael Kruse 07e8c36dc7 [ForwardOpTree] Support read-only value uses.
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.

Differential Revision: https://reviews.llvm.org/D35764

llvm-svn: 308876
2017-07-24 12:43:27 +00:00
Siddharth Bhat e2699b572e [Polly] [NFC] [ScopDetection] Make `polly-only-func` perform regex scop name match.
Summary:

- We were using `.count` in `StringRef`, which matches substrings.
- We may want to use this for equality as well.
- Generalise this, so allow regexes as a parameter to `polly-only-func`.

Differential Revision: https://reviews.llvm.org/D35728

llvm-svn: 308875
2017-07-24 12:40:52 +00:00
Michael Kruse ab8f0d57df [Simplify] Remove partial write accesses with empty domain.
If the access relation's domain is empty, the access will never be
executed. We can just remove it.

We only remove write accesses. Partial read accesses are not yet
supported and instructions in the statement might require the
llvm::Value holding the read's result to be defined.

llvm-svn: 308830
2017-07-22 20:33:09 +00:00
Michael Kruse e5f4706a55 [ForwardOpTree] Support hoisted invariant loads.
Hoisted loads can be trivially supported because there are no
MemoryAccess to be modified, the loaded value is just available
at code generation.

llvm-svn: 308826
2017-07-22 14:30:02 +00:00
Michael Kruse a6b2de3b59 [ForwardOpTree] Introduce the -polly-optree pass.
This pass 'forwards' operand trees into statements that use them in
order to avoid scalar dependencies.

This minimal implementation handles only the case of speculatable
instructions. We will successively add support for:
- Hoisted loads
- Read-only values
- Synthesizable values
- Loads
- PHIs
- Forwarding only parts of the tree

Differential Revision: https://reviews.llvm.org/D35754

llvm-svn: 308825
2017-07-22 14:02:47 +00:00
Philip Pfaffe 8f6c48e2aa Untangle ScopInfo lit-testcases from specifics of the legacy-PM
Summary:
For the ScopInfo lit testsuite, this patch removes some dependences on output behaviour of the legacy PM.

In most cases, these tests checked the tool output for labels created by the pass printer in the legacy PM. This doesn't work for the new PM anymore. Untangling the testcases is the first step to porting the testsuite for the new PM infrastructure.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: llvm-commits, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35727

llvm-svn: 308754
2017-07-21 16:47:36 +00:00
Philipp Schaad 2f3073b5cb [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Reviewers: bollu, grosser, Meinersbur, singam-sanjay

Reviewed By: grosser, singam-sanjay

Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35185

llvm-svn: 308751
2017-07-21 16:11:06 +00:00
Tobias Grosser 1eeedf4829 [IslNodeBuilder] Relax complexity check in invariant loads and run it early
When performing invariant load hoisting we check that invariant load expressions
are not too complex. Up to this commit, we performed this check by counting the
sum of dimensions in the access range as a very simple heuristic. This heuristic
is a little too conservative, as it prevents hoisting for any scops with a
very large number of parameters. Hence, we update the heuristic to only count
existentially quantified dimensions and set dimensions. We expect this to still
detect the problematic expressions in h264 because of which this check was
originally introduced.

For some unknown reason, this complexity check was originally committed in
IslNodeBuilder. It really belongs in ScopInfo, as there is no point in
optimizing a program which we could have known earlier cannot be code generated.
The benefit of running the check early is that we can avoid to even hoist checks
that are expensive to code generate as invariant loads. This can be seen in
the changed tests, where we now indeed detect the scop, but just not invariant
load hoist the complicated access.

We also improve the formatting of the code, document it, and use isl++ to
simplify expressions.

llvm-svn: 308659
2017-07-20 19:55:19 +00:00
Tobias Grosser 54491db687 Support fabs and copysign in Polly-ACC
llvm-svn: 308649
2017-07-20 18:26:34 +00:00
Michael Kruse 22058c3fbb [Simplify] Remove unused instructions and accesses.
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.

It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.

ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.

Differential Revision: https://reviews.llvm.org/D35656

llvm-svn: 308626
2017-07-20 16:21:55 +00:00
Siddharth Bhat 9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This needs us to adjust how we index array bounds and how we construct
   array bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate what the correct way to handle these are.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests have a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Michael Kruse 0865585eab [ScopInfo] Add support for wrap-around of integers in unsigned comparisons.
This is one possible solution to implement wrap-arounds for integers in
unsigned icmp operations. For example,

    store i32 -1, i32* %A_addr
    %0 = load i32, i32* %A_addr
    %1 = icmp ult i32 %0, 0

%1 should hold false, because under the assumption of unsigned integers,
-1 should wrap around to 2^32-1. However, previously. it was assumed
that the MSB (Most Significant Bit - aka the Sign bit) was never set for
integers in unsigned operations.

This patch modifies the buildConditionSets function in ScopInfo.cpp to
give better information about the integers in these unsigned
comparisons.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D35464

llvm-svn: 308608
2017-07-20 12:37:02 +00:00
Philip Pfaffe 17b1ecfdc5 [CMake] Fix r307650: Readd missing dependency.
The commit erroneously removed the dependency of the Polly tests on
things like opt and FileCheck. Add that dependency back.

llvm-svn: 308512
2017-07-19 19:20:58 +00:00
Roman Gareev 5abea0c97a [FIX] Update test/ScheduleOptimizer/pattern-matching-based-opts_11.ll.
llvm-svn: 308501
2017-07-19 18:01:51 +00:00
Roman Gareev 6531df41ae [FIX] Fix pattern-matching-based-opts_11.ll.
llvm-svn: 308499
2017-07-19 17:33:42 +00:00
Roman Gareev 750374181b Make the pattern matching work with modified memory accesses
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D33138

llvm-svn: 308494
2017-07-19 16:59:06 +00:00
Michael Kruse bb7d22a31a [Test] Do not pipe binary data to FileCheck.
llvm-svn: 308437
2017-07-19 11:12:16 +00:00
Eli Friedman e737fc120e [Polly] [OptDiag] Updating Polly Diagnostics Remarks
Utilizing newer LLVM diagnostic remark API in order to enable use of
opt-viewer tool. Polly Diagnostic Remarks also now appear in YAML
remark file.

In this patch, I've added the OptimizationRemarkEmitter into certain
classes where remarks are being emitted and update the remark emit calls
itself. I also provide each remark a BasicBlock or Instruction from where
it is being called, in order to compute the hotness of the remark.

Patch by Tarun Rajendran!

Differential Revision: https://reviews.llvm.org/D35399

llvm-svn: 308233
2017-07-17 23:58:33 +00:00
Tobias Grosser 4556c9b8fe [ScopInfo] Simplify new access functions under domain context
Summary:
We do not keep domain constraints on access functions when building the
scop. Hence, for consistency reasons, it makes also sense to not include
them when storing a new access function. This change results in simpler
access functions that make output easier to read.

This patch also helps to make DeLICMed memory accesses to be understood by
our matrix multiplication pattern matching pass. Further changes to the
matrix multiplication pattern matching are needed for this to work, so the
corresponding test case will be added in a future commit.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35237

llvm-svn: 308215
2017-07-17 20:47:10 +00:00
Siddharth Bhat 233d717ec1 [PPCGCodeGeneration] Generate invariant loads before trying to generate IR.
- We should call `preloadInvariantLoads` to make sure that code is
   generated for invariant loads in the kernel.

Differential Revision: https://reviews.llvm.org/D35410

llvm-svn: 308187
2017-07-17 15:57:01 +00:00