Commit Graph

3650 Commits

Author SHA1 Message Date
Michael Kruse 420c4863a9 [Simplify] Actually remove unsed instruction from region header.
Since r312249 instructions of a entry block of region statements are
not marked as root anymore and hence can theoretically be removed
if unused. Theoretically, because the instruction list was not changed.

Still, MemoryAccesses for unused instructions were removed. This lead
to a failed assertion in the code generator  when the MemoryAccess for
the still listed instruction was not found.

This hould fix the
Assertion failed: ArrayAccess && "No array access found for instruction!",
file ScopInfo.h, line 1494
compiler crashes.

llvm-svn: 312566
2017-09-05 19:44:39 +00:00
Tobias Grosser 1a695b1d6c [CodegenCleanup] Use old GVN pass instead of NewGVN
It seems NewGVN still has some problems: llvm.org/PR34452, we will switch back
after they have been resolved.

llvm-svn: 312480
2017-09-04 11:04:33 +00:00
Tobias Grosser 8703e38380 [ISLTools]: Move singleton to isl++
llvm-svn: 312476
2017-09-04 10:05:29 +00:00
Tobias Grosser 3575afd739 [DeLICM] Move some functions to isl++ [NFC]
llvm-svn: 312475
2017-09-04 10:05:25 +00:00
Tobias Grosser d6e0679c4e [ForwardOp] Remove read accesses for all instructions that have been moved
Before this patch, OpTree did not consider forwarding an operand tree consisting
of only single LoadInst as useful. The motivation was that, like an access to a
read-only variable, it would just replace one MemoryAccess by another. However,
in contrast to read-only accesses, this would replace a scalar access by an
array access, which is something worth doing.

In addition, leaving scalar MemoryAccess is problematic in that VirtualUse
prioritizes inter-Stmt use over intra-Stmt. It was possible that the same LLVM
value has a MemoryAccess for accessing the remote Stmt's LoadInst as well as
having the same LoadInst in its own instruction list (due to being forwarded
from another operand tree).

With this patch we ensure that if a LoadInst is forwarded is any operand tree,
also the operand tree containing just the LoadInst is forwarded as well, which
effectively removes the scalar MemoryAccess such that only the array access
remains, not both.

Thanks Michael for the detailed explanation.

Reviewers: Meinersbur, bellu, singam-sanjay, gareevroman

Subscribers: hfinkel, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37424

llvm-svn: 312456
2017-09-03 19:52:15 +00:00
Tobias Grosser 701d943d12 [IslAst] Do not assert in case of empty min/max alias locations
In certain situations, the context in the isl_ast_build could result for the
min/max locations of our alias sets to become empty, which would cause an
internal error in isl, which is then unable to derive a value for these
expressions. Check these conditions before code generating expressions and
instead assume that alias check succeeded. This is valid, as the corresponding
memory accesses will not be executed under any valid context.

This fixed llvm.org/PR34432. Thanks to Qirun Zhang for reporting.

llvm-svn: 312455
2017-09-03 19:47:19 +00:00
Tobias Grosser 6b1e461329 [IslAst] Move buildCondition to isl++
llvm-svn: 312452
2017-09-03 18:31:44 +00:00
Tobias Grosser 99ccf05694 [ScopHelper] Do not crash on unreachable blocks
This resolves llvm.org/PR34433. Thanks to Zhendong Su for reporting.

llvm-svn: 312451
2017-09-03 18:01:22 +00:00
Michael Kruse 7954a221f3 [ForwardOpTree] Fix typos. NFC.
llvm-svn: 312446
2017-09-03 16:09:38 +00:00
Tobias Grosser 4baedc70d1 [ScopDetect/Info] Look through PHIs that follow an error block
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.

llvm-svn: 312410
2017-09-02 08:25:55 +00:00
Siddharth Bhat 3928e3f50a [ISLNodeBuilder] Materialize Fortran array sizes of arrays without memory accesses.
In Polly, we specifically add a paramter to represent the outermost dimension
 size of fortran arrays. We do this because this information is statically
 available from the fortran metadata generated by dragonegg.
 However, we were only materializing these parameters (meaning, creating an
 llvm::Value to back the isl_id) from *memory accesses*. This is wrong,
 we should materialize parameters from *scop array info*.

 It is wrong because if there is a case where we detect 2 fortran arrays,
 but only one of them is accessed, we may not materialize the other array's
 dimensions at all.

 This is incorrect. We fix this by looping over all
 `polly::ScopArrayInfo` in a scop, rather that just all `polly::MemoryAccess`.

 Differential Revision: https://reviews.llvm.org/D37379

llvm-svn: 312350
2017-09-01 18:55:43 +00:00
Michael Kruse 0c6c555beb Fix Memory Access of failing tests.
Mark scalar dependences for different statements belonging to same BB
as 'Inter'.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D37147

llvm-svn: 312324
2017-09-01 11:36:52 +00:00
Roman Gareev 1cb3491620 Run GVN during the cleanup
Currently, GVN can be necessary to eliminate redundant instructions in case
of, for instance, GEMM and float type. This patch makes GVN be run during
the cleanup.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D37340

llvm-svn: 312307
2017-09-01 06:52:28 +00:00
Tobias Grosser 04567fd480 Drop unused statistic counter
llvm-svn: 312304
2017-09-01 02:17:10 +00:00
Mandeep Singh Grang c2774a549b [polly] Fix non-deterministic output due to iteration of unordered ScopArrayInfo
Summary:
This fixes the following failures in the reverse iteration builder:
http://lab.llvm.org:8011/builders/reverse-iteration/builds/25

    Polly :: MaximalStaticExpansion/working_deps_between_inners.ll
    Polly :: MaximalStaticExpansion/working_expansion_multiple_dependences_per_statement.ll
    Polly :: MaximalStaticExpansion/working_expansion_multiple_instruction_per_statement.ll
    Polly :: MaximalStaticExpansion/working_phi_expansion.ll

Reviewers: simbuerg, Eugene.Zelenko, grosser, zinob, bollu

Reviewed By: grosser

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D37349

llvm-svn: 312273
2017-08-31 20:10:30 +00:00
Roman Gareev 6589748920 Use the information about the target cache provided by the TargetTransformInfo.
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D37178

llvm-svn: 312255
2017-08-31 17:07:54 +00:00
Tobias Grosser 2307f86c47 [ForwardOpTree] Allow forwarding in the presence of region statements
Summary:
After region statements now also have instruction lists, this is a
straightforward extension.

Reviewers: Meinersbur, bollu, singam-sanjay, gareevroman

Reviewed By: Meinersbur

Subscribers: hfinkel, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37298

llvm-svn: 312249
2017-08-31 16:04:49 +00:00
Siddharth Bhat 56572c6a5e [PPCGCodeGen] Convert intrinsics to libdevice functions whenever possible.
This is useful when we face certain intrinsics such as `llvm.exp.*`
which cannot be lowered by the NVPTX backend while other intrinsics can.

So, we would need to keep blacklists of intrinsics that cannot be
handled by the NVPTX backend. It is much simpler to try and promote
all intrinsics to libdevice versions.

This patch makes function/intrinsic very uniform, and will always try to use
a libdevice version if it exists.

Differential Revision: https://reviews.llvm.org/D37056

llvm-svn: 312239
2017-08-31 13:03:37 +00:00
Tobias Grosser c43d0360cc [BlockGenerator] Generate entry block of regions from instruction lists
The adds code generation support for the previous commit.

This patch has been re-applied, after the memory issue in the previous patch
has been fixed.

llvm-svn: 312211
2017-08-31 03:17:35 +00:00
Tobias Grosser bd15d13d4e [ScopInfo] Use statement lists for entry blocks of region statements
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.

We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.

This change set is reapplied, after a memory corruption issue had been fixed.

llvm-svn: 312210
2017-08-31 03:15:56 +00:00
Tobias Grosser d3edc16416 Revert "[ScopInfo] Use statement lists for entry blocks of region statements"
This reverts commit r312128. It aused some memory issues.

llvm-svn: 312209
2017-08-31 02:43:49 +00:00
Tobias Grosser 6f1f5cbb5b Revert "[BlockGenerator] Generate entry block of regions from instruction lists"
This reverts commit r312129. It caused some memory issues.

llvm-svn: 312208
2017-08-31 02:43:27 +00:00
Adrian Prantl 6120801066 Adapt testcase to LLVM change in DIGlobalVariableExpression.
llvm-svn: 312147
2017-08-30 18:12:35 +00:00
Tobias Grosser 1e34508bcc [BlockGenerator] Generate entry block of regions from instruction lists
The adds code generation support for the previous commit.

llvm-svn: 312129
2017-08-30 15:08:30 +00:00
Tobias Grosser 6fbe4c8501 [ScopInfo] Use statement lists for entry blocks of region statements
By using statement lists in the entry blocks of region statements, instruction
level analyses also work on region statements.

We currently only model the entry block of a region statements, as this is
sufficient for most transformations the known-passes currently execute. Modeling
instructions in the presence of control flow (e.g. infinite loops) is left
out to not increase code complexity too much. It can be added when good use
cases are found.

llvm-svn: 312128
2017-08-30 15:08:21 +00:00
Michael Kruse f3387836d0 [ScopBuilder/ScopInfo] Move reduction detection to ScopBuilder. NFC.
Reduction detection is only executed in the SCoP building phase.
Hence it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.

llvm-svn: 312118
2017-08-30 13:05:08 +00:00
Michael Kruse 35aa9d862e [ScopBuilder/ScopInfo] Move ScopStmt::collectSurroundingLoops to ScopBuilder. NFC.
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.

llvm-svn: 312117
2017-08-30 13:05:01 +00:00
Michael Kruse eb83141f9e [ScopBuilder/ScopInfo] Move ScopStmt::buildDomain to ScopBuilder. NFC.
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.

llvm-svn: 312116
2017-08-30 13:04:54 +00:00
Michael Kruse a29f8c03d4 [ScopBuilder/ScopInfo] Move ScopStmt::buildAccessRelations to ScopBuilder. NFC.
This method is only called in the SCoP building phase.
Therefore it fits better into ScopBuilder to separate
SCoP-construction from SCoP modeling.

This mostly mechanical change makes ScopBuilder directly access some of
ScopStmt/MemoryAccess private fields. We add ScopBuilder as a friend
class and will add proper accessor functions sometime later.

llvm-svn: 312115
2017-08-30 13:04:46 +00:00
Michael Kruse f6eb3a2ed2 [ScopBuilder/ScopInfo] Move and inline Scop::init into ScopBuilder::buildScop. NFC.
The method is only needed in the SCoP building phase, and doesn't need
to be part of the general API.

llvm-svn: 312114
2017-08-30 13:04:39 +00:00
Michael Kruse 860870b7b0 [ScopBuilder] Report to dbgs() on SCoP bailout. NFC.
This allows to use -debug to see that a SCoP was found in ScopDetect,
but dismissed by ScopBuilder.

llvm-svn: 312113
2017-08-30 11:52:03 +00:00
Michael Kruse 591255183b [ScopBuilder] Introduce metadata for splitting scop statement.
This patch allows annotating of metadata in ir instruction
(with "polly_split_after"), which specifies where to split a particular
scop statement.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36402

llvm-svn: 312107
2017-08-30 10:11:06 +00:00
Michael Kruse 99cc9ded41 Do not consider mem intrinsics as error.
The intrinsics memset, memcopy and memmove do have their memory accesses
modeled by ScopBuilder. Do not consider them error-case behavior.

Test case will come with a future patch that requires memory intrinsics
outside of error blocks.

llvm-svn: 312021
2017-08-29 18:27:47 +00:00
Michael Kruse 25d3f85a43 Skip ignored intrinsics.
Commit r252725 introduced a "return false" if an ignored intrinsics was
found. The consequence of this was that the mere existence of an ignored
intrinsic (such as llvm.dbg.value) before a call that would have
qualified the block to be an error block, to not be an error block.

The obvious goal was to just skip ignored intrinsics, not changing the
meaning of what an error block is.

llvm-svn: 312020
2017-08-29 18:27:42 +00:00
Siddharth Bhat 7de7abb09c [ScopInfo] Fix comment grammar. "..to be build" -> "..to be built". [NFC]
llvm-svn: 311995
2017-08-29 11:46:14 +00:00
Michael Kruse 4728184342 [ZoneAlgo] More fine-grained bail-out.
ZoneAlgo used to bail out for the complete SCoP if it encountered
something violating its assumption. This meant the neither OpTree can
forward any load nor DeLICM do anything in such cases, even if their
transformations are unrelated to the violations.

This patch adds a list of compatible elements (currently with the
granularity of entire arrays) that can be used for analysis. OpTree
and DeLICM can then check whether their transformations only concern
compatible elements, and skip non-compatible ones.

This will be useful for e.g. Polybench's benchmarks covariance,
correlation, bicg, doitgen, durbin, gramschmidt, adi that have
assumption violation, but which are not necessarily relevant
for all transformations.

Differential Revision: https://reviews.llvm.org/D37219

llvm-svn: 311929
2017-08-28 20:39:07 +00:00
Tobias Grosser ee8ad1c0ff [IslAst] Do not compare arrays in alias check which are known to be identical
This possibly helps to avoid run-time check failures in the COSMO kernels.

llvm-svn: 311920
2017-08-28 20:17:02 +00:00
Michael Kruse a4f447c2a4 [PM] Properly require and preserve OptimizationRemarkEmitter. NFCI.
Properly require and preserve the OptimizationRemarkEmitter for use in
ScopPass. Previously one had to get the ORE from ScopDetection because
CodeGeneration did not mark it as preserved. It would need to be
recomputed which results in the legacy PM to throw away all previous
SCoP analysis.

This also changes the implementation of ScopPass::getAnalysisUsage to
not unconditionally preserve all passes, but only those needed to be
preserved by any SCoP pass (at least when using the legacy PM). This
allows invalidating DependenceInfo (and IslAstInfo) in case the pass
would cause them to change (e.g. OpTree, DeLICM, MaximalArrayExpansion)

JSONImporter should also invalidate the DependenceInfo. In this patch
it marks DependenceInfo as preserved anyway because some regression
tests depend on it.

Differential Revision: https://reviews.llvm.org/D37010

llvm-svn: 311888
2017-08-28 14:07:33 +00:00
Michael Kruse e983e6b1c5 [ZoneAlgo] Print rejection reasons to llvm::dbgs(). NFC.
llvm-svn: 311885
2017-08-28 11:22:23 +00:00
Tobias Grosser 93ab558d2e [Detect] Consider nested loop profitable if entry block is not in loop
In cases where the entry block of a scop was not contained in a loop that was
part of the scop region and at the same time there was a loop surrounding the
scop, we missed to count the loops in the scop and consequently did not consider
the scop profitable. We correct this by only moving to the loop parent, in case
the current loop is loop contained in the scop.

This increases the number of loops in COSMO which we assume to be profitable
from 3974 to 4981.

llvm-svn: 311863
2017-08-27 21:39:25 +00:00
Philipp Schaad 8cb2e3245c [Polly][GPGPU] Fixed undefined reference for CUDA's managed memory in Runtime library.
llvm-svn: 311848
2017-08-27 12:50:51 +00:00
Eugene Zelenko a32707d5b1 [Polly] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 311802
2017-08-25 21:35:27 +00:00
Eugene Zelenko 9248fde53a [Polly] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 311704
2017-08-24 21:22:41 +00:00
Tobias Grosser 6d0970f64e Revert "[polly] Fix ScopDetectionDiagnostic test failure caused by r310940"
This reverts commit 950849ece9bb8fdd2b41e3ec348b9653b4e37df6.

This commit broke various buildbots.

llvm-svn: 311692
2017-08-24 19:47:15 +00:00
Michael Kruse b795bfc0d4 [CodeGen] Detect impossible partial write conditions more reliably.
Whether a partial write is tautological/unsatisfiable not only
depends on the access domain, but also on the domain covered
by its node in the AST.

In the example below, there are two instances of Stmt_cond_false. It may have a partial write access that is not executed in instance Stmt_cond_false(0).

      for (int c0 = 0; c0 < tmp5; c0 += 1) {
        Stmt_for_body344(c0);
        if (tmp5 >= c0 + 2)
          Stmt_cond_false(c0);
        Stmt_cond_end(c0);
      }
      if (tmp5 <= 0) {
        Stmt_for_body344(0);
        Stmt_cond_false(0);
        Stmt_cond_end(0);
      }

Isl cannot derive a subscript for an array element that is never accessed.
This caused an error in that no subscript expression has been generated
in IslNodeBuilder::createNewAccesses, but BlockGenerator expected one
to exist because there is an execution of that write, just not in that
ast node.

Fixed by instead of determining whether the access domain is empty,
inspect whether isl generated a constant "false" ast expression in
the current ast node.

This should fix a compiler crash of the aosp buildbot.

llvm-svn: 311663
2017-08-24 14:51:35 +00:00
Siddharth Bhat 78027437e6 [Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel.
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.

Differential Revision: https://reviews.llvm.org/D37058

llvm-svn: 311648
2017-08-24 09:54:15 +00:00
Andreas Simbuerger e478e2de83 [Polly][WIP] Scalar fully indexed expansion
Summary:
This patch comes directly after https://reviews.llvm.org/D34982 which allows fully indexed expansion of MemoryKind::Array. This patch allows expansion for MemoryKind::Value and MemoryKind::PHI.

MemoryKind::Value seems to be working with no majors modifications of D34982. A test case has been added. Unfortunatly, no "run time" checks can be done for now because as @Meinersbur explains in a comment on D34982, DependenceInfo need to be cleared and reset to take expansion into account in the remaining part of the Polly pipeline. There is no way to do that in Polly for now.

MemoryKind::PHI is not working. Test case is in place, but not working. To expand MemoryKind::Array, we expand first the write and then after the reads. For MemoryKind::PHI, the idea of the current implementation is to exchange the "roles" of the read and write and expand first the read according to its domain and after the writes.
But with this strategy, I still encounter the problem of union_map in new access map.
For example with the following source code (source code of the test case) :

```
void mse(double A[Ni], double B[Nj]) {
  int i,j;
  double tmp = 6;
  for (i = 0; i < Ni; i++) {
    for (int j = 0; j<Nj; j++) {
      tmp = tmp + 2;
    }
    B[i] = tmp;
  }
}
```

Polly gives us the following statements and memory accesses :

```
    Statements {
    	Stmt_for_body
            Domain :=
                { Stmt_for_body[i0] : 0 <= i0 <= 9999 };
            Schedule :=
                { Stmt_for_body[i0] -> [i0, 0, 0] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_body[i0] -> MemRef_tmp_04__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_body[i0] -> MemRef_tmp_11__phi[] };
            Instructions {
                  %tmp.04 = phi double [ 6.000000e+00, %entry.split ], [ %add.lcssa, %for.end ]
            }
    	Stmt_for_inc
            Domain :=
                { Stmt_for_inc[i0, i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9999 };
            Schedule :=
                { Stmt_for_inc[i0, i1] -> [i0, 1, i1] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_inc[i0, i1] -> MemRef_add_lcssa__phi[] };
            Instructions {
                  %tmp.11 = phi double [ %tmp.04, %for.body ], [ %add, %for.inc ]
                  %add = fadd double %tmp.11, 2.000000e+00
                  %exitcond = icmp ne i32 %inc, 10000
            }
    	Stmt_for_end
            Domain :=
                { Stmt_for_end[i0] : 0 <= i0 <= 9999 };
            Schedule :=
                { Stmt_for_end[i0] -> [i0, 2, 0] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_end[i0] -> MemRef_tmp_04__phi[] };
            ReadAccess :=	[Reduction Type: NONE] [Scalar: 1]
                { Stmt_for_end[i0] -> MemRef_add_lcssa__phi[] };
            MustWriteAccess :=	[Reduction Type: NONE] [Scalar: 0]
                { Stmt_for_end[i0] -> MemRef_B[i0] };
            Instructions {
                  %add.lcssa = phi double [ %add, %for.inc ]
                  store double %add.lcssa, double* %arrayidx, align 8
                  %exitcond5 = icmp ne i64 %indvars.iv.next, 10000
            }
    }

```

and the following dependences :
```
{ Stmt_for_inc[i0, 9999] -> Stmt_for_end[i0] : 0 <= i0 <= 9999;
Stmt_for_inc[i0, i1] -> Stmt_for_inc[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998;
Stmt_for_body[i0] -> Stmt_for_inc[i0, 0] : 0 <= i0 <= 9999;
Stmt_for_end[i0] -> Stmt_for_body[1 + i0] : 0 <= i0 <= 9998 }
```

When trying to expand this memory access :
```
{ Stmt_for_inc[i0, i1] -> MemRef_tmp_11__phi[] };
```

The new access map would look like this :
```
{ Stmt_for_inc[i0, 9999] -> MemRef_tmp_11__phi_exp[i0] : 0 <= i0 <= 9999; Stmt_for_inc[i0, i1] ->MemRef_tmp_11__phi_exp[i0, 1 + i1] : 0 <= i0 <= 9999 and 0 <= i1 <= 9998 }
```

The idea to implement the expansion for PHI access is an idea from @Meinersbur and I don't understand why my implementation does not work. I should have miss something in the understanding of the idea.

Contributed by: Nicolas Bonfante <nicolas.bonfante@gmail.com>

Reviewers: Meinersbur, simbuerg, bollu

Reviewed By: Meinersbur

Subscribers: llvm-commits, pollydev, Meinersbur

Differential Revision: https://reviews.llvm.org/D36647

llvm-svn: 311619
2017-08-24 00:04:45 +00:00
Michael Kruse 06ed529205 Add more statistics.
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
  for scalar false dependencies
- Number of parallel loops

These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.

Differential Revision: https://reviews.llvm.org/D37049

llvm-svn: 311553
2017-08-23 13:50:30 +00:00
Michael Kruse 7fac28fa4f [ScopDetect] Include zero-iteration loops in loop count.
Loop with zero iteration are, syntactically, loops. They have been
excluded from the loop counter even for the non-profitable counters.
This seems to be unintentially as the sentinel value of '0' minimal
iterations does exclude such loops.

Fix by never considering the iteration count when the sentinel
value of 0 is found.

This makes the recently added NumTotalLoops couter redundant
with NumLoopsOverall, which now is equivalent. Hence, NumTotalLoops
is removed as well.

Note: The test case 'ScopDetect/statistics.ll' effectively does not
check profitability, because -polly-process-unprofitable is passed
to all test cases.

llvm-svn: 311551
2017-08-23 13:29:59 +00:00
Michael Kruse 99fba1fd52 [ScopInliner] Fix hidden overload warning. NFC.
By exposing the the hidden member, but as private.

llvm-svn: 311550
2017-08-23 13:07:43 +00:00
Michael Kruse a1579aab46 [MaximumStaticExpansion] Avoid warning in release builds.
Conditionally compile function only used in an assert().

llvm-svn: 311549
2017-08-23 12:50:02 +00:00
Michael Kruse 3044dc51cf [PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC.
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.

Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.

llvm-svn: 311548
2017-08-23 12:45:25 +00:00
Michael Kruse 594386e773 [ScopInfo] Remove stray semicolon. NFC.
llvm-svn: 311547
2017-08-23 12:34:37 +00:00
Tobias Grosser d680edfb98 Move include/isl-noexceptions.h to include/isl/isl-noexceptions.h
llvm-svn: 311504
2017-08-22 22:04:22 +00:00
Jakub Kuderski 0ac1e585fc [polly] Fix ScopDetectionDiagnostic test failure caused by r310940
Summary:
ScopDetection used to check if a loop withing a region was infinite and emitted a diagnostic in such cases. After r310940 there's no point checking against that situation, as infinite loops don't appear in regions anymore.

The test failure was observed on these two polly buildbots:
http://lab.llvm.org:8011/builders/polly-arm-linux/builds/8368
http://lab.llvm.org:8011/builders/polly-amd64-linux/builds/10310

This patch XFAILs `ReportLoopHasNoExit.ll` and turns infinite loop detection into an assert.

Reviewers: grosser, sanjoy, bollu

Reviewed By: grosser

Subscribers: efriedma, aemerson, kristof.beyls, dberlin, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36776

llvm-svn: 311503
2017-08-22 22:01:53 +00:00
Tobias Grosser 4a07bbe3f6 [IRBuilder] Only emit alias scop metadata for arrays, but not scalars
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.

This improves 2mm performance from 1.5 seconds to 0.17 seconds.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37028

llvm-svn: 311498
2017-08-22 21:58:48 +00:00
Eugene Zelenko bff61d220e [Polly] Satisfy Clang-format for r311489 changes, but it's weird that Clang-format didn't complain about headers order in previous versions (NFC).
llvm-svn: 311494
2017-08-22 21:47:17 +00:00
Eugene Zelenko 0c4c2ce0b0 [Polly] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 311489
2017-08-22 21:25:51 +00:00
Roman Gareev 6bfeba24d3 [NFC] Fix the broken comment.
llvm-svn: 311477
2017-08-22 17:43:03 +00:00
Roman Gareev 0956a606ff Disable the Loop Vectorizer in case of GEMM
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36928

llvm-svn: 311473
2017-08-22 17:38:46 +00:00
Michael Kruse 595b77bc0b [ScopInfo] Fix typos in comment. NFC.
llvm-svn: 311472
2017-08-22 17:32:51 +00:00
Siddharth Bhat 14544a8068 [GPUJIT] Make max managed pointers an environment variable.
This was originally a `#define`. It is much easier to play around with
this as an environment variable when we run on large programs.

Differential Revision: https://reviews.llvm.org/D37012

llvm-svn: 311471
2017-08-22 17:32:27 +00:00
Michael Kruse a28260f486 [test] Do not pipe binary data to FileCheck.
llvm-svn: 311470
2017-08-22 17:09:56 +00:00
Michael Kruse 5b228bbb12 [ScopDetection] Add stat for total number of loops.
The total number of loops is useful as a baseline comparing how many
loops have been optimized in different configurations.

llvm-svn: 311469
2017-08-22 17:09:51 +00:00
Siddharth Bhat cb5155bf6d [ManagedMemoryRewrite] Use `unit64_t` to store size, not `int`.
llvm-svn: 311440
2017-08-22 09:30:37 +00:00
Siddharth Bhat 603544863f [ManagedMemoryRewrite] Get size in bytes rather than in bits and dividing by 8.
llvm-svn: 311439
2017-08-22 09:27:41 +00:00
Tobias Grosser 6683c81af8 test/GPGPU/invalid-kernel-assert-verifymodule.ll also requires assertions
llvm-svn: 311423
2017-08-22 03:12:29 +00:00
Michael Kruse f281ae5992 [test] Add some test cases for computeArrayUnused.
llvm-svn: 311404
2017-08-21 23:04:55 +00:00
Michael Kruse ade14269cd [DeLICM] Fix unused zone for writes without in-between read.
The implementation of computeArrayUnused did not consider writes without
reads before, except for the first write in the SCoP. This caused it to
'forget' writes directly following another write.

This patch re-adds the entire reaching defintion of a write that has not
been covered before by a read.

This fixes Polybench 4.2 2mm where only one of the matrix-multiplication
was detected.

llvm-svn: 311403
2017-08-21 23:04:45 +00:00
Siddharth Bhat a8c329b0eb [ManagedMemoryRewrite] slightly tweak debug output style. [NFC]
llvm-svn: 311361
2017-08-21 18:58:33 +00:00
Siddharth Bhat 557ce3a8b0 [ManagedMemoryRewrite] Print reasons for skipping global array to dbgs(). [NFC]
llvm-svn: 311360
2017-08-21 18:52:15 +00:00
Tobias Grosser 0dd42512ff [ZoneAlgorithm] Move computeScalarReachingDefinition to c++
llvm-svn: 311336
2017-08-21 14:19:40 +00:00
Siddharth Bhat 0a198dc18a [ManagedMemoryRewrite] hide debug output behing DEBUG(...). [NFC]
llvm-svn: 311331
2017-08-21 12:51:57 +00:00
Siddharth Bhat 7bc77e87c8 [ScopInfo] Add option to treat all function parameters as dereferencible.
Dragonegg generates most function parameters as pointers to the actual
parameters. However, it does not mark these parameters with the
dereferencable attribute.

Polly is conservative when it comes to invariant load
hoisting, thus we add runtime checks to invariant load hoisted pointers
when we do not know that pointers are dereferencable. This is correct behaviour,
but is a performance penalty.

Add a flag that allows all pointer parameters to be dereferencable. That
way, polly can speculatively load-hoist paramters to functions without
runtime checks.

Differential Revision: https://reviews.llvm.org/D36461

llvm-svn: 311329
2017-08-21 11:57:04 +00:00
Siddharth Bhat 7b9f5ca27e [PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen.
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.

Differential Revision: https://reviews.llvm.org/D36934

llvm-svn: 311328
2017-08-21 11:44:01 +00:00
Tobias Grosser b09bd74da8 [GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.

llvm-svn: 311324
2017-08-21 09:52:08 +00:00
Tobias Grosser 5170b6627a [GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO

llvm-svn: 311322
2017-08-21 09:00:31 +00:00
Michael Kruse d091bf8d8e [MatMul] Make MatMul detection independent of internal isl representations.
The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36460

llvm-svn: 311302
2017-08-20 21:31:11 +00:00
Siddharth Bhat 9a5a278f78 [GPUJIT] Switch from Runtime API calls for managed memory to Driver API calls.
We now load the function pointer for `cuMemAllocManaged` dynamically, so
it should be possible to compile `GPUJIT` on non-CUDA systems again.

It should now be possible to link on non-cuda systems again.

Thanks to Philipp Schaad for noticing this inconsitency.

Differential Revision: https://reviews.llvm.org/D36921

llvm-svn: 311289
2017-08-20 13:38:04 +00:00
Tobias Grosser e32498c9c3 Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268
2017-08-19 23:49:26 +00:00
Tobias Grosser 9041118983 [ManagedMemoryRewrite] Make pass more robust and fix memory issue
Instead of using Twines and temporary expressions, we do string manipulation
through a std::string. This resolves a memory corruption issue, which likely
was caused by twines loosing their underlying string too soon.

llvm-svn: 311264
2017-08-19 23:03:45 +00:00
Siddharth Bhat 205a78a6f9 [ManagedMemoryRewrite] Iterate over operands of the expanded instruction, not the constantexpr itself.
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.

- This is a bugfix.

Differential Revision: https://reviews.llvm.org/D36923

llvm-svn: 311261
2017-08-19 20:52:11 +00:00
Tobias Grosser ecb94a0392 [GPGPU] Correctly initialize array order and fixed_element information
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259
2017-08-19 20:21:22 +00:00
Philipp Schaad 50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Tobias Grosser 9f2eb24c06 Clarify the intend of the run-time check
llvm-svn: 311243
2017-08-19 16:26:39 +00:00
Tobias Grosser 43df2020e7 [GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239
2017-08-19 12:58:28 +00:00
Tobias Grosser d5f1fad77c [Polly] Run early cse + memory SSA to remove redundancies in the input code
This allows us to get rid of many identical loads as they commonly appear in
Fortran code.

llvm-svn: 311231
2017-08-19 08:44:46 +00:00
Andreas Simbuerger 8d5b257d02 [Polly][Bug fix] Wrong dependences filtering during Fully Indexed expansion
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
  int i,j;
  for (j = 0; j < Ni; j++) {
    for (int i = 0; i<Nj; i++)
S:    B[i] = i;
    for (int i = 0; i<Nj; i++)
T:    D[i] = i;

U:  A[j] = B[j];
      C[j] = D[j];
  }
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.

This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.

We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.

Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>

Reviewers: Meinersbur, simbuerg, bollu

Reviewed By: Meinersbur, simbuerg

Subscribers: pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D36791

llvm-svn: 311165
2017-08-18 15:01:18 +00:00
Tobias Grosser ec02acfb98 [GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161
2017-08-18 13:38:12 +00:00
Siddharth Bhat 656e629572 [Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC]
Differential Revision: https://reviews.llvm.org/D36871

llvm-svn: 311158
2017-08-18 13:16:58 +00:00
Tobias Grosser 861a387fac [GPGPU] Do not create copy statements when targetting managed memory
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36868

llvm-svn: 311157
2017-08-18 13:11:05 +00:00
Tobias Grosser 62acb344d0 [GPGPU] Synchronize after each kernel, not each copy out
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.

Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36867

llvm-svn: 311155
2017-08-18 12:55:58 +00:00
Siddharth Bhat dd616e9519 [ScpInliner] Move DEBUG-TYPE to below all includes to prevent cross-module interaction. [NFC]
This fixes compile errors.

llvm-svn: 311130
2017-08-17 22:21:16 +00:00
Tobias Grosser fa03cb7687 [GPGPU] Only collect the access that belong to an array [NFC]
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.

llvm-svn: 311127
2017-08-17 22:04:53 +00:00
Siddharth Bhat b46847c035 [ScopInliner] Add a simple Scop-based inliner to polly.
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.

This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.

Differential Revision: https://reviews.llvm.org/D36832

llvm-svn: 311126
2017-08-17 21:57:23 +00:00
Tobias Grosser d2e57981fd [GPGPU] Move getExtend to C++ [NFC]
llvm-svn: 311123
2017-08-17 21:20:28 +00:00
Siddharth Bhat a2c4112791 [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s.
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:

```
call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```

- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.

Differential Revision: https://reviews.llvm.org/D36825

llvm-svn: 311121
2017-08-17 20:26:38 +00:00
Tobias Grosser abc5416be1 [GPGPU] Make test case independent of LLVM names
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.

llvm-svn: 311118
2017-08-17 20:09:02 +00:00
Siddharth Bhat 8a2c07f6d4 [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
- If we have global arrays, we would like to rewrite them to global
  pointers which are allocated using `cudaMallocManaged`.

- If we have allocas in a function, we would like to rewrite them to
  heap-allocations with `cudaMallocManaged` and `cudaFree`.

- With these rewrite mechanisms, we can offload _any_ function to the
  GPU with no code rewrite whatsover.

Differential Revision: https://reviews.llvm.org/D36516

llvm-svn: 311080
2017-08-17 11:22:52 +00:00
Tobias Grosser ed6a4acc7f Add rewrite by-reference parameter pass
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.

We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: mgorny, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36800

llvm-svn: 311066
2017-08-17 05:25:08 +00:00