Commit Graph

128 Commits

Author SHA1 Message Date
Roman Gareev 113fa82c3c [Polly] Check the properties of accesses to operands of a matrix-matrix
multiplication

The following code modifies elements of the array D.

    for (i = 0; i < _PB_NI; i++)
      for (j = 0; j < _PB_NJ; j++)
      {
        for (k = 0; k < _PB_NK; k++)
        {
          double Mul = A[i][k] * B[k][j];
          D[i][j][k] += Mul;
          C[i][j] += Mul;
        }
      }

Nevertheless, the code is recognised as a matrix-matrix multiplication, since
the second and third dimensions of D are accessed with non-zero strides.

This fixes the typo, which was made during the translation to C++ bindings
(https://reviews.llvm.org/D35845).

Reviewed By: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D110491
2021-09-28 22:58:57 +05:00
Michael Kruse a5d47b3fa0 [Polly] Fix wrong redirect in test case. 2021-09-24 14:53:00 -05:00
Michael Kruse e470f9268a [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.

Re-apply after revert in c7bcd72a38 with
fix: Take isBand out of #ifndef NDEBUG since it now is used
unconditionally.
2021-09-23 21:11:01 -05:00
Petr Hosek c7bcd72a38 Revert "[Polly] Implement user-directed loop distribution/fission."
This reverts commit 52c30adc7d which
breaks the build when NDEBUG is defined.
2021-09-23 14:04:25 -07:00
Michael Kruse 07e7cb9433 [Polly] Remove -polly-opt-fusion option.
The name of the option is misleading and has been renamed by isl to
"serialize-sccs". Instead of also renaming the option, remove it.
The option is still accessible using

    -polly-isl-arg=--no-schedule-serialize-sccs
2021-09-23 15:43:08 -05:00
Michael Kruse 52c30adc7d [Polly] Implement user-directed loop distribution/fission.
This is a simple version without the possibility to define distribute
points or followup-transformations. However, it is the first
transformation that has to check whether the transformation is correct.

It interprets the same metadata as the LoopDistribute pass.
2021-09-22 17:28:25 -05:00
Michael Kruse cad9f98a2a [Polly] Don't generate inter-iteration noalias metadata.
This metadata was intended to mark all accesses within an iteration to be pairwise non-aliasing, in this case because every memory of a base pointer is touched (read or write) at most once. This is typical for 'sweeps' over all data. The stated motivation from D30606 is to ensure that unrolled iterations are considered non-aliasing.

Rhe implemention had multiple issues:

 * The structure of the noalias metadata was malformed. D110026 added check in the verifier for this metadata, and the tests were failing since then.

 * This is not true for the outer loops of the BLIS matrix multiplication, where it was being inserted. Each element of A, B, C is accessed multiple times, as often as the loop not used as an index is iterating.

 * Scopes were added to SecondLevelOtherAliasScopeList (used for the !noalias scop list) on-the-fly when another SCEV was seen. This meant that previously visited instructions would not be updated with alias scopes that are only seen later, missing out those SCEVs they should not be aliasing with.

 * Since the !noalias scope list would ideally consists of all other SCEV for this base pointer, we might run quickly into scalability issues. Especially after unrolling there would probably at least once SCEV per instruction and unroll instance.

 * The inter-iteration noalias base pointer was not removed after leaving the loop marked with it, effectively marking everything after it to noalias as well.

A solution I considered was to mark each instruction as non-aliasing with its own scope. The instruction itself would obviously alias itself, but such construction might also be considered invalid. Duplicating the instruction (e.g. due to speculation) would mark the instruction non-aliasing with its clone. I don't want to go into this territory, especially since the original motivation of determining unrolled instances as noalias based on SCEV is the what scev-aa does as well.

This effectively reverts D30606 and D35761.
2021-09-20 22:20:17 -05:00
Nikita Popov 53720f74e4 [Polly] Partially fix scoped alias metadata
This partially addresses the verifier failures caused by D110026.
In particular, it does not fix the "second level" alias metadata.
2021-09-20 22:51:31 +02:00
Michael Kruse a56bd7dec8 [Polly][Matmul] Re-pack A in every iteration.
Packed_A must be copied repeatedly, not just for the first iteration of
the outer tile.

This fixes llvm.org/PR50557
2021-06-09 15:19:52 -05:00
serge-sans-paille 4ab3041acb Revert "[NFC] remove explicit default value for strboolattr attribute in tests"
This reverts commit bda6e5bee0.

See https://lab.llvm.org/buildbot/#/builders/109/builds/15424 for instance
2021-05-24 19:43:40 +02:00
serge-sans-paille bda6e5bee0 [NFC] remove explicit default value for strboolattr attribute in tests
Since d6de1e1a71, no attributes is quivalent to
setting attribute to false.

This is a preliminary commit for https://reviews.llvm.org/D99080
2021-05-24 19:31:04 +02:00
Michael Kruse 286677870b [Polly][ManualOpt] Match interpretation of unroll metadata to LoopUnrolls's.
We previously had a different interpretation of unroll transformation
attributes than how LoopUnroll interpreted it. In particular,
llvm.loop.unroll.enable was needed explicitly to enable it and disabling
metadata was ignored.
Additionally, it required that either full unrolling or an unroll factor
to be specified or fail otherwise. An unroll factor is still required,
but the transformation is ignored with the hope that LoopUnroll is going
to apply the unrolling, since Polly currently does not implement an
heuristic.

Fixes llvm.org/PR50109
2021-04-24 04:30:19 -05:00
Michael Kruse f51427afb5 [Polly][Unroll] Fix unroll_double test.
We enumerated the cross product Domain x Scatter, but sorted only be the
scatter key. In case there are are multiple statement instances per
scatter value, the order between statement instances of the same loop
iteration was undefined.

Propertly enumerate and sort only by the scatter value, and group the
domains using the scatter dimension again.

Thanks to Leonard Chan for the report.
2021-03-16 09:00:42 -05:00
Michael Kruse 3f170eb197 [Polly][Optimizer] Apply user-directed unrolling.
Make Polly look for unrolling metadata (https://llvm.org/docs/TransformMetadata.html#loop-unrolling) that is usually only interpreted by the LoopUnroll pass and apply it to the SCoP's schedule.

While not that useful by itself (there already is an unroll pass), it introduces mechanism to apply arbitrary loop transformation directives in arbitrary order to the schedule. Transformations are applied until no more directives are found. Since ISL's rescheduling would discard the manual transformations and it is assumed that when the user specifies the sequence of transformations, they do not want any other transformations to apply. Applying user-directed transformations can be controlled using the `-polly-pragma-based-opts` switch and is enabled by default.

This does not influence the SCoP detection heuristic. As a consequence, loop that do not fulfill SCoP requirements or the initial profitability heuristic will be ignored. `-polly-process-unprofitable` can be used to disable the latter.

Other than manually editing the IR, there is currently no way for the user to add loop transformations in an order other than the order in the default pipeline, or transformations other than the one supported by clang's LoopHint. See the `unroll_double.ll` test as example that clang currently is unable to emit. My own extension of `#pragma clang loop` allowing an arbitrary order and additional transformations is available here: https://github.com/meinersbur/llvm-project/tree/pragma-clang-loop. An effort to upstream this functionality as `#pragma clang transform` (because `#pragma clang loop` has an implicit transformation order defined by the loop pipeline) is D69088.

Additional transformations from my downstream pragma-clang-loop branch are tiling, interchange, reversal, unroll-and-jam, thread-parallelization and array packing. Unroll was chosen because it uses already-defined metadata and does not require correctness checks.

Reviewed By: sebastiankreutzer

Differential Revision: https://reviews.llvm.org/D97977
2021-03-15 13:05:39 -05:00
Michael Kruse e200df952b [Polly] Port IslScheduleOptimizer to the NewPM. 2021-02-09 23:56:21 -06:00
Michael Kruse 842314b5f0 [Polly] Update isl to isl-0.23-61-g24e8cd12.
This fixes llvm.org/PR48554

Some test cases had to be updated because the hash function for
union_maps have been changed which affects the output order.
2021-01-19 12:01:31 -06:00
Mark Schimmel 8e570abf10 Polly - specify address space when creating a pointer to a vector type
Polly incorrectly dropped the address space specified for a load instruction when it vectorized the code.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D88907
2020-10-14 11:17:15 -05:00
Michael Kruse c0bc995429 [Polly] Fix prevectorization of fused loops.
The schedule of a fused loop has one isl_space per statement, such that
a conversion to a isl_map fails. However, the prevectorization is
interested in the schedule space only: Converting to the non-union
representation only after extracting the schedule range fixes the problem.

This fixes llvm.org/PR46578
2020-07-10 16:42:03 -05:00
Michael Kruse 32bf468420 [Polly] Fix -polly-opt-isl -analyze
The member LastSchedule was never set, such that printScop would always
print "n/a" instead of the last schedule.

To ensure that the isl_ctx lives as least as long as the stored
schedule, also store a shared_ptr.

Also set the schedule tree output style to ISL_YAML_STYLE_BLOCK to avoid
printing everything on a single line.

`opt -polly-opt-isl -analyze` will be used in the next commit.
2020-07-10 16:42:03 -05:00
Arthur Eubanks b210c9899b [BasicAA] Replace -basicaa with -basic-aa in polly
Follow up to https://reviews.llvm.org/D82607.
2020-06-30 15:50:17 -07:00
Fangrui Song a36ddf0aa9 Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351 2019-12-24 16:27:51 -08:00
Fangrui Song 502a77f125 Migrate function attribute "no-frame-pointer-elim" to "frame-pointer"="all" as cleanups after D56351 2019-12-24 15:57:33 -08:00
Michael Kruse aa8a976174 [ScheduleOptimizer] Hoist extension nodes after schedule optimization.
Extension nodes make schedule trees are less flexible: Many operations,
such as rescheduling, do not work on such schedule trees with extension.
As such, some functionality such as determining parallel loops in isl's
AST are disabled.

Currently, only the pattern-matching generalized matrix-matrix
multiplication optimization adds extension nodes (to add copy-in
statements).

This patch removes all extension nodes as the last step of the schedule
optimization by hoisting the extension node's added domain up to the
root domain node. All following passes can assume that schedule trees
work without restrictions, including the parallelism test. Mark the
outermost loop of the optimized matrix-matrix multiplication as parallel
such that -polly-parallel is able to parallelize that loop.

Differential Revision: https://reviews.llvm.org/D58202

llvm-svn: 362257
2019-05-31 19:26:57 +00:00
Michael Kruse 7860c5fe4e [IslAst] Fix InParallelFor nesting.
IslAst could mark two nested outer loops as "OutermostParallel". It
caused that the code generator tried to OpenMP-parallelize both loops,
which it is not prepared loop.

It was because the recursive AST build algorithm managed a flag
"InParallelFor" to ensure that no nested loop is also marked as
"OutermostParallel". Unfortunatetly the same flag was used by nodes
marked as SIMD, and reset to false after the SIMD node. Since loops can
be marked as SIMD inside "OutermostParallel" loops, the recursive
algorithm again tried to mark loops as "OutermostParellel" although
still nested inside another "OutermostParallel" loop.

The fix exposed another bug: The function "astScheduleDimIsParallel" was
only called when a loop was potentially "OutermostParallel" or
"InnermostParallel", but as a side-effect also determines the minimum
dependence distance. Hence, changing when we need to know whether a loop
is "OutermostParallel" also changed which loop was annotated with
"#pragma minimal dependence distance".

Moreover, some complex condition linked with "InParallelFor" determined
whether a loop should be an "InnermostParallel" loop. It missed some
situations where it would not use mark as such although being inside an
SIMD mark node, and therefore not be annotated using "#pragma simd".

The changes in particular:

1. Split the "InParallelFor" flag into an "InParallelFor" and an
   "InSIMD" flag.

2. Unconditionally call "astScheduleDimIsParallel" for its side-effects
   and store the result in "InParallel" for later use.

3. Simplify the condition when a loop is "InnermostParallel".

Fixes llvm.org/PR33153 and llvm.org/PR38073.

llvm-svn: 343212
2018-09-27 13:39:37 +00:00
Tobias Grosser fa8079d0dc Update isl to isl-0.18-1047-g4a20ef8
This update:

  - Removes several deprecated functions (e.g., isl_band).
  - Improves the pretty-printing of sets by detecting modulos and "false"
    equalities.
  - Minor improvements to coalescing and increased robustness of the isl
    scheduler.

This update does not yet include isl commit isl-0.18-90-gd00cb45
(isl_pw_*_alloc: add missing check for compatible spaces, Wed Sep 6 12:18:04
2017 +0200), as this additional check is too tight and unfortunately causes
two test case failures in Polly. A patch has been submitted to isl and will be
included in the next isl update for Polly.

llvm-svn: 325557
2018-02-20 07:26:42 +00:00
Philip Pfaffe 00fd43b327 Port ScopInfo to the isl cpp bindings
Summary:
Most changes are mechanical, but in one place I changed the program semantics
by fixing a likely bug:

In `Scop::hasFeasibleRuntimeContext()`, I'm now explicitely handling the
error-case. Before, when the call to `addNonEmptyDomainConstraints()`
returned a null set, this (probably) accidentally worked because
isl_bool_error converts to true. I'm checking for nullptr now.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: Meinersbur

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D39971

llvm-svn: 318632
2017-11-19 22:13:34 +00:00
Michael Kruse 8dceb76066 [ScheduleOptimizer] Fix and test schedule tree statistics.
Fix walking over the schedule tree to collect its properties
(Number of permutable bands etc.).

Also add regression tests for these statistics.

llvm-svn: 313750
2017-09-20 11:53:05 +00:00
Roman Gareev 925ce50f1b Unroll and separate the remaining parts of isolation
The remaining parts produced by the full partial tile isolation can contain
hot spots that are worth to be optimized. Currently, we rely on the simple
loop unrolling pass, LiCM and the SLP vectorizer to optimize such parts.
However, the approach can suffer from the lack of the information about
aliasing that Polly provides using additional alias metadata or/and the lack
of the information required by simple loop unrolling pass.

This patch is the first step to optimize the remaining parts. To do it, we
unroll and separate them. In case of, for instance, Intel Kaby Lake, it helps
to increase the performance of the generated code from 39.87 GFlop/s to
49.23 GFlop/s.

The next possible step is to avoid unrolling performed by Polly in case of
isolated and remaining parts and rely only on simple loop unrolling pass and
the Loop vectorizer.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D37692

llvm-svn: 312929
2017-09-11 17:46:47 +00:00
Tobias Grosser 4a07bbe3f6 [IRBuilder] Only emit alias scop metadata for arrays, but not scalars
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.

This improves 2mm performance from 1.5 seconds to 0.17 seconds.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37028

llvm-svn: 311498
2017-08-22 21:58:48 +00:00
Roman Gareev 0956a606ff Disable the Loop Vectorizer in case of GEMM
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36928

llvm-svn: 311473
2017-08-22 17:38:46 +00:00
Michael Kruse d091bf8d8e [MatMul] Make MatMul detection independent of internal isl representations.
The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36460

llvm-svn: 311302
2017-08-20 21:31:11 +00:00
Roman Gareev 1563f039f5 Use SCEV information for the second level aliasing
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.

In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D35761

llvm-svn: 310380
2017-08-08 16:50:28 +00:00
Roman Gareev dbde718676 Do not use isl_set_project_out to get all loop prefixes
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36278

llvm-svn: 310374
2017-08-08 16:15:33 +00:00
Tobias Grosser 327e9ecb0d [ScheduleOptimizer] Make matmul pattern detection work with delicm output
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.

llvm-svn: 310338
2017-08-08 06:15:15 +00:00
Roman Gareev 5abea0c97a [FIX] Update test/ScheduleOptimizer/pattern-matching-based-opts_11.ll.
llvm-svn: 308501
2017-07-19 18:01:51 +00:00
Roman Gareev 6531df41ae [FIX] Fix pattern-matching-based-opts_11.ll.
llvm-svn: 308499
2017-07-19 17:33:42 +00:00
Roman Gareev 750374181b Make the pattern matching work with modified memory accesses
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D33138

llvm-svn: 308494
2017-07-19 16:59:06 +00:00
Michael Kruse bb7d22a31a [Test] Do not pipe binary data to FileCheck.
llvm-svn: 308437
2017-07-19 11:12:16 +00:00
Tobias Grosser 4556c9b8fe [ScopInfo] Simplify new access functions under domain context
Summary:
We do not keep domain constraints on access functions when building the
scop. Hence, for consistency reasons, it makes also sense to not include
them when storing a new access function. This change results in simpler
access functions that make output easier to read.

This patch also helps to make DeLICMed memory accesses to be understood by
our matrix multiplication pattern matching pass. Further changes to the
matrix multiplication pattern matching are needed for this to work, so the
corresponding test case will be added in a future commit.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35237

llvm-svn: 308215
2017-07-17 20:47:10 +00:00
Tobias Grosser 5e41458985 Bump isl to isl-0.18-768-g033b61ae
Summary: This is a general maintenance update

Reviewers: grosser

Subscribers: srhines, fedor.sergeev, pollydev, llvm-commits

Contributed-by: Maximilian Falkenstein <falkensm@student.ethz.ch>

Differential Revision: https://reviews.llvm.org/D34903

llvm-svn: 307090
2017-07-04 15:54:11 +00:00
Roman Gareev c4a4d04717 [FIX] A small addition to r305675.
llvm-svn: 306234
2017-06-25 06:30:11 +00:00
Michael Kruse 214deb7960 [CodeGen] Emit aliasing metadata for new arrays.
Ensure that all array base pointers are assigned before generating
aliasing metadata by allocating new arrays beforehand.

Before this patch, getBasePtr() returned nullptr for new arrays because
the arrays were created at a later point. Nullptr did not match to any
array after the created array base pointers have been assigned and when
the loads/stores are generated.

llvm-svn: 305675
2017-06-19 10:19:29 +00:00
Hongbin Zheng 4fe342cb75 [Polly] Generate more 'canonical' induction variable
Today Polly generates induction variable in this way:

polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar, (UB - stride)

Instead of:

polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar.next, UB

The way Polly generate induction variable cause some problem in the indvar simplify pass.
This patch make polly generate the later form, by assuming the induction variable never overflow

Differential Revision: https://reviews.llvm.org/D33089

llvm-svn: 302866
2017-05-12 02:17:15 +00:00
Philip Pfaffe 78265cd237 Fix missing .git/indexloadPolly in ensure-correct-tile-sizes testcase
llvm-svn: 299765
2017-04-07 12:55:26 +00:00
Roman Gareev e0d466342b Restore the initial ordering of dimensions before applying the pattern matching
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.

For example, in case of the following matrix-matrix multiplication,

for (i = 0; i < 1024; i++)
  for (k = 0; k < 1024; k++)
    for (j = 0; j < 1024; j++)
      C[i][j] += A[i][k] * B[k][j];

it can produce the following schedule tree

domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
                                        0 <= i2 <= 1023 }"
child:
  schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i1)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
  permutable: 1
  coincident: [ 1, 1, 0 ]

The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).

This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.

Refs.:

[1] - https://bugs.llvm.org/show_bug.cgi?id=32500

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D31741

llvm-svn: 299662
2017-04-06 17:09:54 +00:00
Siddharth Bhat 5eeb1dd42e [Polly] [ScheduleOptimizer] Prevent incorrect tile size computation
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.

Check against a possible divide-by-zero in getMacroKernelParams
and return early.

Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31708

llvm-svn: 299633
2017-04-06 08:20:22 +00:00
Roman Gareev cdfb57dc46 Introduce another level of metadata to distinguish non-aliasing accesses
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30606

llvm-svn: 298510
2017-03-22 14:25:24 +00:00
Tobias Grosser c9d4cb2f42 [ScheduleOptimizer] Allow tiling after fusion
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.

The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".

Reviewers: grosser

Subscribers: gareevroman, pollydev

Tags: #polly

Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>

Differential Revision: https://reviews.llvm.org/D30815

llvm-svn: 297587
2017-03-12 19:02:31 +00:00
Roman Gareev 96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Roman Gareev b196055c0c Check reduction dependencies in case of the matrix multiplication optimization
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Sven Verdoolaege <skimo-polly@kotnet.org>
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29814

llvm-svn: 294836
2017-02-11 09:59:09 +00:00