Commit Graph

2333 Commits

Author SHA1 Message Date
Tobias Grosser de244eb450 Possible error in doc comment
If a SCoP is most probably sequential, then it's better to run it on a CPU.
Hence, there's no point in running it on a GPU.

Reviewers: grosser

Subscribers: nemanjai

Tags: #polly

Contributed-by: Singapuram Sanjay <singapuram.sanjay@gmail.com>

Differential Revision: https://reviews.llvm.org/D30864

llvm-svn: 297578
2017-03-12 08:19:01 +00:00
Tobias Grosser b2347dc241 [isl++] Add missing /* implicit */ marker
llvm-svn: 297577
2017-03-12 08:17:50 +00:00
Tobias Grosser 5ac963743f [isl++] Add last set of missing isl:: prefixes to increase consistency [NFC]
llvm-svn: 297558
2017-03-11 07:58:12 +00:00
Tobias Grosser d67d368e12 [isl++] Add namespace prefixes to isl::ctx and isl::stat
These were missed in r297478. We add them for consistency.

llvm-svn: 297520
2017-03-10 22:10:19 +00:00
Tobias Grosser 30a06dce68 [isl++] Drop warning about experimental status
As most discussions about these bindings have concluded and only the final
patch review on the isl mailing list is missing, we drop the experimental
warning tag to match the patchset we will submit to isl, which is expected to
not change notably any more.

llvm-svn: 297519
2017-03-10 22:10:15 +00:00
Tobias Grosser 9839774e5d [isl++] Do not use enum prefix
Instead of declaring a function as:

  inline val plain_get_val_if_fixed(enum dim type, unsigned int pos) const;

we use:

  inline isl::val plain_get_val_if_fixed(isl::dim type, unsigned int pos) const;

The first argument caused the following compile time error on windows:

  "error C3431: 'dim': a scoped enumeration cannot be redeclared as an
  unscoped enumeration"

In some cases it is sufficient to just drop the 'enum' prefix, but for example
for isl::set the 'enum class dim' type collides with the function name
isl::set::dim and can consequently not be referenced. To avoid such kind of
ambiguities in the future we add the isl:: prefix consistently to all types
used.

Reported-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 297478
2017-03-10 17:01:30 +00:00
Michael Kruse 0446d81e2d [Simplify] Add -polly-simplify pass.
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.

It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.

It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.

Differential Revision: https://reviews.llvm.org/D30820

llvm-svn: 297473
2017-03-10 16:05:24 +00:00
Tobias Grosser 3e618c33fe [DeadCodeElimination] Translate to C++ bindings
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.

As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.

Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.

Check properties
----------------

Before:

  if (isl_aff_is_zero(aff) ||  isl_aff_is_one(aff))
    return true;

After:

  if (Aff.is_zero() || Aff.is_one())
    return true;

Type conversion
---------------

Before:

  isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);

After:

  isl::union_pw_multi_aff UPMA = UMap;

Type construction
-----------------

Before:

  auto *Empty = isl_union_map_empty(space);

After:

  auto Empty = isl::union_map::empty(Space);

Operations
----------

Before:

  set = isl_union_set_intersect(set, set2);

After:

  Set = Set.intersect(Set2);

The use of isl::boolean in return types also adds an increases the robustness
of Polly, as on conversion to true or false, we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) exection path.

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30619

llvm-svn: 297466
2017-03-10 15:05:38 +00:00
Tobias Grosser 51ebda8c9d [FlattenAlgo] Translate to C++ bindings
Translate the full algorithm to use the new isl C++ bindings

This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.

These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.

Check properties
----------------

Before:

  if (isl_aff_is_zero(Aff.get()) ||  isl_aff_is_one(Aff.get()))
    return true;

After:

  if (Aff.is_zero() || Aff.is_one())
    return true;

Type conversion
---------------

Before:

  isl_union_pw_multi_aff *UPMA =
      give(isl_union_pw_multi_aff_from_union_map(UMap.copy());

After:

  isl::union_pw_multi_aff UPMA = UMap;

Type construction
-----------------

Before:

  auto Empty = give(isl_union_map_empty(Space.copy());

After:

  auto Empty = isl::union_map::empty(Space);

Operations
----------

Before:

  Set = give(isl_union_set_intersect(Set.copy(), Set2.copy());

After:

  Set = Set.intersect(Set2);

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30617

llvm-svn: 297463
2017-03-10 14:55:58 +00:00
Tobias Grosser 4c24e57965 Add method interface to isl C++ bindings
The isl C++ binding method interface introduces a thin C++ layer that allows
to call isl methods directly on the memory managed C++ objects. This makes the
relevant methods directly available via code-completion interfaces, allows for
the use of overloading, conversion constructors, and many other nice C++
features that make using isl a lot easier.

The individual features will be highlighted in the subsequent commits.

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30616

llvm-svn: 297462
2017-03-10 14:53:00 +00:00
Tobias Grosser deaef15f52 Introduce isl C++ bindings, Part 1: value_ptr style interface
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.

This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.

An isl object has the following smart pointer interface:

    inline set manage(__isl_take isl_set *ptr);

    class set {
      friend inline set manage(__isl_take isl_set *ptr);
      isl_set *ptr = nullptr;
      inline explicit set(__isl_take isl_set *ptr);

    public:
      inline set();
      inline set(const set &obj);
      inline set &operator=(set obj);
      inline ~set();
      inline __isl_give isl_set *copy() const &;
      inline __isl_give isl_set *copy() && = delete;
      inline __isl_keep isl_set *get() const;
      inline __isl_give isl_set *release();
      inline bool is_null() const;
    }

The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.

We currently only provide a limited set of public constructors and instead
require provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows to convert an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.

The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:

S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.

We additionally add std::obj::copy(). S.copy() returns a raw pointer refering
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.

S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.

This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:

	- operator bool() : Conversion from objects to bool
	- construction from nullptr_t
	- get_ctx() method
	- take/keep/give methods, which match the currently used naming
	  convention of IslPtr<T> in Polly. They just forward to
	  (release/get/manage).
	- raw_ostream printers

We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.

We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)

As part of the code review, the following two questions were asked:

- Why do we not use a standard smart pointer?

std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialiced value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.

- Why do we not use templates or macros to avoid code duplication?

It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided to do so at the moment.

These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself did not change since serveral months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30325

llvm-svn: 297452
2017-03-10 11:41:03 +00:00
Tobias Grosser e5671e54c0 Update to isl-0.18-356-g0b05d01
This is a regular maintenance update.

llvm-svn: 297449
2017-03-10 09:17:55 +00:00
Michael Kruse e4292bf086 [Support] Add -polly-dump-module pass.
This pass allows writing the LLVM-IR just before and after the Polly
passes to a file.

Dumping the IR before Polly helps reproducing bugs that occur in code
generated by clang. It is the only reliable way to get the IR that
triggers a bug. The alternative is to emit the IR with

    clang -c -emit-llvm -S -o dump.ll

then pass it through all optimization passes

    opt dump.ll -basicaa -sroa ... -S -o optdump.ll

to then reproduce the error with

    opt optdump.ll -polly-opt-isl -polly-codegen -analyze

However, the IR is not the same. -O3 uses a PassBuilder than creates passes
with different parameters than the default.

Dumping the IR after Polly is useful to compare a miscompilation with
a known-good configuration.

Differential Revision: https://reviews.llvm.org/D30788

llvm-svn: 297415
2017-03-09 22:29:58 +00:00
Tobias Grosser 8bd7f3c0a5 [ScopDetect/Info] Allow unconditional hoisting of loads from dereferenceable ptrs
In case LLVM pointers are annotated with !dereferencable attributes/metadata
or LLVM can look at the allocation from which a pointer is derived, we can know
that dereferencing pointers is safe and can be done unconditionally. We use this
information to proof certain pointers as save to hoist and then hoist them
unconditionally.

llvm-svn: 297375
2017-03-09 11:36:00 +00:00
Michael Kruse 9fb3ab1b19 [DeLICM] Add -polly-delicm-overapproximate-writes option.
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.

This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.

Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.

The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.

A patch that makes codegen support partial accesses is in preparation
as well.

Differential Revision: https://reviews.llvm.org/D30763

llvm-svn: 297373
2017-03-09 11:23:22 +00:00
Michael Kruse 935b2a3654 [DeadCodeElim] Put -polly-dce-precise-steps into the Polly category.
llvm-svn: 297318
2017-03-08 23:25:35 +00:00
Michael Kruse 6744efa8d8 [ScopDetection] Only allow SCoP-wide available base pointers.
Simplify ScopDetection::isInvariant(). Essentially deny everything that
is defined within the SCoP and is not load-hoisted.

The previous understanding of "invariant" has a few holes:

- Expressions without side-effects with only invariant arguments, but
  are defined withing the SCoP's region with the exception of selects
  and PHIs. These should be part of the index expression derived by
  ScalarEvolution and not of the base pointer.

- Function calls with that are !mayHaveSideEffects() (typically
  functions with "readnone nounwind" attributes). An example is given
  below.

      @C = external global i32
      declare float* @getNextBasePtr(float*) readnone nounwind
      ...
      %ptr = call float* @getNextBasePtr(float* %A, float %B)

  The call might return:

  * %A, so %ptr aliases with it in the SCoP
  * %B, so %ptr aliases with it in the SCoP
  * @C, so %ptr aliases with it in the SCoP
  * a new pointer everytime it is called, such as malloc()
  * a pointer into the allocated block of one of the aforementioned
  * any of the above, at random at each call

  Hence and contrast to a comment in the base_pointer.ll regression
  test, %ptr is not necessarily the same all the time. It might also
  alias with anything and no AliasAnalysis can tell otherwise if the
  definition is external. It is hence not suitable in the role of a
  base pointer.

The practical problem with base pointers defined in SCoP statements is
that it is not available globally in the SCoP. The statement instance
must be executed first before the base pointer can be used. This is no
problem if the base pointer is transferred as a scalar value between
statements. Uses of MemoryAccess::setNewAccessRelation may add a use of
the base pointer anywhere in the array. setNewAccessRelation is used by
JSONImporter, DeLICM and D28518. Indeed, BlockGenerator currently
assumes that base pointers are available globally and generates invalid
code for new access relation (referring to the base pointer of the
original code) if not, even if the base pointer would be available in
the statement.

This could be fixed with some added complexity and restrictions. The
ExprBuilder must lookup the local BBMap and code that call
setNewAccessRelation must check whether the base pointer is available
first.

The code would still be incorrect in the presence of aliasing. There
is the switch -polly-ignore-aliasing to explicitly allow this, but
it is hardly a justification for the additional complexity. It would
still be mostly useless because in most cases either getNextBasePtr()
has external linkage in which case the readnone nounwind attributes
cannot be derived in the translation unit itself, or is defined in the
same translation unit and gets inlined.

Reviewed By: grosser

Differential Revision: https://reviews.llvm.org/D30695

llvm-svn: 297281
2017-03-08 15:14:46 +00:00
Michael Kruse 5a4ec5c42b [ScopDetection] Require LoadInst base pointers to be hoisted.
Only when load-hoisted we can be sure the base pointer is invariant
during the SCoP's execution. Most of the time it would be added to
the required hoists for the alias checks anyway, except with
-polly-ignore-aliasing, -polly-use-runtime-alias-checks=0 or if
AliasAnalysis is already sure it doesn't alias with anything
(for instance if there is no other pointer to alias with).

Two more parts in Polly assume that this load-hoisting took place:
- setNewAccessRelation() which contains an assert which tests this.
- BlockGenerator which would use to the base ptr from the original
  code if not load-hoisted (if the access expression is regenerated)

Differential Revision: https://reviews.llvm.org/D30694

llvm-svn: 297195
2017-03-07 20:28:43 +00:00
Tobias Grosser a0b85963ba Update isl to isl-0.18-336-g1e193d9
This is a regular maintenance update

llvm-svn: 297169
2017-03-07 17:53:34 +00:00
Tobias Grosser ce69e7b593 [ScopInfo] Avoid infinite loop during schedule construction
Our current scop modeling enters an infinite loop when trying to model code
that has unreachable instructions (e.g.,
test/ScopInfo/BoundChecks/single-loop.ll), as the number of basic blocks
returned by the LLVM Loop* does not include unreachable basic blocks that
branch off from the core loop body. This arises for example in the following
piece of code:

  for (i = 0; i < N; i++) {
    if (i > 1024)
      abort();            <- this abort might be translated to an
                             unreachable

    A[i] = ...
  }

This patch adds these unreachable basic blocks in our per loop basic block
count to ensure that the schedule construction does not assume a loop has been
processed completely, despite certain unreachable basic blocks still remaining.

The infinite loop is only observable in combination with
https://reviews.llvm.org/D12676 or a similar patch.

llvm-svn: 297156
2017-03-07 16:17:55 +00:00
Tobias Grosser 134a572951 [ScopDetection] Do not detect scops that exit to an unreachable
Scops that exit with an unreachable are today still permitted, but make little
sense to optimize. We therefore can already skip them during scop detection.
This speeds up scop detection in certain cases and also ensures that bugpoint
does not introduce unreachables when reducing test cases.

In practice this change should have little impact, as the performance of
unreachable code is unlikely to matter.

This commit is part of a series that makes Polly more robust in the presence
of unreachables.

llvm-svn: 297151
2017-03-07 15:50:43 +00:00
Tobias Grosser 1c787e0b49 [ScopDetection] Do not allow required-invariant loads in non-affine region
These loads cannot be savely hoisted as the condition guarding the
non-affine region cannot be duplicated to also protect the hoisted load
later on. Today they are dropped in ScopInfo. By checking for this early, we
do not even try to model them and possibly can still optimize smaller regions
not containing this specific required-invariant load.

llvm-svn: 296744
2017-03-02 12:15:37 +00:00
Tobias Grosser c2f151084d [ScopInfo] Disable memory folding in case it results in multi-disjunct relations
Multi-disjunct access maps can easily result in inbound assumptions which
explode in case of many memory accesses and many parameters. This change reduces
compilation time of some larger kernel from over 15 minutes to less than 16
seconds.

Interesting is the test case test/ScopInfo/multidim_param_in_subscript.ll
which has a memory access

  [n] -> { Stmt_for_body3[i0, i1] -> MemRef_A[i0, -1 + n - i1] }

which requires folding, but where only a single disjunct remains. We can still
model this test case even when only using limited memory folding.

For people only reading commit messages, here the comment that explains what
memory folding is:

To recover memory accesses with array size parameters in the subscript
expression we post-process the delinearization results.

We would normally recover from an access A[exp0(i) * N + exp1(i)] into an
array A[][N] the 2D access A[exp0(i)][exp1(i)]. However, another valid
delinearization is A[exp0(i) - 1][exp1(i) + N] which - depending on the
range of exp1(i) - may be preferrable. Specifically, for cases where we
know exp1(i) is negative, we want to choose the latter expression.

As we commonly do not have any information about the range of exp1(i),
we do not choose one of the two options, but instead create a piecewise
access function that adds the (-1, N) offsets as soon as exp1(i) becomes
negative. For a 2D array such an access function is created by applying
the piecewise map:

[i,j] -> [i, j] :      j >= 0
[i,j] -> [i-1, j+N] :  j <  0

After this patch we generate only the first case, except for situations where
we can proove the first case to be invalid and can consequently select the
second without introducing disjuncts.

llvm-svn: 296679
2017-03-01 21:11:27 +00:00
Tobias Grosser 24222c7357 Fix namespaces after clang-format update
llvm-svn: 296635
2017-03-01 15:54:27 +00:00
Tobias Grosser d7c4975349 [ScopInfo] Simplify inbounds assumptions under domain constraints
Without this simplification for a loop nest:

  void foo(long n1_a, long n1_b, long n1_c, long n1_d,
           long p1_b, long p1_c, long p1_d,
           float A_1[][p1_b][p1_c][p1_d]) {
    for (long i = 0; i < n1_a; i++)
      for (long j = 0; j < n1_b; j++)
        for (long k = 0; k < n1_c; k++)
          for (long l = 0; l < n1_d; l++)
            A_1[i][j][k][l] += i + j + k + l;
 }

the assumption:

  n1_a <= 0 or (n1_a > 0 and n1_b <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d <= 0) or
  (n1_a > 0 and n1_b > 0 and n1_c > 0 and n1_d > 0 and
   p1_b >= n1_b and p1_c >= n1_c and p1_d >= n1_d)

is taken rather than the simpler assumption:

  p9_b >= n9_b and p9_c >= n9_c and p9_d >= n9_d.

The former is less strict, as it allows arbitrary values of p1_* in case, the
loop is not executed at all. However, in practice these precise constraints
explode when combined across different accesses and loops. For now it seems
to make more sense to take less precise, but more scalable constraints by
default. In case we find a practical example where more precise constraints
are needed, we can think about allowing such precise constraints in specific
situations where they help.

This change speeds up the new test case from taking very long (waited at least
a minute, but it probably takes a lot more) to below a second.

llvm-svn: 296456
2017-02-28 09:45:54 +00:00
Tobias Grosser cf66ea3845 Update isl to isl-0.18-304-g1efe43d
This is a normal maintenance update.

llvm-svn: 296441
2017-02-28 07:06:06 +00:00
Michael Kruse 6469380daa [Cmake] Optionally use a system isl version.
This patch adds an option to build against a version of libisl already
installed on the system. The installation is autodetected using the
pkg-config file shipped with isl.

The detection of the library is in the FindISL.cmake module that creates
an imported target.

Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>

Differential Revision: https://reviews.llvm.org/D30043

llvm-svn: 296361
2017-02-27 17:54:25 +00:00
Michael Kruse b295c37a15 [DeLICM] Statistics for use in regression tests.
Print some measurements of the DeLICM transformation at -analyze to be
used in regression tests.

llvm-svn: 296347
2017-02-27 15:53:13 +00:00
Roman Gareev bc3fbe49c5 Disable the parallel code generation in case of extension nodes
We can not perform the dependence analysis and, consequently, the parallel
code generation in case the schedule tree contains extension nodes.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30394

llvm-svn: 296325
2017-02-27 08:03:11 +00:00
Michael Kruse e199f285b0 [DeLICM] Fortify against exceeding isl's max operations counter.
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.

Fix by erroring-out early instead of falling-throught into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.

This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.

llvm-svn: 296022
2017-02-23 21:58:20 +00:00
Michael Kruse f4e201e09f [Support] Remove NonowningIslPtr. NFC.
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.

The alternatives are:

1. IslPtr<isl_X>
   This has additional calls to isl_X_copy and isl_X_free to
   increase/decrease the reference counter even though not needed. The
   caller already owns a reference to the isl object.

2. const IslPtr<isl_X>&
   This does not change the reference counter, but requires an
   additional load to get the pointer to the isl object (instead of just
   passing the pointer itself).
   Moreover, the compiler cannot rely on the constness of the pointer
   and has to reload the pointer every time it writes to memory (unless
   alias analysis such as TBAA says it is not possible).

The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned aboce
(const IslPtr<isl_X>&).

llvm-svn: 295998
2017-02-23 17:57:27 +00:00
Michael Kruse 2c7169d00c [DependenceInfo] Remove unused variable. NFC.
llvm-svn: 295987
2017-02-23 15:41:01 +00:00
Michael Kruse dd6f29375b [DependenceInfo] Use references instead of double pointers. NFC.
Non-const references are the more C++-ish way to modify a variable
passed by the caller.

llvm-svn: 295986
2017-02-23 15:40:56 +00:00
Michael Kruse ec8fc32160 [DependenceInfo] Rename StmtScheduleDomain -> TaggedStmtDomain. NFC.
llvm-svn: 295985
2017-02-23 15:40:52 +00:00
Michael Kruse 00c38e0df2 [DependenceInfo] Simplify use of StmtSchedule's domain [NFC]
Once a StmtSchedule is created, only its domain is used anywhere within
DependenceInfo::calculateDependences. So, we choose to return the
wrapped domain of the union_map rather than the entire union_map.

However, we still build the union_map first within collectInfo(). It is
cleaner to first build the entire union_map and then pull the domain out in
one shot, rather than repeatedly extracting the domain in bits and pieces
from accdom.

Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>

Differential Revision: https://reviews.llvm.org/D30208

llvm-svn: 295984
2017-02-23 15:40:46 +00:00
Michael Kruse 52ab4943b4 Remove all references to PostDominators. NFC.
Marking a pass as preserved is necessary if any Polly pass uses it, even
if it is not preserved within the generated code. Not marking it would
cause the the Polly pass chain to be interrupted. It is not used by any
Polly pass anymore, hence we can remove all references to it.

llvm-svn: 295983
2017-02-23 15:16:22 +00:00
Michael Kruse 9f519714b3 [DeLICM] Add missing Doxygen comment. NFC.
llvm-svn: 295978
2017-02-23 14:51:50 +00:00
Michael Kruse 311ecb00dc [DeLICM] Capitalize parameter name. NFC.
llvm-svn: 295977
2017-02-23 14:51:45 +00:00
Tobias Grosser 59d23bbdc6 Update isl to isl-0.18-282-g12465a5
Besides a variety of smaller cleanups, this update also contains a correctness
fix to isl coalesce which resolves a crash in Polly.

llvm-svn: 295966
2017-02-23 12:48:42 +00:00
Roman Gareev 96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Michael Kruse d8d32bb3d1 [DeLICM] Regression test for skipping map targets.
Add optimization-remarks-missed for when mapping targets have been
skipped and add regression tests for them.

llvm-svn: 295953
2017-02-23 10:25:20 +00:00
Michael Kruse deb30e8278 [DeLICM] Add regression tests for DeLICM reject cases.
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.

We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of messages. I tried to make use if the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule being thrown away.

Differential Revision: https://reviews.llvm.org/D30253

llvm-svn: 295846
2017-02-22 15:14:08 +00:00
Michael Kruse 8474470500 [DeLICM] Fix wrong comment. NFC.
Correct a comment that claimed that a store after load was detected
when the code checks a load after a store.

llvm-svn: 295835
2017-02-22 14:14:40 +00:00
Michael Kruse 43ed25f1d9 [DeLICM] Print message when zone analysis is not available on -analysis.
This is to distinguish the cases that analysis has failed from the case
where not transformation was performed.

llvm-svn: 295833
2017-02-22 13:48:35 +00:00
Michael Kruse 91cdafb86f [DeLICM] Use opt<int>.
There is no template specialization for cl::parser<unsigned long> such
that parsing an cl::opt<unsigned long> command line argument will fail.
Use opt<int> instead which has an associated parser.

llvm-svn: 295832
2017-02-22 13:48:18 +00:00
Tobias Grosser cc43087afc [DependenceInfo] Simplify creation and subsequent use of AccessSchedule [NFC]
We only ever use the wrapped domain of AccessSchedule, so stop
creating an entire union_map and then pulling the domain out.

Reviewers: grosser
Tags: #polly

Contributed-by: Siddharth Bhat <siddu.druid@gmail.com>

Differential Revision: https://reviews.llvm.org/D30179

llvm-svn: 295726
2017-02-21 15:38:31 +00:00
Michael Kruse 9e52c39f0a [DeLICM] Map values hoisted by LICM back to the array.
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.

The is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE for which additional information about known values
is needed and does not handle PHI write accesses that have have no
target. As such its usefulness is limited. Patches for these issues
including regression tests for error situatons will follow.

Reviewers: grosser

Differential Revision: https://reviews.llvm.org/D24716

llvm-svn: 295713
2017-02-21 10:20:54 +00:00
Michael Kruse 5ab24fdb73 [Cmake] Install the isl headers into the install tree.
isl headers are currently missing in a Polly installation. Because the
Polly headers depend on those, code can't be compiled against an
installed Polly.

This patch installs the isl headers. I left a TODO, as optionally it
should be possible to use a system version of isl instead of the one
shipped with Polly.

When compiling, clients of the installation need to add
-I${PREFIX}/include/polly/ to there include path right now, because
there currently is no way to export this path automatically.

Contributed-by: Philip Pfaffe <philip.pfaffe@gmail.com>

Differential Revision: https://reviews.llvm.org/D29931

llvm-svn: 295671
2017-02-20 16:57:14 +00:00
Tobias Grosser 079d511891 [ScopInfo] Count read-only arrays when computing complexity of alias check
Instead of counting the number of read-only accesses, we now count the number of
distinct read-only array references when checking if a run-time alias check
may be too complex. The run-time alias check is quadratic in the number of
base pointers, not the number of accesses.

Before this change we accidentally skipped SPEC's lbm test case.

llvm-svn: 295567
2017-02-18 20:51:29 +00:00
Tobias Grosser 28492b85e2 [DependenceInfo] Pull out statement [NFC]
This simplifies the code slightly.

llvm-svn: 295551
2017-02-18 16:41:28 +00:00
Tobias Grosser 8ee46985d2 [Dependences] Compute reduction dependences on schedule tree [NFC]
This change gets rid of the need for zero padding, makes the reduction
computation code more similar to the normal dependence computation, and also
better documents what we do at the moment.

Making the dependence computation for reductions a little bit easier to
understand will hopefully help us to further reduce code duplication.

This reduces the time spent only in the reduction dependence pass from 260ms to
150ms for test/DependenceInfo/reduction_sequence.ll. This is a reduction of over
40% in dependence computation time.

This change was inspired by discussions with Michael Kruse, Utpal Bora,
Siddharth Bhat, and Johannes Doerfert. It can hopefully lay the base for further
cleanups of the reduction code.

llvm-svn: 295550
2017-02-18 16:39:04 +00:00
Tobias Grosser 2461021150 Drop leftover debug statement
llvm-svn: 295444
2017-02-17 13:39:45 +00:00
Tobias Grosser cd01a363d6 [ScopInfo] Add statistics to count loops after scop modeling
llvm-svn: 295431
2017-02-17 08:12:36 +00:00
Tobias Grosser 65ce9362b8 [ScopDetection] Compute the maximal loop depth correctly
Before this change, we obtained loop depth numbers that were deeper then the
actual loop depth.

llvm-svn: 295430
2017-02-17 08:08:54 +00:00
Tobias Grosser 72745c2ef5 Updated isl to isl-0.18-254-g6bc184d
This update includes a couple more coalescing changes as well as a large
number of isl-internal code cleanups (dead assigments, ...).

llvm-svn: 295419
2017-02-17 05:11:16 +00:00
Tobias Grosser ca2cfd0bd8 [ScopInfo] Do not try to fold array dimensions of size zero
Trying to fold such kind of dimensions will result in a division by zero,
which crashes the compiler. As such arrays are likely to invalidate the
scop anyhow (but are not illegal in LLVM-IR), there is no point in trying
to optimize the array layout. Hence, we just avoid the folding of
constant dimensions of size zero.

llvm-svn: 295415
2017-02-17 04:48:52 +00:00
Tobias Grosser 90411a967b [ScopInfo] Rename MaxDisjunctions -> MaxDisjuncts [NFC]
There is only a single disjunction. However, we bound the number of 'disjuncts'
in this disjunction. Name the variable accordingly.

llvm-svn: 295362
2017-02-16 19:11:33 +00:00
Tobias Grosser c8a8276710 [ScopInfo] Bound the number of disjuncts in context
Before this change wrapping range metadata resulted in exponential growth of
the context, which made context construction of large scops very slow. Instead,
we now just do not model the range information precisely, in case the number
of disjuncts in the context has already reached a certain limit.

llvm-svn: 295360
2017-02-16 19:11:25 +00:00
Tobias Grosser 98a3aa4f19 [ScopInfo] Use uppercase variable name [NFC]
llvm-svn: 295350
2017-02-16 18:39:18 +00:00
Tobias Grosser 3281f601bb [ScopInfo] Always derive upper and lower bounds for parameters
Commit r230230 introduced the use of range metadata to derive bounds for
parameters, instead of just looking at the type of the parameter. As part of
this commit support for wrapping ranges was added, where the lower bound of a
parameter is larger than the upper bound:

  { 255 < p || p < 0 }

However, at the same time, for wrapping ranges support for adding bounds given
by the size of the containing type has acidentally been dropped. As a result,
the range of the parameters was not guaranteed to be bounded any more. This
change makes sure we always add the bounds given by the size of the type and
then additionally add bounds based on signed wrapping, if available. For a
parameter p with a type size of 32 bit, the valid range is then:

  { -2147483648 <= p <= 2147483647 and (255 < p or p < 0) }

llvm-svn: 295349
2017-02-16 18:39:14 +00:00
Roman Gareev 4eb07e481e [FIX] Fix the typo in ScheduleOptimizer.cpp.
llvm-svn: 295292
2017-02-16 07:04:41 +00:00
Michael Kruse e23e94a08d [DeLICM] Add Knowledge class. NFC.
The Knowledge class remembers the state of data at any timepoint of a SCoP's
execution. Currently, it tracks whether an array element is unused or is
occupied by some value, and the writes to it. A future addition will be to also
remember which value it contains.

Objects are used to determine whether two Knowledge contain conflicting
information, i.e. two states cannot be true a the same time.

This commit was extracted from the DeLICM algorithm at
https://reviews.llvm.org/D24716.

llvm-svn: 295197
2017-02-15 16:59:10 +00:00
Tobias Grosser 288c450cf6 [ScopDetectDiagnostics] Do not format unnamed array names
Formatting unnamed array names is expensive in LLVM as the this requires
deriving the numbered virtual instruction name (e.g., %12) for an llvm::Value,
which is currently not implemented efficiently. As instruction numberes anyhow
do not really carry a lot of information for the user, we just print 'unknown'
instead.

This change reduces the scop detection time from 24 to 19 seconds, for one of
our large-scale inputs. This is a reduction by 21%.

llvm-svn: 294894
2017-02-12 10:53:02 +00:00
Tobias Grosser 9fe37df27c [ScopDetection] Add statistics to count the maximal number of scops in loop
llvm-svn: 294893
2017-02-12 10:52:57 +00:00
Tobias Grosser b3a85884f7 Do not use wrapping ranges to bound non-affine accesses
When deriving the range of valid values of a scalar evolution expression might
be a range [12, 8), where the upper bound is smaller than the lower bound and
where the range is expected to possibly wrap around. We theoretically could
model such a range as a union of two non-wrapping ranges, but do not do this
as of yet. Instead, we just do not derive any bounds. Before this change,
we could have obtained bounds where the maximal possible value is strictly
smaller than the minimal possible value, which is incorrect and also caused
assertions during scop modeling.

llvm-svn: 294891
2017-02-12 08:11:12 +00:00
Roman Gareev b196055c0c Check reduction dependencies in case of the matrix multiplication optimization
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Sven Verdoolaege <skimo-polly@kotnet.org>
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29814

llvm-svn: 294836
2017-02-11 09:59:09 +00:00
Roman Gareev de69293b01 [FIX] Fix the potential issue of containsOnlyMatMulDep.
llvm-svn: 294835
2017-02-11 09:48:09 +00:00
Roman Gareev 5ef7e210c0 [NFC] Fix the style issue of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294834
2017-02-11 08:43:41 +00:00
Roman Gareev afcf026d81 [NFC] Fix style issues of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294831
2017-02-11 07:14:37 +00:00
Roman Gareev 3d4eae31ea Use the size of the widest type of the matrix multiplication operands
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29269

llvm-svn: 294828
2017-02-11 07:00:05 +00:00
Tobias Grosser 296fe2e2ad [ScopInfo] Use original base address when building ScopArrayInfo [NFC]
This change clarfies that we want to indeed use the original base address
when creating the ScopArrayInfo that corresponds to a given memory access.

This change prepares for https://reviews.llvm.org/D28518.

llvm-svn: 294734
2017-02-10 10:09:46 +00:00
Tobias Grosser 5db171a9da [ScopInfo] Use getAccessValue to obtain the accessed value
This replaces the use of getOriginalAddrPtr, a value that is stored in
ScopArrayInfo and might at some point not be unique any more. However, the
access value is defined to be unique.

This change is an update on r294576, which only clarified that we need the
original memory access, but where we still remained dependent to have one base
pointer per scop.

This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294733
2017-02-10 10:09:44 +00:00
Tobias Grosser 583be06fb2 [BlockGenerator] Use MemoryAccess::getAccessValue to get load instruction
When generating code in the BlockGenerator we copy all (interesting)
instructions and keep track of the new values in a basic block map. To obtain
the original llvm::Value that belongs to a load memory access, we use
getAccessValue() instead of getOriginalBaseAddr(). The former always references
the instruction we use to load values from. The latter, on the other hand,
is obtaine from the corresponding ScopArrayInfo and would not be unique in
case ScopArrayInfo objects at some point allow memory accesses with different
base addresses.

This change is an update on r294566, which only clarified that we need the
original memory access, but where we still remained dependent to have one
base pointer per scop.

This change removes unnecessary uses of MemoryAddress::getOriginalBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294669
2017-02-09 23:54:23 +00:00
Tobias Grosser e24b7b929d [ScopInfo] Use MemoryAccess::getScopArrayInfo() interface to access Array [NFC]
By using the public interface MemoryAccess::getScopArrayInfo() we avoid the
direct access to the ScopArrayInfoMap and as a result also do not need to
use the BasePtr as key. This change makes the code cleaner.

The const-cast we introduce is a little ugly. We may consider to drop const
correctness for getScopArrayInfo() at some point.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294655
2017-02-09 23:24:57 +00:00
Tobias Grosser 9c7d181c92 [ScopInfo] Use types instead of 'auto' and use more descriptive variable names [NFC]
LLVM's coding conventions suggest to use auto only in obvious cases. Hence,
we move this code to actually declare the types used. We also replace the
variable name 'SAI', with the name 'Array', as this improves readability.

llvm-svn: 294654
2017-02-09 23:24:54 +00:00
Tobias Grosser 889830b1c5 [ScopInfo] Use ScopArrayInfo instead of base address
When building alias groups, we sort different ScopArrays into unrelated groups.
Historically we identified arrays through their base pointer, as no
ScopArrayInfo class was yet available. This change changes the alias group
construction to reference arrays through their ScopArrayInfo object.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294649
2017-02-09 23:12:22 +00:00
Tobias Grosser be372d5a04 [ScopInfo] Expect the OriginalBaseAddr when looking at underlying instructions [NFC]
During SCoP construction we sometimes inspect the underlying IR by looking at
the base address of a MemoryAccess. In such cases, we always want the original
base address. Make this clear by calling getOriginalBaseAddr().

This is a non-functional change as getBaseAddr maps to getOriginalBaseAddr
at the moment.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294576
2017-02-09 10:11:58 +00:00
Tobias Grosser e0e0e4d4f6 [ScopInfo] Remove unnecessary indirection through SCEV [NFC]
The base address of a memory access is already an llvm::Value. Hence, there is
no need to go through SCEV, but we can directly work with the llvm::Value.

Also use 'Value *' instead of 'auto' for cases where the type is not obvious.

llvm-svn: 294575
2017-02-09 09:34:46 +00:00
Tobias Grosser 4553463be4 [IRBuilder] Extract base pointers directly from ScopArray
Instead of iterating over statements and their memory accesses to extract the
set of available base pointers, just directly iterate over all ScopArray
objects. This reflects more the actual intend of the code: collect all arrays
(and their base pointers) to emit alias information that specifies that accesses
to different arrays cannot alias.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294574
2017-02-09 09:34:42 +00:00
Tobias Grosser 26fb7d7517 [IslAst] Print the ScopArray name to mark reductions
Before this change we used the name of the base pointer to mark reductions. This
is imprecise as the canonical reference is the ScopArray itself and not the
basepointer of a reduction. Using the base pointer of reductions is problematic
in cases where a single ScopArray is referenced through two different base
pointers.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294568
2017-02-09 08:06:15 +00:00
Tobias Grosser 114f6d6ff5 [DependenceInfo] Use ScopArrayInfo to keep track of arrays [NFC]
When computing reduction dependences we first identify all ScopArrays which are
part of reductions and then only compute for these ScopArrays the more detailed
data dependences that allow us to identify reductions and optimize across them.
Instead of using the base pointer as identifier of a ScopArray, it is clearer
and more understandable to directly use the ScopArray as identifier. This change
implements such a switch.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294567
2017-02-09 08:06:05 +00:00
Tobias Grosser 02400a0e0c [BlockGenerator] BBMap uses original BaseAddress for scalar loads [NFC]
When regenerating code in the BlockGenerator we copy instructions that may
references scalar values, for which the new value of a given scalar is looked up
in BBMap using the original scalar llvm::Value as index. It is consequently
necessary that (re)loaded scalar values are made available in BBMap using the
original llvm::Value as key independently if the llvm::Value was (re)loaded from
the original scalar or a new access function has been specified that caused the
value to be reloaded from an array with a differnet base address. We make this
clear by using MemoryAccess::getOriginalBaseAddr() instead of
MemoryAccess::getBaseAddr() as index to BBMap.

This change removes unnecessary uses of MemoryAddress::getBaseAddr() in
preparation for https://reviews.llvm.org/D28518.

llvm-svn: 294566
2017-02-09 08:05:50 +00:00
Roman Gareev 9989088ee9 Isolate a set of partial tile prefixes in case of the matrix multiplication
optimization

Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.

In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.

The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.

In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.

It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29244

llvm-svn: 294564
2017-02-09 07:10:01 +00:00
Roman Gareev 772498dc68 [NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized
with optimizeMatMulPattern

This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.

llvm-svn: 294444
2017-02-08 13:29:06 +00:00
Michael Kruse 49c21222a0 [External] Move lib/JSON to lib/External/JSON. NFC.
For consistency with isl and ppcg which are already in lib/External.

llvm-svn: 294126
2017-02-05 15:26:56 +00:00
Michael Kruse acb08aaed5 [Support] Add convertZoneToTimepoints. NFC.
This function has been extracted from the upcoming DeLICM patch
(https://reviews.llvm.org/D24716).

In contrast to computeReachingWrite and computeArrayUnused,
convertZoneToTimepoints implies a format for zones (ranges between timepoints).
Zones at the moment are unique to DeLICM, but convertZoneToTimepoints makes most
sense in conjunction with the previous two functions.

llvm-svn: 294094
2017-02-04 15:42:17 +00:00
Michael Kruse ec67d36493 [Support] Add computeArrayUnused. NFC.
This function has been extracted from the upcoming DeLICM patch
(https://reviews.llvm.org/D24716).

llvm-svn: 294093
2017-02-04 15:42:10 +00:00
Michael Kruse f4dc133e69 [Support] Add computeReachingWrite. NFC.
This function has been extracted from the upcoming DeLICM patch
(https://reviews.llvm.org/D24716).

llvm-svn: 294092
2017-02-04 15:42:01 +00:00
Michael Kruse eeadf31de1 [Support] Remove unused function hasInvokeEdge. NFC.
llvm-svn: 294062
2017-02-03 22:53:10 +00:00
Roman Gareev 98075fe181 A new algorithm for identification of a SCoP statement that implement a matrix
multiplication

The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28357

llvm-svn: 293890
2017-02-02 14:23:14 +00:00
Tobias Grosser ff40087a6a Update to recent formatting changes
llvm-svn: 293756
2017-02-01 10:12:09 +00:00
Daniel Jasper baaa152294 Fix format after recent clang-format change.
llvm-svn: 293753
2017-02-01 09:31:42 +00:00
Roman Gareev 7758a2af53 Update the documentation on how the packing transformation is implemented
Add a simple example to update the documentation on how the packing
transformation is implemented.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D28021

llvm-svn: 293429
2017-01-29 10:37:50 +00:00
Tobias Grosser 682c51143d [BlockGenerator] Comment corretions for r293374 [NFC]
This addresses some additional comments from Michael Kruse for commit r293374
as expressed in https://reviews.llvm.org/D28901.

llvm-svn: 293378
2017-01-28 11:39:02 +00:00
Tobias Grosser 587f1f57ad [Polly] [BlockGenerator] Unify ScalarMap and PhiOpsMap
Instead of keeping two separate maps from Value to Allocas, one for
MemoryType::Value and the other for MemoryType::PHI, we introduce a single map
from ScopArrayInfo to the corresponding Alloca. This change is intended, both as
a general simplification and cleanup, but also to reduce our use of
MemoryAccess::getBaseAddr(). Moving away from using getBaseAddr() makes sure
we have only a single place where the array (and its base pointer) for which we
generate code for is specified, which means we can more easily introduce new
access functions that use a different ScopArrayInfo as base. We already today
experiment with modifiable access functions, so this change does not address
a specific bug, but it just reduces the scope one needs to reason about.

Another motivation for this patch is https://reviews.llvm.org/D28518, where
memory accesses with different base pointers could possibly be mapped to a
single ScopArrayInfo object. Such a mapping is currently not possible, as we
currently generate alloca instructions according to the base addresses of the
memory accesses, not according to the ScopArrayInfo object they belong to.  By
making allocas ScopArrayInfo specific, a mapping to a single ScopArrayInfo
object will automatically mean that the same stack slot is used for these
arrays. For D28518 this is not a problem, as only MemoryType::Array objects are
mapping, but resolving this inconsistency will hopefully avoid confusion.

llvm-svn: 293374
2017-01-28 07:42:10 +00:00
Michael Kruse d1508812f5 [Support] Add general isl tools for DeLICM. NFC.
Add some generally useful isl tools into a their own new ISLTools.cpp.
These are the helpers were extracted from and will be use by the DeLICM
algorithm (https://reviews.llvm.org/D24716).

Suggested-by: 	Tobias Grosser <tobias@grosser.es>
llvm-svn: 293340
2017-01-27 22:51:36 +00:00
Michael Kruse 33dc454700 [CodePrepa] Remove unused declaration. NFC.
llvm-svn: 293304
2017-01-27 16:59:09 +00:00
Tobias Grosser 77363965c0 [ScopDetectionDiagnostic] Add meaningfull enduser message for regions with entry block
Before this change the user only saw "Unspecified Error", when a region
contained the entry block. Now we report:

"Scop contains function entry (not yet supported)."

llvm-svn: 293169
2017-01-26 10:41:37 +00:00
Tobias Grosser 64bbb1357f ScopDetectionDiagnostics: Also emit diagnostics in case no debug info is available
In this case, we just use the start of the scop as the debug location.

llvm-svn: 293165
2017-01-26 10:30:55 +00:00
Tobias Grosser 75dfaa1dbe BlockGenerator: Do not redundantly reload from PHI-allocas in non-affine stmts
Before this change we created an additional reload in the copy of the incoming
block of a PHI node to reload the incoming value, even though the necessary
value has already been made available by the normally generated scalar loads.
In this change, we drop the code that generates this redundant reload and
instead just reuse the scalar value already available.

Besides making the generated code slightly cleaner, this change also makes sure
that scalar loads go through the normal logic, which means they can be remapped
(e.g. to array slots) and corresponding code is generated to load from the
remapped location. Without this change, the original scalar load at the
beginning of the non-affine region would have been remapped, but the redundant
scalar load would continue to load from the old PHI slot location.

It might be possible to further simplify the code in addOperandToPHI,
but this would not only mean to pull out getNewValue, but to also change the
insertion point update logic. As this did not work when trying it the first
time, this change is likely not trivial. To not introduce bugs last minute, we
postpone further simplications to a subsequent commit.

We also document the current behavior a little bit better.

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28892

llvm-svn: 292486
2017-01-19 14:12:45 +00:00
Tobias Grosser 943c369c60 BlockGenerator: remove obfuscating const and const casts
Making certain values 'const' to just cast it away a little later mainly
obfuscates the code. Hence, we just drop the 'const' parts.

Suggested-by: Michael Kruse <llvm@meinersbur.de>
llvm-svn: 292480
2017-01-19 13:25:52 +00:00
Tobias Grosser 97b8490982 Use range-based for loop [NFC]
llvm-svn: 292471
2017-01-19 05:09:23 +00:00
Tobias Grosser e1ff0cf2eb Relax assert when setting access functions with invariant base pointers
Summary:
Instead of forbidding such access functions completely, we verify that their
base pointer has been hoisted and only assert in case the base pointer was
not hoisted.

I was trying for a little while to get a test case that ensures the assert is
correctly fired in case of invariant load hoisting being disabled, but I could
not find a good way to do so, as llvm-lit immediately aborts if a command
yields a non-zero return value. As we do not generally test our asserts,
not having a test case here seems OK.

This resolves http://llvm.org/PR31494

Suggested-by: Michael Kruse <llvm@meinersbur.de>

Reviewers: efriedma, jdoerfert, Meinersbur, gareevroman, sebpop, zinob, huihuiz, pollydev

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D28798

llvm-svn: 292213
2017-01-17 12:00:42 +00:00
Eli Friedman 71329901ea Tidy up getFirstNonBoxedLoopFor [NFC]
Move the function getFirstNonBoxedLoopFor which is used in ScopBuilder
and in ScopInfo to Support/ScopHelpers to make it reusable in other
locations. No functionality change.

Patch by Sameer Abu Asal.

Differential Revision: https://reviews.llvm.org/D28754

llvm-svn: 292168
2017-01-16 22:54:29 +00:00
Tobias Grosser 0032d87337 ScopInfo: document base pointers in alias-checks must be invariant [NFC]
Before this change, this code has been mixed with a check for non-affine
loops (and when originally introduce was also duplicated). By creating
a separate loop and explicitly documenting this property, the current
behavior becomes a lot more clear.

llvm-svn: 292140
2017-01-16 15:49:14 +00:00
Tobias Grosser f3c145f2ab ScopInfo: Improve comments in buildAliasGroup [NFC]
llvm-svn: 292139
2017-01-16 15:49:09 +00:00
Tobias Grosser 77f3257b41 ScopInfo: split out construction of a single alias group [NFC]
The loop body in buildAliasGroups is still too large to easily scan it. Hence,
we split the loop body out into a separate function to improve readability.

llvm-svn: 292138
2017-01-16 15:49:07 +00:00
Tobias Grosser e95222343c ScopInfo: Do not modify the original alias group [NFC]
Instead of modifying the original alias group and repurposing it as read-write
access group when splitting accesses in read-only and read-write accesses, we
just keep all three groups: the original alias group, the set of read-only
accesses and the set of read-write accesses.  This allows us to remove some
complicated iterator handling and also allows for more code-reuse in
calculateMinMaxAccess.

llvm-svn: 292137
2017-01-16 15:49:04 +00:00
Tobias Grosser 457eb579dd ScopInfo: No need to keep ReadOnlyAccesses in an additional map [NFC]
It seems over time we added an additional map that maps from the base address
of a read-only access to the actual access. However this map is never used.
Drop the creation and use of this map to simplify our alias check generation
code.

llvm-svn: 292126
2017-01-16 14:24:48 +00:00
Tobias Grosser dba2206b65 ScopInfo: no need to clear alias group explicitly
The alias group will anyhow be cleared at the end of this function and is not
used afterwards. We avoid an explicit clear() call at multiple places to
improve readability of this code.

llvm-svn: 292125
2017-01-16 14:13:01 +00:00
Tobias Grosser 21a059af09 Adjust formatting to commit r292110 [NFC]
llvm-svn: 292123
2017-01-16 14:08:10 +00:00
Tobias Grosser 92fd612c84 ScopInfo: Fold SmallVectors used in alias check generation back into loop [NFC]
Hoisting small vectors out of a loop seems to be a pure performance
optimization, which is unlikely to have great impact in practice. As this
hoisting just increases code-complexity, we fold the SmallVectors back into
the loop.

In subsequent commits, we will further simplify and structure this code, but
we committed this change separately to provide an explanation to make clear
that we purposefully reverted this optimization.

llvm-svn: 292122
2017-01-16 14:08:02 +00:00
Tobias Grosser e39f9127f9 ScopInfo: Extract out splitAliasGroupsByDomain [NFC]
The function buildAliasGroups got very large. We extract out the splitting
of alias groups to reduce its size and to better document the current behavior.

llvm-svn: 292121
2017-01-16 14:08:00 +00:00
Tobias Grosser 9edcf07e83 ScopInfo: Extract out buildAliasGroupsForAccesses [NFC]
The function buildAliasGroups got very large. We extract out the actual
construction of alias groups to reduce its size and to better document the
current behavior.

llvm-svn: 292120
2017-01-16 14:07:57 +00:00
Tobias Grosser 2ef3887d28 Do not track the isl PDF manual in SVN
There is no point in regularly committing a binary file to the repository, as
this just unnecessarily increases the repository size. Interested people can
find the isl manual for example at isl.gforge.inria.fr/manual.pdf.

llvm-svn: 292105
2017-01-16 11:48:03 +00:00
Hongbin Zheng 6aded2a0e4 Fix compilation on MSVC, NFC
Differential Revision: https://reviews.llvm.org/D28739

llvm-svn: 292067
2017-01-15 16:47:26 +00:00
Tobias Grosser 4d5a917287 Use typed enums to model MemoryKind and move MemoryKind out of ScopArrayInfo
To benefit of the type safety guarantees of C++11 typed enums, which would have
caught the type mismatch fixed in r291960, we make MemoryKind a typed enum.
This change also allows us to drop the 'MK_' prefix and to instead use the more
descriptive full name of the enum as prefix. To reduce the amount of typing
needed, we use this opportunity to move MemoryKind from ScopArrayInfo to a
global scope, which means the ScopArrayInfo:: prefix is not needed. This move
also makes historically sense. In the beginning of Polly we had different
MemoryKind enums in both MemoryAccess and ScopArrayInfo, which were later
canonicalized to one. During this canonicalization we just choose the enum in
ScopArrayInfo, but did not consider to move this shared enum to global scope.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>
Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 292030
2017-01-14 20:25:44 +00:00
Tobias Grosser 67e94fb435 ScheduleOptimizer: Allow to set register width in command line
We use this option to set a fixed register width in our test cases to make
sure the results are identical accross platforms.

llvm-svn: 292002
2017-01-14 07:14:54 +00:00
Tobias Grosser e29db2173b Update to recent clang-format changes
llvm-svn: 291810
2017-01-12 21:05:19 +00:00
Eli Friedman c6e3b6f156 Delete stray isl_map_dump call.
llvm-svn: 291521
2017-01-10 01:08:11 +00:00
Tobias Grosser cdbe5c9d6c Fix some typos in comments
llvm-svn: 291247
2017-01-06 17:30:34 +00:00
Tobias Grosser 94e5371dde Update to isl-0.18-43-g0b4256f
Even more isl coalesce changes.

llvm-svn: 290783
2016-12-31 07:46:11 +00:00
Tobias Grosser ba3ea97689 Update to isl-0.18-28-gccb9f33
Another set of isl coalesce changes.

llvm-svn: 290681
2016-12-28 19:35:49 +00:00
Tobias Grosser 600941351e Update to isl-0.18-17-g2844ebf
This update improves isl's ability to coalesce different convex sets/maps,
especially when the contain existentially quantified variables.

llvm-svn: 290538
2016-12-26 12:11:40 +00:00
Roman Gareev 1c2927b209 Specify the default values of the cache parameters
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there is no typical values of this parameters, we use the
parameters of Intel Core i7-3820 SandyBridge that also help to attain the
high-performance on IBM POWER System S822 and IBM Power 730 Express server.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 290518
2016-12-25 16:32:28 +00:00
Tobias Grosser 0791d5f5aa ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma'
througput -> throughput

llvm-svn: 290418
2016-12-23 07:33:39 +00:00
Tobias Grosser ccae1ee4df Update isl to isl-0.18-9-gd4734f3
llvm-svn: 290389
2016-12-22 23:08:57 +00:00
Roman Gareev be5299af0b Change the determination of parameters of macro-kernel
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak).

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28019

llvm-svn: 290256
2016-12-21 12:51:12 +00:00
Roman Gareev bd5c6039c6 Align newly created arrays to the first level cache line boundary
Aligning data to cache lines boundaries helps to avoid overheads related to
an access to it ([1]). This patch aligns newly created arrays and adds an
option to specify the first level cache line size. By default we use 64 bytes,
which is a typical cache-line size ([2]).

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 12.63 GFlops/sec (43,8542% of theoretical peak).

Refs.:

[1] - http://www.alexonlinux.com/aligned-vs-unaligned-memory-access
[2] - http://igoro.com/archive/gallery-of-processor-cache-effects/

Differential Revision: https://reviews.llvm.org/D28020

Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 290253
2016-12-21 12:37:36 +00:00
Roman Gareev 92c446016a [Polly] Use three-dimensional arrays to store packed operands of the matrix
multiplication

Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7

it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D27878

llvm-svn: 290251
2016-12-21 11:18:42 +00:00
Tobias Grosser b6945e3301 Fix clang-format
llvm-svn: 290103
2016-12-19 14:06:40 +00:00
Daniel Jasper e5f3eba9c3 Fix format after recent clang-format change.
llvm-svn: 290085
2016-12-19 07:54:15 +00:00
Alexandre Isoard cbed3ce39f Add isl_multi_pw_aff to GICHelper
Add isl_multi_pw_aff* to GICHelper and add some missing isl_pw_multi_aff* handlers.

llvm-svn: 290007
2016-12-16 23:41:26 +00:00
Roman Gareev 2606c48a1d Restrict ranges of extension maps
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D25655

llvm-svn: 289815
2016-12-15 12:35:59 +00:00
Roman Gareev 15db81ef71 [NFC] Fix typos in getMacroKernelParams.
llvm-svn: 289808
2016-12-15 12:00:57 +00:00
Roman Gareev 8babe1a216 The order of the loops defines the data reused in the BLIS implementation of
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.

In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D25653

llvm-svn: 289806
2016-12-15 11:47:38 +00:00
Michael Kruse 7037fde427 Remove references to AssumptionCache. NFC.
The AssumptionCache was removed in r289756 after being replaced by the an
addtional operand list of affected values in r289755. The absence of that cache
means that we have now have to manually search for llvm.assume intrinsics as
now done by other passes (LazyValueInfo, CodeMetrics) do not take into
account an llvm::Instruction's user lists (ScalarEvolution).

llvm-svn: 289791
2016-12-15 09:25:14 +00:00
Tobias Grosser b02b6a8404 Adjust clang-format formatting to r289531
clang-format has been updated in r289531 to keep labels and values on
the same line. This change updates Polly to the new formatting style.

llvm-svn: 289533
2016-12-13 12:44:00 +00:00
Michael Kruse 79c0173f53 [ScheduleOptimizer] Fix memory leak. NFC.
llvm-svn: 289434
2016-12-12 14:51:06 +00:00
Michael Kruse b9a683d75d Add more ISL foreachElt functions. NFC.
Add and implement foreachElt for isl_map, isl_set and isl_union_set. These are
used by an out-of-tree patch which is in process of being upstreamed.

llvm-svn: 288924
2016-12-07 17:47:57 +00:00
Michael Kruse 2ead2bfc12 Add IslPtr type traits. NFC.
Add traits for isl_id and isl_multi_aff, required by out-of-tree patches
currently in progress of upstreaming.

isl_union_pw_aff_dump has been added to ISL during one of the last ISL
updates, such that we can also enable its dump() trait.

llvm-svn: 288915
2016-12-07 16:17:59 +00:00
Michael Kruse 1b8eb4104b Update to isl-0.17.1-314-g3106e8d
This version includes an update for imath (isl-0.17.1-49-g2f1c129). It fixes
the compilation under windows, which does not know ssize_t.

In addition, isl-0.17.1-288-g0500299 changed the way isl_test finds the source
directory. It now generates a file isl_srcdir.c at configure-time, containing
the source path, to not require setting the environment variable "srcdir" at
test-time. The cmake build system had to be modified to also generate that file.

llvm-svn: 288811
2016-12-06 14:37:39 +00:00
Johannes Doerfert bda814350a Allow to disable unsigned operations (zext, icmp ugt, ...)
Unsigned operations are often useful to support but the heuristics are
not yet tuned. This options allows to disable them if necessary.

llvm-svn: 288521
2016-12-02 17:55:41 +00:00
Johannes Doerfert a94ae1aede Do not allow multiple possibly aliasing ptrs in an expression
Relational comparisons should not involve multiple potentially
  aliasing pointers. Similarly this should hold for switch conditions
  and the two conditions involved in equality comparisons (separately!).
  This is a heuristic based on the C semantics that does only allow such
  operations when the base pointers do point into the same object.
  Since this makes aliasing likely we will bail out early instead of
  producing a probably failing runtime check.

llvm-svn: 288516
2016-12-02 17:49:52 +00:00
Johannes Doerfert 2df9963fe3 Rerun mem2reg after the inliner
It did happen that after the inliner finished we end up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.

llvm-svn: 288514
2016-12-02 17:43:57 +00:00
Tobias Grosser bedef00e2c [ScopInfo] Fold constant coefficients in array dimensions to the right
This allows us to delinearize code such as the one below, where the array
sizes are A[][2 * n] as there are n times two elements in the innermost
dimension. Alternatively, we could try to generate another dimension for the
struct in the innermost dimension, but as the struct has constant size,
recovering this dimension is easy.

   struct com {
     double Real;
     double Img;
   };

   void foo(long n, struct com A[][n]) {
     for (long i = 0; i < 100; i++)
       for (long j = 0; j < 1000; j++)
         A[i][j].Real += A[i][j].Img;
   }

   int main() {
     struct com A[100][1000];
     foo(1000, A);

llvm-svn: 288489
2016-12-02 08:10:56 +00:00
Tobias Grosser 491b799a4d [ScopInfo] Separate construction and finalization of memory accesses [NFC]
After having built memory accesses we perform some additional transformations
on them to increase the chances that our delinearization guesses the right
shape. Only after these transformations, we take the assumptions that the
array shape we predict is such that no out-of-bounds memory accesses arise.

Before this change, the construction of the memory access, the access folding
that improves the represenation for certain parametric subscripts, and taking
the assumption was all done right after a memory access was created. In this
change we split this now into three separate iterations over all memory
accesses. This means only after all memory accesses have been built, we start
to canonicalize accesses, and to take assumptions. This split prepares for
future canonicalizations that must consider all memory accesses for deriving
additional beneficial transformations.

llvm-svn: 288479
2016-12-02 05:21:22 +00:00
Johannes Doerfert b1d6608430 [NFC] Check for feasibility prior to the profitability check
Feasibility is checked late on its own but early it is hidden behind
the "PollyProcessUnprofitable" guard. This change will make sure we opt
out early if the runtime context is infeasible anyway.

llvm-svn: 288329
2016-12-01 11:12:14 +00:00
Johannes Doerfert b6c5a5dd01 [FIX] Do not try to hoist obviously overwritten loads
llvm-svn: 288328
2016-12-01 11:10:45 +00:00
Tobias Grosser dc6b87c56e Add newline at end of debug print
In '[DBG] Allow to emit the RTC value at runtime' the diagnostics were printed
without a newline at the end of each diagnostic. We add such a newline to
improve readability.

llvm-svn: 288323
2016-12-01 08:08:47 +00:00
Michael Kruse 36e79ecaec [DeLICM] Add pass boilerplate code.
Add an empty DeLICM pass, without any functional parts.

Extracting the boilerplate from the the functional part reduces the size of the
code to review (https://reviews.llvm.org/D24716)

Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 288160
2016-11-29 16:41:21 +00:00
Michael Kruse 11c5e07925 canSynthesize: Remove unused argument LI. NFC.
The helper function polly::canSynthesize() does not directly use the LoopInfo
analysis, hence remove it from its argument list.

llvm-svn: 288144
2016-11-29 15:11:04 +00:00
Tobias Grosser df8f35b7b8 Update for clang-format change in r288119
llvm-svn: 288134
2016-11-29 12:52:08 +00:00
Tobias Grosser 278f9e7d27 [ScopInfo] Use SCEVRewriteVisitor to simplify SCEVSensitiveParameterRewriter [NFC]
llvm-svn: 287984
2016-11-26 17:58:40 +00:00
Tobias Grosser b45ae5601b [ScopDetect] Expand statistics of the detected scops
We now collect:

  Number of total loops
  Number of loops in scops
  Number of scops
  Number of scops with maximal loop depth 1
  Number of scops with maximal loop depth 2
  Number of scops with maximal loop depth 3
  Number of scops with maximal loop depth 4
  Number of scops with maximal loop depth 5
  Number of scops with maximal loop depth 6 and larger
  Number of loops in scops (profitable scops only)
  Number of scops (profitable scops only)
  Number of scops with maximal loop depth 1 (profitable scops only)
  Number of scops with maximal loop depth 2 (profitable scops only)
  Number of scops with maximal loop depth 3 (profitable scops only)
  Number of scops with maximal loop depth 4 (profitable scops only)
  Number of scops with maximal loop depth 5 (profitable scops only)
  Number of scops with maximal loop depth 6 and larger (profitable scops only)

These statistics are certainly completely accurate as we might drop scops
when building up their polyhedral representation, but they should give a good
indication of the number of scops we detect.

llvm-svn: 287973
2016-11-26 07:37:46 +00:00
Tobias Grosser 5c00b0dc74 [ScopDetectionDiagnostic] Collect statistics for each diagnostic type
Our original statistics were added before we introduced a more fine-grained
diagnostic system, but the granularity of our statistics has never been
increased accordingly. This change introduces now one statistic counter per
diagnostic to enable us to collect fine-grained statistics about who certain
scops are not detected. In case coarser grained statistics are needed, the
user is expected to combine counters manually.

llvm-svn: 287968
2016-11-26 05:53:09 +00:00
Tobias Grosser c64269ea1b [ScopDectionDiagnostic] Use scoped enums instead three letter prefix [NFC]
This improves readability of the code.

llvm-svn: 287963
2016-11-26 03:44:31 +00:00
Hongbin Zheng 4ff1c3983d Fix typo.
llvm-svn: 287819
2016-11-23 21:59:33 +00:00
Tobias Grosser eab0943ec0 Update to isl-0.17.1-284-gbb38638
Regular maintenance update with only minor changes.

llvm-svn: 287703
2016-11-22 21:31:59 +00:00
Tobias Grosser b3c3d149b9 [CodeGen] Add flag to code-generate most memory access expressions
Introduce the new flag -polly-codegen-generate-expressions which forces Polly
to code generate AST expressions instead of using our SCEV based access
expression generation even for cases where the original memory access relation
was not changed and the SCEV based access expression could be code generated
without any issue.

This is an experimental option for better testing the isl ast expression
generation. The default behavior of Polly remains unchanged. We also exclude
a couple of cases for which the AST expression is not yet working.

llvm-svn: 287694
2016-11-22 20:21:16 +00:00
Hongbin Zheng a8fb73fc0b Split ScopInfo::addScopStmt into two versions. NFC
One for adding statement for region, another one for BB

llvm-svn: 287566
2016-11-21 20:09:40 +00:00
Hongbin Zheng 3ffa6f40b0 Minor change
llvm-svn: 287562
2016-11-21 19:26:10 +00:00
Tobias Grosser 1f0236d8e5 [ScopDetect] Use mayReadOrWriteMemory to shorten condition
llvm-svn: 287525
2016-11-21 09:07:30 +00:00
Tobias Grosser b94e9b31d0 [ScopDetect] Remove unnecessary namespace qualifier
llvm-svn: 287524
2016-11-21 09:04:45 +00:00
Johannes Doerfert 81aa6e882f [NFC] Adjust naming scheme of statistic variables
Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 287347
2016-11-18 14:37:08 +00:00
Johannes Doerfert 6cd59e9076 Probably overwritten loads should not be considered hoistable
Do not assume a load to be hoistable/invariant if the pointer is used by
another instruction in the SCoP that might write to memory and that is
always executed.

llvm-svn: 287272
2016-11-17 22:25:17 +00:00
Johannes Doerfert 50dfbc572a [NFC] Add flag to disable error block assumptions
The declaration as an "error block" is currently aggressive and not very
smart. This patch allows to disable error blocks completely. This might
be useful to prevent SCoP expansion to a point where the assumed context
becomes infeasible, thus the SCoP has to be discarded.

llvm-svn: 287271
2016-11-17 22:16:35 +00:00
Johannes Doerfert c97654681e [FIX] Do not try to hoist memory intrinsic
Since we do not necessarily treat memory intrinsics as non-affine
anymore, we have to check for them explicitly before we try to hoist an
access.

llvm-svn: 287270
2016-11-17 22:11:56 +00:00
Johannes Doerfert b3265a3612 [NFC] Skip over trivial assumptions
Filter trivial assumptions, thus assume { : } or restrict { : 0 = 1 },
as they clutter the user output as well as the statistics.

llvm-svn: 287269
2016-11-17 22:08:40 +00:00
Johannes Doerfert dae2e9287d [DBG] Collect statistics about actually versioned SCoPs
llvm-svn: 287267
2016-11-17 21:55:43 +00:00
Johannes Doerfert 8c5464a715 [DBG] Allow to emit the RTC value at runtime
The new command line flag "polly-codegen-emit-rtc-print" can be used to
place a "printf" in the generated code that will print the RTC value and
the overflow state.

llvm-svn: 287265
2016-11-17 21:49:19 +00:00
Johannes Doerfert cfadb2293f [DBG] Collect statistics about statically infeasible SCoPs
llvm-svn: 287263
2016-11-17 21:44:47 +00:00
Johannes Doerfert cd195326bf [DBG] Collect statistics about taken assumptions
llvm-svn: 287261
2016-11-17 21:41:08 +00:00
Tobias Grosser 06e1592663 Update to isl-0.17.1-267-gbf9723d
This update corrects an incorrect generation of min/max expressions in the isl
AST generator and a problematic nullptr dereference.

llvm-svn: 287098
2016-11-16 11:06:47 +00:00
Tobias Grosser 26be8e99b6 [ScopBuilder] Drop unnecessary namespace identifiers [NFC]
llvm-svn: 286781
2016-11-13 21:28:13 +00:00
Tobias Grosser 5743e8de86 [SCEVAffinator] Do not scan redundantly for parameters
In r286430 "SCEVValidator: add new parameters resulting from constant
extraction" we added functionality to scan for parameters after constant
extraction has taken place to ensure newly created parameters are correctly
registered. This addition made the already existing registration of parameters
redundant. Hence, we remove the corresponding call in this commit.

An alternative solution would have been to also perform constant extraction when
validating SCEV expressions and to then scan for parameters when validating
a SCEV expression. However, as SCEV validation is used during SCoP detection
where we want to be especially fast, adding additional functionality on this
hot path should be avoided if good alternatives exist. In this case, we can
choose to continue to only transform SCEV expression when actually modeling
them. As all transformations we perform are expected to not change the validity
of the SCEV expressions, this solution seems preferable.

Suggested-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286780
2016-11-13 21:28:07 +00:00
Tobias Grosser 70d2709b1a [ScopDetect] Conservatively handle inaccessible memory alias attributes
Commit r286294 introduced support for inaccessiblememonly and
inaccessiblemem_or_argmemonly attributes to BasicAA, which we need to
support to avoid undefined behavior. This change just refuses all calls
which are annotated with these attributes, which is conservatively correct.
In the future we may consider to model and support such function calls
in Polly.

llvm-svn: 286771
2016-11-13 19:27:24 +00:00
Tobias Grosser a2f8fa33aa [ScopDetect] Evaluate and verify branches at branch condition, not icmp
The validity of a branch condition must be verified at the location of the
branch (the branch instruction), not the location of the icmp that is
used in the branch instruction. When verifying at the wrong location, we
may accept an icmp that is defined within a loop which itself dominates, but
does not contain the branch instruction. Such loops cannot be modeled as
we only introduce domain dimensions for surrounding loops. To address this
problem we change the scop detection to evaluate and verify SCEV expressions at
the right location.

This issue has been around since at least r179148 "scop detection: properly
instantiate SCEVs to the place where they are used", where we explicitly
set the scope to the wrong location. Before this commit the scope
was not explicitly set, which probably also resulted in the scope around the
ICmp to be choosen.

This resolves http://llvm.org/PR30989

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286769
2016-11-13 19:27:04 +00:00
Tobias Grosser f67433abd9 SCEVAffinator: pass parameter-only set to addRestriction if BB=nullptr
Assumptions can either be added for a given basic block, in which case the set
describing the assumptions is expected to match the dimensions of its domain.
In case no basic block is provided a parameter-only set is expected to describe
the assumption.

The piecewise expressions that are generated by the SCEVAffinator sometimes
have a zero-dimensional domain (e.g., [p] -> { [] : p <= -129 or p >= 128 }),
which looks similar to a parameter-only domain, but is still a set domain.

This change adds an assert that checks that we always pass parameter domains to
addAssumptions if BB is empty to make mismatches here fail early.

We also change visitTruncExpr to always convert to parameter sets, if BB is
null. This change resolves http://llvm.org/PR30941

Another alternative to this change would have been to inspect all code to make
sure we directly generate in the SCEV affinator parameter sets in case of empty
domains. However, this would likely complicate the code which combines parameter
and non-parameter domains when constructing a statement domain. We might still
consider doing this at some point, but as this likely requires several non-local
changes this should probably be done as a separate refactoring.

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286444
2016-11-10 11:44:10 +00:00
Tobias Grosser d0b9173caa IslAst: always use the context during ast generation
Providing the context to the ast generator allows for additional simplifcations
and -- more importantly -- allows to generate loops with only partially bounded
domains, assuming the domains are bounded for all parameter configurations
that are valid as defined by the context.

This change fixes the crash reported in http://llvm.org/PR30956

The original reason why we did not include the context when generating an
AST was that CLooG and later isl used to sometimes transfer some of the
constraints that bound the size of parameters from the context into the
generated AST. This resulted in operations with very large constants, which
sometimes introduced problematic integer overflows. The latest versions of
the isl AST generator are careful to not introduce such constants.

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286442
2016-11-10 09:39:58 +00:00
Tobias Grosser 4d543d654a SCEVValidator: add new parameters resulting from constant extraction
When extracting constant expressions out of SCEVs, new parameters may be
introduced, which have not been registered before. This change scans
SCEV expressions after constant extraction again to make sure newly
introduced parameters are registered.

We may for example extract the constant '8' from the expression '((8 * ((%a *
%b) + %c)) + (-8 * %a))' and obtain the expression '(((-1 + %b) * %a) + %c)'.
The new expression has a new parameter '(-1 + %b) * %a)', which was not
registered before, but must be registered to not crash.

This closes http://llvm.org/PR30953

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286430
2016-11-10 06:45:28 +00:00
Tobias Grosser bbaeda3fe5 Do not allow switch statements in loop latches
In r248701 "Allow switch instructions in SCoPs" support for switch statements
has been introduced, but support for switch statements in loop latches was
incomplete. This change completely disables switch statements in loop latches.

The original commit changed addLoopBoundsToHeaderDomain to support non-branch
terminator instructions, but this change was incorrect: it added a check for
BI != null to the if-branch of a condition, but BI was used in the else branch
es well. As a result, when a non-branch terminator instruction is encounted a
nullptr dereference is triggered. Due to missing test coverage, this bug was
overlooked.

r249273 "[FIX] Approximate non-affine loops correctly" added code to disallow
switch statements for non-affine loops, if they appear in either a loop latch
or a loop exit. We adapt this code to now prohibit switch statements in
loop latches even if the control condition is affine.

We could possibly add support for switch statements in loop latches, but such
support should be evaluated and tested separately.

This fixes llvm.org/PR30952

Reported-by: Eli Friedman <efriedma@codeaurora.org>
llvm-svn: 286426
2016-11-10 05:20:29 +00:00
Tobias Grosser eba86a1208 ScopInfo: only run code needed for ASSERT in DEBUG mode
Suggested-by: Johannes Doerfert
llvm-svn: 286338
2016-11-09 04:24:49 +00:00
Tobias Grosser a8ca3ed06a SCEVValidator: reduce indentation to increase readability [NFC]
llvm-svn: 286217
2016-11-08 07:17:48 +00:00
Tobias Grosser 16480186f8 IslNodeBuilder: Ensure newly generated memory accesses are well-defined
Add some additional asserts that ensure newly code-generated memory accesses
are defined on all domain and schedule domain instances.

llvm-svn: 286050
2016-11-05 21:46:01 +00:00
Tobias Grosser 744740ad91 ScopInfo: Ensure copy statement memory accesses are correct
Add asserts that verify that the memory accesses of a new copy statement
are defined for all domain instances the copy statement is defined for.

llvm-svn: 286047
2016-11-05 21:02:43 +00:00
Tobias Grosser 9321e208e8 Update isl to isl-0.17.1-243-g24c0339
This introduces big-endian support in imath and resolves
http://llvm.org/PR24632.

llvm-svn: 285993
2016-11-04 11:56:48 +00:00
Michael Kruse e1dc387731 [ScopInfo] Fix isl object leak.
Fix return from function without releasing isl objects, which was introduced
in r269055.

llvm-svn: 285924
2016-11-03 15:19:41 +00:00
Eli Friedman acf8006471 [Polly CodeGen] Break critical edge from RTC to original loop.
This makes polly generate a CFG which is closer to what we want
in LLVM IR, with a loop preheader for the original loop. This is
just a cleanup, but it exposes some fragile assumptions.

I'm not completely happy with the changes related to expandCodeFor;
RTCBB->getTerminator() is basically a random insertion point which
happens to work due to the way we generate runtime checks. I'm not
sure what the right answer looks like, though.

Differential Revision: https://reviews.llvm.org/D26053

llvm-svn: 285864
2016-11-02 22:32:23 +00:00
Eli Friedman b9c6f01a81 [ScopInfo] Make memset etc. affine where possible.
We don't actually check whether a MemoryAccess is affine in very many
places, but one important one is in checks for aliasing.

Differential Revision: https://reviews.llvm.org/D25706

llvm-svn: 285746
2016-11-01 20:53:11 +00:00
Tobias Grosser ebb626e4b7 [ScopDetect] Use SCEVRewriteVisitor to simplify SCEVRemoveSMax rewriter
ScalarEvolution got at some pointer a SCEVRewriteVisitor. Use it to simplify
our SCEVRemoveSMax visitor.

llvm-svn: 285491
2016-10-29 06:19:34 +00:00
Michael Kruse 426e6f71f8 [ScopInfo] Fix: use raw source pointer.
When adding an llvm.memcpy instruction to AliasSetTracker, it uses the raw
source and target pointers which preserve bitcasts.
MemAccInst::getPointerOperand() also returns the raw target pointers, but
Scop::buildAliasGroups() did not for the source pointer. This lead to mismatches
between AliasSetTracker and ScopInfo on which pointer to use.

Fixed by also using raw pointers in Scop::buildAliasGroups().

llvm-svn: 285071
2016-10-25 13:37:43 +00:00
Eli Friedman 286c5a76ba [SCEVAffinator] Make precise modular math more correct.
Integer math in LLVM IR is modular. Integer math in isl is
arbitrary-precision. Modeling LLVM IR math correctly in isl requires
either adding assumptions that math doesn't actually overflow, or
explicitly wrapping the math. However, expressions with the "nsw" flag
are special; we can pretend they're arbitrary-precision because it's
undefined behavior if the result wraps. SCEV expressions based on IR
instructions with an nsw flag also carry an nsw flag (roughly; actually,
the real rule is a bit more complicated, but the details don't matter
here).

Before this patch, SCEV flags were also overloaded with an additional
function: the ZExt code was mutating SCEV expressions as a hack to
indicate to checkForWrapping that we don't need to add assumptions to
the operand of a ZExt; it'll add explicit wrapping itself. This kind of
works... the problem is that if anything else ever touches that SCEV
expression, it'll get confused by the incorrect flags.

Instead, with this patch, we make the decision about whether to
explicitly wrap the math a bit earlier, basing the decision purely on
the SCEV expression itself, and not its users.

Differential Revision: https://reviews.llvm.org/D25287

llvm-svn: 284848
2016-10-21 18:08:02 +00:00
Mandeep Singh Grang 48e7add80f [polly] Change SmallPtrSet which are being iterated into SmallSetVector
Summary: Otherwise the lack of an iteration order results in non-determinism in codegen.

Reviewers: _jdoerfert, zinob, grosser

Tags: #polly

Differential Revision: https://reviews.llvm.org/D25863

llvm-svn: 284845
2016-10-21 17:29:10 +00:00
Michael Kruse 3d0266b664 [cmake] Avoid warnings in feature tests. NFC.
Apply the __attribute__((unused)) before the function to unambiguously apply to
the function declaration.

Add more casts-to-void to mark return values unused as intended.

Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk>
llvm-svn: 284718
2016-10-20 11:16:19 +00:00
Tobias Grosser 69ed8542f5 Update isl to isl-0.17.1-236-ga9c6cc7
This includes isl_id_to_str, which is used in Michael's upcoming DeLICM patch.

llvm-svn: 284689
2016-10-20 01:59:24 +00:00
Mandeep Singh Grang 5b1abfc88e [polly] Fix non-determinism in polly BlockGenerators
Summary: Iterating over SeenBlocks which is a SmallPtrSet results in non-determinism in codegen

Reviewers: jdoerfert, zinob, grosser

Tags: #polly

Differential Revision: https://reviews.llvm.org/D25778

llvm-svn: 284622
2016-10-19 17:56:49 +00:00
Eli Friedman 3c1a75bf9c Handle multi-dimensional invariant load.
If the address of a load depends on another load, make sure to emit
the loads in the right order.

llvm-svn: 284426
2016-10-17 21:04:26 +00:00
Michael Kruse 6a19d592da [ScopDetect] Depend transitively on ScalarEvolution.
ScopDetection might be queried by -dot-scops or -view-scops passes for which
it accesses ScalarEvolution.

llvm-svn: 284385
2016-10-17 13:29:20 +00:00
Tobias Grosser 4b5e24df9a cmake: avoid "zero-length gnu_printf format string" warning in gcc 6.1.1
Contributed-by: Andy Gibbs <andyg1001@hotmail.co.uk>
llvm-svn: 284302
2016-10-15 05:08:12 +00:00
Michael Kruse fa53c86dc1 [ScopInfo/CodeGen] ExitPHI reads are implicit.
Under some conditions MK_Value read accessed where converted to MK_ExitPHI read
accessed. This is unexpected because MK_ExitPHI read accesses are implicit after
the scop execution. This behaviour was introduced in r265261, which fixed a
failed assertion/crash in CodeGen.

Instead, we fix this failure in CodeGen itself. createExitPHINodeMerges(),
despite its name, also handles accesses of kind MK_Value, only to skip them
because they access values that are usually not PHI nodes in the SCoP region's
exit block. Except in the situation observed in r265261.

Do not convert value accessed to ExitPHI accesses and do not handle
value accesses like ExitPHI accessed in CodeGen anymore.

llvm-svn: 284023
2016-10-12 16:31:09 +00:00
Michael Kruse c9edc2ee8d [DepInfo] Print -debug output outside of max-operations scope.
ISL tries to simplify the polyhedral operations before printing its objects.
This increases the operations counter and therefore can contribute to hitting
the operations limit. Therefore the result could be different when -debug output
is enabled, making debugging harder.

llvm-svn: 283745
2016-10-10 11:45:59 +00:00
Michael Kruse 8bfba1ff46 [Support/DepInfo] Introduce IslMaxOperationsGuard and make DepInfo use it. NFC.
IslMaxOperationsGuard defines a scope where ISL may abort operations because if
it takes too many operations. Replace the call to the raw ISL interface by a
use of the guard.

IslMaxOperationsGuard provides a uniform way to define a maximal computation
time for a code region in C++ using RAII.

llvm-svn: 283744
2016-10-10 11:45:54 +00:00
Tobias Grosser b270288752 Fix formatting after recent cl:: changes
This fixes 'make check-polly'

llvm-svn: 283693
2016-10-09 08:31:35 +00:00
Mehdi Amini 732afdd09a Turn cl::values() (for enum) from a vararg function to using C++ variadic template
The core of the change is supposed to be NFC, however it also fixes
what I believe was an undefined behavior when calling:

 va_start(ValueArgs, Desc);

with Desc being a StringRef.

Differential Revision: https://reviews.llvm.org/D25342

llvm-svn: 283671
2016-10-08 19:41:06 +00:00
Hongbin Zheng 5860aef675 Define PATH_MAX on windows
Differential Revision: https://reviews.llvm.org/D25372

llvm-svn: 283600
2016-10-07 20:58:20 +00:00
Michael Kruse de42b43de0 [cmake] Unify disabling upstream project warnings.
Handle MSVC, ISL and PPCG in one place. The only functional change is that
warnings are also disabled for MSVC compiling PPCG (Which currently fails
anyway).

llvm-svn: 283547
2016-10-07 12:38:32 +00:00
Michael Kruse 4b5f6af2dc [cmake] Move isl_test artifacts to Polly folder.
Folders in Visual Studio solutions help organize the build artifacts from all
LLVM projects. There is a folder to keep Polly-built files in.

llvm-svn: 283546
2016-10-07 12:38:24 +00:00
Tobias Grosser e84ee850d1 Build and run isl_test as part of check-polly
Running isl tests is important to gain confidence that the isl build we created
works as expected. Besides the actual isl tests, there are also isl AST
generation tests shipped with isl. This change only adds support for the isl
unit tests. AST generation test support is left for a later commit.

There is a choice to run tests directly through the build system or in the
context of lit. We choose to run tests as part of lit to as this allows us to
easily set environment variables, print output only on error and generally run
the tests directly from the lit command.

Reviewers: brad.king, Meinersbur

Subscribers: modocache, brad.king, pollydev, beanz, llvm-commits, mgorny

Differential Revision: https://reviews.llvm.org/D25155

llvm-svn: 283245
2016-10-04 19:48:40 +00:00
Michael Kruse 6ab4476835 [ScopInfo] Add -polly-unprofitable-scalar-accs option.
With this option one can disable the heuristic that assumes that statements with
a scalar write access cannot be profitably optimized. Such a statement instances
necessarily have WAW-dependences to itself. With DeLICM scalar accesses can be
changed to array accesses, which can avoid these WAW-dependence.

llvm-svn: 283233
2016-10-04 17:33:39 +00:00
Michael Kruse ca7cbcca37 [ScopInfo] Scalar access do not have indirect base pointers.
ScopArrayInfo used to determine base pointer origins by looking up whether the
base pointer is a load. The "base pointer" for scalar accesses is the
llvm::Value being accessed. This is only a symbolic base pointer, it
represents the alloca variable (.s2a or .phiops) generated for it at code
generation.

This patch disables determining base pointer origin for scalars.
A test case where this caused a crash will be added in the next commit. In that
test SAI tried to get the origin base pointer that was only declared later,
therefore not existing. This is probably only possible for scalars used in
PHINode incoming blocks.

llvm-svn: 283232
2016-10-04 17:33:34 +00:00
Tobias Grosser 5652c9c830 isl: update to isl-0.17.1-233-gc911e6a
llvm-svn: 283049
2016-10-01 19:46:51 +00:00
Michael Kruse 4b0c5aea78 [CodeGen] Add assertion for indirect array index expression generation. NFC.
Currently Polly cannot generate code for index expressions if the base pointer
is computed within the scop. The base pointer must be generated as well, but
there is no code that triggers that.

Add an assertion to detect when this would occur and miscompile. The IR verifier
should catch it as well.

llvm-svn: 282893
2016-09-30 18:29:37 +00:00
Michael Kruse 51f514d853 [Support] Compile fix for gcc. NFC.
gcc 5.4 insists on template specialization to be in a namespace polly { ... }
block, instead of being prefixed with 'polly::'. Error message:

root/src/llvm/tools/polly/lib/Support/GICHelper.cpp:203:54: error: specialization of ‘template<class T> void polly::IslPtr<T>::dump() const’ in different namespace [-fpermissive]
   template <> void polly::IslPtr<isl_##TYPE>::dump() const {                   \
                                                      ^
msvc14 and clang 3.8 did not complain.

llvm-svn: 282874
2016-09-30 16:47:43 +00:00
Michael Kruse 55519dad62 [Support] Add (Nonowning-)IslPtr::dump(). NFC.
The dump() methods can be called from a debugger instead of e.g.

    isl_*_dump(Var.Obj)

where Var is a variable of type IslPtr/NonowningIslPtr. To ensure that the
existence of the function pointers do not depdend on whether the methods are
used somwhere, they are declared with external linkage.

llvm-svn: 282870
2016-09-30 16:10:19 +00:00
Michael Kruse 888ab55140 [CodeGen] Change 'Scalar' to 'Array' in method names. NFC.
generateScalarLoad() and generateScalarStore() are used for explicit (MK_Array)
memory accesses, therefore the method names were misleading. The names also
were similar to generateScalarLoads() and generateScalarStores() (plural forms)
which indeed handle scalar accesses. Presumbly, they were originally named to
contrast VectorBlockGenerator::generateLoad().

Rename the two methods to generateArrayLoad(),
respectively generateArrayStore().

llvm-svn: 282861
2016-09-30 14:34:05 +00:00
Michael Kruse 77394f1394 [CodeGen] Add assertion for partial scalar accesses. NFC.
The code generator always adds unconditional LoadInst and StoreInst, hence the
MemoryAccess must be defined over all statement instances.

llvm-svn: 282853
2016-09-30 14:01:46 +00:00
Tobias Grosser 349d1c3368 [ScopDetection] Remove redundant checks for endless loops
Summary:
Both `canUseISLTripCount()` and `addOverApproximatedRegion()` contained checks
to reject endless loops which are now removed and replaced by a single check
in `isValidLoop()`.

For reporting such loops the `ReportLoopOverlapWithNonAffineSubRegion` is
renamed to `ReportLoopHasNoExit`. The test case
`ReportLoopOverlapWithNonAffineSubRegion.ll` is adapted and renamed as well.

The schedule generation in `buildSchedule()` is based on the following
assumption:

Given some block B that is contained in a loop L and a SESE region R,
we assume that L is contained in R or the other way around.

However, this assumption is broken in the presence of endless loops that are
nested inside other loops. Therefore, in order to prevent erroneous behavior
in `buildSchedule()`, r265280 introduced a corresponding check in
`canUseISLTripCount()` to reject endless loops. Unfortunately, it was possible
to bypass this check with -polly-allow-nonaffine-loops which was fixed by adding
another check to reject endless loops in `allowOverApproximatedRegion()` in
r273905. Hence there existed two separate locations that handled this case.

Thank you Johannes Doerfert for helping to provide the above background
information.

Reviewers: Meinersbur, grosser

Subscribers: _jdoerfert, pollydev

Differential Revision: https://reviews.llvm.org/D24560

Contributed-by: Matthias Reisinger <d412vv1n@gmail.com>
llvm-svn: 281987
2016-09-20 17:05:22 +00:00
Tobias Grosser bc653f2031 GPGPU: Do not run mostly sequential kernels in GPU
In case sequential kernels are found deeper in the loop tree than any parallel
kernel, the overall scop is probably mostly sequential. Hence, run it on the
CPU.

llvm-svn: 281849
2016-09-18 08:31:09 +00:00
Tobias Grosser 82f2af3508 GPGPU: Dynamically ensure 'sufficient compute'
Offloading to a GPU is only beneficial if there is a sufficient amount of
compute that can be accelerated. Many kernels just have a very small number
of dynamic compute, which means GPU acceleration is not beneficial. We
compute at run-time an approximation of how many dynamic instructions will be
executed and fall back to CPU code in case this number is not sufficiently
large. To keep the run-time checking code simple, we over-approximate the
number of instructions executed in each statement by computing the volume of
the rectangular hull of its iteration space.

llvm-svn: 281848
2016-09-18 06:50:35 +00:00
Tobias Grosser 51dfc27589 GPGPU: Store back non-read-only scalars
We may generate GPU kernels that store into scalars in case we run some
sequential code on the GPU because the remaining data is expected to already be
on the GPU. For these kernels it is important to not keep the scalar values
in thread-local registers, but to store them back to the corresponding device
memory objects that backs them up.

We currently only store scalars back at the end of a kernel. This is only
correct if precisely one thread is executed. In case more than one thread may
be run, we currently invalidate the scop. To support such cases correctly,
we would need to always load and store back from a corresponding global
memory slot instead of a thread-local alloca slot.

llvm-svn: 281838
2016-09-17 19:22:31 +00:00
Tobias Grosser fe74a7a1f5 GPGPU: Detect read-only scalar arrays ...
and pass these by value rather than by reference.

llvm-svn: 281837
2016-09-17 19:22:18 +00:00
Tobias Grosser 8f86a47461 Update CFGPrinter -> CFGPrinterLegacyPass
.. to match recent changes in LLVM that broke the Polly compilation.

llvm-svn: 281705
2016-09-16 05:48:09 +00:00
Tobias Grosser aaabbbf886 GPGPU: Do not assume arrays start at 0
Our alias checks precisely check that the minimal and maximal accessed elements
do not overlap in a kernel. Hence, we must ensure that our host <-> device
transfers do not touch additional memory locations that are not covered in
the alias check. To ensure this, we make sure that the data we copy for a
given array is only the data from the smallest element accessed to the largest
element accessed.

We also adjust the size of the array according to the offset at which the array
is actually accessed.

An interesting result of this is: In case array are accessed with negative
subscripts ,e.g., A[-100], we automatically allocate and transfer _more_ data to
cover the full array. This is important as such code indeed exists in the wild.

llvm-svn: 281611
2016-09-15 14:05:58 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Tobias Grosser e8c69bbabd cmake: PollyPPCG depends on PollyISL
This line makes BUILD_SHARED_LIBS=ON work for Polly-ACC. Without it, ld
complains about missing isl symbols when constructing the shared library.

llvm-svn: 281396
2016-09-13 21:09:35 +00:00
Tobias Grosser 0a893f7df4 GPGPU: Use const_cast to avoid compiler warning [NFC]
llvm-svn: 281333
2016-09-13 13:22:27 +00:00
Michael Kruse 19c9d99f45 Use value directly instead of reference. NFC.
The alias to the array element is read-only and a primitive type (pointer),
therefore use the value directly instead of a reference to it.

llvm-svn: 281311
2016-09-13 09:56:05 +00:00
Tobias Grosser a82c4b5df8 GPGPU: Allow region statements
llvm-svn: 281305
2016-09-13 08:42:10 +00:00
Tobias Grosser b79f4d3970 GPGPU: Extend types when array sizes have smaller types
This prevents a compiler crash.

llvm-svn: 281303
2016-09-13 08:02:14 +00:00
Michael Kruse e5e752a28b Remove -fvisibility=hidden and FORCE_STATIC.
The flag -fvisibility=hidden flag was used for the integrated Integer
Set Library (and PPCG) to keep their definitions local to Polly. The
motivation was the be loaded into a DragonEgg-powered GCC, where GCC
might itself use ISL for its Graphite extension. The symbols of Polly's
ISL and GCC's ISL would clash.

The DragonEgg project is not actively developed anymore, but Polly's
unittests need to call ISL functions to set up a testing environment.
Unfortunately, the -fvisibility=hidden flag means that the ISL symbols
are not available to the gtest executable as it resides outside of
libPolly when linked dynamically. Currently, CMake links a second copy
of ISL into the unittests which leads to subtle bugs. What got observed
is that two isl_ids for isl_id_none exist, one for each library
instance. Because isl_id's are compared by address, isl_id_none could
happen to be different from isl_id_none, depending on which library
instance set the address and does the comparison.

Also remove the FORCE_STATIC flag which was introduced to keep the ISL
symbols visible inside the same libPolly shared object, even when build
with BUILD_SHARED_LIBS.

Differential Revision: https://reviews.llvm.org/D24460

llvm-svn: 281242
2016-09-12 18:25:00 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5857b701a3 GPGPU: Bail out gracefully in case of invalid IR
Instead of aborting, we now bail out gracefully in case the kernel IR we
generate is invalid. This can currently happen in case the SCoP stores
pointer values, which we model as arrays, as data values into other arrays. In
this case, the original pointer value is not available on the device and can
consequently not be stored. As detecting this ahead of time is not so easy, we
detect these situations after the invalid IR has been generated and bail out.

llvm-svn: 281193
2016-09-12 06:06:31 +00:00
Tobias Grosser 02293ed755 GPGPU: Do not fail in case of arrays never accessed
If these arrays have never been accessed we failed to derive an upper bound
of the accesses and consequently a size for the outermost dimension. We
now explicitly check for empty access sets and then just use zero as size
for the outermost dimension.

llvm-svn: 281165
2016-09-11 13:30:12 +00:00
Tobias Grosser 5aea5653b3 FlattenAlgo: Ensure we _really_ obtain a param space
This resolves "isl_space.c:1775: not a parameter space" errors I have seen
on two systems.

llvm-svn: 281052
2016-09-09 16:11:26 +00:00
Tobias Grosser a3afe44d6c IslNodeBuilder: Add missing __isl_take annotation
llvm-svn: 281034
2016-09-09 11:16:50 +00:00
Michael Kruse 7886bd7ca5 Add -polly-flatten-schedule pass.
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.

To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:

    { Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
      Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
      Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
      Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
      Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }

And here the same lifetime for a semantically identical one-dimensional
schedule:

    { Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }

Differential Revision: https://reviews.llvm.org/D24310

llvm-svn: 280948
2016-09-08 15:02:36 +00:00
Tobias Grosser a2d80ba58a GICHelper: Correctly assign return value
... to preserve reference counting logic.

In practice the missing assignment would not have caused any issues. We still
fix it as the code is wrong and it also causes noise in the clang static
analysis runs.

llvm-svn: 280946
2016-09-08 14:34:54 +00:00
Tobias Grosser b27ed0da37 SCEVAffinator: Add missing __isl_take annotations
llvm-svn: 280943
2016-09-08 14:31:31 +00:00
Tobias Grosser 55a7af7da5 ScopInfo: Make clear that no double-free problem exists
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_set_free always returns NULL and consequently later uses of the isl object
we just freed will never be reached. Without this knowledge, the analyser has
to issue a warning.

We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.

llvm-svn: 280940
2016-09-08 14:08:07 +00:00
Tobias Grosser b316dc166f ScopDetection: Make sure we do not accidentally divide by zero
This code path is likely never triggered, but by still handling this case
locally we avoid warnings in clangs static analyzer.

llvm-svn: 280939
2016-09-08 14:08:05 +00:00
Tobias Grosser adfc971820 DependenceInfo: Make clear that no double-free problem exists
When running the clang static analyser to check for memory issues, this code
originally showed a double free, as the analyser was unable to understand that
isl_union_map_free always returns NULL and consequently later uses of the isl
object we just freed will never be reached. Without this knowledge, the analyser
has to issue a warning.

We refactor the code to make it clear that for empty maps the current loop
iteration is aborted.

llvm-svn: 280938
2016-09-08 14:08:01 +00:00
Tobias Grosser f3600dfa2d IslNodeBuilder: Add missing __isl_take annotations
llvm-svn: 280936
2016-09-08 13:48:55 +00:00
Tobias Grosser 2a526feec9 ScopInfo: Add missing __isl_take annotation
llvm-svn: 280923
2016-09-08 11:18:56 +00:00
Michael Kruse 349779cc99 Disable MSVC warnings on ISL.
Disable some Visual C++ warnings on ISL. These are not reported by GCC/Clang in
the ISL build system. We do not intend to fix them in the Polly in-tree copy,
hence disable these warnings.

llvm-svn: 280811
2016-09-07 14:11:20 +00:00
Tobias Grosser 8d4cb1a060 ScopInfo: Do not derive assumptions from all GEP pointer instructions
... but instead rely on the assumptions that we derive for load/store
instructions.

Before we were able to delinearize arrays, we used GEP pointer instructions
to derive information about the likely range of induction variables, which
gave us more freedom during loop scheduling. Today, this is not needed
any more as we delinearize multi-dimensional memory accesses and as part
of this process also "assume" that all accesses to these arrays remain
inbounds. The old derive-assumptions-from-GEP code has consequently become
mostly redundant. We drop it both to clean up our code, but also to improve
compile time. This change reduces the scop construction time for 3mm in
no-asserts mode on my machine from 48 to 37 ms.

llvm-svn: 280601
2016-09-03 21:55:25 +00:00
Tobias Grosser 66c6506aac Dependences: Only create flat StmtSchedule in presence of reductions
Without reductions we do not need a flat union_map schedule describing
the computation we want to perform, but can work purely on the schedule
tree. This reduces the dependence computation and scheduling time from 33ms
to 25ms. Another 30% reduction.

llvm-svn: 280558
2016-09-02 23:40:15 +00:00
Tobias Grosser dff5de2e44 Dependences: Exit early, if no reduction dependences are needed.
In case we do not compute reduction dependences or dependences that are more
fine-grained than statement level dependences, we can avoid the corresponding
part of the dependence analysis all together. For the 3mm benchmark, this
reduces scheduling + dependence analysis time from 62ms to 33ms for a no-asserts
build.  The majority of the compile time is anyhow spent in the LLVM backends,
when doing code generation. Nevertheless, there is no need to waste compile time
either.

llvm-svn: 280557
2016-09-02 23:29:38 +00:00
Tobias Grosser b1000c39a0 Introduce option to run isl AST generation, but no IR generation.
We replace the options

  -polly-code-generator=none
                       =isl

with the options

  -polly-code-generation=none
                        =ast
                        =full

This allows us to measure the overhead of Polly itself, versus the compile
time increases due to us generating more IR and consequently the LLVM backends
spending more time on this IR.

We also use this opportunity to rename the option. The original name was
introduced at a point where we still had two code generators. CLooG and the
isl AST generator. Since we only have one AST generator left, there is no need
to distinguish between 'isl' and something else. However, being able to disable
code generation all together has been shown useful for debugging. Hence, we
rename and extend this option to make it a good fit for its new use case.

llvm-svn: 280554
2016-09-02 23:05:42 +00:00
Tobias Grosser c80d6979bd Drop '@brief' from doxygen comments
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.

llvm-svn: 280468
2016-09-02 06:33:33 +00:00