Commit Graph

402 Commits

Author SHA1 Message Date
Tobias Grosser feae3dfe9f [unittests] Add unittest for getPartialTilePrefixes
In https://reviews.llvm.org/D36278 it was pointed out that the behavior of
getPartialTilePrefixes is not very well understood. To allow for a better
understanding, we first provide some basic unittests.

llvm-svn: 310175
2017-08-05 09:38:09 +00:00
Michael Kruse 138a3fbae1 [DeLICM] Refactor ZoneAlgorithm into ZoneAlgo.cpp. NFC.
Extract ZoneAlgorithm from DeLICM.cpp into its own file.
It will gain a second use by the load forwarding part of
-polly-optree.

llvm-svn: 310146
2017-08-04 22:51:23 +00:00
Michael Kruse a9a7086319 [ForwardOpTree] Refactor out forwardSpeculatable(). NFC.
The method forwardSpeculatable forwards speculatively executable
instructions and is currently the only way to forward an
instruction.

In the future we intend to add more methods.

llvm-svn: 310056
2017-08-04 12:28:42 +00:00
Tobias Grosser 7b45af13ce Move setNewAccessRelation to isl++
llvm-svn: 309871
2017-08-02 19:27:25 +00:00
Philip Pfaffe a70e2649ab [Polly][PM][WIP] Polly pass registration
Summary:
This patch is a first attempt at registering Polly passes with the LLVM tools. Tool plugins are still unsupported, but this registration is usable from the tools if Polly is linked into them (albeit requiring minimal patches to those tools). Registration requires a small amount of machinery (the owning analysis proxies), necessary for injecting ScopAnalysisManager objects into the calling tools.

This patch is marked WIP because the registration is incomplete. Parsing manual pipelines is fully supported, but default pass injection into the O3 pipeline is lacking, mostly because there is opportunity for some redesign here, I believe. The first point of order would be insertion points. I think it makes sense to run before the vectorizers. Running Polly Early, however, is weird. Mostly because it actually is the default (which to me is unexpected), and because Polly runs it's own O1 pipeline. Why not instead insert it at an appropriate place somewhere after simplification happend? Running after the loop optimizers seems intuitive, but it also seems wasteful, since multiple consecutive loops might well be a single scop, and we don't need to run for all of them.

My second request for comments would be regarding all those smallish helper passes we have,  like PollyViewer, PollyPrinter, PollyImportJScop. Right now these are controlled by command line options, deciding whether they should be part of the Polly pipeline. What is your opinion on treating them like real passes, and have the user write an appropriate pipeline if they want to use any of them?

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: llvm-commits, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35458

llvm-svn: 309826
2017-08-02 15:52:25 +00:00
Michael Kruse fd35089689 [ForwardOpTree] Execute canForwardTree also in release builds.
Commit r309730 moved the call to canForwardTree into an assert(), even
though this function has side-effects if its DoIt parameter is true. To
avoid a warning in release builds, do an (void)Execution of its result
instead.

To avoid such confusion in the future, rename
canForwardTree() to forwardTree().

llvm-svn: 309753
2017-08-01 22:15:04 +00:00
Michael Kruse bc88a78cb4 [Simplify] Rewrite redundant write detection algorithm.
The previous algorithm was to search a writes and the sours of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).

The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.

We now also support if the write is a partial write of the read's
domain, and to some extent non-affine subregions.

Differential Revision: https://reviews.llvm.org/D36137

llvm-svn: 309734
2017-08-01 20:01:34 +00:00
Reid Kleckner 859c1e606a Silence -Wunused-variable warning in NDEBUG builds
llvm-svn: 309730
2017-08-01 19:53:01 +00:00
Michael Kruse 693ef99935 [Simplify] Improve scalability.
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between access can become
complex such that isl takes more considerable time and memory
for operations on them.

The problems identified were:

- is_subset() takes considerable time with many disjoints in the
  arguments. We limit the number of disjoints to 4, any additional
  information is thrown away.

- subtract() can lead to many disjoints. We instead assume that any
  array element is possibly accessed, which removes all disjoints.

- subtract_domain() may lead to considerable processing, even if all
  elements are are to be removed. Instead, we remove determine and
  remove the affected spaces manually. No behaviour is changed.

llvm-svn: 309728
2017-08-01 19:39:11 +00:00
Michael Kruse 9f6e41cdba [ForwardOpTree] Support synthesizable values.
This allows -polly-optree to move instructions that depend on
synthesizable values.

The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.

At the moment we cannot forward PHI nodes such that crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks: for entering the loop and the
backedge, such any instruction to be forwarded must have a phi between
use and definition).

A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.

Differential Revision: https://reviews.llvm.org/D36102

llvm-svn: 309609
2017-07-31 19:46:21 +00:00
Michael Kruse 57cc92b790 [Simplify] Remove all kinds of redundant scalar writes.
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).

llvm-svn: 309582
2017-07-31 17:04:55 +00:00
Michael Kruse ce9617f4fe [Simplify] Implement write accesses coalescing.
Write coalescing combines write accesses that

- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
  writes), write to the same element.
- There is no other access between them that accesses the same element.

This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.

Differential Revision: https://reviews.llvm.org/D36010

llvm-svn: 309489
2017-07-29 16:21:16 +00:00
Michael Kruse 6c8f91b908 [Simplify] Fix typo in statistics output. NFC.
llvm-svn: 309402
2017-07-28 16:57:51 +00:00
Michael Kruse 34a77780c5 [Simplify] Remove empty partial accesses first. NFC.
So follow-up cleanup do not need special handling for such accesses.

llvm-svn: 309401
2017-07-28 16:57:45 +00:00
Michael Kruse cedd7a74e1 [Simplify] Do not setInstructions() of region stmts. NFC.
The instruction list is ignored for region statements, there
is no reason to set it.

llvm-svn: 309196
2017-07-26 22:01:28 +00:00
Roman Gareev 2e580538be [ScheduleOptimizer] Translate to C++ bindings
Translate the ScheduleOptimizer to use the new isl C++ bindings.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D35845

llvm-svn: 309119
2017-07-26 14:59:15 +00:00
Tobias Grosser d7065e5df5 Move MemoryAccess::isStride* to isl++
llvm-svn: 308927
2017-07-24 20:50:22 +00:00
Michael Kruse 54071126d8 [ForwardOpTree] Properly indent enumeration in comment. NFC.
llvm-svn: 308887
2017-07-24 15:34:03 +00:00
Michael Kruse 67752076bc [ForwardOpTree] Rename FD_CanForward to FD_CanForwardLeaf. NFC.
To make the meaning and distinction to FD_CanForwardTree clearer.

llvm-svn: 308886
2017-07-24 15:33:58 +00:00
Michael Kruse d85e345ce0 [ForwardOpTree] Add comments to ForwardingDecision items. NFC.
In particular, explain the difference between FD_CanForward
and FD_CanForwardTree.

llvm-svn: 308885
2017-07-24 15:33:53 +00:00
Michael Kruse 07e8c36dc7 [ForwardOpTree] Support read-only value uses.
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.

Differential Revision: https://reviews.llvm.org/D35764

llvm-svn: 308876
2017-07-24 12:43:27 +00:00
Michael Kruse 5b8a9095e8 [ForwardOpTree] Fix mixup in comment. NFC.
The cases DoIt==false and DoIt==true were mixed up.

Thanks to Siddharth for noticing.

llvm-svn: 308874
2017-07-24 12:39:46 +00:00
Michael Kruse 25a688165b [ScopInfo] Fix typo in method name. NFC.
prependInstrunction -> prependInstruction

Thanks Nandini for noticing.

llvm-svn: 308873
2017-07-24 12:39:41 +00:00
Tobias Grosser 325812ac6d Simplify: Adopt for translation of MemoryAccess::getAccessRelation
For some reason this one was missed earlier.

llvm-svn: 308845
2017-07-23 08:15:28 +00:00
Tobias Grosser 1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Michael Kruse ab8f0d57df [Simplify] Remove partial write accesses with empty domain.
If the access relation's domain is empty, the access will never be
executed. We can just remove it.

We only remove write accesses. Partial read accesses are not yet
supported and instructions in the statement might require the
llvm::Value holding the read's result to be defined.

llvm-svn: 308830
2017-07-22 20:33:09 +00:00
Michael Kruse e5f4706a55 [ForwardOpTree] Support hoisted invariant loads.
Hoisted loads can be trivially supported because there are no
MemoryAccess to be modified, the loaded value is just available
at code generation.

llvm-svn: 308826
2017-07-22 14:30:02 +00:00
Michael Kruse a6b2de3b59 [ForwardOpTree] Introduce the -polly-optree pass.
This pass 'forwards' operand trees into statements that use them in
order to avoid scalar dependencies.

This minimal implementation handles only the case of speculatable
instructions. We will successively add support for:
- Hoisted loads
- Read-only values
- Synthesizable values
- Loads
- PHIs
- Forwarding only parts of the tree

Differential Revision: https://reviews.llvm.org/D35754

llvm-svn: 308825
2017-07-22 14:02:47 +00:00
Tobias Grosser 77eef90f50 Move ScopArrayInfo to isl++
This moves the full ScopArrayInfo class to isl++

llvm-svn: 308801
2017-07-21 23:07:56 +00:00
Michael Kruse cd4c977b8b [ScopInfo] Print instructions in dump().
Print a statement's instruction on dump() regardless of
-polly-print-instructions. dump() is supposed to be used in the debugger
only and never in regression tests. While debugging, get all the
information we have and we are not bound to break anything. For non-dump
purposes of print, forward the setting of -polly-print-instructions as
parameters.

Some calls to print() had to be changed because the
PollyPrintInstructions setting is only available in ScopInfo.cpp.
In ScheduleOptimizer.cpp, dump() was used in regression tests.
That's not what dump() is for.

The print parameter "PrintInstructions" will also be useful for an
explicit print SCoP pass in a future patch.

llvm-svn: 308746
2017-07-21 15:35:53 +00:00
Michael Kruse 22058c3fbb [Simplify] Remove unused instructions and accesses.
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.

It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.

ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.

Differential Revision: https://reviews.llvm.org/D35656

llvm-svn: 308626
2017-07-20 16:21:55 +00:00
Michael Kruse 8b8058072f [ScopInfo] Integrate ScalarDefUseChain into polly::Scop. NFC.
Before this patch, ScalarDefUseChain was a tool used by DeLICM to find
all reads and writes of scalar accesses. It iterated once over all
accesses and stores the accesses into maps.

By integrating it into the Scop class, we can keep the maps up-to-date
without the need for recomputing them. It will be needed for more than
DeLICM in the future, such as SCoP simplification, code movement between
virtual statements, and array expansion (GSoC project).

Compared to ScalarUseDefChain, we save two maps by finding the ScopStmt
a Def/PHIRead must reside in, and use its already existing lookup
function to find the MemoryAccess.

Differential Revision: https://reviews.llvm.org/D35631

llvm-svn: 308495
2017-07-19 17:11:25 +00:00
Roman Gareev 750374181b Make the pattern matching work with modified memory accesses
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D33138

llvm-svn: 308494
2017-07-19 16:59:06 +00:00
Michael Kruse 629f9185bf [Simplify] Ensure all counters are reset before next SCoP is processed. NFC.
llvm-svn: 308473
2017-07-19 14:07:21 +00:00
Tobias Grosser 303bd07c6e [ScopInfo] Introduce tryGetValueStored
Summary:
This makes code more readable and allows to reuse this functionality in
the future at other places.

Suggested-by Michael Kruse in post-commit review of r307660.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35585

llvm-svn: 308435
2017-07-19 11:09:16 +00:00
Tobias Grosser 8e1280b8b2 [Polly] Fix a typo [NFC]
Reviewers: grosser, Meinersbur, bollu

Tags: #polly

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D35459

llvm-svn: 308134
2017-07-16 13:54:41 +00:00
Tobias Grosser bed2ca6eac [Simplify] Also remove redundant writes which originally came from PHI nodes
llvm-svn: 307660
2017-07-11 14:29:39 +00:00
Singapuram Sanjay Srivallabh 02ca346e48 Introduce a hybrid target to generate code for either the GPU or CPU
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: kbarton, nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34054

llvm-svn: 306863
2017-06-30 19:42:21 +00:00
Tobias Grosser dcd94e3e93 [ScheduleOptimizer] Fix minor typo [NFC]
llvm-svn: 305709
2017-06-19 16:55:48 +00:00
Tobias Grosser 2fb3ed200a [ScheduleOptimizer] Move isolateFullPartialTiles and isolateAndUnrollMatMulInnerLoops to C++
llvm-svn: 305676
2017-06-19 10:40:12 +00:00
Michael Kruse a6d48f59a1 Fix a lot of typos. NFC.
llvm-svn: 304974
2017-06-08 12:06:15 +00:00
Michael Kruse ad7a1805be [Simplify] Use execution order of memory accesses.
Iterate through memory accesses in execution order (first all implicit reads,
then explicit accesses, then implicit writes).

In the test case this caused an implicit load to be handled as if it was loaded
after the write. That is, the value being written before it is available.

This fixes llvm.org/PR33323

llvm-svn: 304810
2017-06-06 17:46:42 +00:00
Eli Friedman de1b318dad Add opt-bisect support to polly.
This is useful for debugging miscompiles and extracting testcases
for crashes. See http://llvm.org/docs/OptBisect.html .

Differential Revision: https://reviews.llvm.org/D33752

llvm-svn: 304480
2017-06-01 21:29:05 +00:00
Michael Kruse 5f16986271 [DeLICM] Partial writes for PHIs.
Enable the use for partial writes for PHI write accesses with a switch.
This simply skips the test for whether a PHI write would be partial.

The analog test for partial value writes also protects for partial reads
which we do not support (yet). It is possible to test for partial reads
separately such that we could skip the partial write check as well. In
case this shows up to be useful, I can implement it as well.

Differential Revision: https://reviews.llvm.org/D33487

llvm-svn: 303762
2017-05-24 15:23:06 +00:00
Tobias Grosser 7205f93a98 [ScheduleOptimizer] Move schedule construction to isl C++ [NFC]
llvm-svn: 303508
2017-05-21 16:21:33 +00:00
Tobias Grosser b5f61bdeeb [Simplify] Move to isl C++
llvm-svn: 303507
2017-05-21 16:12:21 +00:00
Tobias Grosser 443f6814a1 [isl++] Rebase isl C++ bindings on top of 29aee98ce
This reduces the diff to the official isl C++ bindings and solves a correctness
issue with isl::booleans, where isl_bool_error results were accidentally
converted to isl::boolean::true.

llvm-svn: 303505
2017-05-21 15:59:15 +00:00
Tobias Grosser 3320485961 [isl++] Move isl raw_ostream printers into separate header
Instead of relying on these functions to be part of the isl C++ bindings, we
just define this functionality independently. This allows us to use isl C++
bindings that do not contain LLVM specific functionality.

llvm-svn: 303503
2017-05-21 13:16:05 +00:00
Siddharth Bhat 9746f817ea [Simplify] Fix r302986 that introduced non-inferrable templates.
- auto + decltype + template use was not inferrable in
  `Transform/Simplify.cpp accessesInOrder`.

- changed code to explicitly construct required vector instead of using
  higher order iterator helpers.

- Failing compiler spec:
    Apple LLVM version 7.3.0 (clang-703.0.31)
    Target: x86_64-apple-darwin15.6.0

llvm-svn: 303039
2017-05-15 08:18:51 +00:00
Tobias Grosser 497fdd7dff [Simplify] Remove some leftover dead code
llvm-svn: 303007
2017-05-14 09:20:56 +00:00
Michael Kruse fa7be88378 [Simplify] Remove identical write removal. NFC.
Removal of overwritten writes currently encompasses all the cases
of the identical write removal.

There is an observable behavioral change in that the last, instead
of the first, MemoryAccess is kept. This should not affect the
generated code, however.

Differential Revision: https://reviews.llvm.org/D33143

llvm-svn: 302987
2017-05-13 12:20:57 +00:00
Michael Kruse f263610b82 [Simplify] Remove writes that are overwritten.
Remove memory writes that are overwritten by later writes. This works
for StoreInsts:

      store double 21.0, double* %A
      store double 42.0, double* %A

scalar writes at the end of a statement and mixes of these.

Multiple writes can be the result of DeLICM, which might map multiple
writes to the same location when it knows that these do no conflict
(for instance because they write the same value). Such writes
interfere with pattern-matched optimization such as gemm and may not
get removed by other LLVM passes after code generation.

Differential Revision: https://reviews.llvm.org/D33142

llvm-svn: 302986
2017-05-13 11:49:34 +00:00
Michael Kruse aeb4864090 [Simplify] Reset all stats between runs.
llvm-svn: 302926
2017-05-12 17:23:07 +00:00
Michael Kruse d644ec7647 [DeLICM] Use input access heuristic for mapped PHI WRITEs.
As with the scalar operand of the initial StoreInst, also use input
accesses when searching for new opportunities after mapping a
PHI write.

The same rational applies here: After LICM has been applied, the
promoted value will either be an instruction in the same statement
(in which case we fall back to try every scalar access of the
statement), or in another statement such that there will be such
an input access. In the latter case other scalars cannot have
originated from the same register promotion, at least not by LICM.

This mostly helps to decrease compilation time and makes debugging
easier by not pursuing unpromising routes. In some circumstances,
it may change the compiler's output.

llvm-svn: 302839
2017-05-11 22:56:59 +00:00
Michael Kruse 4c27643398 [DeLICM] Lookup input accesses.
Previous to this patch, we used VirtualUse to determine the input
access of an llvm::Value in a statement. The input access is the
READ MemoryAccess that makes a value available in that statement,
which can either be a READ of a MemoryKind::Value or the
MemoryKind::PHI for a PHINode in the statement. DeLICM uses the input
access to heuristically find a candidate to map without searching all
possible values.

This might modify the behaviour in that previously PHI accesses were
not considered input accesses before. This was unintentially lost when
"VirtualUse" was extracted from the "Known Knowledge" patch.

llvm-svn: 302838
2017-05-11 22:56:46 +00:00
Michael Kruse 07e315e780 [Simplify] Remove identical scalar writes.
After DeLICM, it is possible to have two writes of the same value to
the same location in the same statement when it determined that those
writes do not conflict (write the same value).

Teach -polly-simplify to remove one of the writes. It interferes with
the pattern matching of matrix-multiplication kernels and also seem
to not be optimized away by LLVM.

The algorthm is simple, has O(n^2) behaviour (n = max number of
MemoryAccesses in a statement) and only matches the most obvious cases,
but seem to be enough to pattern-match Boost ublas gemm.

Not handled cases include:
- StoreInst instructions (a.k.a. explicit writes), since the value might
  be loaded or overwritten between the two stores.
- PHINode, especially LCSSA, when the PHI value matches with on other's.
- Partial writes (in preparation)

llvm-svn: 302805
2017-05-11 15:07:38 +00:00
Michael Kruse a0987b83d5 [Simplify] Mark variables as used. NFC.
Mark one more variable as used that is needed in assertions.

llvm-svn: 302726
2017-05-10 20:45:10 +00:00
Michael Kruse 4aac59cee1 [Simplify] Mark variables as used. NFC.
Mark variables as used that are needed in assertions.

llvm-svn: 302725
2017-05-10 20:42:02 +00:00
Michael Kruse f41f274bf8 [DeLICM] Avoid compiler warning. NFC.
gcc 5.4 warns about using a C-style case to case away a const.
Use case a const_cast instead.

llvm-svn: 302715
2017-05-10 19:58:52 +00:00
Michael Kruse f69a7c306b [DeLICM] Always normalize domain. NFC.
Some isl functions can simplify their __isl_keep arguments. The
argument object after the call uses different contraints to represent
the same set. Different contraints can result in different outputs
when printed to a string.

In assert builds additional isl functions are called (in assert() or
mentioned, these can change the internal representation of its read-only
arguments such that printed strings are different in debug and non-debug
builds.

What happened here is that a call to isl_set_is_equal inside an assert
in getScatterFor normalizes one of its arguments such that one redundant
constraint is removed. The redundant constraint therefore does not appear
in the string representing the domain, which FileCheck notices as a
regression test failure compared to a build with assertions disabled.

This fix removes the redundant contraints the domain from the start such
that the redundant contraint is removed in assert and non-assert builds.
Isl adds a flag to such sets such that the removal of redundancies is
not done multiple times (here: by isl_set_is_equal).

Thanks to Tobias Grosser for reporting and hinting to the cause.

llvm-svn: 302711
2017-05-10 19:50:45 +00:00
Tobias Grosser f3adab4c20 [Polly] Canonicalize arrays according to base-ptr equivalence class
Summary:
    In case two arrays share base pointers in the same invariant load equivalence
    class, we canonicalize all memory accesses to the first of these arrays
    (according to their order in the equivalence class).

    This enables us to optimize kernels such as boost::ublas by ensuring that
    different references to the C array are interpreted as accesses to the same
    array. Before this change the runtime alias check for ublas would fail, as it
    would assume models of the C array with differing (but identically valued) base
    pointers would reference distinct regions of memory whereas the referenced
    memory regions were indeed identical.

    As part of this change we remove most of the MemoryAccess::get*BaseAddr
    interface. We removed already all references to get*BaseAddr in previous
    commits to ensure that no code relies on matching base pointers between
    memory accesses and scop arrays -- except for three remaining uses where we
    need the original base pointer. We document for these situations that
    MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
    to the base pointer of the scop array referenced by this memory access.

Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert

Reviewed By: Meinersbur

Subscribers: etherzhhb

Tags: #polly

Differential Revision: https://reviews.llvm.org/D28518

llvm-svn: 302636
2017-05-10 10:59:58 +00:00
Michael Kruse 5ae08c0ebb [DeLICM] Known knowledge.
Extend the Knowledge class to store information about the contents
of array elements and which values are written. Two knowledges do
not conflict the known content is the same. The content information
if computed from writes to and loads from the array elements, and
represented by "ValInst": isl spaces that compare equal if the value
represented is the same.

Differential Revision: https://reviews.llvm.org/D31247

llvm-svn: 302339
2017-05-06 14:03:58 +00:00
Michael Kruse 3e519b949b [DeLICM] Use Known information when comparing Occupied and Written.
Do not conflict if a write writes the same value as already known.

This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.

Differential Revision: https://reviews.llvm.org/D32026

llvm-svn: 301460
2017-04-26 20:35:07 +00:00
Michael Kruse cd2be66bf0 [DeLICM] Use Known information when comparing Existing.Occupied and Proposed.Occupied.
Do not conflict if the value of Existing and Proposed are the same.

This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.

Differential Revision: https://reviews.llvm.org/D32025

llvm-svn: 301301
2017-04-25 10:57:32 +00:00
Michael Kruse 8431e996d3 [DeLICM] Use Known information when comparing Existing.Written and Proposed.Written.
This change only affects unit tests, but no functional changes are
expected on LLVM-IR, as no Known information is yet extracted and
consequently this functionality is only triggered through unit tests.

Differential Revision: https://reviews.llvm.org/D32027

llvm-svn: 300874
2017-04-20 19:16:39 +00:00
Tobias Grosser 1f8b84094f Update isl bindings to latest version (+ Polly extensions)
After the isl C++ binding generator is now close to being upstreamed to isl, we
synchronize the latest changes to Polly. These are mostly formatting changes
plus a small interface change for the foreach callback function and some naming
changes in isl::boolean.

llvm-svn: 300398
2017-04-15 08:15:54 +00:00
Tobias Grosser 75aa1a9a49 Use isl C++ foreach implementation
This commit switches Polly over to the isl::obj::foreach_* implementation, which
is part of the new isl bindings and follows the foreach pattern established in
Polly by Michael Kruse.

The original isl C function:

  isl_stat isl_union_set_foreach_set(__isl_keep isl_union_set *uset,
      isl_stat (*fn)(__isl_take isl_set *set, void *user), void *user);

which required the user to define a static callback function to which all
interesting parameters are passed via a 'void *' user-pointer, is on the
C++ side available as a function that takes a std::function<>, which can
carry any additional arguments without the need for a user pointer:

  stat UnionSet::foreach_set(const std::function<stat(set)> &fn) const;

The following code illustrates the use of the new C++ interface:

  auto Lambda = [=, &Result](isl::set Set) -> isl::stat {
    auto Shifted = shiftDimension(Set, Pos, Amount);
    Result = Result.add(Shifted);
    return isl::stat::ok;
  }

  UnionSet.foreach_set(Lambda);

Polly had some specialized foreach functions which did not require the lambdas
to return a status flag. We remove these functions in this commit to move Polly
completely over to the new isl interface. We may in the future discuss if
functors without return values can be supported easily.

Another extension proposed by Michael Kruse is the use of C++ iterators to allow
the use of normal for loops to iterate over these sets. Such an extension would
allow us to further simplify the code.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D30620

llvm-svn: 300323
2017-04-14 13:39:40 +00:00
Michael Kruse 72f3922534 [DeLICM] Export Known and Written to DeLICMTests. NFC.
This will allow unittesting of new functionality based on
Known and Written.

llvm-svn: 300211
2017-04-13 16:32:39 +00:00
Michael Kruse a2acc11949 [DeLICM] Add Knowledge::Known. NFC.
This field will later contain a ValInst that is known to be stored
in an occupied array element.

llvm-svn: 300210
2017-04-13 16:32:31 +00:00
Michael Kruse fa7c8cdfc6 [DeLICM] Make Knowledge::Written an isl::union_map. NFC.
The map will later point to a ValInst that is written.

llvm-svn: 300208
2017-04-13 16:32:25 +00:00
Tobias Grosser 7b5a4dfd46 Exploit BasicBlock::getModule to shorten code
Suggested-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 299914
2017-04-11 04:59:13 +00:00
Roman Gareev 9d4d91ca6a [FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPattern
Use new values of the dimensions during their permutation.

llvm-svn: 299663
2017-04-06 17:25:08 +00:00
Roman Gareev e0d466342b Restore the initial ordering of dimensions before applying the pattern matching
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.

For example, in case of the following matrix-matrix multiplication,

for (i = 0; i < 1024; i++)
  for (k = 0; k < 1024; k++)
    for (j = 0; j < 1024; j++)
      C[i][j] += A[i][k] * B[k][j];

it can produce the following schedule tree

domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
                                        0 <= i2 <= 1023 }"
child:
  schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i1)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
  permutable: 1
  coincident: [ 1, 1, 0 ]

The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).

This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.

Refs.:

[1] - https://bugs.llvm.org/show_bug.cgi?id=32500

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D31741

llvm-svn: 299662
2017-04-06 17:09:54 +00:00
Siddharth Bhat 5eeb1dd42e [Polly] [ScheduleOptimizer] Prevent incorrect tile size computation
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.

Check against a possible divide-by-zero in getMacroKernelParams
and return early.

Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31708

llvm-svn: 299633
2017-04-06 08:20:22 +00:00
Siddharth Bhat bcbfdade41 [Polly] [DependenceInfo] change WAR, WAW generation to correct semantics
= Change of WAR, WAW generation: =

- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
    `sink <- may source <- must source` as a *may* dependence.

- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```

- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.

- Now, we call WAR and WAW correctly.

== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
   Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
   WAW = isl_union_flow_get_may_dependence(Flow);
   isl_union_flow_free(Flow);
```

== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
    Flow = buildFlow(Write, Read, MustaWrite, Schedule);
    WAR = isl_union_flow_get_must_dependence(Flow);
    isl_union_flow_free(Flow);
```

- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
  before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
  intuitively correct, as we do not want may-writes to block
  must-writes.
- Leaves us with direct (R -> W).

- This affects reduction generation since RED is built using WAW and WAR.

= New StrictWAW for Reductions: =

- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
      Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
      WAW = isl_union_flow_get_must_dependence(Flow);
      WAR = isl_union_flow_get_may_dependence(Flow);
```

- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.

= Explanation: Why the new WAR dependences in tests are correct: =

- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.

== Code: ==
```lang=llvm, name=new-war-dependence.ll
  ;    void manyreductions(long *A) {
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S0:          *A += 42;
  ;
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S1:          *A += 42;
  ;
```
=== WAR dependence: ===
  {  S0[1023, 1023] -> S1[0, 0] }

- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:

```lang=cpp, name=dependence-incorrect, counterexample
        S0[1023, 1023]:
    *-- tmp = *A (load0)--*
WAR 2   add = tmp + 42    |
    *-> *A = add (store0) |
                         WAR 1
        S1[0, 0]:         |
        tmp = *A (load1)  |
        add = tmp + 42    |
        A = add (store1)<-*
```

- One may assume that WAR2 *hides* WAR1 (since store0 happens before
  store1). However, within a statement, Polly has no idea about the
  ordering of loads and stores.

- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
    S0[1023, 1023]:
    A = add (store0)
    tmp = A (load0) ---*
    add = A + 42       |
                     WAR 1
    S1[0, 0]:          |
    tmp = A (load1)    |
    add = A + 42       |
    A = add (store1) <-*
```

- So, Polly  generates (correct) WAR dependences. It does not make sense
  to remove these dependences, since they are correct with respect to
  Polly's model.

    Reviewers: grosser, Meinersbur

    tags: #polly

    Differential revision: https://reviews.llvm.org/D31386

llvm-svn: 299429
2017-04-04 13:08:23 +00:00
Michael Kruse 9e4e7b467f [DeLICM] Add const qualifiers. NFC.
llvm-svn: 298546
2017-03-22 20:09:58 +00:00
Michael Kruse d07d155ebb [DeLICM] Remove overloaded Knowledge constructor. NFC.
The isl C++ bindings now has implicit conversions from isl::set to
isl::union_set. Therefore the additional overload accepting isl::set
is not required anymore.

llvm-svn: 298529
2017-03-22 18:01:23 +00:00
Michael Kruse 29143ec3f7 [DeLICM] Remove AllElements. NFC.
It is not used and will not be used (anymore) in future commits.

llvm-svn: 298522
2017-03-22 17:18:39 +00:00
Roman Gareev cdfb57dc46 Introduce another level of metadata to distinguish non-aliasing accesses
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30606

llvm-svn: 298510
2017-03-22 14:25:24 +00:00
Siddharth Bhat 3e4a7d38ab [ScheduleOptimiser] fix typos in top comment [NFC]
coice -> choice
Transations -> Transactions

llvm-svn: 298095
2017-03-17 14:52:19 +00:00
Tobias Grosser c9d4cb2f42 [ScheduleOptimizer] Allow tiling after fusion
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.

The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".

Reviewers: grosser

Subscribers: gareevroman, pollydev

Tags: #polly

Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>

Differential Revision: https://reviews.llvm.org/D30815

llvm-svn: 297587
2017-03-12 19:02:31 +00:00
Michael Kruse 0446d81e2d [Simplify] Add -polly-simplify pass.
This new pass removes unnecessary accesses and writes. It currently
supports 2 simplifications, but more are planned.

It removes write accesses that write a loaded value back to the location
it was loaded from. It is a typical artifact from DeLICM. Removing it
will get rid of bogus dependencies later in dependency analysis.

It also removes statements without side-effects. ScopInfo already
removes these, but the removal of unnecessary writes can result in
more side-effect free statements.

Differential Revision: https://reviews.llvm.org/D30820

llvm-svn: 297473
2017-03-10 16:05:24 +00:00
Tobias Grosser 3e618c33fe [DeadCodeElimination] Translate to C++ bindings
This pass is a small and self-contained example of a piece of code that was
written with the isl C interface. The diff of this change nicely shows how the
C++ bindings can improve the readability of the code by avoiding the long C
function names and by avoiding any need for memory management.

As you will see, no calls to isl_*_copy or isl_*_free are needed anymore.
Instead the C++ interface takes care of automatically managing the objects.
This may introduce internally additional copies, but due to the isl reference
counting, such copies are expected to be cheap. For performance critical
operations, we will later exploit move semantics to eliminate unnecessary
copies that have shown to be costly.

Below we give a set of examples that shows the benefit of the C++ interface vs.
the pure C interface.

Check properties
----------------

Before:

  if (isl_aff_is_zero(aff) ||  isl_aff_is_one(aff))
    return true;

After:

  if (Aff.is_zero() || Aff.is_one())
    return true;

Type conversion
---------------

Before:

  isl_union_pw_multi_aff *UPMA = isl_union_pw_multi_aff_from_union_map(umap);

After:

  isl::union_pw_multi_aff UPMA = UMap;

Type construction
-----------------

Before:

  auto *Empty = isl_union_map_empty(space);

After:

  auto Empty = isl::union_map::empty(Space);

Operations
----------

Before:

  set = isl_union_set_intersect(set, set2);

After:

  Set = Set.intersect(Set2);

The use of isl::boolean in return types also adds an increases the robustness
of Polly, as on conversion to true or false, we verify that no isl_bool_error
has been returned and assert in case an error was returned. Before this change
we would have just ignored the error and proceeded with (some) exection path.

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30619

llvm-svn: 297466
2017-03-10 15:05:38 +00:00
Tobias Grosser 51ebda8c9d [FlattenAlgo] Translate to C++ bindings
Translate the full algorithm to use the new isl C++ bindings

This is a large piece of code that has been written with the Polly IslPtr<>
memory management tool, which only performed memory management, but did not
provide a method interface. As such the code was littered with calls to
give(), copy(), keep(), and take(). The diff of this change should give a
good example how the new method interface simplifies the code by removing the
need for switching between managed types and C functions all the time
and consequently also the need to use the long C function names.

These are a couple of examples comparing the old IslPtr memory management
interface with the complete method interface.

Check properties
----------------

Before:

  if (isl_aff_is_zero(Aff.get()) ||  isl_aff_is_one(Aff.get()))
    return true;

After:

  if (Aff.is_zero() || Aff.is_one())
    return true;

Type conversion
---------------

Before:

  isl_union_pw_multi_aff *UPMA =
      give(isl_union_pw_multi_aff_from_union_map(UMap.copy());

After:

  isl::union_pw_multi_aff UPMA = UMap;

Type construction
-----------------

Before:

  auto Empty = give(isl_union_map_empty(Space.copy());

After:

  auto Empty = isl::union_map::empty(Space);

Operations
----------

Before:

  Set = give(isl_union_set_intersect(Set.copy(), Set2.copy());

After:

  Set = Set.intersect(Set2);

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30617

llvm-svn: 297463
2017-03-10 14:55:58 +00:00
Tobias Grosser deaef15f52 Introduce isl C++ bindings, Part 1: value_ptr style interface
Over the last couple of months several authors of independent isl C++ bindings
worked together to jointly design an official set of isl C++ bindings which
combines their experience in developing isl C++ bindings. The new bindings have
been designed around a value pointer style interface and remove the need for
explicit pointer managenent and instead use C++ language features to manage isl
objects.

This commit introduces the smart-pointer part of the isl C++ bindings and
replaces the current IslPtr<T> classes, which served the very same purpose, but
had to be manually maintained. Instead, we now rely on automatically generated
classes for each isl object, which provide value_ptr semantics.

An isl object has the following smart pointer interface:

    inline set manage(__isl_take isl_set *ptr);

    class set {
      friend inline set manage(__isl_take isl_set *ptr);
      isl_set *ptr = nullptr;
      inline explicit set(__isl_take isl_set *ptr);

    public:
      inline set();
      inline set(const set &obj);
      inline set &operator=(set obj);
      inline ~set();
      inline __isl_give isl_set *copy() const &;
      inline __isl_give isl_set *copy() && = delete;
      inline __isl_keep isl_set *get() const;
      inline __isl_give isl_set *release();
      inline bool is_null() const;
    }

The interface and behavior of the new value pointer style classes is inspired
by http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3339.pdf, which
proposes a std::value_ptr, a smart pointer that applies value semantics to its
pointee.

We currently only provide a limited set of public constructors and instead
require provide a global overloaded type constructor method "isl::obj
isl::manage(isl_obj *)", which allows to convert an isl_set* to an isl::set by
calling 'S = isl::manage(s)'. This pattern models the make_unique() constructor
for unique pointers.

The next two functions isl::obj::get() and isl::obj::release() are taken
directly from the std::value_ptr proposal:

S.get() extracts the raw pointer of the object managed by S.
S.release() extracts the raw pointer of the object managed by S and sets the
object in S to null.

We additionally add std::obj::copy(). S.copy() returns a raw pointer refering
to a copy of S, which is a shortcut for "isl::obj(oldobj).release()", a
functionality commonly needed when interacting directly with the isl C
interface where all methods marked with __isl_take require consumable raw
pointers.

S.is_null() checks if S manages a pointer or if the managed object is currently
null. We add this function to provide a more explicit way to check if the
pointer is empty compared to a direct conversion to bool.

This commit also introduces a couple of polly-specific extensions that cover
features currently not handled by the official isl C++ bindings draft, but
which have been provided by IslPtr<T> and are consequently added to avoid code
churn. These extensions include:

	- operator bool() : Conversion from objects to bool
	- construction from nullptr_t
	- get_ctx() method
	- take/keep/give methods, which match the currently used naming
	  convention of IslPtr<T> in Polly. They just forward to
	  (release/get/manage).
	- raw_ostream printers

We expect that these extensions are over time either removed or upstreamed to
the official isl bindings.

We also export a couple of classes that have not yet been exported in isl (e.g.,
isl::space)

As part of the code review, the following two questions were asked:

- Why do we not use a standard smart pointer?

std::value_ptr was a proposal that has not been accepted. It is consequently
not available in the standard library. Even if it would be available, we want
to expand this interface with a complete method interface that is conveniently
available from each managed pointer. The most direct way to achieve this is to
generate a specialiced value style pointer class for each isl object type and
add any additional methods to this class. The relevant changes follow in
subsequent commits.

- Why do we not use templates or macros to avoid code duplication?

It is certainly possible to use templates or macros, but as this code is
auto-generated there is no need to make writing this code more efficient. Also,
most of these classes will be specialized with individual member functions in
subsequent commits, such that there will be little code reuse to exploit. Hence,
we decided to do so at the moment.

These bindings are not yet officially part of isl, but the draft is already very
stable. The smart pointer interface itself did not change since serveral months.
Adding this code to Polly is against our normal policy of only importing
official isl code. In this case however, we make an exception to showcase a
non-trivial use case of these bindings which should increase confidence in these
bindings and will help upstreaming them to isl.

Tags: #polly

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D30325

llvm-svn: 297452
2017-03-10 11:41:03 +00:00
Michael Kruse 9fb3ab1b19 [DeLICM] Add -polly-delicm-overapproximate-writes option.
One of the current limitations of DeLICM is that it only creates
PHI WRITEs that it knows are read by some PHI. Such writes may not span
all instances of a statement. Polly's code generator currently does not
support MemoryAccesses that are not executed in all instances
('partial accesses') and so has to give up on a possible mapping.

This workaround has once been suggested by Tobias Grosser: Try to
interpolate an arbitrary expansion to all instances. It will be checked
for possible conflicts with the existing Knowledge and can be applied if
the conflict checking result is that no semantics are changed.

Expansion is done by simplifying the mapping by coalescing with the hope
that coalescing will find a polyhedral 'rule' of the relevant map. It is
then 'gist'-ed using the domain of the relevant instances such that the
rule is expanded to the universe and finally intersected with the domain
of all statement instances.

The expansion makes conflicts become more likely, the found rule may
still not encompass all statement instances and the found rule exposes
internals of isl's implementation of coalesce and gist. The latter means
that the result depends on how much effort the implementation invests
into finding a rule which may change between versions of isl. Trivial
implementations of gist and coalesce just return the input arguments.

A patch that makes codegen support partial accesses is in preparation
as well.

Differential Revision: https://reviews.llvm.org/D30763

llvm-svn: 297373
2017-03-09 11:23:22 +00:00
Michael Kruse 935b2a3654 [DeadCodeElim] Put -polly-dce-precise-steps into the Polly category.
llvm-svn: 297318
2017-03-08 23:25:35 +00:00
Michael Kruse b295c37a15 [DeLICM] Statistics for use in regression tests.
Print some measurements of the DeLICM transformation at -analyze to be
used in regression tests.

llvm-svn: 296347
2017-02-27 15:53:13 +00:00
Michael Kruse e199f285b0 [DeLICM] Fortify against exceeding isl's max operations counter.
Control flow would flow-through after the check whether the operations
quota exceeded, with the intention that it would later be caught by
Knowledge::isUsable(). However, the Knowledge constructor has its own
assertions to check consistency which would fail if its fields have only
been initialized partially because some sets have been computed correctly
before the operations quota takes effect.

Fix by erroring-out early instead of falling-throught into the code that
might expect that everything has been computed correctly. For robustness,
also bail-out if any of the fields contain nullptr values instead of
relying on isl always setting exactly this error code if something went
wrong.

This should fix the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable
(-polly-process-unprofitable -polly-position=before-vectorizer
-polly-enable-delicm) buildbot.

llvm-svn: 296022
2017-02-23 21:58:20 +00:00
Michael Kruse f4e201e09f [Support] Remove NonowningIslPtr. NFC.
NonowningIslPtr<isl_X> was used as types of function parameters when the
function does not consume the isl object, i.e. an __isl_keep parameter.

The alternatives are:

1. IslPtr<isl_X>
   This has additional calls to isl_X_copy and isl_X_free to
   increase/decrease the reference counter even though not needed. The
   caller already owns a reference to the isl object.

2. const IslPtr<isl_X>&
   This does not change the reference counter, but requires an
   additional load to get the pointer to the isl object (instead of just
   passing the pointer itself).
   Moreover, the compiler cannot rely on the constness of the pointer
   and has to reload the pointer every time it writes to memory (unless
   alias analysis such as TBAA says it is not possible).

The isl C++ bindings currently in development do not have an equivalent
to NonowningIslPtr and adding one would make the binding more
complicated and its advantage in performance is small. In order to
simplify the transition to these C++ bindings, remove NonowningIslPtr.
Change every former use of it to alternative 2 mentioned aboce
(const IslPtr<isl_X>&).

llvm-svn: 295998
2017-02-23 17:57:27 +00:00
Michael Kruse 9f519714b3 [DeLICM] Add missing Doxygen comment. NFC.
llvm-svn: 295978
2017-02-23 14:51:50 +00:00
Michael Kruse 311ecb00dc [DeLICM] Capitalize parameter name. NFC.
llvm-svn: 295977
2017-02-23 14:51:45 +00:00
Roman Gareev 96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Michael Kruse d8d32bb3d1 [DeLICM] Regression test for skipping map targets.
Add optimization-remarks-missed for when mapping targets have been
skipped and add regression tests for them.

llvm-svn: 295953
2017-02-23 10:25:20 +00:00
Michael Kruse deb30e8278 [DeLICM] Add regression tests for DeLICM reject cases.
These tests were not included in the main DeLICM commit. These check the
cases where zone analysis cannot be successful because of assumption
violations.

We use the LLVM optimization remark infrastructure as it seems to be the
best fit for this kind of messages. I tried to make use if the
OptimizationRemarkEmitter. However, it would insert additional function
passes into the pass manager to get the hotness information. The pass
manager would insert them between the flatten pass and delicm, causing
the ScopInfo with the flattened schedule being thrown away.

Differential Revision: https://reviews.llvm.org/D30253

llvm-svn: 295846
2017-02-22 15:14:08 +00:00
Michael Kruse 8474470500 [DeLICM] Fix wrong comment. NFC.
Correct a comment that claimed that a store after load was detected
when the code checks a load after a store.

llvm-svn: 295835
2017-02-22 14:14:40 +00:00
Michael Kruse 43ed25f1d9 [DeLICM] Print message when zone analysis is not available on -analysis.
This is to distinguish the cases that analysis has failed from the case
where not transformation was performed.

llvm-svn: 295833
2017-02-22 13:48:35 +00:00
Michael Kruse 91cdafb86f [DeLICM] Use opt<int>.
There is no template specialization for cl::parser<unsigned long> such
that parsing an cl::opt<unsigned long> command line argument will fail.
Use opt<int> instead which has an associated parser.

llvm-svn: 295832
2017-02-22 13:48:18 +00:00
Michael Kruse 9e52c39f0a [DeLICM] Map values hoisted by LICM back to the array.
Implement the -polly-delicm pass. The pass intends to undo the
effects of LoopInvariantCodeMotion (LICM) which adds additional scalar
dependencies into SCoPs. DeLICM will try to map those scalars back to
the array elements they were promoted from, as long as the array
element is unused.

The is the main patch from the DeLICM/DePRE patch series. It does not
yet undo GVN PRE for which additional information about known values
is needed and does not handle PHI write accesses that have have no
target. As such its usefulness is limited. Patches for these issues
including regression tests for error situatons will follow.

Reviewers: grosser

Differential Revision: https://reviews.llvm.org/D24716

llvm-svn: 295713
2017-02-21 10:20:54 +00:00
Roman Gareev 4eb07e481e [FIX] Fix the typo in ScheduleOptimizer.cpp.
llvm-svn: 295292
2017-02-16 07:04:41 +00:00
Michael Kruse e23e94a08d [DeLICM] Add Knowledge class. NFC.
The Knowledge class remembers the state of data at any timepoint of a SCoP's
execution. Currently, it tracks whether an array element is unused or is
occupied by some value, and the writes to it. A future addition will be to also
remember which value it contains.

Objects are used to determine whether two Knowledge contain conflicting
information, i.e. two states cannot be true a the same time.

This commit was extracted from the DeLICM algorithm at
https://reviews.llvm.org/D24716.

llvm-svn: 295197
2017-02-15 16:59:10 +00:00
Roman Gareev b196055c0c Check reduction dependencies in case of the matrix multiplication optimization
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Sven Verdoolaege <skimo-polly@kotnet.org>
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29814

llvm-svn: 294836
2017-02-11 09:59:09 +00:00
Roman Gareev de69293b01 [FIX] Fix the potential issue of containsOnlyMatMulDep.
llvm-svn: 294835
2017-02-11 09:48:09 +00:00
Roman Gareev 5ef7e210c0 [NFC] Fix the style issue of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294834
2017-02-11 08:43:41 +00:00
Roman Gareev afcf026d81 [NFC] Fix style issues of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294831
2017-02-11 07:14:37 +00:00
Roman Gareev 3d4eae31ea Use the size of the widest type of the matrix multiplication operands
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29269

llvm-svn: 294828
2017-02-11 07:00:05 +00:00
Roman Gareev 9989088ee9 Isolate a set of partial tile prefixes in case of the matrix multiplication
optimization

Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.

In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.

The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.

In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.

It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29244

llvm-svn: 294564
2017-02-09 07:10:01 +00:00
Roman Gareev 772498dc68 [NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized
with optimizeMatMulPattern

This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.

llvm-svn: 294444
2017-02-08 13:29:06 +00:00
Roman Gareev 98075fe181 A new algorithm for identification of a SCoP statement that implement a matrix
multiplication

The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28357

llvm-svn: 293890
2017-02-02 14:23:14 +00:00
Tobias Grosser ff40087a6a Update to recent formatting changes
llvm-svn: 293756
2017-02-01 10:12:09 +00:00
Roman Gareev 7758a2af53 Update the documentation on how the packing transformation is implemented
Add a simple example to update the documentation on how the packing
transformation is implemented.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D28021

llvm-svn: 293429
2017-01-29 10:37:50 +00:00
Michael Kruse 33dc454700 [CodePrepa] Remove unused declaration. NFC.
llvm-svn: 293304
2017-01-27 16:59:09 +00:00
Tobias Grosser 21a059af09 Adjust formatting to commit r292110 [NFC]
llvm-svn: 292123
2017-01-16 14:08:10 +00:00
Tobias Grosser 67e94fb435 ScheduleOptimizer: Allow to set register width in command line
We use this option to set a fixed register width in our test cases to make
sure the results are identical accross platforms.

llvm-svn: 292002
2017-01-14 07:14:54 +00:00
Roman Gareev 1c2927b209 Specify the default values of the cache parameters
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there is no typical values of this parameters, we use the
parameters of Intel Core i7-3820 SandyBridge that also help to attain the
high-performance on IBM POWER System S822 and IBM Power 730 Express server.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 290518
2016-12-25 16:32:28 +00:00
Tobias Grosser 0791d5f5aa ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma'
througput -> throughput

llvm-svn: 290418
2016-12-23 07:33:39 +00:00
Roman Gareev be5299af0b Change the determination of parameters of macro-kernel
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak).

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28019

llvm-svn: 290256
2016-12-21 12:51:12 +00:00
Roman Gareev 92c446016a [Polly] Use three-dimensional arrays to store packed operands of the matrix
multiplication

Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7

it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D27878

llvm-svn: 290251
2016-12-21 11:18:42 +00:00
Roman Gareev 2606c48a1d Restrict ranges of extension maps
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D25655

llvm-svn: 289815
2016-12-15 12:35:59 +00:00
Roman Gareev 15db81ef71 [NFC] Fix typos in getMacroKernelParams.
llvm-svn: 289808
2016-12-15 12:00:57 +00:00
Roman Gareev 8babe1a216 The order of the loops defines the data reused in the BLIS implementation of
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.

In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D25653

llvm-svn: 289806
2016-12-15 11:47:38 +00:00
Michael Kruse 79c0173f53 [ScheduleOptimizer] Fix memory leak. NFC.
llvm-svn: 289434
2016-12-12 14:51:06 +00:00
Johannes Doerfert 2df9963fe3 Rerun mem2reg after the inliner
It did happen that after the inliner finished we end up with promotable
allocas in a function. We now run mem2reg to make sure everything is
promoted if possible.

llvm-svn: 288514
2016-12-02 17:43:57 +00:00
Michael Kruse 36e79ecaec [DeLICM] Add pass boilerplate code.
Add an empty DeLICM pass, without any functional parts.

Extracting the boilerplate from the the functional part reduces the size of the
code to review (https://reviews.llvm.org/D24716)

Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 288160
2016-11-29 16:41:21 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser 5aea5653b3 FlattenAlgo: Ensure we _really_ obtain a param space
This resolves "isl_space.c:1775: not a parameter space" errors I have seen
on two systems.

llvm-svn: 281052
2016-09-09 16:11:26 +00:00
Michael Kruse 7886bd7ca5 Add -polly-flatten-schedule pass.
The -polly-flatten-schedule pass reduces the number of scattering
dimensions in its isl_union_map form to make them easier to understand.
It is not meant to be used in production, only for debugging and
regression tests.

To illustrate, how it can make sets simpler, here is a lifetime set
used computed by the porposed DeLICM pass without flattening:

    { Stmt_reduction_for[0, 4] -> [0, 2, o2, o3] : o2 < 0;
      Stmt_reduction_for[0, 4] -> [0, 1, o2, o3] : o2 >= 5;
      Stmt_reduction_for[0, 4] -> [0, 1, 4, o3] : o3 > 0;
      Stmt_reduction_for[0, i1] -> [0, 1, i1, 1] : 0 <= i1 <= 3;
      Stmt_reduction_for[0, 4] -> [0, 2, 0, o3] : o3 <= 0 }

And here the same lifetime for a semantically identical one-dimensional
schedule:

    { Stmt_reduction_for[0, i1] -> [2 + 3i1] : 0 <= i1 <= 4 }

Differential Revision: https://reviews.llvm.org/D24310

llvm-svn: 280948
2016-09-08 15:02:36 +00:00
Tobias Grosser c80d6979bd Drop '@brief' from doxygen comments
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.

llvm-svn: 280468
2016-09-02 06:33:33 +00:00
Roman Gareev 5f99f8656e Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23740

llvm-svn: 279395
2016-08-21 11:20:39 +00:00
Roman Gareev 1c892e91e3 Perform replacement of access relations and creation of new arrays according to the packing transformation
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D22187

llvm-svn: 278666
2016-08-15 12:22:54 +00:00
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev 3a18a931a8 Apply all necessary tilings and interchangings to get a macro-kernel
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21491

llvm-svn: 276627
2016-07-25 09:42:53 +00:00
Roman Gareev 2cb4d133f5 [NFC] Refactor creation of the BLIS mirco-kernel and improve documentation
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 276616
2016-07-25 07:27:59 +00:00
Tobias Grosser 3898a0468c Propagate on-error status
This ensures that the error status set with -polly-on-isl-error-abort is
maintained even after running DependenceInfo and ScheduleOptimizer. Both
passes temporarily set the error status to CONTINUE as the dependence
analysis uses a compute-out and the scheduler may not be able to derive
a schedule. In both cases we want to not abort, but to handle the error
gracefully. Before this commit, we always set the error reporting to ABORT
after these passes. After this commit, we use the error reporting mode that was
active earlier.

This comes without a test case as this would require us to introduce (memory)
errors which would trigger the isl errors.

llvm-svn: 274272
2016-06-30 20:42:58 +00:00
Tobias Grosser af14993016 Simplify: get isl_ctx only once [NFC]
... instead of call S.getIslCtx() many times.

llvm-svn: 274271
2016-06-30 20:42:56 +00:00
Tobias Grosser 522478d2c0 clang-tidy: Add llvm namespace comments
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consitently use
namespace comments in Polly.

There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.

To reproduce this fix run:

for i in `ls tools/polly/lib/*/*.cpp`; \
  clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
  -header-filter=".*"; \
done

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273621
2016-06-23 22:17:27 +00:00
Tobias Grosser 1a1056798b Fix separator in header comment
This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273437
2016-06-22 16:29:33 +00:00
Tobias Grosser 8dd653d983 clang-tidy: apply modern-use-nullptr fixes
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273435
2016-06-22 16:22:00 +00:00
Roman Gareev 397a34a08d [NFC] Use isl_schedule_node_band_n_member to get the number of dimensions of a band node.
llvm-svn: 273400
2016-06-22 12:11:30 +00:00
Roman Gareev 42402c9e89 Apply all necessary tilings and unrollings to get a micro-kernel
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21140

llvm-svn: 273397
2016-06-22 09:52:37 +00:00
Roman Gareev b17b9a8324 [NFC] Outline the application of register tiling.
llvm-svn: 272515
2016-06-12 17:20:05 +00:00
Roman Gareev 827264de98 [NFC] "#include <ciso646>" is unnecessary, because "and", "or" were replaced
by "&&", "||".

llvm-svn: 272168
2016-06-08 16:44:11 +00:00
Roman Gareev ba0fb97c0a [NFC] Check that a parameter of ScheduleTreeOptimizer::isMatrMultPattern contains a correct partial schedule
llvm-svn: 271780
2016-06-04 06:34:04 +00:00
Roman Gareev 4b8c7aeb62 [FIX] Fix potential issue related to subtraction from an unsigned 0 in circularShiftOutputDims
Reported-by: Mehdi Amini <mehdi.amini@apple.com>
Contributed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: http://reviews.llvm.org/D20969

llvm-svn: 271705
2016-06-03 18:46:29 +00:00
Roman Gareev 76614d3ed9 [GSoC 2016] [Polly] [FIX] Determination of statements that contain matrix
multiplication

Fix small issues related to characters, operators  and descriptions of tests.

Differential Revision: http://reviews.llvm.org/D20806

llvm-svn: 271264
2016-05-31 11:22:21 +00:00
Johannes Doerfert 99191c78c2 Decouple SCoP building logic from pass
Created a new pass ScopInfoRegionPass. As name suggests, it is a
  region pass and it is there to preserve compatibility with our
  existing Polly passes.  ScopInfoRegionPass will return a SCoP object
  for a valid region while the creation of the SCoP stays in the
  ScopInfo class.

  Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
  Reviewed-by: Tobias Grosser <tobias@grosser.es>,
               Johannes Doerfert <doerfert@cs.uni-saarland.de>

Differential Revision: http://reviews.llvm.org/D20770

llvm-svn: 271259
2016-05-31 09:41:04 +00:00
Michael Kruse 7410a27820 MSVC compile fix: #include <ciso646>. NFC.
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.

llvm-svn: 271206
2016-05-30 14:27:14 +00:00
Roman Gareev 9c3eb5960a Determination of statements that contain matrix multiplication
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.

Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D20575

llvm-svn: 271128
2016-05-28 16:17:58 +00:00
Michael Kruse 315aa3278e [ScheduleOptimizer] Add -polly-opt-outer-coincidence option.
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.

In practice this allows loop skewing for more parallelism in inner
loops.

llvm-svn: 268222
2016-05-02 11:35:27 +00:00
Johannes Doerfert 3c6a99b818 Add __isl_give annotations to return types [NFC]
llvm-svn: 265882
2016-04-09 21:55:23 +00:00
Hongbin Zheng 2a798853f8 Allow the client of DependenceInfo to obtain dependences at different granularities.
llvm-svn: 262591
2016-03-03 08:15:33 +00:00
Hongbin Zheng defd098612 Adapt to LLVM head, again
llvm-svn: 261905
2016-02-25 17:54:42 +00:00
Hongbin Zheng 566c614525 Revert "Adapt to LLVM head. NFC"
This reverts commit 4d3753b9646a69c00d234ccd6e91dc3d0ea5d643.

llvm-svn: 261892
2016-02-25 16:46:17 +00:00
Hongbin Zheng f4e35f9cb9 Adapt to LLVM head. NFC
llvm-svn: 261886
2016-02-25 16:36:09 +00:00
Roman Gareev 11001e1534 Annotation of SIMD loops
Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.

The buildbot shows the following compile/execution time changes:

  Compile time:
    Improvements    Δ     Previous  Current  σ
    …/gesummv      -6.06% 0.2640    0.2480   0.0055
    …/gemver       -4.46% 0.4480    0.4280   0.0044
    …/covariance   -4.31% 0.8360    0.8000   0.0065
    …/adi          -3.23% 0.9920    0.9600   0.0065
    …/doitgen      -2.53% 0.9480    0.9240   0.0090
    …/3mm          -2.33% 1.0320    1.0080   0.0087

  Execution time:
    Regressions     Δ     Previous  Current  σ
    …/viterbi       1.70% 5.1840    5.2720   0.0074
    …/smallpt       1.06% 12.4920   12.6240  0.0040

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D14491

llvm-svn: 261620
2016-02-23 09:00:13 +00:00
Tobias Grosser 5624d3c978 Adjust formatting to clang-format changes in 256149
llvm-svn: 256151
2015-12-21 12:38:56 +00:00
Tobias Grosser ca7f5bb767 Full/partial tile separation for vectorization
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.

If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).

The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.

Contributed-by: Roman Gareev <gareevroman@gmail.com>

Reviewers: jdoerfert, grosser

Subscribers: grosser, #polly

Differential Revision: http://reviews.llvm.org/D13779

llvm-svn: 250809
2015-10-20 09:12:21 +00:00
Johannes Doerfert 01978cfa0c Remove independent blocks pass
Polly can now be used as a analysis only tool as long as the code
  generation is disabled. However, we do not have an alternative to the
  independent blocks pass in place yet, though in the relevant cases
  this does not seem to impact the performance much. Nevertheless, a
  virtual alternative that allows the same transformations without
  changing the input region will follow shortly.

llvm-svn: 250652
2015-10-18 12:28:00 +00:00
Tobias Grosser 0e3a6b13a4 Sort includes using 'clang-format -sort-includes'
llvm-svn: 250392
2015-10-15 12:17:36 +00:00
Tobias Grosser f30be2f370 RegisterPasses: Optionally run inliner before Polly
This will allow us to optimize C++ template code with Polly. This support is
mostly for debugging purpose and individual experiments. The ultimate goal is
still to run Polly later in the pass manager when inlining already happened.

llvm-svn: 250092
2015-10-12 20:03:44 +00:00
Johannes Doerfert f363ed9804 [NFC] Move helper functions to ScopHelper
Helper functions in the BlockGenerators.h/cpp introduce dependences
  from the frontend to the backend of Polly. As they are used in
  ScopDetection, ScopInfo, etc. we move them to the ScopHelper file.

llvm-svn: 249919
2015-10-09 23:40:24 +00:00
Johannes Doerfert 45be64464b [NFC] Consistenly use commented and annotated ScopPass functions
The changes affect methods that are part of the Pass interface and
  include:
    - Comments that describe the methods purpose.
    - A consistent use of the keywords override and virtual.
  Additionally, the printScop method is now optional and removed from
  SCoP passes that do not implement it.

llvm-svn: 248685
2015-09-27 15:43:29 +00:00
Johannes Doerfert 0f37630849 [NFC] Use releaseMemory to release internal memory
llvm-svn: 248684
2015-09-27 15:42:28 +00:00
Chandler Carruth 66ef16b289 [PM] Update Polly for the new AA infrastructure landed in r247167.
llvm-svn: 247198
2015-09-09 22:13:56 +00:00
Tobias Grosser fa57e9b7e6 Make our data-locality schedule tree transforms externally accessible
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.

llvm-svn: 245824
2015-08-24 06:01:47 +00:00
Tobias Grosser 1ac884d73a Use marker nodes to annotate the different levels of tiling
Currently, marker nodes are ignored during AST generation, but visible in the
-debug-only=polly-ast output.

llvm-svn: 245809
2015-08-23 09:11:00 +00:00
Tobias Grosser fc490a99f5 Do really not unroll the vector loop in combination with register tiling
The previous commit lacked a test case for register tiling + pre-vectorization
and we obviously got it immediately wrong.

llvm-svn: 245599
2015-08-20 19:08:16 +00:00
Tobias Grosser 42e2489553 Add experimental support for trivial register tiling
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.

llvm-svn: 245564
2015-08-20 13:45:05 +00:00
Tobias Grosser 0483271662 Add support for two-level tiling
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.

llvm-svn: 245563
2015-08-20 13:45:02 +00:00
Tobias Grosser 862b9b5239 Factor out check for tileable band node.
llvm-svn: 245559
2015-08-20 12:32:45 +00:00
Tobias Grosser 9bdea573bd Introduce tileBand function to simplify code
llvm-svn: 245558
2015-08-20 12:22:37 +00:00
Tobias Grosser d891b54132 Add some forgotten isl memory annotations
llvm-svn: 245557
2015-08-20 12:16:23 +00:00
Tobias Grosser 07c1c2fcc9 Make prevectorization width configurable
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.

This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.

llvm-svn: 245424
2015-08-19 08:46:11 +00:00
Tobias Grosser 161c9081e5 Do not use negative option name
Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling.

llvm-svn: 245423
2015-08-19 08:22:06 +00:00
Tobias Grosser f10f4636ff Simplify tiling code a bit
We only need to allocate the tile size vector if we actually want to perform
a tiling.

llvm-svn: 245422
2015-08-19 08:03:37 +00:00
Tobias Grosser 77c0f5a3b7 Drop dead and disable code from IndependentBlocks
Since Polly has now support for the code generation of scalar and PHI
dependences this code was unused and is now dropped.

llvm-svn: 245284
2015-08-18 09:30:28 +00:00
Tobias Grosser c5bcf246d1 Fix Polly after SCEV port to new pass manager
This fixes compilation after LLVM commit r245193.

llvm-svn: 245211
2015-08-17 10:57:08 +00:00
Tobias Grosser 234a48270e AST Generation Paper published in TOPLAS
The July issue of TOPLAS contains a 50 page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral based
mathematical program description, but also gives interesting insights about:

- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds will be taken care of
by our AST generator.

- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parameteric loop bounds, complex tile
shapes and conditionally executed statements. Such unrolling support enables
the generation of predicated code e.g. in the context of GPGPU computing.

- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.

- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.

- User-defined constraint sets for run-time checks We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constrainst actually hold. This feature is very useful to
verify at run-time various assumptions that have been taken program
optimization.

Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015

llvm-svn: 245157
2015-08-15 09:34:33 +00:00
Michael Kruse 1d3c9b54fb Remove leftover comment
The function to which this commit applies has been removed in a
previous commit.

llvm-svn: 244450
2015-08-10 15:07:16 +00:00
Michael Kruse fd613545cb [Polly] Remove dead code in IndependentBlocks
Summary: The splitExitBlock function is never called. Going to replace its functionality in successive patches that do not modify the IR.

Reviewers: grosser

Subscribers: pollydev

Projects: #polly

Differential Revision: http://reviews.llvm.org/D11865

llvm-svn: 244404
2015-08-08 20:31:20 +00:00
Tobias Grosser b241d928bd Rewrite getPrevectorMap using schedule trees operations
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
2015-07-28 18:03:36 +00:00
Tobias Grosser 2764794ba4 Simplify some isl expression we use
Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org>
llvm-svn: 243254
2015-07-26 19:22:35 +00:00
Tobias Grosser 3b10c94062 Prevectorize the schedule of the band (or the point loop in case of tiling)
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243214
2015-07-25 12:28:56 +00:00
Tobias Grosser 808cd69a92 Use schedule trees to represent execution order of statements
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.

This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.

For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238

llvm-svn: 242130
2015-07-14 09:33:13 +00:00
Tobias Grosser af4e809ca6 Remove code for scalar and PHI to array translation
This removes old code that has been disabled since several weeks and was hidden
behind the flags -disable-polly-intra-scop-scalar-to-array=false and
-polly-model-phi-nodes=false. Earlier, Polly used to translate scalars and
PHI nodes to single element arrays, as this avoided the need for their special
handling in Polly. With Johannes' patches adding native support for such scalar
references to Polly, this code is not needed any more. After this commit both
-polly-prepare and -polly-independent are now mostly no-ops. Only a couple of
simple transformations still remain, but they are scheduled for removal too.

Thanks again to Johannes Doerfert for his nice work in making all this code
obsolete.

llvm-svn: 240766
2015-06-26 07:31:18 +00:00
Michael Kruse c59f22c556 Update ISL to isl-0.15-3-g532568a
This version adds small integer optimization, but is not active by
default. It will be enabled in a later commit.
    
The schedule-fuse=min/max option has been replaced by the
serialize-sccs option. Adapting Polly was necessary, but retaining the
name polly-opt-fusion=min/max.

Differential Revision: http://reviews.llvm.org/D10505

Reviewers: grosser
llvm-svn: 240027
2015-06-18 16:45:40 +00:00
Tobias Grosser 97d8745087 Dump YAML schedule tree as properly indented tree in DEBUG output
llvm-svn: 238645
2015-05-30 06:46:59 +00:00
Tobias Grosser 3e77d14563 Add indvar pass to canonicalization sequence
Running indvar before Polly is useful as this eliminates zexts as they commonly
appear when a 32 bit induction variable (type int) was used on a 64 bit system.
These zexts confuse our delinearization and prevent for example the successful
delinearization of the nussinov kernel in polybench-c-4.1.

This fixes http://llvm.org/PR23426

Suggested-by: Xing Su <xsu.llvm@outlook.com>
llvm-svn: 238643
2015-05-30 06:16:41 +00:00
Tobias Grosser b2f399264d Update isl to 93b8e43d
This update brings mostly interface cleanups, but also fixes two bugs in
imath (a memory leak, some undefined behavior).

llvm-svn: 238422
2015-05-28 13:32:11 +00:00
Tobias Grosser 7c3bad52dd Use value semantics for list of ScopStmt(s) instead of std::owningptr
David Blaike suggested this as an alternative to the use of owningptr(s) for our
memory management, as value semantics allow to avoid the additional interface
complexity caused by owningptr while still providing similar memory consistency
guarantees. We could also have used a std::vector, but the use of std::vector
would yield possibly changing pointers which currently causes problems as for
example the memory accesses carry pointers to their parent statements. Such
pointers should not change.

Reviewer: jblaikie, jdoerfert

Differential Revision: http://reviews.llvm.org/D10041

llvm-svn: 238290
2015-05-27 05:16:57 +00:00
Tobias Grosser 679dfafd33 Use unique_ptr to clarify ownership of ScopStmt
llvm-svn: 238090
2015-05-23 05:14:09 +00:00
Tobias Grosser ac60f4594f Enable scalar and PHI code generation for Polly
The feature itself has been committed by Johannes in r238070. As this is the
way forward, we now enable it to ensure we get test coverage.

Thank you Johannes for this nice work!

llvm-svn: 238088
2015-05-23 03:34:41 +00:00
Tobias Grosser 1b6ea573f2 Replace low-level constraint building with higher level functions
Instead of explicitly building constraints and adding them to our maps we
now use functions like map_order_le to add the relevant information to the
maps.

llvm-svn: 237934
2015-05-21 19:02:44 +00:00
Tobias Grosser cd524dc51d Add explicit #includes for used isl features
llvm-svn: 236931
2015-05-09 09:36:38 +00:00
Tobias Grosser ba0d09227c Sort include directives
Upcoming revisions of isl require us to include header files explicitly, which
have previously been already transitively included. Before we add them, we sort
the existing includes.

Thanks to Chandler for sort_includes.py. A simple, but very convenient script.

llvm-svn: 236930
2015-05-09 09:13:42 +00:00
Tobias Grosser 5483931117 Rename 'scattering' to 'schedule'
In Polly we used both the term 'scattering' and the term 'schedule' to describe
the execution order of a statement without actually distinguishing between them.
We now uniformly use the term 'schedule' for the execution order.  This
corresponds to the terminology of isl.

History: CLooG introduced the term scattering as the generated code can be used
as a sequential execution order (schedule) or as a parallel dimension
enumerating different threads of execution (placement). In Polly and/or isl the
term placement was never used, but we uniformly refer to an execution order as a
schedule and only later introduce parallelism. When doing so we do not talk
about about specific placement dimensions.

llvm-svn: 235380
2015-04-21 11:37:25 +00:00
Tobias Grosser 02cf69a6ed Make -polly-no-tiling work again
llvm-svn: 234125
2015-04-05 21:52:21 +00:00
Tobias Grosser 4f6bceface Do not scale tile loops
We now generate tile loops as:

 for (int c1 = 0; c1 <= 47; c1 += 1)
   for (int c2 = 0; c2 <= 47; c2 += 1)
     for (int c3 = 0; c3 <= 31; c3 += 1)
       for (int c4 = 0; c4 <= 31; c4 += 4)
         #pragma simd
         for (int c5 = c4; c5 <= c4 + 3; c5 += 1)
           Stmt_for_body3(32 * c1 + c3, 32 * c2 + c5);

instead of

 for (int c1 = 0; c1 <= 1535; c1 += 32)
   for (int c2 = 0; c2 <= 1535; c2 += 32)
     for (int c3 = 0; c3 <= 31; c3 += 1)
       for (int c4 = 0; c4 <= 31; c4 += 4)
         #pragma simd
         for (int c5 = c4; c5 <= c4 + 3; c5 += 1)
           Stmt_for_body3(c1 + c3, c2 + c5);

Run-time performance-wise this makes little difference, but this gives a large
reduction in compile time (10-30% on 17 LNT benchmarks). Apparently the isl
AST generator is not yet very efficient in generating the latter.

llvm-svn: 233675
2015-03-31 07:52:36 +00:00
Tobias Grosser 378e003748 Drop libpluto support
We do not have buildbots or anything that tests this functionality, hence it
most likely bitrots. People interested to use this functionality can always
recover it from svn history.

llvm-svn: 233570
2015-03-30 17:54:01 +00:00