Commit Graph

178 Commits

Author SHA1 Message Date
Eugene Zelenko 9248fde53a [Polly] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 311704
2017-08-24 21:22:41 +00:00
Michael Kruse 06ed529205 Add more statistics.
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
  for scalar false dependencies
- Number of parallel loops

These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.

Differential Revision: https://reviews.llvm.org/D37049

llvm-svn: 311553
2017-08-23 13:50:30 +00:00
Roman Gareev 0956a606ff Disable the Loop Vectorizer in case of GEMM
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36928

llvm-svn: 311473
2017-08-22 17:38:46 +00:00
Michael Kruse d091bf8d8e [MatMul] Make MatMul detection independent of internal isl representations.
The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36460

llvm-svn: 311302
2017-08-20 21:31:11 +00:00
Roman Gareev dbde718676 Do not use isl_set_project_out to get all loop prefixes
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36278

llvm-svn: 310374
2017-08-08 16:15:33 +00:00
Tobias Grosser 327e9ecb0d [ScheduleOptimizer] Make matmul pattern detection work with delicm output
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.

llvm-svn: 310338
2017-08-08 06:15:15 +00:00
Tobias Grosser 61bd3a4840 [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC]
llvm-svn: 310231
2017-08-06 21:42:38 +00:00
Tobias Grosser 31df6f31c0 [ScopInfo] Move Scop::getDomains to isl++ [NFC]
llvm-svn: 310230
2017-08-06 21:42:25 +00:00
Tobias Grosser 85048eff1a [ScopInfo] Move ScopStmt::ScopStmt to isl++ [NFC]
llvm-svn: 310210
2017-08-06 17:24:59 +00:00
Tobias Grosser dcf8d696ff Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++
llvm-svn: 310209
2017-08-06 16:39:52 +00:00
Tobias Grosser feae3dfe9f [unittests] Add unittest for getPartialTilePrefixes
In https://reviews.llvm.org/D36278 it was pointed out that the behavior of
getPartialTilePrefixes is not very well understood. To allow for a better
understanding, we first provide some basic unittests.

llvm-svn: 310175
2017-08-05 09:38:09 +00:00
Tobias Grosser 7b45af13ce Move setNewAccessRelation to isl++
llvm-svn: 309871
2017-08-02 19:27:25 +00:00
Roman Gareev 2e580538be [ScheduleOptimizer] Translate to C++ bindings
Translate the ScheduleOptimizer to use the new isl C++ bindings.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D35845

llvm-svn: 309119
2017-07-26 14:59:15 +00:00
Tobias Grosser d7065e5df5 Move MemoryAccess::isStride* to isl++
llvm-svn: 308927
2017-07-24 20:50:22 +00:00
Tobias Grosser 1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Tobias Grosser 77eef90f50 Move ScopArrayInfo to isl++
This moves the full ScopArrayInfo class to isl++

llvm-svn: 308801
2017-07-21 23:07:56 +00:00
Michael Kruse cd4c977b8b [ScopInfo] Print instructions in dump().
Print a statement's instruction on dump() regardless of
-polly-print-instructions. dump() is supposed to be used in the debugger
only and never in regression tests. While debugging, get all the
information we have and we are not bound to break anything. For non-dump
purposes of print, forward the setting of -polly-print-instructions as
parameters.

Some calls to print() had to be changed because the
PollyPrintInstructions setting is only available in ScopInfo.cpp.
In ScheduleOptimizer.cpp, dump() was used in regression tests.
That's not what dump() is for.

The print parameter "PrintInstructions" will also be useful for an
explicit print SCoP pass in a future patch.

llvm-svn: 308746
2017-07-21 15:35:53 +00:00
Roman Gareev 750374181b Make the pattern matching work with modified memory accesses
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D33138

llvm-svn: 308494
2017-07-19 16:59:06 +00:00
Singapuram Sanjay Srivallabh 02ca346e48 Introduce a hybrid target to generate code for either the GPU or CPU
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: kbarton, nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34054

llvm-svn: 306863
2017-06-30 19:42:21 +00:00
Tobias Grosser dcd94e3e93 [ScheduleOptimizer] Fix minor typo [NFC]
llvm-svn: 305709
2017-06-19 16:55:48 +00:00
Tobias Grosser 2fb3ed200a [ScheduleOptimizer] Move isolateFullPartialTiles and isolateAndUnrollMatMulInnerLoops to C++
llvm-svn: 305676
2017-06-19 10:40:12 +00:00
Michael Kruse a6d48f59a1 Fix a lot of typos. NFC.
llvm-svn: 304974
2017-06-08 12:06:15 +00:00
Tobias Grosser 7205f93a98 [ScheduleOptimizer] Move schedule construction to isl C++ [NFC]
llvm-svn: 303508
2017-05-21 16:21:33 +00:00
Tobias Grosser f3adab4c20 [Polly] Canonicalize arrays according to base-ptr equivalence class
Summary:
    In case two arrays share base pointers in the same invariant load equivalence
    class, we canonicalize all memory accesses to the first of these arrays
    (according to their order in the equivalence class).

    This enables us to optimize kernels such as boost::ublas by ensuring that
    different references to the C array are interpreted as accesses to the same
    array. Before this change the runtime alias check for ublas would fail, as it
    would assume models of the C array with differing (but identically valued) base
    pointers would reference distinct regions of memory whereas the referenced
    memory regions were indeed identical.

    As part of this change we remove most of the MemoryAccess::get*BaseAddr
    interface. We removed already all references to get*BaseAddr in previous
    commits to ensure that no code relies on matching base pointers between
    memory accesses and scop arrays -- except for three remaining uses where we
    need the original base pointer. We document for these situations that
    MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
    to the base pointer of the scop array referenced by this memory access.

Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert

Reviewed By: Meinersbur

Subscribers: etherzhhb

Tags: #polly

Differential Revision: https://reviews.llvm.org/D28518

llvm-svn: 302636
2017-05-10 10:59:58 +00:00
Roman Gareev 9d4d91ca6a [FIX] Fix ScheduleTreeOptimizer::optimizeMatMulPattern
Use new values of the dimensions during their permutation.

llvm-svn: 299663
2017-04-06 17:25:08 +00:00
Roman Gareev e0d466342b Restore the initial ordering of dimensions before applying the pattern matching
Dimensions of band nodes can be implicitly permuted by the algorithm applied
during the schedule generation.

For example, in case of the following matrix-matrix multiplication,

for (i = 0; i < 1024; i++)
  for (k = 0; k < 1024; k++)
    for (j = 0; j < 1024; j++)
      C[i][j] += A[i][k] * B[k][j];

it can produce the following schedule tree

domain: "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and
                                        0 <= i2 <= 1023 }"
child:
  schedule: "[{ Stmt_for_body6[i0, i1, i2] -> [(i0)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i1)] },
              { Stmt_for_body6[i0, i1, i2] -> [(i2)] }]"
  permutable: 1
  coincident: [ 1, 1, 0 ]

The current implementation of the pattern matching optimizations relies on the
initial ordering of dimensions. Otherwise, it can produce the miscompilation
(e.g., [1]).

This patch helps to restore the initial ordering of dimensions by recreating
the band node when the corresponding conditions are satisfied.

Refs.:

[1] - https://bugs.llvm.org/show_bug.cgi?id=32500

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D31741

llvm-svn: 299662
2017-04-06 17:09:54 +00:00
Siddharth Bhat 5eeb1dd42e [Polly] [ScheduleOptimizer] Prevent incorrect tile size computation
Because Polly exposes parameters that directly influence tile size
calculations, one can setup situations like divide-by-zero.

Check against a possible divide-by-zero in getMacroKernelParams
and return early.

Also assert at the end of getMacroKernelParams that the block sizes
computed for matrices are positive (>= 1).

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31708

llvm-svn: 299633
2017-04-06 08:20:22 +00:00
Siddharth Bhat bcbfdade41 [Polly] [DependenceInfo] change WAR, WAW generation to correct semantics
= Change of WAR, WAW generation: =

- `buildFlow(Sink, MustSource, MaySource, Sink)` treates any flow of the form
    `sink <- may source <- must source` as a *may* dependence.

- we used to call:
```lang=cpp, name=old-flow-call.cpp
Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
WAW = isl_union_flow_get_must_dependence(Flow);
WAR = isl_union_flow_get_may_dependence(Flow);
```

- This caused some WAW dependences to be treated as WAR dependences.
- Incorrect semantics.

- Now, we call WAR and WAW correctly.

== Correct WAW: ==
```lang=cpp, name=new-waw-call.cpp
   Flow = buildFlow(Write, MustWrite, MayWrite, Schedule);
   WAW = isl_union_flow_get_may_dependence(Flow);
   isl_union_flow_free(Flow);
```

== Correct WAR: ==
```lang=cpp, name=new-war-call.cpp
    Flow = buildFlow(Write, Read, MustaWrite, Schedule);
    WAR = isl_union_flow_get_must_dependence(Flow);
    isl_union_flow_free(Flow);
```

- We want the "shortest" WAR possible (exact dependences).
- We mark all the *must-writes* as may-source, reads as must-souce.
- Then, we ask for *must* dependence.
- This removes all the reads that flow through a *must-write*
  before reaching a sink.
- Note that we only block ealier writes with *must-writes*. This is
  intuitively correct, as we do not want may-writes to block
  must-writes.
- Leaves us with direct (R -> W).

- This affects reduction generation since RED is built using WAW and WAR.

= New StrictWAW for Reductions: =

- We used to call:
```lang=cpp,name=old-waw-war-call.cpp
      Flow = buildFlow(MustWrite, MustWrite, Read, Schedule);
      WAW = isl_union_flow_get_must_dependence(Flow);
      WAR = isl_union_flow_get_may_dependence(Flow);
```

- This *is* the right model of WAW we need for reductions, just not in general.
- Reductions need to track only *strict* WAW, without any interfering reductions.

= Explanation: Why the new WAR dependences in tests are correct: =

- We no longer set WAR = WAR - WAW
- Hence, we will have WAR dependences that were originally removed.
- These may look incorrect, but in fact make sense.

== Code: ==
```lang=llvm, name=new-war-dependence.ll
  ;    void manyreductions(long *A) {
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S0:          *A += 42;
  ;
  ;      for (long i = 0; i < 1024; i++)
  ;        for (long j = 0; j < 1024; j++)
  ; S1:          *A += 42;
  ;
```
=== WAR dependence: ===
  {  S0[1023, 1023] -> S1[0, 0] }

- Between `S0[1023, 1023]` and `S1[0, 0]`, we will have the dependences:

```lang=cpp, name=dependence-incorrect, counterexample
        S0[1023, 1023]:
    *-- tmp = *A (load0)--*
WAR 2   add = tmp + 42    |
    *-> *A = add (store0) |
                         WAR 1
        S1[0, 0]:         |
        tmp = *A (load1)  |
        add = tmp + 42    |
        A = add (store1)<-*
```

- One may assume that WAR2 *hides* WAR1 (since store0 happens before
  store1). However, within a statement, Polly has no idea about the
  ordering of loads and stores.

- Hence, according to Polly, the code may have looked like this:
```lang=cpp, name=dependence-correct
    S0[1023, 1023]:
    A = add (store0)
    tmp = A (load0) ---*
    add = A + 42       |
                     WAR 1
    S1[0, 0]:          |
    tmp = A (load1)    |
    add = A + 42       |
    A = add (store1) <-*
```

- So, Polly  generates (correct) WAR dependences. It does not make sense
  to remove these dependences, since they are correct with respect to
  Polly's model.

    Reviewers: grosser, Meinersbur

    tags: #polly

    Differential revision: https://reviews.llvm.org/D31386

llvm-svn: 299429
2017-04-04 13:08:23 +00:00
Roman Gareev cdfb57dc46 Introduce another level of metadata to distinguish non-aliasing accesses
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30606

llvm-svn: 298510
2017-03-22 14:25:24 +00:00
Siddharth Bhat 3e4a7d38ab [ScheduleOptimiser] fix typos in top comment [NFC]
coice -> choice
Transations -> Transactions

llvm-svn: 298095
2017-03-17 14:52:19 +00:00
Tobias Grosser c9d4cb2f42 [ScheduleOptimizer] Allow tiling after fusion
In ScheduleOptimizer::isTileableBand(), allow the case in which
the band node's child is an isl_schedule_sequence_node and its
grandchildren isl_schedule_leaf_nodes. This case can arise when
two or more statements are fused by the isl scheduler.

The tile_after_fusion.ll test has two statements in separate
loop nests and checks whether they are tiled after being fused
when polly-opt-fusion equals "max".

Reviewers: grosser

Subscribers: gareevroman, pollydev

Tags: #polly

Contributed-by: Theodoros Theodoridis <theodort@student.ethz.ch>

Differential Revision: https://reviews.llvm.org/D30815

llvm-svn: 297587
2017-03-12 19:02:31 +00:00
Roman Gareev 96e1119a96 Make optimizations based on pattern matching be enabled by default
Currently, pattern based optimizations of Polly can identify matrix
multiplication and optimize it according to BLIS matmul optimization pattern
(see ScheduleTreeOptimizer for details). This patch makes optimizations
based on pattern matching be enabled by default.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30293

llvm-svn: 295958
2017-02-23 11:44:12 +00:00
Roman Gareev 4eb07e481e [FIX] Fix the typo in ScheduleOptimizer.cpp.
llvm-svn: 295292
2017-02-16 07:04:41 +00:00
Roman Gareev b196055c0c Check reduction dependencies in case of the matrix multiplication optimization
To determine parameters of the matrix multiplication, we check RAW dependencies
that can be expressed using only reduction dependencies. Consequently, we
should check the reduction dependencies, if this is the case.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Sven Verdoolaege <skimo-polly@kotnet.org>
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29814

llvm-svn: 294836
2017-02-11 09:59:09 +00:00
Roman Gareev de69293b01 [FIX] Fix the potential issue of containsOnlyMatMulDep.
llvm-svn: 294835
2017-02-11 09:48:09 +00:00
Roman Gareev 5ef7e210c0 [NFC] Fix the style issue of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294834
2017-02-11 08:43:41 +00:00
Roman Gareev afcf026d81 [NFC] Fix style issues of lib/Transform/ScheduleOptimizer.cpp.
llvm-svn: 294831
2017-02-11 07:14:37 +00:00
Roman Gareev 3d4eae31ea Use the size of the widest type of the matrix multiplication operands
The size of the operands type is the one of the parameters required
to determine the BLIS micro-kernel. We get the size of the widest type
of the matrix multiplication operands in case there are several
different types.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29269

llvm-svn: 294828
2017-02-11 07:00:05 +00:00
Roman Gareev 9989088ee9 Isolate a set of partial tile prefixes in case of the matrix multiplication
optimization

Isolate a set of partial tile prefixes to allow hoisting and sinking out of
the unrolled innermost loops produced by the optimization of the matrix
multiplication.

In case it cannot be proved that the number of loop iterations can be evenly
divided by tile sizes and we tile and unroll the point loop, the isl generates
conditional expressions. Subsequently, the conditional expressions can prevent
stores and loads of the unrolled loops from being sunk and hoisted.

The patch isolates a set of partial tile prefixes, which have exactly Mr x Nr
iterations of the two innermost loops, the result of the loop tiling performed
by the matrix multiplication optimization, where Mr and Mr are parameters of
the micro-kernel. This helps to get rid of the conditional expressions of
the unrolled innermost loops. Probably this approach can be replaced with
padding in future.

In case of, for example, the gemm from Polybench/C 3.2 and parametric loop
bounds, it helps to increase the performance from 7.98 GFlops (27.71% of
theoretical peak) to 21.47 GFlops (74.57% of theoretical peak). Hence, we
get the same performance as in case of scalar loops bounds.

It also cause compile time regression. The compile-time is increased from
0.795 seconds to 0.837 seconds in case of scalar loops bounds and from 1.222
seconds to 1.490 seconds in case of parametric loops bounds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D29244

llvm-svn: 294564
2017-02-09 07:10:01 +00:00
Roman Gareev 772498dc68 [NFC] Make ScheduleTreeOptimizer::optimizeBand return a schedule node optimized
with optimizeMatMulPattern

This patch makes ScheduleTreeOptimizer::optimizeBand return a schedule node
optimized with optimizeMatMulPattern. Otherwise, it could not use the isolate
option, because standardBandOpts could try to tile a band node with anchored
subtree and get the error, since the use of the isolate option causes any tree
containing the node to be considered anchored. Furthermore, it is not intended
to apply standard optimizations, when the matrix multiplication has been
detected.

llvm-svn: 294444
2017-02-08 13:29:06 +00:00
Roman Gareev 98075fe181 A new algorithm for identification of a SCoP statement that implement a matrix
multiplication

The current identification of a SCoP statement that implement a matrix
multiplication does not help to identify different permutations of loops that
contain it and check for dependencies, which can prevent it from being
optimized. It also requires external determination of the operands of
the matrix multiplication. This patch contains the implementation of a new
algorithm that helps to avoid these issues. It also modifies the test cases
that generate matrix multiplications with linearized accesses, because
the new algorithm does not support them.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28357

llvm-svn: 293890
2017-02-02 14:23:14 +00:00
Tobias Grosser ff40087a6a Update to recent formatting changes
llvm-svn: 293756
2017-02-01 10:12:09 +00:00
Roman Gareev 7758a2af53 Update the documentation on how the packing transformation is implemented
Add a simple example to update the documentation on how the packing
transformation is implemented.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D28021

llvm-svn: 293429
2017-01-29 10:37:50 +00:00
Tobias Grosser 21a059af09 Adjust formatting to commit r292110 [NFC]
llvm-svn: 292123
2017-01-16 14:08:10 +00:00
Tobias Grosser 67e94fb435 ScheduleOptimizer: Allow to set register width in command line
We use this option to set a fixed register width in our test cases to make
sure the results are identical accross platforms.

llvm-svn: 292002
2017-01-14 07:14:54 +00:00
Roman Gareev 1c2927b209 Specify the default values of the cache parameters
If the parameters of the target cache (i.e., cache level sizes, cache level
associativities) are not specified or have wrong values, we use ones for
parameters of the macro-kernel and do not perform data-layout optimizations of
the matrix multiplication. In this patch we specify the default values of the
cache parameters to be able to apply the pattern matching optimizations even in
this case. Since there is no typical values of this parameters, we use the
parameters of Intel Core i7-3820 SandyBridge that also help to attain the
high-performance on IBM POWER System S822 and IBM Power 730 Express server.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28090

llvm-svn: 290518
2016-12-25 16:32:28 +00:00
Tobias Grosser 0791d5f5aa ScheduleOptimizer: Fix spelling of option '-polly-target-throughput-vector-fma'
througput -> throughput

llvm-svn: 290418
2016-12-23 07:33:39 +00:00
Roman Gareev be5299af0b Change the determination of parameters of macro-kernel
Typically processor architectures do not include an L3 cache, which means that
Nc, the parameter of the micro-kernel, is, for all practical purposes,
redundant ([1]). However, its small values can cause the redundant packing of
the same elements of the matrix A, the first operand of the matrix
multiplication. At the same time, big values of the parameter Nc can cause
segmentation faults in case the available stack is exceeded.

This patch adds an option to specify the parameter Nc as a multiple of
the parameter of the micro-kernel Nr.

In case of Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=8

it helps to improve the performance from 11.303 GFlops/sec (39,247% of
theoretical peak) to 17.896 GFlops/sec (62,14% of theoretical peak).

Refs.:

[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D28019

llvm-svn: 290256
2016-12-21 12:51:12 +00:00
Roman Gareev 92c446016a [Polly] Use three-dimensional arrays to store packed operands of the matrix
multiplication

Previously we had two-dimensional accesses to store packed operands of
the matrix multiplication for the sake of simplicity of the packed arrays.
However, addition of the third dimension helps to simplify the corresponding
memory access, reduce the execution time of isl operations applied to it, and
consequently reduce the compile-time of Polly. For example, in case of
Intel Core i7-3820 SandyBridge and the following options,

clang -O3 gemm.c -I utilities/ utilities/polybench.c -DPOLYBENCH_TIME
-march=native -mllvm -polly -mllvm -polly-pattern-matching-based-opts=true
-DPOLYBENCH_USE_SCALAR_LB -mllvm -polly-target-cache-level-associativity=8,8
-mllvm -polly-target-cache-level-sizes=32768,262144 -mllvm
-polly-target-latency-vector-fma=7

it helps to reduce the compile-time from about 361.456 seconds to about 0.816
seconds.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>,
             Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D27878

llvm-svn: 290251
2016-12-21 11:18:42 +00:00
Roman Gareev 2606c48a1d Restrict ranges of extension maps
To prevent copy statements from accessing arrays out of bounds, ranges of their
extension maps are restricted, according to the constraints of domains.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D25655

llvm-svn: 289815
2016-12-15 12:35:59 +00:00
Roman Gareev 15db81ef71 [NFC] Fix typos in getMacroKernelParams.
llvm-svn: 289808
2016-12-15 12:00:57 +00:00
Roman Gareev 8babe1a216 The order of the loops defines the data reused in the BLIS implementation of
gemm ([1]). In particular, elements of the matrix B, the second operand of
matrix multiplication, are reused between iterations of the innermost loop.
To keep the reused data in cache, only elements of matrix A, the first operand
of matrix multiplication, should be evicted during an iteration of the
innermost loop. To provide such a cache replacement policy, elements of the
matrix A can, in particular, be loaded first and, consequently, be
least-recently-used.

In our case matrices are stored in row-major order instead of column-major
order used in the BLIS implementation ([1]). One of the ways to address it is
to accordingly change the order of the loops of the loop nest. However, it
makes elements of the matrix A to be reused in the innermost loop and,
consequently, requires to load elements of the matrix B first. Since the LLVM
vectorizer always generates loads from the matrix A before loads from the
matrix B and we can not provide it. Consequently, we only change the BLIS micro
kernel and the computation of its parameters instead. In particular, reused
elements of the matrix B are successively multiplied by specific elements of
the matrix A .

Refs.:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D25653

llvm-svn: 289806
2016-12-15 11:47:38 +00:00
Michael Kruse 79c0173f53 [ScheduleOptimizer] Fix memory leak. NFC.
llvm-svn: 289434
2016-12-12 14:51:06 +00:00
Roman Gareev b3224adfb6 Perform copying to created arrays according to the packing transformation
This is the fourth patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform copying to created
arrays, which is the last step to implement the packing transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23260

llvm-svn: 281441
2016-09-14 06:26:09 +00:00
Roman Gareev f5aff70405 Store the size of the outermost dimension in case of newly created arrays that require memory allocation.
We do not need the size of the outermost dimension in most cases, but if we
allocate memory for newly created arrays, that size is needed.

Reviewed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D23991

llvm-svn: 281234
2016-09-12 17:08:31 +00:00
Tobias Grosser c80d6979bd Drop '@brief' from doxygen comments
LLVM's coding guideline suggests to not use @brief for one-sentence doxygen
comments to improve readability. Switch this once and for all to ensure people
do not copy @brief comments from other parts of Polly, when writing new code.

llvm-svn: 280468
2016-09-02 06:33:33 +00:00
Roman Gareev 5f99f8656e Add a flag to dump SCoP optimized with the IslScheduleOptimizer pass
Dump polyhedral descriptions of Scops optimized with the isl scheduling
optimizer and the set of post-scheduling transformations applied
on the schedule tree to be able to check the work of the IslScheduleOptimizer
pass at the polyhedral level.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D23740

llvm-svn: 279395
2016-08-21 11:20:39 +00:00
Roman Gareev 1c892e91e3 Perform replacement of access relations and creation of new arrays according to the packing transformation
This is the third patch to apply the BLIS matmul optimization pattern on matmul
kernels (http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus two
packing routines. The macro-kernel is implemented in terms of two additional
loops around a micro-kernel. The micro-kernel is a loop around a rank-1
(i.e., outer product) update. In this change we perform replacement of
the access relations and create empty arrays, which are steps to implement
the packing transformation. In subsequent changes we will implement copying
to created arrays.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D22187

llvm-svn: 278666
2016-08-15 12:22:54 +00:00
Tobias Grosser 2219d15748 Fix a couple of spelling mistakes
llvm-svn: 277569
2016-08-03 05:28:09 +00:00
Roman Gareev 3a18a931a8 Apply all necessary tilings and interchangings to get a macro-kernel
This is the second patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel, plus
two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update. In this change
we create the BLIS macro-kernel by applying a combination of tiling
and interchanging. In subsequent changes we will implement the packing
transformation.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21491

llvm-svn: 276627
2016-07-25 09:42:53 +00:00
Roman Gareev 2cb4d133f5 [NFC] Refactor creation of the BLIS mirco-kernel and improve documentation
Reviewed-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 276616
2016-07-25 07:27:59 +00:00
Tobias Grosser 3898a0468c Propagate on-error status
This ensures that the error status set with -polly-on-isl-error-abort is
maintained even after running DependenceInfo and ScheduleOptimizer. Both
passes temporarily set the error status to CONTINUE as the dependence
analysis uses a compute-out and the scheduler may not be able to derive
a schedule. In both cases we want to not abort, but to handle the error
gracefully. Before this commit, we always set the error reporting to ABORT
after these passes. After this commit, we use the error reporting mode that was
active earlier.

This comes without a test case as this would require us to introduce (memory)
errors which would trigger the isl errors.

llvm-svn: 274272
2016-06-30 20:42:58 +00:00
Tobias Grosser af14993016 Simplify: get isl_ctx only once [NFC]
... instead of call S.getIslCtx() many times.

llvm-svn: 274271
2016-06-30 20:42:56 +00:00
Tobias Grosser 522478d2c0 clang-tidy: Add llvm namespace comments
llvm commonly adds a comment to the closing brace of a namespace to indicate
which namespace is closed. clang-tidy provides with llvm-namespace-comment
a handy tool to check for this habit. We use it to ensure we consitently use
namespace comments in Polly.

There are slightly different styles in how namespaces are closed in LLVM. As
there is no large difference between the different comment styles we go for the
style clang-tidy suggests by default.

To reproduce this fix run:

for i in `ls tools/polly/lib/*/*.cpp`; \
  clang-tidy -checks='-*,llvm-namespace-comment' -p build $i -fix \
  -header-filter=".*"; \
done

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273621
2016-06-23 22:17:27 +00:00
Tobias Grosser 8dd653d983 clang-tidy: apply modern-use-nullptr fixes
Instead of using 0 or NULL use the C++11 nullptr symbol when referencing null
pointers.

This cleanup was suggested by Eugene Zelenko <eugene.zelenko@gmail.com> in
http://reviews.llvm.org/D21488 and was split out to increase readability.

llvm-svn: 273435
2016-06-22 16:22:00 +00:00
Roman Gareev 397a34a08d [NFC] Use isl_schedule_node_band_n_member to get the number of dimensions of a band node.
llvm-svn: 273400
2016-06-22 12:11:30 +00:00
Roman Gareev 42402c9e89 Apply all necessary tilings and unrollings to get a micro-kernel
This is the first patch to apply the BLIS matmul optimization pattern
on matmul kernels
(http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf).
BLIS implements gemm as three nested loops around a macro-kernel,
plus two packing routines. The macro-kernel is implemented in terms
of two additional loops around a micro-kernel. The micro-kernel
is a loop around a rank-1 (i.e., outer product) update.
In this change we create the BLIS micro-kernel by applying
a combination of tiling and unrolling. In subsequent changes
we will add the extraction of the BLIS macro-kernel
and implement the packing transformation.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D21140

llvm-svn: 273397
2016-06-22 09:52:37 +00:00
Roman Gareev b17b9a8324 [NFC] Outline the application of register tiling.
llvm-svn: 272515
2016-06-12 17:20:05 +00:00
Roman Gareev 827264de98 [NFC] "#include <ciso646>" is unnecessary, because "and", "or" were replaced
by "&&", "||".

llvm-svn: 272168
2016-06-08 16:44:11 +00:00
Roman Gareev ba0fb97c0a [NFC] Check that a parameter of ScheduleTreeOptimizer::isMatrMultPattern contains a correct partial schedule
llvm-svn: 271780
2016-06-04 06:34:04 +00:00
Roman Gareev 4b8c7aeb62 [FIX] Fix potential issue related to subtraction from an unsigned 0 in circularShiftOutputDims
Reported-by: Mehdi Amini <mehdi.amini@apple.com>
Contributed-by: Michael Kruse <llvm@meinersbur.de>

Differential Revision: http://reviews.llvm.org/D20969

llvm-svn: 271705
2016-06-03 18:46:29 +00:00
Roman Gareev 76614d3ed9 [GSoC 2016] [Polly] [FIX] Determination of statements that contain matrix
multiplication

Fix small issues related to characters, operators  and descriptions of tests.

Differential Revision: http://reviews.llvm.org/D20806

llvm-svn: 271264
2016-05-31 11:22:21 +00:00
Johannes Doerfert 99191c78c2 Decouple SCoP building logic from pass
Created a new pass ScopInfoRegionPass. As name suggests, it is a
  region pass and it is there to preserve compatibility with our
  existing Polly passes.  ScopInfoRegionPass will return a SCoP object
  for a valid region while the creation of the SCoP stays in the
  ScopInfo class.

  Contributed-by: Utpal Bora <cs14mtech11017@iith.ac.in>
  Reviewed-by: Tobias Grosser <tobias@grosser.es>,
               Johannes Doerfert <doerfert@cs.uni-saarland.de>

Differential Revision: http://reviews.llvm.org/D20770

llvm-svn: 271259
2016-05-31 09:41:04 +00:00
Michael Kruse 7410a27820 MSVC compile fix: #include <ciso646>. NFC.
This header is required to make the ISO 646 alternative operator
spellings ("and", "or" instead of "&&", "||") work. Should these
operators be replaced by the standard ones as already suggested by
Johannes, also remove this #include again.

llvm-svn: 271206
2016-05-30 14:27:14 +00:00
Roman Gareev 9c3eb5960a Determination of statements that contain matrix multiplication
Add determination of statements that contain, in particular,
matrix multiplications and can be optimized with [1] to try to
get close-to-peak performance. It can be enabled
via polly-pm-based-opts, which is false by default.

Refs:
[1] - http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf

Contributed-by: Roman Gareev <gareevroman@gmail.com>
Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D20575

llvm-svn: 271128
2016-05-28 16:17:58 +00:00
Michael Kruse 315aa3278e [ScheduleOptimizer] Add -polly-opt-outer-coincidence option.
Add a command line switch to set the
isl_options_set_schedule_outer_coincidence option. ISL then tries to
build schedules where the outer member of a band satisfies the
coincidence constraints.

In practice this allows loop skewing for more parallelism in inner
loops.

llvm-svn: 268222
2016-05-02 11:35:27 +00:00
Hongbin Zheng 2a798853f8 Allow the client of DependenceInfo to obtain dependences at different granularities.
llvm-svn: 262591
2016-03-03 08:15:33 +00:00
Roman Gareev 11001e1534 Annotation of SIMD loops
Use 'mark' nodes annotate a SIMD loop during ScheduleTransformation and skip
parallelism checks.

The buildbot shows the following compile/execution time changes:

  Compile time:
    Improvements    Δ     Previous  Current  σ
    …/gesummv      -6.06% 0.2640    0.2480   0.0055
    …/gemver       -4.46% 0.4480    0.4280   0.0044
    …/covariance   -4.31% 0.8360    0.8000   0.0065
    …/adi          -3.23% 0.9920    0.9600   0.0065
    …/doitgen      -2.53% 0.9480    0.9240   0.0090
    …/3mm          -2.33% 1.0320    1.0080   0.0087

  Execution time:
    Regressions     Δ     Previous  Current  σ
    …/viterbi       1.70% 5.1840    5.2720   0.0074
    …/smallpt       1.06% 12.4920   12.6240  0.0040

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: http://reviews.llvm.org/D14491

llvm-svn: 261620
2016-02-23 09:00:13 +00:00
Tobias Grosser ca7f5bb767 Full/partial tile separation for vectorization
We isolate full tiles from partial tiles to be able to, for example, vectorize
loops with parametric lower and/or upper bounds.

If we use -polly-vectorizer=stripmine, we can see execution-time improvements:
correlation from 1m7361s to 0m5720s (-67.05 %), covariance from 1m5561s to
0m5680s (-63.50 %), ary3 from 2m3201s to 1m2361s (-46.72 %), CrystalMk from
8m5565s to 7m4285s (-13.18 %).

The current full/partial tile separation increases compile-time more than
necessary. As a result, we see in compile time regressions, for example, for 3mm
from 0m6320s to 0m9881s (56.34%). Some of this compile time increase is expected
as we generate more IR and consequently more time is spent in the LLVM backends.
However, a first investiagation has shown that a larger portion of compile time
is unnecessarily spent inside Polly's parallelism detection and could be
eliminated by propagating existing knowledge about vector loop parallelism.
Before enabling -polly-vectorizer=stripmine by default, it is necessary to
address this compile-time issue.

Contributed-by: Roman Gareev <gareevroman@gmail.com>

Reviewers: jdoerfert, grosser

Subscribers: grosser, #polly

Differential Revision: http://reviews.llvm.org/D13779

llvm-svn: 250809
2015-10-20 09:12:21 +00:00
Johannes Doerfert 45be64464b [NFC] Consistenly use commented and annotated ScopPass functions
The changes affect methods that are part of the Pass interface and
  include:
    - Comments that describe the methods purpose.
    - A consistent use of the keywords override and virtual.
  Additionally, the printScop method is now optional and removed from
  SCoP passes that do not implement it.

llvm-svn: 248685
2015-09-27 15:43:29 +00:00
Johannes Doerfert 0f37630849 [NFC] Use releaseMemory to release internal memory
llvm-svn: 248684
2015-09-27 15:42:28 +00:00
Tobias Grosser fa57e9b7e6 Make our data-locality schedule tree transforms externally accessible
Other passes which perform different optimizations might be interested in
also applying data-locality transformations as part of their overall
transformation.

llvm-svn: 245824
2015-08-24 06:01:47 +00:00
Tobias Grosser 1ac884d73a Use marker nodes to annotate the different levels of tiling
Currently, marker nodes are ignored during AST generation, but visible in the
-debug-only=polly-ast output.

llvm-svn: 245809
2015-08-23 09:11:00 +00:00
Tobias Grosser fc490a99f5 Do really not unroll the vector loop in combination with register tiling
The previous commit lacked a test case for register tiling + pre-vectorization
and we obviously got it immediately wrong.

llvm-svn: 245599
2015-08-20 19:08:16 +00:00
Tobias Grosser 42e2489553 Add experimental support for trivial register tiling
Register tiling in Polly is for now just an additional level of tiling which
is fully unrolled. It is disabled by default. To make this useful for more than
experiments, we still need a cost function as well as possibly further
optimizations that teach LLVM to actually put some of the values we got into
scalar registers.

llvm-svn: 245564
2015-08-20 13:45:05 +00:00
Tobias Grosser 0483271662 Add support for two-level tiling
By default we only use one level of tiling for loops, but in general tiling
for multiple levels is trivial for us. Hence, we add a set of options that
allow people to play with a second level of tiling. If this is profitable for
some cases we can work on heuristics that allow us to identify these cases
and use two-level tiling for them.

llvm-svn: 245563
2015-08-20 13:45:02 +00:00
Tobias Grosser 862b9b5239 Factor out check for tileable band node.
llvm-svn: 245559
2015-08-20 12:32:45 +00:00
Tobias Grosser 9bdea573bd Introduce tileBand function to simplify code
llvm-svn: 245558
2015-08-20 12:22:37 +00:00
Tobias Grosser d891b54132 Add some forgotten isl memory annotations
llvm-svn: 245557
2015-08-20 12:16:23 +00:00
Tobias Grosser 07c1c2fcc9 Make prevectorization width configurable
Polly uses 'prevectorization' to enable outer loop vectorization. When
vectorizing an outer loop, we strip-mine <number-of-prevec-dims> loop
iterations which are than interchanged to the innermost level such that LLVM's
inner loop vectorizer (or Polly's simple vectorizer) can easily vectorize this
loop. The number of loop iterations to strip-mine is now configurable with the
option -polly-prevect-width=<number-of-prevec-dims>.

This is mostly a debugging option. We should probably add a heuristic that
derives the number of prevectorization dimensions from the target data and
the data types used.

llvm-svn: 245424
2015-08-19 08:46:11 +00:00
Tobias Grosser 161c9081e5 Do not use negative option name
Instead of -polly-no-tiling, we use -polly-tiling=false to disable tiling.

llvm-svn: 245423
2015-08-19 08:22:06 +00:00
Tobias Grosser f10f4636ff Simplify tiling code a bit
We only need to allocate the tile size vector if we actually want to perform
a tiling.

llvm-svn: 245422
2015-08-19 08:03:37 +00:00
Tobias Grosser 234a48270e AST Generation Paper published in TOPLAS
The July issue of TOPLAS contains a 50 page discussion of the AST generation
techniques used in Polly. This discussion gives not only an in-depth
description of how we (re)generate an imperative AST from our polyhedral based
mathematical program description, but also gives interesting insights about:

- Schedule trees: A tree-based mathematical program description that enables us
to perform loop transformations on an abstract level, while issues like the
generation of the correct loop structure and loop bounds will be taken care of
by our AST generator.

- Polyhedral unrolling: We discuss techniques that allow the unrolling of
non-trivial loops in the context of parameteric loop bounds, complex tile
shapes and conditionally executed statements. Such unrolling support enables
the generation of predicated code e.g. in the context of GPGPU computing.

- Isolation for full/partial tile separation: We discuss native support for
handling full/partial tile separation and -- in general -- native support for
isolation of boundary cases to enable smooth code generation for core
computations.

- AST generation with modulo constraints: We discuss how modulo mappings are
lowered to efficient C/LLVM code.

- User-defined constraint sets for run-time checks We discuss how arbitrary
sets of constraints can be used to automatically create run-time checks that
ensure a set of constrainst actually hold. This feature is very useful to
verify at run-time various assumptions that have been taken program
optimization.

Polyhedral AST generation is more than scanning polyhedra
Tobias Grosser, Sven Verdoolaege, Albert Cohen
ACM Transations on Programming Languages and Systems (TOPLAS), 37(4), July 2015

llvm-svn: 245157
2015-08-15 09:34:33 +00:00
Tobias Grosser b241d928bd Rewrite getPrevectorMap using schedule trees operations
Schedule trees are a lot easier to work with, for both humans and machines. For
humans the more structured schedule representation is easier to reason about.
Together with the more abstract isl programming interface this can result in a
lot cleaner code (see this changeset). For machines, the structured schedule and
the fact that we now use explicit piecewise affine expressions instead of
integer maps makes it easier to generate code from this schedule tree. As a
result, we can already see a slight compile-time improvement -- for 3mm from
0m0.593s to 0m0.551s seconds (-7 %). More importantly, future optimizations such
as full-partial tile separation will most likely result in more streamlined code
to be generated.

Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243458
2015-07-28 18:03:36 +00:00
Tobias Grosser 2764794ba4 Simplify some isl expression we use
Suggested-by: Sven Verdoolaege <skimo-polly@kotnet.org>
llvm-svn: 243254
2015-07-26 19:22:35 +00:00
Tobias Grosser 3b10c94062 Prevectorize the schedule of the band (or the point loop in case of tiling)
Contributed-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 243214
2015-07-25 12:28:56 +00:00
Tobias Grosser 808cd69a92 Use schedule trees to represent execution order of statements
Instead of flat schedules, we now use so-called schedule trees to represent the
execution order of the statements in a SCoP. Schedule trees make it a lot easier
to analyze, understand and modify properties of a schedule, as specific nodes
in the tree can be choosen and possibly replaced.

This patch does not yet fully move our DependenceInfo pass to schedule trees,
as some additional performance analysis is needed here. (In general schedule
trees should be faster in compile-time, as the more structured representation
is generally easier to analyze and work with). We also can not yet perform the
reduction analysis on schedule trees.

For more information regarding schedule trees, please see Section 6 of
https://lirias.kuleuven.be/handle/123456789/497238

llvm-svn: 242130
2015-07-14 09:33:13 +00:00
Michael Kruse c59f22c556 Update ISL to isl-0.15-3-g532568a
This version adds small integer optimization, but is not active by
default. It will be enabled in a later commit.
    
The schedule-fuse=min/max option has been replaced by the
serialize-sccs option. Adapting Polly was necessary, but retaining the
name polly-opt-fusion=min/max.

Differential Revision: http://reviews.llvm.org/D10505

Reviewers: grosser
llvm-svn: 240027
2015-06-18 16:45:40 +00:00
Tobias Grosser 97d8745087 Dump YAML schedule tree as properly indented tree in DEBUG output
llvm-svn: 238645
2015-05-30 06:46:59 +00:00
Tobias Grosser b2f399264d Update isl to 93b8e43d
This update brings mostly interface cleanups, but also fixes two bugs in
imath (a memory leak, some undefined behavior).

llvm-svn: 238422
2015-05-28 13:32:11 +00:00