Commit Graph

1301 Commits

Author SHA1 Message Date
Tobias Grosser b09bd74da8 [GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.

llvm-svn: 311324
2017-08-21 09:52:08 +00:00
Tobias Grosser 5170b6627a [GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO

llvm-svn: 311322
2017-08-21 09:00:31 +00:00
Michael Kruse d091bf8d8e [MatMul] Make MatMul detection independent of internal isl representations.
The pattern recognition for MatMul is restrictive.

The number of "disjuncts" in the isl_map containing constraint
information was previously required to be 1
(as per isl_*_coalesce - which should ideally produce a domain map with
a single disjunct, but does not under some circumstances).

This was changed and made more flexible.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D36460

llvm-svn: 311302
2017-08-20 21:31:11 +00:00
Tobias Grosser e32498c9c3 Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268
2017-08-19 23:49:26 +00:00
Tobias Grosser ecb94a0392 [GPGPU] Correctly initialize array order and fixed_element information
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259
2017-08-19 20:21:22 +00:00
Philipp Schaad 50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Tobias Grosser 43df2020e7 [GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239
2017-08-19 12:58:28 +00:00
Andreas Simbuerger 8d5b257d02 [Polly][Bug fix] Wrong dependences filtering during Fully Indexed expansion
Summary:
When trying to expand memory accesses, the current version of Polly uses statement Level dependences. The actual implementation is not working in case of multiple dependences per statement. For example in the following source code :
```
void mse(double A[Ni], double B[Nj], double C[Nj], double D[Nj]) {
  int i,j;
  for (j = 0; j < Ni; j++) {
    for (int i = 0; i<Nj; i++)
S:    B[i] = i;
    for (int i = 0; i<Nj; i++)
T:    D[i] = i;

U:  A[j] = B[j];
      C[j] = D[j];
  }
}
```
The statement U has two dependences with S and T. The current version of polly fails during expansion.

This patch aims to fix this bug. For that, we use Reference Level dependences to be able to filter dependences according to statement and memory ref. The principle of expansion remains the same as before.

We also noticed that we need to bail out if load come after store (at the same position) in same statement. So a check was added to isExpandable.

Contributed by: Nicholas Bonfante <nicolas.bonfante@insa-lyon.fr>

Reviewers: Meinersbur, simbuerg, bollu

Reviewed By: Meinersbur, simbuerg

Subscribers: pollydev, llvm-commits

Differential Revision: https://reviews.llvm.org/D36791

llvm-svn: 311165
2017-08-18 15:01:18 +00:00
Tobias Grosser ec02acfb98 [GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161
2017-08-18 13:38:12 +00:00
Siddharth Bhat b46847c035 [ScopInliner] Add a simple Scop-based inliner to polly.
We add a ScopInliner pass which inlines functions based on a simple heuristic:
Let `g` call `f`.
If we can model all of `f` as a Scop, we inline `f` into `g`.

This requires `-polly-detect-full-function` to be enabled. So, the pass
asserts that `-polly-detect-full-function` is enabled.

Differential Revision: https://reviews.llvm.org/D36832

llvm-svn: 311126
2017-08-17 21:57:23 +00:00
Siddharth Bhat a2c4112791 [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s.
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:

```
call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```

- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.

Differential Revision: https://reviews.llvm.org/D36825

llvm-svn: 311121
2017-08-17 20:26:38 +00:00
Tobias Grosser abc5416be1 [GPGPU] Make test case independent of LLVM names
In release builds LLVM may not pass along LLVM names consistently. We make the
test cases independent of the LLVM-IR names to avoid spurious test case
failures.

llvm-svn: 311118
2017-08-17 20:09:02 +00:00
Siddharth Bhat 8a2c07f6d4 [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
- If we have global arrays, we would like to rewrite them to global
  pointers which are allocated using `cudaMallocManaged`.

- If we have allocas in a function, we would like to rewrite them to
  heap-allocations with `cudaMallocManaged` and `cudaFree`.

- With these rewrite mechanisms, we can offload _any_ function to the
  GPU with no code rewrite whatsover.

Differential Revision: https://reviews.llvm.org/D36516

llvm-svn: 311080
2017-08-17 11:22:52 +00:00
Tobias Grosser ed6a4acc7f Add rewrite by-reference parameter pass
Summary:
This pass detangles induction variables from functions, which take variables by
reference. Most fortran functions compiled with gfortran pass variables by
reference. Unfortunately a common pattern, printf calls of induction variables,
prevent in this situation the promotion of the induction variable to a register,
which again inhibits any kind of loop analysis. To work around this issue
we developed a specialized pass which introduces separate alloca slots for
known-read-only references, which indicate the mem2reg pass that the induction
variables can be promoted to registers and consquently enable SCEV to work.

We currently hardcode the information that a function
_gfortran_transfer_integer_write does not read its second parameter, as
dragonegg does not add the right annotations and we cannot change old dragonegg
releases. Hopefully flang will produce the right annotations.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: mgorny, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36800

llvm-svn: 311066
2017-08-17 05:25:08 +00:00
Tobias Grosser 5502eb0986 Add missing 'REQUIRES' line
llvm-svn: 311046
2017-08-16 22:02:03 +00:00
Tobias Grosser e2a45f32dc [GPGPU] Also record invariant loads as kernel subtree values
Before this change kernels that used invariant loads would have resulted in
invalid PTX code.

llvm-svn: 311042
2017-08-16 21:37:53 +00:00
Jakub Kuderski 8fb57125b0 [Polly] XFAIL ReportLoopHasNoExit tests after r310940
ReportLoopHasNoExit started failing after r310940 that added
infinite loops to postdominators. The change made regions not
contain infinite loops anymore.

This patch unbreaks the polly tree by XFAILING the
ReportLoopHasNoExit test. Full fix is under review in D36776.

llvm-svn: 310980
2017-08-16 00:18:39 +00:00
Philip Pfaffe c3bcdc2f1a [JSON] Make the failure to parse a jscop file a hard error
Summary:
Before, if we fail to parse a jscop file, this will be reported as an
error and importing is aborted. However, this isn't actually strong
enough, since although the import is aborted, the scop has already been
modified and is very likely broken. Instead, make this a hard failure
and throw an LLVM error. This new behaviour requires small changes to
the tests for the legacy pass, namely using `not` to verify the error.
Further, fixed the jscop file for the
base_pointer_load_is_inst_inside_invariant_1 testcase.

Reviewed By: Meinersbur

Split out of D36578.

llvm-svn: 310599
2017-08-10 14:53:25 +00:00
Philip Pfaffe e18f3f6708 Fix 310555: Require pollyacc instead of asserts
llvm-svn: 310595
2017-08-10 14:21:04 +00:00
Philip Pfaffe 0360e5a3c2 Fix r310304: Fix the lit testcases.
In opt, Polly passes are only available after -load.

llvm-svn: 310581
2017-08-10 10:54:26 +00:00
Tobias Grosser 4db39c4829 Add missing 'REQUIRES' line
llvm-svn: 310555
2017-08-10 08:11:47 +00:00
Tobias Grosser cff9696e11 [GPGPU] Make the ast_build available to block generator
This is necessary for partial writes (as used by delicm) to work.

llvm-svn: 310553
2017-08-10 08:00:56 +00:00
Siddharth Bhat 9298ff2dee [ManagedMemoryRewrite] [Polly] Erase original malloc and free. [NFC]
We do not need to keep `malloc` and `free` around since they are
replaced by `polly_{malloc,free}Managed.`

llvm-svn: 310504
2017-08-09 18:19:46 +00:00
Siddharth Bhat 5a1f872623 [ManagedMemoryRewrite] Remove test case that was submitted by mistake. [NFC]
llvm-svn: 310473
2017-08-09 13:34:54 +00:00
Siddharth Bhat c4a4af47f3 [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.

Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.

A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.

Differential Revision: https://reviews.llvm.org/D36513

llvm-svn: 310471
2017-08-09 12:59:23 +00:00
Michael Kruse 40d083956c [CodeGen] Use isLatestArrayKind().
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.

This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.

llvm-svn: 310466
2017-08-09 12:27:51 +00:00
Siddharth Bhat 34eeabbca3 [PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448
2017-08-09 08:29:16 +00:00
Michael Kruse 235726ee4b [test] Add descriptions and pseudocode to tests. NFC.
llvm-svn: 310385
2017-08-08 17:26:19 +00:00
Roman Gareev 1563f039f5 Use SCEV information for the second level aliasing
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.

In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D35761

llvm-svn: 310380
2017-08-08 16:50:28 +00:00
Roman Gareev dbde718676 Do not use isl_set_project_out to get all loop prefixes
Currently, only convex isolation sets can be efficiently processed by isl.
Consequently, as a temporary solution, we use a different algorithm for partial
tile isolation that helps to build convex isolation sets in some cases.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36278

llvm-svn: 310374
2017-08-08 16:15:33 +00:00
Siddharth Bhat 9aca1cb519 [NFC] [PPCGCodeGen] Add missing REQUIRES: pollyacc line.
llvm-svn: 310354
2017-08-08 12:26:37 +00:00
Siddharth Bhat 71dfb3eb07 [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350
2017-08-08 12:00:59 +00:00
Michael Kruse 27c010a22e [DeLICM] Properly handle PHI writes becoming empty partial writes.
It is possible that partial writes are empty (write is never executed).
In this case, when in PHINode's incoming edge is never taken such that
the incoming write becomes an empty partial write, if enabled. The
issue is that when converting the union_map to an map, it's space
cannot be derived from the union_map itself. Rather, we need to
determine its space independently.

This fixes test-suite's MultiSource/Benchmarks/ASC_Sequoia/CrystalMk.

llvm-svn: 310348
2017-08-08 11:27:12 +00:00
Tobias Grosser 327e9ecb0d [ScheduleOptimizer] Make matmul pattern detection work with delicm output
In certain cases delicm might decide to not leave the original array write in
the loop body, but to remove it and instead leave a transformed phi node as
write access. This commit teached the matmul pattern detection to order the
memory accesses according to when the access actually happens and use this
information to detect the new pattern. This makes pattern based matmul
optimization work for 2mm and 3mm in polybench 4 after
polly-position=before-vectorizer has been enabled.

llvm-svn: 310338
2017-08-08 06:15:15 +00:00
Tobias Grosser 736c44c848 [test] Add some missing options that become necessary after the recent default changes
llvm-svn: 310315
2017-08-07 22:10:23 +00:00
Tobias Grosser a98081c9f5 [test] Add one more test case for the previous commit
llvm-svn: 310312
2017-08-07 22:02:06 +00:00
Tobias Grosser 2ef378120d [ZoneAlgo] Allow two writes that write identical values into same array slot
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.

llvm-svn: 310311
2017-08-07 22:01:29 +00:00
Andreas Simbuerger 81fb6b3e40 [Polly] Fully-Indexed static expansion
This commit implements the initial version of fully-indexed static
expansion.

```
 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[j] = j;
T: A[i] = B[i]
```

After the pass, we want this :
```
 for(int i = 0; i<Ni; i++)
   for(int j = 0; j<Ni; j++)
S:     B[i][j] = j;
T: A[i] = B[i][i]
```

For now we bail (fail) in the following cases:
  - Scalar access
  - Multiple writes per SAI
  - MayWrite Access
  - Expansion that leads to an access to the original array

Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.

Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>

Reviewers: simbuerg, Meinersbur, bollu

Reviewed By: Meinersbur

Subscribers: mgorny, llvm-commits, pollydev

Differential Revision: https://reviews.llvm.org/D34982

llvm-svn: 310304
2017-08-07 20:54:20 +00:00
Michael Kruse 70af4f579d [ForwardOpTree] Use known array content analysis to forward load instructions.
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.

The analysis is now enabled by default.

The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.

Differential Revision: https://reviews.llvm.org/D36380

llvm-svn: 310279
2017-08-07 18:40:29 +00:00
Tobias Grosser aabfbfa5fc Add missing 'REQUIRES: pollyacc' line
llvm-svn: 310197
2017-08-06 11:21:09 +00:00
Tobias Grosser b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser 5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Tobias Grosser c1cfe0a828 Add missing REQUIRES line
llvm-svn: 309943
2017-08-03 14:46:53 +00:00
Tobias Grosser b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Michael Kruse 291fd8074e [test] Fix test case without Polly-ACC.
llvm-svn: 309938
2017-08-03 13:44:31 +00:00
Siddharth Bhat eadf76d34a [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Singapuram Sanjay Srivallabh 188053af5e Remove debug metadata from copied instruction to prevent GPUModule verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36161

llvm-svn: 309822
2017-08-02 15:20:07 +00:00
Michael Kruse bc88a78cb4 [Simplify] Rewrite redundant write detection algorithm.
The previous algorithm was to search a writes and the sours of its value
operand, and see whether the write just stores the same read value back,
which includes a search whether there is another write access between
them. This is O(n^2) in the max number of accesses in a statement
(+ the complexity of isl comparing the access functions).

The new algorithm is more similar to the one used for searching for
overwrites and coalescable writes. It scans over all accesses in order
of execution while tracking which array elements still have the same
value since it was read. This is O(n), not counting the complexity
within isl. It should be more reliable than trying to catch all
non-conforming cases in the previous approach. It is also less code.

We now also support if the write is a partial write of the read's
domain, and to some extent non-affine subregions.

Differential Revision: https://reviews.llvm.org/D36137

llvm-svn: 309734
2017-08-01 20:01:34 +00:00
Michael Kruse 693ef99935 [Simplify] Improve scalability.
With a lot of reads and writes to the same array in a statement,
some isl sets that capture the state between access can become
complex such that isl takes more considerable time and memory
for operations on them.

The problems identified were:

- is_subset() takes considerable time with many disjoints in the
  arguments. We limit the number of disjoints to 4, any additional
  information is thrown away.

- subtract() can lead to many disjoints. We instead assume that any
  array element is possibly accessed, which removes all disjoints.

- subtract_domain() may lead to considerable processing, even if all
  elements are are to be removed. Instead, we remove determine and
  remove the affected spaces manually. No behaviour is changed.

llvm-svn: 309728
2017-08-01 19:39:11 +00:00
Siddharth Bhat 1ec9cba4e3 [NFC] Add 'REQUIRES: pollyacc' on 'test/GPGPU/invariant-load-hoisting-of-array.ll'
- Should fix broken build due to `r309681`.

llvm-svn: 309686
2017-08-01 14:52:18 +00:00
Siddharth Bhat edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Michael Kruse 9f6e41cdba [ForwardOpTree] Support synthesizable values.
This allows -polly-optree to move instructions that depend on
synthesizable values.

The difficulty for synthesizable values is that their value depends on
the location. When it is moved over a loop header, and the SCEV
expression depends on the loop induction variable (SCEVAddRecExpr), it
would use the current induction variable instead of the last one.

At the moment we cannot forward PHI nodes such that crossing the header
of loops referenced by SCEVAddRecExpr is not possible (assuming the loop
header has at least two incoming blocks: for entering the loop and the
backedge, such any instruction to be forwarded must have a phi between
use and definition).

A remaining issue is when the forwarded value is used after the loop,
but is only synthesizable inside the loop. This happens e.g. if
ScalarEvolution is unable to determine the number of loop iterations or
the initial loop value. We do not forward in this situation.

Differential Revision: https://reviews.llvm.org/D36102

llvm-svn: 309609
2017-07-31 19:46:21 +00:00
Michael Kruse 57cc92b790 [Simplify] Remove all kinds of redundant scalar writes.
In addition to array and PHI writes, also allow scalar value writes.
The only kind of write not allowed are writes by functions
(including memcpy/memmove/memset).

llvm-svn: 309582
2017-07-31 17:04:55 +00:00
Tobias Grosser 8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Tobias Grosser 39977e4e76 Revert "Remove Debug metadata from copied instruction to prevent Module verification failure"
This reverts commit r309490 as it triggers on our AOSP buildbut error messages
of the form:

inlinable function call in a function with debug info must have a !dbg location

llvm-svn: 309556
2017-07-31 11:43:38 +00:00
Singapuram Sanjay Srivallabh cf9a813368 Remove Debug metadata from copied instruction to prevent Module verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

Reviewers: grosser, bollu, Meinersbur

Reviewed By: grosser, bollu

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35630

llvm-svn: 309490
2017-07-29 18:03:49 +00:00
Michael Kruse ce9617f4fe [Simplify] Implement write accesses coalescing.
Write coalescing combines write accesses that

- Write the same llvm::Value.
- Write to the same array.
- Unless they do not write anything in a statement instance (partial
  writes), write to the same element.
- There is no other access between them that accesses the same element.

This is particularly useful after DeLICM, which leaves partial writes to
disjoint domains.

Differential Revision: https://reviews.llvm.org/D36010

llvm-svn: 309489
2017-07-29 16:21:16 +00:00
Michael Kruse 4335c3992a [test] Add test case for -polly-simplify. NFC.
llvm-svn: 309458
2017-07-29 00:06:06 +00:00
Michael Kruse 8e41d2baab [Simplify] Do not remove dependencies of phis within region stmts.
These were wrongly assumed to be phi nodes that require
MemoryKind::PHI accesses.

llvm-svn: 309454
2017-07-28 23:22:32 +00:00
Adrian Prantl 99c4a5fb8e Remove offset parameter from llvm.dbg.value intrinsics in testcase
llvm-svn: 309433
2017-07-28 21:08:53 +00:00
Michael Kruse c99209b4b2 [test] Fix typo in filename. NFC.
llvm-svn: 309403
2017-07-28 16:57:56 +00:00
Michael Kruse 6c8f91b908 [Simplify] Fix typo in statistics output. NFC.
llvm-svn: 309402
2017-07-28 16:57:51 +00:00
Michael Kruse 34a77780c5 [Simplify] Remove empty partial accesses first. NFC.
So follow-up cleanup do not need special handling for such accesses.

llvm-svn: 309401
2017-07-28 16:57:45 +00:00
Siddharth Bhat 0a1177b58e [ScopDetect] add `-polly-ignore-func` flag to ignore functions by name.
Ignore all functions whose name match a regex. Useful because creating a
regex that does *not* match a string is somewhat hard.

Example:
https://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex

llvm-svn: 309377
2017-07-28 11:47:24 +00:00
Tobias Grosser 25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser 30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Michael Kruse 8a8aca4299 [Simplify] Count PHINodes in simplifiable exit nodes as escaping use.
After region exit simplification, the incoming block of a phi node in
the SCoP region's exit block lands outside of the region. Since we
treat SCoPs as if this already happened, we need to account for that
when looking for outside uses of scalars (i.e. escaping scalars).

llvm-svn: 309271
2017-07-27 14:09:31 +00:00
Michael Kruse 95b39da8ae [Simplify] Fix invalid removal write for escaping values.
A PHI node's incoming block is the user of its operand, not the PHI's parent.

Assuming the PHINode's parent being the user lead to the removal of a
MemoryAccesses because its use was assumed to be inside of the SCoP.

llvm-svn: 309164
2017-07-26 19:58:15 +00:00
Michael Kruse 11ed062258 [SCEVValidator] Loop exit values of loops before the SCoP are synthesizable.
In the following loop:

   int i;
   for (i = 0; i < func(); i+=1)
     ;
SCoP:
   for (int j = 0; j<n; j+=1)
     S(i, j)

The value i is synthesizable in the SCoP that includes only the j-loop.
This is because i is fixed within the SCoP, it is irrelevant whether
it originates from another loop.

This fixes a strange case where a PHI was synthesiable in a SCoP,
but not its incoming value, triggering an assertion.

This should fix MultiSource/Applications/sgefa/sgefa of the
perf-x86_64-penryn-O3-polly-before-vectorizer-unprofitable buildbot.

llvm-svn: 309109
2017-07-26 13:05:45 +00:00
Philip Pfaffe 85cc5687df [IslAst] Untangle IslAst lit-testcases from specifics of the legacy-PM
Summary:
This consists instances of two changes:

- Accept any order of checks for a specific loop form, that appear in different order in the new vs legacy-PM.
- Remove checks for specific regions.

Reviewers: grosser

Reviewed By: grosser

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35837

llvm-svn: 308976
2017-07-25 15:07:42 +00:00
Michael Kruse b6317007b4 [ScopInfo] Fix assertion for PHIs not in a region stmts entry.
A PHI node within a region statement is legal, but does not have
a MemoryKind::PHI access.

llvm-svn: 308973
2017-07-25 13:28:39 +00:00
Siddharth Bhat 43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Michael Kruse 07e8c36dc7 [ForwardOpTree] Support read-only value uses.
Read-only values (values defined before the SCoP) require special
handing with -polly-analyze-read-only-scalars=true (which is the
default). If active, each use of a value requires a read access.
When a copied value uses a read-only value, we must also ensure that
such a MemoryAccess is available or is created.

Differential Revision: https://reviews.llvm.org/D35764

llvm-svn: 308876
2017-07-24 12:43:27 +00:00
Siddharth Bhat e2699b572e [Polly] [NFC] [ScopDetection] Make `polly-only-func` perform regex scop name match.
Summary:

- We were using `.count` in `StringRef`, which matches substrings.
- We may want to use this for equality as well.
- Generalise this, so allow regexes as a parameter to `polly-only-func`.

Differential Revision: https://reviews.llvm.org/D35728

llvm-svn: 308875
2017-07-24 12:40:52 +00:00
Michael Kruse ab8f0d57df [Simplify] Remove partial write accesses with empty domain.
If the access relation's domain is empty, the access will never be
executed. We can just remove it.

We only remove write accesses. Partial read accesses are not yet
supported and instructions in the statement might require the
llvm::Value holding the read's result to be defined.

llvm-svn: 308830
2017-07-22 20:33:09 +00:00
Michael Kruse e5f4706a55 [ForwardOpTree] Support hoisted invariant loads.
Hoisted loads can be trivially supported because there are no
MemoryAccess to be modified, the loaded value is just available
at code generation.

llvm-svn: 308826
2017-07-22 14:30:02 +00:00
Michael Kruse a6b2de3b59 [ForwardOpTree] Introduce the -polly-optree pass.
This pass 'forwards' operand trees into statements that use them in
order to avoid scalar dependencies.

This minimal implementation handles only the case of speculatable
instructions. We will successively add support for:
- Hoisted loads
- Read-only values
- Synthesizable values
- Loads
- PHIs
- Forwarding only parts of the tree

Differential Revision: https://reviews.llvm.org/D35754

llvm-svn: 308825
2017-07-22 14:02:47 +00:00
Philip Pfaffe 8f6c48e2aa Untangle ScopInfo lit-testcases from specifics of the legacy-PM
Summary:
For the ScopInfo lit testsuite, this patch removes some dependences on output behaviour of the legacy PM.

In most cases, these tests checked the tool output for labels created by the pass printer in the legacy PM. This doesn't work for the new PM anymore. Untangling the testcases is the first step to porting the testsuite for the new PM infrastructure.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: llvm-commits, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35727

llvm-svn: 308754
2017-07-21 16:47:36 +00:00
Philipp Schaad 2f3073b5cb [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Reviewers: bollu, grosser, Meinersbur, singam-sanjay

Reviewed By: grosser, singam-sanjay

Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35185

llvm-svn: 308751
2017-07-21 16:11:06 +00:00
Tobias Grosser 1eeedf4829 [IslNodeBuilder] Relax complexity check in invariant loads and run it early
When performing invariant load hoisting we check that invariant load expressions
are not too complex. Up to this commit, we performed this check by counting the
sum of dimensions in the access range as a very simple heuristic. This heuristic
is a little too conservative, as it prevents hoisting for any scops with a
very large number of parameters. Hence, we update the heuristic to only count
existentially quantified dimensions and set dimensions. We expect this to still
detect the problematic expressions in h264 because of which this check was
originally introduced.

For some unknown reason, this complexity check was originally committed in
IslNodeBuilder. It really belongs in ScopInfo, as there is no point in
optimizing a program which we could have known earlier cannot be code generated.
The benefit of running the check early is that we can avoid to even hoist checks
that are expensive to code generate as invariant loads. This can be seen in
the changed tests, where we now indeed detect the scop, but just not invariant
load hoist the complicated access.

We also improve the formatting of the code, document it, and use isl++ to
simplify expressions.

llvm-svn: 308659
2017-07-20 19:55:19 +00:00
Tobias Grosser 54491db687 Support fabs and copysign in Polly-ACC
llvm-svn: 308649
2017-07-20 18:26:34 +00:00
Michael Kruse 22058c3fbb [Simplify] Remove unused instructions and accesses.
Use a mark-and-sweep algorithm to find and remove unused instructions
and MemoryAccesses. This is useful in particular to remove scalar
writes that are never used anywhere. A scalar write in a loop induces
a write-after-write dependency that stops the loop iterations to be
rescheduled. Such writes can be a result of previous transformations
such as DeLICM and operand tree forwarding.

It adds a new class VirtualInstruction that represents an instruction in
a particular statement. At the moment an instruction can only belong to
the statement that represents a BasicBlock. In the future, instructions
can be in one of multiple statements representing a BasicBlock
(Nandini's work), in different statements than its BasicBlock would
indicate, and even multiple statements at once (by forwarding operand
trees). It also integrates nicely with the VirtualUse class.

ScopStmt::contains(Instruction*) currently uses the instruction's parent
BasicBlock to check whether it contains the instruction. It will need to
check the actual statement list when one of the aforementioned features
become possible.

Differential Revision: https://reviews.llvm.org/D35656

llvm-svn: 308626
2017-07-20 16:21:55 +00:00
Siddharth Bhat 9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This needs us to adjust how we index array bounds and how we construct
   array bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate what the correct way to handle these are.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests have a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Michael Kruse 0865585eab [ScopInfo] Add support for wrap-around of integers in unsigned comparisons.
This is one possible solution to implement wrap-arounds for integers in
unsigned icmp operations. For example,

    store i32 -1, i32* %A_addr
    %0 = load i32, i32* %A_addr
    %1 = icmp ult i32 %0, 0

%1 should hold false, because under the assumption of unsigned integers,
-1 should wrap around to 2^32-1. However, previously. it was assumed
that the MSB (Most Significant Bit - aka the Sign bit) was never set for
integers in unsigned operations.

This patch modifies the buildConditionSets function in ScopInfo.cpp to
give better information about the integers in these unsigned
comparisons.

Contributed-by: Annanay Agarwal <cs14btech11001@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D35464

llvm-svn: 308608
2017-07-20 12:37:02 +00:00
Philip Pfaffe 17b1ecfdc5 [CMake] Fix r307650: Readd missing dependency.
The commit erroneously removed the dependency of the Polly tests on
things like opt and FileCheck. Add that dependency back.

llvm-svn: 308512
2017-07-19 19:20:58 +00:00
Roman Gareev 5abea0c97a [FIX] Update test/ScheduleOptimizer/pattern-matching-based-opts_11.ll.
llvm-svn: 308501
2017-07-19 18:01:51 +00:00
Roman Gareev 6531df41ae [FIX] Fix pattern-matching-based-opts_11.ll.
llvm-svn: 308499
2017-07-19 17:33:42 +00:00
Roman Gareev 750374181b Make the pattern matching work with modified memory accesses
Some optimizations (e.g., DeLICM) can modify memory accesses (e.g., change
their MemoryKind). Consequently, the pattern matching should take it into
the account.

Reviewed-by: Tobias Grosser <tobias@grosser.es>,
             Michael Kruse <llvm@meinersbur.de>

Differential Revision: https://reviews.llvm.org/D33138

llvm-svn: 308494
2017-07-19 16:59:06 +00:00
Michael Kruse bb7d22a31a [Test] Do not pipe binary data to FileCheck.
llvm-svn: 308437
2017-07-19 11:12:16 +00:00
Eli Friedman e737fc120e [Polly] [OptDiag] Updating Polly Diagnostics Remarks
Utilizing newer LLVM diagnostic remark API in order to enable use of
opt-viewer tool. Polly Diagnostic Remarks also now appear in YAML
remark file.

In this patch, I've added the OptimizationRemarkEmitter into certain
classes where remarks are being emitted and update the remark emit calls
itself. I also provide each remark a BasicBlock or Instruction from where
it is being called, in order to compute the hotness of the remark.

Patch by Tarun Rajendran!

Differential Revision: https://reviews.llvm.org/D35399

llvm-svn: 308233
2017-07-17 23:58:33 +00:00
Tobias Grosser 4556c9b8fe [ScopInfo] Simplify new access functions under domain context
Summary:
We do not keep domain constraints on access functions when building the
scop. Hence, for consistency reasons, it makes also sense to not include
them when storing a new access function. This change results in simpler
access functions that make output easier to read.

This patch also helps to make DeLICMed memory accesses to be understood by
our matrix multiplication pattern matching pass. Further changes to the
matrix multiplication pattern matching are needed for this to work, so the
corresponding test case will be added in a future commit.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35237

llvm-svn: 308215
2017-07-17 20:47:10 +00:00
Siddharth Bhat 233d717ec1 [PPCGCodeGeneration] Generate invariant loads before trying to generate IR.
- We should call `preloadInvariantLoads` to make sure that code is
   generated for invariant loads in the kernel.

Differential Revision: https://reviews.llvm.org/D35410

llvm-svn: 308187
2017-07-17 15:57:01 +00:00
Tobias Grosser a3aa423fc3 [ScopDetection] If a loop is not part of a scop, none of it backedges can be
This patch makes sure that in case a loop is not fully contained within a region
that later forms a SCoP, none of the loop backedges are allowed to be part of
the region. We currently do not support the situation where only some of a loops
backedges are part of a scop. Today, this can break both scop modeling and code
generation. One such breaking test case is for example
test/ScopDetectionDiagnostics/loop_partially_in_scop-2.ll, where we totally
forgot to code generate some of the backedges. Fortunately, it is commonly not
necessary to support these partial loops, it is way more common that either
no backedge is included in a region or all loop backedge are included.

This fixes a recent miscompile in
MultiSource/Benchmarks/MiBench/consumer-typeset which was exposed after
r306477.

llvm-svn: 308113
2017-07-15 22:42:17 +00:00
Siddharth Bhat 03346c2701 [PPCGCodeGeneration] Fix runtime check adjustments since they make assumptions about BB layout.
- There is a conditional branch that is used to switch between the old
   and new versions of the code.

- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
  change the runtime check to be always set to false.

- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
  was using assumptions about the layout of the BBs.

- However, invariant load hoisting violates this assumption by inserting
  an extra basic block in the middle.

- Fix the assumption on the layout by having `createScopConditionally`
   return the conditional branch instruction.

- Use this reference to set to always-false.

llvm-svn: 308010
2017-07-14 10:00:25 +00:00
Siddharth Bhat a1b2086a33 [Invariant Loads] Do not consider invariant loads to have dependences.
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.

During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.

Differential Revision: https://reviews.llvm.org/D35120

llvm-svn: 307907
2017-07-13 12:18:56 +00:00
Singapuram Sanjay Srivallabh 1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.

Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Siddharth Bhat 87fa280831 [Polly] [Tests] Update `lit.cfg` uses of `lit.util.capture` to `subprocess.check_output`
- `lit.util.capture` was removed in `r306625`.
- Replace `lit.util.capture` to `subprocess.check_output` as LLVM did.
- LLVM revision of this change: `https://reviews.llvm.org/D35088`.

Differential Revision: https://reviews.llvm.org/D35255

llvm-svn: 307765
2017-07-12 09:42:05 +00:00
Tobias Grosser bed2ca6eac [Simplify] Also remove redundant writes which originally came from PHI nodes
llvm-svn: 307660
2017-07-11 14:29:39 +00:00
Philip Pfaffe 54df93d60e [Polly][CMake] Skip unit-tests in lit if gtest is not available
Summary:
There is a bug in the current lit configurations for the unittests. If gtest is not available, the site-config for the unit tests won't be generated. Because lit recurses through the test directory, the lit configuration for the unit tests will be discovered nevertheless, leading to a fatal error in lit.

This patch semi-gracefully skips the unittests if gtest is not available. As a result, running lit now prints this: `warning: test suite 'Polly-Unit' contained no test`.

If people think that this is too annoying, the alternative would be to pick apart the test directory, so that the lit testsuite discovery will always only find one configuration. In fact, both of these things could be combined. While it's certainly nice that running a single lit command runs all the tests, I suppose people use the `check-polly` make target over lit most of the time, so the difference might not be noticed.

Reviewers: Meinersbur, grosser

Reviewed By: grosser

Subscribers: mgorny, bollu, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34053

llvm-svn: 307651
2017-07-11 11:37:35 +00:00