Commit Graph

912 Commits

Author SHA1 Message Date
Siddharth Bhat 78027437e6 [Polly] [PPCGCodeGeneration] Mild refactoring of checking validity of functions in a kernel.
This is a stylistic change to make the function a little more readable.
Also add a debug print to show what instruction contains a use of a
function we don't understand in the kernel.

Differential Revision: https://reviews.llvm.org/D37058

llvm-svn: 311648
2017-08-24 09:54:15 +00:00
Michael Kruse 06ed529205 Add more statistics.
Add statistics about
- Which optimizations are applied
- Number of loops in Scops at various stages
- Number of scalar/singleton writes at various stages representative
  for scalar false dependencies
- Number of parallel loops

These will be useful to find regressions due to moving Polly further
down of LLVM's pass pipeline.

Differential Revision: https://reviews.llvm.org/D37049

llvm-svn: 311553
2017-08-23 13:50:30 +00:00
Michael Kruse 3044dc51cf [PPCGCodeGen] Fix compiler warning: '<': signed/unsigned mismatch. NFC.
MSVC warns about comparison between a signed and unsigned integer.
The rules of C(++) define that an unsigned comparison has to be
carried-out in this case. This is unlikely to be intended.

Fix by assigning the loop's upper bound to a signed integer first.
This also avoids repeated evaluation of the invariant upper bound.

llvm-svn: 311548
2017-08-23 12:45:25 +00:00
Tobias Grosser 4a07bbe3f6 [IRBuilder] Only emit alias scop metadata for arrays, but not scalars
Summary:
There is no need to emit alias metadata for scalars, as basicaa will easily
distinguish them from arrays. This reduces the size of the metadata we generate.
This is especially useful after we moved to -polly-position=before-vectorizer,
where a lot more scalar dependences are introduced, which increased the size of
the alias analysis metadata and made us commonly reach the limits after which
we do not emit alias metadata that have been introduced to prevent quadratic
growth of this alias metadata.

This improves 2mm performance from 1.5 seconds to 0.17 seconds.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D37028

llvm-svn: 311498
2017-08-22 21:58:48 +00:00
Roman Gareev 6bfeba24d3 [NFC] Fix the broken comment.
llvm-svn: 311477
2017-08-22 17:43:03 +00:00
Roman Gareev 0956a606ff Disable the Loop Vectorizer in case of GEMM
Currently, in case of GEMM and the pattern matching based optimizations, we
use only the SLP Vectorizer out of two LLVM vectorizers. Since the Loop
Vectorizer can get in the way of optimal code generation, we disable the Loop
Vectorizer for the innermost loop using mark nodes and emitting the
corresponding metadata.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D36928

llvm-svn: 311473
2017-08-22 17:38:46 +00:00
Siddharth Bhat cb5155bf6d [ManagedMemoryRewrite] Use `unit64_t` to store size, not `int`.
llvm-svn: 311440
2017-08-22 09:30:37 +00:00
Siddharth Bhat 603544863f [ManagedMemoryRewrite] Get size in bytes rather than in bits and dividing by 8.
llvm-svn: 311439
2017-08-22 09:27:41 +00:00
Siddharth Bhat a8c329b0eb [ManagedMemoryRewrite] slightly tweak debug output style. [NFC]
llvm-svn: 311361
2017-08-21 18:58:33 +00:00
Siddharth Bhat 557ce3a8b0 [ManagedMemoryRewrite] Print reasons for skipping global array to dbgs(). [NFC]
llvm-svn: 311360
2017-08-21 18:52:15 +00:00
Siddharth Bhat 0a198dc18a [ManagedMemoryRewrite] hide debug output behing DEBUG(...). [NFC]
llvm-svn: 311331
2017-08-21 12:51:57 +00:00
Siddharth Bhat 7b9f5ca27e [PPCGCodeGeneration] Enable `polly-codegen-perf-monitoring` for PPCGCodegen.
This feature was not enabled for `PPCGCodeGeneration`. Now that this is
enabled, we can benchmark Scops that have been optimised with
`-polly-codegen-ppcg` with the `-polly-codegen-perf-monitoring` option.

Differential Revision: https://reviews.llvm.org/D36934

llvm-svn: 311328
2017-08-21 11:44:01 +00:00
Tobias Grosser b09bd74da8 [GPGPU] Add llvm.powi to the libdevice supported functions
These intrinsics are used in COSMO.

llvm-svn: 311324
2017-08-21 09:52:08 +00:00
Tobias Grosser 5170b6627a [GPGPU] Add log / logf to the libdevice supported functions
These two functions are used in COSMO

llvm-svn: 311322
2017-08-21 09:00:31 +00:00
Tobias Grosser e32498c9c3 Revert "[GPGPU] Simplify PPCGSCop to reduce compile time [NFC]"
We still see some issues with parameter space mismatches. Revert this to get
a clean baseline. We will recommit after these issues have been resolved.

This reverts commit 0e360a14194f722ded7aa2bc9d4be2ed2efeeb49.

llvm-svn: 311268
2017-08-19 23:49:26 +00:00
Tobias Grosser 9041118983 [ManagedMemoryRewrite] Make pass more robust and fix memory issue
Instead of using Twines and temporary expressions, we do string manipulation
through a std::string. This resolves a memory corruption issue, which likely
was caused by twines loosing their underlying string too soon.

llvm-svn: 311264
2017-08-19 23:03:45 +00:00
Siddharth Bhat 205a78a6f9 [ManagedMemoryRewrite] Iterate over operands of the expanded instruction, not the constantexpr itself.
- We should iterate over `I`, which is `Cur` expanded out to an
instruction, and not `Cur` itself.

- This is a bugfix.

Differential Revision: https://reviews.llvm.org/D36923

llvm-svn: 311261
2017-08-19 20:52:11 +00:00
Tobias Grosser ecb94a0392 [GPGPU] Correctly initialize array order and fixed_element information
Summary:
This information is necessary for PPCG to perform correct life range reordering.
With these changes applied we can live-range reorder some of the important
kernels in COSMO.

We also update and rename one test case, which previously could not be optimized
and now is optimized thanks to live-range reordering. To preserve test coverage
we add a new test case scalar-writes-in-scop-requires-abort.ll, which exercises
our automatic abort in case of scalar writes in the kernel.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36929

llvm-svn: 311259
2017-08-19 20:21:22 +00:00
Philipp Schaad 50139f0f38 [PPCG] Only add Kernel argument sizes for OpenCL, not CUDA runtime
Kernel argument sizes now only get appended to the kernel launch parameter list if the OpenCL runtime is selected, not if CUDA runtime is chosen.

Differential revision: D36925

llvm-svn: 311248
2017-08-19 17:04:57 +00:00
Tobias Grosser 9f2eb24c06 Clarify the intend of the run-time check
llvm-svn: 311243
2017-08-19 16:26:39 +00:00
Tobias Grosser 43df2020e7 [GPGPU] Collect parameter dimension used in MemoryAccesses
When using -polly-ignore-integer-wrapping and -polly-acc-codegen-managed-memory
we add parameter dimensions lazily to the domains, which results in PPCG not
including parameter dimensions that are only used in memory accesses in the
kernel space. To make sure these parameters are still passed to the kernel, we
collect these parameter dimensions and align the kernel's parameter space
before code-generating it.

llvm-svn: 311239
2017-08-19 12:58:28 +00:00
Tobias Grosser ec02acfb98 [GPGPU] Simplify PPCGSCop to reduce compile time [NFC]
Summary:
Drop unused parameter dimensions to reduce the size of the sets we are working
with. Especially the computed dependences tend to accumulate a lot of parameters
that are present in the input memory accesses, but often not necessary to
express the actual dependences. As isl represents maps and sets with dense
matrices, reducing the dimensionality of isl sets commonly reduces code
generation performance.

This reduces compile time from 17 to 11 seconds for our test case. While this is
not impressive, this patch helped me to identify the previous two performance
improvements and additionally also increases readability of the isl data
structures we use.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36869

llvm-svn: 311161
2017-08-18 13:38:12 +00:00
Siddharth Bhat 656e629572 [Polly] [PPCGCodeGeneration] Print current Scop and loop depth in PPCGCodeGen. [NFC]
Differential Revision: https://reviews.llvm.org/D36871

llvm-svn: 311158
2017-08-18 13:16:58 +00:00
Tobias Grosser 861a387fac [GPGPU] Do not create copy statements when targetting managed memory
Summary:
They are not used and consequently do not even need to be computed. This reduces
the overall compile time for our kernel from 1m33s to 17s.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36868

llvm-svn: 311157
2017-08-18 13:11:05 +00:00
Tobias Grosser 62acb344d0 [GPGPU] Synchronize after each kernel, not each copy out
Summary:
This change reduces the overall number of synchronize calls for kernels with
a lot of output data at the cost of additional synchronize calls for kernels
launched in sequence without any device to host transfers in between. As the
latter pattern is a lot less frequent, this seems a better tradeoff.

Even though the above motivation would be motivation enough, this is just
a step towards enabling ppcg to not compute to and from device copy calls
at all, which would be incorrect in case we still relied on these calls to
place our synchronization statements.

Reviewers: Meinersbur, bollu, singam-sanjay

Reviewed By: bollu

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36867

llvm-svn: 311155
2017-08-18 12:55:58 +00:00
Tobias Grosser fa03cb7687 [GPGPU] Only collect the access that belong to an array [NFC]
This avoid the construction of very large sets and in many cases also keeps the
number of parameters low. As a result, we see a compile time reduction from 5
minutes to only slightly above 1 minute for one of our larger test cases.

llvm-svn: 311127
2017-08-17 22:04:53 +00:00
Tobias Grosser d2e57981fd [GPGPU] Move getExtend to C++ [NFC]
llvm-svn: 311123
2017-08-17 21:20:28 +00:00
Siddharth Bhat a2c4112791 [ManagedMemoryRewrite] Rewrite malloc, free correctly inside `Constant`s.
Reuse the machinery built for replacing global arrays to replace malloc/free as
well. Example replacement that was missed earlier:

```
call void \
    bitcast (void (i8*)* @free to void (%custom_type*)*) (%custom_type* %13)
```

- Since the `bitcast` is a `ConstantExpr`, `replaceAllUsesWith` would miss
this. We don't miss this anymore.

Differential Revision: https://reviews.llvm.org/D36825

llvm-svn: 311121
2017-08-17 20:26:38 +00:00
Siddharth Bhat 8a2c07f6d4 [ManagedMemoryRewrite] Learn how to rewrite global arrays, allocas.
- If we have global arrays, we would like to rewrite them to global
  pointers which are allocated using `cudaMallocManaged`.

- If we have allocas in a function, we would like to rewrite them to
  heap-allocations with `cudaMallocManaged` and `cudaFree`.

- With these rewrite mechanisms, we can offload _any_ function to the
  GPU with no code rewrite whatsover.

Differential Revision: https://reviews.llvm.org/D36516

llvm-svn: 311080
2017-08-17 11:22:52 +00:00
Tobias Grosser e2a45f32dc [GPGPU] Also record invariant loads as kernel subtree values
Before this change kernels that used invariant loads would have resulted in
invalid PTX code.

llvm-svn: 311042
2017-08-16 21:37:53 +00:00
Tobias Grosser cff9696e11 [GPGPU] Make the ast_build available to block generator
This is necessary for partial writes (as used by delicm) to work.

llvm-svn: 310553
2017-08-10 08:00:56 +00:00
Philip Pfaffe f43e7c2e97 [Polly][PM] Improve invalidation in the Scop-Pipeline
Summary:
During code generation for a Scop we modify the IR of a function.
While this shouldn't affect a Scop in the formal sense, the implementation
caches various information about the IR such as SCEV expressions for bounds or
parameters. This cached information needs to be updated or invalidated. To this
end, SPMUpdater allows passes to report when they've invalidated a Scop to the
PassManager, which will then flush and recompute all Scops. This in turn
invalidates all iterators, so references to Scops shouldn't be held.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: llvm-commits, pollydev

Differential Revision: https://reviews.llvm.org/D36524

llvm-svn: 310551
2017-08-10 07:43:46 +00:00
Siddharth Bhat 9298ff2dee [ManagedMemoryRewrite] [Polly] Erase original malloc and free. [NFC]
We do not need to keep `malloc` and `free` around since they are
replaced by `polly_{malloc,free}Managed.`

llvm-svn: 310504
2017-08-09 18:19:46 +00:00
Siddharth Bhat c4a4af47f3 [ManagedMemoryRewrite] Introduce a new pass to rewrite modules to use managed memory.
This pass is useful to automatically convert a codebase that uses malloc/free
to use their managed memory counterparts.

Currently, rewrite malloc and free to the `polly_{malloc,free}Managed` variants.

A future patch will teach ManagedMemoryRewrite to rewrite global arrays
as pointers to globally allocated managed memory.

Differential Revision: https://reviews.llvm.org/D36513

llvm-svn: 310471
2017-08-09 12:59:23 +00:00
Michael Kruse 40d083956c [CodeGen] Use isLatestArrayKind().
Codegen with -polly-parallel queried the unmapped MemoryAccess, but only
the MemoryKind after mapping is relevant for codegen.

This should fix various fails of the
perf-x86_64-penryn-O3-polly-parallel-fast buildbot.

llvm-svn: 310466
2017-08-09 12:27:51 +00:00
Siddharth Bhat 34eeabbca3 [PPCGCodeGeneration] Compute element size in bytes for arrays correctly.
Previously, we used to compute this with `elementSizeInBits / 8`. This
would yield an element size of 0 when the array had element size < 8 in
bits.

To fix this, ask data layout what the size in bytes should be.

Differential Revision: https://reviews.llvm.org/D36459

llvm-svn: 310448
2017-08-09 08:29:16 +00:00
Roman Gareev 1563f039f5 Use SCEV information for the second level aliasing
We introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. To distinguish two
accesses, the comparison of raw pointers representing base pointers is used.

In case of, for example, ublas's prod function that implements GEMM, and
DeLiCM we can get accesses to same location represented by different raw
pointers. Consequently, we create different alias sets that can prevent
accesses from, for example, being sinked or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead
of the corresponding raw pointers.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D35761

llvm-svn: 310380
2017-08-08 16:50:28 +00:00
Siddharth Bhat 71dfb3eb07 [Polly] [PPCGCodeGeneration] Handle failing of invariant load hoisting gracefully.
To do this, we replicate what `CodeGeneration` does. We expose
`markNodeUnreachable` from `CodeGeneration` to `PPCGCodeGeneration`.

Differential Revision: https://reviews.llvm.org/D36457

llvm-svn: 310350
2017-08-08 12:00:59 +00:00
Tobias Grosser d70ea7fed0 [GPGPU] Remove redundant constructors
llvm-svn: 310284
2017-08-07 19:20:57 +00:00
Tobias Grosser 61bd3a4840 [ScopInfo] Move Scop::getPwAffOnly to isl++ [NFC]
llvm-svn: 310231
2017-08-06 21:42:38 +00:00
Tobias Grosser 31df6f31c0 [ScopInfo] Move Scop::getDomains to isl++ [NFC]
llvm-svn: 310230
2017-08-06 21:42:25 +00:00
Tobias Grosser 04ec2eb8c9 [ScopInfo] Move Scop::getInvalidContext to isl++ [NFC]
llvm-svn: 310229
2017-08-06 21:42:16 +00:00
Tobias Grosser e127033f98 [ScopInfo] Move Scop::getAssumedContext to isl++ [NFC]
llvm-svn: 310228
2017-08-06 21:42:09 +00:00
Tobias Grosser b65ccc4302 [ScopInfo] Translate Scop::getParamSpace to isl++ [NFC]
llvm-svn: 310224
2017-08-06 20:11:59 +00:00
Tobias Grosser 8ea1fc19b3 [ScopInfo] Translate Scop::getContext to isl++ [NFC]
llvm-svn: 310221
2017-08-06 19:52:38 +00:00
Tobias Grosser 9a63570b13 [ScopInfo] Translate Scop::getIdForParam to isl++ [NFC]
llvm-svn: 310220
2017-08-06 19:31:27 +00:00
Tobias Grosser 5ab39ff224 [ScopInfo] Move get*Writes/getReads/getAccesses to isl++
llvm-svn: 310219
2017-08-06 19:22:27 +00:00
Tobias Grosser 132860afe5 [ScopInfo] Move ScopStmt::setAstBuild/getAstBuild to isl++
llvm-svn: 310216
2017-08-06 17:53:04 +00:00
Tobias Grosser dcf8d696ff Move ScopInfo::getDomain(), getDomainSpace(), getDomainId() to isl++
llvm-svn: 310209
2017-08-06 16:39:52 +00:00
Tobias Grosser b99c11710c [GPGPU] Make sure managed arrays are prepared at the beginning of the scop
Summary:
This resolves some "instruction does not dominate use" errors, as we used to
prepare the arrays at the location of the first kernel, which not necessarily
dominated all other kernel calls.

Reviewers: Meinersbur, bollu, singam-sanjay

Subscribers: nemanjai, pollydev, llvm-commits, kbarton

Differential Revision: https://reviews.llvm.org/D36372

llvm-svn: 310196
2017-08-06 11:10:38 +00:00
Tobias Grosser 5b307cdb8a [GPGPU] Rename all, not only the first libdevice function
llvm-svn: 310194
2017-08-06 03:04:15 +00:00
Siddharth Bhat e53c924b0f [Polly] [PPCGCodeGeneration] Deal with loops outside the Scop correctly in PPCGCodeGeneration.
A Scop with a loop outside it is not handled currently by
PPCGCodeGeneration. The test case is such that the Scop has only one inner loop
that is detected. This currently breaks codegen.

The fix is to reuse the existing mechanism in `IslNodeBuilder` within
`GPUNodeBuilder.

Differential Revision: https://reviews.llvm.org/D36290

llvm-svn: 310193
2017-08-06 02:39:05 +00:00
Siddharth Bhat 0caed1fbe6 [IslNodeBuilder] [NFC] Refactor creation of loop induction variables of loops outside scops.
This logic is duplicated, so we refactor it into a separate function.
 This will be used in a later patch to teach PPCGCodeGen code generation
 for loops that are outside the scop.

 Differential Revision: https://reviews.llvm.org/D36310

llvm-svn: 310192
2017-08-06 02:07:11 +00:00
Siddharth Bhat 638316da5b [PPCGCodeGeneration] [NFC] Log every location from which PPCGCodegen bails.
This is useful when trying to understand why no GPU code was produced.

Differential Revision: https://reviews.llvm.org/D36318

llvm-svn: 310103
2017-08-04 19:36:40 +00:00
Tobias Grosser b5563c6817 Make sure that all parameter dimensions are set in schedule
Summary:
In case the option -polly-ignore-parameter-bounds is set, not all parameters
will be added to context and domains. This is useful to keep the size of the
sets and maps we work with small. Unfortunately, for AST generation it is
necessary to ensure all parameters are part of the schedule tree. Hence,
we modify the GPGPU code generation to make sure this is the case.

To obtain the necessary information we expose a new function
Scop::getFullParamSpace(). We also make a couple of functions const to be
able to make SCoP::getFullParamSpace() const.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Subscribers: nemanjai, kbarton, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36243

llvm-svn: 309939
2017-08-03 13:51:15 +00:00
Siddharth Bhat eadf76d34a [PPCGCodeGeneration] Construct `isl_multi_pw_aff` of PPCGArray.bounds even when polly-ignore-parameter-bounds is turned on.
When we have `-polly-ignore-parameter-bounds`, `Scop::Context` does not contain
all the paramters present in the program.

The construction of the `isl_multi_pw_aff` requires all the indivisual `pw_aff`
to have the same parameter dimensions. To achieve this, we used to realign
every `pw_aff` with `Scop::Context`. However, in conjunction with
`-polly-ignore-parameter-bounds`, this is now incorrect, since `Scop::Context`
does not contain all parameters.

We set this up correctly by creating a space that has all the parameters
used by all the `isl_pw_aff`. Then, we realign all `isl_pw_aff` to this space.

llvm-svn: 309934
2017-08-03 12:09:33 +00:00
Singapuram Sanjay Srivallabh 1f9ab16c4e Fix code format on r309826
Summary:
Fix code format on r309826 / D35458

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36232

llvm-svn: 309845
2017-08-02 17:56:39 +00:00
Singapuram Sanjay Srivallabh 188053af5e Remove debug metadata from copied instruction to prevent GPUModule verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

This patch reattempts https://reviews.llvm.org/D35630 by targeting only those instructions that are to end up in a Module meant for the GPU.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D36161

llvm-svn: 309822
2017-08-02 15:20:07 +00:00
Siddharth Bhat edf9581e4c [PPCGCodeGeneration] Correct usage of llvm::Value with getLatestValue.
It is possible that the `HostPtr` that coresponds to an array could be
invariant load hoisted. Make sure we use the invariant load hoisted
value by using `IslNodeBuilder::getLatestValue`.

Differential Revision: https://reviews.llvm.org/D36001

llvm-svn: 309681
2017-08-01 14:26:39 +00:00
Siddharth Bhat f2cfd2a4db [NFC] [IslNodeBuilder, GPUNodeBuilder] Unify mechanism for looking up replacement Values.
We populate `IslNodeBuilder::ValueMap` which contains replacements for
`llvm::Value`s. There was no simple method to pick up a replacement if
it exists, otherwise fall back to the original.

Create a method `IslNodeBuilder::getLatestValue` which provides this
functionality.

This will be used in a later patch to fix bugs in `PPCGCodeGeneration`
where the latest value is not being used.

Differential Revision: https://reviews.llvm.org/D36000

llvm-svn: 309674
2017-08-01 12:15:51 +00:00
Siddharth Bhat 4d5820d171 [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getGridSizes to isl++.
llvm-svn: 309671
2017-08-01 10:45:41 +00:00
Siddharth Bhat ccbf4b509c [NFC] [PPCGCodeGeneration] Convert GPUNodeBuilder::getArrayOffset to isl++.
llvm-svn: 309669
2017-08-01 09:58:55 +00:00
Tobias Grosser 8fc6cdfb1c [GPGPU] Add support for NVIDIA libdevice
Summary:
This allows us to map functions such as exp, expf, expl, for which no
LLVM intrinsics exist. Instead, we link to NVIDIA's libdevice which provides
high-performance implementations of a wide range of (math) functions. We
currently link only a small subset, the exp, cos and copysign functions. Other
functions will be enabled as needed.

Reviewers: bollu, singam-sanjay

Reviewed By: bollu

Subscribers: tstellar, tra, nemanjai, pollydev, mgorny, llvm-commits, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35703

llvm-svn: 309560
2017-07-31 14:03:16 +00:00
Tobias Grosser 39977e4e76 Revert "Remove Debug metadata from copied instruction to prevent Module verification failure"
This reverts commit r309490 as it triggers on our AOSP buildbut error messages
of the form:

inlinable function call in a function with debug info must have a !dbg location

llvm-svn: 309556
2017-07-31 11:43:38 +00:00
Tobias Grosser 7639db8ed9 [IslNodeBuilder] Remove unused instruction
Suggested-by: Maximilian Falkenstein <falkensm@student.ethz.ch>
llvm-svn: 309533
2017-07-31 01:59:23 +00:00
Singapuram Sanjay Srivallabh cf9a813368 Remove Debug metadata from copied instruction to prevent Module verification failure
Summary:
**Remove debug metadata from instruction to be copied to prevent the source file's debug metadata being copied into GPUModule and eventually failing Module verification and ASM string codegeneration.**

When copying the instruction onto the Module meant for the GPU, debug metadata attached to an instruction causes all related metadata to be pulled into the Module, including the DICompileUnit, which is not listed in llvm.dbg.cu of the Module. This fails the verification of the Module and generation of the ASM string.

The only debug metadata of the instruction, the DebugLoc, is unset by this patch.

Reviewers: grosser, bollu, Meinersbur

Reviewed By: grosser, bollu

Subscribers: pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35630

llvm-svn: 309490
2017-07-29 18:03:49 +00:00
Siddharth Bhat 4ebeb3568a [PPCGCodeGeneration] Check that invariant load hoisting succeeded.
If we fail, throw an error for now. We can gracefully handle this later.

llvm-svn: 309387
2017-07-28 14:48:32 +00:00
Tobias Grosser 25271b91b2 [GPGPU] Do not require the Scop::Context to have information about all parameters
llvm-svn: 309368
2017-07-28 06:49:44 +00:00
Tobias Grosser 30caae6d23 [GPGPU] Fix compilation issue with latest CUDA upgrade to i128
llvm-svn: 309366
2017-07-28 06:38:49 +00:00
Michael Kruse 8d89179e33 [ScopInfo] Rename ScopStmt::contains(BB) to represents(BB). NFC.
In future, there will be no more a 1:1 correspondence between statements
and basic blocks, the name `contains` does not correctly capture their
relationship. A BB may infact comprise of multiple statements; hence we
describe a statement 'representing' a basic block.

Differential Revision: https://reviews.llvm.org/D35838

llvm-svn: 308982
2017-07-25 16:25:37 +00:00
Siddharth Bhat 43f178bbc9 [PPCGCodeGeneration] Skip arrays with empty extent.
Invariant load hoisted scalars, and arrays whose size we can statically compute
to be 0 do not need to be allocated as arrays.

Invariant load hoisted scalars are sent to the kernel directly as parameters.

Earlier, we used to allocate `0` bytes of memory for these because our
computation of size from `PPCGCodeGeneration::getArraySize` would result in `0`.

Now, since we don't invariant loads as arrays in PPCGCodeGeneration, this
problem does not occur anymore.

Differential Revision: https://reviews.llvm.org/D35795

llvm-svn: 308971
2017-07-25 12:35:36 +00:00
Tobias Grosser d7065e5df5 Move MemoryAccess::isStride* to isl++
llvm-svn: 308927
2017-07-24 20:50:22 +00:00
Tobias Grosser 206e9e3b3b Move ScopArrayInfo::getFromAccessFunction and getFromId to isl++
llvm-svn: 308892
2017-07-24 16:22:27 +00:00
Siddharth Bhat f7face4bc4 Convert GPUNodeBuilder::getArraySize to islcpp.
Note: PPCGCodeGeneration::pollyBuildAstExprForStmt is at
      https://reviews.llvm.org/D35770

Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308870
2017-07-24 09:08:21 +00:00
Siddharth Bhat 35de900917 [NFC] Move PPCGCodeGeneration::pollyBuildAstExprForStmt to isl++.
Differential Revision: https://reviews.llvm.org/D35771

llvm-svn: 308869
2017-07-24 08:34:24 +00:00
Tobias Grosser 3b196131b5 Move applyScheduleToAccessRelation to isl++
llvm-svn: 308842
2017-07-23 04:08:52 +00:00
Tobias Grosser 6a87036e0f Move MemoryAccess::getAddressFunction to isl++
llvm-svn: 308841
2017-07-23 04:08:45 +00:00
Tobias Grosser 1515f6b937 Move MemoryAccess::NewAccessRelation to isl++
We also move related accessor functions

llvm-svn: 308840
2017-07-23 04:08:38 +00:00
Tobias Grosser fe46c3ff3a Move MemoryAccess::id to isl++
llvm-svn: 308836
2017-07-23 04:08:11 +00:00
Tobias Grosser 77eef90f50 Move ScopArrayInfo to isl++
This moves the full ScopArrayInfo class to isl++

llvm-svn: 308801
2017-07-21 23:07:56 +00:00
Philipp Schaad 2f3073b5cb [Polly][GPGPU] Added SPIR Code Generation and Corresponding Runtime Support for Intel
Summary:
Added SPIR Code Generation to the PPCG Code Generator. This can be invoked using
the polly-gpu-arch flag value 'spir32' or 'spir64' for 32 and 64 bit code respectively.
In addition to that, runtime support has been added to execute said SPIR code on Intel
GPU's, where the system is equipped with Intel's open source driver Beignet (development
version). This requires the cmake flag 'USE_INTEL_OCL' to be turned on, and the polly-gpu-runtime
flag value to be 'libopencl'.
The transformation of LLVM IR to SPIR is currently quite a hack, consisting in part of regex
string transformations.
Has been tested (working) with Polybench 3.2 on an Intel i7-5500U (integrated graphics chip).

Reviewers: bollu, grosser, Meinersbur, singam-sanjay

Reviewed By: grosser, singam-sanjay

Subscribers: pollydev, nemanjai, mgorny, Anastasia, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35185

llvm-svn: 308751
2017-07-21 16:11:06 +00:00
Michael Kruse cd4c977b8b [ScopInfo] Print instructions in dump().
Print a statement's instruction on dump() regardless of
-polly-print-instructions. dump() is supposed to be used in the debugger
only and never in regression tests. While debugging, get all the
information we have and we are not bound to break anything. For non-dump
purposes of print, forward the setting of -polly-print-instructions as
parameters.

Some calls to print() had to be changed because the
PollyPrintInstructions setting is only available in ScopInfo.cpp.
In ScheduleOptimizer.cpp, dump() was used in regression tests.
That's not what dump() is for.

The print parameter "PrintInstructions" will also be useful for an
explicit print SCoP pass in a future patch.

llvm-svn: 308746
2017-07-21 15:35:53 +00:00
Siddharth Bhat a0fb8b23e1 [NFC] [PPCGCodeGeneration] Print `verifyModule` failure to debug stream.
If verifyModule fails, it is helpful to know why it failed. Add a log to
the debug stream that prints the failure.

llvm-svn: 308727
2017-07-21 11:21:44 +00:00
Tobias Grosser 018103d34e Fix typo in function name Bllock -> Block
llvm-svn: 308715
2017-07-21 06:00:38 +00:00
Tobias Grosser 1eeedf4829 [IslNodeBuilder] Relax complexity check in invariant loads and run it early
When performing invariant load hoisting we check that invariant load expressions
are not too complex. Up to this commit, we performed this check by counting the
sum of dimensions in the access range as a very simple heuristic. This heuristic
is a little too conservative, as it prevents hoisting for any scops with a
very large number of parameters. Hence, we update the heuristic to only count
existentially quantified dimensions and set dimensions. We expect this to still
detect the problematic expressions in h264 because of which this check was
originally introduced.

For some unknown reason, this complexity check was originally committed in
IslNodeBuilder. It really belongs in ScopInfo, as there is no point in
optimizing a program which we could have known earlier cannot be code generated.
The benefit of running the check early is that we can avoid to even hoist checks
that are expensive to code generate as invariant loads. This can be seen in
the changed tests, where we now indeed detect the scop, but just not invariant
load hoist the complicated access.

We also improve the formatting of the code, document it, and use isl++ to
simplify expressions.

llvm-svn: 308659
2017-07-20 19:55:19 +00:00
Tobias Grosser 54491db687 Support fabs and copysign in Polly-ACC
llvm-svn: 308649
2017-07-20 18:26:34 +00:00
Siddharth Bhat 9e3db2b756 [PPCGCodeGen] [3/3] Update PPCGCodeGen + tests to latest ppcg.
This commit *WILL COMPILE*.

1. `PPCG` now uses `isl_multi_pw_aff` instead of an array of `pw_aff`.
   This needs us to adjust how we index array bounds and how we construct
   array bounds.

2. `PPCG` introduces two new kinds of nodes: `init_device` and `clear_device`.
   We should investigate what the correct way to handle these are.

3. `PPCG` has gotten smarter with its use of live range reordering, so some of
   the tests have a qualitative improvement.

4. `PPCG` changed its output style, so many test cases need to be updated to
   fit the new style for `polly-acc-dump-code` checks.

Differential Revision: https://reviews.llvm.org/D35677

llvm-svn: 308625
2017-07-20 15:48:36 +00:00
Siddharth Bhat edfef5ae8e [NFC] [PPCGCodeGeneration] cleanup kills related code.
We extended kills in Polly to handle both `phi` nodes and scalars that
    are not used within the Scop. Update the comments and choice of
    variable names to reflect this.

llvm-svn: 308279
2017-07-18 09:15:16 +00:00
Siddharth Bhat 233d717ec1 [PPCGCodeGeneration] Generate invariant loads before trying to generate IR.
- We should call `preloadInvariantLoads` to make sure that code is
   generated for invariant loads in the kernel.

Differential Revision: https://reviews.llvm.org/D35410

llvm-svn: 308187
2017-07-17 15:57:01 +00:00
Siddharth Bhat 03346c2701 [PPCGCodeGeneration] Fix runtime check adjustments since they make assumptions about BB layout.
- There is a conditional branch that is used to switch between the old
   and new versions of the code.

- If we detect that the build was unsuccessful, `PPCGCodeGeneration` will
  change the runtime check to be always set to false.

- To actually *reach* this runtime check instruction, `PPCGCodeGeneration`
  was using assumptions about the layout of the BBs.

- However, invariant load hoisting violates this assumption by inserting
  an extra basic block in the middle.

- Fix the assumption on the layout by having `createScopConditionally`
   return the conditional branch instruction.

- Use this reference to set to always-false.

llvm-svn: 308010
2017-07-14 10:00:25 +00:00
Siddharth Bhat a1b2086a33 [Invariant Loads] Do not consider invariant loads to have dependences.
We need to relax constraints on invariant loads so that they do not
create fake RAW dependences. So, we do not consider invariant loads as
scalar dependences in a region.

During these changes, it turned out that we do not consider `llvm::Value`
replacements correctly within `PPCGCodeGeneration` and `ISLNodeBuilder`.
The replacements dictated by `ValueMap` were not being followed in all
places. This was fixed in this commit. There is no clean way to decouple
this change because this bug only seems to arise when the relaxed
version of invariant load hoisting was enabled.

Differential Revision: https://reviews.llvm.org/D35120

llvm-svn: 307907
2017-07-13 12:18:56 +00:00
Singapuram Sanjay Srivallabh 1abd9ffa37 [PPCGCodeGen] Differentiate kernels based on their parent Scop
Summary:
Add a sequence number that identifies a ptx_kernel's parent Scop within a function to it's name to differentiate it from other kernels produced from the same function, yet different Scops.

Kernels produced from different Scops can end up having the same name. Consider a function with 2 Scops and each Scop being able to produce just one kernel. Both of these kernels have the name "kernel_0". This can lead to the wrong kernel being launched when the runtime picks a kernel from its cache based on the name alone. This patch supplements D33985, by differentiating kernels across Scops as well.

Previously (even before D33985) while profiling kernels generated through JIT e.g. Julia, [[ https://groups.google.com/d/msg/polly-dev/J1j587H3-Qw/mR-jfL16BgAJ | kernels associated with different functions, and even different SCoPs within a function, would be grouped together due to the common name ]]. This patch prevents this grouping and the kernels are reported separately.

Reviewers: grosser, bollu

Reviewed By: grosser

Subscribers: mehdi_amini, nemanjai, pollydev, kbarton

Tags: #polly

Differential Revision: https://reviews.llvm.org/D35176

llvm-svn: 307814
2017-07-12 16:46:19 +00:00
Tobias Grosser 153a508349 [IslAst] Print memory accesses in AST dump
When providing the option "-polly-ast-print-accesses" Polly also prints the
memory accesses that are generated:

    #pragma known-parallel
    for (int c0 = 0; c0 <= 1023; c0 += 4)
      #pragma simd
      for (int c1 = c0; c1 <= c0 + 3; c1 += 1)
        Stmt_for_body(
          /* read  */ &MemRef_B[0]
          /* write */  MemRef_A[c1]
        );

This makes writing and debugging memory layout transformations easier.

Based on a patch contributed by Thomas Lang (ETH Zurich)

llvm-svn: 307579
2017-07-10 20:13:06 +00:00
Siddharth Bhat 761e5b9310 [Polly] [PPCGCodeGeneration] Teach `must_kills` to kill scalars that are local to the scop.
- By definition, we can pass something as a `kill` to PPCG if we know
that no data can flow across a kill.
- This is useful for more complex examples where we have scalars that
are local to a scop.
- If the local is only used within a scop, we are free to kill it.

Differential Revision: https://reviews.llvm.org/D35045

llvm-svn: 307260
2017-07-06 13:42:42 +00:00
Singapuram Sanjay Srivallabh 79f13b9a80 Prefix the name of the calling host function in the name of callee GPU kernel
Summary:
Provide more context to the name of a GPU kernel by prefixing its name with the host function that calls it. E.g. The first kernel called by `gemm` would be `FUNC_gemm_KERNEL_0`.

Kernels currently follow the "kernel_#" (# = 0,1,2,3,...) nomenclature. This patch makes it easier to map host caller and device callee, especially when there are many kernels produced by Polly-ACC.

Reviewers: grosser, Meinersbur, bollu, philip.pfaffe, kbarton!

Reviewed By: grosser

Subscribers: nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33985

llvm-svn: 307173
2017-07-05 16:48:21 +00:00
Siddharth Bhat a82f2d264a [PPCGCodeGeneration] Teach Polly to start using live range reordering.
Polly did not use PPCG's live range reordering feature. Teach
PPCGCodeGeneration to use this.

Documentation on this is sparse, so much of the code is conservative.

We currently kill all phi nodes in a Scop by appending them to the
must_kill map we pass to PPCG. I do not have a proof of correctness,
but it seems to be intuitively correct.

We also do not handle `array_order`, which, quoting PPCG, is:
PPCG/gpu.h: "Order dependences on non-scalars."
It seems to consist of RAW dependences between arrays. We need to
pass this information for more complex privatization cases.

Differential Revision: https://reviews.llvm.org/D34941

llvm-svn: 307163
2017-07-05 14:57:04 +00:00
Singapuram Sanjay Srivallabh 02ca346e48 Introduce a hybrid target to generate code for either the GPU or CPU
Summary:
Introduce a "hybrid" `-polly-target` option to optimise code for either the GPU or CPU.

When this target is selected, PPCGCodeGeneration will attempt first to optimise a Scop. If the Scop isn't modified, it is then sent to the passes that form the CPU pipeline, i.e. IslScheduleOptimizerPass, IslAstInfoWrapperPass and CodeGeneration.

In case the Scop is modified, it is marked to be skipped by the subsequent CPU optimisation passes.

Reviewers: grosser, Meinersbur, bollu

Reviewed By: grosser

Subscribers: kbarton, nemanjai, pollydev

Tags: #polly

Differential Revision: https://reviews.llvm.org/D34054

llvm-svn: 306863
2017-06-30 19:42:21 +00:00
Michael Kruse b738ffa845 Heap allocation for new arrays.
This patch aims to implement the option of allocating new arrays created
by polly on heap instead of stack. To enable this option, a key named
'allocation' must be written in the imported json file with the value
'heap'.

We need such a feature because in a next iteration, we will implement a
mechanism of maximal static expansion which will need a way to allocate
arrays on heap. Indeed, the expansion is very costly in terms of memory
and doing the allocation on stack is not worth considering.

The malloc and the free are added respectively at polly.start and
polly.exiting such that there is no use-after-free (for instance in case
of Scop in a loop) and such that all memory cells allocated with a
malloc are free'd when we don't need them anymore.

We also add :

- In the class ScopArrayInfo, we add a boolean as member called IsOnHeap
  which represents the fact that the array in allocated on heap or not.
- A new branch in the method allocateNewArrays in the ISLNodeBuilder for
  the case of heap allocation. allocateNewArrays now takes a BBPair
  containing polly.start and polly.exiting. allocateNewArrays takes this
  two blocks and add the malloc and free calls respectively to
  polly.start and polly.exiting.
- As IntPtrTy for the malloc call, we use the DataLayout one.

To do that, we have modified :

- createScopArrayInfo and getOrCreateScopArrayInfo such that it returns
  a non-const SAI, in order to be able to call setIsOnHeap in the
  JSONImporter.
- executeScopConditionnaly such that it return both start block and end
  block of the scop, because we need this two blocs to be able to add
  the malloc and the free calls at the right position.

Differential Revision: https://reviews.llvm.org/D33688

llvm-svn: 306540
2017-06-28 13:02:43 +00:00
Andreas Simbuerger dbb0ef8e94 [NFC][CodeGen] Use the ExitBlock explicitly.
Before we would 'guess' the correct location for the MergeBlock
that got introduced when executing a Scop conditionally. This
implicitly depends on the situation that at this point during
CodeGen there will be nothing between polly.start and polly.exiting.

With this commit we explicitly state that we want the block that
directly follows polly.exiting.

llvm-svn: 306398
2017-06-27 11:33:22 +00:00
Siddharth Bhat 65d7f72f2c [PPCGCodeGeneration] Add flag to allow polly to fail in GPU kernel fails.
- This is useful for debugging GPU code.

llvm-svn: 306290
2017-06-26 14:56:56 +00:00
Siddharth Bhat f291c8d510 [PPCGCodeGeneration] Allow intrinsics within kernels.
- In D33414, if any function call was found within a kernel, we would bail out.

- This is an over-approximation. This patch changes this by allowing the
  `llvm.sqrt.*` family of intrinsics.

- This introduces an additional step when creating a separate llvm::Module
  for a kernel (GPUModule). We now copy function declarations from the
  original module to new module.

- We also populate IslNodeBuilder::ValueMap so it replaces the function
  references to the old module to the ones in the new module
  (GPUModule).

Differential Revision: https://reviews.llvm.org/D34145

llvm-svn: 306284
2017-06-26 13:12:06 +00:00
Andreas Simbuerger 256070d85c [NFC] Return both polly.start and polly.exiting from executeScopConditionally.
This commit returns both the start and the exit block that are created
by executeScopConditionally.

In a future commit we will make use of the exit block. Before we would
have to use the implicit property that there won't be any code generated
between polly.start and polly.exiting at the time of use to find the
correct block ('polly.exiting').

All usage location are semantically unchanged.

llvm-svn: 306283
2017-06-26 12:17:11 +00:00
Siddharth Bhat a12f807f33 [PPCGCodeGeneration] Enable GPU code generation with invariant loads.
The condition that disallowed code generation in PPCGCodeGeneration with
invariant loads is not required. I haven't been able to construct a
counterexample where this generates invalid code.

Differential Revision: https://reviews.llvm.org/D34604

llvm-svn: 306245
2017-06-25 14:48:24 +00:00
Michael Kruse 214deb7960 [CodeGen] Emit aliasing metadata for new arrays.
Ensure that all array base pointers are assigned before generating
aliasing metadata by allocating new arrays beforehand.

Before this patch, getBasePtr() returned nullptr for new arrays because
the arrays were created at a later point. Nullptr did not match to any
array after the created array base pointers have been assigned and when
the loads/stores are generated.

llvm-svn: 305675
2017-06-19 10:19:29 +00:00
Siddharth Bhat bccaea57c0 [Polly] [PPCGCodeGeneration] Skip Scops which contain function pointers.
In `PPCGCodeGeneration`, we try to take the references of every `Value`
that is used within a Scop to offload to the kernel. This occurs in
`GPUNodeBuilder::createLaunchParameters`.

This breaks if one of the values is a function pointer, since one of
these cases will trigger:

1. We try to to take the references of an intrinsic function, and this
breaks at `verifyModule`, since it is illegal to take the reference of
an intrinsic.

2. We manage to take the reference to a function, but this fails at
`verifyModule` since the function will not be present in the module that
is created in the kernel.

3. Even if `verifyModule` succeeds (which should not occur), we would
then try to call a *host function* from the *device*, which is
illegal runtime behaviour.

So, we disable this entire range of possibilities by simply not allowing
function references within a `Scop` which corresponds to a kernel.

However, note that this is too conservative. We *can* allow intrinsics
within kernels if the backend can lower the intrinsic correctly. For
example, an intrinsic like `llvm.powi.*` can actually be lowered by the `NVPTX`
backend.

We will now gradually whitelist intrinsics which are known to be safe.

Differential Revision: https://reviews.llvm.org/D33414

llvm-svn: 305185
2017-06-12 11:41:09 +00:00
Michael Kruse a6d48f59a1 Fix a lot of typos. NFC.
llvm-svn: 304974
2017-06-08 12:06:15 +00:00
Tobias Grosser deefbced96 [Polly] [BlockGen] Support partial writes in regions
Summary:
The RegionGenerator traditionally kept a BlockMap that mapped from original
basic blocks to newly generated basic blocks. With the introduction of partial
writes such a 1:1 mapping is not possible any more, as a single basic block
can be code generated into multiple basic blocks. Hence, depending on the use
case we need to either use the first basic block or the last basic block.

This is intended to address the last four cases of incorrect code generation
in our AOSP buildbot and hopefully should turn it green.

Reviewers: Meinersbur, bollu, gareevroman, efriedma, huihuiz, sebpop, simbuerg

Reviewed By: Meinersbur

Subscribers: pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33767

llvm-svn: 304808
2017-06-06 17:17:30 +00:00
Michael Kruse be194d4efd [CodeGen] Remove extra ';'. NFC.
Fix compiler warning:

polly/lib/CodeGen/PerfMonitor.cpp:81:2: warning: extra ‘;’ [-Wpedantic]
 };
  ^

llvm-svn: 304802
2017-06-06 15:56:50 +00:00
Siddharth Bhat 726c28f8c4 [CodeGen] Track trip counts per-scop for performance measurement.
- Add a counter that is incremented once on exit from a scop.

- Test cases got split into two: one to test the cycles, and another one
to test trip counts.

- Sample output:
```name=sample-output.txt
scop function, entry block name, exit block name, total time, trip count
warmup, %entry.split, %polly.merge_new_and_old, 5180, 1
f, %entry.split, %polly.merge_new_and_old, 409944, 500
g, %entry.split, %polly.merge_new_and_old, 1226, 1
```

Differential Revision: https://reviews.llvm.org/D33822

llvm-svn: 304543
2017-06-02 11:36:52 +00:00
Siddharth Bhat a4dea6bb05 [CodeGen] Print performance counter information in CSV.
This ensures that tools can parse performance information which Polly
generates easily.

- Sample output:
```name=out.csv
scop function, entry block name, exit block name, total time
warmup, %entry.split, %polly.merge_new_and_old, 1960
f, %entry.split, %polly.merge_new_and_old, 1238
g, %entry.split, %polly.merge_new_and_old, 1218
```

- Example code to parse output:
```lang=python, name=example-parse.py
import asciitable
import sys

table = asciitable.read('out.csv', delimiter=',')
asciitable.write(table, sys.stdout, delimiter=',')
```

llvm-svn: 304533
2017-06-02 09:20:02 +00:00
Siddharth Bhat fee75f4ba5 [NFC] [CodeGen] Bail out of per-scop performance reporting if not supported.
We should bail out if performance monitoring is not supported, since
we would have no information to print per-scop, and `FinalStartBB`,
`ReturnFromFinal` would be `nullptr`.

Assert that these are not `nullptr` if performance monitoring is supported.

llvm-svn: 304529
2017-06-02 08:44:19 +00:00
Siddharth Bhat 07bee290de [CodeGen] Extend Performance Counter to track per-scop information.
Previously, we would generate one performance counter for all scops.
Now, we generate both the old information, as well as a per-scop
performance counter to generate finer grained information.

This patch needed a way to generate a unique name for a `Scop`.
The start region, end region, and function name combined provides a
unique `Scop` name. So, `Scop` has a new public API to provide its start
and end region names.

Differential Revision: https://reviews.llvm.org/D33723

llvm-svn: 304528
2017-06-02 08:01:22 +00:00
Michael Kruse 3bb4829936 [CodeGen] Iterate over explicit instruction list for block statements. NFC
For when statements do not contain all instructions of a BasicBlock
anymore, the block generator needs to go through the explicit list of
instructions it contains.

Contributed-by: Nandini Singhal <cs15mtech01004@iith.ac.in>

Differential Revision: https://reviews.llvm.org/D33653

llvm-svn: 304502
2017-06-02 00:13:49 +00:00
Tobias Grosser f51decb5fe [BlockGenerator] Take context into account when identifying partial writes
A partial write is a write where the domain of the values written is a subset of
the execution domain of the parent statement containing the write. Originally,
we directly checked this subset relation whereas it is indeed only important
that the subset relation holds for the parameter values that are known to be
valid in the execution context of the scop. We update our check to avoid the
unnecessary introduction of partial writes in situations where the write appears
to be partial without context information, but where context information allows
us to understand that a full write can be generated.

This change fixes (hides) a recent regression introduced in r303517, which broke
our AOSP builds. The part that is correctly fixed in this change is that we do
not any more unnecessarily generate a partial write. This is good performance
wise and, as we currently do not yet explicitly introduce partial writes in the
default configuration, this also hides possible bugs in the partial writes
implementation. The crashes that we have originally seen were caused by such
a bug, where partial writes were incorrectly generated in region statements. An
additional patch in a subsequent commit is needed to address this problem.

Reported-by: Reported-by: Eli Friedman <efriedma@codeaurora.org>

Differential Revision: https://reviews.llvm.org/D33759

llvm-svn: 304398
2017-06-01 09:34:20 +00:00
Tobias Grosser 6b6ac90098 [BlockGenerator] Translate buildContainsCondition to idiomatic isl C++
llvm-svn: 304354
2017-05-31 21:49:51 +00:00
Philip Pfaffe 78ae52f0d6 [Polly][NewPM] Port CodeGen to the new PM
Summary: To move CG to the new PM I outlined the various helper that were previously members of the CG class into free static functions. The CG class itself I moved into a header, which is required because we need to include it in `RegisterPasses` eventually.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: pollydev, llvm-commits, sanjoy

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33423

llvm-svn: 303624
2017-05-23 10:18:12 +00:00
Philip Pfaffe 2b852e2e42 [Polly][NewPM] Port IslAst to the new ScopPassManager
Summary: This patch ports IslAst to the new PM. The change is mostly straightforward. The only major modification required is making IslAst move-only, to correctly manage the isl resources it owns.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D33422

llvm-svn: 303622
2017-05-23 10:12:56 +00:00
Michael Kruse 1aad76c18f [CodeGen] Add invalidation of the loop SCEVs after merge block generation.
The SCEVs of loops surrounding the escape users of a merge blocks are
forgotten, so that loop trip counts based on old values can be revoked.

This fixes llvm.org//PR32536

Contributed-by: Baranidharan Mohan <mbdharan@gmail.com>

Differential Revision: https://reviews.llvm.org/D33195

llvm-svn: 303561
2017-05-22 15:36:53 +00:00
Michael Kruse 706f79ab14 [CodeGen] Support partial write accesses.
Allow the BlockGenerator to generate memory writes that are not defined
over the complete statement domain, but only over a subset of it. It
generates a condition that evaluates to 1 if executing the subdomain,
and only then execute the access.

Only write accesses are supported. Read accesses would require a PHINode
which has a value if the access is not executed.

Partial write makes DeLICM able to apply mappings that are not defined
over the entire domain (for instance, a branch that leaves a loop with
a PHINode in its header; a MemoryKind::PHI write when leaving is never
read by its PHI read).

Differential Revision: https://reviews.llvm.org/D33255

llvm-svn: 303517
2017-05-21 22:46:57 +00:00
Siddharth Bhat b7f68b8c9e [Fortran Support] Materialize outermost dimension for Fortran array.
- We use the outermost dimension of arrays since we need this
information to generate GPU transfers.

- In general, if we do not know the outermost dimension of the array
(because the indexing expression is non-affine, for example) then we
simply cannot generate transfer code.

- However, for Fortran arrays, we can use the Fortran array
representation which stores the dimensions of all arrays.

- This patch uses the Fortran array representation to generate code that
computes the outermost dimension size.

Differential Revision: https://reviews.llvm.org/D32967

llvm-svn: 303429
2017-05-19 15:07:45 +00:00
Reid Kleckner 96ab8726a3 [IR] De-virtualize ~Value to save a vptr
Summary:
Implements PR889

Removing the virtual table pointer from Value saves 1% of RSS when doing
LTO of llc on Linux. The impact on time was positive, but too noisy to
conclusively say that performance improved. Here is a link to the
spreadsheet with the original data:

https://docs.google.com/spreadsheets/d/1F4FHir0qYnV0MEp2sYYp_BuvnJgWlWPhWOwZ6LbW7W4/edit?usp=sharing

This change makes it invalid to directly delete a Value, User, or
Instruction pointer. Instead, such code can be rewritten to a null check
and a call Value::deleteValue(). Value objects tend to have their
lifetimes managed through iplist, so for the most part, this isn't a big
deal.  However, there are some places where LLVM deletes values, and
those places had to be migrated to deleteValue.  I have also created
llvm::unique_value, which has a custom deleter, so it can be used in
place of std::unique_ptr<Value>.

I had to add the "DerivedUser" Deleter escape hatch for MemorySSA, which
derives from User outside of lib/IR. Code in IR cannot include MemorySSA
headers or call the MemoryAccess object destructors without introducing
a circular dependency, so we need some level of indirection.
Unfortunately, no class derived from User may have any virtual methods,
because adding a virtual method would break User::getHungOffOperands(),
which assumes that it can find the use list immediately prior to the
User object. I've added a static_assert to the appropriate OperandTraits
templates to help people avoid this trap.

Reviewers: chandlerc, mehdi_amini, pete, dberlin, george.burgess.iv

Reviewed By: chandlerc

Subscribers: krytarowski, eraman, george.burgess.iv, mzolotukhin, Prazek, nlewycky, hans, inglorion, pcc, tejohnson, dberlin, llvm-commits

Differential Revision: https://reviews.llvm.org/D31261

llvm-svn: 303362
2017-05-18 17:24:10 +00:00
Philip Pfaffe 5cc87e3ab3 [Polly][NewPM] Port ScopDetection to the new PassManager
Summary: This is a proof of concept of how to port polly-passes to the new PassManager architecture.  This approach works ootb for Function-Passes, but might not be directly applicable to Scop/Region-Passes. While we could just run the Analyses/Transforms over functions instead, we'd surrender the nice pipelining behaviour we have now.

Reviewers: Meinersbur, grosser

Reviewed By: grosser

Subscribers: pollydev, sanjoy, nemanjai, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31459

llvm-svn: 302902
2017-05-12 14:37:29 +00:00
Hongbin Zheng 5b263d4ce1 [Polly] Remove unused header
llvm-svn: 302868
2017-05-12 02:21:50 +00:00
Hongbin Zheng 4fe342cb75 [Polly] Generate more 'canonical' induction variable
Today Polly generates induction variable in this way:

polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar, (UB - stride)

Instead of:

polly.indvar = phi 0, polly.indvar.next
...
polly.indvar.next = polly.indvar + stide
polly.loop_cond = predicate polly.indvar.next, UB

The way Polly generate induction variable cause some problem in the indvar simplify pass.
This patch make polly generate the later form, by assuming the induction variable never overflow

Differential Revision: https://reviews.llvm.org/D33089

llvm-svn: 302866
2017-05-12 02:17:15 +00:00
Tobias Grosser f3adab4c20 [Polly] Canonicalize arrays according to base-ptr equivalence class
Summary:
    In case two arrays share base pointers in the same invariant load equivalence
    class, we canonicalize all memory accesses to the first of these arrays
    (according to their order in the equivalence class).

    This enables us to optimize kernels such as boost::ublas by ensuring that
    different references to the C array are interpreted as accesses to the same
    array. Before this change the runtime alias check for ublas would fail, as it
    would assume models of the C array with differing (but identically valued) base
    pointers would reference distinct regions of memory whereas the referenced
    memory regions were indeed identical.

    As part of this change we remove most of the MemoryAccess::get*BaseAddr
    interface. We removed already all references to get*BaseAddr in previous
    commits to ensure that no code relies on matching base pointers between
    memory accesses and scop arrays -- except for three remaining uses where we
    need the original base pointer. We document for these situations that
    MemoryAccess::getOriginalBaseAddr may return a base pointer that is distinct
    to the base pointer of the scop array referenced by this memory access.

Reviewers: sebpop, Meinersbur, zinob, gareevroman, pollydev, huihuiz, efriedma, jdoerfert

Reviewed By: Meinersbur

Subscribers: etherzhhb

Tags: #polly

Differential Revision: https://reviews.llvm.org/D28518

llvm-svn: 302636
2017-05-10 10:59:58 +00:00
Tobias Grosser 1a2e0e6415 Fix formatting in Polly
llvm-svn: 302620
2017-05-10 04:53:59 +00:00
Chandler Carruth d742e5efa8 Update Polly for LLVM API change r302571 that removed varargs functions
with a nullptr sentinel in favor of nicely typed variadic templates.

llvm-svn: 302618
2017-05-10 02:39:35 +00:00
Siddharth Bhat a90be207c6 [Polly][PPCGCodeGen] OpenCL now gets kernel argument size from PPCG CodeGen
Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases).

Reviewers: grosser, Meinersbur, bollu

Reviewed By: bollu

Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32961

llvm-svn: 302515
2017-05-09 10:45:52 +00:00
Siddharth Bhat 17f01968f1 [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302379
2017-05-07 21:03:46 +00:00
Siddharth Bhat c1267b9baa Revert "[Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen"
This reverts commit 17a84e414adb51ee375d14836d4c2a817b191933.

Patches should have been submitted in the order of:

1. D32852
2. D32854
3. D32431

I mistakenly pushed D32431(3) first. Reverting to push in the correct
order.

llvm-svn: 302217
2017-05-05 09:02:08 +00:00
Siddharth Bhat 51904ae35a [Polly] Added OpenCL Runtime to GPURuntime Library for GPGPU CodeGen
Summary:
When compiling for GPU, one can now choose to compile for OpenCL or CUDA,
with the corresponding polly-gpu-runtime flag (libopencl / libcudart). The
GPURuntime library (GPUJIT) has been extended with the OpenCL Runtime library
for that purpose, correctly choosing the corresponding library calls to the
option chosen when compiling (via different initialization calls).

Additionally, a specific GPU Target architecture can now be chosen with -polly-gpu-arch (only nvptx64 implemented thus far).

Reviewers: grosser, bollu, Meinersbur, etherzhhb, singam-sanjay

Reviewed By: grosser, Meinersbur

Subscribers: singam-sanjay, llvm-commits, pollydev, nemanjai, mgorny, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32431

llvm-svn: 302215
2017-05-05 07:54:49 +00:00
Michael Kruse eedae7630a Introduce VirtualUse. NFC.
If a ScopStmt references a (scalar) value, there are multiple
possibilities where this value can come. The decision about what kind of
use it is must be handled consistently at different places, which can be
error-prone. VirtualUse is meant to centralize the handling of the
different types of value uses.

This patch makes ScopBuilder and CodeGeneration use VirtualUse. This
already helps to show inconsistencies with the value handling. In order
to keep this patch NFC, exceptions to the general rules are added.
These might be fixed later if they turn to problems. Overall, this
should result in fewer post-codegen IR-verification errors, but instead
assertion failures in `getNewValue` that are closer to the actual error.

Differential Revision: https://reviews.llvm.org/D32667

llvm-svn: 302157
2017-05-04 15:22:57 +00:00
Siddharth Bhat 6c3d19ba45 [NFC] [IslAST] fix typo: "int the" -> "in the"
llvm-svn: 301925
2017-05-02 14:54:49 +00:00
Tobias Grosser f13722177b [Codegen] Disable Polly's codegen verification by default
As has been reported in the previous commit, codegen verification can result in
quadratic compile time increases for large functions with many scops. This is
certainly not something we would like to have in the Polly default
configuration. Hence, we disable codegen verification by default -- also to see
if this resolves some of the compilation timeouts we currently see on the AOSP
buildbots. We still leave this feature in Polly as it has shown _very_ useful
for debugging. In fact, we may want to have a discussion if we can bring this
feature back in a way that does not impact compilation time so much.

Thanks to Eli Friedman <efriedma@codeaurora.org> for reporting this issue and
for providing the test case in the previous commit (where I forgot to
acknowledge him).

llvm-svn: 301670
2017-04-28 19:15:28 +00:00
Tobias Grosser d439911f73 [CodeGen] Skip verify if -polly-codegen-verify is set to false
Before this change, we always tried to verify the function and printed
verification errors, but just did not abort in case -polly-codegen-verify=false
was set and verification failed. As verification can become very cosly -- for
large functions with many scops we may verify the very same function very often
-- this can affect compile time very negatively. Hence, we respect the
-polly-codegen-verify flag with this check, ensuring that no verification is run
if -polly-codegen-verify=false.

This reduces code generation time from 26 seconds to 4 seconds on the test
case below with -polly-codegen-verify=false:

  struct X { int x; };
  void a();
  #define SIG (int x, X **y, X **z)
  typedef void (*fn)SIG;
  #define FN { for (int i = 0; i < x; ++i) { (*y)[i].x += (*z)[i].x; } a(); }
  #define FN5 FN FN FN FN FN
  #define FN25 FN5 FN5 FN5 FN5
  #define FN125 FN25 FN25 FN25 FN25 FN25
  #define FN250 FN125 FN125
  #define FN1250 FN250 FN250 FN250 FN250 FN250
  void x SIG { FN1250 }

llvm-svn: 301669
2017-04-28 19:08:20 +00:00
Siddharth Bhat abed49699b [Polly] [PPCGCodeGeneration] Add managed memory support to GPU code
generation.

This needs changes to GPURuntime to expose synchronization between host
and device.

1. Needs better function naming, I want a better name than
"getOrCreateManagedDeviceArray"

2. DeviceAllocations is used by both the managed memory and the
non-managed memory path. This exploits the fact that the two code paths
are never run together. I'm not sure if this is the best design decision

Reviewed by: PhilippSchaad

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301640
2017-04-28 11:16:30 +00:00
Hongbin Zheng 0f8f177682 [Polly] Do not introduce address space cast
Do not introduce address space cast in IslNodeBuilder::preloadUnconditionally.

Differential Revision: https://reviews.llvm.org/D32581

llvm-svn: 301519
2017-04-27 06:42:14 +00:00
Siddharth Bhat d277feda91 [PPCGCodeGeneration] Update PPCG Code Generation for OpenCL compatibility
Added a small change to the way pointer arguments are set in the kernel
code generation. The way the pointer is retrieved now, specifically requests
global address space to be annotated. This is necessary, if the IR should be
run through NVPTX to generate OpenCL compatible PTX.

The changes do not affect the PTX Strings generated for the CUDA target
(nvptx64-nvidia-cuda), but are necessary for OpenCL (nvptx64-nvidia-nvcl).

Additionally, the data layout has been updated to what the NVPTX Backend requests/recommends.

Contributed-by: Philipp Schaad

Reviewers: Meinersbur, grosser, bollu

Reviewed By: grosser, bollu

Subscribers: jlebar, pollydev, llvm-commits, nemanjai, yaxunl, Anastasia

Tags: #polly

Differential Revision: https://reviews.llvm.org/D32215

llvm-svn: 301299
2017-04-25 08:08:29 +00:00
Tobias Grosser 7b5a4dfd46 Exploit BasicBlock::getModule to shorten code
Suggested-by: Roman Gareev <gareevroman@gmail.com>
llvm-svn: 299914
2017-04-11 04:59:13 +00:00
Tobias Grosser 67726b3260 SAdjust to recent change in constructor definition of AllocaInst
llvm-svn: 299913
2017-04-11 04:23:38 +00:00
Matt Arsenault b3e30c32ce Update for alloca construction changes
llvm-svn: 299905
2017-04-11 00:12:58 +00:00
Michael Kruse 895f5d8080 Remove llvm.lifetime.start/end in original region.
The current StackColoring algorithm does not correctly handle the
situation when some, but not all paths from a BB to the entry node
cross a llvm.lifetime.start. According to an interpretation of the
language reference at
http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic
this might be correct, but it would cost too much effort to handle
in StackColoring.

To be on the safe side, remove all lifetime markers even in the original
code version (they have never been copied to the optimized version)
to ensure that no path to the entry block will cross a
llvm.lifetime.start.

The same principle applies to paths the a function return and the
llvm.lifetime.end marker, so we remove them as well.

This fixes llvm.org/PR32251.

Also see the discussion at
http://lists.llvm.org/pipermail/llvm-dev/2017-March/111551.html

llvm-svn: 299585
2017-04-05 20:09:59 +00:00
Philip Pfaffe 447f175eb5 Fix formatting in LoopGenerators
llvm-svn: 299424
2017-04-04 10:22:17 +00:00
Philip Pfaffe 2d950f36ee [Polly][NewPM] Pull references to the legacy PM interface from utilities and helpers
Summary:
A couple of the utilities used to analyze or build IR make explicit use of the legacy PM on their interface, to access analysis results. This patch removes the legacy PM from the interface, and just passes the required results directly.

This shouldn't introduce any function changes, although the API technically allowed to obtain two different analysis results before, one passed by reference and one through the PM. I don't believe that was ever intended, however.

Reviewers: grosser, Meinersbur

Reviewed By: grosser

Subscribers: nemanjai, pollydev, llvm-commits

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31653

llvm-svn: 299423
2017-04-04 10:01:53 +00:00
Tobias Grosser 637be04b77 [PerfMonitor] Use Intrinsics::getDeclaration
Instead of creating the declaration ourselves, we obtain it directly from the
LLVM intrinsic definitions. This addresses a post-review comment for r299359.

Suggested-by: Hongzing Zheng <etherzhhb@gmail.com>
llvm-svn: 299360
2017-04-03 15:23:08 +00:00
Tobias Grosser 65371af2e1 [CodeGen] Add Performance Monitor
Add support for -polly-codegen-perf-monitoring. When performance monitoring
is enabled, we emit performance monitoring code during code generation that
prints after program exit statistics about the total number of cycles executed
as well as the number of cycles spent in scops. This gives an estimate on how
useful polyhedral optimizations might be for a given program.

Example output:

  Polly runtime information
  -------------------------
  Total: 783110081637
  Scops: 663718949365

In the future, we might also add functionality to measure how much time is spent
in optimized scops and how many cycles are spent in the fallback code.

Reviewers: bollu,sebpop

Tags: #polly

Differential Revision: https://reviews.llvm.org/D31599

llvm-svn: 299359
2017-04-03 14:55:37 +00:00
Tobias Grosser 696a1ee99d [PollyIRBuilder] Bound size of alias metadata
No-alias metadata grows quadratic in the size of arrays involved, which can
become very costly for large programs. This commit bounds the number of arrays
for which we construct no-alias information to ten. This is conservatively
correct, as we just provide less information to LLVM and speeds up the compile
time of one of my internal test cases from 'does-not-terminate' to
'finishes-in-less-than-a-minute'. In the future we might try to be more clever
here, but this change should provide a good baseline.

llvm-svn: 299352
2017-04-03 07:42:50 +00:00
Michael Kruse c3e9c1442d [ScopInfo] Introduce ScopStmt::contains(BB*). NFC.
Provide an common way for testing if a statement contains something
for region and block statements. First user is
RegionGenerator::addOperandToPHI.

Suggested-by: Tobias Grosser <tobias@grosser.es>
llvm-svn: 298617
2017-03-23 16:12:21 +00:00
Roman Gareev cdfb57dc46 Introduce another level of metadata to distinguish non-aliasing accesses
Introduce another level of alias metadata to distinguish the individual
non-aliasing accesses that have inter iteration alias-free base pointers
marked with "Inter iteration alias-free" mark nodes. It can be used to,
for example, distinguish different stores (loads) produced by unrolling of
the innermost loops and, subsequently, sink (hoist) them by LICM.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30606

llvm-svn: 298510
2017-03-22 14:25:24 +00:00
Roman Gareev 23df27682a Map the new load to the base pointer of the invariant load hoisted load
Map the new load to the base pointer of the invariant load hoisted load
to be able to find the alias information for it.

Reviewed-by: Tobias Grosser <tobias@grosser.es>

Differential Revision: https://reviews.llvm.org/D30605

llvm-svn: 298507
2017-03-22 13:57:53 +00:00